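The Cassandra database behind one of my early projects has grown to several GB of data. I had not been paying attention to it, and when I checked the ring status with nodetool I found the load was badly unbalanced: one node held 50% while another held only 3%.
I looked it up on the Cassandra wiki; http://wiki.apache.org/cassandra/Operations#Load_balancing has this passage: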
If you add nodes to your cluster your ring will be unbalanced and only way to get perfect balance is to compute new tokens for every node and assign them to each node manually by using nodetool move command.
Here's a python program which can be used to calculate new tokens for the nodes. There's more info on the subject at Ben Black's presentation at Cassandra Summit 2010. http://www.datastax.com/blog/slides-and-videos-cassandra-summit-2010
def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x
In versions of Cassandra 0.7.* and lower, there's also nodetool loadbalance: essentially a convenience over decommission + bootstrap, only instead of telling the target node where to move on the ring it will choose its location based on the same heuristic as Token selection on bootstrap. You should not use this as it doesn't rebalance the entire ring.
The status of move and balancing operations can be monitored using nodetool with the netstat argument. (Cassandra 0.6.* and lower use the streams argument).
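So balancing the load comes down to a rule for generating tokens; the snippet above is Python.
Reference: http://www.datastax.com/docs/1.0/initialize/token_generation#token-gen-cassandra
Create a Python file with the following content: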
#! /usr/bin/python
import sys
if (len(sys.argv) > 1):
    num=int(sys.argv[1])
else:
    num=int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    print 'node %d: %d' % (i, (i*(2**127)/num))
It generates the tokens for a cluster.
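The script above is Python 2 (print statement, raw_input). For anyone on Python 3, here is a rough sketch of the same calculation, plus a small check of how much of the ring each token ends up owning. It assumes the 2**127 token space used in the formula above, and that each node owns the range from the previous token up to its own. The names balanced_tokens and ownership are my own helpers, not part of Cassandra or nodetool.

```python
# Assumption: token space of 2**127, as in the formula quoted above.
RING_SIZE = 2 ** 127


def balanced_tokens(num_nodes):
    """Return evenly spaced tokens for num_nodes nodes (hypothetical helper)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Floor division keeps every token an exact integer.
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def ownership(tokens):
    """Return the fraction of the ring owned by each token, in sorted order.

    Assumes a node owns the range from the previous token up to its own
    token, wrapping around the ring at the first node.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]
    shares = []
    for idx, tok in enumerate(ordered):
        prev = ordered[idx - 1]  # idx 0 wraps around to the last token
        shares.append(((tok - prev) % RING_SIZE) / RING_SIZE)
    return shares


if __name__ == "__main__":
    tokens = balanced_tokens(4)
    for i, tok in enumerate(tokens):
        print("node %d: %d" % (i, tok))
    print(ownership(tokens))
```

Feeding the tokens currently shown by nodetool ring into ownership should give a rough picture of why the load is lopsided, at least as far as token placement is concerned.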
After I changed the token on each machine, one at a time, with nodetool's move operation, the Cassandra load evened out.
$>cassandra/bin/nodetool -h 127.0.0.1 -p 8080 move <new token>
$>cassandra/bin/nodetool -h 127.0.0.1 -p 8080 ring
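With more than a couple of nodes it is easy to mistype a 39-digit token, so it may help to print the move commands instead of writing them by hand. This is only a sketch: the host addresses are made-up placeholders, move_commands is my own helper name, and the nodetool path and port 8080 are simply copied from the commands above. It does not run anything; it just builds the command lines, in the order the hosts are listed.

```python
def move_commands(hosts, jmx_port=8080, nodetool="cassandra/bin/nodetool"):
    """Build one 'nodetool move' command line per host (hypothetical helper).

    hosts is assumed to be listed in the order the nodes should sit on the
    ring; the tokens use the same i * 2**127 / N rule as the script above.
    """
    ring_size = 2 ** 127
    count = len(hosts)
    commands = []
    for i, host in enumerate(hosts):
        token = i * ring_size // count
        commands.append("%s -h %s -p %d move %d" % (nodetool, host, jmx_port, token))
    return commands


if __name__ == "__main__":
    # Placeholder addresses; replace with the real nodes of the cluster.
    for line in move_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(line)
```

I would still run the printed commands one node at a time and check the ring after each move, as described above.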