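I am testing rotation through a 4-node cluster, adding and removing nodes in round-robin fashion so that cluster membership follows this repeating sequence: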
1 2 3
2 3
2 3 4
3 4
1 3 4
1 4
1 2 4
1 2
1 2 3
2 3
2 3 4
3 4
1 3 4
1 4
...
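A node is added by stopping Cassandra, wiping /var/lib/cassandra/*, and restarting Cassandra (with the same cassandra.yaml file, which lists nodes 1 and 2 as seeds). A node is removed by stopping Cassandra on it and then issuing nodetool removenode $nodeId from another node. In all cases, the next operation does not start until the previous one has completed.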
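For reference, the rotation described above can be modeled in a few lines of Python. This is only a sketch of the membership sequence, not the actual test harness; the function name membership_states and its parameters are made up for illustration.

```python
from itertools import islice


def membership_states(n_nodes=4, start=(1, 2, 3)):
    """Yield cluster memberships: remove the oldest node, then add the next id."""
    members = list(start)                      # ordered oldest-first
    next_node = members[-1] % n_nodes + 1      # node id that will be added next
    while True:
        yield sorted(members)
        members.pop(0)                         # "remove node": drop the oldest member
        yield sorted(members)
        members.append(next_node)              # "add node": wiped node rejoins
        next_node = next_node % n_nodes + 1


if __name__ == "__main__":
    # Print the first eight states, one per line.
    for state in islice(membership_states(), 8):
        print(*state)
```

The ninth state is {1,2,3} again, which is the {1,2} to {1,2,3} transition where the problem showed up.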
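The membership sequence above repeated several times until, after about 4 iterations, I was performing an "add node" operation to go from cluster {1,2} to cluster {1,2,3}. In this iteration, my custom keyspace failed to propagate to node 3. nodetool status looks fine: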
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.12.206  164.88 KB  256     66.2%             7018ef8a-af08-40e9-b3d3-065f4ba6eb0d  rack1
UN  192.168.12.207  60.85 KB   256     63.2%             ff18b636-6287-4c70-bf23-0a1a1814b864  rack1
UN  192.168.12.205  217.19 KB  256     70.6%             2bc38fa8-42a1-457f-84d7-35b3b46e1daa  rack1
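But cqlsh on node 3 does not know about my keyspace. I tried running nodetool repair, and it appears to loop forever while spewing stack traces like the following into the log: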
WARN [Thread-9781] 2014-09-16 19:34:30,081 IncomingTcpConnection.java (line 83) UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=08768b1d-97a1-3528-8191-9acee7b08ef4
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
    at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:145)
    at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:134)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
    at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
ERROR [Thread-9782] 2014-09-16 19:34:31,484 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-9782,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
    at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
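Any ideas about what is going on and how to fix it (ideally, both a reliable repair procedure and a way to avoid getting into this state in the first place)?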
You can tell whether there is a schema version disagreement by running nodetool describecluster.
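If you want to check this from a script, something like the following Python sketch could work. It assumes the "Schema versions" section of the output lists one `<uuid>: [host, host, ...]` line per version (plus possibly an `UNREACHABLE: [...]` line); the exact layout may differ between Cassandra versions, so treat the regular expression as an assumption. The sample text and its UUIDs are made up, and the helper names are illustrative.

```python
import re

# Assumed line shape: "<schema-uuid>: [host, host]" or "UNREACHABLE: [host]".
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*?)\]\s*$")

# Illustrative output only; the UUIDs below are invented for the example.
SAMPLE = """\
Cluster Information:
    Name: Test Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        11111111-1111-1111-1111-111111111111: [192.168.12.205, 192.168.12.206]
        22222222-2222-2222-2222-222222222222: [192.168.12.207]
"""


def schema_versions(describecluster_output):
    """Map each reported schema version to the list of hosts on that version."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions.setdefault(match.group(1), []).extend(hosts)
    return versions


def schema_agrees(versions):
    """True when all reachable nodes report exactly one schema version."""
    live = [v for v in versions if v != "UNREACHABLE"]
    return len(live) == 1
```

In practice the text would come from running nodetool describecluster (for example via subprocess.run) rather than from SAMPLE. Any host listed under a minority version is a candidate for the procedure below.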
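If one node reports a different version, do the following on the node with the wrong version:
Stop the Cassandra service/process, typically by running nodetool drain followed by sudo service cassandra stop (or kill <pid>).
At the end of this process the commit log directory (/var/lib/cassandra/commitlog) should contain only a single small file.
Remove the Schema* and Migration* sstables inside the system keyspace (/var/lib/cassandra/data/system, if you are using the defaults).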
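The removal step could be scripted roughly as below, to be run only after Cassandra has been stopped on that node. This is a sketch under assumptions: the directory and the Schema*/Migration* patterns are taken from the step above, the on-disk layout varies between Cassandra versions (so both are parameters), and the function names are hypothetical. It defaults to a dry run so you can review the list before deleting anything.

```python
from pathlib import Path


def schema_sstables(system_dir, patterns=("Schema*", "Migration*")):
    """Return the files directly under system_dir whose names match the patterns."""
    root = Path(system_dir)
    found = set()
    for pattern in patterns:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def remove_schema_sstables(system_dir, dry_run=True):
    """List the matching files and delete them unless dry_run is True."""
    targets = schema_sstables(system_dir)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


if __name__ == "__main__":
    # Dry run against the default location named in the step above.
    for path in remove_schema_sstables("/var/lib/cassandra/data/system"):
        print("would remove", path)
```

Call remove_schema_sstables(..., dry_run=False) only once the printed list looks right.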
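When Cassandra is started again, this node will notice the missing information and pull in the correct schema from one of the other nodes. In version 1.0.X and before, the schema is applied one mutation at a time. While it is being applied, the node may log messages, such as the one below, saying that a column family cannot be found. These messages can be ignored.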
ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012

To confirm everything is on the same schema, verify that 'describe cluster;' only returns one schema version.