A ZooKeeper Migration Failure

Some time ago a colleague migrated our ZooKeeper ensemble (the one used by Kafka and Flume). Afterwards, every Flume producer/consumer started reporting the same errors:

07 Jan 2014 01:19:32,571 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058) - Opening socket connection to server xxx:2181
07 Jan 2014 01:19:32,572 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:947) - Socket connection established to xxx:2181, initiating session
07 Jan 2014 01:19:32,573 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1183) - Unable to read additional data from server sessionid 0x142f42b91871911, likely server has closed socket, closing socket connection and attempting reconnect
07 Jan 2014 01:19:32,845 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058) - Opening socket connection to server xxx:2181

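The clients were stuck in a loop: connect, fail, retry. My colleague said network connectivity had been verified long before, so I checked the server-side log and found: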

2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx:45282
2014-01-06 23:59:59,987 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client xxx:45282; will be dropped if server is in r-o mode
2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@812] - Refusing session request for client xxx:45282 as it has seen zxid 0x60fd15564 our last zxid is 0x10000000f client must try another server
2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client xxx:45282 (no session established for client)
2014-01-06 23:59:59,989 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from xxx:45285

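So Flume was still holding the zxid it had last seen on the old cluster (0x60fd15564), while the new cluster's zxid had practically restarted from zero (its last zxid was only 0x10000000f). The server therefore refused the session and threw an exception: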

// Excerpt from ZooKeeperServer: refuse a client that has seen a newer zxid than this server has processed
if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
    String msg = "Refusing session request for client "
        + cnxn.getRemoteSocketAddress()
        + " as it has seen zxid 0x"
        + Long.toHexString(connReq.getLastZxidSeen())
        + " our last zxid is 0x"
        + Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
        + " client must try another server";
    LOG.info(msg);
    throw new CloseRequestException(msg);
}
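To see why the two values from the log trip this check, it helps to split each zxid into its parts. A zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch. The following is only a minimal standalone sketch: the class name ZxidCheck and its helper methods are made up for illustration and are not part of ZooKeeper; only the comparison itself mirrors the server code above.

```java
// Hypothetical helper class for illustration; not part of the ZooKeeper API.
class ZxidCheck {
    // High 32 bits of a zxid: the leader epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the transaction counter within the epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Same comparison as the server-side check: a client that has seen
    // a zxid newer than the server's last processed zxid is refused.
    static boolean wouldRefuse(long clientLastZxidSeen, long serverLastProcessedZxid) {
        return clientLastZxidSeen > serverLastProcessedZxid;
    }

    public static void main(String[] args) {
        long clientZxid = 0x60fd15564L; // "has seen zxid" from the server log
        long serverZxid = 0x10000000fL; // "our last zxid" from the server log

        System.out.println("client epoch=" + epochOf(clientZxid)
                + " counter=0x" + Long.toHexString(counterOf(clientZxid)));
        System.out.println("server epoch=" + epochOf(serverZxid)
                + " counter=0x" + Long.toHexString(counterOf(serverZxid)));
        System.out.println("refused=" + wouldRefuse(clientZxid, serverZxid));
    }
}
```

The client's zxid belongs to epoch 6 of the old cluster, while the freshly started cluster is only at epoch 1, so any client that remembers the old cluster keeps being refused until it forgets its last seen zxid, for example by restarting.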

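Later I asked my colleague how the migration had been done. He had started a brand-new cluster, shut down the old one, and at the same time put an haproxy on one of the old cluster's IP:2181 addresses to proxy traffic to the new cluster, assuming this would make the migration transparent =.=. In fact it triggered ZooKeeper bug 832 (ZOOKEEPER-832), which makes the client retry the connection endlessly; only restarting Flume fixes it.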

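The correct way to migrate is to join the new nodes to the old cluster, then change the Flume configuration and wait a while (Flume reconfigures itself automatically) before shutting down the old nodes. Done this way, the problem is never triggered.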

This article comes from the blog “MIKE老毕的海贼船”; please be sure to keep this attribution: http://boylook.blog.51cto.com/7934327/1365364
