ZooKeeper - 'Len error' Troubleshooting

After a configuration change and restart of our company's shared ZooKeeper cluster (the shared cluster serves multiple business teams; "zk" is used below as shorthand for ZooKeeper), one business kept reporting zk-related errors and only recovered after its own processes were restarted. The versions in use:

  • Server: ZooKeeper 3.4.5
  • Client: ZooKeeper 3.4.6, Curator 2.8.0

The error logs seen on the server look like the following (these logs are from an offline reproduction):

2016-11-28 23:35:24,440 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:50296
2016-11-28 23:35:24,441 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x158ab91e8910000 at /127.0.0.1:50296
2016-11-28 23:35:24,442 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x158ab91e8910000 with negotiated timeout 15000 for client /127.0.0.1:50296
2016-11-28 23:35:24,454 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x158ab91e8910000 due to java.io.IOException: Len error 1073113
2016-11-28 23:35:24,455 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:50296 which had sessionid 0x158ab91e8910000

The conclusion first: upgrading the client to 3.4.8 fixes the problem. As for what triggers it: after the server restarts, the setWatches request the client sends while reconnecting is too large and exceeds the server's jute.maxbuffer (1MB by default); the server reports "Len error" and closes the connection; the client reconnects, and the cycle repeats. The Jira issue for this bug is ZOOKEEPER-706.

Troubleshooting Process

The Right Way

1. Find the source code behind the error message

The information we want is in NIOServerCnxn.java:

ack 'Len error'
src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java
392:                            throw new IOException("Len error " + len);
src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java
540:            throw new IOException("Len error " + len);
vim src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java +540
539         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
540             throw new IOException("Len error " + len);
541         }

Keep digging into maxBuffer:

ack maxBuffer
src/java/main/org/apache/jute/BinaryInputArchive.java
87:    static public final int maxBuffer = Integer.getInteger("jute.maxbuffer", 0xfffff);
122:    // Since this is a rough sanity check, add some padding to maxBuffer to
126:        if (len < 0 || len > maxBuffer + 1024) {
src/java/main/org/apache/zookeeper/ClientCnxn.java
107:     * jute.maxBuffer value. To avoid this we instead split the watch
110:     * with respect to the server's 1MB default for jute.maxBuffer.

We can see that maxBuffer is controlled by the jute.maxbuffer system property and defaults to 1048575 (0xfffff); the 1073113 in the log above is indeed larger than that, so it is now confirmed that the packet sent by the client is too large. But why is it so large? If you are lucky enough, you will notice the second file matched by ack maxBuffer, ClientCnxn.java, whose relevant content is:

104     /* ZOOKEEPER-706: If a session has a large number of watches set then
105      * attempting to re-establish those watches after a connection loss may
106      * fail due to the SetWatches request exceeding the server's configured
107      * jute.maxBuffer value. To avoid this we instead split the watch
108      * re-establishement across multiple SetWatches calls. This constant
109      * controls the size of each call. It is set to 128kB to be conservative
110      * with respect to the server's 1MB default for jute.maxBuffer.
111      */
112     private static final int SET_WATCHES_MAX_LENGTH = 128 * 1024;
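The comment above describes splitting watch re-establishment across multiple SetWatches calls. Below is a simplified sketch of that chunking idea; it is not the actual ClientCnxn implementation, the class and method names are made up, and batching by raw path bytes glosses over serialization overhead:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration of the chunking idea only, not the real ClientCnxn code:
// group watched paths into batches so that each SetWatches request stays
// under SET_WATCHES_MAX_LENGTH.
class SetWatchesSplitter {
    private static final int SET_WATCHES_MAX_LENGTH = 128 * 1024;

    static List<List<String>> splitIntoBatches(List<String> watchedPaths) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String path : watchedPaths) {
            int pathBytes = path.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch before it would exceed the limit.
            if (!current.isEmpty() && currentBytes + pathBytes > SET_WATCHES_MAX_LENGTH) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(path);
            currentBytes += pathBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}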

At this point, if you are lucky enough to look at the description of Jira ZOOKEEPER-706, you will find exactly the same error; all that remains is to reproduce the problem and confirm the fix version.

2. Reproduce the problem

The most important trigger condition is that the client has registered a large number of watchers, such that the combined length of all watched paths exceeds 1MB. So first create a path and, under it, 1024 children, each with a name of roughly 1KB:

private void prepareData(CuratorFramework curatorClient) throws Exception {
    int childNum = 1024;
    char[] chars = new char[1024];
    Arrays.fill(chars, 'x');
    String nameSuffix = new String(chars);
    curatorClient.create().forPath(PATH);
    for (int i = 0; i < childNum; i++) {
        String childPath = PATH + "/" + i + "-" + nameSuffix;
        curatorClient.create().forPath(childPath);
    }
}

With the data in place, add the watchers:

private void startWatch(CuratorFramework curatorClient) throws Exception {
    PathChildrenCache watcher = new PathChildrenCache(curatorClient, PATH, true);
    watcher.getListenable().addListener((client, event) -> {
        ChildData data = event.getData();
        if (data == null) {
            log.info("No data in event[" + event + "]");
        } else {
            log.info("Receive event: "
                    + "type=[" + event.getType() + "]"
                    + ", path=[" + data.getPath() + "]"
                    + ", data=[" + new String(data.getData()) + "]"
                    + ", stat=[" + data.getStat().toString().trim() + "]");
        }
    });
    watcher.start(PathChildrenCache.StartMode.BUILD_INITIAL_CACHE);
}
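For completeness, here is a minimal driver sketch that wires the two helpers together. The connection string, retry policy, and PATH value are assumptions for a local test, and the sketch assumes the prepareData and startWatch methods shown above are pasted in as static members of the class:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Reproduction driver sketch; assumes prepareData(...) and startWatch(...)
// from the snippets above are included as static members of this class.
public class LenErrorRepro {
    private static final String PATH = "/len-error-repro"; // hypothetical parent path

    public static void main(String[] args) throws Exception {
        CuratorFramework curatorClient = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        curatorClient.start();

        prepareData(curatorClient); // 1024 children with ~1KB names, over 1MB of paths in total
        startWatch(curatorClient);  // watch all of them via PathChildrenCache

        // Restart the ZooKeeper server at this point; a 3.4.6 client will fail
        // to re-establish its watches and the server will keep logging "Len error".
        Thread.sleep(Long.MAX_VALUE);
    }
}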

All that remains is to restart the ZooKeeper server, after which the familiar error is printed over and over again.

3. Confirm the fix version

ZOOKEEPER-706 shows the fix version as 3.4.7; looking at ZooKeeper's release notes, the next release after 3.4.7 is 3.4.8.

4. Verify the fix

The business team prepared two pre-production machines: node A had its zk client upgraded to 3.4.8, while node B stayed on 3.4.6. On each machine, iptables was used to simulate a zk server restart:

sudo /sbin/iptables -A OUTPUT -d zkserver -j REJECT   
sleep 10
sudo /sbin/iptables -D OUTPUT -d zkserver -j REJECT

Node A recovered after reconnecting, while node B kept reporting errors. Verification succeeded.

The Wrong Way

Had the right approach above been followed, the whole investigation, reproduction included, might have taken half an hour. In reality it took me far longer ("far longer" is itself a face-saving way of putting it), so here is how things actually went. In hindsight, many investigations take detours of one kind or another; what matters is leaning on as many tools as possible to pull yourself back on track as quickly as you can.

1. Suspecting a bug in the business team's code

The server side is a shared cluster serving multiple business teams, and after the restart only one of them reported errors while the rest were fine, so the first suspicion was a bug in that team's code. In hindsight, the affected team simply watched too many paths; it was a matter of data, not code. Naturally, after a long review, their code turned out to be fine.

2. Suspecting the versions the business team used

Continuing the same line of thought: if the code is fine, maybe the versions are at fault. A rather far-fetched lead surfaced: Curator 2.8.0 declares a dependency on ZooKeeper 3.4.5, while the business team was using 3.4.6. I tried to reproduce the problem along that direction. Predictably, it could not be reproduced, and the investigation stalled.

3. Analyzing the problem with a tcpdump capture

A later server restart made the problem reappear in production, and a few pre-production business machines that had not been restarted preserved the error scene. Capturing their traffic with tcpdump showed the following:

1:14:33:37.331726 IP client-ip.58029 > server-ip.eforward: Flags [S], seq 844611479, win 14600, options [mss 1460,sackOK,TS val 1820004927 ecr 0], length 0
6:14:33:37.331748 IP server-ip.eforward > client-ip.58029: Flags [S.], seq 30091353, ack 844611480, win 14480, options [mss 1460,sackOK,TS val 3857636803 ecr 1820004927], length 0
11:14:33:37.332833 IP client-ip.58029 > server-ip.eforward: Flags [.], ack 1, win 14600, options [nop,nop,TS val 1820004928 ecr 3857636803], length 0
16:14:33:37.333766 IP client-ip.58029 > server-ip.eforward: Flags [P.], seq 1:50, ack 1, win 14600, options [nop,nop,TS val 1820004929 ecr 3857636803], length 49
24:14:33:37.333789 IP server-ip.eforward > client-ip.58029: Flags [.], ack 50, win 14480, options [nop,nop,TS val 3857636805 ecr 1820004929], length 0
29:14:33:37.334129 IP server-ip.eforward > client-ip.58029: Flags [P.], seq 1:42, ack 50, win 14480, options [nop,nop,TS val 3857636805 ecr 1820004929], length 41
36:14:33:37.335128 IP client-ip.58029 > server-ip.eforward: Flags [.], ack 42, win 14600, options [nop,nop,TS val 1820004930 ecr 3857636805], length 0
41:14:33:37.351116 IP client-ip.58029 > server-ip.eforward: Flags [P.], seq 50:14530, ack 42, win 14600, options [nop,nop,TS val 1820004946 ecr 3857636805], length 14480
951:14:33:37.351138 IP server-ip.eforward > client-ip.58029: Flags [.], ack 14530, win 17376, options [nop,nop,TS val 3857636822 ecr 1820004946], length 0
956:14:33:37.351586 IP server-ip.eforward > client-ip.58029: Flags [F.], seq 42, ack 14530, win 17376, options [nop,nop,TS val 3857636823 ecr 1820004946], length 0
961:14:33:37.351616 IP server-ip.eforward > client-ip.58029: Flags [R.], seq 43, ack 14530, win 17376, options [nop,nop,TS val 3857636823 ecr 1820004946], length 0
966:14:33:37.352159 IP client-ip.58029 > server-ip.eforward: Flags [.], seq 14530:15978, ack 42, win 14600, options [nop,nop,TS val 1820004947 ecr 3857636822], length 1448
1061:14:33:37.352173 IP server-ip.eforward > client-ip.58029: Flags [R], seq 30091395, win 0, length 0
1065:14:33:37.352207 IP client-ip.58029 > server-ip.eforward: Flags [P.], seq 15978:21770, ack 42, win 14600, options [nop,nop,TS val 1820004947 ecr 3857636822], length 5792
1432:14:33:37.352214 IP server-ip.eforward > client-ip.58029: Flags [R], seq 30091395, win 0, length 0
1436:14:33:37.352322 IP client-ip.58029 > server-ip.eforward: Flags [.], seq 21770:23218, ack 42, win 14600, options [nop,nop,TS val 1820004948 ecr 3857636822], length 1448
1531:14:33:37.352329 IP server-ip.eforward > client-ip.58029: Flags [R], seq 30091395, win 0, length 0
1535:14:33:37.352375 IP client-ip.58029 > server-ip.eforward: Flags [.], seq 23218:29010, ack 42, win 14600, options [nop,nop,TS val 1820004948 ecr 3857636822], length 5792
1902:14:33:37.352381 IP server-ip.eforward > client-ip.58029: Flags [R], seq 30091395, win 0, length 0
1906:14:33:37.352420 IP client-ip.58029 > server-ip.eforward: Flags [.], seq 29010:30458, ack 42, win 14600, options [nop,nop,TS val 1820004948 ecr 3857636822], length 1448
2001:14:33:37.352426 IP server-ip.eforward > client-ip.58029: Flags [R], seq 30091395, win 0, length 0

Going by the line numbers:
Lines 1, 6, and 11 are the TCP three-way handshake;
Line 16 is the client sending a ConnectRequest, and line 29 is the server replying with a ConnectResponse;
Line 41 is the client sending a very large packet, split across multiple segments; after receiving it, the server sends a FIN, and every subsequent segment is RESET.

Looking at the payload, the request sent at line 41 has type 101 (per the zk protocol, every request packet except ConnectRequest carries a type field in its header identifying the request type); according to src/java/main/org/apache/zookeeper/ZooDefs.java, that is a setWatches request.
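For reference, a short excerpt of the relevant constant (from the OpCode interface nested in ZooDefs; neighboring op codes are omitted here):

// Excerpt from org.apache.zookeeper.ZooDefs (3.4.x); other op codes omitted.
public interface OpCode {
    // ...
    int setWatches = 101;   // the request type observed at line 41 of the capture
    // ...
}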

At this point things started to make sense: the error was most likely caused by the setWatches request being too large, and the investigation switched to the "right way" described above.

Remaining Questions

  • Q: Why does the problem not occur when the client process is restarted, but does occur when the client reconnects?
  • A: The restart logic and the reconnect logic are different. On a fresh start, the client first calls getChildren2 to fetch the children of the path and then sets watches one by one. On reconnect, it re-sets all previously registered watches in one batch (a single SetWatches request). A rough sizing sketch follows below.
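Roughly speaking, the size of that single reconnect-time SetWatches request is dominated by the combined byte length of all watched paths. A minimal back-of-the-envelope sketch (the helper class and method names are made up for illustration; per-path encoding overhead and the request header are ignored):

import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, for illustration only: estimates whether a client's
// watched paths would push a single reconnect-time SetWatches request past
// the server's default jute.maxbuffer (0xfffff bytes).
class WatchSizeEstimator {
    static boolean likelyToExceedMaxBuffer(List<String> watchedPaths) {
        long totalBytes = 0;
        for (String path : watchedPaths) {
            totalBytes += path.getBytes(StandardCharsets.UTF_8).length;
        }
        return totalBytes > 0xfffff;
    }
}

In the reproduction above, 1024 children with names of roughly 1KB each put the total comfortably past that limit.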

Summary

  1. Try reading the code first. Googling first is also a fine habit, but it is easy to get bogged down hunting for similar-looking problems, especially when the search keywords are poorly chosen or organized.
  2. For the open-source products you operate, keep a closer eye on upstream updates.