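Troubleshooting HBase RegionServers Going Offline in Batches

Environment

CDH cluster version: 5.16.2

HBase 1.2

ZooKeeper 3.4.5

The HBase cluster is mainly used as the storage backend for JanusGraph.

Symptoms

Starting in 2022, RegionServers would go offline in batches every so often, with the following errors in the logs.

RegionServer log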

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hiveserver2/serverUri=<servername>:10010;version=1.2.1000.2.6.1.0-129;sequence=0000000187
       at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
       at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
       at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)
       at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)
       at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
       at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
       at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)
       at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)
       at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.deleteNode(PersistentEphemeralNode.java:315)
       at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.close(PersistentEphemeralNode.java:274)
       at org.apache.hive.service.server.HiveServer2$DeRegisterWatcher.process(HiveServer2.java:334)
       at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
       at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)

INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server xxxxx zookeeper connection closed.
INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/xxxx exiting
ERROR org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine:Region server exiting

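Cause

Analysis of the ZooKeeper logs showed that the same session id had been assigned to different nodes, which invalidated the session on some of those nodes and led to the failures above.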
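The log analysis itself is not shown here. As a rough sketch of how such a check could be done, the snippet below groups session ids by client host and reports ids seen from more than one host. It assumes the ZooKeeper server log contains lines of the form "Established session 0x... with negotiated timeout ... for client /host:port"; that line shape, the class name DuplicateSessionScan, and the sample lines are assumptions for illustration, not taken from the cluster in question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateSessionScan {
    // Assumed log line shape (verify against your own logs):
    // "... Established session 0x<hex> with negotiated timeout <n> for client /<host>:<port>"
    private static final Pattern ESTABLISHED = Pattern.compile(
            "Established session (0x[0-9a-fA-F]+).* for client /([^:\\s]+):\\d+");

    // Returns session id -> distinct client hosts, keeping only ids seen from 2+ hosts.
    static Map<String, Set<String>> duplicates(List<String> lines) {
        Map<String, Set<String>> hostsBySession = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ESTABLISHED.matcher(line);
            if (m.find()) {
                hostsBySession.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        // The same host re-establishing its own session is normal, so drop those entries.
        hostsBySession.values().removeIf(hosts -> hosts.size() < 2);
        return hostsBySession;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, merged from several ZooKeeper servers.
        List<String> lines = Arrays.asList(
                "INFO Established session 0xff80000000000001 with negotiated timeout 40000 for client /10.0.0.1:50010",
                "INFO Established session 0xff80000000000001 with negotiated timeout 40000 for client /10.0.0.2:50022",
                "INFO Established session 0x180000000000002 with negotiated timeout 40000 for client /10.0.0.3:50033");
        System.out.println(duplicates(lines));
    }
}
```

In practice the input would be the concatenated logs of all ZooKeeper servers in the ensemble, since the colliding ids are handed out by different servers.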

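The root cause is a bug in how ZooKeeper generates session ids (see ZOOKEEPER-1622). When the 40th bit of System.currentTimeMillis() is 1, the arithmetic right shift sign-extends and fills the upper 8 bits of nextSid with 1s, so OR-ing in the server id no longer makes the session id unique across servers. With a large number of ZooKeeper connections, duplicate ids can then be generated. The fix is to change the right shift to a logical (unsigned) shift.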

org.apache.zookeeper.server.SessionTrackerImpl

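    public static long initializeNextSession(long id) {
        long nextSid = 0;
        // Arithmetic shift (>>) sign-extends: if the top bit after << 24 is 1,
        // the upper 8 bits of nextSid are filled with 1s.
        nextSid = (System.currentTimeMillis() << 24) >> 8;
        // OR-ing the server id into bits that are already 1 has no effect.
        nextSid = nextSid | (id << 56);
        return nextSid;
    }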

Change it to:

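    public static long initializeNextSession(long id) {
        // Marker line, so the logs show whether the patched class is the one in use.
        LOG.info("initializeNextSession 1622 patch.");
        long nextSid = 0;
        // Logical shift (>>>) zero-fills the upper 8 bits, leaving them free for the server id.
        nextSid = (Time.currentElapsedTime() << 24) >>> 8;
        nextSid = nextSid | (id << 56);
        if (nextSid == Long.MIN_VALUE) {
            ++nextSid;  // this is an unlikely edge case, but check it just in case
        }
        return nextSid;
    }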
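To make the difference between the two shifts concrete, here is a small standalone sketch that runs both variants against fixed timestamps instead of the real clock. The class name SessionIdShiftDemo, the method names, and the two sample timestamps are illustrative assumptions. It is not ZooKeeper code, and it passes a wall-clock value to both variants so that only the shift differs.

```java
public class SessionIdShiftDemo {
    // Original logic, with the timestamp passed in so results are reproducible.
    static long buggyNextSession(long nowMillis, long serverId) {
        long nextSid = (nowMillis << 24) >> 8;   // arithmetic shift: sign-extends
        return nextSid | (serverId << 56);
    }

    // Same logic with the logical shift proposed in ZOOKEEPER-1622.
    static long fixedNextSession(long nowMillis, long serverId) {
        long nextSid = (nowMillis << 24) >>> 8;  // logical shift: zero-fills
        return nextSid | (serverId << 56);
    }

    public static void main(String[] args) {
        long jan2022 = 1_640_995_200_000L; // 2022-01-01 UTC: 40th bit is 0
        long apr2022 = 1_650_000_000_000L; // mid-April 2022: 40th bit is 1
        for (long now : new long[] {jan2022, apr2022}) {
            System.out.printf("now=%d%n", now);
            for (long serverId = 1; serverId <= 2; serverId++) {
                System.out.printf("  server %d: old=0x%016x patched=0x%016x%n",
                        serverId,
                        buggyNextSession(now, serverId),
                        fixedNextSession(now, serverId));
            }
        }
    }
}
```

With the April timestamp, the old variant yields the same starting id for server 1 and server 2, because the upper 8 bits are already all 1s before the server id is OR-ed in. The patched variant keeps the server id in the top byte in both cases.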

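Fix

Because ZooKeeper is the version bundled with the CDH cluster and upgrading it would have a large impact, we instead downloaded the source, modified this class, compiled and packaged it separately, and arranged for the patched jar to be loaded first.

Install ant, change into the source directory, and run the ant command to build the jar. The console output below was captured on a Chinese-locale Windows machine; the missing svn command and the javac warnings did not prevent the build from succeeding.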

D:\zookeeper-release-3.4.5>ant
ANT_OPTS is set to  -Djava.security.manager=allow
Buildfile: D:\zookeeper-release-3.4.5\build.xml

init:

ivy-download:

ivy-taskdef:

ivy-init:

ivy-retrieve:
[ivy:retrieve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: file = D:\zookeeper-release-3.4.5\ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.zookeeper#zookeeper;3.4.5
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found org.slf4j#slf4j-api;1.6.1 in maven2
[ivy:retrieve]  found org.slf4j#slf4j-log4j12;1.6.1 in maven2
[ivy:retrieve]  found log4j#log4j;1.2.15 in maven2
[ivy:retrieve]  found jline#jline;0.9.94 in maven2
[ivy:retrieve]  found org.jboss.netty#netty;3.2.2.Final in maven2
[ivy:retrieve] :: resolution report :: resolve 169ms :: artifacts dl 23ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   5   |   0   |   0   |   0   ||   5   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.zookeeper#zookeeper
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 5 already retrieved (0kB/10ms)

clover.setup:

clover.info:

clover:

jute:

compile_jute_uptodate:

compile_jute:

ver-gen:

svn-revision:
     [exec]
     [exec] D:\zookeeper-release-3.4.5>echo off
     [exec] 'svn' 不是内部或外部命令,也不是可运行的程序
     [exec] 或批处理文件。
     [exec] Result: 255

version-info:
     [java] Unknown REVISION number, using -1

build-generated:
    [javac] Compiling 1 source file to D:\zookeeper-release-3.4.5\build\classes
    [javac] 警告: [options] 未与 -source 1.5 一起设置引导类路径
    [javac] 警告: [options] 源值1.5已过时, 将在未来所有发行版中删除
    [javac] 警告: [options] 目标值1.5已过时, 将在未来所有发行版中删除
    [javac] 警告: [options] 要隐藏有关已过时选项的警告, 请使用 -Xlint:-options。
    [javac] 4 个警告

compile:
    [javac] Compiling 151 source files to D:\zookeeper-release-3.4.5\build\classes
    [javac] 警告: [options] 未与 -source 1.5 一起设置引导类路径
    [javac] 警告: [options] 源值1.5已过时, 将在未来所有发行版中删除
    [javac] 警告: [options] 目标值1.5已过时, 将在未来所有发行版中删除
    [javac] 警告: [options] 要隐藏有关已过时选项的警告, 请使用 -Xlint:-options。
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\JLineZNodeCompletor.java:33: 警告: [rawtypes] 找到原始类型: List
    [javac]     public int complete(String buffer, int cursor, List candidates) {
    [javac]                                                    ^
    [javac]   缺少泛型类List<E>的类型参数
    [javac]   其中, E是类型变量:
    [javac]     E扩展已在接口 List中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\Shell.java:276: 警告: [serial] 可序列化类ExitCodeException没有 serialVersionUID 的定义
    [javac]   public static class ExitCodeException extends IOException {
    [javac]                 ^
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\ZooKeeperMain.java:305: 警告: [rawtypes] 找到原始类型: Class
    [javac]                 Class consoleC = Class.forName("jline.ConsoleReader");
    [javac]                 ^
    [javac]   缺少泛型类Class<T>的类型参数
    [javac]   其中, T是类型变量:
    [javac]     T扩展已在类 Class中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\ZooKeeperMain.java:306: 警告: [rawtypes] 找到原始类型: Class
    [javac]                 Class completorC =
    [javac]                 ^
    [javac]   缺少泛型类Class<T>的类型参数
    [javac]   其中, T是类型变量:
    [javac]     T扩展已在类 Class中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\jmx\ManagedUtil.java:62: 警告: [rawtypes] 找到原始类型: Enumeration
    [javac]         Enumeration enumer = r.getCurrentLoggers();
    [javac]         ^
    [javac]   缺少泛型类Enumeration<E>的类型参数
    [javac]   其中, E是类型变量:
    [javac]     E扩展已在接口 Enumeration中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\ZooKeeperServer.java:502: 警告: [rawtypes] 找到原始类型: ArrayList
    [javac]                     acl == null ? new ArrayList<ACL>() : new ArrayList(acl));
    [javac]                                                              ^
    [javac]   缺少泛型类ArrayList<E>的类型参数
    [javac]   其中, E是类型变量:
    [javac]     E扩展已在类 ArrayList中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\quorum\QuorumPeer.java:576: 警告: [deprecation] org.apache.zookeeper.server.quorum中的LeaderElection已过时
    [javac]             le = new LeaderElection(this);
    [javac]                      ^
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\quorum\QuorumPeer.java:579: 警告: [deprecation] org.apache.zookeeper.server.quorum中的AuthFastLeaderElection已过时
    [javac]             le = new AuthFastLeaderElection(this);
    [javac]                      ^
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\quorum\QuorumPeer.java:582: 警告: [deprecation] org.apache.zookeeper.server.quorum中的AuthFastLeaderElection已过时
    [javac]             le = new AuthFastLeaderElection(this, true);
    [javac]                      ^
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\quorum\QuorumPeer.java:603: 警告: [deprecation] org.apache.zookeeper.server.quorum中的LeaderElection已过时
    [javac]             electionAlg = new LeaderElection(this);
    [javac]                               ^
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\util\KerberosUtil.java:39: 警告: [rawtypes] 找到原始类型: Class
    [javac]     getInstanceMethod = classRef.getMethod("getInstance", new Class[0]);
    [javac]                                                               ^
    [javac]   缺少泛型类Class<T>的类型参数
    [javac]   其中, T是类型变量:
    [javac]     T扩展已在类 Class中声明的Object
    [javac] D:\zookeeper-release-3.4.5\src\java\main\org\apache\zookeeper\server\util\KerberosUtil.java:42: 警告: [rawtypes] 找到原始类型: Class
    [javac]          new Class[0]);
    [javac]              ^
    [javac]   缺少泛型类Class<T>的类型参数
    [javac]   其中, T是类型变量:
    [javac]     T扩展已在类 Class中声明的Object
    [javac] 16 个警告

jar:
      [jar] Building jar: D:\zookeeper-release-3.4.5\build\zookeeper-3.4.5.jar

BUILD SUCCESSFUL
Total time: 7 seconds

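Copy the patched jar into the build directory under the ZooKeeper install path (/lib/zookeeper/build), and confirm from the ZooKeeper startup parameters that jars on this path are loaded first.

The logs confirmed that the modified class was loaded, and after a long period of running in production the problem has not recurred.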
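Besides looking for the marker log line, another way to check which jar a class is loaded from is to ask the JVM for the class's code source. The helper below is a minimal sketch: the class name ClassOriginCheck is hypothetical, and it only reports the right answer when run with the same classpath, in the same order, as the ZooKeeper server process.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOriginCheck {
    // Returns the location (jar or directory) a class was loaded from.
    // Classes loaded by the bootstrap loader have no code source.
    static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        URL location = (src == null) ? null : src.getLocation();
        return (location == null) ? "(bootstrap or unknown)" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to this class itself.
        String name = args.length > 0 ? args[0] : ClassOriginCheck.class.getName();
        System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    }
}
```

Run with the server's classpath and org.apache.zookeeper.server.SessionTrackerImpl as the argument, it should print the path of the patched jar if that jar really does take precedence.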
