HBase Kerberos authentication fails to renew automatically

I. Background

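After the service had been running for a while (about 7 days), HBase reads and writes started to fail. The error said that the retries were exhausted because a relogin was in progress and had failed, so requests could not be sent. A colleague had been working around this with a crontab job that restarted the service periodically. I happened to have some free time, so I took a look.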

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Tue Feb 23 15:49:17 CST 2021, RpcRetryingCaller{globalStartTime=1614066557307, pause=100, maxAttempts=2}, javax.security.sasl.SaslException: Call to hadoopxxx8.xxx.com/192.168.xx.xx:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] [Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]]
Tue Feb 23 15:49:17 CST 2021, RpcRetryingCaller{globalStartTime=1614066557307, pause=100, maxAttempts=2}, java.io.IOException: Call to hadoopxxx.xxx.com/192.168.xx.xx:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.

	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Call to hadoopxxx.xxx.com/192.168.xx.xx:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
	at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:221)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
	at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
	at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:423)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:42534)
	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:332)
	at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242)
	at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58)
	at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
	... 4 more
Caused by: java.io.IOException: Can not send request because relogin is in progress.
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest(NettyRpcConnection.java:301)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:421)
	... 16 more

II. Analysis

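  1. The failing service uses the same hbase client package as n other projects running in production, yet only this project's HBase reports the renewal failure.
  2. The hbase client's authentication code lives in the UserGroupInformation class, which belongs to the hadoop-common package.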

A quick Google search turned up the key links.

stackoverflow.com/quest

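The gist: applications that use Hadoop RPC do not need to care about Kerberos renewal. Programs that use WebHDFS, the YARN REST API and the like (applications running outside the Hadoop environment) have to start their own background thread that periodically calls UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() to renew.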

stackoverflow.com/quest

The gist: there are two ways to solve this problem.

  1. Upgrade the Hadoop packages (hadoop-auth, hadoop-mapreduce-client-core, hadoop-common) to 2.6.5.
  2. Create a thread that periodically calls UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() to renew.

III. Fix and release

I went with the second option and added a scheduled thread that performs the renewal. This solved the problem.

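import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.security.UserGroupInformation;

// The method name is illustrative; call it once at startup, after the keytab login.
private void startKerberosReloginJob() {
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            try {
                // Re-logs in from the keytab only if the TGT has expired or is close to expiry.
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
                logger.info("Check Kerberos Tgt And Relogin From Keytab Finish.");
            } catch (Exception e) {
                // Catch everything: an exception escaping run() would cancel all later runs.
                logger.error("Check Kerberos Tgt And Relogin From Keytab Error", e);
            }
        }
    }, 0, 10, TimeUnit.MINUTES);
    logger.info("Start Check Keytab TGT And Relogin Job Success.");
}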

Open questions: why don't the other projects have this problem? And why upgrade the Hadoop packages to 2.6.5?

IV. Further analysis

Since the Hadoop package version is involved, I looked at pom.xml.

[Screenshot of the pom.xml dependencies; image not available]

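solr depends on Hadoop 2.6.0 as a second-level Maven dependency, while lz-async-hbase depends on Hadoop 2.7.7 as a third-level Maven dependency. Maven prefers the nearest declaration, and level 2 is nearer than level 3, so 2.6.0 is the version used. I then searched JIRA for Kerberos ticket renewal problems in Hadoop 2.6.0.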
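The version selection described above can be pictured with a small model. This is only a sketch of Maven's "nearest wins" rule, not Maven's actual resolver; the class name NearestWins and its fields are made up for illustration, and the artifact name hadoop-common is assumed from the packages mentioned earlier. The depths and versions are the ones from this project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of Maven's "nearest wins" dependency mediation.
// Illustration only; the names here are hypothetical.
final class NearestWins {

    /** One occurrence of an artifact somewhere in the dependency tree. */
    static final class Occurrence {
        final String artifact;
        final String version;
        final int depth;   // 1 = declared directly in the project's pom.xml
        final String via;  // the dependency that pulls this artifact in

        Occurrence(String artifact, String version, int depth, String via) {
            this.artifact = artifact;
            this.version = version;
            this.depth = depth;
            this.via = via;
        }
    }

    /** Keeps, per artifact, the occurrence with the smallest depth; ties go to the first one seen. */
    static Map<String, Occurrence> resolve(List<Occurrence> tree) {
        Map<String, Occurrence> chosen = new LinkedHashMap<>();
        for (Occurrence o : tree) {
            Occurrence current = chosen.get(o.artifact);
            if (current == null || o.depth < current.depth) {
                chosen.put(o.artifact, o);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Occurrence> tree = new ArrayList<>();
        tree.add(new Occurrence("hadoop-common", "2.7.7", 3, "lz-async-hbase"));
        tree.add(new Occurrence("hadoop-common", "2.6.0", 2, "solr"));

        Occurrence winner = resolve(tree).get("hadoop-common");
        // prints "hadoop-common 2.6.0 (via solr)"
        System.out.println(winner.artifact + " " + winner.version + " (via " + winner.via + ")");
    }
}
```

The newer 2.7.7 loses simply because it sits one level deeper in the tree; version numbers are never compared.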

The main issues I found:

1. issues.apache.org/jira/

The gist: on JDK 8, with Hadoop versions <= 2.6.0, the isKeyTab check always evaluates to false, so UserGroupInformation.getLoginUser().reloginFromKeytab fails silently and the ticket cannot be renewed.

2. Kerberos ticket isn't being renewed by Solr when storing indexes on HDFS

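The gist: on JDK 8 with Solr < 6.2.0, the Hadoop packages have to be upgraded manually to 2.6.1+, otherwise automatic Kerberos ticket renewal will not work properly. [This is the real root cause!]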
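After changing the dependency versions, it can help to confirm which jar the UserGroupInformation class is really loaded from at runtime. The following is a minimal sketch that uses only the JDK; JarLocator is a hypothetical helper name, not something from Hadoop or HBase, and it assumes the check runs inside the affected service so that it sees the same classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Hadoop or HBase.
final class JarLocator {
    static final String NOT_FOUND = "(not on classpath)";

    /** Returns where the named class was loaded from, or a short note if that is unknown. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(no code source reported)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return NOT_FOUND;
        }
    }

    public static void main(String[] args) {
        // The jar file name in the printed location normally carries the version.
        System.out.println(locate("org.apache.hadoop.security.UserGroupInformation"));
    }
}
```

If the printed location still points at a 2.6.0 jar, the upgrade has not taken effect on the runtime classpath.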
