Error details:
After the YARN Timeline Service 2.0 service starts, the backend log shows the following error:
Wed May 06 16:47:10 CST 2020, RpcRetryingCaller{globalStartTime=1588754824588, pause=1000, maxAttempts=4}, java.net.ConnectException: Call to cjhdpnode21.zpepc.com.cn/21.48.35.21:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: cjhdpnode21.zpepc.com.cn/21.48.35.21:17020
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
... 3 more
Caused by: java.net.ConnectException: Call to cjhdpnode21.zpepc.com.cn/21.48.35.21:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: cjhdpnode21.zpepc.com.cn/21.48.35.21:17020
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
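Analysis:
1. The error message shows that the HBase client's connection was refused, and that the port, 17020, differs from 16020, the default port of the HBase RegionServer service. Search the YARN configuration for settings related to "17020".
2. The Advanced yarn-hbase-site tab contains HBase settings, but every value there differs from the corresponding value in the cluster HBase's hbase-site.xml.
3. The Advanced yarn-hbase-env tab has a setting named use_external_hbase, and it is unchecked. YARN is therefore using its embedded HBase rather than the cluster's HBase, which explains the differing values found in step 2.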
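The check in step 1 can be sketched in shell. This is a minimal sketch, assuming the timeline service log text is piped in on stdin; the helper names `refused_endpoints` and `port_differs` are hypothetical, and 16020 is the default RegionServer port mentioned above.

```shell
# Hypothetical helper: print each distinct endpoint that refused a connection,
# reading log text from stdin.
refused_endpoints() {
  grep -oE 'Connection refused: [^ ]+:[0-9]+' \
    | sed 's/^Connection refused: //' \
    | sort -u
}

# Hypothetical helper: succeed when the endpoint's port is NOT the expected one
# (defaults to 16020, the default RegionServer port).
port_differs() {
  local endpoint="$1" expected="${2:-16020}"
  [ "${endpoint##*:}" != "$expected" ]
}
```

For example, `refused_endpoints < timelineservice.log` (the file name is a placeholder) lists the endpoints, and `port_differs "$endpoint"` flags any that are not on the default port.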
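Solution:
Approach: use the cluster's HBase
1. Stop the YARN service.
2. Open the Advanced yarn-hbase-env tab and check use_external_hbase.
3. Open the Advanced yarn-hbase-site tab and change hbase.regionserver.info.port, hbase.regionserver.port, hbase.rootdir, and zookeeper.znode.parent so that they match the corresponding values in the cluster's HBase.
4. Restart the YARN service.
5. Log in to one of the machines and run the following commands to create the required tables: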
[hbase@master1 ~]$ export HBASE_CLASSPATH_PREFIX={hdp-dir}/hadoop-yarn/timelineservice/*
[hbase@master1 ~]$ hbase org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -Dhbase.client.retries.number=35 -create -s
6. Log in to the HBase shell and grant permissions to the yarn user:
[hbase@master1 ~]$ hbase shell
hbase(main):001:0> grant 'yarn', 'RWXCA'
The problem is resolved.
If Kerberos is enabled on the cluster, the log still shows the following error:
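Step 3 can be double-checked by comparing the four properties in the two configurations. The following is a minimal sketch, assuming both configurations have been saved as local files in the usual Hadoop *-site.xml layout with each `<name>` and `<value>` element on a single line; the function names and file paths are hypothetical.

```shell
# Hypothetical helper: print the value of one property from a *-site.xml file.
# Assumes <name> precedes <value> and each element sits on one line.
get_prop() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$file"
}

# Hypothetical helper: compare the four properties from step 3.
# Returns 1 if any of them differs between the two files.
compare_props() {
  local yarn_site="$1" cluster_site="$2" p a b rc=0
  for p in hbase.regionserver.info.port hbase.regionserver.port \
           hbase.rootdir zookeeper.znode.parent; do
    a=$(get_prop "$yarn_site" "$p")
    b=$(get_prop "$cluster_site" "$p")
    if [ "$a" = "$b" ]; then
      echo "OK   $p=$a"
    else
      echo "DIFF $p: yarn-hbase-site=$a cluster=$b"
      rc=1
    fi
  done
  return $rc
}
```

A call such as `compare_props yarn-hbase-site.xml hbase-site.xml` (both paths are placeholders) prints one line per property; any DIFF line points at a value that still needs to be aligned.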
Thu May 07 09:36:26 CST 2020, RpcRetryingCaller{globalStartTime=1588815377956, pause=100, maxAttempts=8}, java.io.IOException: Call to cjhdpnode18.zpepc.com.cn/21.48.35.18:16020 failed on local exception: java.io.IOException: org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Call to cjhdpnode18.zpepc.com.cn/21.48.35.18:16020 failed on local exception: java.io.IOException: org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:180)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
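The cause is a Kerberos authentication failure: the principal used by the timeline service cannot be authenticated.
Solution:
On the yarn tab of the Kerberos configuration, modify the following settings: hbase.master.kerberos.principal, hbase.master.keytab.file, hbase.regionserver.kerberos.principal, and hbase.regionserver.keytab.file.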
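Before changing these settings, it may help to confirm that a keytab actually contains the principal you intend to configure. The following is a minimal sketch, assuming the MIT Kerberos `klist -kt` command is available and prints the principal as the last column; the helper name `has_principal`, the keytab path, and the principal are all hypothetical.

```shell
# Hypothetical helper: succeed if the given principal appears as the last
# column of `klist -kt <keytab>` output read from stdin.
has_principal() {
  awk -v p="$1" '$NF == p { found = 1 } END { exit !found }'
}

# Usage (path and principal are placeholders, not values from this cluster):
#   klist -kt /path/to/hbase.keytab | has_principal 'hbase/host.example.com@EXAMPLE.COM'
```

If the check fails, the principal and keytab settings do not match, which is one possible way to end up with the "GSS initiate failed" error shown above.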