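This article describes a problem encountered when a secure (Kerberos-enabled) cluster accesses a non-secure cluster, together with an analysis of the cause.
Case
Hive is used to map a Phoenix table. The Hive service runs in a cluster with Kerberos enabled, while Phoenix runs in a different cluster where Kerberos is not enabled.
Error and analysis
HUE returns the following error: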
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed after attempts=11, exceptions: Thu Dec 03 08:52:11 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=11, exceptions: Thu Dec 03 08:51:33 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, javax.security.sasl.SaslException: Call to transfer01.bigdata.zxxk.com/10.111.118.166:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)] [Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]] Thu Dec 03 08:51:33 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, java.io.IOException: Call to transfer01.bigdata.zxxk.com/10.111.118.166:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
......
Analysis:
This is a Kerberos authentication problem. The key message is: Server not found in Kerberos database.
However, the error does not say which principal the "Server" refers to.
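The client-side message is long and repetitive, so it can help to reduce it to the RPC target and the GSS failure reason. The following is a minimal sketch, assuming the error text has the same shape as the HUE output above; the function name `summarize_sasl_error` and both regular expressions are illustrative, not part of Hive or HBase.

```python
import re

# Shortened excerpt of the HUE error shown above.
error_text = (
    "javax.security.sasl.SaslException: Call to "
    "transfer01.bigdata.zxxk.com/10.111.118.166:16020 failed on local exception: "
    "javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: "
    "No valid credentials provided (Mechanism level: Server not found in "
    "Kerberos database (7) - LOOKING_UP_SERVER)]"
)

# "Call to <host>/<ip>:<port>" identifies the RPC endpoint the client tried to reach.
CALL_TARGET = re.compile(r"Call to (?P<host>[\w.-]+)/(?P<ip>[\d.]+):(?P<port>\d+)")
# "Mechanism level: <reason> (<code>)" carries the Kerberos-level failure.
MECHANISM = re.compile(r"Mechanism level: (?P<reason>.+?) \((?P<code>\d+)\)")


def summarize_sasl_error(text):
    """Return the distinct RPC targets and Kerberos failure reasons found in text."""
    targets = sorted(
        {(m["host"], m["ip"], int(m["port"])) for m in CALL_TARGET.finditer(text)}
    )
    reasons = sorted(
        {(m["reason"], int(m["code"])) for m in MECHANISM.finditer(text)}
    )
    return targets, reasons


targets, reasons = summarize_sasl_error(error_text)
print(targets)
print(reasons)
```

The summary shows which host and port the client called, but not which service principal it requested from the KDC. That information is only available on the KDC side.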
Inspecting /var/log/krb5kdc.log on the KDC host shows the following errors:
Dec 03 09:12:58 utility1.bigdata.zxxk.com krb5kdc[2740](info): AS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: ISSUE: authtime 1606957978, etypes {rep=18 tkt=18 ses=18}, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for krbtgt/BIGDATA.ZXXK.COM@BIGDATA.ZXXK.COM
Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database
Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database
......
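Analysis:
The principal hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM does not exist in the Kerberos database, and transfer01.bigdata.zxxk.com is a host in the cluster that does not have Kerberos enabled. In other words, the Hive client in the secure cluster requests a service ticket for the HBase service on that host, and the KDC cannot find a matching principal.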
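When the KDC log is large, the unresolved principals can be extracted with a small script. The following is a minimal sketch, assuming TGS_REQ failure lines have the same layout as the krb5kdc.log excerpt above; the function name `missing_service_principals` and the regular expression are illustrative.

```python
import re
from collections import Counter

# Matches TGS_REQ lines that failed with LOOKING_UP_SERVER, capturing the
# requesting client principal and the service principal that was not found.
TGS_FAILURE = re.compile(
    r"TGS_REQ .*?(?P<client_ip>\d+\.\d+\.\d+\.\d+): LOOKING_UP_SERVER: "
    r"authtime \d+,\s+(?P<client>\S+) for (?P<server>\S+), "
    r"Server not found in Kerberos database"
)


def missing_service_principals(log_lines):
    """Count (client, server) principal pairs that the KDC could not resolve."""
    counts = Counter()
    for line in log_lines:
        match = TGS_FAILURE.search(line)
        if match:
            counts[(match["client"], match["server"])] += 1
    return counts


# Sample lines copied from the log excerpt above (one AS_REQ, two failed TGS_REQ).
sample_log = [
    "Dec 03 09:12:58 utility1.bigdata.zxxk.com krb5kdc[2740](info): AS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: ISSUE: authtime 1606957978, etypes {rep=18 tkt=18 ses=18}, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for krbtgt/BIGDATA.ZXXK.COM@BIGDATA.ZXXK.COM",
    "Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database",
    "Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database",
]

missing = missing_service_principals(sample_log)
for (client, server), count in missing.items():
    print(f"{client} -> {server}: {count} failed TGS_REQ")
```

To scan the real file, the same function can be applied to the lines of /var/log/krb5kdc.log, for example by passing an open file object.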
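Possible solution (unverified)
This approach has not been verified. Do not try it in a production environment.
Under Kerberos, a client in the secure cluster cannot establish a connection to a host or service that has no principal in the Kerberos database. To resolve the problem above, the target host would therefore need to be brought under Kerberos management, and the corresponding service principal would need to be created.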
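As a rough illustration of what "creating the corresponding service principal" could involve on an MIT Kerberos KDC, the sketch below only builds the kadmin.local command strings and does not execute anything. It is as unverified as the approach itself. The helper name `kadmin_commands` and the keytab path are hypothetical, and the sketch does not cover configuring HBase on the target cluster to use the keytab.

```python
def kadmin_commands(service, host, realm, keytab_path):
    """Build (but do not run) kadmin.local commands for one service principal.

    addprinc -randkey creates the principal with a random key;
    ktadd -k exports its keys to a keytab file.
    """
    principal = f"{service}/{host}@{realm}"
    return [
        f'kadmin.local -q "addprinc -randkey {principal}"',
        f'kadmin.local -q "ktadd -k {keytab_path} {principal}"',
    ]


# Principal taken from the KDC log above; the keytab path is a hypothetical example.
commands = kadmin_commands(
    service="hbase",
    host="transfer01.bigdata.zxxk.com",
    realm="BIGDATA.ZXXK.COM",
    keytab_path="/tmp/hbase.transfer01.keytab",
)
for command in commands:
    print(command)
```

Printing the commands instead of running them keeps the example safe to try anywhere. They would have to be reviewed and run manually on the KDC host.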