SAP HANA Spark Controller (SHSC) Kerberos Token Expiry Issue

Problem Description:

SAP HANA Spark Controller (2.4.4) fails to connect to the HDFS cluster. The hana_controller.log shows the following error:

    org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)

Analysis and Recommendation

This is a Kerberos issue and is not caused by SHSC.
We recommend upgrading to the latest SHSC release (currently version 2.6.0).

The explanation from SAP development is as follows:

Kerberos (SHSC)

SHSC uses the SparkContext as the entry point to leverage Spark and subsequently connect to Hive. The related Kerberos configuration in a Hadoop cluster for long-running Spark applications is fairly simple. Please see the references below for further details:

Long-Running Applications

Configuring Spark on YARN for Long-Running Applications
SHSC only has to provide the spark.yarn.keytab and spark.yarn.principal for the process.
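
For illustration, a minimal sketch of those two settings when a SparkContext is created programmatically (as SHSC does) might look like the following; the keytab path and principal are placeholders, not SHSC defaults:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder keytab path and principal for the service user; substitute
    // the values configured for the hanaes user in your cluster.
    val conf = new SparkConf()
      .setAppName("long-running-app")
      .set("spark.yarn.keytab", "/etc/security/keytabs/hanaes.keytab")
      .set("spark.yarn.principal", "hanaes/host.example.com@EXAMPLE.COM")

    // Creating the SparkContext directly is the entry point described above;
    // the two settings let YARN obtain delegation tokens for the
    // long-running application.
    val sc = new SparkContext(conf)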

However, since SHSC does not use the spark-submit entry point but instead uses the SparkContext as the entry point (also a valid method), the tokens are not renewed
automatically. Unlike spark-submit jobs, SHSC has to renew them manually within the existing SparkContext, which it does periodically.
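
SAP does not publish the renewal code, but a minimal sketch of such manual renewal using the standard Hadoop security API could look like the following; the renewer name and the renewal interval are assumptions, not SHSC internals:

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Re-login from the keytab if the TGT is close to expiry, then fetch
    // fresh HDFS delegation tokens into the login user's credential set.
    def renewTokens(hadoopConf: Configuration): Unit = {
      val ugi = UserGroupInformation.getLoginUser
      ugi.checkTGTAndReloginFromKeytab()
      val creds = new Credentials()
      FileSystem.get(hadoopConf).addDelegationTokens(ugi.getShortUserName, creds)
      ugi.addCredentials(creds) // make the new tokens visible to running work
    }

    // Run the renewal periodically; the 12-hour interval is a placeholder.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = renewTokens(new Configuration())
    }, 0L, 12L, TimeUnit.HOURS)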

As explained in SAP Note 3004055, SHSC does not perform any sort of authentication or authorization; it only passes on the required Kerberos information
such as the keytab, principal, and tokens.
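
That pass-through shows up later in the log excerpt as "as:hanaes (auth:PROXY) via hanaes/... (auth:KERBEROS)" and the UserGroupInformation.doAs frame in the call stack. As a rough illustration of that general Hadoop pattern (not SHSC's actual code), handing a Kerberos identity on to Hadoop looks roughly like this; the user name and the action body are placeholders:

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    // The Kerberos-authenticated service login (e.g. from the hanaes keytab).
    val serviceUgi = UserGroupInformation.getLoginUser

    // A proxy UGI for the effective user; during the RPC, Hadoop itself
    // decides whether the carried tokens/credentials are acceptable.
    val proxyUgi = UserGroupInformation.createProxyUser("hanaes", serviceUgi)

    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // HDFS/Hive access would happen here; authentication and
        // authorization are performed by the Hadoop components themselves.
      }
    })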

In addition, you may also want to read Hadoop Delegation Tokens Explained.

Error Message:

Regarding the error messages that can be found in the SHSC logs: these are written and reported by Hadoop, not SHSC,
as can be seen from call stack entries such as org.apache.hadoop.*:

    org.apache.hadoop.security.authentication.client.AuthenticationException 
    org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)

Please refer to the attachment called "SHSC_KerberosSequence_Error.png", which illustrates the calling sequence and the occurrence of the
Kerberos issue. As one can see, authentication was already performed successfully from SHSC throughout the involved Hadoop components.
During query execution from Spark to Hadoop/HDFS, the authentication issue occurs with the call stack below.

22/11/08 11:53:23 DEBUG SAPHanaSpark-akka.actor.default-dispatcher-32 security.UserGroupInformation: PrivilegedActionException as:hanaes (auth:PROXY) via 
    hanaes/sunpaphmp03.sun.nec.co.jp@SUN.NEC.CO.JP (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
22/11/08 11:53:23 ERROR SAPHanaSpark-akka.actor.default-dispatcher-32 network.AsyncExecutor: Exception happened in Async Executor. Forwarding to Handler
    java.io.IOException: DestHost:destPort sunpaphmp02.sun.nec.co.jp:8020 , LocalHost:localPort SUNPAPHMP03.sun.nec.co.jp/10.179.8.32:0. Failed on local exception: 
    java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
    at org.apache.hadoop.ipc.Client.call(Client.java:1443)
    at org.apache.hadoop.ipc.Client.call(Client.java:1353)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy9.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1071)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy10.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:700)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:216)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.listStatus(AvroContainerInputFormat.java:42)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.sql.hana.DistributedDataSetImpl.coalesceToMaxPartitions(DistributedDataSetFactoryImpl.scala:178)
    at org.apache.spark.sql.hana.DistributedDataSetImpl.transferDatafromPartitions(DistributedDataSetFactoryImpl.scala:189)
    at com.sap.hana.spark.SparkFacade.dispatchQueryResultFn(SparkFacade.scala:302)
    at com.sap.hana.spark.SparkFacade.executeQueryAsync(SparkFacade.scala:395)
    at com.sap.hana.network.AsyncExecutor.executeJob(RequestRouter.scala:97)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1$$anon$2.run(RequestRouter.scala:53)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1$$anon$2.run(RequestRouter.scala:50)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1.applyOrElse(RequestRouter.scala:50)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at com.sap.hana.network.AsyncExecutor.aroundReceive(RequestRouter.scala:40)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
    at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1389)
    ... 81 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:410)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
    ... 84 more

If there were an issue with missing tokens in a cache (which SHSC also does not have), there would be related error messages
such as:

HDFS_DELEGATION_TOKEN can’t be found in cache
And the call stack indeed shows:

    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)