In my server application I'm connecting to Kerberos secured Hadoop cluster from my java application. On the application startup I do call
UserGroupInformation.loginUserFromKeytabAndReturnUGI( ... );
I'm doing basic File operations using native FileSystem API like FileSystem.exists() and FileSystem.delete()
My application throws the following error after 24H. That's the expiry for Kerberos ticket.
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:690)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:653)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:740)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1402)
... 27 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:563)
From this answer, the Kerberos ticket should be auto renewed.
My application is using Java 8 and I came across this bug.
But then looks like the hadoop-common-2.7.1.2.4.2.12-1.jar used by my application already has the fix. The source can be found here.
But still got the same error as the auto renewal was not happening. It was resolved only after calling UserGroupInformation.checkTGTAndReloginFromkeytab() before each action as suggested in the above answer .But that was suggested only when using Rest APIs and not for RPCs and I hope native Java APIs use RPC only.
Why is the auto renewal not happening as suggested in the above answer?
解决方案
Unfortunately, there is a known issue with automatic renewal not working correctly when using the UserGroupInformation#loginUserFromKeytabAndReturnUGI method. I am not aware of any known code fix within Apache Hadoop at this time.
Your solution to add a call to UserGroupInformation#checkTGTAndReloginFromKeytab is a viable workaround. I recommend that you stick with that for now and keep an eye on Apache Hadoop release notes to see if there is a fix committed in the future.