For background on CliDriver, see the earlier post "Hive 源码分析: CLI 入口类".
This entry point was built for Hive's own shell. When I wanted to submit a Hive task from my own application, I found it could not be used directly (whereas MapReduce's RunJar could be).
As that source-code analysis shows, CliDriver does a lot of work, so the only option left was to hack it.
After copying the CliDriver source, the work to do was:
- hack log4j so that my own configuration is used
- redefine the output streams so that the results of the executed HQL can be captured
Hacking log4j is easy. CliDriver contains this block, which re-initializes log4j:
boolean logInitFailed = false;
String logInitDetailMessage;
try {
  logInitDetailMessage = LogUtils.initHiveLog4j();
} catch (LogInitializationException e) {
  logInitFailed = true;
  logInitDetailMessage = e.getMessage();
}
Because of this, no matter what we do on the outside, our own logging configuration never takes effect. Commenting this block out fixes it.
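With that block commented out, the embedding application can initialize log4j itself. A minimal sketch, assuming log4j 1.x (which Hive of that era uses); the class name HiveTaskRunner and the my-log4j.properties file are placeholders of mine:

import org.apache.log4j.PropertyConfigurator;

public class HiveTaskRunner {
  public static void main(String[] args) throws Exception {
    // With LogUtils.initHiveLog4j() gone, nothing overrides our configuration;
    // load our own properties file before touching the copied CliDriver.
    PropertyConfigurator.configure(
        HiveTaskRunner.class.getClassLoader().getResource("my-log4j.properties"));
    // ... invoke the copied CliDriver here ...
  }
}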
To redefine the output streams, look at this part:
CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
ss.in = System.in;
try {
  ss.out = new PrintStream(System.out, true, "UTF-8");
  ss.info = new PrintStream(System.err, true, "UTF-8");
  ss.err = new CachingPrintStream(System.err, true, "UTF-8");
} catch (UnsupportedEncodingException e) {
  return 3;
}
Redefine it like so:
ss.out = new YourHivePrintStream();
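YourHivePrintStream is just a placeholder name here; the implementation is up to you. A minimal sketch of one possibility (my own assumption, not the actual class): capture each line Hive prints so the caller can read the query result, while still echoing it to the console:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

public class YourHivePrintStream extends PrintStream {

  // every line Hive prints to ss.out ends up here as well
  private final List<String> lines = new ArrayList<String>();

  public YourHivePrintStream() throws UnsupportedEncodingException {
    super(System.out, true, "UTF-8");
  }

  @Override
  public void println(String line) {
    lines.add(line);     // capture the result row
    super.println(line); // keep the normal console output
  }

  public List<String> getCapturedLines() {
    return lines;
  }
}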
Even with all of this done, HQL that needs MapReduce still would not run. The problem is that you must resolve the UGI conflict, otherwise you run into all kinds of permission exceptions, for example:
[25-11:04:58,499] [ERROR] [main] [hive.ql.Driver] Authorization failed:No privilege 'Select' found for inputs { database:db, table:tb}. Use show grant to get more details.
This is a huge pit. Trying to find the cause of the authorization failure makes no sense at first: permissions are defined on the HDFS side, logging in to HDFS shows the permission settings are all fine, and running the same query directly from the Hive command line works perfectly.
In the end, reading the log from the very beginning, I noticed a warning:
[25-11:04:53,106] [WARN ] [main] [hadoop.security.UserGroupInformation] No groups available for user gdpi
[25-11:04:53,108] [WARN ] [main] [hadoop.security.UserGroupInformation] No groups available for user gdpi
It was even logged twice in a row. The only way to find the cause was to read the source, starting from where this warning message comes from:
public synchronized String[] getGroupNames() {
  ensureInitialized();
  try {
    List<String> result = groups.getGroups(getShortUserName());
    return result.toArray(new String[result.size()]);
  } catch (IOException ie) {
    LOG.warn("No groups available for user " + getShortUserName());
    return new String[0];
  }
}
This lives in UserGroupInformation. Tracing the call chain layer by layer back up into Hive leads here:
CliDriver:
SessionState.start(ss);

// execute cli driver work
int ret = 0;
try {
  ret = executeDriver(ss, conf, oproc);
} catch (Exception e) {
  ss.close();
  throw e;
}
Let's see what SessionState.start(ss) does:
try {
  startSs.authenticator = HiveUtils.getAuthenticator(
      startSs.getConf(), HiveConf.ConfVars.HIVE_AUTHENTICATOR_MANAGER);
  startSs.authorizer = HiveUtils.getAuthorizeProviderManager(
      startSs.getConf(), HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
      startSs.authenticator);
  startSs.createTableGrants = CreateTableAutomaticGrant.create(startSs
      .getConf());
} catch (HiveException e) {
  throw new RuntimeException(e);
}
HiveUtils.getAuthenticator() reads the configured authenticator class name and then instantiates it:
if (cls != null) {
  ret = ReflectionUtils.newInstance(cls, conf);
}
Instantiating would be fine by itself, but ReflectionUtils.newInstance() then also calls
setConf(result, conf);
public static void setConf(Object theObject, Configuration conf) {
  if (conf != null) {
    if (theObject instanceof Configurable) {
      ((Configurable) theObject).setConf(conf);
    }
    setJobConf(theObject, conf);
  }
}
The default authenticator is org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator, whose setConf is implemented like this:
@Override
public void setConf(Configuration conf) {
  this.conf = conf;
  UserGroupInformation ugi = null;
  try {
    ugi = ShimLoader.getHadoopShims().getUGIForConf(conf);
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  if (ugi == null) {
    throw new RuntimeException(
        "Can not initialize HadoopDefaultAuthenticator.");
  }
  this.userName = ShimLoader.getHadoopShims().getShortUserName(ugi);
  if (ugi.getGroupNames() != null) {
    this.groupNames = Arrays.asList(ugi.getGroupNames());
  }
}
So ugi.getGroupNames() is called twice, which is exactly why the warning appeared twice.
The root cause is that the expected user groups could not be obtained, because that user simply does not exist in my environment (for the user-identity issue, see the earlier post "Hadoop UserGroupInformation 的那些 login").
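The effect is easy to reproduce outside Hive. A quick standalone check (just a sketch; "gdpi" stands in for whatever user your client runs as, and the outcome depends on the group-mapping provider in effect):

import java.util.Arrays;

import org.apache.hadoop.security.UserGroupInformation;

public class GroupLookupDemo {
  public static void main(String[] args) throws Exception {
    // If "gdpi" does not exist as an OS user, the shell-based mapping finds no
    // groups, UGI logs "No groups available for user gdpi" and returns nothing.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("gdpi");
    System.out.println(Arrays.toString(ugi.getGroupNames())); // prints []
  }
}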
Now look again at how UGI obtains the user's groups:
public synchronized String[] getGroupNames() {
  ensureInitialized();
  try {
    List<String> result = groups.getGroups(getShortUserName());
    return result.toArray(new String[result.size()]);
  } catch (IOException ie) {
    LOG.warn("No groups available for user " + getShortUserName());
    return new String[0];
  }
}
This relies on org.apache.hadoop.security.Groups#getGroups():
public List<String> getGroups(String user) throws IOException {
  // No need to lookup for groups of static users
  List<String> staticMapping = staticUserToGroupsMap.get(user);
  if (staticMapping != null) {
    return staticMapping;
  }
  // Return cached value if available
  CachedGroups groups = userToGroupsMap.get(user);
  long startMs = Time.monotonicNow();
  // if cache has a value and it hasn't expired
  if (groups != null && (groups.getTimestamp() + cacheTimeout > startMs)) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Returning cached groups for '" + user + "'");
    }
    return groups.getGroups();
  }
  // Create and cache user's groups
  List<String> groupList = impl.getGroups(user);
  long endMs = Time.monotonicNow();
  long deltaMs = endMs - startMs;
  UserGroupInformation.metrics.addGetGroups(deltaMs);
  if (deltaMs > warningDeltaMs) {
    LOG.warn("Potential performance problem: getGroups(user=" + user + ") " +
        "took " + deltaMs + " milliseconds.");
  }
  groups = new CachedGroups(groupList, endMs);
  if (groups.getGroups().isEmpty()) {
    throw new IOException("No groups found for user " + user);
  }
  userToGroupsMap.put(user, groups);
  if (LOG.isDebugEnabled()) {
    LOG.debug("Returning fetched groups for '" + user + "'");
  }
  return groups.getGroups();
}
The key part is:
// Create and cache user's groups
List<String> groupList = impl.getGroups(user);
This impl is initialized when Groups is instantiated:
public Groups(Configuration conf) {
  impl =
      ReflectionUtils.newInstance(
          conf.getClass(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
              ShellBasedUnixGroupsMapping.class,
              GroupMappingServiceProvider.class),
          conf);
  cacheTimeout =
      conf.getLong(CommonConfigurationKeys.HADOOP_SECURITY_GROUPS_CACHE_SECS,
          CommonConfigurationKeys.HADOOP_SECURITY_GROUPS_CACHE_SECS_DEFAULT) * 1000;
  warningDeltaMs =
      conf.getLong(CommonConfigurationKeys.HADOOP_SECURITY_GROUPS_CACHE_WARN_AFTER_MS,
          CommonConfigurationKeys.HADOOP_SECURITY_GROUPS_CACHE_WARN_AFTER_MS_DEFAULT);
  parseStaticMapping(conf);
  if (LOG.isDebugEnabled())
    LOG.debug("Group mapping impl=" + impl.getClass().getName() +
        "; cacheTimeout=" + cacheTimeout + "; warningDeltaMs=" +
        warningDeltaMs);
}
So there is a pluggable provider of the user-to-groups mapping service, and it has a default: in the code it defaults to org.apache.hadoop.security.ShellBasedUnixGroupsMapping, while the default shipped in core-site.xml is org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback. Either way, the group lookup depends on the local operating system, so it has to be hacked. I implemented one:
import java.io.IOException;
import java.util.List;

import com.google.common.collect.Lists;
import org.apache.hadoop.security.GroupMappingServiceProvider;

public class MyUserGroupsMapping implements GroupMappingServiceProvider {

  @Override
  public List<String> getGroups(String user) throws IOException {
    // pretend every user belongs to a single group named after itself
    return Lists.newArrayList(user);
  }

  @Override
  public void cacheGroupsRefresh() throws IOException {
    // does nothing in this provider of user to groups mapping
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) throws IOException {
    // does nothing in this provider of user to groups mapping
  }
}
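Wiring it in means pointing hadoop.security.group.mapping (the key behind CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING) at this class. A sketch of the core-site.xml entry, with the package name being my own placeholder:

<property>
  <name>hadoop.security.group.mapping</name>
  <value>com.example.security.MyUserGroupsMapping</value>
</property>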
Then I modified core-site.xml accordingly. The MR jobs Hive needed to run were finally submitted, but why did every one of them fail?
Checking the logs on the cluster revealed the reason: ClassNotFound. My custom implementation class MyUserGroupsMapping is, of course, not on the cluster.
So the only way out is a bait-and-switch: first change the configuration to point at my GroupMappingServiceProvider, and then, just before the local Hive is about to submit the MR job, restore the original value:
// set all properties specified via command line
HiveConf conf = ss.getConf();

/*hack start*/
// set hadoop.security.group.mapping to return a group for the user, because the user does not exist locally
String hadoopSecurityGroupMappingClass = conf.get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
conf.setClass(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
    MyUserGroupsMapping.class, GroupMappingServiceProvider.class);
console.printInfo("set HADOOP_SECURITY_GROUP_MAPPING......");
resetGroupsMapping(conf);
/*hack end*/

for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
  conf.set((String) item.getKey(), (String) item.getValue());
  ss.getOverriddenConfigurations().put((String) item.getKey(), (String) item.getValue());
}

// read prompt configuration and substitute variables.
prompt = conf.getVar(HiveConf.ConfVars.CLIPROMPT);
prompt = new VariableSubstitution().substitute(conf, prompt);
prompt2 = spacesForString(prompt);

SessionState.start(ss);

/*hack start*/
// prevent submitting MR to the RM with my mapping class still configured; restore the old value
if (StringUtils.isEmpty(hadoopSecurityGroupMappingClass)) {
  conf.unset(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
} else {
  conf.set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING, hadoopSecurityGroupMappingClass);
}
console.printInfo("reset HADOOP_SECURITY_GROUP_MAPPING......");
resetGroupsMapping(conf);
/*hack end*/

// execute cli driver work
int ret = 0;
try {
  ret = executeDriver(ss, conf, oproc);
} catch (Exception e) {
  ss.close();
  throw e;
}
ss.close();
There is one more trap here: changing conf alone is far from enough, you also need this:
private void resetGroupsMapping(Configuration conf) {
  console.printInfo(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING + ": " +
      conf.get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING));
  Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(conf);
  UserGroupInformation.setConfiguration(conf);
}
Only then is the cached mapping actually dropped and rebuilt, and the log shows "Returning fetched groups" again. Done.