运行SQL时出了个错:
SQL: INSERT OVERWRITE DIRECTORY 'result/testConsole' select count(1) from nutable;
错误信息:
Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_17-35-05_006_3570546713731431770/-ext-10000 to: result/testConsoleFAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
另一个SQL的错误,这是日志中的:
1042 2013-08-22 17:08:54,411 INFO exec.Task (SessionState.java:printInfo(412)) - Moving data to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc from hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000
1043 2013-08-22 17:08:54,414 ERROR exec.Task (SessionState.java:printError(421)) - Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000 to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc
下面看看出现异常的地方。
执行SQL时,最后一个任务是MoveTask,它的作用是将运行SQL生成的Mapeduce任务结果文件放到SQL中指定的存储查询结果的路径中,具体方法就是重命名
下面是 org.apache.hadoop.hive.ql.exec.MoveTask 中对结果文件重命名的一段代码:
//这个sourcePath参数就是存放Mapeduce结果文件的目录,所以它的值可能是
//hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_18-42-03_218_2856924886757165243/-ext-10000
if (fs.exists(sourcePath)) {
Path deletePath = null;
// If it multiple level of folder are there fs.rename is failing so first
// create the targetpath.getParent() if it not exist
if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
deletePath = createTargetPath(targetPath, fs);
}
//这里targetPath的值就是指定的放置结果文件的目录,值可能是 result/userName154122639/4e574b5d9f894a70b074ccd3981ca0f1
if (!fs.rename(sourcePath, targetPath)) {//上面产生的异常就是因为这里rename失败,进了if,throw了个异常
try {
if (deletePath != null) {
fs.delete(deletePath, true);
}
} catch (IOException e) {
LOG.info("Unable to delete the path created for facilitating rename"
+ deletePath);
}
throw new HiveException("Unable to rename: " + sourcePath
+ " to: " + targetPath);
}
}
rename的targetPath必须存在。
其实之前已经检查和创建targetPath了:
private Path createTargetPath(Path targetPath, FileSystem fs) throws IOException {
Path deletePath = null;
Path mkDirPath = targetPath.getParent();
if (mkDirPath != null & !fs.exists(mkDirPath)) {
Path actualPath = mkDirPath;
while (actualPath != null && !fs.exists(actualPath)) {
deletePath = actualPath;
actualPath = actualPath.getParent();
}
fs.mkdirs(mkDirPath);
}
return deletePath;//返回新创建的最顶层的目录,万一失败用来删除用
}
Apache出现过这个 问题,已经解决掉了
CDH 竟然加了个参数 hive.insert.into.multilevel.dirs,默认是false,意思是我还有这BUG呢哈。
当你被坑了,想打个patch时,会发现改个配置就可以了。
意思是我保留这个BUG,但你要是被坑了也不能说我有BUG,自己改配置好了。$+@*^.!"?......
目前还没发现其他地方用到了这个参数,在这里唯一作用就是限制SQL中指定存放结果文件不存在的目录的深度不能大于1.
不过也没发现这有什么好处。
折腾半天,加个配置就可以了:
<property>
<name>hive.insert.into.multilevel.dirs</name>
<value>true</value>
</property>