This post records some of the pitfalls, big and small, that I have run into while using Hive, partly so that I do not fall into them again, and partly so that others can avoid them too. The post will be updated from time to time.
1 IOException when a comment is written on the first line
-- This is a very, very long HQL comment, to see whether Hive can run it: 黄梅时节家家雨,青草池塘处处蛙,有约不来夜过半,闲敲棋子落灯花。
select 'Hello World' from dual
-- another HQL comment
union all
select 'Hello Hive' from dual
The reported error:
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:359)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:296)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:244)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1645)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1190)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:702)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Cause of the error (a guess):
When Hive runs a query, it first stores the result in a temporary table, and the temporary table's name is derived from the first line of the statement. When the first line is a comment, and a long Chinese one at that, the generated name exceeds the length limit for temporary table names, and the job fails.
Workaround:
Do not write a comment on the first line of an HQL statement, and do not append a comment to the end of the HQL on the first line either.
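Under that workaround, a safe layout of the query above keeps the first line as pure HQL and moves all comments to the second line or later (a sketch; whether `dual` exists depends on your environment):

```sql
select 'Hello World' from dual
-- comments are fine from the second line onward
union all
select 'Hello Hive' from dual;
```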
2 Ctrl+C in Hive does not kill the job
While a Hive query is running, pressing Ctrl+C only exits the client; the Hive job keeps running in the background. To actually kill the job, you need to run hadoop job -kill job_id. For example, for Job = job_1467707859270_1000260, run:
hadoop job -kill job_1467707859270_1000260
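To avoid retyping the id, the kill command can be built from the `Job = job_...` line that Hive prints to the console. A minimal sketch, assuming you have that console line saved in a variable (the log line format here is illustrative); on YARN clusters, `yarn application -kill` with the corresponding application id is an alternative:

```shell
# Hypothetical console line as printed by Hive's MapReduce execution
log_line='Starting Job = job_1467707859270_1000260, Tracking URL = ...'

# Extract the job id (job_<cluster-timestamp>_<sequence>)
job_id=$(printf '%s\n' "$log_line" | grep -o 'job_[0-9_]*' | head -n 1)

# Classic MapReduce-level kill
echo "hadoop job -kill $job_id"

# YARN application-level equivalent: swap the job_ prefix for application_
echo "yarn application -kill ${job_id/job_/application_}"
```

Note that `hadoop job` is deprecated in newer Hadoop releases in favor of `mapred job -kill job_id`; both accept the same job id.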