1、 Failed with exception java.io.IOException:java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 6
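Cause: an older Hive version cannot read ORC files written by a newer Hive version.
Solutions:
a、Convert the data with a Hive version that is compatible with both sides
b、Modify the source code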
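A minimal sketch of option a, assuming you have access to a Hive version that can read the existing files and whose ORC output the older Hive can still read. The table names f_src and f_src_compat are hypothetical.

```sql
-- Run this on the Hive version that both sides are compatible with.
-- Which version qualifies depends on your cluster; this is an assumption.
-- f_src: existing table whose ORC files the older Hive cannot read (hypothetical name)
-- f_src_compat: rewritten copy for the older Hive to query (hypothetical name)
CREATE TABLE f_src_compat STORED AS ORC AS
SELECT * FROM f_src;
```

After the rewrite, the older Hive reads f_src_compat instead of the original table.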
2、Error: Error while compiling statement: FAILED: SemanticException [Error 10016]: Line 299:10 Argument type mismatch 'create_time': regexp only takes STRING_GROUP types as 1st argument, got DATE (state=42000,code=10016)
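Cause: in Hive 2.1.1 the to_date function returns a date, whereas in 1.2.1 it returned a string. In addition, from 2.1.0 onward the regexp function accepts only string arguments (earlier versions converted implicitly), so every call to regexp must pass strings for both arguments. Passing the result of to_date directly therefore fails.
Solutions:
a、Wherever to_date is used, cast its result to string
b、Modify the underlying to_date implementation so that it returns a string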
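Option a might look like the following sketch. The table name orders and the pattern are hypothetical; create_time is the column named in the error message.

```sql
-- Fails on Hive 2.1.1, because to_date() returns DATE:
--   SELECT * FROM orders WHERE to_date(create_time) regexp '^2021-01';

-- Works: cast the DATE back to STRING before applying regexp.
SELECT *
FROM orders
WHERE CAST(to_date(create_time) AS STRING) regexp '^2021-01';
```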
3、Error: Error while compiling statement: FAILED: ParseException line 63:18 cannot recognize input near 'regexp' '(' 'lower' in expression specification (state=42000,code=40000)
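Cause: since Hive 2.0.0, REGEXP and RLIKE are reserved keywords. By default only the infix form A regexp B is supported; the function-call form regexp(A, B) is not.
Reference: https://issues.apache.org/jira/browse/HIVE-11703
Solutions:
a、Escape the keyword: change regexp(A, B) to `regexp`(A, B)
b、Change regexp(A, B) to A regexp B
c、Set the parameter: set hive.support.sql11.reserved.keywords=false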
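The three workarounds side by side, as a sketch. The table t, the column name and the pattern are hypothetical.

```sql
-- Original form, rejected by the parser on Hive 2.x:
--   SELECT regexp(lower(name), '^abc') FROM t;

-- a: escape the reserved keyword with backticks
SELECT `regexp`(lower(name), '^abc') FROM t;

-- b: use the infix operator form
SELECT lower(name) regexp '^abc' FROM t;

-- c: stop treating SQL11 keywords as reserved for this session,
--    after which the original function-call form parses again
set hive.support.sql11.reserved.keywords=false;
SELECT regexp(lower(name), '^abc') FROM t;
```

Option b needs no session setting and no escaping, so it is the least fragile of the three when scripts are shared across Hive versions.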
4、Cross-domain problem with hive.exec.stagingdir
ERROR : Failed with exception Unable to move source viewfs://user/tsk/tmp/tsk/hive/.stagingdir_hive_2021-01-22_14-55-52_750_1753605795304611751-12208/-ext-10001 to destination viewfs://user/tsk/hive/warehouse/f_ads.db/white_card_num
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source viewfs://qunarcluster/user/tsk/tmp/tsk/hive/.stagingdir_hive_2021-01-22_14-55-52_750_1753605795304611751-12208/-ext-10001 to destination viewfs://user/tsk/hive/warehouse/f_ads.db/white_card_num
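Cross-domain problem: the script runs as user triumphsk, but the Hive parameter hive.exec.stagingdir is set to /user/tsk/tmp/tsk/hive/.stagingdir, so moving the staged output into the warehouse directory crosses domains and fails.
Solution: set the hive.exec.stagingdir parameter explicitly when running the script.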
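A sketch of the override, placed at the top of the script. The value shown is an assumption: .hive-staging is Hive's default, a relative name that puts the staging directory under the destination table's own directory. Your cluster may need a different path.

```sql
-- Assumed value: the relative default keeps staging files next to the
-- destination table, so the final move stays within one location.
set hive.exec.stagingdir=.hive-staging;
```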
5、Problems caused by map join
Error: GC overhead limit exceeded
When both tables are small, a GC error can occur. Solution: turn off automatic map join conversion:
set hive.auto.convert.join=false;
6、Error during job, obtaining debugging information… running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used;
ERROR : Ended Job = job_1631679144970_1574917 with errors
ERROR : Error during job, obtaining debugging information...
ERROR :
Task with the most failures(4):
-----
Task ID:
task_1631679144970_1574917_m_000158
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1631679144970_1574917&tipid=task_1631679144970_1574917_m_000158
-----
Diagnostic Messages for this Task:
Container [pid=18442,containerID=container_1631679144970_1574917_01_003316] is running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used; 5.8 GB of 80 GB virtual memory used. Killing container.
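Cause: the container does not have enough physical memory. Solution: increase the memory for map and reduce containers: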
set mapreduce.map.memory.mb=8192;
set mapreduce.reduce.memory.mb=8192;
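Raising the container limit may not be enough if the JVM heap is pinned by an explicit -Xmx elsewhere in the configuration. A hedged sketch that sets the heap alongside the container size, assuming roughly 80% of the 8192 MB container is a reasonable heap for your job:

```sql
-- Container size, as above
set mapreduce.map.memory.mb=8192;
set mapreduce.reduce.memory.mb=8192;
-- JVM heap inside each container; 6553m is about 80% of 8192 MB (assumed ratio)
set mapreduce.map.java.opts=-Xmx6553m;
set mapreduce.reduce.java.opts=-Xmx6553m;
```

Keeping the heap below the container size leaves headroom for non-heap memory, which also counts toward the physical memory limit that YARN enforces.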