I recently hit the following error while running a Hive job:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00.26.37.E3.07.D3","reducesinkkey1":"2014-07-07 12:51:46"},"value":{"_col2":515,"_col3":"515999000056662_00.26.37.E3.07.D3","_col5":"00.26.37.E3.07.D3","_col6":"2014-07-07 12:51:46"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:274)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00.26.3
After some investigation, I found that running the following statement by itself in Hive reproduces the error (it deduplicates by col_par, keeping the earliest row per key by create_time):
select colx, coly, col_par
from
(
  select colx, coly, col_par,
         row_number() over (partition by col_par order by create_time) as rn
  from test.table_name
) t
where rn = 1;
The table itself is small, only a few million rows.
Suspecting bad data, I checked the key distribution:
select *
from
(
  select col_par, count(1) as rn
  from test.table_name
  group by col_par
) t
where rn > 1000;
It turned out that col_par is the empty string '' in over a million rows, which clearly causes severe data skew!
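To confirm the hot key directly, a minimal check can be run against the same table (column and table names taken from the query above; note that in Hive the empty string '' and NULL are distinct values and group separately):

```sql
-- Count rows whose col_par is the empty string (the suspected hot key).
select count(1)
from test.table_name
where col_par = '';

-- NULL keys do not match the predicate above; count them separately.
select count(1)
from test.table_name
where col_par is null;
```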
The execution log shows the job failing in the reduce stage; three retries all ended in failure.
Moreover, this statement spawns only a single reducer.
Solutions:
1. Thanks to the business logic, rows whose col_par is the empty string can simply be excluded:
select colx, coly, col_par
from
(
  select colx, coly, col_par,
         row_number() over (partition by col_par order by create_time) as rn
  from test.table_name
  where col_par <> ''
) t
where rn = 1;
This resolves the data skew and keeps the reduce/shuffle stage from failing.
2. Alternatively, set parameters to increase the number of reducers.
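For option 2, the usual knobs are sketched below (the values are illustrative, not tuned recommendations). Keep in mind that with row_number() over (partition by col_par ...), all rows sharing one partition key are still routed to the same reducer, so adding reducers mainly helps when the skew is spread across several keys:

```sql
-- Lower the data volume each reducer handles so Hive plans more reducers
-- (64 MB here; illustrative value).
set hive.exec.reducers.bytes.per.reducer=67108864;

-- Or force a fixed reducer count for this job (illustrative value;
-- this is the old MapReduce-era parameter name, matching the stack trace above).
set mapred.reduce.tasks=32;
```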