Error message
SQL executed:
create table xxx.gqj_test_1 as
select applseq,
max(case when rule_id='2005302' then result end) as 6_month,
max(case when rule_id='2005304' then result end) as 12_month,
max(case when rule_id='2005306' then result end) as 18_month,
max(case when rule_id='2005308' then result end) as 24_month,
max(case when rule_id='2005310' then result end) as 60_month
from xxx.xxx
group by applseq;
Execution log:
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 9
2020-07-28 10:58:14,413 Stage-1 map = 0%, reduce = 0%
2020-07-28 10:59:15,267 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 874.88 sec
2020-07-28 11:00:16,107 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 115.55 sec
2020-07-28 11:01:16,513 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 1388.63 sec
2020-07-28 11:02:16,915 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 1002.31 sec
2020-07-28 11:03:17,553 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 1212.87 sec
2020-07-28 11:04:17,975 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 369.32 sec
2020-07-28 11:05:18,281 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 1677.58 sec
2020-07-28 11:05:23,402 Stage-1 map = 100%, reduce = 100%
MapReduce Total cumulative CPU time: 27 minutes 57 seconds 580 msec
Ended Job = job_1589431406497_162391 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1589431406497_162391_m_000001 (and more) from job job_1589431406497_162391
Task with the most failures(4):
-----
Task ID:
task_1589431406497_162391_m_000001
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1589431406497_162391&tipid=task_1589431406497_162391_m_000001
-----
Diagnostic Messages for this Task:
Error: GC overhead limit exceeded
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 9 Cumulative CPU: 1677.58 sec HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
Total MapReduce CPU Time Spent: 27 minutes 57 seconds 580 msec
Errors from the task log:
ve.ql.exec.MapOperator: MAP[0]: records read - 1000001
2020-07-28 10:07:12,441 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 2000001
2020-07-28 10:07:13,910 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 3000001
2020-07-28 10:07:21,638 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 4000001
2020-07-28 10:07:24,024 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 5000001
2020-07-28 10:07:28,496 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 6000001
2020-07-28 10:07:39,237 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 7000001
2020-07-28 10:08:45,822 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: Total records read - 7247916. abort - true
2020-07-28 10:08:45,822 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:7247916,
2020-07-28 10:08:45,822 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: RS[3]: Total records written - 0. abort - true
2020-07-28 10:08:45,822 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: RECORDS_OUT_INTERMEDIATE:0,
2020-07-28 10:08:46,311 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:412)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$GenericUDAFMaxEvaluator.merge(GenericUDAFMax.java:125)
The SQL execution plan (EXPLAIN output):
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: yk_zjbt_7
            Statistics: Num rows: 28163209 Data size: 529005690 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: applseq (type: string), rule_id (type: string), result (type: string)
              outputColumnNames: applseq, rule_id, result
              Statistics: Num rows: 28163209 Data size: 529005690 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: max(CASE WHEN ((rule_id = '2005302')) THEN (result) END), max(CASE WHEN ((rule_id = '2005304')) THEN (result) END), max(CASE WHEN ((rule_id = '2005306')) THEN (result) END), max(CASE WHEN ((rule_id = '2005308')) THEN (result) END), max(CASE WHEN ((rule_id = '2005310')) THEN (result) END)
                keys: applseq (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                Statistics: Num rows: 28163209 Data size: 529005690 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 28163209 Data size: 529005690 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string)
Size of the underlying data file:
[dp@hdop-applb12 ~]$ hdfs dfs -ls /user/hive/warehouse/xxxx.db/yk_zjbt_7/
Found 1 items
-rwxrwxr-x 3 hive hive 557168899 2020-07-27 18:39 /user/hive/warehouse/xxxx.db/yk_zjbt_7/000000_0
[dp@hdop-applb12 ~]$ hdfs dfs -du -s -h /user/hive/warehouse/xxxx.db/yk_zjbt_7/
531.4 M 1.6 G /user/hive/warehouse/xxxx.db/yk_zjbt_7
What stands out is that aggregation is executed in the map phase: the plan shows a map-side Group By Operator with mode: hash. Map-side hash aggregation keeps one in-memory aggregation buffer per distinct group key, so when applseq has very high cardinality the hash table keeps growing; the stack trace above shows the OOM hits exactly while GenericUDAFMaxEvaluator is copying string values into those buffers.
As the log shows, the failure happens in the map phase, and the immediate cause is that the map tasks have too little heap for this workload.
The data file is about 531 MB and there are 3 map tasks, so each one reads a bit under 180 MB. A plain map task (read, transform, emit) should never run out of memory on so little input; it is the accumulated map-side aggregation state that exhausts the heap.
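A quick way to confirm this is to measure the key cardinality. A minimal check, using the same (anonymized) source table as the failing query:

-- count the distinct group keys the map-side hash table must hold
select count(distinct applseq) as distinct_keys,
       count(*) as total_rows
from xxx.xxx;

If distinct_keys is of the same order as total_rows (the plan estimates about 28 million rows), each mapper builds a hash table with millions of entries before it can flush, which matches the mapper aborting after reading about 7.2 million records.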
Solutions
There are three ways to fix this:
- Increase the memory available to each map task (see the sketch after this list for the full container + heap version):
set mapreduce.map.java.opts=-Xmx1024m;
Time taken: 293.026 seconds
- Increase the number of map tasks, so each one handles fewer rows and fewer distinct keys:
set mapred.max.split.size=5000000;
Time taken: 619.508 seconds
(Note: with set mapred.max.split.size=50000000; the job only grew to 9 maps and still failed.)
- Disable map-side aggregation:
set hive.map.aggr=false;
This stops the mappers from doing partial aggregation, which lowers map-phase memory use but also lowers efficiency, since every input row is then shuffled to the reducers. If either of the methods above solves the problem, this one is not recommended.
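Two additions to the fixes above. First, on YARN the heap given in mapreduce.map.java.opts has to fit inside the container size from mapreduce.map.memory.mb, or the task can be killed for exceeding its container limit; the sketch below raises both (the 2048 MB / -Xmx1638m values are illustrative, not from the original run). Second, as a middle ground between adding memory and hive.map.aggr=false, Hive has parameters that make the map-side hash table flush earlier or give up when grouping barely reduces the row count; the values shown are likewise illustrative, not tested settings.

-- Fix 1 in full: raise container size and JVM heap together
-- (heap is conventionally about 80% of the container size)
set mapreduce.map.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx1638m;

-- Middle ground: keep map-side aggregation, but flush the hash table sooner
set hive.map.aggr=true;
set hive.map.aggr.hash.percentmemory=0.3;   -- flush when the hash table uses ~30% of heap (default 0.5)
set hive.map.aggr.hash.min.reduction=0.5;   -- turn hash aggregation off if it barely reduces rows (default 0.5)

With high-cardinality keys like applseq here, hive.map.aggr.hash.min.reduction can end up disabling hash aggregation automatically, which makes it a safer default than forcing hive.map.aggr=false for every query.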