java.lang.ArrayIndexOutOfBoundsException during Hive query execution
Table tmp_1
+-----------+-----------+-------------+
| tmp_1.id  | tmp_1.cl  | tmp_1.plat  |
+-----------+-----------+-------------+
| 293       | IOS_i01   | IOS         |
| 553       | IOS_i01   | IOS         |
| 559       | AND_a01   | AND         |
| 711       | AND_a01   | AND         |
+-----------+-----------+-------------+
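To experiment with the queries below, the sample rows can be loaded into a scratch table. This is only a sketch: the column types are assumptions (the post shows values, not the DDL), and `insert ... values` requires Hive 0.14 or later. The real table evidently holds far more rows (the plans report about 1.4 million) and further plat values such as ALIPAY and WECHAT, so these four rows may not be enough to trigger the error.

```sql
-- Assumed schema: the original DDL is not shown in the post.
create table tmp_1 (
  id   int,
  cl   string,
  plat string
);

-- The four sample rows listed above.
insert into table tmp_1 values
  (293, 'IOS_i01', 'IOS'),
  (553, 'IOS_i01', 'IOS'),
  (559, 'AND_a01', 'AND'),
  (711, 'AND_a01', 'AND');
```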
A simple join on plat, relating the ids by platform
=============
Example 1
=============
select
a.plat,a.cnt,b.cnt
from
(select
plat,count(*) cnt
from tmp_1
group by plat)a
left join
(select split(cl,'_')[0] as plat,count(*) cnt
from tmp_1 group by split(cl,'_')[0])b
on a.plat=b.plat;
Query result:
+-------------+---------+---------+--+
| a.plat | a.cnt | b.cnt |
+-------------+---------+---------+--+
| ALIPAY | 371 | 371 |
| AND | 783199 | 783199 |
| IOS | 659319 | 659319 |
| WECHAT | 2054 | 2054 |
+-------------+---------+---------+--+
The query runs normally. Its execution plan:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Plan not optimized by CBO. | |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 4 (BROADCAST_EDGE) |
| Reducer 4 <- Map 3 (SIMPLE_EDGE) | |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_6313316] |
| compressed:false |
| Statistics:Num rows: 794718 Data size: 27599734 Basic stats: COMPLETE Column stats: NONE |
| table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} |
| Select Operator [SEL_6313315] |
| outputColumnNames:["_col0","_col1","_col2"] |
| Statistics:Num rows: 794718 Data size: 27599734 Basic stats: COMPLETE Column stats: NONE |
| Map Join Operator [MAPJOIN_6313319] |
| | condition map:[{"":"Left Outer Join0 to 1"}] |
| | HybridGraceHashJoin:true |
| | keys:{"Reducer 2":"_col0 (type: string)","Reducer 4":"_col0 (type: string)"} |
| | outputColumnNames:["_col0","_col1","_col3"] |
| | Statistics:Num rows: 794718 Data size: 27599734 Basic stats: COMPLETE Column stats: NONE |
| |<-Reducer 4 [BROADCAST_EDGE] |
| | Reduce Output Operator [RS_6313313] |
| | key expressions:_col0 (type: string) |
| | Map-reduce partition columns:_col0 (type: string) |
| | sort order:+ |
| | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| | value expressions:_col1 (type: bigint) |
| | Group By Operator [GBY_6313310] |
| | | aggregations:["count(VALUE._col0)"] |
| | | keys:KEY._col0 (type: string) |
| | | outputColumnNames:["_col0","_col1"] |
| | | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| | |<-Map 3 [SIMPLE_EDGE] |
| | Reduce Output Operator [RS_6313309] |
| | key expressions:_col0 (type: string) |
| | Map-reduce partition columns:_col0 (type: string) |
| | sort order:+ |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | value expressions:_col1 (type: bigint) |
| | Group By Operator [GBY_6313308] |
| | aggregations:["count()"] |
| | keys:split(cl, '_')[0] (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | Select Operator [SEL_6313307] |
| | outputColumnNames:["cl"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | TableScan [TS_6313306] |
| | alias:tmp_1 |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| |<-Group By Operator [GBY_6313304] |
| | aggregations:["count(VALUE._col0)"] |
| | keys:KEY._col0 (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 1 [SIMPLE_EDGE] |
| Reduce Output Operator [RS_6313303] |
| key expressions:_col0 (type: string) |
| Map-reduce partition columns:_col0 (type: string) |
| sort order:+ |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| value expressions:_col1 (type: bigint) |
| Group By Operator [GBY_6313302] |
| aggregations:["count()"] |
| keys:plat (type: string) |
| outputColumnNames:["_col0","_col1"] |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| Select Operator [SEL_6313301] |
| outputColumnNames:["plat"] |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| TableScan [TS_6313300] |
| alias:tmp_1 |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
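A plan like the one above is what Hive's EXPLAIN statement prints. The post does not show the exact command, so the following is an assumption: the plan was presumably obtained by prefixing the two-way join query with `explain`.

```sql
-- Prints the plan without running the query.
explain
select
  a.plat, a.cnt, b.cnt
from
  (select plat, count(*) cnt
   from tmp_1
   group by plat) a
left join
  (select split(cl,'_')[0] as plat, count(*) cnt
   from tmp_1
   group by split(cl,'_')[0]) b
on a.plat = b.plat;
```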
Example 2 still joins on plat, only with one more subquery added. It fails with an array index out of bounds error, which is strange.
=============
Example 2
=============
select
a.plat,a.cnt,b.cnt,c.cnt
from
(select
plat,count(*) cnt
from tmp_db.tmp_1
group by plat)a
left join
(select split(cl,'_')[0] as plat,count(*) cnt
from tmp_db.tmp_1 group by split(cl,'_')[0])b
on a.plat=b.plat
left join
(select split(cl,'_')[0] as plat,count(*) cnt
from tmp_db.tmp_1 group by split(cl,'_')[0])c
on a.plat=c.plat
;
Error:
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"_col0":"ALIPAY"},"value":{"_col0":27}}
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:408)
... 26 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"_col0":"ALIPAY"},"value":{"_col0":27}}
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:714)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
... 28 more
Execution plan for Example 2
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Plan not optimized by CBO due to missing statistics. Please check log for more details. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_6313426] |
| compressed:false |
| Statistics:Num rows: 1589436 Data size: 55199468 Basic stats: COMPLETE Column stats: NONE |
| table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} |
| Select Operator [SEL_6313425] |
| outputColumnNames:["_col0","_col1","_col2","_col3"] |
| Statistics:Num rows: 1589436 Data size: 55199468 Basic stats: COMPLETE Column stats: NONE |
| Merge Join Operator [MERGEJOIN_6313431] |
| | condition map:[{"":"Left Outer Join0 to 1"},{"":"Left Outer Join0 to 2"}] |
| | keys:{"0":"_col0 (type: string)","1":"_col0 (type: string)","2":"_col0 (type: string)"} |
| | outputColumnNames:["_col0","_col1","_col3","_col5"] |
| | Statistics:Num rows: 1589436 Data size: 55199468 Basic stats: COMPLETE Column stats: NONE |
| | |
| |<-Group By Operator [GBY_6313413] |
| | aggregations:["count(VALUE._col0)"] |
| | keys:KEY._col0 (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| | |
| |<-Group By Operator [GBY_6313419] |
| | aggregations:["count(VALUE._col0)"] |
| | keys:KEY._col0 (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| |<-Group By Operator [GBY_6313407] |
| | aggregations:["count(VALUE._col0)"] |
| | keys:KEY._col0 (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 722471 Data size: 25090667 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 1 [SIMPLE_EDGE] |
| | Reduce Output Operator [RS_6313406] |
| | key expressions:_col0 (type: string) |
| | Map-reduce partition columns:_col0 (type: string) |
| | sort order:+ |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | value expressions:_col1 (type: bigint) |
| | Group By Operator [GBY_6313405] |
| | aggregations:["count()"] |
| | keys:plat (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | Select Operator [SEL_6313404] |
| | outputColumnNames:["plat"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | TableScan [TS_6313403] |
| | alias:tmp_1 |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 3 [SIMPLE_EDGE] |
| | Reduce Output Operator [RS_6313412] |
| | key expressions:_col0 (type: string) |
| | Map-reduce partition columns:_col0 (type: string) |
| | sort order:+ |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | value expressions:_col1 (type: bigint) |
| | Group By Operator [GBY_6313411] |
| | aggregations:["count()"] |
| | keys:split(cl, '_')[0] (type: string) |
| | outputColumnNames:["_col0","_col1"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | Select Operator [SEL_6313410] |
| | outputColumnNames:["cl"] |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| | TableScan [TS_6313409] |
| | alias:tmp_1 |
| | Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 5 [SIMPLE_EDGE] |
| Reduce Output Operator [RS_6313418] |
| key expressions:_col0 (type: string) |
| Map-reduce partition columns:_col0 (type: string) |
| sort order:+ |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| value expressions:_col1 (type: bigint) |
| Group By Operator [GBY_6313417] |
| aggregations:["count()"] |
| keys:split(cl, '_')[0] (type: string) |
| outputColumnNames:["_col0","_col1"] |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| Select Operator [SEL_6313416] |
| outputColumnNames:["cl"] |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| TableScan [TS_6313415] |
| alias:tmp_1 |
| Statistics:Num rows: 1444943 Data size: 50181369 Basic stats: COMPLETE Column stats: NONE |
| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
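Comparing the two plans: the two-way join runs as a Map Join, with the split(cl,'_')[0] aggregation finished in its own Reducer 4, while the three-way join places all three Group By operators inside a single Reducer 2 under a Merge Join, and the stack trace passes through CommonMergeJoinOperator.fetchOneRow into GroupByOperator.process. One way to test whether that plan shape is involved is to materialize the aggregated subquery first, so the join reads already-grouped rows. This is a diagnostic sketch, not a confirmed fix: it has not been run against the failing cluster, and `tmp_b` is a hypothetical table name.

```sql
-- Diagnostic sketch only; tmp_b is a hypothetical name.
-- Step 1: run the aggregation on its own and store the result
-- (temporary tables require Hive 0.14 or later).
create temporary table tmp_b as
select split(cl,'_')[0] as plat, count(*) as cnt
from tmp_db.tmp_1
group by split(cl,'_')[0];

-- Step 2: join against the stored result instead of the inline subqueries.
select
  a.plat, a.cnt, b.cnt, c.cnt
from
  (select plat, count(*) cnt
   from tmp_db.tmp_1
   group by plat) a
left join tmp_b b on a.plat = b.plat
left join tmp_b c on a.plat = c.plat;
```

If this version succeeds while the original still fails, that points at the combination of reduce-side Group By and Merge Join in one vertex rather than at the data itself.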