hive的join 有一种优化的方式:map join
但是,使用这种优化的时候要小心一点,先说一下优化配置的参数:
set hive.optimize.correlation=true
set hive.auto.convert.join=true
当运行一个比较大的join时候,出现了下面的问题:
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:234)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
... 9 more
网上查了一圈,貌似这还是个bug:
https://issues.apache.org/jira/i#browse/HIVE-4502
将 hive.auto.convert.join 设置成false,重新运行,问题就不出现了。
有一篇文件可以看一下:
http://www.gemini5201314.net/hadoop/hadoop-%E4%B8%AD%E7%9A%84%E4%B8%A4%E8%A1%A8join.html
hive 0.11 版的bug 也要注意一下。