我的hive版本是2.3.1
map join 默认开启
不管大小表放左边还是右边,自动将小表放入内存,然后在map端顺序扫描大表跟内存中的数据进行join
a 大表
user_id oid
0001,B
0001,A
0002,C
0003,D
0001,E
0001,F
0004,W
0001,Z
b 小表
user_id name
0001,王
0002,刘
0003,赵
0004,高
sql
select
a.user_id,
a.oid,
b.name
from a
join b
on a.user_id=b.user_id
默认开启
set hive.auto.convert.join=true;
执行计划 没有reduce
0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+-------------------------------------------------------------------------------------------------------+--+
| Explain |
+-------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-4 is a root stage |
| Stage-3 depends on stages: Stage-4 |
| Stage-0 depends on stages: Stage-3 |
| |
| STAGE PLANS: |
| Stage: Stage-4 |
| Map Reduce Local Work |
| Alias -> Map Local Tables: |
| $hdt$_1:b |
| Fetch Operator |
| limit: -1 |
| Alias -> Map Local Operator Tree: |
| $hdt$_1:b |
| TableScan |
| alias: b |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), name (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| HashTable Sink Operator |
| keys: |
| 0 _col0 (type: string) |
| 1 _col0 (type: string) |
| |
| Stage: Stage-3 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), oid (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Map Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col0 (type: string) |
| 1 _col0 (type: string) |
| outputColumnNames: _col0, _col1, _col3 |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| Local Work: |
| Map Reduce Local Work |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+-------------------------------------------------------------------------------------------------------+--+
70 rows selected (0.241 seconds)
关闭map join
set hive.auto.convert.join=false;
执行计划 有reduce阶段在reduce阶段join,不是map join
0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+---------------------------------------------------------------------------------------------------+--+
| Explain |
+---------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), oid (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: string) |
| TableScan |
| alias: b |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), name (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: string) |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col0 (type: string) |
| 1 _col0 (type: string) |
| outputColumnNames: _col0, _col1, _col3 |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+---------------------------------------------------------------------------------------------------+--+
67 rows selected (0.122 seconds)