hive map join

我的hive版本是2.3.1
map join 默认开启

不管大小表放左边还是右边,自动将小表放入内存,然后在map端顺序扫描大表跟内存中的数据进行join

a 大表

user_id  oid
0001,B
0001,A
0002,C
0003,D
0001,E
0001,F
0004,W
0001,Z

b 小表

user_id name
0001,王
0002,刘
0003,赵
0004,高

sql

select
a.user_id,
a.oid,
b.name
from a
join b
on a.user_id=b.user_id

默认开启

set hive.auto.convert.join=true;

执行计划 没有reduce

0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+-------------------------------------------------------------------------------------------------------+--+
|                                                Explain                                                |
+-------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                   |
|   Stage-4 is a root stage                                                                             |
|   Stage-3 depends on stages: Stage-4                                                                  |
|   Stage-0 depends on stages: Stage-3                                                                  |
|                                                                                                       |
| STAGE PLANS:                                                                                          |
|   Stage: Stage-4                                                                                      |
|     Map Reduce Local Work                                                                             |
|       Alias -> Map Local Tables:                                                                      |
|         $hdt$_1:b                                                                                     |
|           Fetch Operator                                                                              |
|             limit: -1                                                                                 |
|       Alias -> Map Local Operator Tree:                                                               |
|         $hdt$_1:b                                                                                     |
|           TableScan                                                                                   |
|             alias: b                                                                                  |
|             Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE            |
|             Filter Operator                                                                           |
|               predicate: user_id is not null (type: boolean)                                          |
|               Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE          |
|               Select Operator                                                                         |
|                 expressions: user_id (type: string), name (type: string)                              |
|                 outputColumnNames: _col0, _col1                                                       |
|                 Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE        |
|                 HashTable Sink Operator                                                               |
|                   keys:                                                                               |
|                     0 _col0 (type: string)                                                            |
|                     1 _col0 (type: string)                                                            |
|                                                                                                       |
|   Stage: Stage-3                                                                                      |
|     Map Reduce                                                                                        |
|       Map Operator Tree:                                                                              |
|           TableScan                                                                                   |
|             alias: a                                                                                  |
|             Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE            |
|             Filter Operator                                                                           |
|               predicate: user_id is not null (type: boolean)                                          |
|               Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE          |
|               Select Operator                                                                         |
|                 expressions: user_id (type: string), oid (type: string)                               |
|                 outputColumnNames: _col0, _col1                                                       |
|                 Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE        |
|                 Map Join Operator                                                                     |
|                   condition map:                                                                      |
|                        Inner Join 0 to 1                                                              |
|                   keys:                                                                               |
|                     0 _col0 (type: string)                                                            |
|                     1 _col0 (type: string)                                                            |
|                   outputColumnNames: _col0, _col1, _col3                                              |
|                   Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE      |
|                   Select Operator                                                                     |
|                     expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string)     |
|                     outputColumnNames: _col0, _col1, _col2                                            |
|                     Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE    |
|                     File Output Operator                                                              |
|                       compressed: false                                                               |
|                       Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE  |
|                       table:                                                                          |
|                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat              |
|                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat    |
|                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   |
|       Local Work:                                                                                     |
|         Map Reduce Local Work                                                                         |
|                                                                                                       |
|   Stage: Stage-0                                                                                      |
|     Fetch Operator                                                                                    |
|       limit: -1                                                                                       |
|       Processor Tree:                                                                                 |
|         ListSink                                                                                      |
|                                                                                                       |
+-------------------------------------------------------------------------------------------------------+--+
70 rows selected (0.241 seconds)

关闭map join

set hive.auto.convert.join=false;

执行计划 有reduce阶段在reduce阶段join,不是map join

0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+---------------------------------------------------------------------------------------------------+--+
|                                              Explain                                              |
+---------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                               |
|   Stage-1 is a root stage                                                                         |
|   Stage-0 depends on stages: Stage-1                                                              |
|                                                                                                   |
| STAGE PLANS:                                                                                      |
|   Stage: Stage-1                                                                                  |
|     Map Reduce                                                                                    |
|       Map Operator Tree:                                                                          |
|           TableScan                                                                               |
|             alias: a                                                                              |
|             Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE        |
|             Filter Operator                                                                       |
|               predicate: user_id is not null (type: boolean)                                      |
|               Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE      |
|               Select Operator                                                                     |
|                 expressions: user_id (type: string), oid (type: string)                           |
|                 outputColumnNames: _col0, _col1                                                   |
|                 Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE    |
|                 Reduce Output Operator                                                            |
|                   key expressions: _col0 (type: string)                                           |
|                   sort order: +                                                                   |
|                   Map-reduce partition columns: _col0 (type: string)                              |
|                   Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE  |
|                   value expressions: _col1 (type: string)                                         |
|           TableScan                                                                               |
|             alias: b                                                                              |
|             Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE        |
|             Filter Operator                                                                       |
|               predicate: user_id is not null (type: boolean)                                      |
|               Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE      |
|               Select Operator                                                                     |
|                 expressions: user_id (type: string), name (type: string)                          |
|                 outputColumnNames: _col0, _col1                                                   |
|                 Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE    |
|                 Reduce Output Operator                                                            |
|                   key expressions: _col0 (type: string)                                           |
|                   sort order: +                                                                   |
|                   Map-reduce partition columns: _col0 (type: string)                              |
|                   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE  |
|                   value expressions: _col1 (type: string)                                         |
|       Reduce Operator Tree:                                                                       |
|         Join Operator                                                                             |
|           condition map:                                                                          |
|                Inner Join 0 to 1                                                                  |
|           keys:                                                                                   |
|             0 _col0 (type: string)                                                                |
|             1 _col0 (type: string)                                                                |
|           outputColumnNames: _col0, _col1, _col3                                                  |
|           Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE          |
|           Select Operator                                                                         |
|             expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string)         |
|             outputColumnNames: _col0, _col1, _col2                                                |
|             Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE        |
|             File Output Operator                                                                  |
|               compressed: false                                                                   |
|               Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE      |
|               table:                                                                              |
|                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat                  |
|                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat        |
|                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                       |
|                                                                                                   |
|   Stage: Stage-0                                                                                  |
|     Fetch Operator                                                                                |
|       limit: -1                                                                                   |
|       Processor Tree:                                                                             |
|         ListSink                                                                                  |
|                                                                                                   |
+---------------------------------------------------------------------------------------------------+--+
67 rows selected (0.122 seconds)

  • 2
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值