hive map join

最新推荐文章于 2024-05-12 09:00:00 发布

小涛手记

最新推荐文章于 2024-05-12 09:00:00 发布

阅读量254

点赞数 2

分类专栏： hive 文章标签： hive

本文链接：https://blog.csdn.net/weixin_47699191/article/details/119207666

版权

hive 专栏收录该内容

28 篇文章 0 订阅

订阅专栏

我的hive版本是2.3.1
map join 默认开启

不管大小表放左边还是右边,自动将小表放入内存,然后在map端顺序扫描大表跟内存中的数据进行join

a 大表

user_id  oid
0001,B
0001,A
0002,C
0003,D
0001,E
0001,F
0004,W
0001,Z

b 小表

user_id name
0001,王
0002,刘
0003,赵
0004,高

sql

select
a.user_id,
a.oid,
b.name
from a
join b
on a.user_id=b.user_id

默认开启

set hive.auto.convert.join=true;

执行计划没有reduce

0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+-------------------------------------------------------------------------------------------------------+--+
|                                                Explain                                                |
+-------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                   |
|   Stage-4 is a root stage                                                                             |
|   Stage-3 depends on stages: Stage-4                                                                  |
|   Stage-0 depends on stages: Stage-3                                                                  |
|                                                                                                       |
| STAGE PLANS:                                                                                          |
|   Stage: Stage-4                                                                                      |
|     Map Reduce Local Work                                                                             |
|       Alias -> Map Local Tables:                                                                      |
|         $hdt$_1:b                                                                                     |
|           Fetch Operator                                                                              |
|             limit: -1                                                                                 |
|       Alias -> Map Local Operator Tree:                                                               |
|         $hdt$_1:b                                                                                     |
|           TableScan                                                                                   |
|             alias: b                                                                                  |
|             Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE            |
|             Filter Operator                                                                           |
|               predicate: user_id is not null (type: boolean)                                          |
|               Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE          |
|               Select Operator                                                                         |
|                 expressions: user_id (type: string), name (type: string)                              |
|                 outputColumnNames: _col0, _col1                                                       |
|                 Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE        |
|                 HashTable Sink Operator                                                               |
|                   keys:                                                                               |
|                     0 _col0 (type: string)                                                            |
|                     1 _col0 (type: string)                                                            |
|                                                                                                       |
|   Stage: Stage-3                                                                                      |
|     Map Reduce                                                                                        |
|       Map Operator Tree:                                                                              |
|           TableScan                                                                                   |
|             alias: a                                                                                  |
|             Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE            |
|             Filter Operator                                                                           |
|               predicate: user_id is not null (type: boolean)                                          |
|               Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE          |
|               Select Operator                                                                         |
|                 expressions: user_id (type: string), oid (type: string)                               |
|                 outputColumnNames: _col0, _col1                                                       |
|                 Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE        |
|                 Map Join Operator                                                                     |
|                   condition map:                                                                      |
|                        Inner Join 0 to 1                                                              |
|                   keys:                                                                               |
|                     0 _col0 (type: string)                                                            |
|                     1 _col0 (type: string)                                                            |
|                   outputColumnNames: _col0, _col1, _col3                                              |
|                   Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE      |
|                   Select Operator                                                                     |
|                     expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string)     |
|                     outputColumnNames: _col0, _col1, _col2                                            |
|                     Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE    |
|                     File Output Operator                                                              |
|                       compressed: false                                                               |
|                       Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE  |
|                       table:                                                                          |
|                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat              |
|                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat    |
|                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   |
|       Local Work:                                                                                     |
|         Map Reduce Local Work                                                                         |
|                                                                                                       |
|   Stage: Stage-0                                                                                      |
|     Fetch Operator                                                                                    |
|       limit: -1                                                                                       |
|       Processor Tree:                                                                                 |
|         ListSink                                                                                      |
|                                                                                                       |
+-------------------------------------------------------------------------------------------------------+--+
70 rows selected (0.241 seconds)

关闭map join

set hive.auto.convert.join=false;

执行计划有reduce阶段在reduce阶段join,不是map join

0: jdbc:hive2://wxt01:10000> explain select
0: jdbc:hive2://wxt01:10000> a.user_id,
0: jdbc:hive2://wxt01:10000> a.oid,
0: jdbc:hive2://wxt01:10000> b.name
0: jdbc:hive2://wxt01:10000> from a
0: jdbc:hive2://wxt01:10000> join b
0: jdbc:hive2://wxt01:10000> on a.user_id=b.user_id;
OK
+---------------------------------------------------------------------------------------------------+--+
|                                              Explain                                              |
+---------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                               |
|   Stage-1 is a root stage                                                                         |
|   Stage-0 depends on stages: Stage-1                                                              |
|                                                                                                   |
| STAGE PLANS:                                                                                      |
|   Stage: Stage-1                                                                                  |
|     Map Reduce                                                                                    |
|       Map Operator Tree:                                                                          |
|           TableScan                                                                               |
|             alias: a                                                                              |
|             Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE        |
|             Filter Operator                                                                       |
|               predicate: user_id is not null (type: boolean)                                      |
|               Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE      |
|               Select Operator                                                                     |
|                 expressions: user_id (type: string), oid (type: string)                           |
|                 outputColumnNames: _col0, _col1                                                   |
|                 Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE    |
|                 Reduce Output Operator                                                            |
|                   key expressions: _col0 (type: string)                                           |
|                   sort order: +                                                                   |
|                   Map-reduce partition columns: _col0 (type: string)                              |
|                   Statistics: Num rows: 1 Data size: 56 Basic stats: COMPLETE Column stats: NONE  |
|                   value expressions: _col1 (type: string)                                         |
|           TableScan                                                                               |
|             alias: b                                                                              |
|             Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE        |
|             Filter Operator                                                                       |
|               predicate: user_id is not null (type: boolean)                                      |
|               Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE      |
|               Select Operator                                                                     |
|                 expressions: user_id (type: string), name (type: string)                          |
|                 outputColumnNames: _col0, _col1                                                   |
|                 Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE    |
|                 Reduce Output Operator                                                            |
|                   key expressions: _col0 (type: string)                                           |
|                   sort order: +                                                                   |
|                   Map-reduce partition columns: _col0 (type: string)                              |
|                   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE  |
|                   value expressions: _col1 (type: string)                                         |
|       Reduce Operator Tree:                                                                       |
|         Join Operator                                                                             |
|           condition map:                                                                          |
|                Inner Join 0 to 1                                                                  |
|           keys:                                                                                   |
|             0 _col0 (type: string)                                                                |
|             1 _col0 (type: string)                                                                |
|           outputColumnNames: _col0, _col1, _col3                                                  |
|           Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE          |
|           Select Operator                                                                         |
|             expressions: _col0 (type: string), _col1 (type: string), _col3 (type: string)         |
|             outputColumnNames: _col0, _col1, _col2                                                |
|             Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE        |
|             File Output Operator                                                                  |
|               compressed: false                                                                   |
|               Statistics: Num rows: 1 Data size: 61 Basic stats: COMPLETE Column stats: NONE      |
|               table:                                                                              |
|                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat                  |
|                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat        |
|                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                       |
|                                                                                                   |
|   Stage: Stage-0                                                                                  |
|     Fetch Operator                                                                                |
|       limit: -1                                                                                   |
|       Processor Tree:                                                                             |
|         ListSink                                                                                  |
|                                                                                                   |
+---------------------------------------------------------------------------------------------------+--+
67 rows selected (0.122 seconds)

小涛手记

关注

2
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hive map join

hive map joinmap join 默认开启不管大小表放左边还是右边,自动将小表放入内存,然后在map端顺序扫描大表跟内存中的数据进行joina 大表user_id oid0001,B0001,A0002,C0003,D0001,E0001,F0004,W0001,Zb 小表user_id name0001,王0002,刘0003,赵0004,高sqlselecta.user_id,a.oid,b.namefrom ajoin bon a.u
复制链接

扫一扫