Observing the execution plan of a Hive join

 

To see how a join is executed, we again turn to the explain plan.

The SQL is as follows:

explain
select  t1.prov_id
        ,t2.deep
from    (
            select  prov_id
                    ,deep
            from    dim.dim_city
            where   prov_id = 110000
        ) t1
join    (
            select  prov_id
                    ,deep
            from    dim.dim_city
            where   deep = 1
        ) t2
on      t1.prov_id = t2.prov_id
;

Execution plan

STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
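
The dependency list above fixes the order in which the stages run. As a rough illustration only (this is not how Hive schedules stages internally), the same ordering can be derived with a topological sort. The dictionary below is copied by hand from the STAGE DEPENDENCIES section, and `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on,
# copied by hand from the STAGE DEPENDENCIES section.
stage_dependencies = {
    "Stage-4": set(),        # root stage, depends on nothing
    "Stage-3": {"Stage-4"},
    "Stage-0": {"Stage-3"},
}

def execution_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())

print(execution_order(stage_dependencies))
# ['Stage-4', 'Stage-3', 'Stage-0']
```

So Stage-4 (the local work) runs first, Stage-3 (the map-only job) runs second, and Stage-0 (the fetch) runs last.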

STAGE PLANS:
//corresponds to the first subquery (t1); it runs as local work and loads the filtered rows into a hash table
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
t1:dim_city
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t1:dim_city
TableScan
alias: dim_city
filterExpr: (prov_id = 110000) (type: boolean)
Statistics: Num rows: 3775 Data size: 522191 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (prov_id = 110000) (type: boolean)
Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
Select Operator
Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 110000 (type: int)
1 _col0 (type: int)

//corresponds to the second subquery (t2)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: dim_city
filterExpr: ((deep = 1) and prov_id is not null) (type: boolean)
Statistics: Num rows: 3775 Data size: 522191 Basic stats: COMPLETE Column stats: NONE
Filter Operator
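//a detail worth noting: NULL keys can never match in an inner join, so rows with a NULL prov_id are filtered out as soon as the data is read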
predicate: ((deep = 1) and prov_id is not null) (type: boolean)
Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: prov_id (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE


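Map Join Operator
condition map:
//the join type is an inner join between input 0 and input 1
Inner Join 0 to 1
//the join key expressions, one per input
keys:
//input 0 (t1): the join key has been replaced by the constant 110000, because t1 is filtered on prov_id = 110000
0 110000 (type: int)
//input 1 (t2): the join key is _col0, i.e. prov_id
1 _col0 (type: int)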
Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: 110000 (type: int), 1 (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work

Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
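
Taken together, the plan describes a map join: Stage-4 loads t1 into a hash table (HashTable Sink Operator), and Stage-3 scans t2 and probes that table inside the mapper (Map Join Operator), so no reduce phase is needed. The Python sketch below imitates that flow in memory. It is a simplified illustration, not Hive's implementation: the function names `build_hash_table` and `map_join` and the sample rows are made up for the example.

```python
from collections import defaultdict

def build_hash_table(small_rows, key):
    """Build side (like Stage-4): group the rows of the small input by join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table

def map_join(hash_table, big_rows, key):
    """Probe side (like Stage-3): stream the other input and look up each key."""
    for row in big_rows:
        if row[key] is None:  # NULL keys never match in an inner join
            continue
        for match in hash_table.get(row[key], []):
            yield match, row

# Hypothetical sample rows standing in for dim.dim_city.
dim_city = [
    {"prov_id": 110000, "deep": 1},
    {"prov_id": 110000, "deep": 2},
    {"prov_id": 120000, "deep": 1},
    {"prov_id": None, "deep": 1},
]

t1 = [r for r in dim_city if r["prov_id"] == 110000]  # first subquery
t2 = [r for r in dim_city if r["deep"] == 1]          # second subquery

hash_table = build_hash_table(t1, "prov_id")
result = [(a["prov_id"], b["deep"]) for a, b in map_join(hash_table, t2, "prov_id")]
print(result)
# [(110000, 1), (110000, 1)]
```

Every output row is (110000, 1), which is consistent with the final Select Operator in the plan emitting the two constants 110000 and 1 instead of reading columns.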
