Reading Hive SQL EXPLAIN output in detail

Reference: https://www.cnblogs.com/NightPxy/p/9230253.html

1. Hive statement execution order

from ... where ... select ... group by ... having ... order by ...

The corresponding execution plan:

Map Operator Tree:
TableScan
            alias: table name   -- corresponds to FROM ...

Filter Operator
              predicate: the WHERE condition  -- filters rows by the WHERE clause

Select Operator
                expressions: the SELECT columns + their types    -- SELECT
                outputColumnNames: the output columns  -- SELECT
Group By Operator
                  aggregations: aggregate functions such as sum(age)/count(age)
                  keys: the GROUP BY columns + their types
Reduce Output Operator
                 the map-side output: emits key/value pairs for the shuffle (often described as a local, map-side "reduce")

Reduce Operator Tree:
Group By Operator
          aggregations: sum(VALUE._col0)   -- reduce-side aggregate function over the virtual column
          keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
          mode: mergepartial
              outputColumnNames: _col0, _col1, _col2, _col3 -- temporary names generated for the intermediate result columns
Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: bigint) -- SELECT
            outputColumnNames: _col0, _col1, _col2, _col3 -- the output virtual columns (temporary names generated for the intermediate result columns)

File Output Operator
              compressed: false
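The operator chain above can be sketched as a toy Python pipeline. This is purely illustrative (hypothetical data, not the Hive API): filter, project, partially aggregate on the map side, then merge partials on the reduce side.

```python
from collections import defaultdict

# TableScan: a toy "table" (made-up rows, not real Hive data)
rows = [
    {"city": "bj", "age": 20}, {"city": "bj", "age": 30}, {"city": "sh", "age": 25},
]

def map_side(chunk):
    partial = defaultdict(int)                  # Group By Operator, mode: hash
    for r in chunk:
        if r["age"] > 0:                        # Filter Operator (the WHERE predicate)
            city, age = r["city"], r["age"]     # Select Operator (column pruning)
            partial[city] += age                # aggregations: sum(age)
    return partial  # Reduce Output Operator: (key = _col0, value = _col1) pairs

def reduce_side(partials):
    merged = defaultdict(int)                   # Group By Operator, mode: mergepartial
    for p in partials:
        for k, v in p.items():
            merged[k] += v                      # sum(VALUE._col0)
    return dict(merged)                         # -> File Output Operator

# Two "mappers" each process a split; one "reducer" merges the partials.
result = reduce_side([map_side(rows[:2]), map_side(rows[2:])])
```

The key point the sketch shows is that each mapper produces a *partial* sum per key, and the reducer only merges those partials, which is exactly what `mode: mergepartial` means.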

2. How HiveSQL is translated to MapReduce

2.1 Cases that do not run MapReduce

    Not every HiveSQL statement runs MapReduce. Simple queries, and queries that involve no computation (such as reading a partitioned table directly), do not launch a MapReduce job; they are served by a Fetch task instead.

    explain select * from dept_et limit 1; 

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        TableScan
          alias: dept_et
          Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: id (type: int), name (type: string), city (type: string)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
            Limit
              Number of rows: 1
              Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
              ListSink
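The Fetch task above can be sketched in a few lines of Python (hypothetical data; this just mimics the operator tree, not Hive itself): scan rows, keep all columns, and stop after LIMIT rows, with no MapReduce job at all.

```python
# Toy table standing in for dept_et (made-up rows)
table = [(1, "a", "bj"), (2, "b", "sh"), (3, "c", "gz")]

def fetch(rows, limit):
    out = []
    for r in rows:             # TableScan
        out.append(r)          # Select Operator (select * keeps all columns)
        if len(out) >= limit:  # Limit: Number of rows
            break
    return out                 # ListSink: results go straight back to the client

result = fetch(table, 1)
```

Because everything happens in a single local scan, this is why `explain` shows only `Stage-0` with a `Fetch Operator` and no MapReduce stage.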

2.2 join

explain select * from dept_et et join dept_mg mg on et.id = mg.id;

<!-- Builds the MR job flow: 4 => 3 => 0 (end) -->
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  <!-- First MR stage: scan table mg (dept_mg mg) with an implicit filter predicate (id is not null).
      Note that the build side of the join here is the second table listed.
      "Map Reduce Local Work" is a localized MapReduce:
        because the test tables are tiny, Hive chooses to pull the small table's data to a local task and build a hash table directly, instead of running a full distributed MapReduce. -->
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        mg 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        mg 
          TableScan
            alias: mg
            Statistics: Num rows: 1 Data size: 79 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 79 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  id (type: int)
                  id (type: int)
  <!-- Second MR stage: scan table et,
        run a Map Join,
        output _col0, _col1, _col2, _col6, _col7, _col8 (i.e. the * in the query, 6 columns in total),
        and write the result via File Output to a temporary file (compressed: false, uncompressed) -->
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: et
            Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  id (type: int)
                  id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8
                Statistics: Num rows: 1 Data size: 354 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string), _col6 (type: int), _col7 (type: string), _col8 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Statistics: Num rows: 1 Data size: 354 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 354 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
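The two-stage map join above can be sketched as a classic hash join in Python. This is a minimal illustration with made-up table contents: Stage-4 builds a hash table from the small table (mg, the `HashTable Sink Operator`), then Stage-3 streams the large table (et) and probes it (the `Map Join Operator`).

```python
# Made-up contents for the two tables: (id, name, city)
dept_mg = [(1, "mg_a", "bj"), (2, "mg_b", "sh"), (None, "mg_x", "gz")]
dept_et = [(1, "et_a", "bj"), (3, "et_c", "sz"), (None, "et_x", "sh")]

# Stage-4: Filter Operator (id is not null) + HashTable Sink on the small table
hash_table = {row[0]: row for row in dept_mg if row[0] is not None}

# Stage-3: scan et, apply the same null filter, probe the hash table.
# An inner join keeps only rows whose id exists on both sides; the output row
# concatenates et's columns with mg's (the _col0.._col2, _col6.._col8 above).
joined = [et_row + hash_table[et_row[0]]
          for et_row in dept_et
          if et_row[0] is not None and et_row[0] in hash_table]
```

Because the probe side never needs a shuffle, the join completes entirely on the map side, which is why no `Reduce Operator Tree` appears in Stage-3.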

2.3 group by

    explain select city,sum(id) from dept_et group by city;

    The execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  <!-- Stage definitions: each stage corresponds to one MapReduce job -->
  Stage: Stage-1
    <!-- The Map phase -->
    Map Reduce
      Map Operator Tree:
          TableScan // table scan
            alias: dept_et
            Statistics: Num rows: 3 Data size: 322 Basic stats: COMPLETE Column stats: NONE // estimated statistics for table dept_et
            Select Operator // column pruning: only city (type: string) and id (type: int) are needed
              expressions: city (type: string), id (type: int)
              outputColumnNames: city, id
              Statistics: Num rows: 3 Data size: 322 Basic stats: COMPLETE Column stats: NONE
              <!-- Map-side aggregation:
                    groups by city (type: string) in an in-memory hash table (mode: hash) and applies sum(id); the result is _col0, _col1 = (city, partial sum(id)) -->
              Group By Operator 
                aggregations: sum(id) //分组执行函数=>sum(id)
                keys: city (type: string) 
                mode: hash 
                outputColumnNames: _col0, _col1 
                Statistics: Num rows: 3 Data size: 322 Basic stats: COMPLETE Column stats: NONE
                <!-- The map-side output -->
                Reduce Output Operator 
                  key expressions: _col0 (type: string) // the map output key is _col0 (city)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 3 Data size: 322 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint) // the map output value is _col1 (the partial sum(id))
      <!-- The Reduce phase: merges the outputs of multiple maps.
            Keyed by _col0 (the city emitted by the maps), it computes sum(VALUE._col0) (the map-side partial sum(id)); the result is again _col0, _col1 = (city, sum(id)) -->
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial // merge the partial aggregates from multiple map outputs
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 107 Basic stats: COMPLETE Column stats: NONE
          <!-- The reduce-side output: written to a temporary file, uncompressed -->
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 107 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
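The `Reduce Output Operator` above implies a shuffle: map output pairs (`_col0` = city, `_col1` = partial sum) are hash-partitioned on `_col0` ("Map-reduce partition columns") and sorted ascending ("sort order: +") before each reducer merges its partition. A toy sketch, with made-up data and `zlib.crc32` standing in for the real partitioner:

```python
import zlib

# Map output: (key = _col0 = city, value = _col1 = partial sum). Made-up data.
map_output = [("bj", 10), ("sh", 5), ("bj", 7), ("gz", 3)]
num_reducers = 2

# Hash-partition on the key column so every row for a given city reaches the
# same reducer (crc32 is used here only to keep the demo deterministic).
partitions = [[] for _ in range(num_reducers)]
for key, value in map_output:
    partitions[zlib.crc32(key.encode()) % num_reducers].append((key, value))

# Each partition is sorted by key before the reducer sees it (sort order: +),
# so the reduce-side Group By can merge runs of equal keys in one pass.
for p in partitions:
    p.sort()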

    2.4.1 A single distinct

      select city,count(distinct(name)) from dept_et group by city;

      With only one distinct column, Hive combines the GROUP BY column and the DISTINCT column into the map output key, partitions only by the GROUP BY column, and de-duplicates in the reduce phase by tracking the LastKey.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      <!-- Map-side definition
            Input: table scan of dept_et, projecting the raw values of city and name
            Processing: group by (city, name) as the key (the GROUP BY column plus the DISTINCT column) while computing count(DISTINCT name)
            Output: _col0, _col1, _col2 (city, name, count(DISTINCT name)) -->
      Map Operator Tree:
          TableScan
            alias: dept_et
            Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: city (type: string), name (type: string) // no computed expressions; the raw column values are projected
              outputColumnNames: city, name
              Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count(DISTINCT name)
                keys: city (type: string), name (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string)
                  sort order: ++
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
      <!-- Reduce-side definition
            Receives the map output, aggregates again keyed by _col0 (city), counting the distinct name values per city; the result is written to a temporary file -->
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 322 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
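The single-distinct plan above can be sketched as follows (made-up data). The map key is the pair (city, name), but partitioning uses city alone, so every row for a city reaches one reducer with its names already sorted; the reducer then counts a name only when it differs from the last key seen, which is the LastKey trick:

```python
# Toy rows of (city, name); duplicates exercise the distinct logic.
rows = [("bj", "tom"), ("bj", "tom"), ("bj", "amy"), ("sh", "bob")]

# Map + shuffle: sort by the composite key (city, name). Partitioning by city
# alone is implied here because a single "reducer" processes everything.
shuffled = sorted(rows)

# Reduce: count distinct names per city by comparing against the previous key.
counts, last_key = {}, None
for city, name in shuffled:
    if (city, name) != last_key:  # a name not yet counted for this city
        counts[city] = counts.get(city, 0) + 1
        last_key = (city, name)
```

Because the sort already groups equal (city, name) pairs together, one comparison per row replaces any in-memory set of seen values, which is what makes this plan memory-safe for high-cardinality distincts.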
