Hive execution plans

HQL:

insert overwrite TABLE lpx SELECT t1.bar, t1.foo, t2.foo FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar);
OK

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME pokes) t1) (TOK_TABREF (TOK_TABNAME invites) t2) (= (. (TOK_TABLE_OR_COL t1) bar) (. (TOK_TABLE_OR_COL t2) bar)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME lpx))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL t1) bar)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL t1) foo)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL t2) foo)))))


STAGE DEPENDENCIES:
  Stage-1 is a root stage              // the root stage
  Stage-0 depends on stages: Stage-1   // Stage-0 depends on Stage-1
  Stage-2 depends on stages: Stage-0   // Stage-2 depends on Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce                        // this stage is one MapReduce job
      Alias -> Map Operator Tree:     // the map operator tree, i.e. the map phase
        t1
          TableScan                   // scans the table to load rows (the FROM clause); the description can include row counts, sizes, etc.
            alias: t1                 // table alias
            Reduce Output Operator    // describes the map output, i.e. the reduce input: key, partition, sort order, etc.
              key expressions:        // the key t1 emits to the reduce phase
                    expr: bar
                    type: string
              sort order: +           // one sort column (the key, bar); multiple sort columns would show multiple +'s
              Map-reduce partition columns:  // partition columns: Hive partitions on the column(s) in the join's ON clause, guaranteeing that rows with equal values in those columns go to the same reducer
                    expr: bar
                    type: string
              tag: 0                  // tag marking rows that come from t1
              value expressions:      // the values t1 emits to the reduce phase
                    expr: foo
                    type: int
                    expr: bar
                    type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
              key expressions:
                    expr: bar
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: bar
                    type: string
              tag: 1
              value expressions:
                    expr: foo
                    type: int
      Reduce Operator Tree:           // the reduce operator tree, i.e. the reduce phase
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col1}  // corresponds to t1.bar, t1.foo above
            1 {VALUE._col0}                // corresponds to t2.foo above
          handleSkewJoin: false            // whether to handle join skew
          outputColumnNames: _col0, _col1, _col5
          Select Operator // column projection; the description includes column names and types, plus output type and size
            expressions:
                  expr: _col1
                  type: string
                  expr: _col0
                  type: int
                  expr: _col5
                  type: int
            outputColumnNames: _col0, _col1, _col2   // temporary column names generated for the intermediate result
            File Output Operator // writes the result to a temporary file; the description covers the compression setting and the output file format
              compressed: false
              GlobalTableId: 1
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: default.lpx

  Stage: Stage-0
    Move Operator // Stage-0 simply moves the result from the temporary directory into the directory of table lpx
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.lpx

  Stage: Stage-2
    Stats-Aggr Operator
From the header:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

we can read off the job structure of the plan: the whole task is executed as three jobs.
The first job consists of Stage-1;
the second job consists of Stage-0, which depends on the result of Stage-1;
the third job consists of Stage-2, which depends on the result of Stage-0.

Stage-1 and Stage-0 are explained separately below. The SQL can be split into two steps:
(1) SELECT t1.bar, t1.foo, t2.foo FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar);
(2) insert overwrite TABLE lpx;
Stage: Stage-1
corresponds to one complete MapReduce job with two parts: the Map Operator Tree, which maps to the map tasks, and the Reduce Operator Tree, which maps to the reduce tasks.

The Map Operator Tree contains two parallel operations over t1 and t2, roughly SELECT t1.bar, t1.foo FROM t1; and SELECT t2.foo FROM t2;. Each map task produces the input of the reduce phase [Reduce Output Operator].

The Reduce Operator Tree shows that the map outputs are joined on the join condition, and the result is written to HDFS through the predefined output format matching the storage format of default.lpx. When lpx was created no storage format was specified, so it defaults to text, read and written with TextInputFormat and TextOutputFormat:
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: default.lpx
input format is org.apache.hadoop.mapred.TextInputFormat because the temporary output produced by the map phase is saved in TextOutputFormat, so the reduce side naturally reads it back through TextInputFormat. These details are handled by Hadoop MapReduce itself; Hive only needs to specify the formats.

serde is the org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe class, which at this point holds the values _col0, _col1, _col2, i.e. the t1.bar, t1.foo, t2.foo we asked for. Concretely, the serialized value is _col0 + lpx's column delimiter + _col1 + lpx's column delimiter + _col2. From output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat we know that class handles the output.

Stage-0 corresponds to the second step above. Stage-1 writes its result to a temporary file (say tmp), and Stage-0 moves it into table lpx. The Move Operator signals that this is not a MapReduce job: it only invokes MoveTask, which checks, before moving, that the input files match the storage format of lpx.
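The shuffle-join mechanics described above (tag each row with its table of origin, partition on the join column, join tag-0 and tag-1 rows in the reducer) can be sketched in plain Python. This is an illustrative simulation with made-up rows, not Hive code:

```python
from collections import defaultdict
from itertools import product

def map_phase(rows, tag, key_col):
    """Emit (key, (tag, row)) pairs, mirroring the Reduce Output Operator."""
    for row in rows:
        yield row[key_col], (tag, row)

def shuffle(pairs, num_reducers=2):
    """Hash-partition by key so equal keys land in the same reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged in pairs:
        partitions[hash(key) % num_reducers][key].append(tagged)
    return partitions

def reduce_phase(partitions):
    """Inner Join 0 to 1: pair every tag-0 row with every tag-1 row per key."""
    for part in partitions:
        for key, tagged_rows in part.items():
            side0 = [r for t, r in tagged_rows if t == 0]
            side1 = [r for t, r in tagged_rows if t == 1]
            for r0, r1 in product(side0, side1):
                yield (r0["bar"], r0["foo"], r1["foo"])

pokes   = [{"foo": 1, "bar": "a"}, {"foo": 2, "bar": "b"}]
invites = [{"foo": 10, "bar": "a"}, {"foo": 20, "bar": "c"}]
pairs = list(map_phase(pokes, 0, "bar")) + list(map_phase(invites, 1, "bar"))
result = sorted(reduce_phase(shuffle(pairs)))
print(result)  # [('a', 1, 10)]
```

Only the key "a" appears on both sides, so only one joined row survives; keys "b" and "c" have an empty side and produce nothing, matching inner-join semantics.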

select a.*, b.cust_uid
from ods_ad_bid_deliver_info b join mds_ad_algo_feed_monitor_data_table a
where a.dt<=20140101 and a.dt<=20140108 and key='deliver_id_bucket_id' and a.dt=b.dt and a.key_slice=b.deliver_id
Execution plan:

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME ods_ad_bid_deliver_info) b) (TOK_TABREF (TOK_TABNAME mds_ad_algo_feed_monitor_data_table) a))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) cust_uid))) (TOK_WHERE (and (and (and (and (<= (. (TOK_TABLE_OR_COL a) dt) 20140101) (<= (. (TOK_TABLE_OR_COL a) dt) 20140108)) (= (TOK_TABLE_OR_COL key) 'deliver_id_bucket_id')) (= (. (TOK_TABLE_OR_COL a) dt) (. (TOK_TABLE_OR_COL b) dt))) (= (. (TOK_TABLE_OR_COL a) key_slice) (. (TOK_TABLE_OR_COL b) deliver_id))))))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Filter Operator
              predicate:
                  expr: (key = 'deliver_id_bucket_id')  // the filter on key is applied in the map phase
                  type: boolean
              Reduce Output Operator
                sort order:
                tag: 1
                value expressions:  // select * causes every column to be sent to the reduce phase
                      expr: key
                      type: string
                      expr: key_slice
                      type: string
                      expr: billing_mode_slice
                      type: string
                      expr: bucket_id
                      type: string
                      expr: ctr
                      type: string
                      expr: ecpm
                      type: string
                      expr: auc
                      type: string
                      expr: pctr
                      type: string
                      expr: pctr_ctr
                      type: string
                      expr: total_pv
                      type: string
                      expr: total_click
                      type: string
                      expr: dt
                      type: string
        b
          TableScan
            alias: b
            Reduce Output Operator
              sort order:
              tag: 0
              value expressions:
                    expr: deliver_id
                    type: string
                    expr: cust_uid
                    type: string
                    expr: dt
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col6} {VALUE._col35}
            1 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11}
          handleSkewJoin: false
          outputColumnNames: _col0, _col6, _col35, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49
          Filter Operator
            predicate:
                expr: (((((_col49 <= 20140101) and (_col49 <= 20140108)) and (_col38 = 'deliver_id_bucket_id')) and (_col49 = _col35)) and (_col39 = _col0))
                type: boolean
            Select Operator
              expressions:
                    expr: _col38
                    type: string
                    expr: _col39
                    type: string
                    expr: _col40
                    type: string
                    expr: _col41
                    type: string
                    expr: _col42
                    type: string
                    expr: _col43
                    type: string
                    expr: _col44
                    type: string
                    expr: _col45
                    type: string
                    expr: _col46
                    type: string
                    expr: _col47
                    type: string
                    expr: _col48
                    type: string
                    expr: _col49
                    type: string
                    expr: _col6
                    type: string
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1

After rewriting:
select a.*, b.cust_uid
from ods_ad_bid_deliver_info b
join mds_ad_algo_feed_monitor_data_table a
on(a.dt<=20140101 and a.dt<=20140108 and a.dt=b.dt and a.key_slice=b.deliver_id and a.key='deliver_id_bucket_id')
Execution plan:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
           TableScan
            alias: a
            Filter Operator
              predicate:
                  expr: (key = 'deliver_id_bucket_id')
                  type: boolean
              Filter Operator
                predicate:
                    expr: (dt <= 20140101)  // the partition filter now takes effect on the map side
                    type: boolean
                Filter Operator
                  predicate:
                      expr: (dt <= 20140108)  // the partition filter now takes effect on the map side
                      type: boolean
                  Filter Operator
                    predicate:
                        expr: (key = 'deliver_id_bucket_id')
                        type: boolean
                    Reduce Output Operator
                      key expressions:
                            expr: dt
                            type: string
                            expr: key_slice
                            type: string
                      sort order: ++
                      Map-reduce partition columns:
                            expr: dt
                            type: string
                            expr: key_slice
                            type: string
                      tag: 1
                      value expressions:
                            expr: key
                            type: string
                            expr: key_slice
                            type: string
                            expr: billing_mode_slice
                            type: string
                            expr: bucket_id
                            type: string
                            expr: ctr
                            type: string
                            expr: ecpm
                            type: string
                            expr: auc
                            type: string
                            expr: pctr
                            type: string
                            expr: pctr_ctr
                            type: string
                            expr: total_pv
                            type: string
                            expr: total_click
                            type: string
                            expr: dt
                            type: string
        b
          TableScan
            alias: b
            Reduce Output Operator
              key expressions:
                    expr: dt
                    type: string
                    expr: deliver_id
                    type: string
              sort order: ++
              Map-reduce partition columns:
                    expr: dt
                    type: string
                    expr: deliver_id
                    type: string
              tag: 0
              value expressions:
                    expr: cust_uid
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col6}
            1 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11}
          handleSkewJoin: false
          outputColumnNames: _col6, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49
          Select Operator
            expressions:
                  expr: _col38
                  type: string
                  expr: _col39
                  type: string
                  expr: _col40
                  type: string
                  expr: _col41
                  type: string
                  expr: _col42
                  type: string
                  expr: _col43
                   type: string
                  expr: _col44
                  type: string
                  expr: _col45
                  type: string
                  expr: _col46
                  type: string
                  expr: _col47
                  type: string
                  expr: _col48
                  type: string
                  expr: _col49
                  type: string
                  expr: _col6
                  type: string
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
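Comparing the two plans shows what the rewrite bought: in the first plan the Reduce Output Operators have no key expressions and an empty sort order, so every row is shuffled under one empty key and the join conditions only run in the Filter Operator after the Join Operator; in the second, (dt, key_slice) and (dt, deliver_id) become the shuffle keys. A rough Python sketch (made-up rows, not Hive internals) of the difference in pairs examined:

```python
from itertools import product

a = [{"dt": d, "key_slice": k} for d in ("20140101", "20140102") for k in ("x", "y")]
b = [{"dt": "20140101", "deliver_id": "x"}]

# Variant 1: join everything, then filter (the WHERE-only plan):
# the reducer sees the full Cartesian product.
pairs_examined_cartesian = len(list(product(a, b)))

# Variant 2: keyed join on (dt, key_slice) = (dt, deliver_id) (the ON plan):
# only rows whose keys collide are ever paired.
index = {}
for row in b:
    index.setdefault((row["dt"], row["deliver_id"]), []).append(row)
pairs_examined_keyed = sum(
    len(index.get((r["dt"], r["key_slice"]), [])) for r in a)

print(pairs_examined_cartesian, pairs_examined_keyed)  # 4 1
```

With real tables the Cartesian variant examines |a| x |b| pairs in a single reducer, which is exactly the blow-up the rewritten query avoids.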


Example:
select * from emp e
left join dept d on e.deptno=d.deptno
where d.dt='2018-06-04';

Time taken: 44.401 seconds, Fetched: 17 row(s)

Execution plan:
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3
STAGE PLANS:
  Stage: Stage-4
// 1. A local MapReduce task is launched to read the small table

    Map Reduce Local Work   // executed locally

      Alias -> Map Local Tables:
        d
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 168 Basic stats: PARTIAL Column stats: PARTIAL

// 2. The small table is written into a hash table file

            HashTable Sink Operator
              keys:
                0 deptno (type: string)
                1 deptno (type: string)
// 3. The hash table file is uploaded to HDFS / the distributed cache (this step does not appear in the plan itself; it shows up in the job log):
2018-06-07 11:04:40	Uploaded 1 File to: file:/tmp/dw/fd4d06b7-523f-41a0-a8ff-ee4b13995b7e/hive_2018-06-07_11-04-35_149_2329359575876874876-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (583 bytes)

// 4. A map-only job reads the large table and joins it against the small table cached in memory

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: e
            Statistics: Num rows: 1 Data size: 757 Basic stats: PARTIAL Column stats: PARTIAL
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 deptno (type: string)
                1 deptno (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col12, _col13, _col14, _col15
              Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (_col15 = '2018-06-04') (type: boolean)
                Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), '2018-06-04' (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
                  Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
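The map-join flow above (Stage-4 builds a hash table from the small table, Stage-3 streams the large table through map tasks and probes that table, with no shuffle at all) can be sketched in plain Python. The rows are made up for illustration:

```python
def build_hashtable(small_rows, key):
    """Stage-4: HashTable Sink Operator on the small side."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table

def map_join_left_outer(big_rows, table, key):
    """Stage-3: Map Join Operator, Left Outer Join 0 to 1."""
    for e in big_rows:
        matches = table.get(e[key])
        if matches:
            for d in matches:
                # prefix right-side columns so the two sides don't collide
                yield {**e, **{f"d_{k}": v for k, v in d.items()}}
        else:
            yield dict(e)  # unmatched left rows are kept (right side absent)

dept = [{"deptno": "10", "dname": "SALES", "dt": "2018-06-04"}]
emp  = [{"ename": "A", "deptno": "10"}, {"ename": "B", "deptno": "20"}]
rows = list(map_join_left_outer(emp, build_hashtable(dept, "deptno"), "deptno"))
print(len(rows))  # 2: one joined row, one left-only row
```

Because every map task holds the whole small table in memory, the join completes in a single map-only stage, which is why Hive's auto-converted map joins are usually much faster than shuffle joins when one side is small.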

-- After rewriting
select * from emp e
left join dept d on (e.deptno=d.deptno and  d.dt='2018-06-04');
Time taken: 23.804 seconds, Fetched: 17 row(s)

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d
          TableScan
            alias: d
            filterExpr: (dt = '2018-06-04') (type: boolean)
            Statistics: Num rows: 1 Data size: 84 Basic stats: PARTIAL Column stats: PARTIAL
            HashTable Sink Operator
              keys:
                0 deptno (type: string)
                1 deptno (type: string)
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: e
            Statistics: Num rows: 1 Data size: 757 Basic stats: PARTIAL Column stats: PARTIAL
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 deptno (type: string)
                1 deptno (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col12, _col13, _col14, _col15
              Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
                Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 832 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
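Why is the ON version faster here? In a LEFT JOIN, a right-table predicate in WHERE can only be evaluated after the join, while the same predicate in ON merely restricts which right-side rows are eligible to match, so dept is filtered before the hash table is built (visible as the filterExpr line in the rewritten Stage-4). Note the two forms are not always equivalent: WHERE drops left rows whose right side is NULL, whereas ON keeps them; on this particular data both happened to return 17 rows. A small Python sketch (made-up rows) of the semantic difference:

```python
def left_join(left, right, on):
    """Naive LEFT JOIN: unmatched left rows pair with None."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

emp  = [{"ename": "A", "deptno": "10"}, {"ename": "B", "deptno": "20"}]
dept = [{"deptno": "10", "dt": "2018-06-04"},
        {"deptno": "20", "dt": "2018-06-05"}]

# Filter in ON: B's dept row fails the dt test, so B pairs with None
# but is still returned.
on_version = left_join(
    emp, dept,
    lambda e, d: e["deptno"] == d["deptno"] and d["dt"] == "2018-06-04")

# Filter in WHERE: join first, then drop rows whose dept side fails,
# which also drops B entirely.
joined = left_join(emp, dept, lambda e, d: e["deptno"] == d["deptno"])
where_version = [(e, d) for e, d in joined
                 if d is not None and d["dt"] == "2018-06-04"]

print(len(on_version), len(where_version))  # 2 1
```

So the rewrite is a safe speed-up only when every left row is known to have a matching, filter-passing right row (or when inner-join semantics are actually intended); otherwise it changes the result.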


Sources:

https://tech.meituan.com/hive-sql-to-mapreduce.html (Meituan tech blog)

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain (Hive documentation)

秦老师

