Hive Execution Order

嘉平11

Category: Hive

Article link: https://blog.csdn.net/zgm12/article/details/104584005

Let's look at the order in which the clauses of an HQL query are executed:

from … on … join … where … group by … having … select … distinct … order by … limit
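As a rough illustration, the same order can be annotated on a concrete query. This is only a sketch: it reuses the sc2 table from this post, the column types (sid INT, score INT) are assumptions, and the numbers describe the logical order listed above rather than anything Hive prints.

```sql
-- Assumed for illustration: sc2(sid INT, cid INT, score INT)
SELECT sid, min(score) AS ms   -- 5. SELECT: build the output columns
FROM sc2                       -- 1. FROM: read the table
WHERE sid > 10                 -- 2. WHERE: filter individual rows
GROUP BY sid                   -- 3. GROUP BY: form groups and aggregate
HAVING min(score) > 60         -- 4. HAVING: filter whole groups
ORDER BY ms                    -- 6. ORDER BY: sort the result
LIMIT 5;                       -- 7. LIMIT: keep the first rows
```

ORDER BY can use the alias ms without any surprise, because it runs after SELECT in this ordering. HAVING comes before SELECT, which is what makes an alias there look odd.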

Something puzzled me earlier:

explain
select sid, min(score) as ms
from sc2
where sid > 10
group by sid
having ms > 60 and sid > 20
order by ms;

The statement above compiles. If HAVING is evaluated before SELECT, why can HAVING use an alias that is defined in SELECT?

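Later I came across this statement: HiveSQL relies on metadata stored in MySQL, and HAVING can use aliases defined in SELECT. The second half matches what I observed, but the metastore backend is probably not the reason. The alias seems to be resolved when the query is compiled: in the plan below, the HAVING condition on cc already appears as (_col1 > 2), a filter on the aggregate column itself.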
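For engines or versions that reject a SELECT alias inside HAVING, the query can be written without relying on it. The two forms below are a sketch of the usual workarounds, using the same sc2 table and columns as the query above; the subquery alias t is a name chosen here for illustration.

```sql
-- Form 1: repeat the aggregate expression in HAVING instead of the alias
SELECT sid, min(score) AS ms
FROM sc2
WHERE sid > 10
GROUP BY sid
HAVING min(score) > 60 AND sid > 20
ORDER BY ms;

-- Form 2: aggregate in a subquery, then filter on the alias in the outer WHERE
SELECT sid, ms
FROM (
  SELECT sid, min(score) AS ms
  FROM sc2
  WHERE sid > 10
  GROUP BY sid
) t
WHERE ms > 60 AND sid > 20
ORDER BY ms;
```

Both forms should return the same rows as the aliased version. Form 2 makes the ordering explicit: the alias exists only after the inner SELECT has run.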

Execution plan of a similar query:

hive (myhive2)>  explain select 100,count(*) as cc from sc2 where sid>10 group by cid having cc>2;
The plan is as follows:


OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:  
          TableScan                                  // this is FROM
            alias: sc2
            Statistics: Num rows: 23 Data size: 190 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (sid > 10) (type: boolean)     // this is WHERE
              Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
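              Select Operator   // I first took this for the final SELECT, but it is not; it appears to be column pruning that keeps only cid, the one column the later operators need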
                expressions: cid (type: int)
                outputColumnNames: cid
                Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count()
                  keys: cid (type: int)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: int)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: int)
                    Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Group By Operator                                  // this is GROUP BY (merges the map-side partial counts)
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (_col1 > 2) (type: boolean)         // this is HAVING
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              Select Operator                                 // this is the final SELECT
              expressions: 100 (type: int), _col1 (type: bigint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
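
To make the operator order in this plan easier to follow, the same query can be unrolled into nested subqueries, one level per stage. This is only a sketch of an equivalent query, not something Hive generates; the subquery aliases pruned and grouped are names made up here.

```sql
SELECT 100, cc                 -- final Select Operator (SELECT)
FROM (
  SELECT cid, count(*) AS cc   -- Group By Operator (GROUP BY + aggregate)
  FROM (
    SELECT cid                 -- first Select Operator (keeps only cid)
    FROM sc2                   -- TableScan (FROM)
    WHERE sid > 10             -- first Filter Operator (WHERE)
  ) pruned
  GROUP BY cid
) grouped
WHERE cc > 2;                  -- second Filter Operator (HAVING)
```

Read from the innermost level outward, this follows the plan: FROM, WHERE, GROUP BY, HAVING, SELECT. The HAVING condition becomes an ordinary filter on the aggregated column, which is why the plan shows it as (_col1 > 2) rather than as the alias cc.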
