Exploring the execution order of HQL
from … on … join … where … group by … having … select … distinct … order by … limit
Something puzzled me earlier:
explain
select sid,min(score) as ms
from sc2
where sid>10
group by sid
having ms>60 and sid>20
order by ms
;
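The statement above compiles. Why can an alias defined in the SELECT list be used in HAVING, when HAVING is logically evaluated before SELECT?
Later I came across this statement: "HiveSQL is based on metadata stored in MySQL, so aliases specified in SELECT can be used after HAVING." The reference to MySQL metadata does not obviously explain the behavior, but the alias does appear to be resolved when the query is compiled: in the plan below, the HAVING filter is applied to _col1, the output of count(*), and no column named cc appears anywhere in the plan.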
Execution plan of a similar query:
hive (myhive2)> explain select 100,count(*) as cc from sc2 where sid>10 group by cid having cc>2;
The output is:
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan //this is FROM
alias: sc2
Statistics: Num rows: 23 Data size: 190 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (sid > 10) (type: boolean) //this is WHERE
Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
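Select Operator //I used to think this was the final SELECT, but it is not. It looks like a projection that keeps only the columns needed downstream (here cid, the GROUP BY key)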
expressions: cid (type: int)
outputColumnNames: cid
Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: cid (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 7 Data size: 57 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reduce Operator Tree:
Group By Operator //this is GROUP BY (mode mergepartial: merges the partial counts from the map side)
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col1 > 2) (type: boolean) //this is HAVING
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Select Operator //this is the final SELECT
expressions: 100 (type: int), _col1 (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
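To make the order of operators in the plan above easier to follow, here is a minimal Python sketch that mimics the same pipeline for `select 100,count(*) as cc from sc2 where sid>10 group by cid having cc>2`. It is only an illustration, not how Hive is implemented: the sample rows in `SC2`, the function names `map_side`, `reduce_side` and `run_query`, and the way rows are split across "mappers" are all assumptions made up for this example, and the shuffle is simplified to a single reducer.

```python
from collections import Counter

# Hypothetical sample rows for sc2 as (sid, cid, score); not data from the real table.
SC2 = [
    (5, 1, 70), (11, 1, 80), (12, 1, 65), (13, 1, 90),
    (14, 2, 55), (15, 2, 75),
    (21, 3, 88), (22, 3, 92), (23, 3, 61),
]


def map_side(rows):
    """TableScan -> Filter (WHERE) -> Select (projection) -> Group By (mode: hash)."""
    filtered = (r for r in rows if r[0] > 10)          # Filter Operator: sid > 10
    pruned = (cid for _sid, cid, _score in filtered)   # Select Operator: keep only cid
    return Counter(pruned)                             # partial count(*) per cid


def reduce_side(partials):
    """Group By (mode: mergepartial) -> Filter (HAVING) -> Select (final projection)."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)                         # add up the partial counts per cid
    # HAVING cc > 2 is applied to the aggregate output (_col1 in the plan).
    kept = [(cid, cc) for cid, cc in sorted(merged.items()) if cc > 2]
    # Final SELECT: the literal 100 and the count; cid itself is not returned.
    return [(100, cc) for _cid, cc in kept]


def run_query(rows, n_mappers=2):
    """Split the rows across hypothetical mappers, then merge in one reducer."""
    splits = [rows[i::n_mappers] for i in range(n_mappers)]
    return reduce_side(map_side(split) for split in splits)


if __name__ == "__main__":
    print(run_query(SC2))  # prints [(100, 3), (100, 3)]
```

In this sketch the HAVING condition never sees a column called cc. It is evaluated on the merged count, which matches the `_col1 > 2` predicate in the Reduce Operator Tree, and the final projection only happens afterwards.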