执行计划code
explain
--your sql code
select
*
from
xxx.xxxx as x
left join
yyy.yyyy as y
on
x.a=y.b
group by
c,d
having
e
order by
f
limit
1000
举例
拿一个实际生产中的复杂sql为例,涉及多表join,开窗函数,子查询,多维分组聚合grouping sets的执行计划为例,可以看出:
- stage之间的依赖关系
- 各个stage的执行计划
- sql语句的执行顺序,from(TableScan)→join on(Reduce Output Operator)→where(Filter Operator)→select(Select Operator)→group by(Group By Operator)→having→order by→limit
- hive sql 转换为map reduce的过程,及其中间的key-value值
Explain
STAGE DEPENDENCIES: # 各个stage之间的依赖性
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1, Stage-3, Stage-5
Stage-3 is a root stage
Stage-4 is a root stage
Stage-5 depends on stages: Stage-4
Stage-0 depends on stages: Stage-2
STAGE PLANS: # 各个stage的执行计划
Stage: Stage-1 # 阶段一执行计划
Map Reduce # Map Reduce中的内容即为抽象语法树AST
Map Operator Tree: # MAP端的执行计划树
TableScan # map端第一个操作肯定是加载表,所以就是 TableScan 表扫描操作
alias: t_ads_indicator_wxvideoaccountevaluate_detailfeature_d_i # 表名称
Statistics: Num rows: 1486973 Data size: 32713406 Basic stats: COMPLETE Column stats: NONE # 表统计信息,包含表中数据条数,数据大小等
Filter Operator # 过滤操作
predicate: ((operator is not null and (operator <> '')) and (operator <> 'sys')) (type: boolean) # 过滤条件
Statistics: Num rows: 743487 Data size: 16356714 Basic stats: COMPLETE Column stats: NONE
Select Operator # 选取操作
# expressions:需要的字段名称及字段类型
expressions: day (type: bigint), hour (type: bigint), half_hour (type: bigint), operator (type: string), scene_id (type: bigint), scene_layer (type: bigint), action_type (type: bigint), action_result (type: bigint), evaluate_id (type: string)
outputColumnNames: day, hour, half_hour, operator, scene_id, scene_layer, action_type, action_result, evaluate_id # 输出的列名称
Statistics: Num rows: 743487 Data size: 16356714 Basic stats: COMPLETE Column stats: NONE
Group By Operator # 分组聚合操作
# aggregations: 显示聚合函数信息
aggregations: count(DISTINCT CASE WHEN (((action_type = 3) and ((action_result = 0) or (action_result = 1)))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 4) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 3) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 3) and (action_result = 1))) THEN (evaluate_id) ELSE (null) END)
# keys: 分组的字段,如果没有分组,则没有此字段
keys: day (type: bigint), hour (type: bigint), half_hour (type: bigint), operator (type: string), scene_id (type: bigint), scene_layer (type: bigint), '0' (type: string), CASE WHEN (((action_type = 3) and ((action_result = 0) or (action_result = 1)))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 4) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 3) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 3) and (action_result = 1))) THEN (evaluate_id) ELSE (null) END (type: string)
mode: hash # 聚合模式,值有 hash:随机聚合,就是hash partition;partial:局部聚合;final:最终聚合
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14 # 聚合之后输出列名
Statistics: Num rows: 2230461 Data size: 49070142 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator # 输出到reduce操作
# key expressions: 依据什么进行排序,即用于排序的列,即map阶段的输出字段(临时字段),也就是reduce的key
key expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (

本文通过一个实际的Hive SQL查询例子,详细解释了执行计划的各个阶段,包括Stage之间的依赖关系、执行顺序、MapReduce转换过程以及分组聚合操作。示例SQL涉及多表JOIN、窗口函数、子查询和多维分组,展示了Hive如何优化查询并转换为MapReduce任务进行执行。
最低0.47元/天 解锁文章
403

被折叠的 条评论
为什么被折叠?



