Hive Explain 执行计划参数分析

本文通过一个实际的Hive SQL查询例子,详细解释了执行计划的各个阶段,包括Stage之间的依赖关系、执行顺序、MapReduce转换过程以及分组聚合操作。示例SQL涉及多表JOIN、窗口函数、子查询和多维分组,展示了Hive如何优化查询并转换为MapReduce任务进行执行。
摘要由CSDN通过智能技术生成

执行计划code

explain
--your sql code
select
	*
from
	xxx.xxxx as x
left join
	yyy.yyyy as y
on
	x.a=y.b
group by 
	c,d
having 
	e
order by 
	f
limit 
	1000 

举例

拿一个实际生产中的复杂sql为例,涉及多表join,开窗函数,子查询,多维分组聚合grouping sets的执行计划为例,可以看出:

  1. stage之间的依赖关系
  2. 各个stage的执行计划
  3. sql语句的执行顺序,from(TableScan)→join on(Reduce Output Operator)→where(Filter Operator)→select(Select Operator)→group by(Group By Operator)→having→order by→limit
  4. hive sql 转换为map reduce的过程,及其中间的key-value值
Explain
STAGE DEPENDENCIES: # 各个stage之间的依赖性
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1, Stage-3, Stage-5
  Stage-3 is a root stage
  Stage-4 is a root stage
  Stage-5 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-2

STAGE PLANS: # 各个stage的执行计划
  Stage: Stage-1 # 阶段一执行计划
    Map Reduce # Map Reduce中的内容即为抽象语法树AST
      Map Operator Tree: # MAP端的执行计划树
          TableScan # map端第一个操作肯定是加载表,所以就是 TableScan 表扫描操作
            alias: t_ads_indicator_wxvideoaccountevaluate_detailfeature_d_i # 表名称
            Statistics: Num rows: 1486973 Data size: 32713406 Basic stats: COMPLETE Column stats: NONE # 表统计信息,包含表中数据条数,数据大小等
            Filter Operator # 过滤操作
              predicate: ((operator is not null and (operator <> '')) and (operator <> 'sys')) (type: boolean) # 过滤条件
              Statistics: Num rows: 743487 Data size: 16356714 Basic stats: COMPLETE Column stats: NONE
              Select Operator # 选取操作
                # expressions:需要的字段名称及字段类型
                expressions: day (type: bigint), hour (type: bigint), half_hour (type: bigint), operator (type: string), scene_id (type: bigint), scene_layer (type: bigint), action_type (type: bigint), action_result (type: bigint), evaluate_id (type: string)
                outputColumnNames: day, hour, half_hour, operator, scene_id, scene_layer, action_type, action_result, evaluate_id # 输出的列名称
                Statistics: Num rows: 743487 Data size: 16356714 Basic stats: COMPLETE Column stats: NONE
                Group By Operator # 分组聚合操作
                  # aggregations: 显示聚合函数信息
                  aggregations: count(DISTINCT CASE WHEN (((action_type = 3) and ((action_result = 0) or (action_result = 1)))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 4) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 3) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END), count(DISTINCT CASE WHEN (((action_type = 3) and (action_result = 1))) THEN (evaluate_id) ELSE (null) END)
                  # keys: 分组的字段,如果没有分组,则没有此字段
                  keys: day (type: bigint), hour (type: bigint), half_hour (type: bigint), operator (type: string), scene_id (type: bigint), scene_layer (type: bigint), '0' (type: string), CASE WHEN (((action_type = 3) and ((action_result = 0) or (action_result = 1)))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 4) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 3) and (action_result = 0))) THEN (evaluate_id) ELSE (null) END (type: string), CASE WHEN (((action_type = 3) and (action_result = 1))) THEN (evaluate_id) ELSE (null) END (type: string)
                  mode: hash # 聚合模式,值有 hash:随机聚合,就是hash partition;partial:局部聚合;final:最终聚合
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14 # 聚合之后输出列名
                  Statistics: Num rows: 2230461 Data size: 49070142 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator # 输出到reduce操作
                    # key expressions: 依据什么进行排序,即用于排序的列,即map阶段的输出字段(临时字段),也就是reduce的key
                    key expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值