1. Hive syntax for displaying an execution plan
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] hql
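The trailing hql is the Hive statement you want to run; the keywords inside [] are optional, and at most one of them is used at a time.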
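To make the optional keywords more concrete, here is a small sketch of a few variants. It reuses the table and columns from the example in section 2, but the query itself is simplified for illustration. Which keywords are accepted depends on the Hive version (some, such as CBO and LOCKS, only exist in newer releases), so treat this as something to try rather than a guaranteed result.

```sql
-- Default form: stage dependencies plus the operator tree of each stage.
EXPLAIN
SELECT ugc_id, count(DISTINCT topic_id) AS topic_num
FROM data_warehouse.dwd_publish_ugc
WHERE stat_date = '2021-07-01' AND stat_hour = '12'
GROUP BY ugc_id;

-- EXTENDED: same plan with extra detail (for example the
-- isSamplingPred and GatherStats fields that appear in section 2).
EXPLAIN EXTENDED
SELECT ugc_id, count(DISTINCT topic_id) AS topic_num
FROM data_warehouse.dwd_publish_ugc
WHERE stat_date = '2021-07-01' AND stat_hour = '12'
GROUP BY ugc_id;

-- DEPENDENCY: reports the tables and partitions the query reads.
EXPLAIN DEPENDENCY
SELECT ugc_id
FROM data_warehouse.dwd_publish_ugc
WHERE stat_date = '2021-07-01' AND stat_hour = '12';

-- AUTHORIZATION: reports the inputs, outputs, and current user
-- that would be checked for authorization.
EXPLAIN AUTHORIZATION
SELECT ugc_id
FROM data_warehouse.dwd_publish_ugc
WHERE stat_date = '2021-07-01' AND stat_hour = '12';
```

Note that, unlike the other forms, EXPLAIN ANALYZE actually runs the query so that it can annotate the plan with real row counts, so it is not free on a large table.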
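2. Execution plan example
Here is the execution plan for a very simple HQL statement. I used the EXTENDED clause, which gives a fairly detailed output. Without EXTENDED, the output still shows the MR process and the dependencies between stages, which is already enough for troubleshooting basic syntax errors. (Disclaimer: I'm still a beginner and didn't fully understand the output; I'm just recording the process for now 0_0)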
> explain extended select ugc_id, count(distinct if(topic_id is null,null,topic_id)) as topic_num from data_warehouse.dwd_publish_ugc where ugc_id is not null and stat_date = '2021-07-01' and stat_hour = '12' group by ugc_id;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 1)
      DagName: bigdata_20210805151741_94407807-3b69-4851-b303-b1eb07ec537e:7
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: dwd_publish_ugc
                  Statistics: Num rows: 3 Data size: 753 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: ugc_id is not null (type: boolean)
                    Statistics: Num rows: 3 Data size: 753 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(DISTINCT if(topic_id is null, null, topic_id))
                      keys: ugc_id (type: string), if(topic_id is null, null, topic_id) (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 3 Data size: 753 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type