You can use EXPLAIN to view the execution plan of a query.
For example:
explain select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);
1 First, check how many stages the plan has
This example has 2:
+------------------------------------+
|Explain |
+------------------------------------+
|STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1|
+------------------------------------+
Stage-0 depends on Stage-1, which means Stage-1 runs first and Stage-0 runs after it.
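For plans with more stages, reading the order by eye gets harder. The following is a minimal sketch, not part of Hive, that derives an execution order from the STAGE DEPENDENCIES lines. The function name `stage_order` is hypothetical, and the parser only handles the two simple line forms shown above ("is a root stage" and "depends on stages: ..."); other forms that real plans may contain are ignored or unsupported.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stage_order(plan_lines):
    """Return stage names in an order that respects 'depends on stages'."""
    deps = {}  # stage name -> set of stages it depends on
    for line in plan_lines:
        # Drop the table borders that the console output adds around each row.
        text = line.strip().strip("|").strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", text)
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", text)
        if root:
            deps.setdefault(root.group(1), set())
        elif dep:
            parents = {p.strip() for p in dep.group(2).split(",")}
            deps.setdefault(dep.group(1), set()).update(parents)
    # static_order() yields every stage after the stages it depends on.
    return list(TopologicalSorter(deps).static_order())


plan = [
    "|STAGE DEPENDENCIES: |",
    "| Stage-1 is a root stage |",
    "| Stage-0 depends on stages: Stage-1|",
]
print(stage_order(plan))  # ['Stage-1', 'Stage-0']
```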
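2 Look at the map phase of Stage-1
The map phase mainly does the following:
- scans the table (TableScan)
- reports statistics on the table's data volume (the Statistics lines: row count and data size)
- selects the columns to retrieve (the expressions part)
- aggregations (the Group By Operator, here sum over the grouping keys)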
+-------------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------------+
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: tb_emp |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: deptno (type: int), year(hiredate) (type: int), sal (type: float) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: sum(_col2) |
| keys: _col0 (type: int), _col1 (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: int), _col1 (type: int) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: int), _col1 (type: int) |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
| value expressions: _col2 (type: double) |
+-------------------------------------------------------------------------------------------------+
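3 Look at the reduce phase
- determines the input and output formats (and the SerDe) of the result written by the File Output Operator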
+-------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------+
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0) |
| keys: KEY._col0 (type: int), KEY._col1 (type: int) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE|
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+-------------------------------------------------------------------------------------------+
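The two Group By Operators in the plan (mode: hash on the map side, mode: mergepartial on the reduce side) describe a two-step aggregation. The sketch below only illustrates that idea in plain Python and is not how Hive is implemented. The function names and the sample rows are made up for illustration, and the two lists stand in for the input of two hypothetical map tasks.

```python
from collections import defaultdict


def map_side_hash_aggregate(rows):
    """Mimic 'mode: hash': partial sum(sal) per (deptno, year) inside one map task."""
    partial = defaultdict(float)
    for deptno, hire_year, sal in rows:
        partial[(deptno, hire_year)] += sal
    return dict(partial)


def reduce_side_merge(partials):
    """Mimic 'mode: mergepartial': merge the partial sums that share a key."""
    merged = defaultdict(float)
    for partial in partials:
        for key, subtotal in partial.items():
            merged[key] += subtotal
    # Keys are sorted ascending, in the spirit of 'sort order: ++'.
    return dict(sorted(merged.items()))


# Hypothetical rows of (deptno, year(hiredate), sal), split across two map tasks.
split_a = [(10, 2020, 1000.0), (10, 2020, 500.0), (20, 2021, 800.0)]
split_b = [(10, 2020, 250.0), (20, 2021, 200.0), (30, 2021, 900.0)]

result = reduce_side_merge(
    [map_side_hash_aggregate(split_a), map_side_hash_aggregate(split_b)]
)
for (deptno, hire_year), total in result.items():
    print(deptno, hire_year, total)
```

Each map task sends one partial sum per key instead of every row, which is why the value passed to the reducer in the plan is the already-aggregated _col2.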
References
Hive Experiment 5: Viewing HQL Execution Plans and Key Steps Explained (heroicpoem's column, CSDN blog)
LanguageManual Explain - Apache Hive - Apache Software Foundation