- Hive是通过把sql转换成对应mapreduce程序,然后提交到Hadoop上执行,查看具体的执行计划可以通过执行explain sql知晓
- 一条sql会被转化成由多个阶段组成的步骤,每个步骤有执行顺序和依赖关系,可以称之为有向无环图(DAG:Directed Acyclic Graph)
- 这些步骤可能包含:元数据的操作,文件系统的操作,map/reduce计算等
- 语法格式:
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION] query
- explain输出内容包括:
- 抽象语法树
- 执行计划不同阶段的依赖关系
- 各个阶段的描述
- extended输出更加详细的信息
- denpendency输出依赖的数据源
- authorization输出执行sql授权信息
- locks 输出锁情况
- vectorization相关
- Adds detail to the EXPLAIN output showing why Map and Reduce work is not vectorized.
- Syntax: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]
- ONLY option suppresses most non-vectorization elements.
- SUMMARY (default) shows vectorization information for the PLAN (is vectorization enabled) and a summary of Map and Reduce work.
- OPERATOR shows vectorization information for operators. E.g. Filter Vectorization. Includes all information of SUMMARY.
- EXPRESSION shows vectorization information for expressions. E.g. predicateExpression. Includes all information of SUMMARY and OPERATOR.
- DETAIL shows detail-level vectorization information. It includes all information of SUMMARY, OPERATOR, and EXPRESSION.
- 带上FORMATTED 关键子,可以json格式输出
- sort order: +表示升序 -表示降序
- 大概了解一下相关的执行情况
# explain默认
0: jdbc:hive2://> explain select * from sort_test sort by id desc limit 10;
+--------------------------------------------------------------------------------------------------+--+
| Explain |
+--------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-2 depends on stages: Stage-1 |
| Stage-0 depends on stages: Stage-2 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: sort_test |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: id (type: int), name (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: int) |
| sort order: - |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: string) |
| Reduce Operator Tree: |
| Select Operator |
| expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| Limit |
| Number of rows: 10 |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe |
| |
| Stage: Stage-2 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| Reduce Output Operator |
| key expressions: _col0 (type: int) |
| sort order: - |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: string) |
| Reduce Operator Tree: |
| Select Operator |
| expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| Limit |
| Number of rows: 10 |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: 10 |
| Processor Tree: |
| ListSink |
| |
+--------------------------------------------------------------------------------------------------+--+
# authorization
0: jdbc:hive2://> explain formatted authorization select * from sort_test sort by id desc limit 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| {"CURRENT_USER":"root","OPERATION":"SWITCHDATABASE","INPUTS":["badou@sort_test"],"OUTPUTS":["hdfs://master:9000/tmp/hive/root/fac1e10c-babb-4927-886e-411b3e9190fb/hive_2018-10-18_11-04-47_534_1155924552647075339-1/-mr-10000"]} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
参考资料
【0】Hive wiki - LanguageManual Explain
【1】hive入门学习:explain执行计划的理解