Hive EXPLAIN in Detail

huimingBall

Category: Big Data. Tags: Hive, explain, optimization

Article link: https://blog.csdn.net/fover717/article/details/69213095

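Prefix a query with the explain keyword to see the execution plan Hive builds for it, which is a good way to learn how Hive works. Two examples follow, each with its plan and the log from actually running the query.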

SQL:

select count(1) from dw.fact_ord_arranged where dt = '20160101'

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree: --------------- Map phase
          TableScan
            alias: fact_ord_arranged --------------- table being scanned
            Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
              Group By Operator
                aggregations: count(1) --------------- aggregate function
                mode: hash
                outputColumnNames: _col0 --------------- temporary column
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree: --------------- Reduce phase
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: bigint)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat --------------- output file format
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1                 --------------- the query has no LIMIT clause, so no row limit is applied
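
The plan above has two Group By operators: one on the map side with mode hash and one on the reduce side with mode mergepartial. A minimal sketch of that two-step count, written in Python purely as a toy model, may make the division of work easier to see. The function names and the split sizes are made up for illustration; none of this is Hive code.

```python
# Toy model of the two Group By operators in the plan above.
# The split sizes are invented illustration data, not values from the real table.

def map_side_count(rows):
    """Map-side Group By (mode: hash): one partial count per map task."""
    partial = 0
    for _ in rows:
        partial += 1          # count(1) adds 1 for every input row
    return partial            # corresponds to the temporary column _col0

def reduce_side_merge(partials):
    """Reduce-side Group By (mode: mergepartial): add up the partial counts."""
    return sum(partials)

# Three hypothetical input splits, one per map task.
splits = [range(4), range(7), range(2)]
partials = [map_side_count(split) for split in splits]
total = reduce_side_merge(partials)
print(partials, total)
```

Because each map task sends only a single partial count to the reducer, the shuffle stays tiny no matter how large the table is, which fits the "Num rows: 1 Data size: 8" statistics on the Reduce Output Operator.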

Query execution log:

CliDriver update main thread name to da9ea076-e1ce-4384-bdb2-e62af5482003
17/04/04 21:33:11 INFO CliDriver: CliDriver update main thread name to da9ea076-e1ce-4384-bdb2-e62af5482003

Logging initialized using configuration in file:/opt/my/versions/hive_components/all_conf/querier_cli_0.13_write/conf/hive-log4j.properties
OK
Time taken: 0.459 seconds
OK
Time taken: 0.01 seconds
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1489485669600_6760508, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760508/
Kill Command = /opt/my/hadoop/bin/hadoop job  -kill job_1489485669600_6760508
Hadoop job information for Stage-1: number of mappers: 18; number of reducers: 1
2017-04-04 21:33:32,207 Stage-1 map = 0%,  reduce = 0%
2017-04-04 21:33:40,441 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 5.28 sec
2017-04-04 21:33:41,470 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 26.97 sec
2017-04-04 21:33:42,498 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 43.8 sec
2017-04-04 21:33:43,526 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 49.44 sec
2017-04-04 21:33:44,556 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 55.82 sec
2017-04-04 21:33:45,584 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 62.08 sec
2017-04-04 21:33:46,611 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 65.99 sec
2017-04-04 21:33:47,639 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 70.29 sec
2017-04-04 21:33:55,852 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 72.34 sec
MapReduce Total cumulative CPU time: 1 minutes 12 seconds 340 msec
Ended Job = job_1489485669600_6760508
Copying data to local directory /opt/my/data/talos/raw_data/hive_3e456a52193b11e79f0ba4dcbe04f8c6
Copying data to local directory /opt/my/data/talos/raw_data/hive_3e456a52193b11e79f0ba4dcbe04f8c6
MapReduce Jobs Launched: 
Job 0: Map: 18  Reduce: 1   Cumulative CPU: 72.34 sec   HDFS Read: 1381720966 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 12 seconds 340 msec
OK
Time taken: 42.318 seconds
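
The "MapReduce Jobs Launched" summary near the end of the log is the quickest place to read off mappers, reducers, CPU time and HDFS I/O. If you want to collect these numbers across many runs, a small parser along the following lines could work. This is only a sketch: the regular expression is written against the line format shown in this log, and other Hive versions may print it differently.

```python
import re

# Pattern for one line of the "MapReduce Jobs Launched" summary,
# based only on the format seen in the log above.
JOB_LINE = re.compile(
    r"Job (?P<job>\d+): Map: (?P<maps>\d+)\s+Reduce: (?P<reduces>\d+)\s+"
    r"Cumulative CPU: (?P<cpu_sec>[\d.]+) sec\s+"
    r"HDFS Read: (?P<hdfs_read>\d+) HDFS Write: (?P<hdfs_write>\d+) (?P<status>\w+)"
)

def parse_job_line(line):
    """Return the fields of a job summary line as a dict, or None if no match."""
    match = JOB_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "job": int(fields["job"]),
        "maps": int(fields["maps"]),
        "reduces": int(fields["reduces"]),
        "cpu_sec": float(fields["cpu_sec"]),
        "hdfs_read": int(fields["hdfs_read"]),
        "hdfs_write": int(fields["hdfs_write"]),
        "status": fields["status"],
    }

# The summary line copied from the log above.
line = ("Job 0: Map: 18  Reduce: 1   Cumulative CPU: 72.34 sec   "
        "HDFS Read: 1381720966 HDFS Write: 8 SUCCESS")
job = parse_job_line(line)
print(job)
# Average bytes read per map task, a rough indicator of split size.
print(job["hdfs_read"] / job["maps"])
```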

SQL:

select dt, count(1) as num from dw.fact_ord_arranged where dt = '20160101' group by dt limit 10

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: fact_ord_arranged
            Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
            Select Operator
              expressions: dt (type: string)
              outputColumnNames: dt
              Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
              Group By Operator
                aggregations: count(1)
                keys: dt (type: string)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
                  value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: bigint)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
            Limit
              Number of rows: 10
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
              File Output Operator
                compressed: false
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
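
Compared with the first plan, this one adds a group key, a partition column on the Reduce Output Operator, and a Limit operator on the reduce side. The toy Python model below sketches how those pieces could fit together. Everything in it is illustrative: the split contents are invented, crc32 is only a placeholder for whatever hash Hive really uses, and the final "global limit" pass is my own reading of why a query like this can need one more step when there is more than one reducer. The plan above lists only Stage-1, so treat that last part as an assumption.

```python
from collections import Counter, defaultdict
import zlib

NUM_REDUCERS = 2   # illustrative; more than one reducer is what makes the last step necessary

def map_side_group_count(rows):
    """Map-side Group By (mode: hash): partial count per dt value."""
    return Counter(rows)

def partition(key, num_reducers):
    """Stand-in for 'Map-reduce partition columns: _col0'.
    crc32 is only a deterministic placeholder, not Hive's real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reduce_side_merge(pairs, limit):
    """Reduce-side Group By (mode: mergepartial) followed by Limit."""
    merged = defaultdict(int)
    for key, partial in pairs:
        merged[key] += partial
    return sorted(merged.items())[:limit]   # 'sort order: +' on the key

# Hypothetical splits; only rows with dt = '20160101' pass the filter.
splits = [["20160101"] * 3, ["20160101"] * 5]

# Shuffle: route each (key, partial count) pair to a reducer by key.
shuffled = defaultdict(list)
for split in splits:
    for key, partial in map_side_group_count(split).items():
        shuffled[partition(key, NUM_REDUCERS)].append((key, partial))

# Every reducer applies the limit to its own keys only.
per_reducer = [reduce_side_merge(pairs, limit=10) for pairs in shuffled.values()]
# A final single pass enforces the limit across all reducers' outputs.
result = sorted(row for rows in per_reducer for row in rows)[:10]
print(result)
```

All partial counts for the same dt value land on the same reducer, so each key is merged exactly once. With several reducers, each one can emit up to 10 rows, which is why a query-wide limit needs one more pass over their combined output.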

Query execution log:

CliDriver update main thread name to 81e0f131-a86c-4424-97b2-8e63264d5964
17/04/04 21:33:47 INFO CliDriver: CliDriver update main thread name to 81e0f131-a86c-4424-97b2-8e63264d5964

Logging initialized using configuration in file:/opt/my/versions/hive_components/all_conf/querier_cli_0.13_write/conf/hive-log4j.properties
OK
Time taken: 1.031 seconds
OK
Time taken: 0.014 seconds
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1489485669600_6760579, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760579/
Kill Command = /opt/my/hadoop/bin/hadoop job  -kill job_1489485669600_6760579
Hadoop job information for Stage-1: number of mappers: 18; number of reducers: 2
2017-04-04 21:34:07,571 Stage-1 map = 0%,  reduce = 0%
2017-04-04 21:34:14,962 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 1.83 sec
2017-04-04 21:34:16,012 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 12.85 sec
2017-04-04 21:34:17,063 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 47.82 sec
2017-04-04 21:34:19,162 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 57.43 sec
2017-04-04 21:34:20,213 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 61.19 sec
2017-04-04 21:34:25,463 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 66.16 sec
2017-04-04 21:34:26,513 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 73.7 sec
2017-04-04 21:34:35,946 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 77.91 sec
MapReduce Total cumulative CPU time: 1 minutes 17 seconds 910 msec
Ended Job = job_1489485669600_6760579
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1489485669600_6760647, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760647/
Kill Command = /opt/my/hadoop/bin/hadoop job  -kill job_1489485669600_6760647
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-04-04 21:34:46,186 Stage-2 map = 0%,  reduce = 0%
2017-04-04 21:34:56,611 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
2017-04-04 21:35:05,949 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.82 sec
MapReduce Total cumulative CPU time: 3 seconds 820 msec
Ended Job = job_1489485669600_6760647
Copying data to local directory /opt/my/data/talos/raw_data/hive_5344c542193b11e783fca4dcbe04f8c6
Copying data to local directory /opt/my/data/talos/raw_data/hive_5344c542193b11e783fca4dcbe04f8c6
MapReduce Jobs Launched: 
Job 0: Map: 18  Reduce: 2   Cumulative CPU: 77.91 sec   HDFS Read: 1381720966 HDFS Write: 222 SUCCESS
Job 1: Map: 1  Reduce: 1   Cumulative CPU: 3.82 sec   HDFS Read: 915 HDFS Write: 17 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 21 seconds 730 msec
OK
Time taken: 71.645 seconds
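
The log line "Number of reduce tasks not specified. Estimated from input data size: 2" can be reproduced with simple arithmetic. The sketch below is a simplified model of that estimate, not Hive's actual code. The input size is the "Data size" value from the TableScan statistics in the plan. The two settings are assumptions on my part: I believe 1,000,000,000 bytes per reducer and a maximum of 999 reducers were the defaults in older Hive releases, but the values configured on this cluster are not shown anywhere in the log.

```python
import math

def estimate_reducers(input_bytes, bytes_per_reducer, max_reducers):
    """ceil(input size / bytes per reducer), clamped to the range [1, max_reducers]."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))

INPUT_BYTES = 1379094784            # "Data size" from the TableScan statistics
BYTES_PER_REDUCER = 1_000_000_000   # assumed value of hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                  # assumed value of hive.exec.reducers.max

print(estimate_reducers(INPUT_BYTES, BYTES_PER_REDUCER, MAX_REDUCERS))  # prints 2
```

Under these assumptions the result is 2, which agrees with the log. Lowering hive.exec.reducers.bytes.per.reducer raises the estimate, and setting mapreduce.job.reduces to a fixed number bypasses the estimate altogether, as the hints printed in the log suggest.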