Hive Optimization
The core idea of Hive optimization: treat Hive SQL as a MapReduce program and optimize it as such.
The following kinds of SQL are not converted to MapReduce for execution:
SELECT that reads only columns of the table itself
WHERE that filters only on columns of the table itself
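For example (a sketch; the table and columns match the student table used in the session below, and whether the filtered scan avoids MapReduce also depends on the fetch strategy discussed at the end of this section):

```sql
-- Served by a simple fetch task; no MapReduce job is launched:
SELECT id, name FROM student;
SELECT * FROM student WHERE id > 10;

-- These need aggregation or a shuffle, so they compile to MapReduce:
SELECT count(*) FROM student;
SELECT id, name FROM student ORDER BY id;
```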
EXPLAIN shows the execution plan: EXPLAIN [EXTENDED] query
hive> explain extended select * from student;
OK
Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: student
          Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: NONE
          GatherStats: false
          Select Operator
            expressions: id (type: int), name (type: string), likes (type: array<string>), address (type: map<string,string>)
            outputColumnNames: _col0, _col1, _col2, _col3
            Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: NONE
            ListSink
Time taken: 0.231 seconds, Fetched: 18 row(s)
========================================================
hive> explain extended select count(*) from student;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: student
            Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
            GatherStats: false
            Select Operator
              Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  null sort order:
                  sort order:
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  tag: -1
                  value expressions: _col0 (type: bigint)
                  auto parallelism: false
      Path -> Alias:
        hdfs://mycluster/user/hive/warehouse/student [student]
      Path -> Partition:
        hdfs://mycluster/user/hive/warehouse/student
          Partition
            base file name: student
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              bucket_count -1
              colelction.delim -
              column.name.delimiter ,
              columns id,name,likes,address
              columns.comments
              columns.types int:string:array<string>:map<string,string>
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location hdfs://mycluster/user/hive/warehouse/student
              mapkey.delim :
              name default.student
              numFiles 1
              numRows 0
              rawDataSize 0
              serialization.ddl struct student { i32 id, string name, list<string> likes, map<string,string> address}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 618
              transient_lastDdlTime 1624695643
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              bucket_count -1
              colelction.delim -
              column.name.delimiter ,
              columns id,name,likes,address
              columns.comments
              columns.types int:string:array<string>:map<string,string>
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location hdfs://mycluster/user/hive/warehouse/student
              mapkey.delim :
              name default.student
              numFiles 1
              numRows 0
              rawDataSize 0
              serialization.ddl struct student { i32 id, string name, list<string> likes, map<string,string> address}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 618
              transient_lastDdlTime 1624695643
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            name: default.student
          name: default.student
      Truncated Path -> Alias:
        /student [student]
      Needs Tagging: false
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            GlobalTableId: 0
            directory: hdfs://mycluster/tmp/hive/root/e7e5657c-f366-4391-a117-3838e2f530ba/hive_2021-07-03_09-26-39_045_671258483012844310-1/-mr-10001/.hive-staging_hive_2021-07-03_09-26-39_045_671258483012844310-1/-ext-10002
            NumFilesPerFileSink: 1
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            Stats Publishing Key Prefix: hdfs://mycluster/tmp/hive/root/e7e5657c-f366-4391-a117-3838e2f530ba/hive_2021-07-03_09-26-39_045_671258483012844310-1/-mr-10001/.hive-staging_hive_2021-07-03_09-26-39_045_671258483012844310-1/-ext-10002/
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  columns _col0
                  columns.types bigint
                  escape.delim \
                  hive.serialization.extend.additional.nesting.levels true
                  serialization.escape.crlf true
                  serialization.format 1
                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            TotalFiles: 1
            GatherStats: false
            MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Time taken: 0.183 seconds, Fetched: 121 row(s)
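Comparing the two plans above: the plain SELECT * needs only a single Fetch Operator stage (Stage-0), while count(*) adds a MapReduce stage (Stage-1) for the aggregation. A quick way to check the stage structure without the EXTENDED detail, sketched below (output omitted):

```sql
-- Plain EXPLAIN skips the partition/SerDe detail, so stages are easier to see.
EXPLAIN SELECT * FROM student;         -- plan shows only Stage-0 (Fetch Operator)
EXPLAIN SELECT count(*) FROM student;  -- plan shows Stage-1 (Map Reduce) plus Stage-0 (Fetch)
```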
Hive Fetch Strategy
For certain queries, Hive does not need to run a MapReduce computation at all.
set hive.fetch.task.conversion=none/more;
1. Hive's default fetch strategy is more. If the strategy is set to none, then every query, even the simplest SELECT, is converted to a MapReduce job.
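A minimal sketch of toggling the fetch strategy in a Hive session (the student table is the one from the examples above):

```sql
-- With fetch conversion disabled, even a bare SELECT launches a MapReduce job:
SET hive.fetch.task.conversion=none;
SELECT * FROM student;

-- Restore the default; simple scans are again served directly by a fetch task:
SET hive.fetch.task.conversion=more;
SELECT * FROM student;
```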