Hive Optimization (General)

This article covers Hive optimization techniques, including local mode, parallel execution, strict mode, join optimization, map-side aggregation, small-file merging, and deduplicated (distinct) counting. The core idea is to treat a Hive SQL query as a MapReduce program and improve query efficiency by tuning parameters and applying these techniques.

Hive Optimization

Core idea of Hive optimization: treat Hive SQL as a MapReduce program when optimizing it.

The following SQL will not be converted to MapReduce for execution (see the examples just below):

        A SELECT that only reads columns of the table itself

        A WHERE clause that only filters on columns of the table itself
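
For example (a minimal illustration using the student table from the EXPLAIN output below; with the default fetch configuration, queries like these run as a direct fetch and launch no MapReduce job):

hive> select id, name from student;
hive> select * from student where id > 100;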

EXPLAIN displays the execution plan: EXPLAIN [EXTENDED] query

hive> explain extended select * from student;
OK
Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: student
          Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: NONE
          GatherStats: false
          Select Operator
            expressions: id (type: int), name (type: string), likes (type: array<string>), address (type: map<string,string>)
            outputColumnNames: _col0, _col1, _col2, _col3
            Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.231 seconds, Fetched: 18 row(s)

========================================================

hive> explain extended select count(*) from student;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: student
            Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
            GatherStats: false
            Select Operator
              Statistics: Num rows: 1 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  null sort order: 
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  tag: -1
                  value expressions: _col0 (type: bigint)
                  auto parallelism: false
      Path -> Alias:
        hdfs://mycluster/user/hive/warehouse/student [student]
      Path -> Partition:
        hdfs://mycluster/user/hive/warehouse/student 
          Partition
            base file name: student
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              bucket_count -1
              colelction.delim -
              column.name.delimiter ,
              columns id,name,likes,address
              columns.comments 
              columns.types int:string:array<string>:map<string,string>
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location hdfs://mycluster/user/hive/warehouse/student
              mapkey.delim :
              name default.student
              numFiles 1
              numRows 0
              rawDataSize 0
              serialization.ddl struct student { i32 id, string name, list<string> likes, map<string,string> address}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 618
              transient_lastDdlTime 1624695643
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                bucket_count -1
                colelction.delim -
                column.name.delimiter ,
                columns id,name,likes,address
                columns.comments 
                columns.types int:string:array<string>:map<string,string>
                field.delim ,
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                location hdfs://mycluster/user/hive/warehouse/student
                mapkey.delim :
                name default.student
                numFiles 1
                numRows 0
                rawDataSize 0
                serialization.ddl struct student { i32 id, string name, list<string> likes, map<string,string> address}
                serialization.format ,
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                totalSize 618
                transient_lastDdlTime 1624695643
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.student
            name: default.student
      Truncated Path -> Alias:
        /student [student]
      Needs Tagging: false
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            GlobalTableId: 0
            directory: hdfs://mycluster/tmp/hive/root/e7e5657c-f366-4391-a117-3838e2f530ba/hive_2021-07-03_09-26-39_045_671258483012844310-1/-mr-10001/.hive-staging_hive_2021-07-03_09-26-39_045_671258483012844310-1/-ext-10002
            NumFilesPerFileSink: 1
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            Stats Publishing Key Prefix: hdfs://mycluster/tmp/hive/root/e7e5657c-f366-4391-a117-3838e2f530ba/hive_2021-07-03_09-26-39_045_671258483012844310-1/-mr-10001/.hive-staging_hive_2021-07-03_09-26-39_045_671258483012844310-1/-ext-10002/
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  columns _col0
                  columns.types bigint
                  escape.delim \
                  hive.serialization.extend.additional.nesting.levels true
                  serialization.escape.crlf true
                  serialization.format 1
                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            TotalFiles: 1
            GatherStats: false
            MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.183 seconds, Fetched: 121 row(s)

Hive Fetch Strategy

Hive can answer certain queries without running MapReduce at all: in the plans above, select * from student compiles to a single Fetch Operator, while count(*) requires a full MapReduce stage. This behavior is controlled by the fetch strategy:

set hive.fetch.task.conversion=none/more; 

1. Hive's default fetch strategy is more. If the fetch strategy is set to none, every query, even a simple SELECT, must go through MapReduce.
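
A quick way to observe the difference (a sketch against the same student table; the exact plan text varies by Hive version):

hive> set hive.fetch.task.conversion=none;
hive> explain select * from student;
-- the plan now contains a Map Reduce stage instead of a single Fetch Operator
hive> set hive.fetch.task.conversion=more;
hive> explain select * from student;
-- back to the plain Fetch Operator shown earlier, with no MapReduce job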