Hive in Plain Language (Part 3)

Latest recommended article published on 2023-02-06 01:26:06

Jodie大白话

Category: Big Data. Tags: big data, hive

Article link: https://blog.csdn.net/qq_41847894/article/details/126674238

🧡EXPLAIN

🧡Fetch Task

🧡Local Mode

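💟Welcome to CS in Plain Language, where dry study becomes fun!

💟No partner (object)? Don't worry: we'll just `new` one and whisper sweet nothings to it every day!

💟A good memory is no match for a worn-out keyboard, and writing your own summary is no match for bookmarking someone else's!

💌Hive tuning is critically important, so I'll summarize it across several posts~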

🧡EXPLAIN

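💌Hive provides the EXPLAIN command to show a query's execution plan, so you can judge how expensive a query is likely to be before running it. The syntax is below; the keywords in square brackets are optional: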

EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

🍠Let's run EXPLAIN on the query used in the previous post:

Explain
STAGE DEPENDENCIES: //dependencies between stages
  Stage-1 is a root stage             //Stage-1 is the root stage
  Stage-0 depends on stages: Stage-1  //Stage-0 depends on Stage-1

STAGE PLANS:        //execution plan of each stage
  Stage: Stage-1    //Stage-1 runs first
    Map Reduce
      Map Operator Tree:  //operator tree of the Map phase
          TableScan   //scan (load) the table
            alias: business  //table alias
            Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE  //table statistics (row count, data size, etc.)
            Reduce Output Operator
              key expressions: name (type: string) //key column sent to the reducers (here, the window's partition column)
              sort order: +  //+ means ascending, - means descending, empty means no sorting
              Map-reduce partition columns: name (type: string) //partition by name
              Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE
              value expressions: id (type: string), cost (type: int) //names and types of the value columns
      Execution mode: vectorized
      Reduce Operator Tree:  //operator tree of the Reduce phase
        Select Operator
          expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: string), VALUE._col2 (type: int) //required columns and their types
          outputColumnNames: _col0, _col1, _col3 //output column names of this Select Operator
          Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string, _col3: int
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col1 ASC NULLS FIRST
                  partition by: _col1
                  raw input shape:
                  window functions:
                      window function definition
                        alias: sum_window_0
                        arguments: _col3
                        name: sum
                        window function: GenericUDAFSumLong
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
            Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), sum_window_0 (type: bigint)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE
              File Output Operator //file output operation
                compressed: false //whether the output is compressed
                Statistics: Num rows: 1 Data size: 2970 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat //input file format
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat //output file format
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe //serialization/deserialization class

  Stage: Stage-0    //Stage-0 runs next
    Fetch Operator  //the client fetches the result data
      limit: -1     //-1 means no row limit
      Processor Tree:
        ListSink

💌Different SQL statements produce different execution plans; see http://t.csdn.cn/anONF for details. The plan above is for a query that uses a window function.
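🍠The query itself is not repeated in this post. Judging only from the plan (a `business` table with `id`, `name` and `cost` columns, a `sum` window partitioned by `name` with an unbounded frame), it was presumably something like the sketch below. Treat the exact column list as an assumption reconstructed from the plan, not as the original statement:

```sql
-- Assumed reconstruction of the explained query: a running total is NOT used here,
-- the window covers the whole partition (ROWS PRECEDING(MAX)~FOLLOWING(MAX)).
EXPLAIN
SELECT id,
       name,
       sum(cost) OVER (PARTITION BY name) AS total_cost  -- total_cost is an illustrative alias
FROM business;

-- EXTENDED is one of the optional keywords from the syntax above;
-- it prints additional detail about the operators in the plan.
EXPLAIN EXTENDED
SELECT id,
       name,
       sum(cost) OVER (PARTITION BY name) AS total_cost
FROM business;
```

Reading the plan bottom-up from the dependencies is usually the easiest route: Stage-1 (the MapReduce job) must finish before Stage-0 (the fetch) can return rows to the client.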

🧡Fetch Task

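💌The hive.fetch.task.conversion property (its default is documented in hive-default.xml.template) lets simple queries be answered by a fetch task that reads the files directly, which avoids launching MR jobs. To override it, set it in hive-site.xml or with SET in the session. It accepts three values:

none : fetch conversion is disabled, so every query that reads data from HDFS runs as an MR job.

minimal : no MR for SELECT *, filters on partition columns, and LIMIT.

more (default): no MR for any SELECT column list, any filter, and LIMIT; on top of minimal it also supports TABLESAMPLE and virtual columns.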
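🍠A quick way to see the difference is to run the same simple query under two settings. This is only a sketch: it reuses the `business` table from the EXPLAIN example, and the filter value is made up for illustration:

```sql
-- With 'none', even a plain SELECT * is expected to launch an MR job.
SET hive.fetch.task.conversion=none;
SELECT * FROM business;

-- With 'more', a projection plus a filter on an ordinary column plus LIMIT
-- should be served by a fetch task, with no MR job.
SET hive.fetch.task.conversion=more;
SELECT name, cost FROM business WHERE cost > 10 LIMIT 5;

-- EXPLAIN makes the effect visible: when fetch conversion applies,
-- the plan should contain only a Fetch Operator stage and no Map Reduce stage.
EXPLAIN SELECT name, cost FROM business WHERE cost > 10 LIMIT 5;
```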

🧡Local Mode

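💌When the data volume is small and there are few input files, Hive can use local mode to run all tasks on a single machine, which reduces network transfer. The settings are: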

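-- enable automatic local-mode MR
set hive.exec.mode.local.auto=true;
-- maximum total input size for local MR; local mode is used when the input is smaller than this (default 134217728, i.e. 128 MB)
set hive.exec.mode.local.auto.inputbytes.max=50000000;
-- maximum number of input files for local MR; local mode is used when the file count is smaller than this (default 4)
set hive.exec.mode.local.auto.input.files.max=10;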
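
🍠As a rough usage sketch, local mode can be switched on for one small aggregation and then switched off again to compare run times. The `business` table and its `name` and `cost` columns come from the EXPLAIN example; the query is only illustrative. Note that, according to the Hive documentation, automatic local mode also requires the job to need at most one reduce task:

```sql
-- Let Hive decide automatically whether the job is small enough to run locally.
SET hive.exec.mode.local.auto=true;
SELECT name, sum(cost) AS total_cost   -- total_cost is an illustrative alias
FROM business
GROUP BY name;

-- Turn it off and rerun the same query to compare against a cluster-submitted job.
SET hive.exec.mode.local.auto=false;
SELECT name, sum(cost) AS total_cost
FROM business
GROUP BY name;
```

On a tiny table the local run is usually noticeably faster, since most of the time in the cluster run goes to job submission and scheduling.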