Hive's overwrite

randee_luo

Tags: hive overwrite

Article link: https://blog.csdn.net/jxlhc09/article/details/17996369

A few days ago a friend asked me how Hive's overwrite actually performs the rewrite: if the overwrite fails halfway through, will data be lost?

I didn't have an answer right away, but on reflection, this can be read directly from the explain output.

hive (temp)> explain insert overwrite table ods.ods_memberext_dd select * from temp.lhc_memberext_20130926;
OK
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME temp lhc_memberext_20130926))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME ods ods_memberext_dd))) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        lhc_memberext_20130926 
          TableScan
            alias: lhc_memberext_20130926
            Select Operator
              expressions:
                    expr: id
                    type: int
                    expr: bloodtype
                    type: string
                    expr: regdate
                    type: string
                    expr: termtype
                    type: string
                    expr: channel
                    type: int
                    expr: ip
                    type: string
                    expr: clientid
                    type: string
                    expr: imei
                    type: string
                    expr: version
                    type: string
                    expr: platform
                    type: string
                    expr: model
                    type: string
                    expr: systemname
                    type: string
                    expr: systemversion
                    type: string
                    expr: channelid
                    type: int
                    expr: resolution
                    type: string
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
              File Output Operator
                compressed: false
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: ods.ods_memberext_dd

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: ods.ods_memberext_dd

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002 
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002 
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000
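
The STAGE DEPENDENCIES section above can also be read mechanically. The following is a minimal Python sketch, not part of Hive: it assumes the dependency lines are pasted in as text, and the helper names `parse_deps` and `ancestors` are made up for illustration. It lists every stage that can run before Stage-0, the Move Operator with replace: true.

```python
import re

# STAGE DEPENDENCIES lines copied from the explain output above.
PLAN = """\
Stage-1 is a root stage
Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
Stage-4
Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
Stage-2 depends on stages: Stage-0
Stage-3
Stage-5
Stage-6 depends on stages: Stage-5
"""

def parse_deps(text):
    """Map each stage to the stages it directly waits on (hypothetical helper)."""
    deps = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        stage = line.split()[0]
        deps.setdefault(stage, set())
        head, _, children = line.partition("consists of")
        if "depends on stages:" in head:
            deps[stage] |= set(re.findall(r"Stage-\d+", head.split(":", 1)[1]))
        # Branches of a conditional stage can only run after the conditional itself.
        for child in re.findall(r"Stage-\d+", children):
            deps.setdefault(child, set()).add(stage)
    return deps

def ancestors(deps, stage):
    """Every stage that can run before `stage`, directly or indirectly."""
    seen, todo = set(), [stage]
    while todo:
        for parent in deps[todo.pop()]:
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen

deps = parse_deps(PLAN)
before_move = ancestors(deps, "Stage-0")
print(sorted(before_move))
```

The MapReduce stage Stage-1 appears in the result and Stage-2 does not, so in this plan the table is replaced only after the query stage has finished. Stage-3, Stage-4 and Stage-5 are branches of the conditional Stage-7, so not all of them run in a given execution.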

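The plan answers the question. The intermediate paths in it (-ext-10002, -ext-10000) all sit under a scratch directory, and the table itself is only touched in Stage-0, the Move Operator with replace: true, which depends on the stages before it. So if the job fails during the MapReduce stages, the existing data in ods.ods_memberext_dd should still be in place; only a failure during the final move itself could leave the table incomplete.

As for why the intermediate data files are placed under /tmp/hive-hadoop: this location is set by the hive.exec.scratchdir parameter, which can be configured in hive-site.xml.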

hive.exec.scratchdir
An HDFS path used to store the execution plans of the different map/reduce stages and the intermediate output of those stages.

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
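
To check which scratch directory a given hive-site.xml points to, the property can be read with a few lines of Python. This is only a sketch: the `<configuration>` wrapper stands in for a real hive-site.xml, `read_property` is a hypothetical helper, and the substitution of `${user.name}` with "hadoop" is an assumption based on the /tmp/hive-hadoop path seen in the plan.

```python
import xml.etree.ElementTree as ET

# A stand-in hive-site.xml containing only the property shown above.
HIVE_SITE = """\
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
</configuration>
"""

def read_property(xml_text, name):
    """Return the raw <value> of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

raw = read_property(HIVE_SITE, "hive.exec.scratchdir")
# Assumption: ${user.name} expands to the user running Hive ("hadoop" here).
resolved = raw.replace("${user.name}", "hadoop")
print(raw)
print(resolved)
```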

As for the hive_2014-01-08_10-58-52_023_7835826938243226729/ directory, it is deleted automatically once the job has finished.
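
The overall pattern (build the result in a scratch directory, move it into place at the end, then delete the scratch directory) can be imitated on a local filesystem. The sketch below is an analogy under stated assumptions, not Hive's implementation: `overwrite_dir`, `failing_rows` and the file name `000000_0` are made up for illustration, and the swap in step 2 is not atomic.

```python
import os
import shutil
import tempfile

def overwrite_dir(target, rows, scratch_root):
    """Replace the contents of `target` with one data file built from `rows`."""
    scratch = tempfile.mkdtemp(prefix="hive_", dir=scratch_root)
    try:
        staged = os.path.join(scratch, "-ext-10000")
        os.mkdir(staged)
        # Step 1: produce the complete result away from the target.
        with open(os.path.join(staged, "000000_0"), "w") as out:
            for row in rows:
                out.write(row + "\n")
        # Step 2: only after step 1 succeeded, swap the result into place.
        if os.path.isdir(target):
            shutil.rmtree(target)
        shutil.move(staged, target)
    finally:
        # Step 3: remove the scratch directory whether or not the job succeeded.
        shutil.rmtree(scratch, ignore_errors=True)

def failing_rows():
    """Yield one row, then fail, to simulate a query dying halfway through."""
    yield "1\tA"
    raise RuntimeError("simulated failure halfway through the query")

root = tempfile.mkdtemp()
table = os.path.join(root, "ods_memberext_dd")
overwrite_dir(table, ["1\tA", "2\tB"], root)   # first load succeeds
try:
    overwrite_dir(table, failing_rows(), root)  # second load fails in step 1
except RuntimeError:
    pass
with open(os.path.join(table, "000000_0")) as f:
    survived = f.read()
print(repr(survived))
```

Because the failure happens before step 2, the data from the first load is still readable afterwards and no scratch directory is left behind.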