Hive's overwrite

A friend recently asked me how Hive's overwrite actually performs the rewrite. If the overwrite fails halfway through, will data be lost?

I didn't have an answer right away, but on reflection, the answer can be read straight from the explain output.

hive (temp)> explain insert overwrite table ods.ods_memberext_dd select * from temp.lhc_memberext_20130926;
OK
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME temp lhc_memberext_20130926))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME ods ods_memberext_dd))) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        lhc_memberext_20130926 
          TableScan
            alias: lhc_memberext_20130926
            Select Operator
              expressions:
                    expr: id
                    type: int
                    expr: bloodtype
                    type: string
                    expr: regdate
                    type: string
                    expr: termtype
                    type: string
                    expr: channel
                    type: int
                    expr: ip
                    type: string
                    expr: clientid
                    type: string
                    expr: imei
                    type: string
                    expr: version
                    type: string
                    expr: platform
                    type: string
                    expr: model
                    type: string
                    expr: systemname
                    type: string
                    expr: systemversion
                    type: string
                    expr: channelid
                    type: int
                    expr: resolution
                    type: string
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
              File Output Operator
                compressed: false
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: ods.ods_memberext_dd

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: ods.ods_memberext_dd

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002 
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002 
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000
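
Reading the plan from top to bottom: Stage-1 is the MapReduce job. Its File Output Operator writes the query result into the -ext-10002 directory under the scratch directory, not into the table. Stage-7 is a conditional stage that picks one branch at run time: Stage-4, Stage-3, or Stage-5 followed by Stage-6. Stage-3 and Stage-5 are extra MapReduce jobs that read -ext-10002 again, which is the optional merge of small output files. In every branch the result ends up in -ext-10000. Only Stage-0, which depends on Stage-4, Stage-3 and Stage-6, touches ods.ods_memberext_dd. Its Move Operator has replace: true, so this is the step where the old table data is replaced by the new files. Stage-2 then only aggregates table statistics. So if the job fails partway through the MapReduce stages, the existing data in ods.ods_memberext_dd is left untouched.
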
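The plan above follows a "write to staging first, replace the table last" pattern. The Python sketch below imitates that pattern on the local filesystem. The directory layout, file names, and the functions insert_overwrite and read_table are hypothetical stand-ins, not Hive's code. The fail_after parameter simulates a job that dies partway through writing its output.

```python
import os
import shutil
import tempfile


def insert_overwrite(table_dir, scratch_dir, rows, fail_after=None):
    """Hypothetical sketch: write output to scratch first, replace the table last.

    table_dir / scratch_dir are local stand-ins for the table's HDFS location
    and the per-query scratch directory. fail_after simulates a crash after
    that many rows have been written.
    """
    # Stands in for the -ext-10002 / -ext-10000 staging directories.
    staging = os.path.join(scratch_dir, "staging")
    os.makedirs(staging, exist_ok=True)

    # Like Stage-1: the job writes its output only under the scratch directory.
    with open(os.path.join(staging, "000000_0"), "w") as f:
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("job failed mid-write")
            f.write(row + "\n")

    # Like Stage-0 (replace: true): only after the job succeeded are the old
    # table files removed and the new files moved into the table directory.
    for name in os.listdir(table_dir):
        os.remove(os.path.join(table_dir, name))
    for name in os.listdir(staging):
        shutil.move(os.path.join(staging, name), os.path.join(table_dir, name))


def read_table(table_dir):
    """Return all lines of all files in the table directory, in file-name order."""
    lines = []
    for name in sorted(os.listdir(table_dir)):
        with open(os.path.join(table_dir, name)) as f:
            lines.extend(f.read().splitlines())
    return lines


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    table = os.path.join(root, "ods_memberext_dd")
    os.makedirs(table)
    with open(os.path.join(table, "old_0"), "w") as f:
        f.write("old-row\n")

    try:
        insert_overwrite(table, os.path.join(root, "scratch1"), ["a", "b", "c"], fail_after=2)
    except RuntimeError:
        pass
    print(read_table(table))  # ['old-row']: the failed job left the table alone

    insert_overwrite(table, os.path.join(root, "scratch2"), ["a", "b", "c"])
    print(read_table(table))  # ['a', 'b', 'c']
    shutil.rmtree(root)
```

In the first call the job fails, and the old row stays in place. Only the second call, which succeeds, replaces the table contents. The replace step in this sketch is not atomic, and the post does not cover what happens if the final move itself fails.
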
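The intermediate data files are placed under /tmp/hive-hadoop because that location is set by the hive.exec.scratchdir parameter in hive-site.xml. Its default value is /tmp/hive-${user.name}, which resolves to /tmp/hive-hadoop when the query runs as the hadoop user.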

hive.exec.scratchdir
An HDFS path used to store the execution plans of the different map/reduce stages, as well as the intermediate output of those stages.
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
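
Hadoop's configuration layer expands ${...} variables in property values. Here it takes user.name from the Java system property of the same name, which is how /tmp/hive-${user.name} becomes /tmp/hive-hadoop. The Python sketch below roughly imitates that lookup and expansion. The helpers read_property and expand_vars are hypothetical, and the variable values are passed in explicitly rather than read from a JVM.

```python
import re
import xml.etree.ElementTree as ET

# The property from above, wrapped in the <configuration> root of hive-site.xml.
HIVE_SITE_SNIPPET = """<configuration>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
</configuration>"""


def read_property(xml_text, name):
    """Return the raw <value> of a named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def expand_vars(value, env):
    """Replace ${key} placeholders with env[key]; unknown keys are left as-is."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: env.get(m.group(1), m.group(0)), value)


if __name__ == "__main__":
    raw = read_property(HIVE_SITE_SNIPPET, "hive.exec.scratchdir")
    print(expand_vars(raw, {"user.name": "hadoop"}))  # /tmp/hive-hadoop
```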

The hive_2014-01-08_10-58-52_023_7835826938243226729/ directory is deleted automatically once the job finishes.
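
The per-query directory and its cleanup can be pictured with the Python sketch below. The name layout (a timestamp with milliseconds, then a random number) is copied from the directory name in the plan above, not taken from Hive's source. The helpers query_dir_name and query_scratch_dir are hypothetical. The sketch deletes the directory in a finally block, so it also removes it when the job fails. The post does not say whether Hive cleans up in every failure case.

```python
import os
import random
import shutil
import tempfile
from contextlib import contextmanager
from datetime import datetime


def query_dir_name(now, rand):
    """Build a name shaped like hive_2014-01-08_10-58-52_023_7835826938243226729."""
    millis = now.microsecond // 1000
    return "hive_{}_{:03d}_{}".format(now.strftime("%Y-%m-%d_%H-%M-%S"), millis, rand)


@contextmanager
def query_scratch_dir(scratch_root):
    """Create a per-query scratch directory and always delete it afterwards."""
    path = os.path.join(scratch_root,
                        query_dir_name(datetime.now(), random.getrandbits(63)))
    os.makedirs(path)
    try:
        yield path
    finally:
        # Remove the whole per-query directory once the "job" is over.
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with query_scratch_dir(root) as d:
        print(os.path.basename(d))
        os.makedirs(os.path.join(d, "-ext-10000"))  # intermediate output goes here
    print(os.listdir(root))  # [] once the job is done
    shutil.rmtree(root)
```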