hive一个数据源产生多个数据集(一个隐藏的调优方法)

一个数据源产生多个数据集

语法:

from table
insert into table1 partition(dt)
select 
...
insert into table2 partition(dt)
select 
...;

案例:

hive> explain FROM dw_center.dwb_open_srcqid_os_ver_user
    > INSERT INTO TABLE dw_center.dwb_open_srcqid_os_user PARTITION(dt)
    > select
    >   apptypeid,
    >   uid,
    >   srcqid,
    >   os,
    >   isnewuser,
    >   pv,
    >   dt
    > where dt ='20210727' 
    > 
    > INSERT INTO TABLE dw_center.dwb_open_srcqid_ver_user PARTITION(dt)
    > select
    >   apptypeid,
    >   uid,
    >   srcqid,
    >   ver,
    >   isnewuser,
    >   pv,
    >   dt
    > where dt ='20210727';
OK
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-8 depends on stages: Stage-2 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-3 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-14 depends on stages: Stage-2 , consists of Stage-11, Stage-10, Stage-12
  Stage-11
  Stage-1 depends on stages: Stage-11, Stage-10, Stage-13
  Stage-9 depends on stages: Stage-1
  Stage-10
  Stage-12
  Stage-13 depends on stages: Stage-12

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwb_open_srcqid_os_ver_user
            Statistics: Num rows: 177821 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: apptypeid (type: string), uid (type: string), srcqid (type: string), os (type: string), isnewuser (type: int), pv (type: bigint), '20210727' (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
              Statistics: Num rows: 177821 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 177821 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: dw_center.dwb_open_srcqid_os_user
            Select Operator
              expressions: apptypeid (type: string), uid (type: string), srcqid (type: string), ver (type: string), isnewuser (type: int), pv (type: bigint), '20210727' (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
              Statistics: Num rows: 177821 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 177821 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: dw_center.dwb_open_srcqid_ver_user

  Stage: Stage-8
    Conditional Operator

  Stage: Stage-5
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://HDFS/usr/hive/warehouse/dw_center.db/dwb_open_srcqid_os_user/.hive-staging_hive_2021-07-29_11-54-31_612_1147423460507385636-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            dt 
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: dw_center.dwb_open_srcqid_os_user

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-4
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-6
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-7
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://HDFS/usr/hive/warehouse/dw_center.db/dwb_open_srcqid_os_user/.hive-staging_hive_2021-07-29_11-54-31_612_1147423460507385636-1/-ext-10000

  Stage: Stage-14
    Conditional Operator

  Stage: Stage-11
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://HDFS/usr/hive/warehouse/dw_center.db/dwb_open_srcqid_ver_user/.hive-staging_hive_2021-07-29_11-54-31_612_1147423460507385636-1/-ext-10002

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            dt 
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: dw_center.dwb_open_srcqid_ver_user

  Stage: Stage-9
    Stats-Aggr Operator

  Stage: Stage-10
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-12
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-13
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://HDFS/usr/hive/warehouse/dw_center.db/dwb_open_srcqid_ver_user/.hive-staging_hive_2021-07-29_11-54-31_612_1147423460507385636-1/-ext-10002

Time taken: 0.177 seconds, Fetched: 137 row(s)

执行计划分析:

  TableScan
    alias: dwb_open_srcqid_os_ver_user

扫描dwb_open_srcqid_os_ver_user只做了一次

总结: 对于同一个数据源只用扫描一次,大大提高效率

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值