Hive optimization (1): MapJoin

This post walks through using MapJoin to optimize query performance in Hive, using the creation of the `lpx_mapjoin` table as the example. A map join performs the join of the two tables in the map phase, so the join inputs never go through a reduce-side shuffle, which significantly speeds up processing on large data sets.
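For background, a map join loads the small table into an in-memory hash table on every mapper and streams the large table past it, so the join itself needs no reducers. In the Hive versions of this era it was triggered either by an explicit hint or by the auto-conversion setting; a minimal sketch (the table names here are illustrative, not from this post):

```sql
-- Hint form: ask Hive to hash-load small_dim on each mapper
-- and stream big_fact through the map phase (no join-side shuffle).
SELECT /*+ MAPJOIN(d) */ f.order_id, d.category_name
FROM   big_fact f
JOIN   small_dim d ON (f.category_id = d.category_id);

-- Automatic form: let Hive convert eligible joins by itself.
set hive.auto.convert.join=true;                -- enable automatic map-join conversion
set hive.mapjoin.smalltable.filesize=25000000;  -- size threshold (bytes) for the small table
```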

hive> create table lpx_mapjoin as
    > SELECT '2012-04-17' as stat_date
    >        ,b.admin_member_id
    >        ,a.category_level2_id
    >        ,b.keywords
    >        ,sum(shownum) as sum_prod_show_cnt
    >        ,sum(clicknum) as sum_prod_click_cnt
    >        ,sysdate('yyyy-MM-dd hh:mm:ss') as dw_ins_date
    >   FROM (SELECT category_id
    >                ,category_level2_id
    >            FROM   pub_en_category_dimt0_h
    >           WHERE   dw_begin_date <= '2012-04-17'
    >             AND   hp_dw_end_date > '2012-04-17'
    >             AND   category_type = '02'
    >        ) a
    >   JOIN (SELECT admin_member_id
    >                ,lower(trim(regexp_replace(keyword,'[\\\\t|\\\\s]+', ' '))) as keywords
    >                ,coalesce(shownum,cast(0 as bigint)) as shownum
    >                ,coalesce(clicknum,cast(0 as bigint)) as clicknum
    >                ,post_category_id
    >           FROM  idl_en_ctr_click_sdt0
    >          WHERE  hp_stat_date = '2012-04-17'
    >            AND  keyword IS NOT NULL
    >        ) b
    >    ON (a.category_id = b.post_category_id)
    > GROUP BY b.admin_member_id
    >          ,a.category_level2_id
    >          ,b.keywords ;
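Note that the statement above carries no MAPJOIN hint, which is one reason the plan below still schedules reducers; the other is that the GROUP BY needs a reduce phase for the aggregation regardless, so a map join only removes the shuffle for the join itself. Assuming the filtered category dimension (subquery `a`) is the small side, the hinted variant would look like this (a sketch, not output from this session; the elided subqueries are unchanged):

```sql
-- Hinted variant: MAPJOIN(a) asks Hive to hash-load subquery a
-- (the filtered category dimension) into each mapper's memory.
create table lpx_mapjoin as
SELECT /*+ MAPJOIN(a) */
       '2012-04-17' as stat_date
       ,b.admin_member_id
       ,a.category_level2_id
       ,b.keywords
       ,sum(shownum) as sum_prod_show_cnt
       ,sum(clicknum) as sum_prod_click_cnt
  FROM ( ... ) a                -- same dimension subquery as above
  JOIN ( ... ) b                -- same click-log subquery as above
    ON (a.category_id = b.post_category_id)
 GROUP BY b.admin_member_id, a.category_level2_id, b.keywords;
```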

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2511131447) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
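The "Cannot run job locally" line refers to Hive's local mode, which runs small jobs in a single local process instead of submitting them to the cluster. It is governed by settings like these (the threshold value here is illustrative; in the log above it is -1, i.e. local mode is effectively disabled, and the 2.5 GB input would not qualify anyway):

```sql
set hive.exec.mode.local.auto=true;                      -- let Hive try local mode for small jobs
set hive.exec.mode.local.auto.inputbytes.max=134217728;  -- max total input (bytes) to qualify
```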

Warning: Can't find tianshu info: markid missing!
Starting Job = job_201205151343_1062053, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062053
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201205151343_1062053
Hadoop job information for Stage-1: number of mappers: 111; number of reducers: 3
2012-05-25 13:24:05,753 Stage-1 map = 26%,  reduce = 0%
2012-05-25 13:24:23,957 Stage-1 map = 77%,  reduce = 0%
2012-05-25 13:24:41,265 Stage-1 map = 92%,  reduce = 0%
2012-05-25 13:25:00,561 Stage-1 map = 100%,  reduce = 14%
2012-05-25 13:25:11,320 Stage-1 map = 100%,  reduce = 23%
2012-05-25 13:25:29,255 Stage-1 map = 100%,  reduce = 32%
2012-05-25 13:25:48,604 Stage-1 map = 100%,  reduce = 57%
2012-05-25 13:26:07,861 Stage-1 map = 100%,  reduce = 70%
2012-05-25 13:26:31,523 Stage-1 map = 100%,  reduce = 73%
2012-05-25 13:26:54,908 Stage-1 map = 100%,  reduce = 77%

