A Test Case for MapJoin

The same aggregation query is run twice below: first as a regular join, then with the MAPJOIN hint, to compare the execution plans and runtimes.

hive> create table lpx_mapjoin as
   > SELECT '2012-04-17' as stat_date
   >        ,b.admin_member_id
   >        ,a.category_level2_id
   >        ,b.keywords
   >        ,sum(shownum) as sum_prod_show_cnt
   >        ,sum(clicknum) as sum_prod_click_cnt
   >        ,sysdate('yyyy-MM-dd hh:mm:ss') as dw_ins_date
   >   FROM (SELECT category_id
   >                ,category_level2_id
   >            FROM   pub_en_category_dimt0_h
   >           WHERE   dw_begin_date <= '2012-04-17'
   >             AND   hp_dw_end_date > '2012-04-17'
   >             AND   category_type = '02'
   >        ) a
   >   JOIN (SELECT admin_member_id
   >                ,lower(trim(regexp_replace(keyword,'[\\\\t|\\\\s]+', ' '))) as keywords
   >                ,coalesce(shownum,cast(0 as bigint)) as shownum
   >                ,coalesce(clicknum,cast(0 as bigint)) as clicknum
   >                ,post_category_id
   >           FROM  idl_en_ctr_click_sdt0
   >          WHERE  hp_stat_date = '2012-04-17'
   >            AND  keyword IS NOT NULL
   >        ) b
   >    ON (a.category_id = b.post_category_id)
   > GROUP BY b.admin_member_id
   >          ,a.category_level2_id
   >          ,b.keywords ;

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2511131447) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning: Can't find tianshu info: markid missing!
Starting Job = job_201205151343_1062053, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062053
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201205151343_1062053
Hadoop job information for Stage-1: number of mappers: 111; number of reducers: 3

2012-05-25 13:24:05,753 Stage-1 map = 26%,  reduce = 0%
2012-05-25 13:24:23,957 Stage-1 map = 77%,  reduce = 0%
2012-05-25 13:24:41,265 Stage-1 map = 92%,  reduce = 0%
2012-05-25 13:25:00,561 Stage-1 map = 100%,  reduce = 14%
2012-05-25 13:25:11,320 Stage-1 map = 100%,  reduce = 23%
2012-05-25 13:25:29,255 Stage-1 map = 100%,  reduce = 32%
2012-05-25 13:25:48,604 Stage-1 map = 100%,  reduce = 57%
2012-05-25 13:26:07,861 Stage-1 map = 100%,  reduce = 70%
2012-05-25 13:26:31,523 Stage-1 map = 100%,  reduce = 73%
2012-05-25 13:26:54,908 Stage-1 map = 100%,  reduce = 77%
2012-05-25 13:27:16,370 Stage-1 map = 100%,  reduce = 80%
2012-05-25 13:27:31,626 Stage-1 map = 100%,  reduce = 82%
2012-05-25 13:27:43,934 Stage-1 map = 100%,  reduce = 84%
2012-05-25 13:27:59,381 Stage-1 map = 100%,  reduce = 87%
2012-05-25 13:28:12,944 Stage-1 map = 100%,  reduce = 88%
2012-05-25 13:28:32,217 Stage-1 map = 100%,  reduce = 92%
2012-05-25 13:28:50,428 Stage-1 map = 100%,  reduce = 94%
2012-05-25 13:29:05,792 Stage-1 map = 100%,  reduce = 96%
2012-05-25 13:29:10,207 Stage-1 map = 100%,  reduce = 97%
2012-05-25 13:29:16,808 Stage-1 map = 100%,  reduce = 98%
2012-05-25 13:29:23,570 Stage-1 map = 100%,  reduce = 99%
2012-05-25 13:29:33,150 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205151343_1062053

Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 927172565) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning: Can't find tianshu info: markid missing!
Starting Job = job_201205151343_1062532, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062532
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201205151343_1062532
Hadoop job information for Stage-2: number of mappers: 15; number of reducers: 1

2012-05-25 13:30:10,087 Stage-2 map = 8%,  reduce = 0%
2012-05-25 13:30:24,216 Stage-2 map = 40%,  reduce = 0%
2012-05-25 13:30:37,083 Stage-2 map = 63%,  reduce = 0%
2012-05-25 13:30:52,854 Stage-2 map = 87%,  reduce = 0%
2012-05-25 13:31:29,057 Stage-2 map = 99%,  reduce = 0%
2012-05-25 13:31:38,701 Stage-2 map = 100%,  reduce = 0%
2012-05-25 13:31:47,734 Stage-2 map = 100%,  reduce = 4%
2012-05-25 13:31:54,239 Stage-2 map = 100%,  reduce = 11%
2012-05-25 13:31:59,466 Stage-2 map = 100%,  reduce = 18%
2012-05-25 13:32:04,989 Stage-2 map = 100%,  reduce = 22%
2012-05-25 13:32:15,136 Stage-2 map = 100%,  reduce = 29%
2012-05-25 13:32:22,236 Stage-2 map = 100%,  reduce = 38%
2012-05-25 13:32:31,750 Stage-2 map = 100%,  reduce = 43%
2012-05-25 13:32:41,773 Stage-2 map = 100%,  reduce = 59%
2012-05-25 13:32:51,002 Stage-2 map = 100%,  reduce = 67%
2012-05-25 13:33:02,976 Stage-2 map = 100%,  reduce = 68%
2012-05-25 13:33:17,175 Stage-2 map = 100%,  reduce = 69%
2012-05-25 13:33:34,641 Stage-2 map = 100%,  reduce = 70%
2012-05-25 13:33:53,767 Stage-2 map = 100%,  reduce = 71%
2012-05-25 13:34:15,878 Stage-2 map = 100%,  reduce = 72%
2012-05-25 13:34:38,170 Stage-2 map = 100%,  reduce = 73%
2012-05-25 13:34:49,164 Stage-2 map = 100%,  reduce = 74%
2012-05-25 13:35:13,119 Stage-2 map = 100%,  reduce = 75%
2012-05-25 13:35:26,087 Stage-2 map = 100%,  reduce = 76%
2012-05-25 13:35:46,155 Stage-2 map = 100%,  reduce = 77%
2012-05-25 13:36:04,623 Stage-2 map = 100%,  reduce = 78%
2012-05-25 13:36:26,206 Stage-2 map = 100%,  reduce = 79%
2012-05-25 13:36:54,454 Stage-2 map = 100%,  reduce = 80%
2012-05-25 13:37:18,056 Stage-2 map = 100%,  reduce = 81%
2012-05-25 13:37:37,048 Stage-2 map = 100%,  reduce = 82%
2012-05-25 13:37:49,247 Stage-2 map = 100%,  reduce = 83%
2012-05-25 13:38:20,496 Stage-2 map = 100%,  reduce = 84%
2012-05-25 13:38:33,526 Stage-2 map = 100%,  reduce = 85%
2012-05-25 13:38:58,196 Stage-2 map = 100%,  reduce = 86%
2012-05-25 13:39:13,534 Stage-2 map = 100%,  reduce = 87%
2012-05-25 13:39:39,946 Stage-2 map = 100%,  reduce = 88%
2012-05-25 13:39:56,969 Stage-2 map = 100%,  reduce = 89%
2012-05-25 13:40:39,779 Stage-2 map = 100%,  reduce = 91%
2012-05-25 13:41:20,402 Stage-2 map = 100%,  reduce = 92%
2012-05-25 13:41:27,308 Stage-2 map = 100%,  reduce = 93%
2012-05-25 13:41:46,528 Stage-2 map = 100%,  reduce = 94%
2012-05-25 13:42:14,809 Stage-2 map = 100%,  reduce = 95%
2012-05-25 13:42:31,821 Stage-2 map = 100%,  reduce = 96%
2012-05-25 13:42:48,402 Stage-2 map = 100%,  reduce = 97%
2012-05-25 13:43:26,587 Stage-2 map = 100%,  reduce = 98%
2012-05-25 13:43:49,918 Stage-2 map = 100%,  reduce = 99%
2012-05-25 13:44:05,737 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201205151343_1062532

Ended Job = 528296150, job is filtered out (removed at runtime).
Moving data to: hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-21-45_780_2062757607370764899/-ext-10001
Moving data to: /group/alibaba-dw-icbu/hive/lpx_mapjoin
40843215 Rows loaded to hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-21-45_780_2062757607370764899/-ext-10000
OK
Time taken: 1360.901 seconds

 

hive> create table lpx_mapjoin1 as
   > SELECT /*+mapjoin(a)*/
   >        '2012-04-17' as stat_date
   >        ,b.admin_member_id
   >        ,a.category_level2_id
   >        ,b.keywords
   >        ,sum(shownum) as sum_prod_show_cnt
   >        ,sum(clicknum) as sum_prod_click_cnt
   >        ,sysdate('yyyy-MM-dd hh:mm:ss') as dw_ins_date
   >   FROM (SELECT category_id
   >                ,category_level2_id
   >            FROM   pub_en_category_dimt0_h
   >           WHERE   dw_begin_date <= '2012-04-17'
   >             AND   hp_dw_end_date > '2012-04-17'
   >             AND   category_type = '02'
   >        ) a
   >   JOIN (SELECT admin_member_id
   >                ,lower(trim(regexp_replace(keyword,'[\\\\t|\\\\s]+', ' '))) as keywords
   >                ,coalesce(shownum,cast(0 as bigint)) as shownum
   >                ,coalesce(clicknum,cast(0 as bigint)) as clicknum
   >                ,post_category_id
   >           FROM  idl_en_ctr_click_sdt0
   >          WHERE  hp_stat_date = '2012-04-17'
   >            AND  keyword IS NOT NULL
   >        ) b
   >    ON (a.category_id = b.post_category_id)
   > GROUP BY b.admin_member_id
   >          ,a.category_level2_id
   >          ,b.keywords ;

Total MapReduce jobs = 2
plan = file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10007/plan.xml
2012-05-25 01:30:35     Starting to launch local task to process mapjoin;      maximum memory = 932118528
2012-05-25 01:30:50     Processing rows:        4739   Hashtable size: 4739    Memory usage:   3523352 rate:   0.004
2012-05-25 01:30:50     Dump the hashtable into file: file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10005/HashTable-Stage-1/-0--.hashtable
2012-05-25 01:30:51     Upload 1 File to: file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10005/HashTable-Stage-1/-0--.hashtable File size: 227587
2012-05-25 01:30:51     End of local task; Time Taken: 16.016 sec.
Mapred Local Task Succeeded. Convert the Join into MapJoin
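The local task above builds the hash table in a single JVM within a fixed memory budget; here the 4,739 rows of the small table used only about 0.4% of the available heap. If the small table outgrew that budget, the local task would abort. In Hive versions that expose it, the budget is controlled by a setting along these lines (a sketch, not part of the original session; 0.90 is the usual default):

set hive.mapjoin.localtask.max.memory.usage=0.90;  -- abort the hash-table build if it exceeds 90% of the local task's heap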

Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2510886581) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning: Can't find tianshu info: markid missing!
Starting Job = job_201205151343_1062638, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062638
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201205151343_1062638
Hadoop job information for Stage-1: number of mappers: 45; number of reducers: 3

2012-05-25 13:31:38,418 Stage-1 map = 0%,  reduce = 0%
2012-05-25 13:32:19,988 Stage-1 map = 6%,  reduce = 0%
2012-05-25 13:32:29,559 Stage-1 map = 20%,  reduce = 0%
2012-05-25 13:32:40,713 Stage-1 map = 44%,  reduce = 0%
2012-05-25 13:32:50,025 Stage-1 map = 51%,  reduce = 0%
2012-05-25 13:32:55,730 Stage-1 map = 60%,  reduce = 0%
2012-05-25 13:33:02,644 Stage-1 map = 74%,  reduce = 0%
2012-05-25 13:33:08,201 Stage-1 map = 82%,  reduce = 0%
2012-05-25 13:33:13,908 Stage-1 map = 88%,  reduce = 0%
2012-05-25 13:33:18,679 Stage-1 map = 95%,  reduce = 0%
2012-05-25 13:33:22,685 Stage-1 map = 98%,  reduce = 0%
2012-05-25 13:33:24,247 Stage-1 map = 99%,  reduce = 0%
2012-05-25 13:33:29,223 Stage-1 map = 100%,  reduce = 0%
2012-05-25 13:33:37,668 Stage-1 map = 100%,  reduce = 12%
2012-05-25 13:33:45,733 Stage-1 map = 100%,  reduce = 13%
2012-05-25 13:33:51,046 Stage-1 map = 100%,  reduce = 20%
2012-05-25 13:33:53,741 Stage-1 map = 100%,  reduce = 21%
2012-05-25 13:33:56,426 Stage-1 map = 100%,  reduce = 22%
2012-05-25 13:34:05,551 Stage-1 map = 100%,  reduce = 29%
2012-05-25 13:34:11,259 Stage-1 map = 100%,  reduce = 32%
2012-05-25 13:34:15,869 Stage-1 map = 100%,  reduce = 46%
2012-05-25 13:34:19,307 Stage-1 map = 100%,  reduce = 67%
2012-05-25 13:34:34,205 Stage-1 map = 100%,  reduce = 68%
2012-05-25 13:34:38,535 Stage-1 map = 100%,  reduce = 69%
2012-05-25 13:34:40,661 Stage-1 map = 100%,  reduce = 70%
2012-05-25 13:34:46,251 Stage-1 map = 100%,  reduce = 71%
2012-05-25 13:34:53,543 Stage-1 map = 100%,  reduce = 72%
2012-05-25 13:35:01,525 Stage-1 map = 100%,  reduce = 73%
2012-05-25 13:35:05,963 Stage-1 map = 100%,  reduce = 74%
2012-05-25 13:35:15,245 Stage-1 map = 100%,  reduce = 75%
2012-05-25 13:35:18,322 Stage-1 map = 100%,  reduce = 76%
2012-05-25 13:35:26,102 Stage-1 map = 100%,  reduce = 77%
2012-05-25 13:35:32,168 Stage-1 map = 100%,  reduce = 78%
2012-05-25 13:35:37,227 Stage-1 map = 100%,  reduce = 79%
2012-05-25 13:35:44,590 Stage-1 map = 100%,  reduce = 80%
2012-05-25 13:35:51,719 Stage-1 map = 100%,  reduce = 81%
2012-05-25 13:36:00,501 Stage-1 map = 100%,  reduce = 82%
2012-05-25 13:36:13,369 Stage-1 map = 100%,  reduce = 84%
2012-05-25 13:36:21,199 Stage-1 map = 100%,  reduce = 85%
2012-05-25 13:36:39,548 Stage-1 map = 100%,  reduce = 86%
2012-05-25 13:36:48,621 Stage-1 map = 100%,  reduce = 88%
2012-05-25 13:36:59,733 Stage-1 map = 100%,  reduce = 89%
2012-05-25 13:37:08,042 Stage-1 map = 100%,  reduce = 90%
2012-05-25 13:37:15,360 Stage-1 map = 100%,  reduce = 91%
2012-05-25 13:37:23,589 Stage-1 map = 100%,  reduce = 92%
2012-05-25 13:37:35,594 Stage-1 map = 100%,  reduce = 94%
2012-05-25 13:37:42,345 Stage-1 map = 100%,  reduce = 95%
2012-05-25 13:37:48,968 Stage-1 map = 100%,  reduce = 96%
2012-05-25 13:37:54,187 Stage-1 map = 100%,  reduce = 97%
2012-05-25 13:37:58,186 Stage-1 map = 100%,  reduce = 98%
2012-05-25 13:38:15,289 Stage-1 map = 100%,  reduce = 99%
2012-05-25 13:38:24,615 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205151343_1062638

Ended Job = 1570692836, job is filtered out (removed at runtime).
Moving data to: hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-ext-10001
Moving data to: /group/alibaba-dw-icbu/hive/lpx_mapjoin1
27223002 Rows loaded to hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-ext-10000
OK
Time taken: 519.457 seconds
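To confirm that a hint actually converted the join without running the full query, the plan can be inspected with EXPLAIN and checked for a Map Join Operator in the stage plan. A minimal sketch against the same source tables (the column list is illustrative, not from the original session):

hive> EXPLAIN
    > SELECT /*+ MAPJOIN(a) */ a.category_level2_id, b.post_category_id
    >   FROM pub_en_category_dimt0_h a
    >   JOIN idl_en_ctr_click_sdt0 b
    >     ON (a.category_id = b.post_category_id);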

Summary: a map join loads the small table entirely into memory; during the map phase, each mapper probes the in-memory table directly with rows from the other table. Because the join itself completes in the mappers, its reduce step is eliminated, which is why it runs much faster here: 519 seconds with the MAPJOIN hint versus 1361 seconds for the regular join, and one fewer MapReduce job (2 instead of 3).

In other words, the join is done entirely in the map phase, with no reduce needed for it; the precondition is that the data the join requires (the small table) is accessible to every mapper.
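Beyond the explicit hint, Hive can also perform this conversion automatically when one side of the join falls under a size threshold. A minimal sketch, assuming a Hive version that supports automatic conversion (the threshold shown is the common default, in bytes):

set hive.auto.convert.join=true;                 -- let the optimizer rewrite qualifying joins as map joins
set hive.mapjoin.smalltable.filesize=25000000;   -- tables smaller than this (bytes) count as "small"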