hive -- hive.auto.convert.join优化

最新推荐文章于 2024-05-10 14:54:23 发布

微风凉

最新推荐文章于 2024-05-10 14:54:23 发布

阅读量1.1w

点赞数 1

分类专栏： hive

本文链接：https://blog.csdn.net/weixin_39150719/article/details/102931864

版权

hive 专栏收录该内容

20 篇文章 1 订阅

订阅专栏

先了解mapjoin架构：
在这里插入图片描述

对于这参数一定要使用的谨慎一些：
参数详细：
在这里插入图片描述

具体实验：
hive> show databases;
OK
default
Time taken: 0.993 seconds, Fetched: 3 row(s)
hive> create database test;
OK
Time taken: 1.192 seconds
hive> use test;
OK
Time taken: 0.098 seconds
hive> create table bigtable(id bigint, time bigint, uid string, keyword string, url_rank int, click_num int, click_
url string) row format delimited fields terminated by ‘\t’;OK
Time taken: 0.377 seconds
hive> create table smalltable(id bigint, time bigint, uid string, keyword string, url_rank int, click_num int, clic
k_url string) row format delimited fields terminated by ‘\t’;OK
Time taken: 0.09 seconds
hive> create table jointable(id bigint, time bigint, uid string, keyword string, url_rank int, click_num int, click
_url string) row format delimited fields terminated by ‘\t’;OK
Time taken: 0.125 seconds
hive> load data local inpath ‘/home/oracle/Downloads/o/testdata/bigtable’ into table bigtable;
Loading data to table test.bigtable
OK
Time taken: 1.897 seconds
hive> load data local inpath ‘/home/oracle/Downloads/o/testdata/smalltable’ into table smalltable;
Loading data to table test.smalltable
OK
Time taken: 0.335 seconds

hive> set hive.auto.convert.join
    > ;
hive.auto.convert.join=true

小表join大表的情况下执行时间

hive> insert overwrite table jointable
> select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
> from smalltable s
> left join bigtable b
> on b.id = s.id;
Query ID = oracle_20191105221818_9a0ae1b4-5316-4605-ad0c-09947d6e6ef6
Total jobs = 1
Stage-1 is selected by condition resolver.
Launching Job 1 out of 1
ge-1: number of mappers: 2; number of reducers: 1
2019-11-05 22:18:31,462 Stage-1 map = 0%, reduce = 0%
2019-11-05 22:18:42,284 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.3 sec
2019-11-05 22:18:44,439 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 7.86 sec
2019-11-05 22:18:50,971 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 11.71 sec
2019-11-05 22:18:53,149 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 14.02 sec
2019-11-05 22:18:54,233 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.36 sec
2019-11-05 22:18:56,378 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 16.31 sec
2019-11-05 22:18:59,521 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 20.09 sec
2019-11-05 22:19:02,721 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 23.76 sec
MapReduce Total cumulative CPU time: 23 seconds 760 msec
Ended Job = job_1573005250104_0001
Loading data to table test.jointable
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 23.76 sec HDFS Read: 142190202 HDFS Write: 132010489 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 760 msec
OK
Time taken: 47.871 seconds

大表join小表的情况下执行时间

hive> insert overwrite table jointable
> select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
> from bigtable b
> left join smalltable s
> on s.id = b.id;
Query ID = oracle_20191105222020_664b4ccc-426e-48f0-b8fe-825404a4f791
Total jobs = 2
Stage-5 is selected by condition resolver.
Stage-1 is filtered out by condition resolver.
Execution log at: /tmp/oracle/oracle_20191105222020_664b4ccc-426e-48f0-b8fe-825404a4f791.log
2019-11-05 10:20:41 Starting to launch local task to process map join; maximum memory = 1908932608
2019-11-05 10:20:43 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/oracle/69729fe6-db8
6-4b4e-968a-64c2297a479c/hive_2019-11-05_22-20-37_145_8291916404521446309-1/-local-10002/HashTable-Stage-3/MapJoin-mapfile11–.hashtable2019-11-05 10:20:43 Uploaded 1 File to: file:/tmp/oracle/69729fe6-db86-4b4e-968a-64c2297a479c/hive_2019-11-05_2
2-20-37_145_8291916404521446309-1/-local-10002/HashTable-Stage-3/MapJoin-mapfile11–.hashtable (402269 bytes)2019-11-05 10:20:43 End of local task; Time Taken: 1.536 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 2
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-11-05 22:20:54,871 Stage-3 map = 0%, reduce = 0%
2019-11-05 22:21:06,536 Stage-3 map = 52%, reduce = 0%, Cumulative CPU 7.3 sec
2019-11-05 22:21:09,695 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 12.1 sec
MapReduce Total cumulative CPU time: 12 seconds 100 msec
Ended Job = job_1573005250104_0002
Loading data to table test.jointable
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 12.1 sec HDFS Read: 129167959 HDFS Write: 261160554 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 100 msec
OK
Time taken: 34.845 seconds

设置参数set hive.auto.convert.join=false;后的执行时间相差无几

hive> set hive.auto.convert.join=false;

小表join大表的情况下执行时间

hive> insert overwrite table jointable
> select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
> from smalltable s
> left join bigtable b
> on b.id = s.id;
Query ID = oracle_20191105222222_3e01ecab-3b3f-4234-8742-d020a101b825
Total jobs = 1
Launching Job 1 out of 1
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2019-11-05 22:22:20,519 Stage-1 map = 0%, reduce = 0%
2019-11-05 22:22:30,357 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.39 sec
2019-11-05 22:22:33,566 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 8.53 sec
2019-11-05 22:22:39,053 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 12.52 sec
2019-11-05 22:22:42,283 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.35 sec
2019-11-05 22:22:44,431 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 17.12 sec
2019-11-05 22:22:47,643 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 21.03 sec
2019-11-05 22:22:50,806 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 24.17 sec
MapReduce Total cumulative CPU time: 24 seconds 170 msec
Ended Job = job_1573005250104_0003
Loading data to table test.jointable
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 24.17 sec HDFS Read: 142190298 HDFS Write: 132010489 SUCCESS
Total MapReduce CPU Time Spent: 24 seconds 170 msec
OK
Time taken: 43.843 seconds

大表join小表的情况下执行时间

hive> insert overwrite table jointable
> select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
> from bigtable b
> left join smalltable s
> on s.id = b.id;
Query ID = oracle_20191105222323_6d714959-b875-4f0d-955f-80f68fa779d6
Total jobs = 1
Launching Job 1 out of 1
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2019-11-05 22:23:21,725 Stage-1 map = 0%, reduce = 0%
2019-11-05 22:23:31,503 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.51 sec
2019-11-05 22:23:33,574 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 8.77 sec
2019-11-05 22:23:40,301 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 12.07 sec
2019-11-05 22:23:43,447 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.22 sec
2019-11-05 22:23:45,569 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 16.89 sec
2019-11-05 22:23:48,715 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 20.87 sec
2019-11-05 22:23:51,859 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 24.35 sec
2019-11-05 22:23:53,949 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 26.14 sec
MapReduce Total cumulative CPU time: 26 seconds 140 msec
Ended Job = job_1573005250104_0004
Loading data to table test.jointable
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 26.14 sec HDFS Read: 142189839 HDFS Write: 261160554 SUCCESS
Total MapReduce CPU Time Spent: 26 seconds 140 msec
OK
Time taken: 46.306 seconds

微风凉

关注

1
点赞
踩
47

收藏

觉得还不错? 一键收藏
0
评论
hive -- hive.auto.convert.join优化

对于这参数一定要使用的谨慎一些：参数详细：具体实验：hive> show databases;OKdefaultTime taken: 0.993 seconds, Fetched: 3 row(s)hive> create database test;OKTime taken: 1.192 secondshive> use test;OKTime t...
复制链接

扫一扫