Generate a wetherdata4.txt file containing 20,000,000 rows and append it to the weather table, bringing it to 20,600,000 rows; then analyze query performance at that scale:
The first HQL statement:
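A minimal sketch of how such a data file could be generated. The column set (city, weath, minTemperat, maxTemperat, pmvalue), their order, and the tab delimiter are assumptions inferred from the query; adjust them to match the actual weather table DDL before a real run.

```python
import random

# Assumed schema, inferred from the query's column references;
# verify against the real weather table definition.
CITIES = ["hangzhou", "beijing", "shanghai"]
WEATHERS = ["fine", "rain", "cloudy"]

def make_rows(n, seed=0):
    """Build n tab-delimited weather rows with plausible random values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        city = rng.choice(CITIES)
        weath = rng.choice(WEATHERS)
        min_t = rng.randint(-20, 30)          # minTemperat
        max_t = min_t + rng.randint(0, 15)    # maxTemperat >= minTemperat
        pm = rng.randint(0, 500)              # pmvalue
        rows.append(f"{city}\t{weath}\t{min_t}\t{max_t}\t{pm}")
    return rows

def write_file(path, n):
    with open(path, "w") as f:
        for row in make_rows(n):
            f.write(row + "\n")

# For the experiment this would be write_file("wetherdata4.txt", 20_000_000),
# after which the file is appended into the table with:
#   LOAD DATA LOCAL INPATH 'wetherdata4.txt' INTO TABLE weather;
```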
select cy.number, wh.*, pm.pmlevel
from cityinfo cy join weather wh on (cy.name = wh.city)
join pminfo pm on (pm.pmvalue = wh.pmvalue)
where wh.city = 'hangzhou' and wh.weath = 'fine' and wh.minTemperat in (-18, 25, 43)
order by maxTemperat desc limit 20;
A three-table join with filter predicates, an IN clause, and an ORDER BY.
Execution output:
hive> select cy.number,wh.*,pm.pmlevel
> from cityinfo cy join weather wh on (cy.name=wh.city)
> join pminfo pm on (pm.pmvalue=wh.pmvalue)
> where wh.city='hangzhou' and wh.weath='fine' and wh.minTemperat in
> ( -18,25,43) order by maxTemperat DESC limit 20;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
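The "Estimated from input data size: 2" line above reflects Hive's heuristic of dividing the job's input size by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. A sketch of that arithmetic, assuming the old defaults of 1,000,000,000 bytes per reducer and a cap of 999 (check your installation's actual settings):

```python
import math

def estimated_reducers(input_bytes,
                       bytes_per_reducer=1_000_000_000,  # assumed default
                       max_reducers=999):                # assumed default
    # Hive's heuristic: one reducer per bytes_per_reducer of input,
    # at least 1, at most max_reducers.
    return min(max_reducers, max(1, math.ceil(input_bytes / bytes_per_reducer)))

# Under these assumptions, roughly 1.2 GB of input yields the
# "Estimated ... 2" seen in the log:
# estimated_reducers(1_200_000_000) -> 2
```

Lowering bytes_per_reducer raises parallelism at the cost of more, smaller reduce tasks; setting mapred.reduce.tasks pins the count outright and bypasses the estimate.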
Starting Job = job_1400798880449_0059, Tracking URL = http://Master.Hadoop:8088/proxy/application_1400798880449_0059/
Kill Command = /usr/hadoop/bin/hadoop job -kill job_1400798880449_0059
Hado