Generate a wetherdata4.txt file containing 20,000,000 rows and append it to the weather table, bringing it to 20,600,000 rows; then analyze query performance at that scale:
The first HQL statement:
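A minimal sketch of how such a data file could be generated. The column set (city, weath, minTemperat, maxTemperat, pmvalue), their order, and the tab delimiter are assumptions inferred from the query; adjust them to match the actual weather table DDL before a real run.

```python
import random

# Assumed schema, inferred from the query's column references;
# verify against the real weather table definition.
CITIES = ["hangzhou", "beijing", "shanghai"]
WEATHERS = ["fine", "rain", "cloudy"]

def make_rows(n, seed=0):
    """Build n tab-delimited weather rows with plausible random values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        city = rng.choice(CITIES)
        weath = rng.choice(WEATHERS)
        min_t = rng.randint(-20, 30)          # minTemperat
        max_t = min_t + rng.randint(0, 15)    # maxTemperat >= minTemperat
        pm = rng.randint(0, 500)              # pmvalue
        rows.append(f"{city}\t{weath}\t{min_t}\t{max_t}\t{pm}")
    return rows

def write_file(path, n):
    with open(path, "w") as f:
        for row in make_rows(n):
            f.write(row + "\n")

# For the experiment this would be write_file("wetherdata4.txt", 20_000_000),
# after which the file is appended into the table with:
#   LOAD DATA LOCAL INPATH 'wetherdata4.txt' INTO TABLE weather;
```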
select cy.number, wh.*, pm.pmlevel
from cityinfo cy join weather wh on (cy.name = wh.city)
join pminfo pm on (pm.pmvalue = wh.pmvalue)
where wh.city = 'hangzhou' and wh.weath = 'fine' and wh.minTemperat in (-18, 25, 43)
order by maxTemperat desc limit 20;
A three-table join with filter predicates, an IN clause, and an ORDER BY.
Execution output:
hive> select cy.number,wh.*,pm.pmlevel
> from cityinfo cy join weather wh on (cy.name=wh.city)
> join pminfo pm on (pm.pmvalue=wh.pmvalue)
> where wh.city='hangzhou' and wh.weath='fine' and wh.minTemperat in
> ( -18,25,43) order by maxTemperat DESC limit 20;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
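The "Estimated from input data size: 2" line above reflects Hive's heuristic of dividing the job's input size by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. A sketch of that arithmetic, assuming the old defaults of 1,000,000,000 bytes per reducer and a cap of 999 (check your installation's actual settings):

```python
import math

def estimated_reducers(input_bytes,
                       bytes_per_reducer=1_000_000_000,  # assumed default
                       max_reducers=999):                # assumed default
    # Hive's heuristic: one reducer per bytes_per_reducer of input,
    # at least 1, at most max_reducers.
    return min(max_reducers, max(1, math.ceil(input_bytes / bytes_per_reducer)))

# Under these assumptions, roughly 1.2 GB of input yields the
# "Estimated ... 2" seen in the log:
# estimated_reducers(1_200_000_000) -> 2
```

Lowering bytes_per_reducer raises parallelism at the cost of more, smaller reduce tasks; setting mapred.reduce.tasks pins the count outright and bypasses the estimate.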
Starting Job = job_1400798880449_0059, Tracking URL = http://Master.Hadoop:8088/proxy/application_1400798880449_0059/
Kill Command = /usr/hadoop/bin/hadoop job -kill job_1400798880449_0059
Hado