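Hive is mainly for batch processing, while Impala is mainly for near-real-time in-memory queries, so comparing the two head to head is, I admit, a bit pointless.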
1. count comparison
On the order of 100,000 rows
Impala
[cdh-node2:21000] > select count(1) from userinfo;
Query: select count(1) from userinfo
+----------+
| count(1) |
+----------+
| 124850 |
+----------+
Fetched 1 row(s) in 2.39s
[cdh-node2:21000] > select count(1) from userinfo;
Query: select count(1) from userinfo
+----------+
| count(1) |
+----------+
| 124850 |
+----------+
Fetched 1 row(s) in 0.57s
[cdh-node2:21000] >
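The same query was run twice in Impala above, and the second run (0.57s) came back much faster than the first (2.39s). To compare runs without reading the numbers by eye, here is a minimal sketch that pulls the elapsed times out of the "Fetched ... in ...s" summary lines. The helper name `extract_timings` and the `IMPALA_OUTPUT` sample string are made up for illustration; only the two summary lines are copied from the session above.

```python
import re

# Summary lines copied from the impala-shell session above.
IMPALA_OUTPUT = """\
Fetched 1 row(s) in 2.39s
Fetched 1 row(s) in 0.57s
"""

# Matches the summary line printed after each query in the session above.
FETCHED_RE = re.compile(r"Fetched \d+ row\(s\) in ([\d.]+)s")


def extract_timings(text):
    """Return the elapsed seconds from every 'Fetched ... in Ns' line, in order."""
    return [float(m.group(1)) for m in FETCHED_RE.finditer(text)]


timings = extract_timings(IMPALA_OUTPUT)
first, second = timings
print(f"first run: {first:.2f}s, second run: {second:.2f}s")
print(f"second run took {second / first:.0%} of the first run's time")
```

Two runs are far too few to draw conclusions from; the same helper works on a longer pasted log if the query is repeated more times.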
Hive
> select count(1) from userinfo;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1422624309453_0060, Tracking URL = http://cdh-node1:8088/proxy/application_1422624309453_0060/
Kill Command = /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/bin/hadoop job -kill job_1422624309453_0060
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-01-31 18:06:57,974 Stage-1 map = 0%, reduce = 0%
2015-01-31 18:07:06,297 Stage-1 map &#