Hive vs. Kylin Performance Comparison
Requirement: compute the total employee salary [sum(sal)] grouped by department name [dname].
SQL:
select dname,sum(sal) from emp e join dept d on e.deptno = d.deptno group by dname;
Hive query
hive (kylin_test)> select dname,sum(sal) from emp e join dept d on e.deptno = d.deptno group by dname;
Query ID = root_20220126134911_810707b2-fef4-4e02-b04a-38a35d3b15d5
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1643156223090_0011
Kill Command = /opt/module/hadoop-3.1.3/bin/yarn application -kill application_1643156223090_0011
Hive on Spark Session Web UI URL: http://hadoop103:40661
Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 01/01 [==========================>>] 100% ELAPSED TIME: 25.75 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 25.75 second(s)
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1643156223090_0011
Kill Command = /opt/module/hadoop-3.1.3/bin/yarn application -kill application_1643156223090_0011
Hive on Spark Session Web UI URL: http://hadoop103:40661
Query Hive on Spark job[1] stages: [1, 2]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-1 ........ 0 FINISHED 1 1 0 0 0
Stage-2 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.26 s
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 6.26 second(s)
OK
dname _c1
SALES 9400.0
RESEARCH 10875.0
ACCOUNTING 8750.0
Time taken: 110.635 seconds, Fetched: 3 row(s)
Kylin query
1) Open the Insight page
2) Enter the query in the New Query box and click Submit
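Besides the Insight web UI, the same query can also be submitted to Kylin through its REST query endpoint. The sketch below only builds the request (URL, headers, JSON body); the host `hadoop102:7070`, the project name `kylin_test`, and the `ADMIN/KYLIN` credentials are assumptions for illustration and must be adjusted to your environment.

```python
import base64
import json

# Assumed connection details -- adjust to your own cluster.
KYLIN_HOST = "http://hadoop102:7070"   # hypothetical Kylin server address
PROJECT = "kylin_test"                 # project name from this example
USER, PASSWORD = "ADMIN", "KYLIN"      # Kylin's default credentials

def build_query_request(sql: str, project: str = PROJECT):
    """Build the URL, headers, and JSON body for Kylin's /kylin/api/query endpoint."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",   # Kylin uses HTTP Basic auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"sql": sql, "project": project, "limit": 100})
    return f"{KYLIN_HOST}/kylin/api/query", headers, body

sql = ("select dname,sum(sal) from emp e "
       "join dept d on e.deptno = d.deptno group by dname")
url, headers, body = build_query_request(sql)
# The request itself can then be sent with urllib.request or the requests library.
```

Because Kylin answers such a query from a precomputed cube rather than by launching Spark jobs, the response is typically returned in seconds.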
Comparing the two results: the Hive query took about 110 seconds, while the Kylin query took about 13 seconds, so Kylin was roughly 8x faster than Hive on this query.
3) Chart display and export
Bar chart:
Pie chart:
Excel export: