Hive on MR/Spark

Relatively recent Hadoop distributions support running Hive on Spark, that is, using Spark instead of the default MapReduce as Hive's execution engine. Besides Spark and MapReduce, Hive can also use Tez as an execution engine, but Hive on Tez is not covered here. Once Spark is installed and configured, switching Hive onto Spark is as simple as running "set hive.execution.engine=spark".
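As a quick sketch (assuming Spark is already installed and visible to Hive, and using only session-level settings), the current engine can be checked and switched from the Hive CLI like this:

hive> -- show which engine this session is currently using
hive> set hive.execution.engine;
hive.execution.engine=mr
hive> -- switch this session to Spark
hive> set hive.execution.engine=spark;
hive> -- switch back to the default MapReduce engine
hive> set hive.execution.engine=mr;

Note that a plain `set` like this only affects the current session; making it the cluster-wide default means putting the same property into hive-site.xml.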

First, let's run a test query with the default MapReduce engine:

hive> select count(*) from parquet_part;
Query ID = trafodion_20171214134444_539bfa11-c931-4d08-87f7-32384b2afdaa
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1512024625412_0010, Tracking URL = http://esg06.esgyncn.local:8088/proxy/application_1512024625412_0010/
Kill Command = /opt/cloudera/parcels/CDH-5.4.11-1.cdh5.4.11.p0.5/lib/hadoop/bin/hadoop job  -kill job_1512024625412_0010
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2017-12-14 13:44:50,830 Stage-1 map = 0%,  reduce = 0%
2017-12-14 13:44:57,147 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 7.76 sec
2017-12-14 13:44:58,185 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 20.63 sec
2017-12-14 13:45:04,410 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 22.32 sec
MapReduce Total cumulative CPU time: 22 seconds 320 msec
Ended Job = job_1512024625412_0010
MapReduce Jobs Launched:
Stage-Stage-1: Map: 5  Reduce: 1   Cumulative CPU: 22.32 sec   HDFS Read: 75549 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 320 msec
OK
20000000
Time taken: 25.052 seconds, Fetched: 1 row(s)

As the output above shows, the SQL statement was executed through map and reduce tasks. Now let's switch to Spark as the execution engine:

hive> set hive.execution.engine=spark;
hive> select count(*) from parquet_part;
Query ID = trafodion_20171214134545_788e5aab-7fc0-4484-8769-69fff1c464f5
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 13195ebf-4daf-4203-a605-6c0d209c4965

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2017-12-14 13:46:01,340 Stage-0_0: 0(+2)/5      Stage-1_0: 0/1
2017-12-14 13:46:04,368 Stage-0_0: 0(+2)/5      Stage-1_0: 0/1
2017-12-14 13:46:05,377 Stage-0_0: 3(+1)/5      Stage-1_0: 0/1
2017-12-14 13:46:06,387 Stage-0_0: 4(+0)/5      Stage-1_0: 0/1
2017-12-14 13:46:08,404 Stage-0_0: 4(+1)/5      Stage-1_0: 0/1
2017-12-14 13:46:09,411 Stage-0_0: 5/5 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 15.12 seconds
OK
20000000
Time taken: 23.379 seconds, Fetched: 1 row(s)

As the output above shows, this time the same SQL statement was executed through Spark, completing in 23.4 seconds versus 25.1 seconds with MapReduce.
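Beyond picking the engine, the resources used by Hive's Spark jobs can be tuned from within a Hive session. The sketch below is illustrative only: the property names are the standard Hive on Spark settings, but the values are placeholders, not taken from the cluster used above.

hive> -- run Hive's Spark jobs on YARN (value depends on your cluster setup)
hive> set spark.master=yarn;
hive> -- resources handed to each Spark executor; the numbers here are placeholders
hive> set spark.executor.memory=4g;
hive> set spark.executor.cores=2;
hive> set spark.executor.instances=4;
hive> -- subsequent queries in this session will run on Spark with these settings
hive> set hive.execution.engine=spark;
hive> select count(*) from parquet_part;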
