Setting up a single-node Hive 3.1.2 on Spark 2.4.7 environment

This post records in detail how to set up a single-node Hive 3.1.2 on Spark 2.4.7 environment on Ubuntu 20.04.1 LTS. It covers installing Spark, configuring the Spark environment, integrating it with Hive, modifying hive-site.xml, copying the required jars into the Hive directory, and starting Hive on Spark. Along the way, pay attention to the environment settings, in particular the hive.execution.engine property, to make sure Hive actually uses Spark as its execution engine.


The setup process drew on various online tutorials; the concrete steps are consolidated here.
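Before the run output, here is a minimal sketch of the integration steps mentioned in the summary: copying a few Spark jars into Hive's lib directory and telling Hive to use Spark. The install paths match the ones that appear in the demo below (/opt/spark-bin-without-hadoop, /opt/apache-hive-bin); the jar list follows the Hive on Spark documentation for Spark 2.x builds, and the property values (YARN master, Kryo serializer) are illustrative assumptions rather than an exact copy of this machine's configuration.

# Assumed install locations (they match the paths shown in the demo below)
export SPARK_HOME=/opt/spark-bin-without-hadoop
export HIVE_HOME=/opt/apache-hive-bin

# Spark 2.x has no assembly jar, so Hive on Spark needs these jars on Hive's classpath
cp $SPARK_HOME/jars/scala-library-*.jar        $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-core_*.jar           $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-network-common_*.jar $HIVE_HOME/lib/

The engine choice and the Spark-side properties can then be made the default in hive-site.xml, or set per session in the Hive CLI:

hive> set hive.execution.engine=spark;
hive> set spark.master=yarn;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;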

First, here is how it runs on my machine.

  • Running Hive on Spark
:~$ hive
Hive Session ID = 6eed60ea-639e-4b17-ad3f-4ef008c510f0

Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = d028c5da-691d-4313-a388-c2c00a4c306b
hive> use hive;
OK
Time taken: 0.451 seconds
hive> show tables;
OK
tab_name
hive_table
Time taken: 0.142 seconds, Fetched: 1 row(s)
hive> desc hive_table;
OK
col_name	data_type	comment
id                  	int                 	                    
name                	string              	                    
ver                 	string              	                    
package             	string              	                    
path                	string              	                    
Time taken: 0.117 seconds, Fetched: 5 row(s)
hive> select * from hive_table;
OK
hive_table.id	hive_table.name	hive_table.ver	hive_table.package	hive_table.path
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	/opt/hadoop
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	/opt/apache-hive-bin
3	mysql	8.0.20	mysql-server	/usr/local/mysql
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	/opt/spark-bin-without-hadoop
Time taken: 1.25 seconds, Fetched: 4 row(s)
hive> select t1.id, t1.name, t1.ver, t1.package, row_number() over(partition by t1.id order by t2.id desc) as row_id, count(1) over(partition by t1.name) as cnt from hive_table t1 left join hive_table t2 on 1 = 1 where t2.id is not null;
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Warning: Map Join MAPJOIN[19][bigTable=?] in task 'Stage-1:MAPRED' is a cross product
Query ID = ***_20201025134120_9a652c42-4ceb-47e5-bcca-1b213b8c6cd7
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040

Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 7.07 s     
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 7.07 second(s)
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040

Query Hive on Spark job[1] stages: [1, 2, 3]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-1 ........         0      FINISHED      1          1        0        0       0  
Stage-2 ........         0      FINISHED      1          1        0        0       0  
Stage-3 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 03/03    [==========================>>] 100%  ELAPSED TIME: 8.07 s     
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 8.07 second(s)
OK
t1.id	t1.name	t1.ver	t1.package	row_id	cnt
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	1	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	2	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	3	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	4	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	1	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	2	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	3	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	4	4
3	mysql	8.0.20	mysql-server	1	4
3	mysql	8.0.20	mysql-server	2	4
3	mysql	8.0.20	mysql-server	3	4
3	mysql	8.0.20	mysql-server	4	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	1	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	2	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	3	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	4	4
Time taken: 30.586 seconds, Fetched: 16 row(s)
  • Running MapReduce
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
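To reproduce the demo, the hive database and hive_table can be recreated with plain HiveQL, and the execution engine then switched back from MapReduce to Spark for the current session. This is a sketch inferred from the desc and select output above; the original DDL is not shown in the article.

hive> create database if not exists hive;
hive> use hive;
hive> create table if not exists hive_table (
          id      int,
          name    string,
          ver     string,
          package string,
          path    string
      );
hive> insert into table hive_table values
          (1, 'hadoop', '3.3.0',  'hadoop-3.3.0.tar.gz',                '/opt/hadoop'),
          (2, 'hive',   '3.2.1',  'apache-hive-3.1.2-bin.tar.gz',       '/opt/apache-hive-bin'),
          (3, 'mysql',  '8.0.20', 'mysql-server',                       '/usr/local/mysql'),
          (4, 'spark',  '2.4.7',  'spark-2.4.7-bin-without-hadoop.tgz', '/opt/spark-bin-without-hadoop');
hive> set hive.execution.engine=spark;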