Setting up a single-node Hive 3.1.2 on Spark 2.4.7 environment

This post records in detail how to set up a single-node Hive 3.1.2 on Spark 2.4.7 environment on Ubuntu 20.04.1 LTS. It covers installing Spark, configuring the Spark environment, integrating it with Hive, modifying hive-site.xml, copying the required jars into the Hive directory, and starting Hive on Spark. Along the way, pay attention to the environment settings, in particular the hive.execution.engine property, to make sure Hive actually uses Spark as its execution engine.


The setup process drew on various online tutorials; the concrete steps are consolidated here.
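Before the run output, here is a minimal sketch of the integration steps mentioned in the summary: copying a few Spark jars into Hive's lib directory and telling Hive to use Spark. The install paths match the ones that appear in the demo below (/opt/spark-bin-without-hadoop, /opt/apache-hive-bin); the jar list follows the Hive on Spark documentation for Spark 2.x builds, and the property values (YARN master, Kryo serializer) are illustrative assumptions rather than an exact copy of this machine's configuration.

# Assumed install locations (they match the paths shown in the demo below)
export SPARK_HOME=/opt/spark-bin-without-hadoop
export HIVE_HOME=/opt/apache-hive-bin

# Spark 2.x has no assembly jar, so Hive on Spark needs these jars on Hive's classpath
cp $SPARK_HOME/jars/scala-library-*.jar        $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-core_*.jar           $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-network-common_*.jar $HIVE_HOME/lib/

The engine choice and the Spark-side properties can then be made the default in hive-site.xml, or set per session in the Hive CLI:

hive> set hive.execution.engine=spark;
hive> set spark.master=yarn;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;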

First, here is how it runs on my machine.

  • Running Hive on Spark
:~$ hive
Hive Session ID = 6eed60ea-639e-4b17-ad3f-4ef008c510f0

Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = d028c5da-691d-4313-a388-c2c00a4c306b
hive> use hive;
OK
Time taken: 0.451 seconds
hive> show tables;
OK
tab_name
hive_table
Time taken: 0.142 seconds, Fetched: 1 row(s)
hive> desc hive_table;
OK
col_name	data_type	comment
id                  	int                 	                    
name                	string              	                    
ver                 	string              	                    
package             	string              	                    
path                	string              	                    
Time taken: 0.117 seconds, Fetched: 5 row(s)
hive> select * from hive_table;
OK
hive_table.id	hive_table.name	hive_table.ver	hive_table.package	hive_table.path
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	/opt/hadoop
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	/opt/apache-hive-bin
3	mysql	8.0.20	mysql-server	/usr/local/mysql
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	/opt/spark-bin-without-hadoop
Time taken: 1.25 seconds, Fetched: 4 row(s)
hive> select t1.id, t1.name, t1.ver, t1.package, row_number() over(partition by t1.id order by t2.id desc) as row_id, count(1) over(partition by t1.name) as cnt from hive_table t1 left join hive_table t2 on 1 = 1 where t2.id is not null;
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Warning: Map Join MAPJOIN[19][bigTable=?] in task 'Stage-1:MAPRED' is a cross product
Query ID = ***_20201025134120_9a652c42-4ceb-47e5-bcca-1b213b8c6cd7
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040

Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 7.07 s     
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 7.07 second(s)
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040

Query Hive on Spark job[1] stages: [1, 2, 3]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-1 ........         0      FINISHED      1          1        0        0       0  
Stage-2 ........         0      FINISHED      1          1        0        0       0  
Stage-3 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 03/03    [==========================>>] 100%  ELAPSED TIME: 8.07 s     
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 8.07 second(s)
OK
t1.id	t1.name	t1.ver	t1.package	row_id	cnt
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	1	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	2	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	3	4
1	hadoop	3.3.0	hadoop-3.3.0.tar.gz	4	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	1	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	2	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	3	4
2	hive	3.2.1	apache-hive-3.1.2-bin.tar.gz	4	4
3	mysql	8.0.20	mysql-server	1	4
3	mysql	8.0.20	mysql-server	2	4
3	mysql	8.0.20	mysql-server	3	4
3	mysql	8.0.20	mysql-server	4	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	1	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	2	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	3	4
4	spark	2.4.7	spark-2.4.7-bin-without-hadoop.tgz	4	4
Time taken: 30.586 seconds, Fetched: 16 row(s)
  • Running MapReduce
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
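To reproduce the demo, the hive database and hive_table can be recreated with plain HiveQL, and the execution engine then switched back from MapReduce to Spark for the current session. This is a sketch inferred from the desc and select output above; the original DDL is not shown in the article.

hive> create database if not exists hive;
hive> use hive;
hive> create table if not exists hive_table (
          id      int,
          name    string,
          ver     string,
          package string,
          path    string
      );
hive> insert into table hive_table values
          (1, 'hadoop', '3.3.0',  'hadoop-3.3.0.tar.gz',                '/opt/hadoop'),
          (2, 'hive',   '3.2.1',  'apache-hive-3.1.2-bin.tar.gz',       '/opt/apache-hive-bin'),
          (3, 'mysql',  '8.0.20', 'mysql-server',                       '/usr/local/mysql'),
          (4, 'spark',  '2.4.7',  'spark-2.4.7-bin-without-hadoop.tgz', '/opt/spark-bin-without-hadoop');
hive> set hive.execution.engine=spark;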