Compiling Spark 3.1.0 and Integrating It with Hive 3.1.2 and Hadoop 3.3.0
The setup process draws on various online tutorials; this post consolidates the concrete steps.
First, here is how everything runs on my machine.
- Start Hive
:~$ hive
Hive Session ID = 6eed60ea-639e-4b17-ad3f-4ef008c510f0
Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = d028c5da-691d-4313-a388-c2c00a4c306b
hive> use hive;
OK
Time taken: 0.453 seconds
hive> set hive.exec.mode.local.auto=true;
hive> select * from hive_table;
OK
hive_table.id hive_table.name hive_table.ver hive_table.package hive_table.path
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz /opt/hadoop
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz /opt/apache-hive-bin
3 mysql 8.0.20 mysql-connector-java-8.0.20.jar /usr/local/mysql
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz /opt/spark-bin-without-hadoop
5 spark 3.1.0 spark-bin-build.tgz /opt/spark-bin-build
Time taken: 1.801 seconds, Fetched: 5 row(s)
hive> set hive.execution.engine=spark;
hive> select distinct name from hive_table;
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Query ID = ***_20210209134130_bf6e4390-f007-4bf9-bb04-5d8c254232d1
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1612848697117_0001
Kill Command = /opt/hadoop/bin/yarn application -kill application_1612848697117_0001
Hive on Spark Session Web UI URL: http://******:45913
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 ........ 0 FINISHED 1 1 0 0 0
Stage-1 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 02/02 [==========================>>] 100% ELAPSED TIME: 9.08 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 9.08 second(s)
OK
name
hadoop
hive
mysql
spark
Time taken: 27.206 seconds, Fetched: 4 row(s)
hive> exit;
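In the session above, `set hive.execution.engine=spark;` switches the engine only for the current session. To make Spark the default engine, the equivalent hive-site.xml entries would look roughly like the following. This is only a sketch: the property names come from the Hive on Spark documentation, but the values (a YARN master for a single-node setup) are assumptions, not necessarily this article's exact configuration.

```xml
<!-- Sketch only: property names per the Hive on Spark docs;
     the values are assumptions for a single-node YARN deployment. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
```

With these in place, the `Running with YARN Application = …` and Spark stage-progress output shown above appear without setting the engine by hand each session.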
:~$ spark-sql
21/02/09 13:42:33 WARN Utils: Your hostname, *** resolves to a loopback address: 127.0.1.1; using ****** instead (on interface wlp4s0)
21/02/09 13:42:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/09 13:42:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/09 13:42:38 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/02/09 13:42:38 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
Spark master: local[*], Application Id: local-1612849356371
spark-sql> select distinct name from hive_table;
name
mysql
hadoop
hive
spark
Time taken: 5.13 seconds, Fetched 4 row(s)
spark-sql> exit;
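For reference, a distribution like the `spark-bin-build.tgz` listed in the table can be produced with Spark's own packaging script. The sketch below is an assumption based on the title (Hadoop 3.3.0, YARN), not the author's exact invocation; the detailed build steps follow later in the article.

```shell
# Run from the Spark 3.1.0 source root. dev/make-distribution.sh ships
# with the Spark source; the profiles/versions below are assumptions
# matching Hadoop 3.3.0 on YARN, not necessarily the exact command used.
# Note: Hive on Spark requires a Spark built WITHOUT the -Phive profile,
# so it is deliberately omitted here.
./dev/make-distribution.sh --name build --tgz \
  -Pyarn -Phadoop-3.2 -Dhadoop.version=3.3.0
```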
- Install Hadoop 3.3.0 and Hive 3.1.2
For the details, see my earlier posts: