Compiling Spark 3.1.0 and Integrating It with Hive 3.1.2 and Hadoop 3.3.0
The setup process draws on various online tutorials; this post consolidates the concrete steps.
First, here is how everything runs on my machine.
- Start Hive
:~$ hive
Hive Session ID = 6eed60ea-639e-4b17-ad3f-4ef008c510f0
Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = d028c5da-691d-4313-a388-c2c00a4c306b
hive> use hive;
OK
Time taken: 0.453 seconds
hive> set hive.exec.mode.local.auto=true;
hive> select * from hive_table;
OK
hive_table.id hive_table.name hive_table.ver hive_table.package hive_table.path
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz /opt/hadoop
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz /opt/apache-hive-bin
3 mysql 8.0.20 mysql-connector-java-8.0.20.jar /usr/local/mysql
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz /opt/spark-bin-without-hadoop
5 spark 3.1.0 spark-bin-build.tgz /opt/spark-bin-build
Time taken: 1.801 seconds, Fetched: 5 row(s)
hive> set hive.execution.engine=spark;
hive> select distinct name from hive_table;
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Query ID = ***_20210209134130_bf6e4390-f007-4bf9-bb04-5d8c254232d1
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1612848697117_0001
Kill Command = /opt/hadoop/bin/yarn application -kill application_1612848697117_0001
Hive on Spark Session Web UI URL: http://******:45913
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 ........ 0 FINISHED 1 1 0 0 0
Stage-1 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 02/02 [==========================>>] 100% ELAPSED TIME: 9.08 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 9.08 second(s)
OK
name
hadoop
hive
mysql
spark
Time taken: 27.206 seconds, Fetched: 4 row(s)
hive> exit;
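In the session above, `set hive.execution.engine=spark;` switches the engine only for the current session. To make Spark the default engine, the equivalent hive-site.xml entries would look roughly like the following. This is only a sketch: the property names come from the Hive on Spark documentation, but the values (a YARN master for a single-node setup) are assumptions, not necessarily this article's exact configuration.

```xml
<!-- Sketch only: property names per the Hive on Spark docs;
     the values are assumptions for a single-node YARN deployment. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
```

With these in place, the `Running with YARN Application = …` and Spark stage-progress output shown above appear without setting the engine by hand each session.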
:~$ spark-sql
21/02/09 13:42:33 WARN Utils: Your hostname, *** resolves to a loopback address: 127.0.1.1; using ****** instead (on interface wlp4s0)
21/02/09 13:42:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/09 13:42:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/09 13:42:38 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/02/09 13:42:38 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
Spark master: local[*], Application Id: local-1612849356371
spark-sql> select distinct name from hive_table;
name
mysql
hadoop
hive
spark
Time taken: 5.13 seconds, Fetched 4 row(s)
spark-sql> exit;
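For reference, a distribution like the `spark-bin-build.tgz` listed in the table can be produced with Spark's own packaging script. The sketch below is an assumption based on the title (Hadoop 3.3.0, YARN), not the author's exact invocation; the detailed build steps follow later in the article.

```shell
# Run from the Spark 3.1.0 source root. dev/make-distribution.sh ships
# with the Spark source; the profiles/versions below are assumptions
# matching Hadoop 3.3.0 on YARN, not necessarily the exact command used.
# Note: Hive on Spark requires a Spark built WITHOUT the -Phive profile,
# so it is deliberately omitted here.
./dev/make-distribution.sh --name build --tgz \
  -Pyarn -Phadoop-3.2 -Dhadoop.version=3.3.0
```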
- Install Hadoop 3.3.0 and Hive 3.1.2
For the details, see my earlier posts: