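Setting up a single-node Hive 3.1.2 on Spark 2.4.7 environment
I pieced this setup together from various online tutorials; this post consolidates the concrete steps.
First, here is what it looks like running on my machine.
- Running Hive on Spark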
:~$ hive
Hive Session ID = 6eed60ea-639e-4b17-ad3f-4ef008c510f0
Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = d028c5da-691d-4313-a388-c2c00a4c306b
hive> use hive;
OK
Time taken: 0.451 seconds
hive> show tables;
OK
tab_name
hive_table
Time taken: 0.142 seconds, Fetched: 1 row(s)
hive> desc hive_table;
OK
col_name data_type comment
id int
name string
ver string
package string
path string
Time taken: 0.117 seconds, Fetched: 5 row(s)
hive> select * from hive_table;
OK
hive_table.id hive_table.name hive_table.ver hive_table.package hive_table.path
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz /opt/hadoop
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz /opt/apache-hive-bin
3 mysql 8.0.20 mysql-server /usr/local/mysql
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz /opt/spark-bin-without-hadoop
Time taken: 1.25 seconds, Fetched: 4 row(s)
hive> select t1.id, t1.name, t1.ver, t1.package, row_number() over(partition by t1.id order by t2.id desc) as row_id, count(1) over(partition by t1.name) as cnt from hive_table t1 left join hive_table t2 on 1 = 1 where t2.id is not null;
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Warning: Map Join MAPJOIN[19][bigTable=?] in task 'Stage-1:MAPRED' is a cross product
Query ID = ***_20201025134120_9a652c42-4ceb-47e5-bcca-1b213b8c6cd7
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040
Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 01/01 [==========================>>] 100% ELAPSED TIME: 7.07 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 7.07 second(s)
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1603552715345_0005
Kill Command = /opt/hadoop/bin/yarn application -kill application_1603552715345_0005
Hive on Spark Session Web UI URL: http://localhost:4040
Query Hive on Spark job[1] stages: [1, 2, 3]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-1 ........ 0 FINISHED 1 1 0 0 0
Stage-2 ........ 0 FINISHED 1 1 0 0 0
Stage-3 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 03/03 [==========================>>] 100% ELAPSED TIME: 8.07 s
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 8.07 second(s)
OK
t1.id t1.name t1.ver t1.package row_id cnt
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz 1 4
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz 2 4
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz 3 4
1 hadoop 3.3.0 hadoop-3.3.0.tar.gz 4 4
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz 1 4
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz 2 4
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz 3 4
2 hive 3.2.1 apache-hive-3.1.2-bin.tar.gz 4 4
3 mysql 8.0.20 mysql-server 1 4
3 mysql 8.0.20 mysql-server 2 4
3 mysql 8.0.20 mysql-server 3 4
3 mysql 8.0.20 mysql-server 4 4
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz 1 4
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz 2 4
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz 3 4
4 spark 2.4.7 spark-2.4.7-bin-without-hadoop.tgz 4 4
Time taken: 30.586 seconds, Fetched: 16 row(s)
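To make it easier to check the output above, here is a small pure-Python sketch of what the query computes. It is not how Hive or Spark actually executes the plan; it only replays the SQL semantics on the four rows shown by `select * from hive_table`. The variable names (`rows`, `joined`, `cnt_by_name`, `result`) are my own and are not part of any Hive API.

```python
from collections import Counter
from itertools import product

# Rows of hive_table as printed by `select * from hive_table` (path column omitted,
# since the query does not select it): (id, name, ver, package).
rows = [
    (1, "hadoop", "3.3.0", "hadoop-3.3.0.tar.gz"),
    (2, "hive", "3.2.1", "apache-hive-3.1.2-bin.tar.gz"),
    (3, "mysql", "8.0.20", "mysql-server"),
    (4, "spark", "2.4.7", "spark-2.4.7-bin-without-hadoop.tgz"),
]

# `left join ... on 1 = 1` pairs every t1 row with every t2 row, which is the
# cross product Hive warns about. `where t2.id is not null` filters nothing here
# because every t2.id is set.
joined = [(t1, t2) for t1, t2 in product(rows, rows) if t2[0] is not None]

# count(1) over (partition by t1.name): number of joined rows per name.
cnt_by_name = Counter(t1[1] for t1, _ in joined)

# row_number() over (partition by t1.id order by t2.id desc):
# number the rows inside each t1.id partition, highest t2.id first.
result = []
for t1_id in sorted({t1[0] for t1, _ in joined}):
    partition = [pair for pair in joined if pair[0][0] == t1_id]
    partition.sort(key=lambda pair: pair[1][0], reverse=True)
    for row_id, (t1, _) in enumerate(partition, start=1):
        result.append((*t1, row_id, cnt_by_name[t1[1]]))

for r in result:
    print(*r, sep="\t")
```

Each of the 4 rows is matched with all 4 rows, so the join yields 16 rows; every `t1.id` partition gets row numbers 1 to 4, and every `t1.name` partition counts 4 rows, which agrees with the `row_id` and `cnt` columns in the Hive output.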
- Running on MapReduce
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may no