Preparation
- Device & machine
Machine (virtual machine): Ubuntu 20.04.1 LTS, with OpenJDK (1.8), Hive (3.1.2), and Hadoop (3.2.2) already installed
- Installation package
- https://archive.apache.org/dist/spark/spark-2.4.7/
- Since this is Hive on Spark, we use the spark-without-hadoop build
Installing Spark
- Download and extract the Spark archive
wget https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-without-hadoop.tgz
tar -zxvf spark-2.4.7-bin-without-hadoop.tgz -C /data/java
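Optionally, the downloaded archive can be checked against the SHA-512 digest published next to it on archive.apache.org (the checksum file format has varied between releases, so compare the two digests by eye if sha512sum -c does not accept the file directly):
wget https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-without-hadoop.tgz.sha512
sha512sum spark-2.4.7-bin-without-hadoop.tgz
cat spark-2.4.7-bin-without-hadoop.tgz.sha512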
- Configure Spark environment variables
Add the SPARK_HOME environment variable:
sudo vim /etc/profile
Check the variables after the change (my machine also has HIVE_HOME and HADOOP_HOME set):
export SPARK_HOME=/data/java/spark-2.4.7-bin-without-hadoop
export PATH=${PATH}:${SPARK_HOME}/bin
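After saving, reload the profile so the new variables are visible in the current shell. Note that the without-hadoop build cannot actually run yet at this point, because SPARK_DIST_CLASSPATH is only set in spark-env.sh below; this is just a quick check that the path is right:
source /etc/profile
echo ${SPARK_HOME}    # should print /data/java/spark-2.4.7-bin-without-hadoop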
Configuring Spark
For a single-node Spark setup, only spark-env.sh and spark-defaults.conf need to be configured; the other template files are left untouched.
Configure spark-env.sh
- Add TERM, JAVA_HOME, HADOOP_HOME, SPARK_HOME, SPARK_DIST_CLASSPATH, and related environment variables
cd /data/java/spark-2.4.7-bin-without-hadoop/conf/
cp ./spark-env.sh.template ./spark-env.sh
vim ./spark-env.sh
- View the added content
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_301
export SCALA_HOME=/data/scala/scala-2.11.12
export HADOOP_HOME=/data/java/hadoop-3.2.2
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/data/java/spark-2.4.7-bin-without-hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HIVE_HOME=/data/java/apache-hive-3.1.2-bin
export SPARK_MASTER_WEBUI_PORT=8079
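With SPARK_DIST_CLASSPATH resolved from hadoop classpath, the without-hadoop build should now be able to start. As a quick local smoke test (assuming the hadoop command is on PATH so that $(hadoop classpath) works), the bundled SparkPi example can be run without touching YARN or HDFS:
cd /data/java/spark-2.4.7-bin-without-hadoop
./bin/run-example --master local[2] SparkPi 10    # should end with a line like "Pi is roughly 3.14..."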
Configure spark-defaults.conf
- Add spark.executor.memory, spark.driver.cores, and spark.driver.maxResultSize (an illustrative sketch of these settings follows the file listing below)
cp ./spark-defaults.conf.template ./spark-defaults.conf
vim ./spark-defaults.conf
tail -n 11 ./spark-defaults.conf
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
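The three settings mentioned above do not appear in the tail output. Purely as an illustrative sketch (the values below are assumptions; size them to your own machine and workload), the added block could look like this:
spark.executor.memory       2g
spark.driver.cores          1
spark.driver.maxResultSize  0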
Integrating Hive 3.1.2
Modify hive-site.xml
- Change hive.execution.engine in hive-site.xml to spark
- Newly added property entries
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!--JDBC connection string for the metastore database-->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<!--Driver class name for the JDBC metastore-->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!--Metastore database username-->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<!--Metastore database password-->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>xxx</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.enable.spark.execution.engine</name>
<value>true</value>
</property>
<property>
<name>spark.master</name>
<value>yarn</value>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.driver.maxResultSize</name>
<value>0</value>
</property>
<property>
<name>spark.executor.extraJavaOptions</name>
<value>-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"</value>
</property>
<property>
<name>hive.spark.client.server.connect.timeout</name>
<value>300000ms</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://127.0.0.1:9820/user/hive</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://127.0.0.1:9820/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.local</name><!-- note: false here means the metastore is not in local mode -->
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value></value>
</property>
</configuration>
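Before restarting Hive, it is worth confirming that the edited file is still well-formed XML (an unescaped & in the JDBC URL is a common cause of startup failures). If xmllint from libxml2-utils is installed, and assuming hive-site.xml lives in the default conf directory, a quick check looks like this:
xmllint --noout /data/java/apache-hive-3.1.2-bin/conf/hive-site.xml && echo "hive-site.xml is well-formed"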
Configure Hive to pick up the Spark dependency jars by appending the following line to bin/hive:
cd /data/java/apache-hive-3.1.2-bin/bin
vim hive
for f in ${SPARK_HOME}/jars/*.jar; do CLASSPATH=${CLASSPATH}:$f; done
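After this change, a quick way to confirm that Hive really hands work to Spark is to start the CLI, print the engine setting, and run a small aggregation. test_table below is just a placeholder for any existing table; the first query should report Spark stage progress rather than MapReduce jobs:
hive
hive> set hive.execution.engine;     -- expected to print hive.execution.engine=spark
hive> select count(*) from test_table;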