Hive on Spark: Hive 3.1.2 on Spark 2.4.7 (single-machine setup)

For Spark on Hive, see this article

Preparation

  • Machine

Computer (VM): Ubuntu 20.04.1 LTS, with OpenJDK 1.8, Hive 3.1.2, and Hadoop 3.2.2 already installed
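A quick sanity check that the prerequisites are in place (each command should report the version listed above):

java -version       # expect 1.8.x
hadoop version      # expect Hadoop 3.2.2
hive --version      # expect Hive 3.1.2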

  • Installation package
    • https://archive.apache.org/dist/spark/spark-2.4.7/
    • Since this is Hive on Spark, we use the spark-without-hadoop build

Install Spark

  • Download and extract the Spark archive
wget https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-without-hadoop.tgz
tar -zxvf spark-2.4.7-bin-without-hadoop.tgz -C /data/java

  • Configure the Spark environment variables

Add the SPARK_HOME environment variable

sudo vim /etc/profile

The environment variables after the addition (my machine also includes HIVE_HOME and HADOOP_HOME):

export SPARK_HOME=/data/java/spark-2.4.7-bin-without-hadoop
export PATH=${PATH}:${SPARK_HOME}/bin
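
After saving, reload the profile so SPARK_HOME takes effect in the current shell:

source /etc/profile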

Configure Spark

In Spark single-machine mode, only spark-env.sh and spark-defaults.conf need to be configured; the other files can be left untouched.

Configure spark-env.sh

  • Add the JAVA_HOME, HADOOP_HOME, SPARK_HOME, SPARK_DIST_CLASSPATH, and related environment variables
cd /data/java/spark-2.4.7-bin-without-hadoop/conf/

cp ./spark-env.sh.template ./spark-env.sh
vim ./spark-env.sh
  • The file content after the additions:
cat ./spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_301
export SCALA_HOME=/data/scala/scala-2.11.12
export HADOOP_HOME=/data/java/hadoop-3.2.2
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/data/java/spark-2.4.7-bin-without-hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HIVE_HOME=/data/java/apache-hive-3.1.2-bin
export SPARK_MASTER_WEBUI_PORT=8079
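
Because this is the without-hadoop build, Spark finds the Hadoop jars through SPARK_DIST_CLASSPATH (filled in by hadoop classpath above). A quick check that everything resolves:

hadoop classpath          # prints the Hadoop jar directories Spark will use
spark-submit --version    # should now start cleanly and report version 2.4.7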

Configure spark-defaults.conf

  • Add the spark.executor.memory, spark.driver.cores, and spark.driver.maxResultSize entries (a sketch of these follows the excerpt below)
cp ./spark-defaults.conf.template ./spark-defaults.conf
vim ./spark-defaults.conf
tail -n 11 ./spark-defaults.conf
# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
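
The tail above only shows the template examples plus spark.executor.extraJavaOptions; the three entries mentioned earlier would look roughly like this (the values are illustrative placeholders, not from the original file; size them to your machine):

spark.executor.memory        2g
spark.driver.cores           1
spark.driver.maxResultSize   0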

Integrate Hive 3.1.2

Modify hive-site.xml

  • Change hive.execution.engine in hive-site.xml

  • The newly added property entries:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JDBC connection string for the metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <!-- JDBC driver class name for the metastore -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- Metastore database username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
  <!-- Metastore database password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xxx</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>

  <property>
    <name>hive.enable.spark.execution.engine</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>yarn</value>
  </property>
  <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.driver.maxResultSize</name>
    <value>0</value>
  </property>
  <property>
    <name>spark.executor.extraJavaOptions</name>
    <value>-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"</value>
  </property>
  <property>
    <name>hive.spark.client.server.connect.timeout</name>
    <value>300000ms</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://127.0.0.1:9820/user/hive</value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://127.0.0.1:9820/user/hive/warehouse</value>
  </property>

  <!-- Note: this is false; this is not local mode -->
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value></value>
  </property>
</configuration>
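
Because spark.master is set to yarn, Hive will submit its Spark jobs to YARN, so HDFS and YARN must be running before you start Hive. A minimal check, assuming Hadoop's sbin scripts are on your PATH:

start-dfs.sh
start-yarn.sh
jps    # expect NameNode, DataNode, ResourceManager, and NodeManager in the output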

To make Hive pick up the Spark dependency jars, add the following line to $HIVE_HOME/bin/hive:

cd /data/java/apache-hive-3.1.2-bin/bin

vim hive

# Append every Spark jar to Hive's CLASSPATH
for f in ${SPARK_HOME}/jars/*.jar; do CLASSPATH=${CLASSPATH}:$f; done
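
With the classpath line in place, a quick smoke test confirms the engine switch; test_table below is a hypothetical placeholder for any table that already exists in your metastore:

hive -e 'set hive.execution.engine;'         # should print hive.execution.engine=spark
hive -e 'select count(*) from test_table;'   # should run as a Spark job on YARN, not MapReduce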
