Hive on Spark: Compiling Spark
1) Download the Spark source from the official site
Download URL:
https://www.apache.org/dyn/closer.lua/spark/spark-2.4.5/spark-2.4.5.tgz
2) Upload the archive to the server and extract it
3) Change into the extracted Spark source directory
4) Run the build command (the -P*-provided profiles keep the Hadoop, Parquet, and ORC jars out of the distribution so the cluster's own copies are used at runtime)
[@hadoop101 spark-2.4.5]$ ./dev/make-distribution.sh --name without-hive --tgz -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.3 -Pparquet-provided -Porc-provided -Phadoop-provided
5) Wait for the build to finish; spark-2.4.5-bin-without-hive.tgz is the final artifact
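Steps 2)–4) can be sketched end to end as follows (the /opt/software upload path is an assumption carried over from the rest of this guide):

```shell
# Upload spark-2.4.5.tgz to the build node first, then:
tar -zxf /opt/software/spark-2.4.5.tgz -C /opt/software   # 2) extract the source
cd /opt/software/spark-2.4.5                              # 3) enter the source directory
# 4) build a distribution that excludes Hadoop/Parquet/ORC jars
./dev/make-distribution.sh --name without-hive --tgz \
  -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.3 \
  -Pparquet-provided -Porc-provided -Phadoop-provided
# On success, spark-2.4.5-bin-without-hive.tgz appears in the source root
```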
Hive on Spark: Configuration
1) Extract spark-2.4.5-bin-without-hive.tgz and rename the directory
tar -zxf /opt/software/spark-2.4.5-bin-without-hive.tgz -C /opt/module
mv /opt/module/spark-2.4.5-bin-without-hive /opt/module/spark
2) Configure the SPARK_HOME environment variable
sudo vim /etc/profile.d/my_env.sh
Add the following:
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
Source the file to make it take effect:
source /etc/profile.d/my_env.sh
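A quick check that the variable took effect (expected paths follow from the configuration above):

```shell
# Confirm SPARK_HOME resolves and the spark binaries are on PATH
echo "$SPARK_HOME"    # should print /opt/module/spark if the profile was sourced
which spark-submit    # should point into /opt/module/spark/bin
```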
3) Configure the Spark runtime environment (the hadoop-provided build ships no Hadoop jars, so Spark must pick them up from the local Hadoop installation)
mv /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh
vim /opt/module/spark/conf/spark-env.sh
Add the following:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
4) Create a new Spark configuration file (Hive reads spark-defaults.conf from its own conf directory when launching Spark)
vim /opt/module/hive/conf/spark-defaults.conf
Add the following:
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop101:8020/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
5) Create the event-log directory on HDFS (it must exist before jobs write to it)
hadoop fs -mkdir /spark-history
6) Upload the Spark dependency jars to HDFS, so YARN containers on any node can fetch them
hadoop fs -mkdir /spark-jars
hadoop fs -put /opt/module/spark/jars/* /spark-jars
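A quick sanity check that the jars landed on HDFS (listing output will vary by build):

```shell
# List a few of the uploaded jars and count what is under /spark-jars
hadoop fs -ls /spark-jars | head -n 5
hadoop fs -count /spark-jars
```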
7) Add the following to hive-site.xml
<!-- Location of the Spark dependency jars -->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://hadoop101:8020/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<!-- Timeout for the Hive-to-Spark client connection -->
<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>
Note: hive.spark.client.connect.timeout defaults to 1000ms. If Hive INSERT statements throw the exception below, raise it to 10000ms:
FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session d9e0224c-3d14-4bf4-95bc-ee3ec56df48e
Hive on Spark: Testing
1) Start the Hive client
bin/hive
2) Create a test table
hive (default)> create external table student(id int, name string) location '/student';
3) Run an INSERT to exercise the Spark engine
hive (default)> insert into table student values(1,'abc');
4) If, while inserting the data, /tmp/atguigu/hive.log records the following exception:
Caused by: javax.security.sasl.SaslException: Server closed before SASL negotiation finished.
5) Raise the maximum ApplicationMaster resource share in /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml (by default only 10% of a queue's resources may go to ApplicationMasters, which can leave the Spark AM unable to start on a small test cluster). Then distribute the file to all nodes and restart the ResourceManager.
vim capacity-scheduler.xml
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>1</value>
</property>
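Step 5) says to distribute the file and restart the ResourceManager without giving the commands; a sketch, assuming a cluster-sync helper script named xsync (plain scp to each node works just as well):

```shell
# Distribute the changed scheduler config to every node
# (xsync is an assumed helper; replace with scp per host if you don't have one)
xsync /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml

# Restart the ResourceManager on the node that runs it (Hadoop 3.x daemon syntax)
yarn --daemon stop resourcemanager
yarn --daemon start resourcemanager
```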