Basic environment setup
- Hadoop is already set up, following the previous articles; next we will set up Spark on YARN.
- Download and configure Scala. The scala-2.12.8 release is fine; just download and extract it.
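A minimal sketch of the download and extraction, assuming the /opt/bigdata layout used throughout this article (the Lightbend URL is the standard Scala distribution location):
wget https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz
tar -zxvf scala-2.12.8.tgz -C /opt/bigdata/scala/
# point the "default" symlink used by SCALA_HOME at this release
ln -s /opt/bigdata/scala/scala-2.12.8 /opt/bigdata/scala/default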
Configure the environment
# scala
export SCALA_HOME=/opt/bigdata/scala/default
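To apply the export and verify the install (assuming the line above was added to /etc/profile):
source /etc/profile
scala -version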
Spark configuration
Spark download
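The prebuilt 2.4.3 package for Hadoop 2.7 can be fetched from the Apache archive, for example:
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz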
- Extract
tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz -C ./
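The configuration below uses SPARK_HOME=/opt/bigdata/spark/default, so one option (an assumption, mirroring the scala/default layout) is to point a symlink at the extracted directory:
ln -s /opt/bigdata/spark/spark-2.4.3-bin-hadoop2.7 /opt/bigdata/spark/default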
Spark configuration files
- spark-env.sh configuration
cp spark-env.sh.template spark-env.sh
vim spark-env.sh and add the following:
export JAVA_HOME=/usr/local/java_1.8.0_121
# Scala environment variable
export SCALA_HOME=/opt/bigdata/scala/default
# Hadoop path
export HADOOP_HOME=/opt/bigdata/hadoop/default
# Hadoop configuration directory
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_YARN_USER_ENV=${HADOOP_CONF_DIR}
export SPARK_HOME=/opt/bigdata/spark/default
export HIVE_HOME=/opt/bigdata/hive/default
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=${JAVA_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${SPARK_HOME}/bin:${HIVE_HOME}/bin:$PATH
- spark-defaults.conf configuration
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf and add the following. Spark job logs are collected to HDFS:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://ecs-6531-0002.novalocal:9000/tmp/spark/eventLogs
spark.eventLog.compress true
# default serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
# deploy mode: yarn
spark.master yarn
# default number of driver cores
spark.driver.cores 1
# default driver memory
spark.driver.memory 800m
# default number of executor cores
spark.executor.cores 1
# default executor memory
spark.executor.memory 1000m
# default number of executor instances
spark.executor.instances 1
# hive warehouse location
spark.sql.warehouse.dir hdfs://ecs-6531-0002.novalocal:9000/user/root/warehouse
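The event log directory must already exist on HDFS before jobs run, so create it up front (the path is taken from spark.eventLog.dir above):
hdfs dfs -mkdir -p /tmp/spark/eventLogs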
- Copy hive-site.xml into Spark's conf directory, since Spark needs to connect to Hive
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml
Environment configuration
- vim /etc/profile and append the following:
# spark configuration
export SPARK_YARN_USER_ENV=${HADOOP_CONF_DIR}
export SPARK_HOME=/opt/bigdata/spark/default
export PATH=${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
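Then reload the profile so the new variables take effect in the current shell:
source /etc/profile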
Copy the jar
- Copy the Spark shuffle-on-YARN jar into YARN's directory (this is needed on every NodeManager node, since the shuffle service runs inside the NodeManager)
cp /opt/bigdata/spark/spark-2.4.3-bin-hadoop2.7/yarn/spark-2.4.3-yarn-shuffle.jar /opt/bigdata/hadoop/hadoop-3.2.0/share/hadoop/yarn/
YARN configuration
- Configure yarn-site.xml
spark_shuffle needs to be added to yarn.nodemanager.aux-services:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
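Per the Spark docs for running the external shuffle service on YARN, the service class must also be declared alongside the entry above; a sketch:
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>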
- Restart YARN
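For example, with the scripts shipped in Hadoop's sbin directory:
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh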
Spark startup test
- Simply type spark-sql to start it
- Test a query against Hive
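For example, a quick smoke test (your_table is a placeholder for whatever exists in your metastore):
spark-sql -e "show databases;"
spark-sql -e "select count(*) from your_table;"  # your_table is hypothetical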
- To submit a job with Spark, just spark-submit the jar
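A minimal sketch using the SparkPi example bundled with the distribution (the 2.4.3 prebuilt package is compiled against Scala 2.11, hence the jar name):
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.3.jar 100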