Deploying Spark in yarn-cluster Mode in Production, with Configurable Parameters
Spark supports two modes for running on YARN: yarn-client and yarn-cluster. yarn-cluster is generally the right choice for production, while yarn-client is better suited to interactive and debugging sessions.
Note: this assumes an existing Hadoop cluster that can run jobs on YARN.
Spark yarn-cluster vs. yarn-client
Step 1: Download the Spark tarball
Spark download page: https://spark.apache.org/downloads.html
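The download step can be scripted. A minimal sketch, assuming version 2.4.8 with the hadoop2.7 build and the /Application/local install path used later in this post (all three are examples — pick the build that matches your cluster):

```shell
# Sketch: fetch and unpack a Spark release. Version, Hadoop profile, and
# install path are assumed examples, not prescriptions.
SPARK_VERSION=2.4.8
HADOOP_PROFILE=hadoop2.7
TARBALL="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
echo "Download URL: ${URL}"
# The actual download/unpack (commented out so the sketch runs offline):
# wget "${URL}"
# tar -xzf "${TARBALL}" -C /Application/local/
# ln -s "/Application/local/spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}" /Application/local/spark
```

The symlink keeps later scripts version-independent: they can refer to /Application/local/spark regardless of which release is installed.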
Step 2: Edit the spark-env.sh file
Add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_144
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=fdc08:2181,fdc09:2181,fdc10:2181 -Dspark.deploy.zookeeper.dir=/spark"
### Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
#export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
#export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_DIR=/var/run/spark/work
export SPARK_LOG_DIR=/var/log/spark
export SPARK_PID_DIR='/var/run/spark/'
export SPARK_LOCAL_DIRS=/data/spark/tmp
#export SPARK_WORKER_CORES=7
#export SPARK_WORKER_MEMORY=42g
Step 3: The spark-submit shell command
Running in yarn-cluster mode:
/Application/local/spark/bin/spark-submit \
--master yarn \
--name MainFabIndicatorErrorReportService \
--deploy-mode cluster \
--queue root.root \
--driver-cores 1 \
--driver-memory 10G \
--driver-java-options "-XX:PermSize=256M -XX:MaxPermSize=256M" \
--num-executors 15 \
--executor-cores 10 \
--executor-memory 40G \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.storage.memoryFraction=0.5 \
--conf spark.storage.unrollFraction=0.3 \
--conf spark.cleaner.ttl=-1 \
--conf spark.sql.shuffle.partitions=2000 \
--conf spark.network.timeout=1800s \
--conf spark.yarn.submit.waitAppCompletion=false \
--conf "spark.executor.extraJavaOptions=-XX:PermSize=128m -XX:MaxPermSize=128m -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--class com.gosun.execute.Analysis \
--files /data/conf/MainFabSparkReport.properties \
--jars ${jars} MainFabSparkReport.properties >/dev/null 2>&1
Note: adapt these two items to your setup:
1: --files /data/conf/MainFabSparkReport.properties (the path to your own properties file)
2: pass MainFabSparkReport.properties as the program argument
Running in yarn-client mode:
/Application/local/spark/bin/spark-submit \
--master yarn \
--name analysis_${yesterday} \
--deploy-mode client \
--queue root.root \
--driver-cores 1 \
--driver-memory 10G \
--driver-java-options "-XX:PermSize=256M -XX:MaxPermSize=256M" \
--num-executors 15 \
--executor-cores 10 \
--executor-memory 40G \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.storage.memoryFraction=0.5 \
--conf spark.storage.unrollFraction=0.3 \
--conf spark.cleaner.ttl=-1 \
--conf spark.sql.shuffle.partitions=2000 \
--conf spark.network.timeout=1800s \
--conf spark.yarn.submit.waitAppCompletion=false \
--conf "spark.executor.extraJavaOptions=-XX:PermSize=128m -XX:MaxPermSize=128m -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--class com.gosun.execute.Analysis \
--files /data/conf/MainFabSparkReport.properties \
--jars ${jars} MainFabSparkReport.properties >/dev/null 2>&1
Run the shell command above to launch the Spark job.
Reading an external configuration file in Spark to make parameters configurable
Spark code demo:
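A minimal sketch of how the driver class could load the properties file that spark-submit ships with --files. In yarn-cluster mode, a file passed via --files lands in the container's working directory, so the driver can open it by its bare file name — which is why the command above passes MainFabSparkReport.properties as the program argument. The class name PropertiesDemo and the sample keys are assumptions for illustration, not the post's actual code:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Properties;

// Sketch only: class name and property keys are made up for illustration.
public class PropertiesDemo {
    // Load key/value pairs from a .properties file into a Properties object.
    static Properties loadConfig(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Create a sample file so the sketch runs stand-alone; in the real
        // job, args[0] would be "MainFabSparkReport.properties".
        File sample = File.createTempFile("MainFabSparkReport", ".properties");
        try (PrintWriter w = new PrintWriter(sample)) {
            w.println("spark.queue=root.root");     // example key
            w.println("output.path=/data/result");  // example key
        }
        String path = args.length > 0 ? args[0] : sample.getAbsolutePath();
        Properties conf = loadConfig(path);
        System.out.println("queue = " + conf.getProperty("spark.queue"));
    }
}
```

Because the job reads its parameters from this file rather than from hard-coded constants, changing behavior only requires editing the properties file and resubmitting — no recompile.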
Shell deployment script:
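The long spark-submit command can be wrapped in a script so that the properties file, deploy mode, and queue are themselves configurable. A sketch — the defaults and paths are examples taken from the command above, and the real submit line is commented out so the sketch runs without a cluster:

```shell
#!/bin/bash
# Sketch of a parameterized submit wrapper; defaults and paths are examples.
SPARK_HOME="${SPARK_HOME:-/Application/local/spark}"
PROPS_FILE="${1:-/data/conf/MainFabSparkReport.properties}"
DEPLOY_MODE="${2:-cluster}"   # cluster or client
QUEUE="${3:-root.root}"

echo "submit: mode=${DEPLOY_MODE} queue=${QUEUE} props=${PROPS_FILE}"
# The actual submit (commented out here); ${jars} is the dependency list
# from the original command:
# "${SPARK_HOME}/bin/spark-submit" \
#   --master yarn --deploy-mode "${DEPLOY_MODE}" --queue "${QUEUE}" \
#   --class com.gosun.execute.Analysis \
#   --files "${PROPS_FILE}" \
#   --jars ${jars} "$(basename "${PROPS_FILE}")" >/dev/null 2>&1
```

Invoking it as `./submit.sh /data/conf/other.properties client` would then switch both the config file and the deploy mode without editing the script.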