Configure yarn-site.xml
Edit yarn-site.xml, add the following settings, then distribute the file to all nodes and restart YARN.
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain aggregated logs for 7 days (604800 seconds) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<!-- Log server -->
<property>
<name>yarn.log.server.url</name>
<!-- MR JobHistory server URL; use the host the JobHistory server actually runs on, not localhost -->
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
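As a quick sanity check, 604800 seconds is exactly 7 days. A small Python sketch (parsing an inline copy of the fragment above, since the real file lives on the cluster) shows how to verify the two log-aggregation properties:

```python
import xml.etree.ElementTree as ET

# Inline copy of the yarn-site.xml fragment above (on a real node you
# would read /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml instead).
YARN_SITE = """
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
"""

root = ET.fromstring(YARN_SITE)
# Build {property name: property value} from every <property> element.
props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

assert props["yarn.log-aggregation-enable"] == "true"
# 604800 s = 7 days * 24 h * 3600 s
assert int(props["yarn.log-aggregation.retain-seconds"]) == 7 * 24 * 3600
print("log-aggregation settings look correct")
```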
Configure spark-defaults.conf
Edit spark-defaults.conf, add the following settings, and distribute the file to all nodes.
# Enable Spark event logging
spark.eventLog.enabled true
# HDFS path where Spark event logs are stored
spark.eventLog.dir hdfs://hadoop102:9000/spark/eventLog
# Tell YARN where to redirect for the Spark history server
spark.yarn.historyServer.address hadoop102:18080
# Port of the Spark history server UI
spark.history.ui.port 18080
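Each line of spark-defaults.conf is a key and a value separated by whitespace, with `#` starting a comment. A minimal Python sketch of that parsing rule, run against the fragment above:

```python
# Inline copy of the spark-defaults.conf fragment above.
CONF = """
# Enable Spark event logging
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:9000/spark/eventLog
spark.yarn.historyServer.address hadoop102:18080
spark.history.ui.port 18080
"""

def parse_defaults(text):
    """Parse 'key value' lines; skip blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)  # split on first whitespace run
        settings[key] = value.strip()
    return settings

settings = parse_defaults(CONF)
assert settings["spark.eventLog.enabled"] == "true"
assert settings["spark.yarn.historyServer.address"] == "hadoop102:18080"
```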
Configure spark-env.sh
Edit spark-env.sh, add the following settings, and distribute the file to all nodes.
# History server port, number of retained applications, and event log location
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=30 \
-Dspark.history.fs.logDirectory=hdfs://hadoop102:9000/spark/eventLog"
# Path to the YARN configuration files, required for Spark on YARN
YARN_CONF_DIR=/opt/module/hadoop-2.7.2/etc/hadoop
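Note that `spark.history.fs.logDirectory` (where the history server reads) must match `spark.eventLog.dir` (where jobs write), or finished applications will never show up in the UI. A small Python sketch of that consistency check, using the values copied from the two configs above:

```python
from urllib.parse import urlparse

# Values copied from the two config files above.
event_log_dir = "hdfs://hadoop102:9000/spark/eventLog"    # spark-defaults.conf
history_log_dir = "hdfs://hadoop102:9000/spark/eventLog"  # SPARK_HISTORY_OPTS

# Compare scheme, host:port, and path so e.g. a trailing-slash or
# wrong-port mismatch is caught before deploying.
a, b = urlparse(event_log_dir), urlparse(history_log_dir)
dirs_match = (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
assert dirs_match, "history server would read a different directory than jobs write to"
print("eventLog.dir and fs.logDirectory match")
```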
Start the MR History Server
sbin/mr-jobhistory-daemon.sh start historyserver
If the MR history server is not running, the aggregated logs on HDFS cannot be viewed and you get: java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
Test
With the steps above complete, restart YARN and run a Spark test job. Afterwards, the History link on the YARN web UI should redirect successfully to the Spark history server UI.
bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--num-executors 2 \
./examples/jars/spark-examples_2.11-2.4.5.jar \
100
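For reference, SparkPi estimates π by Monte Carlo sampling: throw random points into the unit square and count the fraction that lands inside the quarter circle. A plain-Python sketch of the same computation (no cluster needed; sample count and seed are arbitrary choices here):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo pi estimate: 4 * (points inside the unit
    circle) / (points thrown into the unit square)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

pi_est = estimate_pi(100_000)
print(f"Pi is roughly {pi_est}")
```

The spark-submit argument `100` plays the same role as `samples` here: it controls how many partitions of random points SparkPi generates, so larger values give a tighter estimate.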
Click through to view the logs; they display correctly.