Problems encountered when running a Spark job in YARN mode
I. Problem description
1. Spark job submission script
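#!/bin/sh
# Note: --total-executor-cores applies only to standalone and Mesos masters;
# on YARN, executor resources are set with --num-executors and --executor-cores.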
/home/hadoop/spark-2.1-hadoop2.6/bin/spark-submit \
--class cn.xx.bigdata.test.xxAppAcessLog \
--master yarn \
--deploy-mode cluster \
--executor-memory 1g \
--total-executor-cores 2 \
/home/hadoop/cn.xx.bigdata-1.0-SNAPSHOT.jar \
hdfs://hdp-xx-01:8020/access.log \
hdfs://hdp-xx-01:8020/out00005
2. Exception log
20/05/24 16:39:55 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1590309594170
final status: UNDEFINED
tracking URL: http://hdp-xx-01:8088/proxy/application_1590301112271_0006/
user: hadoop
20/05/24 16:39:56 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:39:57 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:39:58 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:39:59 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:00 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:01 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:02 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:03 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:04 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:05 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:06 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:07 INFO Client: Application report for application_1590301112271_0006 (state: ACCEPTED)
20/05/24 16:40:08 INFO Client: Application report for application_1590301112271_0006 (state: FAILED)
20/05/24 16:40:08 INFO Client:
client token: N/A
diagnostics: Application application_1590301112271_0006 failed 2 times due to AM Container for appattempt_1590301112271_0006_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdp-qm-01:8088/cluster/app/application_1590301112271_0006Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-9dcafed9-eec4-4c72-b397-838ffb0acd88/__spark_libs__4666794781743350229.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-9dcafed9-eec4-4c72-b397-838ffb0acd88/__spark_libs__4666794781743350229.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1590309594170
final status: FAILED
tracking URL: http://hdp-xx-01:8088/cluster/app/application_1590301112271_0006
user: hadoop
20/05/24 16:40:08 INFO Client: Deleted staging directory file:/home/hadoop/.sparkStaging/application_1590301112271_0006
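The client output above only summarizes the failure. As a rough sketch, the application ID can be pulled out of the saved spark-submit output and passed to `yarn logs` to look for more detail. The function name `extract_app_id` and the file name `submit.log` are hypothetical, and `yarn logs` only returns container logs when log aggregation is enabled on the cluster.

```shell
#!/bin/sh
# Print the first YARN application ID found in spark-submit output read from stdin.
# extract_app_id is a hypothetical helper name, not part of Spark or Hadoop.
extract_app_id() {
    sed -n 's/.*\(application_[0-9][0-9]*_[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example usage (needs the yarn CLI on the PATH; submit.log is an assumed file name):
#   app_id=$(extract_app_id < submit.log)
#   yarn logs -applicationId "$app_id"
```

Because the AM container here failed during resource localization (exitCode -1000), the aggregated logs may be sparse; the NodeManager log on the failing node is another place to look.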
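II. Troubleshooting process
1. Checking the spark-env.sh configuration file showed that HADOOP_HOME was wrong: it pointed to /home/hadoop/hadoop-2.6.0. Change it to the Hadoop version actually installed on the cluster; in my case that is hadoop-2.7.2. (Remember to distribute the updated file to every node in the cluster.)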
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.2/etc/hadoop
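Before resubmitting, it can help to confirm that the directory named in HADOOP_HOME really exists and contains the configuration files. The following is a minimal sketch of such a check; `check_hadoop_env` is a hypothetical helper, and the choice of core-site.xml and yarn-site.xml as the files to look for is an assumption about what a usable configuration directory should contain.

```shell
#!/bin/sh
# check_hadoop_env HOME_DIR
# Succeeds only if HOME_DIR exists and HOME_DIR/etc/hadoop holds the expected
# site files. Hypothetical helper for illustration only.
check_hadoop_env() {
    che_home="$1"
    che_conf="$che_home/etc/hadoop"
    if [ ! -d "$che_home" ]; then
        echo "missing HADOOP_HOME: $che_home"
        return 1
    fi
    for che_file in core-site.xml yarn-site.xml; do
        if [ ! -f "$che_conf/$che_file" ]; then
            echo "missing $che_conf/$che_file"
            return 1
        fi
    done
    echo "ok: $che_conf"
}

# Example usage with the path from this post:
#   check_hadoop_env /home/hadoop/hadoop-2.7.2
```

Running the same check on each node is one way to confirm that the updated spark-env.sh points at a valid installation everywhere.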
2. Resubmitted the Spark job for testing; the log showed the same error
....................
Diagnostics: File file:/tmp/spark-9dcafed9-eec4-4c72-b397-838ffb0acd88/__spark_libs__4666794781743350229.zip does not exist
............................
3. Found that the job's source code hard-codes a standalone master
Before: standalone mode
val conf = new SparkConf().setAppName("AppAcessLog").setMaster("spark://hdp-qm-01:7077")
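After: set the master to yarn, or leave it unset so that the --master option passed to spark-submit takes effect (a master set on SparkConf in code takes precedence over the spark-submit flag)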
val conf = new SparkConf().setAppName("AppAcessLog")
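A hard-coded master like the one above is easy to miss when a project has several job classes. As a sketch, the source tree can be searched for `setMaster(` calls before packaging the jar. The helper name `find_hardcoded_master` and the `src` directory in the usage line are assumptions about the project layout.

```shell
#!/bin/sh
# find_hardcoded_master DIR
# List every setMaster( call in the .scala files under DIR as file:line:text.
# /dev/null is passed as an extra file so grep always prints the file name.
# Hypothetical helper for illustration only.
find_hardcoded_master() {
    find "$1" -name '*.scala' -exec grep -n 'setMaster(' /dev/null {} +
}

# Example usage (src is an assumed source directory):
#   find_hardcoded_master src
```

Any line it reports is a candidate for removal, or for replacement with a value read from configuration, so that the master chosen at submit time is the one actually used.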
4. Resubmitted the Spark job for testing; this time it succeeded.
20/05/24 17:10:17 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.80.111
ApplicationMaster RPC port: 0
queue: default
start time: 1590311384675
final status: SUCCEEDED
tracking URL: http://hdp-xx-01:8088/proxy/application_1590311071085_0001/
user: hadoop
20/05/24 17:10:17 INFO util.ShutdownHookManager: Shutdown hook called
20/05/24 17:10:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38d2cf56-faf6-4027-904b-b9573c049a1e