yarn-session.sh -m yarn-cluster or flink run -m yarn-cluster examples/batch/WordCount.jar fails with an error
Background:
Running either of the following two commands fails:
yarn-session.sh -m yarn-cluster
flink run -m yarn-cluster examples/batch/WordCount.jar
The error output is as follows:
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1679471041834_0001 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1679471041834_0001_000001 exited with exitCode: 127
Failing this attempt.Diagnostics: [2023-03-22 15:44:47.568]Exception from container-launch.
Container id: container_1679471041834_0001_01_000001
Exit code: 127
[2023-03-22 15:44:47.597]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2023-03-22 15:44:47.597]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
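Note that prelaunch.err is empty here, so the exit code is the main clue. Exit code 127 is the status a shell returns when it cannot find the command it was asked to run. The following is a minimal sketch that reproduces that status outside YARN, assuming bash. The path /nonexistent/jdk is a deliberately invalid placeholder, not a value taken from the failing cluster.

```shell
#!/usr/bin/env bash
# Try to execute a java binary under a JAVA_HOME that does not exist.
# /nonexistent/jdk is a made-up path used only to trigger the failure.
BAD_JAVA_HOME=/nonexistent/jdk

status=0
"${BAD_JAVA_HOME}/bin/java" -version 2>/dev/null || status=$?

# The shell reports 127 when the command cannot be found.
echo "exit status: ${status}"
```

If a container launch script hits the same situation, YARN would see the same non-zero status, which is why a missing or misconfigured binary path is worth checking first.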
Troubleshooting:
Check that Hadoop is healthy:
hdfs dfsadmin -report
Run a YARN test job:
yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar pi 1 1
If both of the checks above succeed, continue with the following steps.
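The two health checks above can be wrapped in a small helper so the result is obvious at a glance. This is only a sketch: run_check is a hypothetical helper name, and the script assumes HADOOP_HOME is set and that the examples jar version matches the one used above.

```shell
#!/usr/bin/env bash
# run_check LABEL COMMAND [ARGS...]
# Runs the command quietly and prints whether it exited with status 0.
run_check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK: ${label}"
  else
    echo "FAILED: ${label}"
    return 1
  fi
}

# Only call the Hadoop CLIs when they are actually on PATH.
if command -v hdfs >/dev/null 2>&1 && command -v yarn >/dev/null 2>&1; then
  run_check "HDFS report" hdfs dfsadmin -report || true
  run_check "YARN pi job" yarn jar \
    "${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar" pi 1 1 || true
fi
```

The helper hides the command output, so rerun a failing command by hand to see its details.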
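Open the YARN web UI, which is usually on port 8088 of the ResourceManager node (often the first node).
Click FAILED, then click the ID of the failed application.
Then click Logs.
Open each of the log files listed under Local Logs; the exception details are usually recorded there.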
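The same logs can often be fetched from the command line instead of the web UI. The sketch below pulls the application ID out of the diagnostic text and passes it to yarn logs. It assumes log aggregation is enabled on the cluster; extract_app_id is a hypothetical helper name, and the sample text is a shortened copy of the error message above.

```shell
#!/usr/bin/env bash
# extract_app_id: read diagnostic text on stdin and print the first
# YARN application ID found in it.
extract_app_id() {
  grep -o 'application_[0-9]\{1,\}_[0-9]\{1,\}' | head -n 1
}

diag='Diagnostics from YARN: Application application_1679471041834_0001 failed 1 times'
app_id=$(printf '%s\n' "${diag}" | extract_app_id)
echo "${app_id}"

# Fetch the aggregated container logs if the yarn CLI is available.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "${app_id}" || true
fi
```

If log aggregation is off, the container logs stay on the NodeManager host that ran the container, and the web UI route described above is the easier way to reach them.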
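In my case, the JAVA_HOME setting in hadoop-env.sh was wrong, so the program could not run. This matches exit code 127, which a shell returns when it cannot find the command it is asked to execute.
The correct setting points JAVA_HOME at the JDK installation directory that actually exists on the cluster nodes.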
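The exact value used in the original setup is not shown, so the following is only a sketch of how to verify a candidate value before putting it in hadoop-env.sh (normally found under ${HADOOP_HOME}/etc/hadoop). check_java_home is a hypothetical helper name, and the path in the comment is a placeholder that must be replaced with the real JDK directory on your nodes.

```shell
#!/usr/bin/env bash
# check_java_home DIR
# Succeeds only if DIR contains an executable bin/java.
check_java_home() {
  local dir=$1
  if [ -n "${dir}" ] && [ -x "${dir}/bin/java" ]; then
    echo "valid JAVA_HOME: ${dir}"
  else
    echo "invalid JAVA_HOME: ${dir:-<unset>}"
    return 1
  fi
}

# Example line for hadoop-env.sh. The path is a placeholder; replace it
# with the JDK directory that actually exists on your nodes.
#   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Check whatever JAVA_HOME is set in the current shell, if any.
check_java_home "${JAVA_HOME:-}" || true
```

Running the check on every node is worthwhile, because the containers are launched on the NodeManager hosts, not on the machine where the flink command is typed.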
After fixing hadoop-env.sh, restart the Hadoop cluster.
The following commands should then run normally:
yarn-session.sh -m yarn-cluster
flink run -m yarn-cluster examples/batch/WordCount.jar
This post focuses on the process and reasoning behind the troubleshooting; I hope it helps.