Problem:
19/12/24 02:38:57 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 8 for reason Container marked as failed: container_e103_1576150833936_207084_02_000009 on host: xydw35. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e103_1576150833936_207084_02_000009
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
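The affected node is named in the log line itself ("on host: xydw35"). When many executors fail, a small filter can collect every host that hit this launch failure. This is a minimal sketch that assumes the driver log lines look exactly like the one above. The function name failed_hosts is hypothetical and not part of Spark or YARN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Spark driver log on stdin and print each host
# whose container was "marked as failed" with exit status exactly 1.
# Assumes the line format shown in the log above; other formats are ignored.
failed_hosts() {
  # Capture the token between "on host: " and ". Exit status: 1." and
  # print it only for matching lines; exit codes such as 137 do not match.
  sed -nE 's/.*Container marked as failed: [^ ]+ on host: ([^ ]+)\. Exit status: 1\..*/\1/p' \
    | sort -u   # one line per distinct host
}
```

Usage: `failed_hosts < driver.log`, where driver.log is a placeholder for wherever the driver output was saved. Each host printed is a candidate for the cleanup below.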
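Cause: conflicting leftover files. Stale jar packages and other files left over from earlier runs on this node cause the failure, and cleaning them up resolves it.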
Solution:
1. Stop the NodeManager role on this node.
2. Clear the cached files:
find /data*/yarn/nm/ -name '*.jar' -prune -print0 | xargs -0 rm -rf
rm -rf /var/lib/hadoop-yarn/yarn-nm-recovery-new/*
# One more step may be needed: restart the agent.
3. Start the NodeManager role on this node again.
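Step 2 can be wrapped in a function that lists what it would delete before deleting anything. This is a minimal sketch, not a tested operational tool. The function name clean_nm_cache and the DRY_RUN variable are hypothetical, and the paths passed in are assumed to be this cluster's NodeManager local roots and recovery directory. Run it only while the NodeManager is stopped (step 1).

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 2. Arguments: the NodeManager recovery
# directory, then one or more NodeManager local roots (e.g. /data1/yarn/nm).
# DRY_RUN=1 (the default) only prints; DRY_RUN=0 actually deletes.
clean_nm_cache() {
  local recovery_dir=$1; shift
  local mode=${DRY_RUN:-1}
  local root
  for root in "$@"; do
    [ -d "$root" ] || continue        # skip roots that do not exist on this node
    if [ "$mode" = 1 ]; then
      # '*.jar' is quoted so the shell does not expand it in the current directory;
      # -prune stops find from descending into directories named *.jar.
      find "$root" -name '*.jar' -prune -print
    else
      find "$root" -name '*.jar' -prune -exec rm -rf {} +
    fi
  done
  if [ -d "$recovery_dir" ]; then
    # Remove every entry inside the recovery directory (hidden ones too),
    # but keep the directory itself.
    if [ "$mode" = 1 ]; then
      find "$recovery_dir" -mindepth 1 -maxdepth 1 -print
    else
      find "$recovery_dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
    fi
  fi
}
```

Usage: review the output of `DRY_RUN=1 clean_nm_cache /var/lib/hadoop-yarn/yarn-nm-recovery-new /data*/yarn/nm` first, then rerun it with `DRY_RUN=0`. The unquoted /data*/yarn/nm glob is left for the shell to expand into one argument per data disk.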