Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException

This post records the troubleshooting of a failed Hadoop MapReduce job. The cause turned out to be an unverified yarn.application.classpath setting in yarn-site.xml; commenting out that property fixed the problem.

I had just finished setting up a new cluster and ran

hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 10 100

as a quick test, only to hit the following error:

2014-08-24 02:24:02,456 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1408860487557_0002 failed 2 times due to AM Container for appattempt_1408860487557_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2014-08-24 02:24:02,457 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1408860487557_0002 State change from FINAL_SAVING to FAILED
2014-08-24 02:24:02,457 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=mapred   OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1408860487557_0002 failed 2 times due to AM Container for appattempt_1408860487557_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
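
The RM log above only records that the AM container exited with code 1; the real error lives in the AM container's own logs. Assuming log aggregation is enabled (otherwise one has to read the container directory on the NodeManager directly), they can be pulled with the application ID from the output above:

yarn logs -applicationId application_1408860487557_0002

In failures like this, the AM's stderr usually shows a ClassNotFoundException or NoClassDefFoundError for core Hadoop classes (often the MRAppMaster main class itself), which points straight at a broken container classpath.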

I remembered that yarn-site.xml used a parameter I had never configured before:

<property>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
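
To see what this value actually turns into inside a container, one can inspect the launch script the NodeManager writes for each container. The path below is only a sketch assuming the default yarn.nodemanager.local-dirs under hadoop.tmp.dir; setting yarn.nodemanager.delete.debug-delay-sec (e.g. to 600) keeps the script around after the container exits:

grep CLASSPATH /tmp/hadoop-*/nm-local-dir/usercache/*/appcache/application_*/container_*/launch_container.sh

The $HADOOP_*_HOME references in the value are only resolved when that script runs, so each of them must be available in the NodeManager's environment (e.g. exported in yarn-env.sh); any that are missing stay unexpanded in the container's CLASSPATH.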


After commenting out this property and re-running the MR job, the error was gone, so it is clearly a classpath problem. I'm noting it down for now and will dig into the exact cause later.
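
A plausible explanation, not yet verified: with a tarball install like /opt/hadoop-2.4.1, the jars live under $HADOOP_HOME/share/hadoop/<module>, so entries such as $HADOOP_COMMON_HOME/* point one directory level too high and the common/HDFS/YARN jars never make it onto the container classpath; the line breaks and indentation inside the value may also leak into the classpath verbatim, depending on the version. If the property is needed at all, a safer sketch is a single-line value close to the built-in default from yarn-default.xml:

<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>

Leaving the property unset makes YARN fall back to that default anyway, which is effectively what commenting it out did.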
