YARN job fails with "Could not get pid for container_****"

When I submitted the MapReduce example job to YARN, it sat at map 0% reduce 0% and then failed with:
Job job_** failed with state FAILED due to: Application application_*** failed 2 times due to AM Container for appattempt_*** exited with exitCode: -1. The full client-side output is below.
Reading the diagnostics, the container_*** directory does not exist.

19/04/17 16:49:06 INFO mapreduce.Job:  map 0% reduce 0%
19/04/17 16:49:06 INFO mapreduce.Job: Job job_1555490729473_0002 failed with state FAILED due to: Application application_1555490729473_0002 failed 2 times due to AM Container for appattempt_1555490729473_0002_000002 exited with  exitCode: -1
For more detailed output, check application tracking page:http://suddev-PC:8088/proxy/application_1555490729473_0002/Then, click on links to logs of each attempt.
Diagnostics: File /home/suddev/dev/bd/app/tmp/nm-local-dir/usercache/suddev/appcache/application_1555490729473_0002/container_1555490729473_0002_02_000001 does not exist
Failing this attempt. Failing the application.
19/04/17 16:49:06 INFO mapreduce.Job: Counters: 0
Job Finished in 9.091 seconds
java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/suddev/QuasiMonteCarlo_1555490936334_196215618/out/reduce-out
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
	at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
	at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Next I checked the YARN NodeManager log:

2019-04-17 16:46:11,609 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Failed to launch container.
java.io.FileNotFoundException: File /home/suddev/dev/bd/app/tmp/nm-local-dir/usercache/suddev/appcache/application_1555490729473_0001/container_1555490729473_0001_02_000001 does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:513)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:161)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2019-04-17 16:46:11,610 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1555490729473_0001_02_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2019-04-17 16:46:11,610 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1555490729473_0001_02_000001
2019-04-17 16:46:12,914 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1555490729473_0001_02_000001
2019-04-17 16:46:13,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Could not get pid for container_1555490729473_0001_02_000001. Waited for 2000 ms.
2019-04-17 16:46:13,727 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-suddev/nm-local-dir/usercache/suddev/appcache/application_1555490729473_0001/container_1555490729473_0001_02_000001

I went to the directory and the file really was missing. After much trial and error, I found that the paths configured by YARN's yarn.nodemanager.local-dirs and Hadoop's hadoop.tmp.dir did not match: note that the NodeManager launches the container under /home/suddev/dev/bd/app/tmp/nm-local-dir but cleans up under /tmp/hadoop-suddev/nm-local-dir.
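Before applying the fix, the mismatch can be confirmed mechanically. Below is a minimal sketch (not from the original post) that parses the two site files with Python's standard library and compares the two properties; the helper names `get_prop` and `check_dirs` and the inline demo snippets are my own, mirroring the broken setup described above.

```python
import xml.etree.ElementTree as ET

def get_prop(xml_text, name):
    """Return the <value> of a named <property> in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

def check_dirs(hdfs_site_xml, yarn_site_xml):
    """Compare hadoop.tmp.dir against yarn.nodemanager.local-dirs."""
    tmp_dir = get_prop(hdfs_site_xml, "hadoop.tmp.dir")
    local_dirs = get_prop(yarn_site_xml, "yarn.nodemanager.local-dirs")
    return tmp_dir, local_dirs, tmp_dir == local_dirs

# Demo with inline snippets mirroring the mismatch in this post's logs:
hdfs_site = """<configuration>
  <property><name>hadoop.tmp.dir</name>
  <value>/home/suddev/dev/bd/app/tmp</value></property>
</configuration>"""
yarn_site = """<configuration>
  <property><name>yarn.nodemanager.local-dirs</name>
  <value>/tmp/hadoop-suddev/nm-local-dir</value></property>
</configuration>"""

tmp_dir, local_dirs, ok = check_dirs(hdfs_site, yarn_site)
print("hadoop.tmp.dir              =", tmp_dir)
print("yarn.nodemanager.local-dirs =", local_dirs)
print("consistent" if ok else "MISMATCH -- containers may vanish at launch")
```

On a live cluster, `hdfs getconf -confKey hadoop.tmp.dir` should print the effective value directly, which avoids guessing which site file wins.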

Solution

Set the hadoop.tmp.dir property in hdfs-site.xml and the yarn.nodemanager.local-dirs property in yarn-site.xml to the same path. (Note that hadoop.tmp.dir is conventionally defined in core-site.xml; if yours lives there, edit it there instead.)
Example:
hdfs-site.xml

<property>
        <name>hadoop.tmp.dir</name>
        <value>/home/suddev/dev/bd/app/tmp</value>
</property>

yarn-site.xml

<property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/suddev/dev/bd/app/tmp</value>
</property>

Then restart HDFS and YARN, and the job runs normally:

./stop-dfs.sh
./stop-yarn.sh
./start-dfs.sh
./start-yarn.sh