Symptom: YARN jobs fail immediately on startup
Full error:
ERROR localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(920)) - Failed to submit rsrc { { hdfs://ha/user/*.jar, 1600077673107, FILE, null },pending,[(container_e725_1601104649135_299638_01_000002)],604883556399904,FAILED} for download. Either queue is full or threadpool is shutdown.
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@44d95a0e rejected from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@38bc8a14[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5160]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:899)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:777)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:719)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
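Solution:
- Check the ResourceManager log to locate the affected NodeManager, then check that NodeManager's log to find the error above. A Google search showed that 3.1.4 needs the patch from https://issues.apache.org/jira/browse/YARN-9968
- Restart the YARN cluster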
Deeper analysis: the cluster's NameNode and DataNodes had been restarted earlier, but YARN was not restarted afterwards.
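
The pool state in the trace ([Terminated, pool size = 0, ...]) suggests that the executor behind the public localizer had already shut down, so every later download request was rejected until the NodeManager was restarted. The following is a minimal sketch of that rejection behaviour using only the JDK. It does not use any Hadoop classes, the class and method names (TerminatedPoolDemo, submitAfterShutdownIsRejected) are hypothetical, and it does not reproduce how the NodeManager's pool came to be terminated in the first place.

```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class TerminatedPoolDemo {

    // Returns true if a task submitted to an already-terminated pool is rejected.
    static boolean submitAfterShutdownIsRejected() {
        // Stand-in for the download pool; the size 4 is an arbitrary assumption.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ExecutorCompletionService<String> queue = new ExecutorCompletionService<>(pool);

        // Simulate the pool having been shut down earlier.
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        try {
            // Same call path as in the trace: ExecutorCompletionService.submit
            // -> ThreadPoolExecutor.execute -> AbortPolicy.rejectedExecution.
            queue.submit(() -> "downloaded");
            return false;
        } catch (RejectedExecutionException e) {
            System.out.println("Rejected: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + submitAfterShutdownIsRejected());
    }
}
```

A terminated ThreadPoolExecutor cannot be brought back to life; the owning process has to create a new one. That is consistent with the restart of YARN clearing the error in the note above.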