Recently the NodeManagers (NM) in our production cluster have been crashing. The error log shows:
2014-06-19 00:01:22,308 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Error: Shutting down
java.util.ConcurrentModificationException
    at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
    at java.util.LinkedList$ListItr.next(LinkedList.java:696)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource.toString(LocalizedResource.java:120)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:656)
2014-06-19 00:01:22,308 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2014-06-19 00:03:40,685 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://bipcluster/tmp/hive-hdfs/hive_2014-06-19_00-05-51_049_5891972191087895437/-mr-10004/a1495555-b0dc-4356-8b68-1c881012e123, 1403107405580, FILE, null }
2014-06-19 00:03:40,685 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.util.concurrent.RejectedExecutionException
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
    at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:618)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:514)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:456)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
2014-06-19 00:03:40,685 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.
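The NM exits abnormally because of a concurrent-update problem between threads during resource localization. Judging from the log, the PublicLocalizer thread first dies on a ConcurrentModificationException ("Public cache exiting"); about two minutes later, the next public resource request is rejected with a RejectedExecutionException in the dispatcher thread, and the AsyncDispatcher exits.
This is a known bug. Bug ID: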
https://issues.apache.org/jira/browse/YARN-573
Bug description:
Shared data structures in Public Localizer and Private Localizer are not Thread safe.
PublicLocalizer
1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ).
PrivateLocalizer (LocalizerRunner?)
1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()).
Also update method should be fixed. It too is sharing pending list.
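Two threads control resource localization: PublicLocalizer and LocalizerRunner. The former handles downloading public files and the latter handles downloading private files. Both operate on pending, so the fix is to add synchronization. This bug has already been fixed in the YARN shipped with CDH 5.2.0.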
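To make the "add synchronization" idea concrete, here is a minimal sketch in plain Java. It is not the actual YARN-573 patch: the class `PendingTracker` and its methods are hypothetical names invented for illustration, and the list holds plain strings instead of YARN resource objects. It only shows the general pattern of guarding every access to a shared `LinkedList` (both the writer and the code that iterates it to build a string) with the same lock.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a localizer that shares a "pending" list
// between an event-handling thread and a worker thread.
class PendingTracker {
    private final List<String> pending = new LinkedList<>();

    // Called from the event-handling side (analogous to addResource).
    void addResource(String rsrc) {
        synchronized (pending) {
            pending.add(rsrc);
        }
    }

    // Called from the worker side; iterates the list, like a toString().
    String describe() {
        synchronized (pending) {
            StringBuilder sb = new StringBuilder("pending=[");
            for (String r : pending) {
                sb.append(r).append(' ');
            }
            return sb.append(']').toString();
        }
    }

    int size() {
        synchronized (pending) {
            return pending.size();
        }
    }

    // Runs a writer and a reader concurrently and returns the final list size.
    static int runDemo(int n) {
        PendingTracker t = new PendingTracker();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                t.addResource("rsrc-" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                t.describe(); // safe: iteration holds the same lock as add
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t.size();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000));
    }
}
```

Without the `synchronized` blocks, the reader's iteration could hit the same `checkForComodification` failure seen in the stack trace above whenever the writer adds an element in the middle of it.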
For details on what triggers java.util.ConcurrentModificationException, see:
http://examples.javacodegeeks.com/java-basics/exceptions/java-util-concurrentmodificationexception-how-to-handle-concurrent-modification-exception/
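As a small illustration of the mechanism, the sketch below reproduces the exception deterministically in a single thread. The class and method names (`CmeDemo`, `modifyWhileIterating`, `removeViaIterator`) are made up for this example. A `LinkedList` iterator checks on every `next()` call whether the list was structurally modified behind its back, which is the same `checkForComodification` frame that appears in the NM stack trace; in the NM the modification came from another thread rather than from the loop body.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class CmeDemo {
    // Returns true if modifying the list during iteration raises the exception.
    static boolean modifyWhileIterating() {
        List<String> pending = new LinkedList<>();
        pending.add("a");
        pending.add("b");
        pending.add("c");
        try {
            for (String s : pending) {
                if (s.equals("a")) {
                    // Structural change that bypasses the iterator.
                    pending.add("d");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator itself keeps its bookkeeping consistent.
    static List<String> removeViaIterator() {
        List<String> pending = new LinkedList<>();
        pending.add("a");
        pending.add("b");
        pending.add("c");
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // prints true
        System.out.println(removeViaIterator());    // prints [a, c]
    }
}
```

Note that `Iterator.remove()` only helps within a single thread. When two threads share the list, as in the NM case, the accesses still need a common lock or a concurrent collection.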
This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1587265 (to republish, please contact the original author).