1. ApplicationMaster logs
2016-08-11 14:48:15,174 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2016-08-11 14:48:15,174 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 0
2016-08-11 14:48:15,174 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
2016-08-11 14:48:15,174 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2016-08-11 14:48:16,176 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2016-08-11 14:48:16,177 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 0
2016-08-11 14:48:16,177 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
2016-08-11 14:48:16,177 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2016-08-11 14:48:17,179 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2016-08-11 14:48:17,179 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 0
2016-08-11 14:48:17,179 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
2016-08-11 14:48:17,179 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
These lines repeat endlessly in the job's AM log. Tracing them back to the RMContainerAllocator source:
    // availableMemForMap must be sufficient to run at least 1 map
    if (ResourceCalculatorUtils.computeAvailableContainers(availableResourceForMap,
        mapResourceRequest, getSchedulerResourceTypes()) <= 0) {
      // to make sure new containers are given to maps and not reduces
      // ramp down all scheduled reduces if any
      // (since reduces are scheduled at higher priority than maps)
      LOG.info("Ramping down all scheduled reduces:"
          + scheduledRequests.reduces.size());
      for (ContainerRequest req : scheduledRequests.reduces.values()) {
        pendingReduces.add(req);
      }
      scheduledRequests.reduces.clear();
    }
And the computeAvailableContainers helper it calls:
    public static int computeAvailableContainers(Resource available,
        Resource required, EnumSet<SchedulerResourceTypes> resourceTypes) {
      if (resourceTypes.contains(SchedulerResourceTypes.CPU)) {
        return Math.min(available.getMemory() / required.getMemory(),
            available.getVirtualCores() / required.getVirtualCores());
      }
      return available.getMemory() / required.getMemory();
    }
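With the scheduler reporting headroom=0, the integer division above can only yield 0, so the <= 0 branch fires on every heartbeat and the reduces are ramped down again and again. A minimal sketch of the arithmetic, with a made-up per-map request size (illustrative, not taken from this cluster):

    // headroom=0, as reported in "Recalculating schedule, headroom=0"
    int availableMb = 0;
    // illustrative per-map container request
    int requiredMb = 2048;
    // integer division: 0 / 2048 == 0, i.e. not even one map fits, so
    // computeAvailableContainers(...) <= 0 and the ramp-down branch runs
    int availableContainers = availableMb / requiredMb;  // == 0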
Elsewhere in the same allocator, the headroom recalculation and reduce slow-start check:

    // get available resources for this job
    Resource headRoom = getAvailableResources();
    if (headRoom == null) {
      headRoom = Resources.none();
    }
    LOG.info("Recalculating schedule, headroom=" + headRoom);

    // check for slow start
    if (!getIsReduceStarted()) { // not set yet
      int completedMapsForReduceSlowstart = (int) Math.ceil(reduceSlowStart *
          totalMaps);
      if (completedMaps < completedMapsForReduceSlowstart) {
        LOG.info("Reduce slow start threshold not met. " +
            "completedMapsForReduceSlowstart " +
            completedMapsForReduceSlowstart);
        return;
      } else {
        LOG.info("Reduce slow start threshold reached. Scheduling reduces.");
        setIsReduceStarted(true);
      }
    }

    // if all maps are assigned, then ramp up all reduces irrespective of the
    // headroom
    if (scheduledMaps == 0 && numPendingReduces > 0) {
      LOG.info("All maps assigned. " +
          "Ramping up all remaining reduces:" + numPendingReduces);
      scheduleAllReduces();
      return;
    }
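Here reduceSlowStart is the value of mapreduce.job.reduce.slowstart.completedmaps (default 0.05). The threshold printed in the log can be reproduced with the same arithmetic; the map count below is an assumed example:

    // assume the default slow-start fraction and a small job of 10 maps
    float reduceSlowStart = 0.05f;
    int totalMaps = 10;
    // ceil(0.05 * 10) = 1, matching "completedMapsForReduceSlowstart 1"
    // in the AM log: reduces wait until at least 1 map has completed
    int completedMapsForReduceSlowstart = (int) Math.ceil(reduceSlowStart * totalMaps);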
Both code paths above point to the same conclusion: the job's headroom is 0, the cluster has no resources left to hand out, so the AM keeps recomputing the schedule and ramping the reduces back down.
2. The NodeManager logs show each container allocated 25 GB of physical memory, far more than it needs, which starves other jobs of resources:
2016-08-11 14:48:42,560 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.9 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:48:45,609 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:48:48,659 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:48:51,708 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:48:54,756 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:48:57,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
2016-08-11 14:49:00,857 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30460 for container-id container_1470814634002_0543_01_000008: 410.8 MB of 25 GB physical memory used; 1.7 GB of 125 GB virtual memory used
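Note that the container is using only ~410 MB of its 25 GB limit, which points at request normalization rather than at the job itself: YARN rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so a minimum set to 25 GB (presumably the old value on this cluster; the original configuration is not shown) inflates even a 1 GB request into a 25 GB container. A simplified sketch of that rounding, not the scheduler's exact code:

    // Simplified model of YARN request normalization; the real logic lives
    // in the scheduler's ResourceCalculator. minAllocMb = 25600 is an
    // assumption about this cluster's previous setting.
    static int normalizeMb(int requestedMb, int minAllocMb) {
      // round the request up to the next multiple of the minimum allocation
      int multiples = Math.max(1, (int) Math.ceil((double) requestedMb / minAllocMb));
      return multiples * minAllocMb;
    }
    // normalizeMb(1024, 25600) -> 25600 MB (the 25 GB seen in the NM logs)
    // normalizeMb(1024, 2048)  -> 2048 MB (after the fix in section 3)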
3. The fix: lower yarn.scheduler.minimum-allocation-mb so requests are no longer rounded up to 25 GB
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
With this change, containers come back down to a sensible size, other jobs can obtain resources again, and the problem is resolved.