1. YARN's scheduling locality means assigning work to the nodes where its input data resides, which avoids a great deal of network I/O. For MR jobs, only map tasks have a locality requirement; reduce tasks and rescheduled (failed) map tasks do not.
2. YARN satisfies scheduling locality through delay scheduling. There are three locality levels: node-local, rack-local, and off-switch (any node). When a request's locality cannot be satisfied, the scheduler counts the scheduling opportunities that request has missed and waits for the count to reach a threshold before relaxing the locality constraint to the next level.
3. To land on a node that satisfies data locality, a request may pass up a certain number of scheduling opportunities. The threshold on missed opportunities is controlled by the following parameters:
FairScheduler
// configured as a float; the number of opportunities to pass up is this value * total number of cluster nodes
yarn.scheduler.fair.locality.threshold.node (default -1.0f)
yarn.scheduler.fair.locality.threshold.rack (default -1.0f)
CapacityScheduler
// configured as a positive integer, i.e. the number of opportunities to pass up, directly
yarn.scheduler.capacity.node-locality-delay (default 40)
yarn.scheduler.capacity.rack-locality-additional-delay (default -1)
4. If YARN is deployed separately from the file system, this feature should be disabled, since locality is meaningless in that case; setting the parameters above to -1 disables delay scheduling.
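The relaxation logic from points 2 and 3 can be sketched as a minimal standalone model. All names below are invented for illustration; the real schedulers are more involved (for example, FairScheduler derives its threshold by multiplying the configured fraction by the cluster size, and CapacityScheduler uses a separate heuristic for off-switch when the additional delay is -1):

```java
// Minimal model of delay scheduling: count missed scheduling
// opportunities and relax the allowed locality level once the
// count crosses a threshold. Illustrative only, not YARN code.
public class DelaySchedulingModel {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    // roughly corresponds to node-locality-delay
    private final int nodeLocalityDelay;
    // roughly corresponds to rack-locality-additional-delay
    private final int rackLocalityDelay;
    private int missedOpportunities = 0;

    public DelaySchedulingModel(int nodeLocalityDelay, int rackLocalityDelay) {
        this.nodeLocalityDelay = nodeLocalityDelay;
        this.rackLocalityDelay = rackLocalityDelay;
    }

    /** Called each time a heartbeat offers capacity that is not node-local. */
    public void recordMissedOpportunity() {
        missedOpportunities++;
    }

    /** Reset after a successful node-local assignment. */
    public void reset() {
        missedOpportunities = 0;
    }

    /** The locality level the scheduler is currently willing to accept. */
    public Locality allowedLocality() {
        if (missedOpportunities < nodeLocalityDelay) {
            return Locality.NODE_LOCAL;
        }
        if (missedOpportunities < nodeLocalityDelay + rackLocalityDelay) {
            return Locality.RACK_LOCAL;
        }
        return Locality.OFF_SWITCH;
    }
}
```

With thresholds (2, 3) the model stays node-local for the first two offers, accepts rack-local for the next three, and only then falls back to off-switch.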
Generating the container request event
TaskAttemptImpl.java
taskAttempt.eventHandler.handle(new ContainerRequestEvent(
    taskAttempt.attemptId, taskAttempt.resourceCapability,
    taskAttempt.dataLocalHosts.toArray(
        new String[taskAttempt.dataLocalHosts.size()]),
    taskAttempt.dataLocalRacks.toArray(
        new String[taskAttempt.dataLocalRacks.size()])));
dataLocalHosts and dataLocalRacks come from MapTaskAttemptImpl.java:
public MapTaskAttemptImpl(TaskId taskId, int attempt,
    EventHandler eventHandler, Path jobFile,
    int partition, TaskSplitMetaInfo splitInfo, JobConf conf,
    TaskAttemptListener taskAttemptListener,
    Token<JobTokenIdentifier> jobToken,
    Credentials credentials, Clock clock,
    AppContext appContext) {
  super(taskId, attempt, eventHandler,
      // splitInfo.getLocations() is the set of nodes holding the input split
      taskAttemptListener, jobFile, partition, conf, splitInfo.getLocations(),
      jobToken, credentials, clock, appContext);
  this.splitInfo = splitInfo;
}
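So a map attempt's locality hints are simply the input split's block locations, each of which is then resolved to a rack. A simplified sketch of that derivation, with an invented topology table (the real MRAppMaster resolves racks through Hadoop's RackResolver and the cluster's topology script):

```java
import java.util.LinkedHashSet;
import java.util.Map;

// Sketch: turn a split's block locations into the host and rack
// hint arrays carried by a container request. Names and the
// hard-coded topology are illustrative only.
public class LocalityHints {
    // hypothetical host -> rack mapping; in a real cluster this
    // comes from the configured topology script
    static final Map<String, String> TOPOLOGY = Map.of(
        "host1", "/rack1",
        "host2", "/rack1",
        "host3", "/rack2");

    /** The host hints are the split's block locations themselves. */
    public static String[] hostsFor(String[] splitLocations) {
        return splitLocations.clone();
    }

    /** The rack hints are the distinct racks of those hosts, in order. */
    public static String[] racksFor(String[] splitLocations) {
        LinkedHashSet<String> racks = new LinkedHashSet<>();
        for (String host : splitLocations) {
            racks.add(TOPOLOGY.getOrDefault(host, "/default-rack"));
        }
        return racks.toArray(new String[0]);
    }
}
```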
Next, RMContainerAllocator.java handles the container request event:
private void handleMapContainerRequest(ContainerRequestEvent reqEvent) {
  ...
  if (mapContainerRequestAccepted) {
    // set the resources
    reqEvent.getCapability().setMemorySize(
        mapResourceRequest.getMemorySize());
    reqEvent.getCapability().setVirtualCores(
        mapResourceRequest.getVirtualCores());
    // add the map request
    scheduledRequests.addMap(reqEvent); // maps are immediately scheduled
  } else {
    ...
  }
}
The scheduledRequests.addMap method:
void addMap(ContainerRequestEvent event) {
  ContainerRequest request = null;
  ...
  } else {
    // create the container request
    request =
        new ContainerRequest(event, PRIORITY_MAP, mapNodeLabelExpression);
    // map each data-hosting node to this task attempt, so that
    // container assignment can later honor node locality
    for (String host : event.getHosts()) {
      LinkedList<TaskAttemptId> list = mapsHostMapping.get(host);
      if (list == null) {
        list = new LinkedList<TaskAttemptId>();
        mapsHostMapping.put(host, list);
      }
      list.add(event.getAttemptID());
      if (LOG.isDebugEnabled()) {
        LOG.debug("Added attempt req to host " + host);
      }
    }
    // map each rack to this task attempt, so that container
    // assignment can later honor rack locality
for (String rack : event.getRacks(