Hadoop 3.1.1 tuning: slowstart (source code analysis)

The parameter discussed in this post is mapreduce.job.reduce.slowstart.completedmaps.

About the parameter

In Hadoop 3.1.1, mapred-default.xml describes the parameter as follows:

Name: mapreduce.job.reduce.slowstart.completedmaps

Default value: 0.05

Description: Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.

With the default setting, resources can start being requested for reduce tasks once 5% of the map tasks (a fraction of 0.05) have finished.

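If the value is set too low, reduce tasks may compete with map tasks for resources (if reducers hold too many resources and maps wait too long, the maps preempt the reducers), and the reducers end up idling while they wait for map output.

If the value is set too high, the job loses the benefit of running reduce in parallel with map, which is what shortens the overall job time.

Choosing a good slowstart value therefore takes some care.

Another blog post I came across mentions a ratio of 33%. I do not yet know how that figure was derived and will add the explanation once I understand it.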

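Source code analysis

Now for the source code.

1. Initialization. The serviceInit function in RMContainerAllocator.java initializes reduceSlowStart: if the configuration overrides the parameter, the overridden value is used; otherwise the default of 0.05 applies.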
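The "configured value, else default" lookup in step 1 can be sketched without a Hadoop dependency. In this minimal sketch `java.util.Properties` stands in for Hadoop's `Configuration`, and the class and method names are hypothetical; only the key and the 0.05 default come from the text above.

```java
import java.util.Properties;

// Hypothetical stand-in for the lookup done in serviceInit; not Hadoop code.
final class SlowStartConfigSketch {
    // Key and default as quoted from mapred-default.xml above.
    static final String KEY = "mapreduce.job.reduce.slowstart.completedmaps";
    static final float DEFAULT_SLOWSTART = 0.05f;

    // Returns the configured fraction, or the default when the key is absent.
    static float readSlowStart(Properties conf) {
        String raw = conf.getProperty(KEY);
        return raw == null ? DEFAULT_SLOWSTART : Float.parseFloat(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(readSlowStart(conf)); // key absent: default is used
        conf.setProperty(KEY, "0.8");
        System.out.println(readSlowStart(conf)); // key present: override is used
    }
}
```

In a real job the same override would normally go into the job configuration or mapred-site.xml rather than a `Properties` object.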

2. The heartbeat function in RMContainerAllocator.java (triggered on every heartbeat) checks how many map tasks have completed and decides whether reduce tasks should start being scheduled.

 protected synchronized void heartbeat() throws Exception {
    scheduleStats.updateAndLogIfChanged("Before Scheduling: ");
    List<Container> allocatedContainers = getResources();

    if (allocatedContainers != null && allocatedContainers.size() > 0) {
      System.out.println("allocatedContainers:");
      for(int i = 0 ; i < allocatedContainers.size(); i ++){
        System.out.println(allocatedContainers.get(i).toString());
      }
      scheduledRequests.assign(allocatedContainers);
    }

    int completedMaps = getJob().getCompletedMaps(); // number of completed maps
    int completedTasks = completedMaps + getJob().getCompletedReduces(); // number of completed tasks (maps + reduces)
    // If the number of completed tasks has changed, or map requests are still pending
    if ((lastCompletedTasks != completedTasks) ||
          (scheduledRequests.maps.size() > 0)) {
      lastCompletedTasks = completedTasks; // remember the new completed-task count
      recalculateReduceSchedule = true; // recompute the reduce schedule
    }

    if (recalculateReduceSchedule) {
      boolean reducerPreempted = preemptReducesIfNeeded(); // decide whether reducers must be preempted

      if (!reducerPreempted) { // no reducer was preempted
        // Only schedule new reducers if no reducer preemption happens for
        // this heartbeat
        scheduleReduces(getJob().getTotalMaps(), completedMaps,
            scheduledRequests.maps.size(), scheduledRequests.reduces.size(),
            assignedRequests.maps.size(), assignedRequests.reduces.size(),
            mapResourceRequest, reduceResourceRequest, pendingReduces.size(),
            maxReduceRampupLimit, reduceSlowStart);
      }

      recalculateReduceSchedule = false;
    }

    scheduleStats.updateAndLogIfChanged("After Scheduling: ");
  }
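The condition that decides whether the reduce schedule is recomputed on a given heartbeat can be isolated in a small sketch. The class and method names below are hypothetical, and the sketch assumes the flag is driven only by this condition; in RMContainerAllocator the `recalculateReduceSchedule` field may also be set elsewhere.

```java
// Hypothetical, simplified model of the trigger inside heartbeat(); not Hadoop code.
final class RecalcTriggerSketch {
    private int lastCompletedTasks = 0;

    // Recalculate when the completed-task count changed since the previous
    // heartbeat, or when map requests are still waiting for containers.
    boolean shouldRecalculate(int completedMaps, int completedReduces,
                              int pendingMapRequests) {
        int completedTasks = completedMaps + completedReduces;
        boolean recalc = lastCompletedTasks != completedTasks || pendingMapRequests > 0;
        if (recalc) {
            lastCompletedTasks = completedTasks; // remember the new count
        }
        return recalc;
    }

    public static void main(String[] args) {
        RecalcTriggerSketch s = new RecalcTriggerSketch();
        System.out.println(s.shouldRecalculate(0, 0, 0)); // nothing changed, nothing pending
        System.out.println(s.shouldRecalculate(1, 0, 0)); // one more map completed
        System.out.println(s.shouldRecalculate(1, 0, 3)); // no change, but maps pending
    }
}
```

One consequence worth noting: as long as any map request is pending, the schedule is recomputed on every heartbeat, even if no task has finished in between.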

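3. The preemptReducesIfNeeded function decides whether reducers must be preempted: if none of the maps that requested resources has been allocated any, and they have been waiting too long, reducer preemption kicks in.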

boolean preemptReducesIfNeeded() {
    if (reduceResourceRequest.equals(Resources.none())) {
      return false; // no reduces
    }

    if (assignedRequests.maps.size() > 0) {
      // there are assigned mappers: some map tasks already hold containers and are running
      return false;
    }

    if (scheduledRequests.maps.size() <= 0) {
      // there are no pending requests for mappers, i.e. no map task is still
      // waiting for the ResourceManager to allocate resources
      return false;
    }

    // At this point:
    // we have pending mappers and all assigned resources are taken by reducers
    if (reducerUnconditionalPreemptionDelayMs >= 0) {
      // Unconditional preemption is enabled: if map requests have been
      // pending longer than the configured threshold, preempt reducers.
      if (preemptReducersForHangingMapRequests(
          reducerUnconditionalPreemptionDelayMs)) {
        return true;
      }
    }

    // The pending mappers haven't been waiting for too long. Let us see if
    // there are enough resources for a mapper to run. This is calculated by
    // excluding scheduled reducers from headroom and comparing it against
    // resources required to run one mapper.
    // If there is room for a mapper, reducers do not need to be preempted.
    Resource scheduledReducesResource = Resources.multiply(
         reduceResourceRequest, scheduledRequests.reduces.size());
    Resource availableResourceForMap =
         Resources.subtract(getAvailableResources(), scheduledReducesResource);
    if (ResourceCalculatorUtils.computeAvailableContainers(availableResourceForMap,
        mapResourceRequest, getSchedulerResourceTypes()) > 0) {
       // Enough room to run a mapper
      return false;
    }

    // Available resources are not enough to run mapper. See if we should hold
    // off before preempting reducers and preempt if okay.
    return preemptReducersForHangingMapRequests(reducerNoHeadroomPreemptionDelayMs);
  }
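The order of checks in preemptReducesIfNeeded can be traced with a simplified sketch. Everything here is an assumption made for illustration: resources are reduced to memory in MB, "hanging map requests" is modelled as a single oldest wait time compared against the delay, and all names are hypothetical. It is not the Hadoop implementation.

```java
// Hypothetical, memory-only model of the decision order; not Hadoop code.
final class PreemptDecisionSketch {
    static boolean shouldPreemptReducers(
            int reduceMemMb, int assignedMaps, int scheduledMaps, int scheduledReduces,
            int headroomMemMb, int mapMemMb,
            long oldestMapWaitMs, long unconditionalDelayMs, long noHeadroomDelayMs) {
        if (reduceMemMb == 0) return false;   // the job has no reduces
        if (assignedMaps > 0) return false;   // some maps already hold containers
        if (scheduledMaps <= 0) return false; // no map request is pending

        // A negative delay disables unconditional preemption.
        if (unconditionalDelayMs >= 0 && oldestMapWaitMs > unconditionalDelayMs) {
            return true;
        }

        // Headroom left for maps once scheduled (not yet running) reducers are set aside.
        int availableForMaps = headroomMemMb - reduceMemMb * scheduledReduces;
        if (availableForMaps >= mapMemMb) {
            return false; // enough room to run at least one mapper
        }

        // No headroom: preempt only if the maps have waited past the second delay.
        return oldestMapWaitMs > noHeadroomDelayMs;
    }

    public static void main(String[] args) {
        // Pending maps, no headroom, waited 6 s against a 5 s no-headroom delay.
        System.out.println(shouldPreemptReducers(
                2048, 0, 5, 2, 4096, 1024, 6000L, -1L, 5000L));
    }
}
```

The sketch makes the two separate delays visible: one that applies regardless of headroom, and one that applies only when the cluster has no room for even a single mapper.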

4. If no reducer needs to be preempted, new reducers can be scheduled. Next comes the scheduleReduces function, which receives reduceSlowStart as an argument.

//check for slow start
    if (!getIsReduceStarted()) {//not set yet: reduce scheduling has not started
      int completedMapsForReduceSlowstart = (int)Math.ceil(reduceSlowStart *
                      totalMaps); // number of maps that must complete before reduces are scheduled
      if(completedMaps < completedMapsForReduceSlowstart) { // threshold not reached yet
        LOG.info("Reduce slow start threshold not met. " +
              "completedMapsForReduceSlowstart " + 
            completedMapsForReduceSlowstart);
        return;
      } else {
        LOG.info("Reduce slow start threshold reached. Scheduling reduces.");
        setIsReduceStarted(true); // mark reduce scheduling as started
      }
    }
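To see what the ceiling in the slow-start check does in practice, here is a small worked sketch. The formula is the one from the excerpt; the class and method names are hypothetical.

```java
// Hypothetical helper reproducing the threshold arithmetic; not Hadoop code.
final class SlowStartThresholdSketch {
    // Same formula as the excerpt: ceil(reduceSlowStart * totalMaps).
    static int completedMapsForReduceSlowstart(float reduceSlowStart, int totalMaps) {
        return (int) Math.ceil(reduceSlowStart * totalMaps);
    }

    // Reduces may be scheduled once completedMaps reaches the threshold.
    static boolean thresholdMet(float reduceSlowStart, int totalMaps, int completedMaps) {
        return completedMaps >= completedMapsForReduceSlowstart(reduceSlowStart, totalMaps);
    }

    public static void main(String[] args) {
        int[] jobSizes = {10, 100, 1000};
        for (int totalMaps : jobSizes) {
            System.out.println(totalMaps + " maps -> threshold "
                    + completedMapsForReduceSlowstart(0.05f, totalMaps));
        }
    }
}
```

Because of the ceiling, any non-zero fraction requires at least one completed map, so with the default 0.05 a 10-map job waits for 1 map and a 100-map job waits for 5. A value of 0 yields a threshold of 0, which means reduces can be scheduled right away.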

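5. If the job runs in uber mode, JobImpl sets reduceSlowStart to 1. (For small jobs, repeatedly creating containers is expensive for the cluster, which is why uber mode exists: all map and reduce tasks run sequentially inside a single container.)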

    if (isUber) {
      LOG.info("Uberizing job " + jobId + ": " + numMapTasks + "m+"
          + numReduceTasks + "r tasks (" + dataInputLength
          + " input bytes) will run sequentially on single node.");

      // make sure reduces are scheduled only after all map are completed
      conf.setFloat(MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART,
                        1.0f);
      // ... (the excerpt ends here; the rest of the uber branch is not shown)
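The effect of forcing the fraction to 1.0 in uber mode can be illustrated with a short sketch. The names are hypothetical; the only behaviour taken from the source is that uber mode overrides the configured value with 1.0 and that the threshold is the ceiling of fraction times total maps.

```java
// Hypothetical illustration of the uber-mode override; not Hadoop code.
final class UberSlowStartSketch {
    // In uber mode the configured fraction is ignored and 1.0 is used.
    static float effectiveSlowStart(boolean isUber, float configured) {
        return isUber ? 1.0f : configured;
    }

    // Number of maps that must finish before reduces are scheduled.
    static int mapsRequiredBeforeReduces(boolean isUber, float configured, int totalMaps) {
        return (int) Math.ceil(effectiveSlowStart(isUber, configured) * totalMaps);
    }

    public static void main(String[] args) {
        System.out.println(mapsRequiredBeforeReduces(true, 0.05f, 8));  // uber: all maps
        System.out.println(mapsRequiredBeforeReduces(false, 0.05f, 8)); // normal job
    }
}
```

With 8 maps, the uber job waits for all 8 to finish, while the same job outside uber mode and with the default fraction would wait for only 1.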

 
