Understanding and Optimizing the YARN Scheduler

未燃

Last modified 2023-03-09 21:47:38

Tags: yarn java hadoop

First published 2022-02-27 15:17:57

Article link: https://blog.csdn.net/aliRan314/article/details/122832937

1 Scheduler Introduction and Overview

1.1 Basic Architecture of the Scheduler

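The resource scheduler is one of YARN's core components. It is pluggable and configurable: YARN defines a set of interfaces so that users can implement their own scheduler as needed.
YARN ships with three commonly used resource schedulers: FifoScheduler (first-in, first-out scheduler), CapacityScheduler (capacity scheduler), and FairScheduler (fair scheduler).
FairScheduler is currently one of the more widely used. It is a multi-user scheduler that partitions resources by queue: each queue can be given a minimum resource guarantee and an upper limit as a share of the cluster, and each user can also be given an upper limit to prevent resource abuse. When a queue has spare resources, they can be temporarily shared with other queues. The goal is for all applications to receive an equal share of resources, on average, over time.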
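The interplay of minimum guarantee, upper limit, and lending of spare resources can be illustrated with a small worked example. The sketch below is a simplified max-min ("water-filling") allocation over a single resource type. It is not the FairScheduler implementation, which also accounts for queue weights and scheduling policies. The class name `FairShareSketch`, the method `fairShares`, and the sample numbers are all hypothetical.

```java
import java.util.Arrays;

// Simplified illustration only; NOT the FairScheduler's actual algorithm.
class FairShareSketch {

  /**
   * Splits `total` units among queues. Each queue first receives its minimum
   * guarantee, then the remainder is shared equally among queues that still
   * want more, never exceeding min(demand, maxShare) for any queue.
   */
  static long[] fairShares(long total, long[] demand, long[] minShare, long[] maxShare) {
    int n = demand.length;
    long[] alloc = new long[n];
    long[] cap = new long[n];
    long remaining = total;

    // Step 1: hand out the guaranteed minimum (but never more than the queue can use).
    for (int i = 0; i < n; i++) {
      cap[i] = Math.min(demand[i], maxShare[i]);
      alloc[i] = Math.min(cap[i], minShare[i]);
      remaining -= alloc[i];
    }
    if (remaining < 0) {
      throw new IllegalArgumentException("minimum shares exceed total capacity");
    }

    // Step 2: repeatedly split what is left equally among unsatisfied queues.
    while (remaining > 0) {
      int active = 0;
      for (int i = 0; i < n; i++) {
        if (alloc[i] < cap[i]) active++;
      }
      if (active == 0) break;            // every queue is satisfied or capped
      long share = remaining / active;   // equal split for this round
      for (int i = 0; i < n && remaining > 0; i++) {
        if (alloc[i] >= cap[i]) continue;
        // When fewer units remain than active queues, hand them out one at a time.
        long give = (share == 0) ? 1 : Math.min(share, cap[i] - alloc[i]);
        alloc[i] += give;
        remaining -= give;
      }
    }
    return alloc;
  }

  public static void main(String[] args) {
    long[] demand   = {20, 80, 80};
    long[] minShare = {10, 10, 10};
    long[] maxShare = {100, 50, 100};
    // Queue 0 needs only 20, so its unused share is lent to queues 1 and 2.
    System.out.println(Arrays.toString(fairShares(100, demand, minShare, maxShare)));
    // prints [20, 40, 40]
  }
}
```

In this run the first queue's unused share flows to the other two, which is the "lend spare resources to other queues" behavior described above in miniature.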

The scheduler's basic architecture in YARN can be analyzed from two angles: its pluggable design and its event handler.

1.1.1 Pluggable Component

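During initialization, the ResourceManager creates the resource scheduler object according to the user's configuration in yarn-site.xml (the yarn.resourcemanager.scheduler.class property).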

  protected ResourceScheduler createScheduler() {
    // Read the configured scheduler class name, falling back to the default
    String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER,
        YarnConfiguration.DEFAULT_RM_SCHEDULER);
    LOG.info("Using Scheduler: " + schedulerClassName);
    try {
      Class<?> schedulerClazz = Class.forName(schedulerClassName);
      // Only classes that implement ResourceScheduler are accepted
      if (ResourceScheduler.class.isAssignableFrom(schedulerClazz)) {
        return (ResourceScheduler) ReflectionUtils.newInstance(schedulerClazz,
            this.conf);
      } else {
        throw new YarnRuntimeException("Class: " + schedulerClassName
            + " not instance of " + ResourceScheduler.class.getCanonicalName());
      }
    } catch (ClassNotFoundException e) {
      throw new YarnRuntimeException("Could not instantiate Scheduler: "
          + schedulerClassName, e);
    }
  }
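The same reflection-based plug-in pattern can be tried without any Hadoop dependency. The following is a minimal sketch under stated assumptions: `Scheduler`, `FifoLikeScheduler`, `FairLikeScheduler`, and the `scheduler.class` key are hypothetical stand-ins for ResourceScheduler, its implementations, and the real configuration property, and a plain `Map` replaces the Hadoop configuration object.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, self-contained sketch; none of these types are Hadoop classes.
class PluggableSchedulerDemo {

  /** Stand-in for the ResourceScheduler contract. */
  interface Scheduler {
    String name();
  }

  /** Illustrative implementations with public no-arg constructors. */
  public static class FifoLikeScheduler implements Scheduler {
    public String name() { return "fifo-like"; }
  }

  public static class FairLikeScheduler implements Scheduler {
    public String name() { return "fair-like"; }
  }

  /** Loads the class named in the config, falling back to a default. */
  static Scheduler createScheduler(Map<String, String> conf) {
    String className =
        conf.getOrDefault("scheduler.class", FifoLikeScheduler.class.getName());
    try {
      Class<?> clazz = Class.forName(className);
      // Reject classes that do not implement the expected interface.
      if (!Scheduler.class.isAssignableFrom(clazz)) {
        throw new IllegalStateException(
            "Class: " + className + " not instance of " + Scheduler.class.getName());
      }
      return (Scheduler) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Covers a missing class as well as constructor failures.
      throw new IllegalStateException("Could not instantiate scheduler: " + className, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = Collections.singletonMap(
        "scheduler.class", FairLikeScheduler.class.getName());
    System.out.println(createScheduler(conf).name()); // prints fair-like
  }
}
```

Because the implementation is chosen by class name at startup, swapping schedulers only requires changing the configured value, not the calling code.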

Every resource scheduler must implement the ResourceScheduler interface, which extends YarnScheduler and Recoverable. The code is as follows:

public interface YarnScheduler extends EventHandler<SchedulerEvent> {

  /**
   * Get queue information
   * @param queueName queue name
   * @param includeChildQueues include child queues?
   * @param recursive get children queues?
   * @return queue information
   * @throws IOException
   */
  public QueueInfo getQueueInfo(String queueName, boolean includeChildQueues,
      boolean recursive) throws IOException;

  /**
   * Get acls for queues for current user.
   * @return acls for queues for current user
   */
  public List<QueueUserACLInfo> getQueueUserAclInfo();

  /**
   * Get the whole resource capacity of the cluster.
   * @return the whole resource capacity of the cluster.
   */
  @LimitedPrivate("yarn")
  @Unstable
  public Resource getClusterResource();

  /**
   * Get minimum allocatable {@link Resource}.
   * @return minimum allocatable resource
   */
  public Resource getMinimumResourceCapability();

  /**
   * Get maximum allocatable {@link Resource} at the cluster level.
   * @return maximum allocatable resource
   */
  public Resource getMaximumResourceCapability();

  public Resource getMaximumResourceCapability(String queueName);

  /** Get the number of nodes currently available in the cluster. */
  public int getNumClusterNodes();

  /** Core API between the ApplicationMaster (AM) and the scheduler. */
  Allocation
  allocate(ApplicationAttemptId appAttemptId,
      List<ResourceRequest> ask,
      List<SchedulingRequest> schedulingRequests,
      List<SchedulingRequest> deleteSchedulingRequests,
      List<ContainerId> release,
      List<String> blacklistAdditions,
      List<String> blacklistRemovals,
      List<ContainerResourceChangeRequest> increaseRequests,
      List<ContainerResourceChangeRequest> decreaseRequests);
  // Many more methods; omitted here
  // ...

  /**
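   * Get the reason an application failed to be scheduled.
   * @param applicationId the application whose scheduling failure is being queried
   * @return scheduling-failure attribution information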
   */
  ScheduleResult getScheduleResult(ApplicationId applicationId);
}
public interface Recoverable {

  // Called when the RM restarts, to recover the scheduler's state
  public void recover(RMState state)