Notes on the Yarn Source Code, Prequel: ResourceManager (Part 1), Initialization

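The earlier "Notes on Yarn" posts mainly described the dynamic flow of job submission. Some of the details involved there are easier to explain by walking through how the ResourceManager and the NodeManager start up.

This series looks in detail at what happens while the ResourceManager starts.

As we all know, Yarn is started by the start-yarn.sh script, so let's begin digging from there.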

"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR  start resourcemanager

This line in the script calls yarn-daemon.sh in the same directory and passes resourcemanager as the command. Next, look at yarn-daemon.sh:

nohup nice -n $YARN_NICENESS "$HADOOP_YARN_HOME"/bin/yarn --config $YARN_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &

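This line is the key one: it shows that the script that actually does the launching is yarn in the bin directory. Here is the relevant part: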

elif [ "$COMMAND" = "resourcemanager" ] ; then
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
  CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/*"
  CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/lib/*"
  CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
  YARN_OPTS="$YARN_OPTS $YARN_RESOURCEMANAGER_OPTS"
  if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
  fi

We have finally reached the root: for the command we passed in, the script launches the ResourceManager class. Let's look at that class.

First, the class comment:

/**
 * The ResourceManager is the main class that is a set of components. "I am the
 * ResourceManager. All your resources belong to us..."
 *
 */
@SuppressWarnings("unchecked")
public class ResourceManager extends CompositeService implements Recoverable

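Rather imposing: it manages every resource in the cluster. It also extends CompositeService, a service class that I won't cover in depth here; it mainly provides service initialization and startup methods for subclasses to use.

Start with the member variables of ResourceManager: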

protected ClientToAMTokenSecretManagerInRM clientToAMSecretManager = new ClientToAMTokenSecretManagerInRM();
	protected RMContainerTokenSecretManager containerTokenSecretManager;
	protected NMTokenSecretManagerInRM nmTokenSecretManager;

	protected AMRMTokenSecretManager amRmTokenSecretManager;

	private Dispatcher rmDispatcher;

	protected ResourceScheduler scheduler;
	private ClientRMService clientRM;
	protected ApplicationMasterService masterService;
	private ApplicationMasterLauncher applicationMasterLauncher;
	private AdminService adminService;
	private ContainerAllocationExpirer containerAllocationExpirer;
	protected NMLivelinessMonitor nmLivelinessMonitor;
	protected NodesListManager nodesListManager;
	private EventHandler<SchedulerEvent> schedulerDispatcher;
	protected RMAppManager rmAppManager;
	protected ApplicationACLsManager applicationACLsManager;
	protected QueueACLsManager queueACLsManager;
	protected RMDelegationTokenSecretManager rmDTSecretManager;
	private DelegationTokenRenewer delegationTokenRenewer;
	private WebApp webApp;
	protected RMContext rmContext;
	protected ResourceTrackerService resourceTracker;
	private boolean recoveryEnabled;

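There are a lot of them. Refer to each class for how it is used; I won't go through them here. The ones most involved with applications, such as RMAppManager and RMContext, deserve a close look, although honestly every one of them is worth studying.

Now the main method: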

			Configuration conf = new YarnConfiguration();
			ResourceManager resourceManager = new ResourceManager();
			ShutdownHookManager.get().addShutdownHook(new CompositeServiceShutdownHook(resourceManager),
					SHUTDOWN_HOOK_PRIORITY);
			setHttpPolicy(conf);
			resourceManager.init(conf);
			resourceManager.start();
		

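Focus on this part: the initialization and startup of the service are what we want to study. The snippets that follow come from ResourceManager's serviceInit method.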

this.rmDispatcher = createDispatcher();
		addIfService(this.rmDispatcher);

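The first key step of initialization is creating the event dispatcher, which is the heart of the ResourceManager's asynchronous event handling. The method is simple: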

protected Dispatcher createDispatcher() {
		return new AsyncDispatcher();
	}

Clearly this is an asynchronous dispatcher. Look at the class comment and its initialization steps:

/**
 * Dispatches {@link Event}s in a separate thread. Currently only single thread
 * does that. Potentially there could be multiple channels for each event type
 * class and a thread pool can be used to dispatch the events.
 */
@SuppressWarnings("rawtypes")
@Public
@Evolving
public class AsyncDispatcher extends AbstractService implements Dispatcher 

It is also a service that has to be started, and it is used to dispatch events:

public AsyncDispatcher() {
		this(new LinkedBlockingQueue<Event>());
	}

	public AsyncDispatcher(BlockingQueue<Event> eventQueue) {
		super("Dispatcher");
		this.eventQueue = eventQueue;
		this.eventDispatchers = new HashMap<Class<? extends Enum>, EventHandler>();
	}

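Note that the dispatcher wraps a blocking queue: at runtime, events are put into this queue and then dispatched for handling. It also defines a map named eventDispatchers, whose purpose becomes clearer in the code further on.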
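To make the queue-plus-thread idea concrete, here is a minimal sketch in plain Java. It is not the YARN implementation: MiniAsyncDispatcher, DemoHandler and DemoNodeEventType are hypothetical names, events are reduced to bare enum constants, and error handling is left out. It only shows one consumer thread taking events off a blocking queue and routing each one by the class of its enum type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-ins; these are not the YARN interfaces.
enum DemoNodeEventType { NODE_USABLE, NODE_UNUSABLE }

interface DemoHandler {
    void handle(Enum<?> event);
}

class MiniAsyncDispatcher {
    private final BlockingQueue<Enum<?>> eventQueue = new LinkedBlockingQueue<>();
    private final Map<Class<?>, DemoHandler> handlers = new ConcurrentHashMap<>();
    private final Thread eventThread = new Thread(this::dispatchLoop, "MiniDispatcher");

    void register(Class<? extends Enum<?>> eventType, DemoHandler handler) {
        handlers.put(eventType, handler);
    }

    void post(Enum<?> event) {
        eventQueue.add(event);
    }

    void start() {
        eventThread.start();
    }

    void stop() {
        eventThread.interrupt();
    }

    // Single consumer thread: block on the queue, then route by the enum's class.
    private void dispatchLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Enum<?> event = eventQueue.take();
                DemoHandler handler = handlers.get(event.getDeclaringClass());
                if (handler != null) {
                    handler.handle(event);
                }
            } catch (InterruptedException e) {
                return; // stop() was called
            }
        }
    }

    // Posts two events and returns the names the handler saw, in arrival order.
    static List<String> runDemo() {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(2);
        MiniAsyncDispatcher dispatcher = new MiniAsyncDispatcher();
        dispatcher.register(DemoNodeEventType.class, event -> {
            seen.add(event.name());
            done.countDown();
        });
        dispatcher.start();
        dispatcher.post(DemoNodeEventType.NODE_USABLE);
        dispatcher.post(DemoNodeEventType.NODE_UNUSABLE);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            dispatcher.stop();
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because there is a single consumer thread and the queue is FIFO, handlers see events in the order they were posted, which is the property the sketch is meant to illustrate.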

Now look closely at AsyncDispatcher's service initialization code:

@Override
	protected void serviceInit(Configuration conf) throws Exception {
		this.exitOnDispatchException = conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY,
				Dispatcher.DEFAULT_DISPATCHER_EXIT_ON_ERROR);
		super.serviceInit(conf);
	}

That is all of AsyncDispatcher's initialization code for now. Let's continue with the ResourceManager's service initialization code:

this.amRmTokenSecretManager = createAMRMTokenSecretManager(conf);

The ResourceManager holds this object as a member variable. Its class is declared as follows:

/**
 * AMRM-tokens are per ApplicationAttempt. If users redistribute their
 * tokens, it is their headache, god save them. I mean you are not supposed to
 * distribute keys to your vault, right? Anyways, ResourceManager saves each
 * token locally in memory till application finishes and to a store for restart,
 * so no need to remember master-keys even after rolling them.
 */
public class AMRMTokenSecretManager extends
    SecretManager<AMRMTokenIdentifier> 

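The comment is clear: each ApplicationAttempt gets its own AMRM token, and the ResourceManager keeps that token in memory until the application finishes. In essence it is still an authentication tool (my own understanding).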

this.containerAllocationExpirer = new ContainerAllocationExpirer(this.rmDispatcher);
		addService(this.containerAllocationExpirer);

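These two lines create a ContainerAllocationExpirer and add it to serviceList so that it is initialized at the end. As for what it does, its comment is very brief, so I'll leave the analysis for when we walk through dynamically submitting an ApplicationMaster. In short, it checks whether an allocated container gets launched within the allowed time. Next: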

AMLivelinessMonitor amLivelinessMonitor = createAMLivelinessMonitor();
		addService(amLivelinessMonitor);

As the name suggests, AMLivelinessMonitor checks whether an ApplicationMaster is still alive. It extends this class:

/**
 * A simple liveliness monitor with which clients can register, trust the
 * component to monitor liveliness, get a call-back on expiry and then finally
 * unregister.
 */
@Public
@Evolving
public abstract class AbstractLivelinessMonitor<O> extends AbstractService 

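ContainerAllocationExpirer extends this class as well; it exists so that clients can register objects and have their liveliness monitored. Next comes this: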

AMLivelinessMonitor amFinishingMonitor = createAMLivelinessMonitor();
		addService(amFinishingMonitor);

This initializes a second, identical AMLivelinessMonitor under a different variable name; it plays a monitoring role too.
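The register / ping / expire contract described in the AbstractLivelinessMonitor comment can be sketched roughly as follows. This is an illustration only, not the Hadoop class: MiniLivelinessMonitor and LivelinessDemo are hypothetical names, and the current time is passed in as a parameter instead of running a background checker thread, purely to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real monitor runs its check periodically on its own thread.
abstract class MiniLivelinessMonitor<O> {
    private final Map<O, Long> lastPing = new HashMap<>();
    private final long expireIntervalMs;

    MiniLivelinessMonitor(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    // Callback invoked once for every object that missed its deadline.
    protected abstract void expire(O ob);

    synchronized void register(O ob, long nowMs) {
        lastPing.put(ob, nowMs);
    }

    synchronized void unregister(O ob) {
        lastPing.remove(ob);
    }

    // A heartbeat only counts for objects that are still registered.
    synchronized void receivedPing(O ob, long nowMs) {
        if (lastPing.containsKey(ob)) {
            lastPing.put(ob, nowMs);
        }
    }

    // One pass of the check: drop and report everything that has been silent too long.
    synchronized void checkOnce(long nowMs) {
        Iterator<Map.Entry<O, Long>> it = lastPing.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<O, Long> entry = it.next();
            if (nowMs > entry.getValue() + expireIntervalMs) {
                O ob = entry.getKey();
                it.remove();
                expire(ob);
            }
        }
    }
}

class LivelinessDemo {
    static List<String> runDemo() {
        List<String> expired = new ArrayList<>();
        MiniLivelinessMonitor<String> monitor = new MiniLivelinessMonitor<String>(1000) {
            @Override
            protected void expire(String ob) {
                expired.add(ob);
            }
        };
        monitor.register("attempt_1", 0);
        monitor.register("attempt_2", 0);
        monitor.receivedPing("attempt_1", 900); // heartbeat keeps attempt_1 alive
        monitor.checkOnce(1500);                // attempt_2 has been silent for 1500 ms
        return expired;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Under this reading, an AM monitor and a container-allocation expirer differ mainly in what they register and in what their expire callback does.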

Next, the initialization of RMStateStore:

boolean isRecoveryEnabled = conf.getBoolean(YarnConfiguration.RECOVERY_ENABLED,
				YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);

		RMStateStore rmStore = null;
		if (isRecoveryEnabled) {
			recoveryEnabled = true;
			rmStore = RMStateStoreFactory.getStore(conf);
		} else {
			recoveryEnabled = false;
			rmStore = new NullRMStateStore();
		}

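I won't say much about how most member variables are initialized, but look at RMStateStore first. Under the default configuration isRecoveryEnabled is false, so an empty store, NullRMStateStore, is created. RMStateStore is what stores the ResourceManager's state: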

			rmStore.init(conf);
			rmStore.setRMDispatcher(rmDispatcher);
		

Note that the dispatcher inside rmStore is not the same object as the RM's dispatcher. The fields are:

private Dispatcher rmDispatcher;
AsyncDispatcher dispatcher;

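RMStateStore holds two dispatchers: rmDispatcher is the RM's dispatcher, while dispatcher is the one it uses internally to dispatch its own events. The init code of RMStateStore is a little roundabout, so read it carefully: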

@Override
	public void init(Configuration conf) {
		if (conf == null) {
			throw new ServiceStateException("Cannot initialize service " + getName() + ": null configuration");
		}
		if (isInState(STATE.INITED)) {
			return;
		}
		synchronized (stateChangeLock) {
			if (enterState(STATE.INITED) != STATE.INITED) {
				setConfig(conf);
				try {
					serviceInit(config);
					if (isInState(STATE.INITED)) {
						// if the service ended up here during init,
						// notify the listeners
						notifyListeners();
					}
				} catch (Exception e) {
					noteFailure(e);
					ServiceOperations.stopQuietly(LOG, this);
					throw ServiceStateException.convert(e);
				}
			}
		}
	}

What actually gets called is the init method of AbstractService, which in turn calls serviceInit, and that method is the one defined by RMStateStore:

public synchronized void serviceInit(Configuration conf) throws Exception {
		// create async handler
		dispatcher = new AsyncDispatcher();
		dispatcher.init(conf);
		dispatcher.register(RMStateStoreEventType.class, new ForwardingEventHandler());
		initInternal(conf);
	}

Here we can clearly see that it wraps a dispatcher of its own, used to dispatch events of type RMStateStoreEventType.
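The split between a guarded public init and an overridable serviceInit hook is a template-method pattern. The sketch below shows the shape of it in plain Java; MiniService, MiniStateStore and ServiceInitDemo are hypothetical names, the configuration is just a string, and the listener notification and failure handling seen in the real init method are left out.

```java
// Hypothetical sketch of the init()/serviceInit() split; not the Hadoop classes.
abstract class MiniService {
    enum State { NOTINITED, INITED }

    private final Object stateChangeLock = new Object();
    private volatile State state = State.NOTINITED;

    // Public entry point: validates input, guards the state, then calls the hook.
    final void init(String conf) {
        if (conf == null) {
            throw new IllegalStateException("Cannot initialize service: null configuration");
        }
        if (state == State.INITED) {
            return; // already initialized, nothing to do
        }
        synchronized (stateChangeLock) {
            if (state != State.INITED) {
                state = State.INITED;
                serviceInit(conf);
            }
        }
    }

    // Subclasses put their real initialization work here.
    protected abstract void serviceInit(String conf);
}

class MiniStateStore extends MiniService {
    int initCalls = 0;
    String innerDispatcherName;

    @Override
    protected void serviceInit(String conf) {
        initCalls++;
        // Stand-in for creating and initializing the store's private dispatcher.
        innerDispatcherName = "store-dispatcher(" + conf + ")";
    }
}

class ServiceInitDemo {
    static int runDemo() {
        MiniStateStore store = new MiniStateStore();
        store.init("demo-conf");
        store.init("demo-conf"); // second call is ignored by the state guard
        return store.initCalls;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The caller only ever sees init, so every service gets the same null check and state guard for free, and each subclass only has to describe its own setup.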

Moving on:

this.rmContext = new RMContextImpl(this.rmDispatcher, rmStore, this.containerAllocationExpirer,
				amLivelinessMonitor, amFinishingMonitor, delegationTokenRenewer, this.amRmTokenSecretManager,
				this.containerTokenSecretManager, this.nmTokenSecretManager, this.clientToAMSecretManager);

This is the main attraction. We have to see what RMContextImpl, with so many member variables, really is:

/**
 * Context of the ResourceManager.
 */
public interface RMContext {

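This is the comment on RMContext, the interface that RMContextImpl implements. It is the ResourceManager's context, something like its steward: it holds most of the power and is the ResourceManager's trusted confidant.

Next comes this: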

		this.nodesListManager = new NodesListManager(this.rmContext);

This effectively tells the RM how many nodes are available to it. The newly created NodesListManager is also handed the RM's confidant, the RMContextImpl.

		this.rmDispatcher.register(NodesListManagerEventType.class, this.nodesListManager);

At this point we need to go back and look at the register method in AsyncDispatcher:

@SuppressWarnings("unchecked")
	@Override
	public void register(Class<? extends Enum> eventType, EventHandler handler) {
		/* check to see if we have a listener registered */
		EventHandler<Event> registeredHandler = (EventHandler<Event>) eventDispatchers.get(eventType);
		LOG.info("Registering " + eventType + " for " + handler.getClass());
		if (registeredHandler == null) {
			eventDispatchers.put(eventType, handler);
		} else if (!(registeredHandler instanceof MultiListenerHandler)) {
			/* for multiple listeners of an event add the multiple listener handler */
			MultiListenerHandler multiHandler = new MultiListenerHandler();
			multiHandler.addHandler(registeredHandler);
			multiHandler.addHandler(handler);
			eventDispatchers.put(eventType, multiHandler);
		} else {
			/* already a multilistener, just add to it */
			MultiListenerHandler multiHandler = (MultiListenerHandler) registeredHandler;
			multiHandler.addHandler(handler);
		}
	}

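Looking closely, this simply fills in the eventDispatchers map: each event type and its handler are registered there, so that when such an event shows up later, the dispatcher can quickly find the right object to handle it.

Here, that means events about NodeManager nodes are handed to NodesListManager: this is its job from now on.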

public enum NodesListManagerEventType {
	NODE_USABLE, NODE_UNUSABLE
}

Whenever a node becomes usable or unusable, NodesListManager has to deal with it promptly.
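The three branches of register (first handler, second handler, third and later handlers) are easier to follow in a stripped-down form. The following is a sketch under simplifying assumptions: HandlerTable, DemoListener, DemoMultiListener and DemoNodeEvent are hypothetical names, events are plain strings, and dispatching is done synchronously rather than through a queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type and handler type, for illustration only.
enum DemoNodeEvent { NODE_USABLE, NODE_UNUSABLE }

interface DemoListener {
    void handle(String event);
}

// Fans one event out to every listener added to it.
class DemoMultiListener implements DemoListener {
    private final List<DemoListener> listeners = new ArrayList<>();

    void addHandler(DemoListener listener) {
        listeners.add(listener);
    }

    @Override
    public void handle(String event) {
        for (DemoListener listener : listeners) {
            listener.handle(event);
        }
    }
}

class HandlerTable {
    private final Map<Class<?>, DemoListener> table = new HashMap<>();

    void register(Class<?> eventType, DemoListener handler) {
        DemoListener registered = table.get(eventType);
        if (registered == null) {
            table.put(eventType, handler);            // first handler for this type
        } else if (!(registered instanceof DemoMultiListener)) {
            DemoMultiListener multi = new DemoMultiListener();
            multi.addHandler(registered);             // keep the earlier handler
            multi.addHandler(handler);
            table.put(eventType, multi);
        } else {
            ((DemoMultiListener) registered).addHandler(handler);
        }
    }

    void dispatch(Class<?> eventType, String event) {
        DemoListener handler = table.get(eventType);
        if (handler != null) {
            handler.handle(event);
        }
    }

    // Registers three handlers for one event type and dispatches a single event.
    static List<String> runDemo() {
        List<String> log = new ArrayList<>();
        HandlerTable handlers = new HandlerTable();
        handlers.register(DemoNodeEvent.class, e -> log.add("first:" + e));
        handlers.register(DemoNodeEvent.class, e -> log.add("second:" + e));
        handlers.register(DemoNodeEvent.class, e -> log.add("third:" + e));
        handlers.dispatch(DemoNodeEvent.class, DemoNodeEvent.NODE_USABLE.name());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The map always holds exactly one handler per event type; when several parties care about the same type, that one handler is simply a wrapper that forwards to all of them.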

// Initialize the scheduler
		this.scheduler = createScheduler();
		this.schedulerDispatcher = createSchedulerEventDispatcher();
		addIfService(this.schedulerDispatcher);
		this.rmDispatcher.register(SchedulerEventType.class, this.schedulerDispatcher);

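Next, a scheduler is created. This part deserves a careful read, because the scheduler is extremely important in Yarn; initializing our jobs cannot happen without it: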

protected ResourceScheduler createScheduler() {
		String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER, YarnConfiguration.DEFAULT_RM_SCHEDULER);
		LOG.info("Using Scheduler: " + schedulerClassName);
		try {
			Class<?> schedulerClazz = Class.forName(schedulerClassName);
			if (ResourceScheduler.class.isAssignableFrom(schedulerClazz)) {
				return (ResourceScheduler) ReflectionUtils.newInstance(schedulerClazz, this.conf);
			} else {
				throw new YarnRuntimeException("Class: " + schedulerClassName + " not instance of "
						+ ResourceScheduler.class.getCanonicalName());
			}
		} catch (ClassNotFoundException e) {
			throw new YarnRuntimeException("Could not instantiate Scheduler: " + schedulerClassName, e);
		}
	}

The code I'm reading is Hadoop 2.2.0. Under the default configuration, the configured scheduler is:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
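The pattern in createScheduler (read a class name from configuration, load it, check that it implements the expected interface, then instantiate it) can be tried out in isolation. This is a sketch with hypothetical names (DemoScheduler, FifoDemoScheduler, SchedulerLoader); it uses plain reflection instead of Hadoop's ReflectionUtils and does not pass a configuration object to the new instance.

```java
// Hypothetical plugin interface and implementation, for illustration only.
interface DemoScheduler {
    String name();
}

class FifoDemoScheduler implements DemoScheduler {
    @Override
    public String name() {
        return "fifo";
    }
}

class SchedulerLoader {
    // Loads the class by name, verifies its type, and creates an instance.
    static DemoScheduler create(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!DemoScheduler.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                        "Class: " + className + " not instance of " + DemoScheduler.class.getName());
            }
            return (DemoScheduler) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate Scheduler: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Works: the class exists and implements DemoScheduler.
        System.out.println(create(FifoDemoScheduler.class.getName()).name());
        // Fails the type check: String is not a DemoScheduler.
        try {
            create("java.lang.String");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The type check before instantiation is what turns a typo in the configured class name, or a class of the wrong kind, into an immediate and readable startup error instead of a ClassCastException later on.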

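Next, the scheduler is given an event dispatcher of its own. In other words, events of the types the scheduler accepts are dispatched onward by it to be handled.

Finally, this scheduler event dispatcher is also registered with the RM's own dispatcher: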

protected EventHandler<SchedulerEvent> createSchedulerEventDispatcher() {
		return new SchedulerEventDispatcher(this.scheduler);
	}

		this.rmDispatcher.register(SchedulerEventType.class, this.schedulerDispatcher);

Next, three more handlers are added to our asynchronous dispatcher, responsible for dispatching Application events, ApplicationAttempt events, and RMNode events:

// Register event handler for RmAppEvents
		this.rmDispatcher.register(RMAppEventType.class, new ApplicationEventDispatcher(this.rmContext));

		// Register event handler for RmAppAttemptEvents
		this.rmDispatcher.register(RMAppAttemptEventType.class, new ApplicationAttemptEventDispatcher(this.rmContext));

		// Register event handler for RmNodes
		this.rmDispatcher.register(RMNodeEventType.class, new NodeEventDispatcher(this.rmContext));

Next, look at this:

this.resourceTracker = createResourceTrackerService();
		addService(resourceTracker);

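From the official documentation we know that the RM controls all resources in the system, and that this control is carried out over RPC: a NodeManager calls methods on ResourceTracker to report its resources, and the RM side processes the call and returns commands for the NM to execute. ResourceTrackerService is initialized right here: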

@Override
	protected void serviceInit(Configuration conf) throws Exception {
		resourceTrackerAddress = conf.getSocketAddr(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
				YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
				YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);

		RackResolver.init(conf);
		nextHeartBeatInterval = conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
				YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
		if (nextHeartBeatInterval <= 0) {
			throw new YarnRuntimeException("Invalid Configuration. " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS
					+ " should be larger than 0.");
		}

		minAllocMb = conf.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
				YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
		minAllocVcores = conf.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES,
				YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES);

		super.serviceInit(conf);
	}

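As you can see, quite a few default configuration values, addresses among them, are involved here. That is why reading the code is worthwhile: if we want to understand the configuration files thoroughly, working through the code carefully makes that easy.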

masterService = createApplicationMasterService();
		addService(masterService);

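This part creates an ApplicationMasterService, which manages all submitted ApplicationMasters. Its serviceInit method is not complicated; the important part is in its serviceStart method.

Next, pay attention to this code: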

this.rmAppManager = createRMAppManager();
		// Register event handler for RMAppManagerEvents
		this.rmDispatcher.register(RMAppManagerEventType.class, this.rmAppManager);
		this.rmDTSecretManager = createRMDelegationTokenSecretManager(this.rmContext);
		rmContext.setRMDelegationTokenSecretManager(this.rmDTSecretManager);
		clientRM = createClientRMService();
		rmContext.setClientRMService(clientRM);
		addService(clientRM);

		adminService = createAdminService(clientRM, masterService, resourceTracker);
		addService(adminService);

		this.applicationMasterLauncher = createAMLauncher();
		this.rmDispatcher.register(AMLauncherEventType.class, this.applicationMasterLauncher);

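There are a few things to note here. A ClientRMService is created, which handles all interaction between clients and the RM; for every client request, the corresponding code can be found in ClientRMService.

Next is the AMLauncher. It is very important: it is responsible for launching ApplicationMasters, so it needs a closer look.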

this.applicationMasterLauncher = createAMLauncher();
		this.rmDispatcher.register(AMLauncherEventType.class, this.applicationMasterLauncher);

An AMLauncher is built on top of the RM's steward, the RMContext:

protected ApplicationMasterLauncher createAMLauncher() {
		return new ApplicationMasterLauncher(this.rmContext);
	}

Look at its serviceStart method:

@Override
	protected void serviceStart() throws Exception {
		launcherHandlingThread.start();
		super.serviceStart();
	}

And here is the launcherHandlingThread it starts:

private class LauncherThread extends Thread {
		public LauncherThread() {
			super("ApplicationMaster Launcher");
		}
		@Override
		public void run() {
			while (!this.isInterrupted()) {
				Runnable toLaunch;
				try {
					toLaunch = masterEvents.take();
					launcherPool.execute(toLaunch);
				} catch (InterruptedException e) {
					LOG.warn(this.getClass().getName() + " interrupted. Returning.");
					return;
				}
			}
		}
	}

It wraps a thread pool internally and keeps processing the events that are dispatched to it.
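The combination of one handling thread and a worker pool can be sketched as below. The field names masterEvents, launcherPool and launcherHandlingThread mirror the snippet above, but MiniLauncher itself, the pool size of 2, and the handle method that simply enqueues a Runnable are assumptions made for this illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the pool size and the handle() signature are assumptions.
class MiniLauncher {
    private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
    private final ExecutorService launcherPool = Executors.newFixedThreadPool(2);
    private final Thread launcherHandlingThread = new Thread(this::drainLoop, "Mini Launcher");

    // The handling thread only moves work from the queue to the pool;
    // the potentially slow launch itself runs on a pool thread.
    private void drainLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Runnable toLaunch = masterEvents.take();
                launcherPool.execute(toLaunch);
            } catch (InterruptedException e) {
                return; // interrupted by stop()
            }
        }
    }

    void start() {
        launcherHandlingThread.start();
    }

    // Called by whoever dispatches launch requests; returns immediately.
    void handle(Runnable launchTask) {
        masterEvents.add(launchTask);
    }

    void stop() {
        launcherHandlingThread.interrupt();
        launcherPool.shutdown();
    }

    // Queues three fake launches and returns how many actually ran.
    static int runDemo() {
        AtomicInteger launched = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(3);
        MiniLauncher launcher = new MiniLauncher();
        launcher.start();
        for (int i = 0; i < 3; i++) {
            launcher.handle(() -> {
                launched.incrementAndGet();
                done.countDown();
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            launcher.stop();
        }
        return launched.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The point of the extra queue is that the caller of handle never blocks on a launch: it only enqueues, and the slow work happens elsewhere.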

At the very end of the RM's serviceInit method, the parent's serviceInit is called. Here is the serviceInit method of its parent class, CompositeService:

protected void serviceInit(Configuration conf) throws Exception {
		List<Service> services = getServices();
		if (LOG.isDebugEnabled()) {
			LOG.debug(getName() + ": initing services, size=" + services.size());
		}
		for (Service service : services) {
			service.init(conf);
		}
		super.serviceInit(conf);
	}

public List<Service> getServices() {
		synchronized (serviceList) {
			return Collections.unmodifiableList(serviceList);
		}
	}

While reading the source, I noticed that in the RM's serviceInit method every service goes through the same operation:

protected void addService(Service service) {
		if (LOG.isDebugEnabled()) {
			LOG.debug("Adding service " + service.getName());
		}
		synchronized (serviceList) {
			serviceList.add(service);
		}
	}

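The parent's addService method is called to add every service to serviceList, and they are all initialized together here. The design is quite elegant. When analyzing the source, pay attention to the service-related classes. At the top of the hierarchy is Service, an interface that defines a service's basic operations and lifecycle. Its base implementation is an abstract class, AbstractService. Generally, a service that is not complicated just extends AbstractService, while a complex service such as the RM extends CompositeService (a subclass of AbstractService).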
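As a rough illustration of the composite idea (collect child services while wiring things up, then initialize them all in one loop), here is a small sketch. DemoService, LeafService and MiniCompositeService are hypothetical names, the configuration is a plain string, and only init is modelled, not start or stop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal service interface; only init is modelled.
interface DemoService {
    String getName();
    void init(String conf);
}

class LeafService implements DemoService {
    private final String name;
    private final List<String> initLog;

    LeafService(String name, List<String> initLog) {
        this.name = name;
        this.initLog = initLog;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public void init(String conf) {
        initLog.add(name); // record the order in which children are initialized
    }
}

class MiniCompositeService implements DemoService {
    private final List<DemoService> serviceList = new ArrayList<>();

    @Override
    public String getName() {
        return "composite";
    }

    void addService(DemoService service) {
        synchronized (serviceList) {
            serviceList.add(service);
        }
    }

    // Adds the object only if it is a service; anything else is ignored.
    boolean addIfService(Object object) {
        if (object instanceof DemoService) {
            addService((DemoService) object);
            return true;
        }
        return false;
    }

    List<DemoService> getServices() {
        synchronized (serviceList) {
            return Collections.unmodifiableList(new ArrayList<>(serviceList));
        }
    }

    // Initializes every child in the order it was added.
    @Override
    public void init(String conf) {
        for (DemoService service : getServices()) {
            service.init(conf);
        }
    }

    static List<String> runDemo() {
        List<String> initLog = new ArrayList<>();
        MiniCompositeService rm = new MiniCompositeService();
        rm.addIfService(new LeafService("dispatcher", initLog));
        rm.addIfService("not a service"); // silently skipped
        rm.addService(new LeafService("containerAllocationExpirer", initLog));
        rm.addService(new LeafService("clientRM", initLog));
        rm.init("demo-conf");
        return initLog;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Since children are initialized in insertion order, the order of the addService calls in the parent's setup code is effectively the initialization order of the whole tree.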

With the RM's service initialization complete, we will look at the service startup part next.
