YarnRPC Example: Analysis of the ResourceTracker Protocol

Latest recommended article published on 2022-02-28 15:13:10

吴春成-ZJU

Article link: https://blog.csdn.net/lvzhuyiyi/article/details/51766733

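This article describes how ResourceManager and NodeManager communicate in the Hadoop YARN framework, covering how the ResourceTracker service is implemented, how the client and the server interact, and how the retry policy works.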

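The communication protocol between ResourceManager and NodeManager is ResourceTracker.

Both the server-side and the client-side implementations follow the package layout and class-naming conventions described earlier. ResourceTrackerPBServiceImpl implements the BlockingInterface of the PB service and in effect delegates to the methods of ResourceTrackerService, the real implementation class.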

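Chapter 1

First, we look at how Hadoop wraps PB parameters and return values in Java classes. The client extracts the PB message from the Java class and then calls the corresponding proxy method. The server does the reverse: it wraps the incoming PB message in a Java class and calls the real ResourceTrackerService implementation to do the work.

For example, the abstract class RegisterNodeManagerRequest corresponds to this PB message: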

message RegisterNodeManagerRequestProto {
  optional NodeIdProto node_id = 1;
  optional int32 http_port = 3;
  optional ResourceProto resource = 4;
  optional string nm_version = 5;
  repeated NMContainerStatusProto container_statuses = 6;
  repeated ApplicationIdProto runningApplications = 7;
}

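The fields correspond one to one. The abstract class only declares abstract get/set methods; the class that actually wraps the PB message is its implementation, RegisterNodeManagerRequestPBImpl.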

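Let us look at this implementation class. Besides the fields listed above, it has three important fields: proto, builder, and viaProto. viaProto is a boolean: when it is true, field values are read from proto; otherwise they are read from builder. Accordingly there are two constructors: a no-argument one that starts from a fresh builder, and one that takes an existing proto and sets viaProto to true.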

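To get a field, the wrapper class first checks the corresponding member variable and returns it if it is not null. Otherwise it checks whether proto or builder (depending on viaProto) has the field. If not, it returns null; if so, it converts the value from the proto message into the member variable and returns that.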

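Next comes the getProto method, which converts the Java POJO into its proto form. It first calls mergeLocalToProto. If viaProto is true, mergeLocalToProto first calls maybeInitBuilder, which creates the builder if it is null, recreates it if it is non-null but viaProto is true, and finally sets viaProto to false. mergeLocalToProto then calls mergeLocalToBuilder, which converts every non-null member variable of the POJO to its proto form (by calling getProto on that member) and sets it on the builder. Finally builder.build() constructs the proto, viaProto is set to true, and the proto is returned.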

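Now the setter methods. A setter first calls maybeInitBuilder: if viaProto is true or builder is null, the builder is created (it is reset so that the proto can be rebuilt; viaProto being false means the proto is still being built and the new data lives on the builder side), and viaProto is set to false. If the setter's argument is null, the corresponding field in the builder is cleared; otherwise only the member variable is set. Values only reach the proto when getProto is called.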
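The getter, getProto, and setter flow described above can be sketched in plain Java without protobuf. This is only an illustration under stated assumptions: `FakeProto` and `FakeBuilder` are hypothetical stand-ins for the generated protobuf classes, and `RequestRecord` is a hypothetical stand-in for RegisterNodeManagerRequestPBImpl reduced to a single `nmVersion` field.

```java
// Hypothetical stand-in for an immutable generated protobuf message.
final class FakeProto {
    final String nmVersion; // null means "field not set"
    FakeProto(String nmVersion) { this.nmVersion = nmVersion; }
    boolean hasNmVersion() { return nmVersion != null; }
}

// Hypothetical stand-in for the generated builder.
final class FakeBuilder {
    private String nmVersion;
    FakeBuilder(FakeProto base) { this.nmVersion = base.nmVersion; }
    void setNmVersion(String v) { nmVersion = v; }
    void clearNmVersion() { nmVersion = null; }
    boolean hasNmVersion() { return nmVersion != null; }
    String getNmVersion() { return nmVersion; }
    FakeProto build() { return new FakeProto(nmVersion); }
}

// Sketch of the wrapper pattern: local field + proto + builder + viaProto.
final class RequestRecord {
    private FakeProto proto = new FakeProto(null);
    private FakeBuilder builder = null;
    private boolean viaProto = false;
    private String nmVersion = null; // locally cached value

    RequestRecord() { builder = new FakeBuilder(proto); }
    RequestRecord(FakeProto proto) { this.proto = proto; viaProto = true; }

    // (Re)create the builder from the current proto when needed.
    private void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = new FakeBuilder(proto);
        }
        viaProto = false;
    }

    // Push non-null local fields into the builder.
    private void mergeLocalToBuilder() {
        if (nmVersion != null) {
            builder.setNmVersion(nmVersion);
        }
    }

    FakeProto getProto() {
        if (viaProto) {
            maybeInitBuilder();
        }
        mergeLocalToBuilder();
        proto = builder.build();
        viaProto = true;
        return proto;
    }

    String getNmVersion() {
        if (nmVersion != null) {
            return nmVersion;
        }
        boolean has = viaProto ? proto.hasNmVersion() : builder.hasNmVersion();
        if (!has) {
            return null;
        }
        nmVersion = viaProto ? proto.nmVersion : builder.getNmVersion();
        return nmVersion;
    }

    void setNmVersion(String v) {
        maybeInitBuilder();
        if (v == null) {
            builder.clearNmVersion();
        }
        nmVersion = v;
    }
}
```

A record built with the no-argument constructor stays on the builder side until getProto is called; a record built from an existing proto reads from that proto until the first setter switches it back to the builder.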

Chapter 2

Next we analyze the client-side classes.

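2.1 ResourceTrackerPBClientImpl is fairly simple. Its constructor registers the mapping between ResourceTrackerPB.class and ProtobufRpcEngine, then uses the RPC factory class to obtain the proxy object. The rest are the protocol methods: each one takes the proto wrapped inside the Java parameter object, calls the corresponding method on the proxy object, and wraps the returned proto in a Java bean.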
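The unwrap, call, and wrap steps on the client, and the mirror-image steps on the server, can be sketched as two small translator classes. All names here (`PingRequestProto`, `TrackerPB`, `Tracker`, and so on) are hypothetical stand-ins invented for this illustration; they are not Hadoop classes, and the real translators go through an RPC proxy rather than a direct reference.

```java
// Hypothetical stand-ins for generated protobuf messages.
final class PingRequestProto {
    final String nodeId;
    PingRequestProto(String nodeId) { this.nodeId = nodeId; }
}

final class PingResponseProto {
    final boolean accepted;
    PingResponseProto(boolean accepted) { this.accepted = accepted; }
}

// PB-level interface: speaks proto messages only (like a BlockingInterface).
interface TrackerPB {
    PingResponseProto ping(PingRequestProto request);
}

// Java wrappers around the proto messages.
final class PingRequest {
    private final PingRequestProto proto;
    PingRequest(PingRequestProto proto) { this.proto = proto; }
    PingRequestProto getProto() { return proto; }
    String getNodeId() { return proto.nodeId; }
}

final class PingResponse {
    private final PingResponseProto proto;
    PingResponse(PingResponseProto proto) { this.proto = proto; }
    PingResponseProto getProto() { return proto; }
    boolean isAccepted() { return proto.accepted; }
}

// Java-level protocol interface used by callers and by the real implementation.
interface Tracker {
    PingResponse ping(PingRequest request);
}

// Client side: unwrap the proto, call the proxy, wrap the reply.
final class TrackerClientTranslator implements Tracker {
    private final TrackerPB proxy;
    TrackerClientTranslator(TrackerPB proxy) { this.proxy = proxy; }

    @Override
    public PingResponse ping(PingRequest request) {
        PingRequestProto proto = request.getProto();
        PingResponseProto reply = proxy.ping(proto);
        return new PingResponse(reply);
    }
}

// Server side: wrap the proto, call the real implementation, unwrap the reply.
final class TrackerServerTranslator implements TrackerPB {
    private final Tracker real;
    TrackerServerTranslator(Tracker real) { this.real = real; }

    @Override
    public PingResponseProto ping(PingRequestProto request) {
        return real.ping(new PingRequest(request)).getProto();
    }
}
```

In this sketch the two translators are wired directly to each other, which makes it easy to see that the caller and the real implementation only ever see the Java-level types.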

On the client side the call chain starts from NodeManager. It holds a NodeStatusUpdater object, and NodeStatusUpdater in turn holds a resourceTracker object.

In its resyncWithRM method, the NodeStatusUpdater object calls rebootNodeStatusUpdaterAndRegisterWithRM, which calls registerNodeManager on the resourceTracker object.
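The other ResourceTracker RPC method is invoked from NodeManager's service.start(). Because NodeManager extends CompositeService, it also contains other services, NodeStatusUpdater among them. In service.init, NodeManager adds the NodeStatusUpdater service to its child services; in service.start() it then calls the NodeStatusUpdater's service.start(). That method in turn calls startStatusUpdater, which starts a thread whose run method calls resourceTracker.nodeHeartbeat.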
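The composite-service flow described above (add a child service during init, start it during start, and let it run a heartbeat thread) can be sketched as follows. This is a simplified illustration, not Hadoop's service framework: `MiniService`, `MiniCompositeService`, `HeartbeatService`, and `MiniNodeManager` are hypothetical names, the heartbeat is just a `Runnable`, and the 10 ms interval is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal service lifecycle.
interface MiniService {
    void init();
    void start();
    void stop();
}

// A service that forwards lifecycle calls to its children.
class MiniCompositeService implements MiniService {
    private final List<MiniService> children = new ArrayList<>();

    protected void addService(MiniService s) { children.add(s); }

    @Override public void init() { for (MiniService s : children) s.init(); }
    @Override public void start() { for (MiniService s : children) s.start(); }
    @Override public void stop() {
        // Stop children in reverse order of registration.
        for (int i = children.size() - 1; i >= 0; i--) children.get(i).stop();
    }
}

// Child service that runs a heartbeat callback on its own thread.
final class HeartbeatService implements MiniService {
    private final Runnable heartbeat;
    private final long intervalMillis;
    private volatile boolean stopped = false;
    private Thread thread;

    HeartbeatService(Runnable heartbeat, long intervalMillis) {
        this.heartbeat = heartbeat;
        this.intervalMillis = intervalMillis;
    }

    @Override public void init() { }

    @Override public void start() {
        thread = new Thread(() -> {
            while (!stopped) {
                heartbeat.run(); // stands in for resourceTracker.nodeHeartbeat
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "heartbeat");
        thread.setDaemon(true);
        thread.start();
    }

    @Override public void stop() {
        stopped = true;
        if (thread != null) thread.interrupt();
    }
}

// Parent service that registers the heartbeat child during init.
final class MiniNodeManager extends MiniCompositeService {
    private final CountDownLatch beats = new CountDownLatch(3);

    @Override public void init() {
        addService(new HeartbeatService(beats::countDown, 10));
        super.init();
    }

    // Waits until three heartbeats have been observed or the timeout expires.
    boolean awaitBeats(long timeoutMillis) {
        try {
            return beats.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The parent never touches the heartbeat thread directly; it only drives the child through the shared lifecycle methods.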
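2.2 Next is how the resourceTracker object in NodeStatusUpdater is created. It comes from the getRMClient method, which calls ServerRMProxy.createRMProxy(conf, ResourceTracker.class), which in turn calls an overload of the same name. That method first creates the retryPolicy.
The RetryPolicy interface has the method shouldRetry(Exception e, int retries, int failovers, boolean isIdempotentOrAtMostOnce), which returns a RetryAction. Here retries is the number of retries so far, failovers is the number of failovers so far, and isIdempotentOrAtMostOnce says whether the method is idempotent or at-most-once. Its implementations provide retry strategies for the exceptions that occur. A RetryAction carries one of three decisions (fail, retry, or fail over and then retry) plus a delay field, and which one is returned depends on the exception. The implementation used here is FailoverOnNetworkExceptionRetry, whose delay grows exponentially (doubling each time).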
  // retries: number of retries so far; failovers: number of failovers so far
  @Override
  public RetryAction shouldRetry(Exception e, int retries,
      int failovers, boolean isIdempotentOrAtMostOnce) throws Exception {
    // return FAIL once the number of failovers reaches the threshold
    if (failovers >= maxFailovers) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
          "failovers (" + failovers + ") exceeded maximum allowed ("
          + maxFailovers + ")");
    }
    // return FAIL once the number of retries exceeds the threshold
    if (retries - failovers > maxRetries) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0, "retries ("
          + retries + ") exceeded maximum allowed (" + maxRetries + ")");
    }
    // connection-level failures: fail over and retry
    if (e instanceof ConnectException ||
        e instanceof NoRouteToHostException ||
        e instanceof UnknownHostException ||
        e instanceof StandbyException ||
        e instanceof ConnectTimeoutException ||
        isWrappedStandbyException(e)) {
      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
          getFailoverOrRetrySleepTime(failovers));
    // explicitly retriable errors: retry
    } else if (e instanceof RetriableException
        || getWrappedRetriableException(e) != null) {
      // RetriableException or RetriableException wrapped 
      return new RetryAction(RetryAction.RetryDecision.RETRY,
            getFailoverOrRetrySleepTime(retries));
    // other SocketException or IOException, except RemoteException (an IOException subclass):
    // fail over and retry if the method is idempotent, otherwise fail
    } else if (e instanceof SocketException
        || (e instanceof IOException && !(e instanceof RemoteException))) {
      if (isIdempotentOrAtMostOnce) {
        return RetryAction.FAILOVER_AND_RETRY;
      } else {
        return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
            "the invoked method is not idempotent, and unable to determine "
                + "whether it was invoked");
      }
    // any other Exception, or a server-side error (RemoteException): defer to fallbackPolicy (fails immediately here)
    } else {
        return fallbackPolicy.shouldRetry(e, retries, failovers,
            isIdempotentOrAtMostOnce);
    }
  }
}
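The decision order in shouldRetry and the doubling delay can be sketched with JDK exception types only. This is a simplified illustration under stated assumptions: `SimpleFailoverPolicy` is a hypothetical class, the Hadoop-specific exceptions (StandbyException, RetriableException, RemoteException) and the fallback policy are left out, the delay has no random jitter, and the cap `maxDelayMillis` is a parameter introduced for the sketch.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical, simplified stand-in for a failover-aware retry policy.
final class SimpleFailoverPolicy {
    enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

    static final class Action {
        final Decision decision;
        final long delayMillis;
        Action(Decision decision, long delayMillis) {
            this.decision = decision;
            this.delayMillis = delayMillis;
        }
    }

    private final int maxFailovers;
    private final int maxRetries;
    private final long baseDelayMillis;
    private final long maxDelayMillis;

    SimpleFailoverPolicy(int maxFailovers, int maxRetries,
                         long baseDelayMillis, long maxDelayMillis) {
        this.maxFailovers = maxFailovers;
        this.maxRetries = maxRetries;
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    // No delay before the first attempt; afterwards the delay doubles per
    // attempt and is capped at maxDelayMillis.
    long sleepTime(int attempts) {
        if (attempts == 0) {
            return 0;
        }
        long delay = baseDelayMillis;
        for (int i = 0; i < attempts && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }

    Action decide(Exception e, int retries, int failovers, boolean idempotent) {
        // Limits are checked before looking at the exception type.
        if (failovers >= maxFailovers) {
            return new Action(Decision.FAIL, 0);
        }
        if (retries - failovers > maxRetries) {
            return new Action(Decision.FAIL, 0);
        }
        // Could not reach the server at all: safe to try another one.
        if (e instanceof ConnectException
                || e instanceof NoRouteToHostException
                || e instanceof UnknownHostException) {
            return new Action(Decision.FAILOVER_AND_RETRY, sleepTime(failovers));
        }
        // The call may have reached the server: only repeat idempotent calls.
        if (e instanceof SocketException || e instanceof IOException) {
            return idempotent
                    ? new Action(Decision.FAILOVER_AND_RETRY, 0)
                    : new Action(Decision.FAIL, 0);
        }
        // Everything else fails immediately in this sketch.
        return new Action(Decision.FAIL, 0);
    }
}
```

The order of the checks matters because ConnectException is itself a SocketException and an IOException; testing the connection-level types first keeps them out of the idempotency branch.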
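    If HA is enabled, a ConfiguredRMFailoverProxyProvider (a proxy provider that supports failover and retry) is created. Its most important method is getProxy(), which obtains the real proxy and ends up calling RMProxy.getProxy: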
   @Private
static <T> T getProxy(final Configuration conf,
    final Class<T> protocol, final InetSocketAddress rmAddress)
    throws IOException {
  return UserGroupInformation.getCurrentUser().doAs(
    new PrivilegedAction<T>() {
      @Override
      public T run() {
        return (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
      }
    });
}
  This is exactly where the YarnRPC API is called.
  Then RetryProxy.create is called; the method that finally creates the dynamic proxy is:
   /**
 * Create a proxy for an interface of implementations of that interface using
 * the given {@link FailoverProxyProvider} and the same retry policy for each
 * method in the interface.
 * 
 * @param iface the interface that the retry will implement
 * @param proxyProvider provides implementation instances whose methods should be retried
 * @param retryPolicy the policy for retrying or failing over method call failures
 * @return the retry proxy
 */
public static <T> Object create(Class<T> iface,
    FailoverProxyProvider<T> proxyProvider, RetryPolicy retryPolicy) {
  // create the dynamic proxy
  return Proxy.newProxyInstance(
      proxyProvider.getInterface().getClassLoader(),
      new Class<?>[] { iface },
      // proxyProvider here is the ConfiguredRMFailoverProxyProvider
      new RetryInvocationHandler<T>(proxyProvider, retryPolicy)
      );
}
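The same idea, a JDK dynamic proxy whose invocation handler retries failed calls, can be shown in a few lines. This is a minimal sketch rather than Hadoop's RetryProxy: `Greeter`, `FlakyGreeter`, and `RetryingHandler` are hypothetical names, the handler retries on any exception up to a fixed number of attempts, and there is no failover or delay.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical protocol interface used only for this illustration.
interface Greeter {
    String greet(String name);
}

// Target that fails a fixed number of times before succeeding.
final class FlakyGreeter implements Greeter {
    private int failuresLeft;
    FlakyGreeter(int failures) { this.failuresLeft = failures; }

    @Override
    public String greet(String name) {
        if (failuresLeft-- > 0) {
            throw new IllegalStateException("transient failure");
        }
        return "hello " + name;
    }
}

// Invocation handler that retries a failed call up to maxAttempts times.
final class RetryingHandler implements InvocationHandler {
    private final Object target;
    private final int maxAttempts;

    RetryingHandler(Object target, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.target = target;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args)
            throws Throwable {
        Throwable last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Unwrap the exception thrown by the target method.
                last = e.getCause();
            }
        }
        throw last;
    }

    // Wraps target in a proxy that implements iface and retries every method.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new RetryingHandler(target, maxAttempts));
    }
}
```

Callers hold a plain `Greeter` reference and never see the retry loop, which is the property that lets the retry logic be added without changing the protocol interface.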

 2.3 Next, let's look at the constructor of RetryInvocationHandler:
protected RetryInvocationHandler(FailoverProxyProvider<T> proxyProvider,
    RetryPolicy defaultPolicy,
    Map<String, RetryPolicy> methodNameToPolicyMap) {
  this.proxyProvider = proxyProvider;
  this.defaultPolicy = defaultPolicy;
  this.methodNameToPolicyMap = methodNameToPolicyMap;
  // getProxy() returns the proxyInfo that contains the real proxy
  this.currentProxy = proxyProvider.getProxy();
}

And the invoke method:
@Override
public Object invoke(Object proxy, Method method, Object[] args)
  throws Throwable {
  // look up the retry policy cached for this method name
  RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
  if (policy == null) {
    policy = defaultPolicy;
  }
  
  // The number of times this method invocation has been failed over.
  int invocationFailoverCount = 0;
  // whether the proxy is a Proxy instance whose InvocationHandler is an RpcInvocationHandler
  final boolean isRpc = isRpcInvocation(currentProxy.proxy);
  final int callId = isRpc? Client.nextCallId(): RpcConstants.INVALID_CALL_ID;
  int retries = 0;
  // the loop may perform several RPC attempts
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }

    if (isRpc) {
      // validates both arguments and requires that no callId is currently set, then records them
      Client.setCallIdAndRetryCount(callId, retries);
    }
    try {
      // invoke the method on the real proxy
      Object ret = invokeMethod(method, args);
      hasMadeASuccessfulCall = true;
      return ret;
    } catch (Exception e) {
      // on failure, decide whether to retry
      if (Thread.currentThread().isInterrupted()) {
        // If interrupted, do not retry.
        throw e;
      }
      // read the method's annotations to see whether it is idempotent or at-most-once
      boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
          .getMethod(method.getName(), method.getParameterTypes())
          .isAnnotationPresent(Idempotent.class);
      if (!isIdempotentOrAtMostOnce) {
        isIdempotentOrAtMostOnce = proxyProvider.getInterface()
            .getMethod(method.getName(), method.getParameterTypes())
            .isAnnotationPresent(AtMostOnce.class);
      }
      // pass in the retry count and the failover count; shouldRetry, analyzed above, returns the RetryAction
      RetryAction action = policy.shouldRetry(e, retries++,
          invocationFailoverCount, isIdempotentOrAtMostOnce);
      if (action.action == RetryAction.RetryDecision.FAIL) {
        // log the failure reason, then rethrow
        if (action.reason != null) {
          LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
              + "." + method.getName() + " over " + currentProxy.proxyInfo
              + ". Not retrying because " + action.reason, e);
        }
        throw e;
      } else { // retry or failover
        // avoid logging the failover if this is the first call on this
        // proxy object, and we successfully achieve the failover without
        // any flip-flopping
        // the first failover on a proxy that has never succeeded is not logged
        boolean worthLogging = 
          !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
        worthLogging |= LOG.isDebugEnabled();
        // log differently depending on the action and on whether debug logging is enabled
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
            worthLogging) {
          String msg = "Exception while invoking " + method.getName()
              + " of class " + currentProxy.proxy.getClass().getSimpleName()
              + " over " + currentProxy.proxyInfo;

          if (invocationFailoverCount > 0) {
            msg += " after " + invocationFailoverCount + " fail over attempts"; 
          }
          msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
          LOG.info(msg, e);
          // reached for RETRY (or an unlogged first failover); logs only when debug is enabled
        } else {
          if(LOG.isDebugEnabled()) {
            LOG.debug("Exception while invoking " + method.getName()
                + " of class " + currentProxy.proxy.getClass().getSimpleName()
                + " over " + currentProxy.proxyInfo + ". Retrying "
                + formatSleepMessage(action.delayMillis), e);
          }
        }
        // sleep for the delay chosen by the retry policy
        if (action.delayMillis > 0) {
          Thread.sleep(action.delayMillis);
        }
        
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
          // Make sure that concurrent failed method invocations only cause a
          // single actual fail over.
          synchronized (proxyProvider) {
            // make sure no other invocation has already failed over in the meantime
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
              // switch the ResourceManager ID to the next one in the HA RM list
              proxyProvider.performFailover(currentProxy.proxy);
              proxyProviderFailoverCount++;
            } else {
              LOG.warn("A failover has occurred since the start of this method"
                  + " invocation attempt.");
            }
            // get the proxy for the address of the new RM ID
            currentProxy = proxyProvider.getProxy();
          }
          invocationFailoverCount++;
        }
      }
    }
  }
}
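The comparison between invocationAttemptFailoverCount and proxyProviderFailoverCount is what keeps several concurrently failing calls from each triggering their own failover. A minimal sketch of that guard, with the hypothetical class `FailoverGuard` and a plain array of target names standing in for the proxy provider:

```java
// Hypothetical sketch of the "fail over at most once per observed failure" guard.
final class FailoverGuard {
    private final String[] targets;
    private int currentIndex = 0;
    private long failoverCount = 0;

    FailoverGuard(String... targets) {
        if (targets.length == 0) {
            throw new IllegalArgumentException("at least one target required");
        }
        this.targets = targets.clone();
    }

    // Each attempt records the failover count before it starts.
    synchronized long snapshot() {
        return failoverCount;
    }

    synchronized String current() {
        return targets[currentIndex];
    }

    // Fails over only if nobody else has failed over since the snapshot was
    // taken. Returns true when this call performed the failover.
    synchronized boolean failoverIfUnchanged(long snapshot) {
        if (snapshot != failoverCount) {
            return false; // another caller already moved to the next target
        }
        currentIndex = (currentIndex + 1) % targets.length;
        failoverCount++;
        return true;
    }
}
```

Two attempts that both started against the first target take the same snapshot; whichever fails first moves to the second target, and the other sees a changed count and simply picks up the new current target instead of skipping past it.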
Next, let's look at the proxyProvider.getProxy() method:
      final InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol);
which finally calls
    conf.getSocketAddr(
  YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);
This looks up the current RM ID and then reads the address configured for that ID from the configuration file.

Now let's look at the failover method (in ConfiguredRMFailoverProxyProvider):
@Override
public synchronized void performFailover(T currentProxy) {
  // move to the next ID index
  currentProxyIndex = (currentProxyIndex + 1) % rmServiceIds.length;
  // set the current ResourceManager ID
  conf.set(YarnConfiguration.RM_HA_ID, rmServiceIds[currentProxyIndex]);
  LOG.info("Failing over to " + rmServiceIds[currentProxyIndex]);
}
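How performFailover and the address lookup work together can be sketched with a plain map in place of the Hadoop Configuration. The class `HaAddressLookup` is hypothetical, and so is the way the key is composed: the sketch assumes the per-RM address is stored under the base key with the RM ID appended after a dot, and that the current ID is stored under `yarn.resourcemanager.ha.id`.

```java
import java.util.Map;

// Hypothetical sketch: rotate the active RM ID and resolve its address.
final class HaAddressLookup {
    // Key names are assumptions made for this illustration.
    static final String HA_ID_KEY = "yarn.resourcemanager.ha.id";
    static final String TRACKER_KEY =
            "yarn.resourcemanager.resource-tracker.address";

    private final Map<String, String> conf;
    private final String[] rmIds;
    private int index = 0;

    HaAddressLookup(Map<String, String> conf, String... rmIds) {
        if (rmIds.length == 0) {
            throw new IllegalArgumentException("at least one RM ID required");
        }
        this.conf = conf;
        this.rmIds = rmIds.clone();
        conf.put(HA_ID_KEY, this.rmIds[0]);
    }

    // Round-robin to the next RM ID and record it as the current one.
    synchronized void performFailover() {
        index = (index + 1) % rmIds.length;
        conf.put(HA_ID_KEY, rmIds[index]);
    }

    // Resolve the tracker address of whichever RM ID is current.
    synchronized String currentTrackerAddress() {
        return conf.get(TRACKER_KEY + "." + conf.get(HA_ID_KEY));
    }
}
```

Because the lookup always goes through the current ID, a failover needs to change only one configuration value; the next getProxy call then resolves a different address without any other state being touched.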

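Chapter 3: Server-side code

The server-side code lives in ResourceManager, which has a member variable of type ResourceTrackerService. This class is both the implementation of the protocol and the code that starts the server. resourceTrackerService is a child service of the ResourceManager composite service, so its init and start methods get called. init reads settings from the configuration; start is shown below: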

   @Override
protected void serviceStart() throws Exception {
  super.serviceStart();
  // ResourceTrackerServer authenticates NodeManager via Kerberos if
  // security is enabled, so no secretManager.
  Configuration conf = getConfig();
  // create the server through the YarnRPC class
  YarnRPC rpc = YarnRPC.create(conf);
  this.server =
    rpc.getServer(ResourceTracker.class, this, resourceTrackerAddress,
        conf, null,
        conf.getInt(YarnConfiguration.RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT, 
            YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT));
  
  // Enable service authorization?
  // if service authorization is enabled, load or refresh the security ACL configuration
  if (conf.getBoolean(
      CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, 
      false)) {
    InputStream inputStream =
        this.rmContext.getConfigurationProvider()
            .getConfigurationInputStream(conf,
                YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
    if (inputStream != null) {
      conf.addResource(inputStream);
    }
    refreshServiceAcls(conf, RMPolicyProvider.getInstance());
  }

  this.server.start();
  conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
      YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
                         server.getListenerAddress());
}