Hadoop源码分析-RPC

最新推荐文章于 2017-08-16 09:57:47 发布

zqhxuyuan

最新推荐文章于 2017-08-16 09:57:47 发布

阅读量149

点赞数

分类专栏： Hadoop 文章标签： Hadoop 源码分析

本文链接：https://blog.csdn.net/zqhxuyuan/article/details/84430975

版权

Hadoop 专栏收录该内容

21 篇文章 0 订阅

订阅专栏

方法	说明
waitForProxy	保证namenode启动正常且连接正常，主要由SecondayNode、Datanode、JobTracker使用
stopProxy	停止代理
getProxy	创建代理实例,获得代理实例的versioncode,再与getProxy()传入的versioncode做对比, 相同返回代理,不同抛出VersionMismatch异常
getServer	创建并返回一个Server实例，由TaskTracker、JobTracker、NameNode、DataNode使用
call	向一系列服务器发送一系列请求，在源码中没见到那个类使用该方法。但注释提到了：Expert，应该是给系统管理员使用的接口

内部类	说明
ClientCache	缓存Client对象
Invocation	用于封装方法名和参数，作为数据传输层。每次RPC调用传的参数实体类，其中Invocation包括了调用方法和配置文件
Invoker	具体的调用类，采用动态代理机制，继承InvocationHandler，有remoteId和client成员，id用以标识异步请求对象，client用以调用实现代码
Server	Server的具体类，实现了抽象类的call方法，获得传入参数的call实例，再获取method方法，使用反射机制调用具体的方法
VersionMismatch	版本不匹配异常,三个参数interfaceName, clientVersion, serverVersion

Invocation类仅作为VO，ClientCache类只是作为缓存，而Server类用于服务端的处理，他们都和客户端的数据流和业务逻辑没有关系。重点是Invoker类

Invoker使用了Java的动态代理：

Dynamic Proxy是由两个class实现的：java.lang.reflect.Proxy 和 java.lang.reflect.InvocationHandler, 后者是一个接口。

动态代理类是指在运行时生成的class, 在生成它时你必须提供一组interface给它, 然后该class就宣称它实现了这些interface，即接口的代理。

Dynamic Proxy是典型的Proxy模式, 它不会替你作实质性的工作, 在生成它的实例时你必须提供一个handler(InvocationHandler), 由它接管实际的工作。

这个handler, 在Hadoop的RPC中, 就是Invoker对象，Invoker实现了InvocationHandler接口。

可以简单地理解：通过一个接口来生成一个类, 这个类上的所有方法调用, 都会传递到你生成类时传递的InvocationHandler实现中。

在Hadoop的RPC中, Invoker实现了InvocationHandler的invoke方法（invoke方法也是InvocationHandler的唯一方法）。

Invoker会把所有跟这次调用相关的调用方法名, 参数类型列表, 参数列表打包, 然后利用Client, 通过socket传递到服务器端。

就是说, 在proxy类上的任何调用, 都通过Client发送到远方的服务器上。

Invoker使用Invocation。Invocation封装了一个远程调用的所有相关信息, 它的主要属性有:

methodName, 调用方法名, parameterClasses, 调用方法参数的类型列表和parameters, 调用方法参数。注意, 它实现了Writable接口, 可以串行化。

RPC.Server实现了org.apache.hadoop.ipc.Server, 你可以把一个对象, 通过RPC, 升级成为一个服务器。

服务器接收到请求，接收到的是Invocation对象, 反序列化后, 得到方法名, 方法参数列表和参数列表。

利用Java反射, 我们就可以调用对应的对象的方法。

调用的结果再通过socket, 返回给客户端, 客户端把结果解包后, 就可以返回给Dynamic Proxy的使用者了。

接口协议

把某些接口和接口中的方法称为协议，客户端和服务端只要实现这些接口中的方法就可以进行通信了

Hadoop的RPC机制正是采用了这种“架构层次的协议”，有一整套作为协议的接口

/**
 * Superclass of all protocols that use Hadoop RPC. 所有RPC协议接口的父接口
 */
public interface VersionedProtocol {
  /**
   * Return protocol version corresponding to protocol interface. 返回对应的协议接口的协议版本
   * @param protocol The classname of the protocol interface 协议接口的类名
   * @param clientVersion The version of the protocol that the client speaks 客户端版本
   * @return the version that the server will speak 服务器版本
   */
  public long getProtocolVersion(String protocol, long clientVersion) throws IOException;
}

实现VersionedProtocol接口的接口

HDFS相关

协议接口
ClientDatanodeProtocol	client与datanode交互的接口，操作不多，只有一个block恢复的方法。那么，其它数据请求的方法呢？client与datanode主要交互是通过流式的socket实现，源码在DataXceiver
ClientProtocol	client与Namenode交互的接口，所有控制流的请求均在这里，如：创建文件、删除文件等
DatanodeProtocol	Datanode与Namenode交互的接口，如心跳、blockreport等
NamenodeProtocol	SecondaryNode与Namenode交互的接口

Mapreduce相关

协议接口
InterDatanodeProtocol	Datanode内部交互的接口，用来更新block的元数据
InnerTrackerProtocol	TaskTracker与JobTracker交互的接口，功能与DatanodeProtocol相似
JobSubmissionProtocol	JobClient与JobTracker交互的接口，用来提交Job、获得Job等与Job相关的操作
TaskUmbilicalProtocol	Task中子进程与母进程交互的接口，子进程即map reduce等操作，母进程即TaskTracker，该接口会汇报子进程的运行状态

其它

协议接口
AdminOperationProtocol	不用用户操作的接口，提供一些管理操作，如刷新JobTracker的node列表
RefreshAuthorizationPolicyProtocol
RefreshUserMappingsProtocol

Invocation

  /** A method invocation, including the method name and its parameters.*/
  private static class Invocation implements Writable, Configurable { //实现hadoop的序列化接口Writable,因为要在Client和Server之间传输该对象
    private String methodName;  // The name of the method invoked. 
    private Class[] parameterClasses;  // The parameter classes. 
    private Object[] parameters;  // The parameter instances. 
    private Configuration conf;

    public Invocation() {}
    public Invocation(Method method, Object[] parameters) {
      this.methodName = method.getName();
      this.parameterClasses = method.getParameterTypes();
      this.parameters = parameters;
　　}
　　// 序列化
    public void readFields(DataInput in) throws IOException {
      methodName = UTF8.readString(in);
      parameters = new Object[in.readInt()];
      parameterClasses = new Class[parameters.length];
      ObjectWritable objectWritable = new ObjectWritable();
      for (int i = 0; i < parameters.length; i++) { //数组类型,每个数组元素也都需要序列化
        parameters[i] = ObjectWritable.readObject(in, objectWritable, this.conf);
        parameterClasses[i] = objectWritable.getDeclaredClass();
      }
    }
　　// 反序列化
    public void write(DataOutput out) throws IOException {
      UTF8.writeString(out, methodName);
      out.writeInt(parameterClasses.length);
      for (int i = 0; i < parameterClasses.length; i++) {
        ObjectWritable.writeObject(out, parameters[i], parameterClasses[i], conf);
      }
　　} 
  }

ClientCache

  /* Cache a client using its socket factory as the hash key */
  static private class ClientCache {
    private Map<SocketFactory, Client> clients = new HashMap<SocketFactory, Client>();

    /**
     * Construct & cache an IPC client with the user-provided SocketFactory if no cached client exists.
     * @param conf Configuration
     * @return an IPC client
     */
    private synchronized Client getClient(Configuration conf, SocketFactory factory) {
      // Construct & cache client.  The configuration is only used for timeout, and Clients have connection pools.
      // So we can either (a) lose some connection pooling and leak sockets,
      // or (b) use the same timeout for all configurations.
      // Since the IPC is usually intended globally, not per-job, we choose (a).
      Client client = clients.get(factory);
      if (client == null) {
        client = new Client(ObjectWritable.class, conf, factory);
        clients.put(factory, client);
      } else {
        client.incCount();
      }
      return client;
    }
    /**
     * Construct & cache an IPC client with the default SocketFactory if no cached client exists.
     */
    private synchronized Client getClient(Configuration conf) {
      return getClient(conf, SocketFactory.getDefault());
    }

    /**
     * Stop a RPC client connection 
     * A RPC client is closed only when its reference count becomes zero.
     */
    private void stopClient(Client client) {
      synchronized (this) {
        client.decCount();
        if (client.isZeroReference()) {
          clients.remove(client.getSocketFactory());
        }
      }
      if (client.isZeroReference()) {
        client.stop();
      }
    }
  }
  private static ClientCache CLIENTS=new ClientCache();

  static Client getClient(Configuration conf) { //for unit testing only
    return CLIENTS.getClient(conf);
  }

Invoker

  private static class Invoker implements InvocationHandler {
    private Client.ConnectionId remoteId;
    private Client client;
    private boolean isClosed = false;

    private Invoker(Class<? extends VersionedProtocol> protocol, InetSocketAddress address, UserGroupInformation ticket,
        Configuration conf, SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy) throws IOException {
      this.remoteId = Client.ConnectionId.getConnectionId(address, protocol, ticket, rpcTimeout, connectionRetryPolicy, conf);
      this.client = CLIENTS.getClient(conf, factory);
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
      ObjectWritable value = (ObjectWritable) client.call(new Invocation(method, args), remoteId);
      return value.get();
    }
    
    /* close the IPC client that's responsible for this invoker's RPCs */ 
    synchronized private void close() {
      if (!isClosed) {
        isClosed = true;
        CLIENTS.stopClient(client);
      }
    }
  }

一般我们看到的动态代理的invoke()方法中总会有 method.invoke(ac, arg); 而上面invoke() 中却没有,这是为什么? 其实使用 method.invoke(ac, arg); 是在本地JVM中调用；而在hadoop中，是将数据发送给服务端，服务端将处理的结果再返回给客户端，所以这里的invoke()方法必然需要进行网络通信.要让服务端能知道客户端想要调用的是哪个接口，接口和其他参数比如要调用的地址等封装为remoteId, 这是Client的内部类ConnectionId,唯一确定一个连接

waitForProxy()

  static VersionedProtocol waitForProxy(Class<? extends VersionedProtocol> protocol,
	  long clientVersion, InetSocketAddress addr, Configuration conf, int rpcTimeout, long connTimeout) throws IOException { 
    long startTime = System.currentTimeMillis();
    IOException ioe;
    while (true) {
      try {
        return getProxy(protocol, clientVersion, addr, conf, rpcTimeout);
      } catch(ConnectException se) {  // namenode has not been started
        LOG.info("Server at " + addr + " not available yet, Zzzzz...");
        ioe = se;
      } catch(SocketTimeoutException te) {  // namenode is busy
        LOG.info("Problem connecting to server: " + addr);
        ioe = te;
      }
      // check if timed out
      if (System.currentTimeMillis()-connTimeout >= startTime) {
        throw ioe;
      }
      // wait for retry
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ie) {} // IGNORE
    }
  }

getProxy()

  /** Construct a client-side proxy object that implements the named protocol,talking to a server at the named address. */
  public static VersionedProtocol getProxy(Class<? extends VersionedProtocol> protocol, long clientVersion, InetSocketAddress addr, 
 	  UserGroupInformation ticket, Configuration conf, SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy) throws IOException {
    if (UserGroupInformation.isSecurityEnabled()) {
      SaslRpcServer.init(conf);
    }
    final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout, connectionRetryPolicy);
    VersionedProtocol proxy = (VersionedProtocol)Proxy.newProxyInstance(protocol.getClassLoader(), new Class[]{protocol}, invoker);
    long serverVersion = proxy.getProtocolVersion(protocol.getName(), clientVersion);
    if (serverVersion == clientVersion) {
      return proxy;
    } else {
      throw new VersionMismatch(protocol.getName(), clientVersion, serverVersion);
    }
  }

和05中的RPC类的getProxy()方法相似。

getProxy()会生成协议接口VersionedProtocol的代理对象，当客户端调用接口的方法，会回调Invoker对象的invoke方法

Invoker实现了Java的InvocationHandler接口，和例子一样，客户端会发送封装好的Invocation对象给服务端。

Invocation对象封装了客户端想要调用的服务端的接口，方法，参数。

服务端会调用具体的接口的方法，并返回方法的执行结果，类型为ObjectWritable

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
      ObjectWritable value = (ObjectWritable) client.call(new Invocation(method, args), remoteId);
      return value.get();
    }

下一节分析client.call()是怎么将Invocation对象从客户端想服务端发送。

在分析Client之前，要明确以下几点目标:

客户端和服务端的连接是怎样建立的?
2. 客户端是怎样给服务端发送数据的?
3. 客户端是怎样获取服务端的返回数据的

现在来总结下Hadoop的RPC和我们自己实现的RPC的映射关系

角色	作用	05例子的对应类
Client	RPC服务的客户端	Client
RPC	实现了一个简单的RPC模型	RPC
Server	服务端的抽象类	Server接口
RPC.Server	服务端的具体类	RPC.RPCServer
VersionedProtocol	所有使用RPC服务的类都要实现该接口，在创建代理时用来判断代理对象是否创建正确	Echo接口
Invoker	动态代理	InvocationHandler
Invocation (RPC) Call (Client/Server)	封装客户端要调用的接口，方法，参数；以及服务端返回的方法执行结果	Invocation
Connection(Client) Listener(Server)	处理远程连接对象: 监听客户端写入; 转发给服务端调用具体方法; 向客户端写回数据	Listener