Spark 2.3.3 Source Code Analysis: RPC Framework Initialization in Detail

      As a distributed system, Spark depends heavily on RPC communication. In Spark 2.3.3 the RPC framework is implemented on top of Netty, which replaced the earlier Akka implementation. Below we first introduce the components of Spark RPC, and then follow the SparkContext initialization process to analyze how the RPC components are initialized and how they work.

       In Spark, TransportClient and TransportServer are defined to wrap Netty's Channel and handlers, and Spark's own message classes wrap ByteBuf, so Spark RPC has two important phases: initialization and communication.

1. RpcEnv Initialization

 

      All of Spark Core's initialization starts from SparkContext, and the TransportClient is likewise initialized during SparkContext initialization.

      RpcEnv here is an abstract class; its concrete implementation is NettyRpcEnv. NettyRpcEnv contains all the operations needed for RPC communication, including the parameters required to create a TransportClient and the createClient method itself.
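For orientation, here is a hedged sketch of the creation path: during SparkContext initialization, SparkEnv calls RpcEnv.create, which delegates to NettyRpcEnvFactory to construct the NettyRpcEnv whose fields are shown next. The parameter list is abridged and approximate for 2.3.x, and RpcEnv/SecurityManager are private[spark], so code like this only compiles inside the org.apache.spark package.

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.rpc.RpcEnv

// Hedged sketch of SparkContext -> SparkEnv.create -> RpcEnv.create -> NettyRpcEnvFactory.create.
val conf = new SparkConf().setAppName("rpc-demo").setMaster("local[2]")
val securityManager = new SecurityManager(conf)

val rpcEnv: RpcEnv = RpcEnv.create(
  "sparkDriver",      // system name ("sparkExecutor" on executors)
  "localhost",        // bind address for the TransportServer
  "localhost",        // address advertised to remote peers
  0,                  // port; 0 lets the OS pick a free one
  conf,
  securityManager,
  0,                  // numUsableCores; 0 falls back to the available processors
  clientMode = false) // driver mode also starts a TransportServer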

private[netty] class NettyRpcEnv(
    val conf: SparkConf,
    javaSerializerInstance: JavaSerializerInstance,
    host: String,
    securityManager: SecurityManager,
    numUsableCores: Int) extends RpcEnv(conf) with Logging {
  private[netty] val transportConf = SparkTransportConf.fromSparkConf(
    conf.clone.set("spark.rpc.io.numConnectionsPerPeer", "1"),
    "rpc",
    conf.getInt("spark.rpc.io.threads", 0))
  private val dispatcher: Dispatcher = new Dispatcher(this, numUsableCores)
  private val streamManager = new NettyStreamManager(this)
  private val transportContext = new TransportContext(transportConf,
    new NettyRpcHandler(dispatcher, this, streamManager))
  private def createClientBootstraps(): java.util.List[TransportClientBootstrap] = {
    if (securityManager.isAuthenticationEnabled()) {
      java.util.Arrays.asList(new AuthClientBootstrap(transportConf,
        securityManager.getSaslUser(), securityManager))
    } else {
      java.util.Collections.emptyList[TransportClientBootstrap]
    }
  }
  private val clientFactory = transportContext.createClientFactory(createClientBootstraps())
  @volatile private var fileDownloadFactory: TransportClientFactory = _
  val timeoutScheduler = ThreadUtils.newDaemonSingleThreadScheduledExecutor("netty-rpc-env-timeout")
  @volatile private var server: TransportServer = _
  private val stopped = new AtomicBoolean(false)
  private val outboxes = new ConcurrentHashMap[RpcAddress, Outbox]()

The NettyRpcEnv snippet above shows its field initialization (abridged). Each of these fields plays an important role:

    • transportConf: the transport configuration parameters
    • streamManager: the stream manager
    • transportContext: the transport context
    • dispatcher: the component inside the Netty handler that actually processes messages asynchronously and efficiently; as the code shows, the dispatcher is wrapped inside NettyRpcHandler, and its role is covered in detail below
    • clientFactory: the factory that creates TransportClients; the createClientBootstraps method passed in supplies extra bootstraps applied when a client is created, for example authentication
    • outboxes: the outboxes holding messages waiting to be delivered, keyed by remote RpcAddress

2. Dispatcher Initialization

       The dispatcher is initialized directly when the NettyRpcEnv object is created. Inside Dispatcher there is an inner class, EndpointData, that bundles an RpcEndpoint, its NettyRpcEndpointRef, and an Inbox. An RpcEndpoint is the message-processing endpoint and belongs to the server side, while an RpcEndpointRef is the client-side reference to that endpoint. Messages themselves are stored in Inbox and Outbox instances, each holding a queue of the messages received or waiting to be sent.

private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) extends Logging {

  private class EndpointData(
      val name: String,
      val endpoint: RpcEndpoint,
      val ref: NettyRpcEndpointRef) {
    val inbox = new Inbox(ref, endpoint)
  }

  private val endpoints: ConcurrentMap[String, EndpointData] =
    new ConcurrentHashMap[String, EndpointData]
  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
    new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]

  // Track the receivers whose inboxes may contain messages.
  private val receivers = new LinkedBlockingQueue[EndpointData]
 

       As the middle layer, the dispatcher records the relationship between endpoints and endpoint refs so that messages are consumed by the correct endpoint. The endpoints and endpointRefs maps record the mapping established when an endpoint registers with the dispatcher at server startup, so that later communication can find the right endpoint to consume a message. The dispatcher is created inside NettyRpcEnv, and for a given environment there is exactly one dispatcher for the whole communication process; every message is further dispatched by it, so its efficiency is critical. So how does it receive messages and dispatch them efficiently?

def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (stopped) {
      throw new IllegalStateException("RpcEnv has been stopped")
    }
    if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint, endpointRef)) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    val data = endpoints.get(name)
    endpointRefs.put(data.endpoint, data.ref)
    receivers.offer(data)  // for the OnStart message
  }
  endpointRef
}
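In practice registerRpcEndpoint is rarely called directly: components call rpcEnv.setupEndpoint, and NettyRpcEnv.setupEndpoint simply forwards to dispatcher.registerRpcEndpoint. The sketch below shows the pattern; EchoEndpoint and the Echo message are illustrative names rather than Spark classes, and since RpcEndpoint is private[spark] the code has to live under the org.apache.spark package.

package org.apache.spark.rpcdemo  // hypothetical subpackage, needed because RpcEndpoint is private[spark]

import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEndpointRef, RpcEnv}

// Illustrative message and endpoint, not part of Spark itself.
case class Echo(text: String)

class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case Echo(text) => context.reply(s"echo: $text")  // the reply is routed back to the caller
  }
}

object EchoRegistration {
  // setupEndpoint delegates to dispatcher.registerRpcEndpoint(name, endpoint), which builds
  // the EndpointData (and its Inbox) and enqueues it so the OnStart message gets processed.
  def register(rpcEnv: RpcEnv): RpcEndpointRef =
    rpcEnv.setupEndpoint("echo-endpoint", new EchoEndpoint(rpcEnv))
}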

       In the dispatcher's initialization, the most important piece, in my view, is the threadpool: it not only starts a thread pool but also keeps watch over receivers, the blocking queue of incoming work. That job is handed to the MessageLoop task. Let's walk through the consumption process. First, the number of threads is determined from spark.rpc.netty.dispatcher.numThreads or from the number of cores, and each thread runs a MessageLoop task. The task simply takes new entries from receivers; since take() blocks, the thread waits whenever there is nothing new. As soon as a new entry lands in the blocking queue, the MessageLoop calls the corresponding inbox's process method to consume its messages. Once initialized, this whole mechanism dispatches messages automatically, which is both convenient and efficient.

/** Thread pool used for dispatching messages. */
private val threadpool: ThreadPoolExecutor = {
  val availableCores =
    if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
  val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
    math.max(2, availableCores))
  val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
  for (i <- 0 until numThreads) {
    pool.execute(new MessageLoop)
  }
  pool
}

/** Message loop used for dispatching messages. */
private class MessageLoop extends Runnable {
  override def run(): Unit = {
    try {
      while (true) {
        try {
          val data = receivers.take()
          if (data == PoisonPill) {
            // Put PoisonPill back so that other MessageLoops can see it.
            receivers.offer(PoisonPill)
            return
          }
          data.inbox.process(Dispatcher.this)
        } catch {
          case NonFatal(e) => logError(e.getMessage, e)
        }
      }
    } catch {
      case ie: InterruptedException => // exit
    }
  }
}
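What MessageLoop ultimately drives is Inbox.process: each EndpointData taken from receivers has an inbox whose queued messages are drained and applied to the endpoint's receive / receiveAndReply handlers. The following is a heavily simplified, self-contained sketch of that idea, not the real Inbox, which additionally handles OnStart/OnStop and other control messages, concurrency limits, and error routing.

import java.util.concurrent.ConcurrentLinkedQueue

// Simplified sketch of the Inbox idea: a per-endpoint message queue whose process()
// method is what the dispatcher threads drive.
sealed trait InboxMessageSketch
case class OneWay(content: Any) extends InboxMessageSketch                   // from send(): no reply expected
case class Rpc(content: Any, reply: Any => Unit) extends InboxMessageSketch  // from ask(): reply expected

class InboxSketch(
    receive: PartialFunction[Any, Unit],
    receiveAndReply: (Any, Any => Unit) => Unit) {

  private val messages = new ConcurrentLinkedQueue[InboxMessageSketch]()

  def post(m: InboxMessageSketch): Unit = messages.offer(m)

  // Called from a dispatcher thread: drain whatever is queued and hand each message
  // to the endpoint's matching handler.
  def process(): Unit = {
    var m = messages.poll()
    while (m != null) {
      m match {
        case OneWay(content)     => receive.applyOrElse(content, (_: Any) => ())
        case Rpc(content, reply) => receiveAndReply(content, reply)
      }
      m = messages.poll()
    }
  }
}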

3. TransportClient Initialization

       Having introduced the dispatcher, let's look at the code that creates a TransportClient. This method is the entry point in NettyRpcEnv; the caller only needs to pass in the address to communicate with.

private[netty] def createClient(address: RpcAddress): TransportClient = {
  clientFactory.createClient(address.host, address.port)
}

       Next we enter the factory's create method. Before looking at createClient, we need to introduce an inner class of the factory, ClientPool. Because communication is point-to-point, creating multiple TransportClients is unavoidable, so the factory uses a ClientPool to manage the pool of clients for a given remote address and cache them for reuse. When a ClientPool is initialized its TransportClient slots are allocated as well, but at that point a TransportClient has no channel and no handlers; it is just an empty shell. The connection to the server is only established when the client is actually needed, and only then are the channel and handlers attached to the TransportClient so it can send and process messages; this is essentially lazy initialization. As the code shows, besides the TransportClient array there is an Object[] locks array of the same length, matched to the clients index by index. Giving each TransportClient its own lock reduces contention between threads under concurrency, which means less blocking and better parallelism. This ClientPool idea comes up a lot; when you hit a connection-pooling scenario in your own code it is worth borrowing to improve performance (a small standalone sketch of the lock-striping pattern follows the code below).

private static class ClientPool {
  TransportClient[] clients;
  Object[] locks;

  ClientPool(int size) {
    clients = new TransportClient[size];
    locks = new Object[size];
    for (int i = 0; i < size; i++) {
      locks[i] = new Object();
    }
  }
}
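The per-slot locks are a lock-striping trick worth reusing: instead of serializing all callers on one monitor, each slot of the pool gets its own lock object, so two threads working on different connections never block each other. Below is a small standalone Scala sketch of the same pattern; it is generic illustration code modeled on ClientPool, not Spark code.

import java.util.concurrent.ThreadLocalRandom

// Generic lock-striping sketch: one lock object per slot, so callers contend only
// when they race on the same slot index (compare ClientPool.clients / ClientPool.locks).
class StripedPool[T <: AnyRef](size: Int, create: () => T) {
  private val slots = new Array[AnyRef](size)       // lazily filled, like ClientPool.clients
  private val locks = Array.fill(size)(new Object)  // like ClientPool.locks

  def get(): T = {
    val i = ThreadLocalRandom.current().nextInt(size)  // pick a slot, as createClient does
    locks(i).synchronized {                             // serialize only within this slot
      if (slots(i) == null) slots(i) = create()
      slots(i).asInstanceOf[T]
    }
  }
}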

       The createClient method below does not actually create a TransportClient; instead it fetches an already created one from the ClientPool for the target address. Briefly, it first builds an unresolved InetSocketAddress from the host and port passed in, then looks in the connectionPool cache for a ClientPool for that address. If none exists, it creates a new ClientPool and picks a slot at random; since the pool was just created, the TransportClient in that slot is naturally empty and not yet active. Why "not yet active"? A TransportClient is a wrapper around a Netty channel, and writing to that channel involves additional processing (request handlers and so on); at this point no connection to the server has been established and no channel has been placed into the client, so the code falls through to the createClient overload at the bottom, which performs the real TransportClient initialization. If a ClientPool already exists and holds an active client, the cached TransportClient is returned directly.

private final ConcurrentHashMap<SocketAddress, ClientPool> connectionPool;

/** Fetch a TransportClient from the connection cache. */
public TransportClient createClient(String remoteHost, int remotePort)
    throws IOException, InterruptedException {
  // Get connection from the connection pool first.
  // If it is not found or not active, create a new one.
  // Use unresolved address here to avoid DNS resolution each time we creates a client.
  final InetSocketAddress unresolvedAddress =
    InetSocketAddress.createUnresolved(remoteHost, remotePort);

  // Create the ClientPool if we don't have it yet.
  ClientPool clientPool = connectionPool.get(unresolvedAddress);
  if (clientPool == null) {
    connectionPool.putIfAbsent(unresolvedAddress, new ClientPool(numConnectionsPerPeer));
    clientPool = connectionPool.get(unresolvedAddress);
  }

  int clientIndex = rand.nextInt(numConnectionsPerPeer);
  TransportClient cachedClient = clientPool.clients[clientIndex];

  if (cachedClient != null && cachedClient.isActive()) {
    // Make sure that the channel will not timeout by updating the last use time of the
    // handler. Then check that the client is still alive, in case it timed out before
    // this code was able to update things.
    TransportChannelHandler handler = cachedClient.getChannel().pipeline()
      .get(TransportChannelHandler.class);
    synchronized (handler) {
      handler.getResponseHandler().updateTimeOfLastRequest();
    }

    if (cachedClient.isActive()) {
      logger.trace("Returning cached connection to {}: {}",
        cachedClient.getSocketAddress(), cachedClient);
      return cachedClient;
    }
  }

  // If we reach here, we don't have an existing connection open. Let's create a new one.
  // Multiple threads might race here to create new connections. Keep only one of them active.
  final long preResolveHost = System.nanoTime();
  final InetSocketAddress resolvedAddress = new InetSocketAddress(remoteHost, remotePort);
  final long hostResolveTimeMs = (System.nanoTime() - preResolveHost) / 1000000;
  if (hostResolveTimeMs > 2000) {
    logger.warn("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
  } else {
    logger.trace("DNS resolution for {} took {} ms", resolvedAddress, hostResolveTimeMs);
  }

  synchronized (clientPool.locks[clientIndex]) {
    cachedClient = clientPool.clients[clientIndex];

    if (cachedClient != null) {
      if (cachedClient.isActive()) {
        logger.trace("Returning cached connection to {}: {}", resolvedAddress, cachedClient);
        return cachedClient;
      } else {
        logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress);
      }
    }
    clientPool.clients[clientIndex] = createClient(resolvedAddress);
    return clientPool.clients[clientIndex];
  }
}
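The lookup above uses the classic "get, putIfAbsent, get" idiom: when several threads race to create the ClientPool for the same address, at most one insert wins and every thread then re-reads the winning instance. A minimal standalone version of the idiom follows; the names are illustrative, not Spark code.

import java.util.concurrent.ConcurrentHashMap

object PoolCache {
  final class ClientPoolSketch  // stand-in for ClientPool

  private val pools = new ConcurrentHashMap[String, ClientPoolSketch]()

  def poolFor(key: String): ClientPoolSketch = {
    var pool = pools.get(key)
    if (pool == null) {
      pools.putIfAbsent(key, new ClientPoolSketch)  // at most one racing insert succeeds
      pool = pools.get(key)                         // everyone reads back the same instance
    }
    pool
  }
}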

       Now let's look at the createClient overload that really creates the TransportClient. It is overloaded with the cache-lookup method above; the parameter types differ only because the cache is consulted first, and the two parameters carry the same meaning. What follows is the familiar Netty client bootstrap code; we will not explain Netty itself here, only the parts where Spark RPC does something special. The key concept in Netty is the pipeline, its chain of responsibility, and this is no exception: Spark RPC does a lot of custom work on the pipeline's handlers, involving MessageHandler, TransportChannelHandler, the dispatcher, and the TransportClient. Let's look directly at the most important part, context.initializePipeline(ch).

private TransportClient createClient(InetSocketAddress address)
    throws IOException, InterruptedException {
  logger.debug("Creating new connection to {}", address);

  Bootstrap bootstrap = new Bootstrap();
  bootstrap.group(workerGroup)
    .channel(socketChannelClass)
    // Disable Nagle's Algorithm since we don't want packets to wait
    .option(ChannelOption.TCP_NODELAY, true)
    .option(ChannelOption.SO_KEEPALIVE, true)
    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
    .option(ChannelOption.ALLOCATOR, pooledAllocator);

  if (conf.receiveBuf() > 0) {
    bootstrap.option(ChannelOption.SO_RCVBUF, conf.receiveBuf());
  }

  if (conf.sendBuf() > 0) {
    bootstrap.option(ChannelOption.SO_SNDBUF, conf.sendBuf());
  }

  final AtomicReference<TransportClient> clientRef = new AtomicReference<>();
  final AtomicReference<Channel> channelRef = new AtomicReference<>();

  bootstrap.handler(new ChannelInitializer<SocketChannel>() {
    @Override
    public void initChannel(SocketChannel ch) {
      TransportChannelHandler clientHandler = context.initializePipeline(ch);
      clientRef.set(clientHandler.getClient());
      channelRef.set(ch);
    }
  });

  // Connect to the remote server
  long preConnect = System.nanoTime();
  ChannelFuture cf = bootstrap.connect(address);
  if (!cf.await(conf.connectionTimeoutMs())) {
    throw new IOException(
      String.format("Connecting to %s timed out (%s ms)", address, conf.connectionTimeoutMs()));
  } else if (cf.cause() != null) {
    throw new IOException(String.format("Failed to connect to %s", address), cf.cause());
  }

  TransportClient client = clientRef.get();
  Channel channel = channelRef.get();
  assert client != null : "Channel future completed successfully with null client";

  // Execute any client bootstraps synchronously before marking the Client as successful.
  long preBootstrap = System.nanoTime();
  logger.debug("Connection to {} successful, running bootstraps...", address);
  try {
    for (TransportClientBootstrap clientBootstrap : clientBootstraps) {
      clientBootstrap.doBootstrap(client, channel);
    }
  } catch (Exception e) { // catch non-RuntimeExceptions too as bootstrap may be written in Scala
    long bootstrapTimeMs = (System.nanoTime() - preBootstrap) / 1000000;
    logger.error("Exception while bootstrapping client after " + bootstrapTimeMs + " ms", e);
    client.close();
    throw Throwables.propagate(e);
  }
  long postBootstrap = System.nanoTime();

  logger.info("Successfully created connection to {} after {} ms ({} ms spent in bootstraps)",
    address, (postBootstrap - preConnect) / 1000000, (postBootstrap - preBootstrap) / 1000000);

  return client;
}

       Stepping into this method, you can see that the NettyRpcHandler initialized earlier in NettyRpcEnv (which wraps the dispatcher) is passed in as a parameter. Two things happen here: a common channel handler is created (shared by client and server), and the pipeline is built. Building the pipeline simply means adding the handlers to it to form the chain of responsibility. The first few handlers are ordinary encoders/decoders and an idle-state timer; the one to focus on is TransportChannelHandler, the message handler through which every message type is dispatched for processing.

public TransportChannelHandler initializePipeline(SocketChannel channel) {
  return initializePipeline(channel, rpcHandler);
}
public TransportChannelHandler initializePipeline(
    SocketChannel channel,
    RpcHandler channelRpcHandler) {
  try {
    TransportChannelHandler channelHandler = createChannelHandler(channel, channelRpcHandler);
    channel.pipeline()
      .addLast("encoder", ENCODER)
      .addLast(TransportFrameDecoder.HANDLER_NAME, NettyUtils.createFrameDecoder())
      .addLast("decoder", DECODER)
      .addLast("idleStateHandler", new IdleStateHandler(0, 0, conf.connectionTimeoutMs() / 1000))
      // NOTE: Chunks are currently guaranteed to be returned in the order of request, but this
      // would require more logic to guarantee if this were not part of the same event loop.
      .addLast("handler", channelHandler);
    return channelHandler;
  } catch (RuntimeException e) {
    logger.error("Error while initializing Netty pipeline", e);
    throw e;
  }
}

       Inside this method, a TransportResponseHandler and a TransportRequestHandler are created in turn; both extend MessageHandler. MessageHandler defines the handle, channelActive, exceptionCaught, and channelInactive methods, and handle is the one that processes a message. MessageHandler is a generic class that takes a Message subtype as its message parameter.
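For reference, this is the shape of that contract, rendered here as a Scala trait purely for illustration; the real MessageHandler is a Java abstract class in Spark's network-common module, parameterized by a Message subtype.

// Illustrative Scala rendering of the MessageHandler contract described above.
trait MessageHandlerSketch[T] {
  def handle(message: T): Unit              // process one inbound message
  def channelActive(): Unit                 // the channel has become active
  def exceptionCaught(cause: Throwable): Unit
  def channelInactive(): Unit               // the channel has been closed
}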

       TransportResponseHandler is the handler that processes the responses coming back after the client sends a message, while TransportRequestHandler processes the requests the server receives. The main job of a TransportClient is to send requests to the server over its channel and handle the results; after sending a request it registers the callback with the TransportResponseHandler via addRpcRequest, which is why its two main fields are the channel and the responseHandler. TransportRequestHandler, by contrast, belongs to the TransportServer: it processes a request and returns a message to the client, so it needs the channel, the client, and the rpcHandler. TransportChannelHandler ties these three together into a single handler used on both client and server; it inspects the type of each message to decide whether the server-side or client-side path applies and calls the corresponding handle method.

private TransportChannelHandler createChannelHandler(Channel channel, RpcHandler rpcHandler) {
  TransportResponseHandler responseHandler = new TransportResponseHandler(channel);
  TransportClient client = new TransportClient(channel, responseHandler);
  TransportRequestHandler requestHandler = new TransportRequestHandler(channel, client,
    rpcHandler, conf.maxChunksBeingTransferred());
  return new TransportChannelHandler(client, responseHandler, requestHandler,
    conf.connectionTimeoutMs(), closeIdleConnections);
}
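To make the type-based routing concrete, here is a conceptual sketch in Scala; the real TransportChannelHandler is a Java class built on Netty's channel handlers, and RequestLike/ResponseLike below are simplified stand-ins for Spark's request/response message hierarchies.

// Conceptual sketch of the routing performed by TransportChannelHandler: request-type
// messages go to the request handler (server role), response-type messages to the
// response handler (client role).
sealed trait WireMessage
trait RequestLike extends WireMessage
trait ResponseLike extends WireMessage

class ChannelHandlerSketch(
    handleRequest: RequestLike => Unit,      // TransportRequestHandler path -> RpcHandler/Dispatcher
    handleResponse: ResponseLike => Unit) {  // TransportResponseHandler path -> pending callbacks

  def channelRead(msg: WireMessage): Unit = msg match {
    case req: RequestLike   => handleRequest(req)
    case resp: ResponseLike => handleResponse(resp)
  }
}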

      The last step in createClient is to let the TransportClientBootstraps do some bootstrap work on the client. At this point the TransportClient is essentially fully initialized. The initialization of TransportServer is very similar to that of TransportClient.

4. The Spark RPC Communication Flow

      Walking through the call flow of the Spark RPC framework shows how its components cooperate, and it makes the dispatcher's role very clear: it steers every received message to the designated endpoint for consumption.

        It is worth noting that when a NettyRpcEndpointRef is used to send a request, the implementation checks the target address to decide whether this is local delivery or remote communication: a local message is handed straight to the dispatcher's posting method, while a remote message is sent out over the network through a client.

private[netty] def send(message: RequestMessage): Unit = {
    val remoteAddr = message.receiver.address
    if (remoteAddr == address) {
      // Message to a local RPC endpoint.
      try {
        dispatcher.postOneWayMessage(message)
      } catch {
        case e: RpcEnvStoppedException => logDebug(e.getMessage)
      }
    } else {
      // Message to a remote RPC endpoint.
      postToOutbox(message.receiver, OneWayOutboxMessage(message.serialize(this)))
    }
  }
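As a closing usage sketch, and assuming the caller already holds an RpcEndpointRef (for example the one returned by setupEndpoint earlier), the two delivery styles look like this; NettyRpcEndpointRef.send leads to the NettyRpcEnv.send shown above, and ask follows the analogous path, returning a Future completed when the reply comes back through the TransportResponseHandler. RpcEndpointRef is private[spark], so real callers live inside Spark.

import scala.concurrent.Future
import org.apache.spark.rpc.RpcEndpointRef

// Hedged usage sketch of the two delivery styles.
def fireAndForget(ref: RpcEndpointRef): Unit =
  ref.send("ping")          // one-way: local -> dispatcher.postOneWayMessage,
                            // remote -> OneWayOutboxMessage through the Outbox

def requestReply(ref: RpcEndpointRef): Future[String] =
  ref.ask[String]("ping")   // the Future completes when the remote endpoint replies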

 
