Spark Storage System: The Block Transfer Service

BlockTransferService is one of the subcomponents of BlockManager. The abstract class BlockTransferService has a concrete implementation, NettyBlockTransferService, which is what BlockManager actually uses to provide the block transfer service.

Why is a network service component implemented with Netty placed inside the storage system? Because Spark is deployed in a distributed fashion, each task (more precisely, each task attempt) may end up running on a different machine. A map task writes its output directly to the storage system of the machine it runs on, while the corresponding reduce task is very likely to run on a different machine, so it must download the map task's intermediate output remotely. NettyBlockTransferService provides a shuffle service that clients on other nodes can access.

With a shuffle server in place, a matching shuffle client is also needed so the current node can upload blocks to other nodes or download blocks from them to the local machine. BlockManager creates the shuffle client as follows:

//org.apache.spark.storage.BlockManager
private[spark] val shuffleClient = if (externalShuffleServiceEnabled) {
  val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores)
  new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled(),
    securityManager.isSaslEncryptionEnabled())
} else {
  blockTransferService
}

As the code shows, if an external shuffle service is deployed, the spark.shuffle.service.enabled property must be set to true (this property determines the value of externalShuffleServiceEnabled and defaults to false), in which case an ExternalShuffleClient is created. By default, however, NettyBlockTransferService itself also serves as the shuffle client.
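The selection above can be sketched as a tiny standalone model (ShuffleClientChoice and chooseClient are hypothetical names for illustration, not Spark source):

```java
// Hypothetical sketch: models how BlockManager picks its shuffle client
// based on whether spark.shuffle.service.enabled is true.
public class ShuffleClientChoice {
    // Returns the name of the client BlockManager would use.
    public static String chooseClient(boolean externalShuffleServiceEnabled) {
        // With an external shuffle service, an ExternalShuffleClient is created;
        // otherwise NettyBlockTransferService itself doubles as the client.
        return externalShuffleServiceEnabled
            ? "ExternalShuffleClient"
            : "NettyBlockTransferService";
    }

    public static void main(String[] args) {
        System.out.println(chooseClient(false));
    }
}
```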

1 Initializing NettyBlockTransferService

NettyBlockTransferService provides its services only after its init method has been called, i.e. after it has been initialized. As described for the block manager, BlockManager calls NettyBlockTransferService's init method during its own initialization:

//org.apache.spark.network.netty.NettyBlockTransferService
override def init(blockDataManager: BlockDataManager): Unit = {
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) {
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  transportContext = new TransportContext(transportConf, rpcHandler)
  clientFactory = transportContext.createClientFactory(clientBootstrap.toSeq.asJava)
  server = createServer(serverBootstrap.toList)
  appId = conf.getAppId
  logInfo(s"Server created on ${hostName}:${server.getPort}")
}

Its initialization steps are:

  • 1) Create a NettyBlockRpcServer. NettyBlockRpcServer extends RpcHandler; on the server side, handling of clients' block read and write requests is delegated to the RpcHandler implementation, so NettyBlockRpcServer handles the RPC requests for blocks
  • 2) Prepare the server bootstrap TransportServerBootstrap and the client bootstrap TransportClientBootstrap
  • 3) Create the TransportContext
  • 4) Create the transport client factory TransportClientFactory
  • 5) Create the TransportServer
  • 6) Obtain the current application's ID

2 NettyBlockRpcServer in Detail

Let us now focus on the key difference between NettyBlockTransferService and NettyRpcEnv: the RpcHandler implementation they use. NettyRpcEnv uses NettyRpcHandler, whereas NettyBlockTransferService uses NettyBlockRpcServer.

2.1 The OneForOneStreamManager Implementation

NettyBlockRpcServer uses OneForOneStreamManager to provide one-to-one stream service. OneForOneStreamManager implements five methods of StreamManager: registerChannel, getChunk, connectionTerminated, checkAuthorization, and registerStream. As described in section 6 of the article "Spark's Built-in RPC Framework" (on the server-side RpcHandler), TransportRequestHandler's processFetchRequest method uses three of StreamManager's methods, checkAuthorization, registerChannel, and getChunk, so OneForOneStreamManager handles messages of type ChunkFetchRequest.

OneForOneStreamManager uses StreamState to maintain the state of a stream:

//org.apache.spark.network.server.OneForOneStreamManager
private static class StreamState {
  final String appId;
  final Iterator<ManagedBuffer> buffers;
  Channel associatedChannel = null;
  int curChunk = 0;
  StreamState(String appId, Iterator<ManagedBuffer> buffers) {
    this.appId = appId;
    this.buffers = Preconditions.checkNotNull(buffers);
  }
}

StreamState has the following attributes:

  • appId: the ID of the application the requested stream belongs to. This attribute is only used when ExternalShuffleClient is enabled
  • buffers: the buffered sequence of ManagedBuffers
  • associatedChannel: the Channel associated with the current stream
  • curChunk: tracks the index of the ManagedBuffer the client has currently received, ensuring that the client requests exactly one chunk at a time and in order

With StreamState understood, let us look at OneForOneStreamManager's member attributes:

  • nextStreamId: used to generate stream identifiers; its type is AtomicLong
  • streams: a cache maintaining the mapping between streamId and StreamState

Now let us study OneForOneStreamManager's methods.

(1) The checkAuthorization method verifies whether a client is authorized to read from a given stream:

//org.apache.spark.network.server.OneForOneStreamManager
@Override
public void checkAuthorization(TransportClient client, long streamId) {
  if (client.getClientId() != null) {
    StreamState state = streams.get(streamId);
    Preconditions.checkArgument(state != null, "Unknown stream ID.");
    if (!client.getClientId().equals(state.appId)) {
      throw new SecurityException(String.format(
        "Client %s not authorized to read stream %d (app %s).",
        client.getClientId(),
        streamId,
        state.appId));
    }
  }
}

If SASL authentication on the connection is not configured (see section 7 of "Spark's Built-in RPC Framework", on the server bootstrap TransportServerBootstrap, which describes how SaslServerBootstrap performs SASL authentication on the connection), the TransportClient's clientId is null, so in practice no authorization check is performed. When SASL authentication is enabled, the client must assign the TransportClient's clientId, and only then does this check take effect. The check itself is simple: the TransportClient's clientId attribute is compared for equality with the appId of the StreamState corresponding to the streamId.
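The check can be modeled as a minimal sketch (AuthCheck and isAuthorized are hypothetical names; the real method throws a SecurityException rather than returning false):

```java
// Hypothetical sketch: models OneForOneStreamManager.checkAuthorization.
// Without SASL, clientId is null and the check is skipped entirely;
// with SASL, the client's id must equal the appId recorded for the stream.
public class AuthCheck {
    public static boolean isAuthorized(String clientId, String streamAppId) {
        if (clientId == null) {
            return true; // no SASL authentication: no check is performed
        }
        return clientId.equals(streamAppId);
    }
}
```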

(2) The registerChannel method registers a channel; its actual effect is to associate a stream with a single (and only a single) client TCP connection, which guarantees that each stream is read by exactly one client. Once closed, a stream can never be reused.

//org.apache.spark.network.server.OneForOneStreamManager
@Override
public void registerChannel(Channel channel, long streamId) {
  if (streams.containsKey(streamId)) {
    streams.get(streamId).associatedChannel = channel;
  }
}

(3) The getChunk method fetches a single chunk (a chunk is wrapped as a ManagedBuffer):

//org.apache.spark.network.server.OneForOneStreamManager
@Override
public ManagedBuffer getChunk(long streamId, int chunkIndex) {
  StreamState state = streams.get(streamId); // look up the StreamState in streams
  if (chunkIndex != state.curChunk) {
    throw new IllegalStateException(String.format(
      "Received out-of-order chunk index %s (expected %s)", chunkIndex, state.curChunk));
  } else if (!state.buffers.hasNext()) {
    throw new IllegalStateException(String.format(
      "Requested chunk index beyond end %s", chunkIndex));
  }
  state.curChunk += 1; // advance curChunk, ready for the next request
  ManagedBuffer nextChunk = state.buffers.next(); // take the next ManagedBuffer from buffers
  if (!state.buffers.hasNext()) { // all ManagedBuffers have been fetched by the client
    logger.trace("Removing stream id {}", streamId);
    streams.remove(streamId);
  }
  return nextChunk;
}

The execution steps are:

  • 1) Look up the StreamState in streams. If the requested chunk index does not equal the StreamState's curChunk attribute, the requests are out of order. If the requested chunk index is beyond the end of the buffers, a chunk outside the valid range was requested
  • 2) Increment the StreamState's curChunk by 1, ready for the next request
  • 3) Take the next ManagedBuffer from the buffers. If the buffers iterator has reached its end, all of the stream's chunks have been fetched by the client, and the streamId and its StreamState must be removed from streams
  • 4) Return the fetched ManagedBuffer

(4) The registerStream method registers a stream in OneForOneStreamManager's streams cache:

//org.apache.spark.network.server.OneForOneStreamManager
public long registerStream(String appId, Iterator<ManagedBuffer> buffers) {
  long myStreamId = nextStreamId.getAndIncrement();
  streams.put(myStreamId, new StreamState(appId, buffers));
  return myStreamId;
}

The registerStream method first generates a new streamId, then creates a StreamState object, and finally puts the mapping from the streamId to the StreamState object into streams.
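The registerStream/getChunk lifecycle described above can be sketched as a small runnable model (MiniStreamManager is hypothetical, not Spark source; String stands in for ManagedBuffer, and the authorization and channel bookkeeping are omitted):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical mini-model of OneForOneStreamManager's core behaviour:
// registerStream assigns a fresh streamId from an AtomicLong; getChunk
// enforces strictly sequential chunk indices and drops the stream once
// its buffers are exhausted, so a finished stream can never be reused.
public class MiniStreamManager {
    private final AtomicLong nextStreamId = new AtomicLong(0);
    private final Map<Long, StreamState> streams = new HashMap<>();

    private static class StreamState {
        final Iterator<String> buffers; // stands in for Iterator<ManagedBuffer>
        int curChunk = 0;               // index the client must request next
        StreamState(Iterator<String> buffers) { this.buffers = buffers; }
    }

    public long registerStream(Iterator<String> buffers) {
        long id = nextStreamId.getAndIncrement();
        streams.put(id, new StreamState(buffers));
        return id;
    }

    public String getChunk(long streamId, int chunkIndex) {
        StreamState state = streams.get(streamId);
        if (chunkIndex != state.curChunk) {
            throw new IllegalStateException("out-of-order chunk index " + chunkIndex);
        } else if (!state.buffers.hasNext()) {
            throw new IllegalStateException("chunk index beyond end " + chunkIndex);
        }
        state.curChunk += 1;
        String chunk = state.buffers.next();
        if (!state.buffers.hasNext()) {
            streams.remove(streamId); // stream exhausted: remove its state
        }
        return chunk;
    }

    public boolean hasStream(long streamId) { return streams.containsKey(streamId); }
}
```

A client that fetches chunks 0 and 1 in order drains the stream, after which the stream's state is gone from the cache.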

2.2 The NettyBlockRpcServer Implementation

Having covered NettyBlockRpcServer's internal component OneForOneStreamManager, let us now look at the implementation of NettyBlockRpcServer itself:

//org.apache.spark.network.netty.NettyBlockRpcServer
class NettyBlockRpcServer(
    appId: String,
    serializer: Serializer,
    blockManager: BlockDataManager)
  extends RpcHandler with Logging {
  private val streamManager = new OneForOneStreamManager()
  override def receive(
      client: TransportClient,
      rpcMessage: ByteBuffer,
      responseContext: RpcResponseCallback): Unit = {
    val message = BlockTransferMessage.Decoder.fromByteBuffer(rpcMessage)
    logTrace(s"Received request: $message")
    message match {
      case openBlocks: OpenBlocks => // open (read) blocks
        val blocks: Seq[ManagedBuffer] =
          openBlocks.blockIds.map(BlockId.apply).map(blockManager.getBlockData)
        val streamId = streamManager.registerStream(appId, blocks.iterator.asJava)
        logTrace(s"Registered streamId $streamId with ${blocks.size} buffers")
        responseContext.onSuccess(new StreamHandle(streamId, blocks.size).toByteBuffer)
      case uploadBlock: UploadBlock => // upload a block
        val (level: StorageLevel, classTag: ClassTag[_]) = {
          serializer
            .newInstance()
            .deserialize(ByteBuffer.wrap(uploadBlock.metadata))
            .asInstanceOf[(StorageLevel, ClassTag[_])]
        }
        val data = new NioManagedBuffer(ByteBuffer.wrap(uploadBlock.blockData))
        val blockId = BlockId(uploadBlock.blockId)
        blockManager.putBlockData(blockId, data, level, classTag)
        responseContext.onSuccess(ByteBuffer.allocate(0))
    }
  }
  override def getStreamManager(): StreamManager = streamManager
}

As the code above shows, NettyBlockRpcServer implements the two methods that must respond to clients, receive and getStreamManager; getStreamManager returns the OneForOneStreamManager. The receive method of NettyBlockRpcServer accepts the following two kinds of messages.

(1) OpenBlocks: open (read) blocks. The processing steps are:

  • 1) Extract the array of BlockIds carried by the OpenBlocks message
  • 2) Call BlockManager's getBlockData method to obtain the block for each BlockId in the array (the result is a sequence of ManagedBuffers)
  • 3) Call OneForOneStreamManager's registerStream method to register the ManagedBuffer sequence in OneForOneStreamManager's streams cache
  • 4) Create a StreamHandle message (containing the streamId and the size of the ManagedBuffer sequence) and reply to the client through the response context

(2) UploadBlock: upload a block. The processing steps are:

  • 1) Deserialize the metadata carried by the UploadBlock message to obtain the storage level (StorageLevel) and the type tag (the ClassTag of the uploaded block)
  • 2) Wrap the block data carried by the UploadBlock message (i.e. blockData) in a NioManagedBuffer
  • 3) Obtain the BlockId carried by the UploadBlock message
  • 4) Call BlockManager's putBlockData method to store the block in the local storage system
  • 5) Reply to the client through the response context

3 The Shuffle Client

As introduced earlier, when no external shuffle service is deployed, i.e. when the spark.shuffle.service.enabled property is false, NettyBlockTransferService not only serves block uploads and downloads through OneForOneStreamManager and NettyBlockRpcServer, but also acts as the default shuffle client. As a shuffle client, NettyBlockTransferService can initiate upload and download requests and receive the server's responses. This capability lives in its two methods, fetchBlocks and uploadBlock.

3.1 Sending Requests to Download Remote Blocks

NettyBlockTransferService's fetchBlocks method is implemented as follows:

//org.apache.spark.network.netty.NettyBlockTransferService
override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener): Unit = {
  logTrace(s"Fetch blocks from $host:$port (executor id $execId)")
  try {
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String], listener: BlockFetchingListener) {
        val client = clientFactory.createClient(host, port)
        new OneForOneBlockFetcher(client, appId, execId, blockIds.toArray, listener).start()
      }
    }
    val maxRetries = transportConf.maxIORetries()
    if (maxRetries > 0) { // create a retrying fetcher for the blocks
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logError("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}

The execution steps are:

  • 1) Create blockFetchStarter, an instance of an anonymous class implementing RetryingBlockFetcher.BlockFetchStarter; this anonymous class implements the createAndStart method of the BlockFetchStarter interface
  • 2) Read the spark.$module.io.maxRetries property (for NettyBlockTransferService the module is shuffle) as the maximum number of retries, maxRetries, for download requests
  • 3) Only when the configured spark.shuffle.io.maxRetries property is greater than 0 does maxRetries take effect; in that case a RetryingBlockFetcher is created and its start method is called, otherwise blockFetchStarter's createAndStart method is called directly

RetryingBlockFetcher's start method simply calls the fetchAllOutstanding method:

//org.apache.spark.network.shuffle.RetryingBlockFetcher
public void start() {
  fetchAllOutstanding();
}

The fetchAllOutstanding method is implemented as follows:

//org.apache.spark.network.shuffle.RetryingBlockFetcher
private void fetchAllOutstanding() {
  String[] blockIdsToFetch;
  int numRetries;
  RetryingBlockFetchListener myListener;
  synchronized (this) {
    blockIdsToFetch = outstandingBlocksIds.toArray(new String[outstandingBlocksIds.size()]);
    numRetries = retryCount;
    myListener = currentListener;
  }
  // start fetching the blocks, possibly retrying those that fail
  try {
    fetchStarter.createAndStart(blockIdsToFetch, myListener);
  } catch (Exception e) {
    logger.error(String.format("Exception while beginning fetch of %s outstanding blocks %s",
      blockIdsToFetch.length, numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e);
    if (shouldRetry(e)) {
      initiateRetry();
    } else {
      for (String bid : blockIdsToFetch) {
        listener.onBlockFetchFailure(bid, e);
      }
    }
  }
}

The execution steps are:

  • 1) Call createAndStart on fetchStarter (i.e. blockFetchStarter). Here myListener is a RetryingBlockFetchListener, an implementation of BlockFetchingListener
  • 2) If the previous step throws an exception, call the shouldRetry method to decide whether to retry. shouldRetry's criterion is: the exception is an IOException and the current retry count retryCount is less than the maximum number of retries maxRetries
  • 3) When a retry is needed, call the initiateRetry method to try again

From this analysis of sending requests to fetch remote blocks, whether the request is issued once or retried asynchronously several times, everything ultimately comes down to calling blockFetchStarter's createAndStart method, which first creates a TransportClient, then creates a OneForOneBlockFetcher and calls its start method.
