BlockTransferService is one of the subcomponents of BlockManager. The abstract class BlockTransferService has an implementation class, NettyBlockTransferService, and it is NettyBlockTransferService's Block transfer service that BlockManager actually uses.
Why is a network service component implemented with Netty placed inside the storage system? Because Spark is deployed in a distributed fashion, each Task (more precisely, each task attempt) ultimately runs on a different machine node. A map task's output is stored directly in the storage system of the machine where the map task runs, while the corresponding reduce task is very likely to run on a different machine, so it needs to download the map task's intermediate output remotely. NettyBlockTransferService provides a Shuffle service that clients on other nodes can access.
With a Shuffle server in place, a corresponding Shuffle client is also needed, so that the current node can upload Blocks to other nodes or download Blocks from other nodes to the local machine. The code in BlockManager that creates the Shuffle client is as follows:
//org.apache.spark.storage.BlockManager
private[spark] val shuffleClient = if (externalShuffleServiceEnabled) {
  val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores)
  new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled(),
    securityManager.isSaslEncryptionEnabled())
} else {
  blockTransferService
}
As the code shows, if an external Shuffle service is deployed, the spark.shuffle.service.enabled property must be set to true (this property determines the value of externalShuffleServiceEnabled and defaults to false), in which case an ExternalShuffleClient is created. By default, however, NettyBlockTransferService itself also serves as the Shuffle client.
1 Initializing NettyBlockTransferService
NettyBlockTransferService provides its service only after its init method has been called, i.e., after it has been initialized. As described for the block manager, BlockManager calls NettyBlockTransferService's init method during its own initialization:
//org.apache.spark.network.netty.NettyBlockTransferService
override def init(blockDataManager: BlockDataManager): Unit = {
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) {
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  transportContext = new TransportContext(transportConf, rpcHandler)
  clientFactory = transportContext.createClientFactory(clientBootstrap.toSeq.asJava)
  server = createServer(serverBootstrap.toList)
  appId = conf.getAppId
  logInfo(s"Server created on ${hostName}:${server.getPort}")
}
Its initialization steps are:
- 1) Create NettyBlockRpcServer. NettyBlockRpcServer extends RpcHandler; on the server side, the handling of clients' Block read and write requests is delegated to the implementation class of RpcHandler, so NettyBlockRpcServer handles the RPC requests for Blocks.
- 2) Prepare the server bootstrap TransportServerBootstrap and the client bootstrap TransportClientBootstrap.
- 3) Create the TransportContext.
- 4) Create the transport client factory TransportClientFactory.
- 5) Create the TransportServer.
- 6) Obtain the ID of the current application.
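The conditional wiring in steps 2) through 5) follows a simple pattern: the SASL bootstraps start out empty and are added only when authentication is enabled. A minimal sketch of just this pattern (the method and the use of plain strings as bootstrap stand-ins are illustrative, not Spark's actual types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch of the bootstrap wiring in init: the bootstrap is an empty Option
// unless authEnabled is true, mirroring Option[TransportServerBootstrap].
class BootstrapWiringSketch {
    static List<String> buildBootstraps(boolean authEnabled) {
        Optional<String> saslBootstrap =
            authEnabled ? Optional.of("sasl") : Optional.empty();
        List<String> bootstraps = new ArrayList<>();
        saslBootstrap.ifPresent(bootstraps::add); // Option.toSeq / toList equivalent
        return bootstraps;
    }

    public static void main(String[] args) {
        System.out.println(buildBootstraps(true).size());  // 1: SASL bootstrap installed
        System.out.println(buildBootstraps(false).size()); // 0: no bootstraps
    }
}
```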
2 NettyBlockRpcServer in Detail
Let's now focus on the biggest difference between NettyBlockTransferService and NettyRpcEnv: the RpcHandler implementation class they use. NettyRpcEnv uses NettyRpcHandler, whereas NettyBlockTransferService uses NettyBlockRpcServer.
2.1 The Implementation of OneForOneStreamManager
NettyBlockRpcServer uses OneForOneStreamManager to provide a one-for-one stream service. OneForOneStreamManager implements five of StreamManager's methods: registerChannel, getChunk, connectionTerminated, checkAuthorization, and registerStream. According to Section 6 of the article "Spark's Built-in RPC Framework" (the server-side RpcHandler in detail), TransportRequestHandler's processFetchRequest method uses three of StreamManager's methods, checkAuthorization, registerChannel, and getChunk, so OneForOneStreamManager handles messages of type ChunkFetchRequest.
OneForOneStreamManager uses StreamState to maintain the state of a stream:
//org.apache.spark.network.server.OneForOneStreamManager
private static class StreamState {
  final String appId;
  final Iterator<ManagedBuffer> buffers;
  Channel associatedChannel = null;
  int curChunk = 0;

  StreamState(String appId, Iterator<ManagedBuffer> buffers) {
    this.appId = appId;
    this.buffers = Preconditions.checkNotNull(buffers);
  }
}
StreamState has the following attributes:
- appId: the ID of the application that the requested stream belongs to. This attribute is used only when ExternalShuffleClient is enabled.
- buffers: the buffer of ManagedBuffers.
- associatedChannel: the Channel associated with the current stream.
- curChunk: to guarantee that the client requests chunks one at a time and in order, this attribute tracks the index of the ManagedBuffer the client has currently received.
With StreamState understood, let's look at OneForOneStreamManager's member attributes:
- nextStreamId: used to generate stream identifiers; of type AtomicLong.
- streams: a cache maintaining the mapping from streamId to StreamState.
Now let's walk through OneForOneStreamManager's methods.
(1) The checkAuthorization method verifies whether a client is authorized to read from a given stream:
//org.apache.spark.network.server.OneForOneStreamManager
@Override
public void checkAuthorization(TransportClient client, long streamId) {
  if (client.getClientId() != null) {
    StreamState state = streams.get(streamId);
    Preconditions.checkArgument(state != null, "Unknown stream ID.");
    if (!client.getClientId().equals(state.appId)) {
      throw new SecurityException(String.format(
        "Client %s not authorized to read stream %d (app %s).",
        client.getClientId(),
        streamId,
        state.appId));
    }
  }
}
If SASL authentication is not configured (Section 7 of "Spark's Built-in RPC Framework", on the server bootstrap TransportServerBootstrap, describes how SaslServerBootstrap performs SASL authentication on the connection), the TransportClient's clientId is null, so the permission check is effectively skipped. When SASL authentication is enabled, the client must assign a value to the TransportClient's clientId, and only then does this check run. The check itself is simple: it compares the TransportClient's clientId attribute for equality with the appId of the StreamState corresponding to streamId.
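The decision logic can be reduced to a null-guarded equality comparison. A minimal sketch (the class and method names are illustrative, not Spark's):

```java
// Sketch of the checkAuthorization logic: anonymous clients
// (clientId == null, i.e., SASL disabled) skip the check entirely;
// authenticated clients must match the stream's owning appId.
class StreamAuthSketch {
    static boolean isAuthorized(String clientId, String streamAppId) {
        if (clientId == null) {
            return true; // SASL not enabled: no authentication, no check
        }
        return clientId.equals(streamAppId);
    }

    public static void main(String[] args) {
        System.out.println(isAuthorized(null, "app-1"));    // true: anonymous, check skipped
        System.out.println(isAuthorized("app-1", "app-1")); // true: IDs match
        System.out.println(isAuthorized("app-2", "app-1")); // false: would raise SecurityException
    }
}
```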
(2) The registerChannel method registers a channel. Its actual effect is to associate a stream with one (and only one) client TCP connection, which guarantees that a single stream is read by only one client. Once closed, a stream can never be reused.
//org.apache.spark.network.server.OneForOneStreamManager
@Override
public void registerChannel(Channel channel, long streamId) {
  if (streams.containsKey(streamId)) {
    streams.get(streamId).associatedChannel = channel;
  }
}
(3) The getChunk method fetches a single chunk (a chunk is encapsulated as a ManagedBuffer):
@Override
public ManagedBuffer getChunk(long streamId, int chunkIndex) {
  StreamState state = streams.get(streamId); // fetch the StreamState from streams
  if (chunkIndex != state.curChunk) {
    throw new IllegalStateException(String.format(
      "Received out-of-order chunk index %s (expected %s)", chunkIndex, state.curChunk));
  } else if (!state.buffers.hasNext()) {
    throw new IllegalStateException(String.format(
      "Requested chunk index beyond end %s", chunkIndex));
  }
  state.curChunk += 1; // increment curChunk, ready for the next request
  ManagedBuffer nextChunk = state.buffers.next(); // take the next ManagedBuffer from buffers
  if (!state.buffers.hasNext()) { // every ManagedBuffer in buffers has been fetched by the client
    logger.trace("Removing stream id {}", streamId);
    streams.remove(streamId);
  }
  return nextChunk;
}
The execution steps are as follows:
- 1) Fetch the StreamState from streams. If the index of the requested chunk does not equal the StreamState's curChunk attribute, the ordering is wrong. If the requested chunk index is beyond the size of the buffers, a chunk out of range was requested.
- 2) Increment the StreamState's curChunk by 1, ready for the next request.
- 3) Take a ManagedBuffer from the buffers. If the buffers iterator has reached its end, every chunk of the current stream has been fetched by the client, and the streamId and its StreamState must be removed from streams.
- 4) Return the fetched ManagedBuffer.
(4) The registerStream method registers a stream into OneForOneStreamManager's streams cache:
//org.apache.spark.network.server.OneForOneStreamManager
public long registerStream(String appId, Iterator<ManagedBuffer> buffers) {
  long myStreamId = nextStreamId.getAndIncrement();
  streams.put(myStreamId, new StreamState(appId, buffers));
  return myStreamId;
}
The registerStream method first generates a new streamId, then creates a StreamState object, and finally puts the mapping from the streamId to the StreamState object into streams.
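Putting registerStream and getChunk together, the lifecycle of a stream can be sketched as follows. This is a simplified, single-threaded analogue of the behavior described above; the class name OneForOneSketch and the use of String chunks in place of ManagedBuffer are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// A simplified analogue of OneForOneStreamManager: a stream is registered with
// an iterator of chunks, chunks must be fetched strictly in order, and the
// stream is dropped as soon as its last chunk has been handed out.
class OneForOneSketch {
    private final AtomicLong nextStreamId = new AtomicLong(0);
    private final Map<Long, Iterator<String>> streams = new HashMap<>();
    private final Map<Long, Integer> curChunk = new HashMap<>();

    long registerStream(Iterator<String> chunks) {
        long id = nextStreamId.getAndIncrement();
        streams.put(id, chunks);
        curChunk.put(id, 0);
        return id;
    }

    String getChunk(long streamId, int chunkIndex) {
        Iterator<String> it = streams.get(streamId);
        if (it == null || chunkIndex != curChunk.get(streamId)) {
            throw new IllegalStateException("out-of-order chunk index " + chunkIndex);
        }
        curChunk.put(streamId, chunkIndex + 1); // advance for the next request
        String chunk = it.next();
        if (!it.hasNext()) {            // last chunk handed out:
            streams.remove(streamId);   // the stream can never be reused
            curChunk.remove(streamId);
        }
        return chunk;
    }

    public static void main(String[] args) {
        OneForOneSketch mgr = new OneForOneSketch();
        long id = mgr.registerStream(List.of("a", "b").iterator());
        System.out.println(mgr.getChunk(id, 0)); // a
        System.out.println(mgr.getChunk(id, 1)); // b; the stream is removed afterwards
    }
}
```

Requesting chunk 1 before chunk 0, or any chunk after the stream has been drained, throws IllegalStateException, matching the out-of-order checks in the real getChunk.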
2.2 The Implementation of NettyBlockRpcServer
Having introduced NettyBlockRpcServer's internal component OneForOneStreamManager, let's now look at the implementation of NettyBlockRpcServer itself:
//org.apache.spark.network.netty.NettyBlockRpcServer
class NettyBlockRpcServer(
    appId: String,
    serializer: Serializer,
    blockManager: BlockDataManager)
  extends RpcHandler with Logging {

  private val streamManager = new OneForOneStreamManager()

  override def receive(
      client: TransportClient,
      rpcMessage: ByteBuffer,
      responseContext: RpcResponseCallback): Unit = {
    val message = BlockTransferMessage.Decoder.fromByteBuffer(rpcMessage)
    logTrace(s"Received request: $message")
    message match {
      case openBlocks: OpenBlocks => // open (read) Blocks
        val blocks: Seq[ManagedBuffer] =
          openBlocks.blockIds.map(BlockId.apply).map(blockManager.getBlockData)
        val streamId = streamManager.registerStream(appId, blocks.iterator.asJava)
        logTrace(s"Registered streamId $streamId with ${blocks.size} buffers")
        responseContext.onSuccess(new StreamHandle(streamId, blocks.size).toByteBuffer)
      case uploadBlock: UploadBlock => // upload a Block
        val (level: StorageLevel, classTag: ClassTag[_]) = {
          serializer
            .newInstance()
            .deserialize(ByteBuffer.wrap(uploadBlock.metadata))
            .asInstanceOf[(StorageLevel, ClassTag[_])]
        }
        val data = new NioManagedBuffer(ByteBuffer.wrap(uploadBlock.blockData))
        val blockId = BlockId(uploadBlock.blockId)
        blockManager.putBlockData(blockId, data, level, classTag)
        responseContext.onSuccess(ByteBuffer.allocate(0))
    }
  }

  override def getStreamManager(): StreamManager = streamManager
}
As the code above shows, NettyBlockRpcServer implements two methods, receive (which replies to clients) and getStreamManager, where getStreamManager returns the OneForOneStreamManager. NettyBlockRpcServer's receive method handles the following two kinds of messages.
(1) OpenBlocks: open (read) Blocks. The processing steps are:
- 1) Extract the array of BlockIds carried by the OpenBlocks message.
- 2) Call BlockManager's getBlockData method to obtain the Block for each BlockId in the array (the return value is a sequence of ManagedBuffers).
- 3) Call OneForOneStreamManager's registerStream method to register the ManagedBuffer sequence into OneForOneStreamManager's streams cache.
- 4) Create a StreamHandle message (containing the streamId and the size of the ManagedBuffer sequence) and reply to the client through the response context.
(2) UploadBlock: upload a Block. The processing steps are:
- 1) Deserialize the metadata carried by the UploadBlock message to obtain the storage level (StorageLevel) and the type tag (the type of the uploaded Block).
- 2) Wrap the Block data carried by the UploadBlock message (i.e., blockData) in a NioManagedBuffer.
- 3) Obtain the BlockId carried by the UploadBlock message.
- 4) Call BlockManager's putBlockData method to store the Block into the local storage system.
- 5) Reply to the client through the response context.
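The general shape of an upload message, serialized metadata traveling alongside the raw block bytes, can be sketched with a simple length-prefixed layout. This encoding is purely illustrative; the real UploadBlock message carries metadata and blockData as separate fields rather than one packed buffer:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: pack metadata bytes and block bytes into one buffer
// with length prefixes, then unpack them on the receiving side, mirroring the
// "deserialize metadata, then wrap the payload" steps above.
class UploadPackSketch {
    static ByteBuffer pack(byte[] metadata, byte[] blockData) {
        ByteBuffer buf = ByteBuffer.allocate(8 + metadata.length + blockData.length);
        buf.putInt(metadata.length).put(metadata);   // length-prefixed metadata
        buf.putInt(blockData.length).put(blockData); // length-prefixed payload
        buf.flip();
        return buf;
    }

    static byte[][] unpack(ByteBuffer buf) {
        byte[] metadata = new byte[buf.getInt()];
        buf.get(metadata);
        byte[] blockData = new byte[buf.getInt()];
        buf.get(blockData);
        return new byte[][] { metadata, blockData };
    }

    public static void main(String[] args) {
        ByteBuffer msg = pack("MEMORY_ONLY".getBytes(StandardCharsets.UTF_8),
                              new byte[] { 1, 2, 3 });
        byte[][] parts = unpack(msg);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8)); // MEMORY_ONLY
        System.out.println(parts[1].length); // 3
    }
}
```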
3 The Shuffle Client
As introduced earlier, when no external Shuffle service is deployed, i.e., when the spark.shuffle.service.enabled property is false, NettyBlockTransferService not only provides Block upload and download service through OneForOneStreamManager and NettyBlockRpcServer, but also serves as the default Shuffle client. As a Shuffle client, NettyBlockTransferService can initiate upload and download requests and receive the server's responses. Two of its methods, fetchBlocks and uploadBlock, provide this capability.
3.1 Sending Requests to Download Remote Blocks
The implementation of NettyBlockTransferService's fetchBlocks method is as follows:
//org.apache.spark.network.netty.NettyBlockTransferService
override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener): Unit = {
  logTrace(s"Fetch blocks from $host:$port (executor id $execId)")
  try {
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String], listener: BlockFetchingListener) {
        val client = clientFactory.createClient(host, port)
        new OneForOneBlockFetcher(client, appId, execId, blockIds.toArray, listener).start()
      }
    }
    val maxRetries = transportConf.maxIORetries()
    if (maxRetries > 0) { // create the retrying Block fetcher
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logError("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}
The execution steps are as follows:
- 1) Create blockFetchStarter, an instance of an anonymous implementation of RetryingBlockFetcher.BlockFetchStarter; this anonymous class implements the BlockFetchStarter interface's createAndStart method.
- 2) Read the value of the spark.$module.io.maxRetries property (for NettyBlockTransferService, module is shuffle) as maxRetries, the maximum number of retries for the download request.
- 3) maxRetries takes effect only when the configured spark.shuffle.io.maxRetries property is greater than 0; in that case a RetryingBlockFetcher is created and its start method is called, otherwise blockFetchStarter's createAndStart method is called directly.
RetryingBlockFetcher's start method simply calls the fetchAllOutstanding method:
//org.apache.spark.network.shuffle.RetryingBlockFetcher
public void start() {
  fetchAllOutstanding();
}
The implementation of the fetchAllOutstanding method is as follows:
//org.apache.spark.network.shuffle.RetryingBlockFetcher
private void fetchAllOutstanding() {
  String[] blockIdsToFetch;
  int numRetries;
  RetryingBlockFetchListener myListener;
  synchronized (this) {
    blockIdsToFetch = outstandingBlocksIds.toArray(new String[outstandingBlocksIds.size()]);
    numRetries = retryCount;
    myListener = currentListener;
  }
  // start fetching the Blocks, possibly retrying the ones that failed
  try {
    fetchStarter.createAndStart(blockIdsToFetch, myListener);
  } catch (Exception e) {
    logger.error(String.format("Exception while beginning fetch of %s outstanding blocks %s",
      blockIdsToFetch.length, numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e);
    if (shouldRetry(e)) {
      initiateRetry();
    } else {
      for (String bid : blockIdsToFetch) {
        listener.onBlockFetchFailure(bid, e);
      }
    }
  }
}
The execution steps are as follows:
- 1) Call the createAndStart method of fetchStarter (i.e., blockFetchStarter). Here myListener is a RetryingBlockFetchListener, an implementation class of BlockFetchingListener.
- 2) If the previous step throws an exception, call the shouldRetry method to decide whether to retry. shouldRetry's criterion is: the exception is an IOException and the current retry count retryCount is less than the maximum number of retries maxRetries.
- 3) When a retry is needed, call the initiateRetry method to retry.
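The retry decision in step 2) boils down to two conditions. A minimal sketch (the standalone class is illustrative; the cause inspection is a detail of Spark's actual shouldRetry):

```java
import java.io.IOException;

// Sketch of RetryingBlockFetcher's retry decision: only transient network
// failures (IOException) are retried, and only while the retry budget
// (maxRetries) has not been exhausted.
class RetryDecisionSketch {
    static boolean shouldRetry(Throwable t, int retryCount, int maxRetries) {
        boolean isIOException = t instanceof IOException
            || t.getCause() instanceof IOException; // Spark also inspects the cause
        return isIOException && retryCount < maxRetries;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(new IOException("conn reset"), 0, 3));     // true
        System.out.println(shouldRetry(new IOException("conn reset"), 3, 3));     // false: budget spent
        System.out.println(shouldRetry(new IllegalStateException("bug"), 0, 3));  // false: not an I/O error
    }
}
```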
From this analysis of sending requests to fetch remote Blocks, whether the request is made once or retried asynchronously multiple times, everything ultimately comes down to calling blockFetchStarter's createAndStart method, which first creates a TransportClient, then creates a OneForOneBlockFetcher and calls its start method.