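BlockManagerMaster manages the BlockManagers that live on the Driver and on the Executors in a unified way. All BlockManager-related interaction between an Executor and the Driver goes through BlockManagerMaster: for example, an Executor asks the Driver to register its BlockManager, reports the latest state of its Blocks, asks where a Block it needs currently resides, and has to be removed once it finishes running. The Driver and the Executors, however, usually run on different machines, so how is this implemented?
As described in the section on the Spark execution environment, the BlockManagerMaster on the Driver instantiates and registers BlockManagerMasterEndpoint. On both the Driver and the Executors, the driverEndpoint attribute of BlockManagerMaster holds the RpcEndpointRef of BlockManagerMasterEndpoint. Likewise, every BlockManager, whether on the Driver or on an Executor, has its own BlockManagerSlaveEndpoint, and the BlockManager's slaveEndpoint attribute holds the RpcEndpointRef of that BlockManagerSlaveEndpoint. BlockManagerMaster sends messages, BlockManagerMasterEndpoint receives and processes them, and BlockManagerSlaveEndpoint receives the commands issued by BlockManagerMasterEndpoint.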
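The division of labour described above can be sketched without Spark at all. The following is a minimal model, not Spark's implementation: `Ref`, `Register`, `MasterEndpoint`, `SlaveEndpoint`, and `Master` are hypothetical stand-ins for RpcEndpointRef, the registration message, BlockManagerMasterEndpoint, BlockManagerSlaveEndpoint, and BlockManagerMaster, and the "RPC" is an ordinary method call.

```scala
import scala.collection.mutable

object EndpointWiringSketch {
  // Hypothetical stand-in for RpcEndpointRef: delivers a message and returns the reply.
  final class Ref(val name: String, handler: Any => Any) {
    def ask(message: Any): Any = handler(message)
  }

  final case class Register(executorId: String, slaveEndpoint: Ref)

  // One instance, living on the driver: receives and processes messages.
  final class MasterEndpoint {
    val slaves = mutable.LinkedHashMap.empty[String, Ref]
    val ref = new Ref("master-endpoint", {
      case Register(executorId, slave) =>
        slaves(executorId) = slave
        true
      case _ => false
    })
  }

  // One per BlockManager, on the driver and on every executor: receives commands.
  final class SlaveEndpoint(owner: String) {
    val ref = new Ref(s"slave-endpoint-$owner", _ => true)
  }

  // One per process: it only sends, always through the shared driverEndpoint reference.
  final class Master(driverEndpoint: Ref) {
    def register(executorId: String, slaveEndpoint: Ref): Any =
      driverEndpoint.ask(Register(executorId, slaveEndpoint))
  }

  def main(args: Array[String]): Unit = {
    val masterEndpoint = new MasterEndpoint
    // Driver and executor each build their own Master around the same reference.
    val onDriver = new Master(masterEndpoint.ref)
    val onExecutor = new Master(masterEndpoint.ref)
    onDriver.register("driver", new SlaveEndpoint("driver").ref)
    onExecutor.register("1", new SlaveEndpoint("1").ref)
    println(masterEndpoint.slaves.keys.mkString(", "))
  }
}
```

The point of the sketch is that there is a single receiving endpoint but many senders, and that the receiving endpoint ends up holding one slave reference per registered BlockManager, which it can later use to push commands back.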
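1 Responsibilities of BlockManagerMaster
BlockManagerMaster sends the various messages related to the storage system. The message types are:
- RemoveExecutor (remove an Executor)
- RegisterBlockManager (register a BlockManager)
- UpdateBlockInfo (update the information about a Block)
- GetLocations (get the locations of a Block)
- GetLocationsMultipleBlockIds (get the locations of several Blocks)
- GetPeers (get the BlockManagerIds of the other BlockManagers)
- GetExecutorEndpointRef (get the RpcEndpointRef of an Executor)
- RemoveBlock (remove a Block)
- RemoveRdd (remove the Blocks of an RDD)
- RemoveShuffle (remove the Blocks of a shuffle)
- RemoveBroadcast (remove the Blocks of a broadcast)
- GetMemoryStatus (get the memory status of the specified BlockManager)
- GetStorageStatus (get the storage status)
- GetMatchingBlockIds (get the Blocks that match a filter)
- HasCachedBlocks (check whether the specified Executor holds cached Blocks)
- StopBlockManagerMaster (stop BlockManagerMaster)
As the list shows, BlockManagerMaster can send many kinds of messages. To make this concrete, take RegisterBlockManager as an example: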
// org.apache.spark.storage.BlockManagerMaster
def registerBlockManager(
    blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = {
  logInfo(s"Registering BlockManager $blockManagerId")
  tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
  logInfo(s"Registered BlockManager $blockManagerId")
}
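As the code shows, registering a BlockManager amounts to sending a RegisterBlockManager message to BlockManagerMasterEndpoint. The message carries the blockManagerId of the BlockManager being registered, its maximum memory size, and its slaveEndpoint (that is, the RpcEndpointRef of its BlockManagerSlaveEndpoint).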
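The listing relies on a `tell` helper whose body is not shown. A minimal sketch follows, assuming `tell` performs a blocking ask and treats any reply other than `true` as a failure. That behaviour is an assumption of this sketch. `Ref`, the simplified `RegisterBlockManager` case class, and the exception type are likewise hypothetical stand-ins, not Spark's API.

```scala
object TellSketch {
  // Hypothetical stand-in for the driver endpoint reference: a blocking ask returning Boolean.
  trait Ref {
    def ask(message: Any): Boolean
  }

  // Simplified message: the real one also carries the slave endpoint reference.
  final case class RegisterBlockManager(blockManagerId: String, maxMemSize: Long)

  // Send a message and fail loudly unless the endpoint acknowledges it with true.
  def tell(driverEndpoint: Ref, message: Any): Unit = {
    if (!driverEndpoint.ask(message)) {
      throw new IllegalStateException(s"Endpoint returned false for $message, expected true")
    }
  }

  def main(args: Array[String]): Unit = {
    val accepting = new Ref { def ask(message: Any): Boolean = true }
    tell(accepting, RegisterBlockManager("executor-1", 512L * 1024 * 1024))
    println("registration acknowledged")
  }
}
```

Under this assumption, a caller such as registerBlockManager never has to inspect a return value: either the message was acknowledged or an exception propagates.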
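2 BlockManagerMasterEndpoint in detail
BlockManagerMasterEndpoint receives the messages sent by the BlockManagerMaster on the Driver or on an Executor and manages all BlockManagers centrally. It keeps the following attributes:
- blockManagerInfo: cache of the mapping from BlockManagerId to BlockManagerInfo
- blockManagerIdByExecutor: cache of the mapping from Executor ID to BlockManagerId
- blockLocations: cache of the one-to-many mapping from a BlockId to the BlockManagerIds of the BlockManagers that store the corresponding Block
- topologyMapper: maps the nodes of the cluster to their topology information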
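The three caches in the list above can be modelled with plain Scala collections. This is a simplified sketch: `BlockManagerId` and `BlockManagerInfo` here are hypothetical reduced case classes, block IDs are plain strings, and `addLocation` / `getLocations` are illustrative helpers rather than the endpoint's real methods.

```scala
import scala.collection.mutable

object MasterStateSketch {
  // Simplified stand-ins: the real types carry more fields.
  final case class BlockManagerId(executorId: String, host: String, port: Int)
  final case class BlockManagerInfo(id: BlockManagerId, maxMemSize: Long)

  final class MasterState {
    // BlockManagerId -> BlockManagerInfo
    val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, BlockManagerInfo]
    // Executor ID -> BlockManagerId
    val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]
    // BlockId -> every BlockManagerId that stores the block (one-to-many)
    val blockLocations = mutable.HashMap.empty[String, mutable.HashSet[BlockManagerId]]

    // Record that `id` holds `blockId`; returns false if that was already known.
    def addLocation(blockId: String, id: BlockManagerId): Boolean =
      blockLocations.getOrElseUpdate(blockId, mutable.HashSet.empty[BlockManagerId]).add(id)

    // All known holders of `blockId`, or an empty sequence for an unknown block.
    def getLocations(blockId: String): Seq[BlockManagerId] =
      blockLocations.get(blockId).map(_.toSeq).getOrElse(Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    val state = new MasterState
    state.addLocation("rdd_0_0", BlockManagerId("1", "host-a", 7001))
    state.addLocation("rdd_0_0", BlockManagerId("2", "host-b", 7001))
    println(state.getLocations("rdd_0_0").map(_.executorId).sorted.mkString(", "))
  }
}
```

Keeping a set per BlockId is what makes a replicated Block cheap to locate: one lookup returns every holder.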
BlockManagerMasterEndpoint overrides the receiveAndReply method of the RpcEndpoint trait to receive BlockManager-related messages:
// org.apache.spark.storage.BlockManagerMasterEndpoint
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint) =>
    register(blockManagerId, maxMemSize, slaveEndpoint)
    context.reply(true)
  case _updateBlockInfo @
      UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
    context.reply(updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size))
    listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
  case GetLocations(blockId) =>
    context.reply(getLocations(blockId))
  case GetLocationsMultipleBlockIds(blockIds) =>
    context.reply(getLocationsMultipleBlockIds(blockIds))
  case GetPeers(blockManagerId) =>
    context.reply(getPeers(blockManagerId))
  case GetExecutorEndpointRef(executorId) =>
    context.reply(getExecutorEndpointRef(executorId))
  case GetMemoryStatus =>
    context.reply(memoryStatus)
  case GetStorageStatus =>
    context.reply(storageStatus)
  case GetBlockStatus(blockId, askSlaves) =>
    context.reply(blockStatus(blockId, askSlaves))
  case GetMatchingBlockIds(filter, askSlaves) =>
    context.reply(getMatchingBlockIds(filter, askSlaves))
  case RemoveRdd(rddId) =>
    context.reply(removeRdd(rddId))
  case RemoveShuffle(shuffleId) =>
    context.reply(removeShuffle(shuffleId))
  case RemoveBroadcast(broadcastId, removeFromDriver) =>
    context.reply(removeBroadcast(broadcastId, removeFromDriver))
  case RemoveBlock(blockId) =>
    removeBlockFromWorkers(blockId)
    context.reply(true)
  case RemoveExecutor(execId) =>
    removeExecutor(execId)
    context.reply(true)
  case StopBlockManagerMaster =>
    context.reply(true)
    stop()
  case BlockManagerHeartbeat(blockManagerId) =>
    context.reply(heartbeatReceived(blockManagerId))
  case HasCachedBlocks(executorId) =>
    blockManagerIdByExecutor.get(executorId) match {
      case Some(bm) =>
        if (blockManagerInfo.contains(bm)) {
          val bmInfo = blockManagerInfo(bm)
          context.reply(bmInfo.cachedBlocks.nonEmpty)
        } else {
          context.reply(false)
        }
      case None => context.reply(false)
    }
}
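The message types that BlockManagerMasterEndpoint handles correspond to the messages that BlockManagerMaster sends; the handler additionally covers GetBlockStatus and BlockManagerHeartbeat, which the list in section 1 does not mention. The RegisterBlockManager message serves as the example for how BlockManagerMasterEndpoint receives and processes a message; it is handled by the register method: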
private def register(id: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef) {
  val time = System.currentTimeMillis()
  if (!blockManagerInfo.contains(id)) {
    blockManagerIdByExecutor.get(id.executorId) match {
      case Some(oldId) =>
        // A block manager of the same executor already exists, so remove it (assumed dead)
        logError("Got two different block manager registrations on same executor - "
            + s" will replace old one $oldId with new one $id")
        removeExecutor(id.executorId)
      case None =>
    }
    logInfo("Registering block manager %s with %s RAM, %s".format(
      id.hostPort, Utils.bytesToString(maxMemSize), id))
    blockManagerIdByExecutor(id.executorId) = id
    blockManagerInfo(id) = new BlockManagerInfo(
      id, System.currentTimeMillis(), maxMemSize, slaveEndpoint)
  }
  listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxMemSize))
}
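The steps are as follows:
- 1) If the blockManagerInfo cache already contains this BlockManagerId, skip to step 5.
- 2) If blockManagerIdByExecutor already holds a different BlockManagerId for the same Executor, assume the old BlockManager is dead and call removeExecutor, which clears everything related to it from blockManagerIdByExecutor, blockManagerInfo, blockLocations, and the other caches.
- 3) Add the mapping from the executorId to the incoming BlockManagerId to blockManagerIdByExecutor.
- 4) Add the mapping from the BlockManagerId to a new BlockManagerInfo to the blockManagerInfo cache.
- 5) Post a SparkListenerBlockManagerAdded event to listenerBus.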
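The branching in these steps is easy to get backwards, so here is a small stand-alone sketch of the same control flow. It is a simplified model, not the Spark code: `Registry`, its field names, and the string events are hypothetical, the info record is reduced to the maximum memory size, and `removeExecutor` only clears the two maps modelled here.

```scala
import scala.collection.mutable

object RegisterSketch {
  final case class BlockManagerId(executorId: String, hostPort: String)

  final class Registry {
    val infoById = mutable.HashMap.empty[BlockManagerId, Long]         // id -> maxMemSize
    val idByExecutor = mutable.HashMap.empty[String, BlockManagerId]   // executor -> id
    val events = mutable.ArrayBuffer.empty[String]                     // stands in for listenerBus

    // Forget the block manager currently registered for this executor, if any.
    def removeExecutor(executorId: String): Unit =
      idByExecutor.remove(executorId).foreach(old => infoById.remove(old))

    def register(id: BlockManagerId, maxMemSize: Long): Unit = {
      if (!infoById.contains(id)) {
        // Same executor, different id: the old block manager is assumed dead.
        if (idByExecutor.contains(id.executorId)) removeExecutor(id.executorId)
        idByExecutor(id.executorId) = id
        infoById(id) = maxMemSize
      }
      // Posted on every call, including a repeated registration of a known id.
      events += s"BlockManagerAdded(${id.hostPort})"
    }
  }

  def main(args: Array[String]): Unit = {
    val registry = new Registry
    registry.register(BlockManagerId("1", "host-a:7001"), 1024L)
    registry.register(BlockManagerId("1", "host-a:7002"), 2048L) // replaces the first one
    println(registry.infoById.keys.map(_.hostPort).mkString(", "))
    println(registry.events.size)
  }
}
```

Note that the event is posted even when the id was already known, which mirrors the listing: the listenerBus.post call sits outside the `if`.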
3 BlockManagerSlaveEndpoint in detail
BlockManagerSlaveEndpoint receives commands from BlockManagerMasterEndpoint and carries out the corresponding operations. It also overrides the receiveAndReply method of RpcEndpoint:
// org.apache.spark.storage.BlockManagerSlaveEndpoint
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RemoveBlock(blockId) =>
    doAsync[Boolean]("removing block " + blockId, context) {
      blockManager.removeBlock(blockId)
      true
    }
  case RemoveRdd(rddId) =>
    doAsync[Int]("removing RDD " + rddId, context) {
      blockManager.removeRdd(rddId)
    }
  case RemoveShuffle(shuffleId) =>
    doAsync[Boolean]("removing shuffle " + shuffleId, context) {
      if (mapOutputTracker != null) {
        mapOutputTracker.unregisterShuffle(shuffleId)
      }
      SparkEnv.get.shuffleManager.unregisterShuffle(shuffleId)
    }
  case RemoveBroadcast(broadcastId, _) =>
    doAsync[Int]("removing broadcast " + broadcastId, context) {
      blockManager.removeBroadcast(broadcastId, tellMaster = true)
    }
  case GetBlockStatus(blockId, _) =>
    context.reply(blockManager.getStatus(blockId))
  case GetMatchingBlockIds(filter, _) =>
    context.reply(blockManager.getMatchingBlockIds(filter))
  case TriggerThreadDump =>
    context.reply(Utils.getThreadDump())
}
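The removal cases above all go through a `doAsync` helper whose body is not shown. A plausible shape, sketched here under the assumption that it runs the body on another thread and replies to the caller when the body finishes, looks like the following. `CallContext` is a hypothetical stand-in for RpcCallContext, and the use of the global execution context is a choice of this sketch only.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object DoAsyncSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for RpcCallContext: one callback for success, one for failure.
  trait CallContext {
    def reply(response: Any): Unit
    def sendFailure(e: Throwable): Unit
  }

  // Run `body` off the calling thread and answer the caller once it completes.
  def doAsync[T](actionMessage: String, context: CallContext)(body: => T): Unit = {
    Future(body).onComplete {
      case Success(response) => context.reply(response)
      case Failure(e) => context.sendFailure(new RuntimeException(s"Error in $actionMessage", e))
    }
  }

  def main(args: Array[String]): Unit = {
    val result = Promise[Any]()
    val context = new CallContext {
      def reply(response: Any): Unit = { result.trySuccess(response); () }
      def sendFailure(e: Throwable): Unit = { result.tryFailure(e); () }
    }
    doAsync("removing block rdd_0_0", context) { true }
    println(Await.result(result.future, 5.seconds))
  }
}
```

The design benefit is that a slow removal does not block the endpoint's message loop: the handler returns immediately and the reply is delivered later.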
Take Block removal on the Driver as an example of what BlockManagerSlaveEndpoint does. To remove a Block, the Driver relies on the removeBlock method of BlockManagerMaster:
// org.apache.spark.storage.BlockManagerMaster
def removeBlock(blockId: BlockId) {
  driverEndpoint.askWithRetry[Boolean](RemoveBlock(blockId))
}
As the code shows, the removeBlock method of BlockManagerMaster simply sends a RemoveBlock message to BlockManagerMasterEndpoint. In the receiveAndReply implementation listed in section 2, BlockManagerMasterEndpoint handles this message by first calling removeBlockFromWorkers and then replying true to the caller.
// org.apache.spark.storage.BlockManagerMasterEndpoint
private def removeBlockFromWorkers(blockId: BlockId) {
  val locations = blockLocations.get(blockId)
  if (locations != null) {
    locations.foreach { blockManagerId: BlockManagerId =>
      val blockManager = blockManagerInfo.get(blockManagerId)
      if (blockManager.isDefined) {
        blockManager.get.slaveEndpoint.ask[Boolean](RemoveBlock(blockId))
      }
    }
  }
}
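- 1) Look up, in the blockLocations cache, the set locations of BlockManagerIds that actually store the Block identified by the BlockId. If there is no entry, nothing is done.
- 2) Iterate over locations and, for each BlockManagerId still present in blockManagerInfo, send a RemoveBlock message to the BlockManagerSlaveEndpoint of that node. The result of each ask is not awaited, so the true returned to the caller confirms only that the requests were dispatched.
When BlockManagerSlaveEndpoint receives the RemoveBlock message, it calls removeBlock on its BlockManager to delete the Block.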
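The whole removal path, from the location lookup to the deletion on each node, can be sketched in a few lines. This is a synchronous toy model under stated assumptions: `Node` merges a BlockManager with its slave endpoint, `MasterEndpoint.nodes` plays the role of blockManagerInfo, nodes and blocks are identified by plain strings, and `put` is a hypothetical helper for setting up state. Unlike the real flow, the sketch does not update blockLocations after a removal.

```scala
import scala.collection.mutable

object RemoveBlockSketch {
  // Plays the role of a BlockManager together with its slave endpoint on one node.
  final class Node(val name: String) {
    val blocks = mutable.HashSet.empty[String]
    // Handler for the RemoveBlock command: true if the block was present.
    def onRemoveBlock(blockId: String): Boolean = blocks.remove(blockId)
  }

  final class MasterEndpoint {
    val nodes = mutable.HashMap.empty[String, Node] // like blockManagerInfo
    val blockLocations = mutable.HashMap.empty[String, mutable.HashSet[String]]

    // Store a block on a node and record where it lives.
    def put(node: Node, blockId: String): Unit = {
      nodes(node.name) = node
      node.blocks += blockId
      blockLocations.getOrElseUpdate(blockId, mutable.HashSet.empty[String]) += node.name
    }

    // Look up every location of the block, then forward RemoveBlock to each known node.
    def removeBlockFromWorkers(blockId: String): Unit =
      for {
        locations <- blockLocations.get(blockId)
        name <- locations
        node <- nodes.get(name)
      } node.onRemoveBlock(blockId)
  }

  def main(args: Array[String]): Unit = {
    val master = new MasterEndpoint
    val a = new Node("host-a")
    val b = new Node("host-b")
    master.put(a, "rdd_0_0")
    master.put(b, "rdd_0_0")
    master.removeBlockFromWorkers("rdd_0_0")
    println(a.blocks.isEmpty && b.blocks.isEmpty)
  }
}
```

An unknown BlockId, or a location whose node is no longer registered, is skipped silently, matching the null and isDefined checks in the listing.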