Table of Contents
1) The startup process of BlockManagerMaster and BlockManager
2) Some block-related classes
3) DiskStore: reading and writing the local disk
4) MemoryStore: reading and writing memory
Background
- The storage layer is made up of the BlockManagers on the Driver and on the Executors, in a Master-Slave architecture that communicates over Spark RPC.
- A rough flow diagram accompanied this section in the original post (image not reproduced here).
BlockManager is a class; it has a member variable BlockManagerMaster, and the two form a master-slave structure.
A BlockManager exists on every Executor and on the Driver, and it is responsible for operations on Blocks.
At this point you may wonder: what exactly is a block, and how is it managed?
And what are BlockManagerMasterEndpoint and BlockManagerSlaveEndpoint? They too are a master-slave pair: the Master manages and the Slaves do the work, which is why BlockManagerMasterEndpoint exists only on the Driver.
Of course, I also want to know how the two cooperate as master and slave.
The components, in brief:
- BlockManager: manages Blocks, handling all data reads and writes while Spark runs
- BlockManagerMaster: manages and coordinates the BlockManagers; it is the proxy through which a BlockManager talks to the BlockManagerMasterEndpoint on the Driver
- BlockManagerMasterEndpoint: exists only on the Driver, which uses it to issue commands to the BlockManagers
- BlockManagerSlaveEndpoint: receives the commands issued by BlockManagerMasterEndpoint, e.g. get a Block's status, fetch a Block by BlockId, or remove a Block
Questions
Given the background above, my questions are:
1) How are BlockManagerMaster and BlockManager started?
2) How does BlockManager write data to memory?
3) How does BlockManager write data to the local disk?
4) How does BlockManager read and write remote data?
The rest of this article works through these questions.
1) The startup process of BlockManagerMaster and BlockManager
BlockManager is a class.
BlockManagerMaster is also a class, and it is a member variable of BlockManager.
Initialization entry points
- On the driver, the BlockManager is initialized in SparkContext.
- On each executor, the BlockManager is initialized when the Executor starts up.
- Next we trace the driver-side initialization of BlockManager. The following are created in SparkEnv: BlockManagerMaster, NettyBlockTransferService, BlockManagerMasterEndpoint, and BlockManager.
BlockManagerMaster
So BlockManagerMaster holds two RpcEndpointRefs, registered as follows (simplified; the second argument is the corresponding endpoint instance):
// val DRIVER_ENDPOINT_NAME = "BlockManagerMaster" -- this ref becomes driverEndpoint
rpcEnv.setupEndpoint("BlockManagerMaster", blockManagerMasterEndpoint)
// val DRIVER_HEARTBEAT_ENDPOINT_NAME = "BlockManagerMasterHeartbeat" -- this ref becomes driverHeartbeatEndPoint
rpcEnv.setupEndpoint("BlockManagerMasterHeartbeat", blockManagerMasterHeartbeatEndpoint)
So what gets created on the driver:
on the driver the names are the "BlockManagerMaster" and "BlockManagerMasterHeartbeat" shown above;
on an executor the name used is host + ":" + port.
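This driver-registers / executor-looks-up split is handled by a small helper inside SparkEnv.create. A paraphrased sketch (simplified from the Spark source; logging and config plumbing trimmed):
def registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
  if (isDriver) {
    // On the driver: actually register the endpoint with the local RpcEnv
    rpcEnv.setupEndpoint(name, endpointCreator)
  } else {
    // On an executor: just build a ref pointing at the endpoint registered on the driver
    RpcUtils.makeDriverRef(name, conf, rpcEnv)
  }
}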
NettyBlockTransferService
So what is it actually for? As the analysis below shows, it is the service that operates on blocks, and specifically on remote block data.
Initialization
Its two key methods (shown in a figure in the original post) fetch blocks and upload blocks; under the hood both rely on Netty to move the data.
Fetching remote data
org.apache.spark.network.netty.NettyBlockTransferService#fetchBlocks
override def fetchBlocks(
    host: String,  // the remote node's host and port must be passed in
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener,
    tempFileManager: DownloadFileManager): Unit = {
  if (logger.isTraceEnabled) {
    logger.trace(s"Fetch blocks from $host:$port (executor id $execId)")
  }
  try {
    val maxRetries = transportConf.maxIORetries()
    // Create the starter for remote block fetches and implement its start method
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String],
          listener: BlockFetchingListener): Unit = {
        try {
          // Create a transport client connected to the remote node
          val client = clientFactory.createClient(host, port, maxRetries > 0)
          // Start a one-for-one block fetch
          new OneForOneBlockFetcher(client, appId, execId, blockIds, listener,
            transportConf, tempFileManager).start()
        } catch {
          case e: IOException =>
            Try {
              driverEndPointRef.askSync[Boolean](IsExecutorAlive(execId))
            } match {
              case Success(v) if v == false =>
                throw new ExecutorDeadException(s"The relative remote executor(Id: $execId)," +
                  " which maintains the block data to fetch is dead.")
              case _ => throw e
            }
        }
      }
    }

    if (maxRetries > 0) {
      // Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
      // a bug in this code. We should remove the if statement once we're sure of the stability.
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logger.error("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}
org.apache.spark.network.shuffle.OneForOneBlockFetcher#start
The one-for-one fetch simply sends an RPC request to the remote node asking for the data, then receives it in the registered callbacks:
public void start() {
  // Send an RPC request to the remote node and listen for its response in the callback
  client.sendRpc(message.toByteBuffer(), new RpcResponseCallback() {
    @Override
    public void onSuccess(ByteBuffer response) {
      try {
        // Create a stream handle for the data the remote node returns
        streamHandle = (StreamHandle) BlockTransferMessage.Decoder.fromByteBuffer(response);
        logger.trace("Successfully opened blocks {}, preparing to fetch chunks.", streamHandle);

        // Immediately request all chunks -- we expect that the total size of the request is
        // reasonable due to higher level chunking in [[ShuffleBlockFetcherIterator]].
        // Iterate over the block chunks offered by the remote node
        for (int i = 0; i < streamHandle.numChunks; i++) {
          if (downloadFileManager != null) {
            client.stream(OneForOneStreamManager.genStreamChunkId(streamHandle.streamId, i),
              new DownloadCallback(i));
          } else {
            client.fetchChunk(streamHandle.streamId, i, chunkCallback);
          }
        }
      } catch (Exception e) {
        logger.error("Failed while starting block fetches after success", e);
        failRemainingBlocks(blockIds, e);
      }
    }

    @Override
    public void onFailure(Throwable e) {
      logger.error("Failed while starting block fetches", e);
      failRemainingBlocks(blockIds, e);
    }
  });
}
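For context, this is roughly how a caller consumes fetchBlocks. A minimal sketch: the listener body is illustrative rather than Spark's actual shuffle code, and transferService, host, port, execId and blockIds are assumed to be in scope:
import org.apache.spark.network.buffer.ManagedBuffer
import org.apache.spark.network.shuffle.BlockFetchingListener

val listener = new BlockFetchingListener {
  override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = {
    // A real caller (e.g. ShuffleBlockFetcherIterator) would retain the buffer and queue it
    println(s"fetched $blockId: ${data.size()} bytes")
  }
  override def onBlockFetchFailure(blockId: String, exception: Throwable): Unit = {
    println(s"failed to fetch $blockId: $exception")
  }
}
// Passing null for the DownloadFileManager means chunks are fetched into memory
transferService.fetchBlocks(host, port, execId, blockIds, listener, null)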
Uploading remote data
org.apache.spark.network.netty.NettyBlockTransferService#uploadBlock
override def uploadBlock(
    hostname: String,
    port: Int,
    execId: String,
    blockId: BlockId,
    blockData: ManagedBuffer,
    level: StorageLevel,
    classTag: ClassTag[_]): Future[Unit] = {
  val result = Promise[Unit]()
  val client = clientFactory.createClient(hostname, port)

  // StorageLevel and ClassTag are serialized as bytes using our JavaSerializer.
  // Everything else is encoded using our binary protocol.
  // Serialize the metadata
  val metadata = JavaUtils.bufferToArray(serializer.newInstance().serialize((level, classTag)))

  // We always transfer shuffle blocks as a stream for simplicity with the receiving code since
  // they are always written to disk. Otherwise we check the block size.
  // If the payload exceeds the threshold, upload it through the streaming path
  val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) ||
    blockId.isShuffle)
  // Callback invoked when the upload succeeds or fails
  val callback = new RpcResponseCallback {
    override def onSuccess(response: ByteBuffer): Unit = {
      if (logger.isTraceEnabled) {
        logger.trace(s"Successfully uploaded block $blockId${if (asStream) " as stream" else ""}")
      }
      result.success((): Unit)
    }

    override def onFailure(e: Throwable): Unit = {
      logger.error(s"Error while uploading $blockId${if (asStream) " as stream" else ""}", e)
      result.failure(e)
    }
  }
  // Take different paths depending on whether streaming is needed
  if (asStream) {
    // Streaming: wrap a stream header, then upload in batches
    val streamHeader = new UploadBlockStream(blockId.name, metadata).toByteBuffer
    client.uploadStream(new NioManagedBuffer(streamHeader), blockData, callback)
  } else {
    // Small payload: transfer it in one shot, no batching needed
    // Convert or copy nio buffer into array in order to serialize it.
    val array = JavaUtils.bufferToArray(blockData.nioByteBuffer())
    client.sendRpc(new UploadBlock(appId, execId, blockId.name, metadata, array).toByteBuffer,
      callback)
  }

  result.future
}
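uploadBlock returns a Future[Unit]. Callers that need blocking semantics get them from uploadBlockSync in the parent class org.apache.spark.network.BlockTransferService; a paraphrased sketch:
def uploadBlockSync(
    hostname: String,
    port: Int,
    execId: String,
    blockId: BlockId,
    blockData: ManagedBuffer,
    level: StorageLevel,
    classTag: ClassTag[_]): Unit = {
  val future = uploadBlock(hostname, port, execId, blockId, blockData, level, classTag)
  // Block the calling thread until the asynchronous upload completes or fails
  ThreadUtils.awaitResult(future, Duration.Inf)
}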
BlockManagerMasterEndpoint
First, a look at this class's fields (shown in a figure in the original post).
- Depending on whether we are on the Driver node, BlockManagerMasterEndpoint is either registered (on the driver) or looked up, yielding the corresponding RpcEndpointRef.
In other words, what setupEndpoint actually does is bind a name to an RpcEndpoint, producing an RpcEndpointRef (a NettyRpcEndpointRef).
Let's see how that binding happens:
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (stopped) {
      throw new IllegalStateException("RpcEnv has been stopped")
    }
    if (endpoints.containsKey(name)) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }

    // This must be done before assigning RpcEndpoint to MessageLoop, as MessageLoop sets Inbox be
    // active when registering, and endpointRef must be put into endpointRefs before onStart is
    // called.
    endpointRefs.put(endpoint, endpointRef)

    var messageLoop: MessageLoop = null
    try {
      messageLoop = endpoint match {
        case e: IsolatedRpcEndpoint =>
          new DedicatedMessageLoop(name, e, this)
        case _ =>
          sharedLoop.register(name, endpoint)
          sharedLoop
      }
      endpoints.put(name, messageLoop)
    } catch {
      case NonFatal(e) =>
        endpointRefs.remove(endpoint)
        throw e
    }
  }
  endpointRef
}
Some discoveries along the way:
org.apache.spark.rpc.netty.NettyRpcEnv#setupEndpoint
org.apache.spark.rpc.netty.Dispatcher
The Dispatcher maintains a map, populated via endpointRefs.put(endpoint, endpointRef).
org.apache.spark.rpc.netty.Dispatcher#registerRpcEndpoint
So in the end you still take an endpoint and look up its endpointRef in this Dispatcher map (and the ref carries the endpoint's name).
At first I had assumed an endpointRef was bound one-to-one to a name.
And what exactly is NettyRpcEnv? It simply holds the state needed for Netty RPC communication.
What is returned is an RpcEndpointRef, i.e. a reference. It is an abstract class, and its implementation is NettyRpcEndpointRef.
In other words, RpcEndpointRef requires its subclasses to override these methods and implement the RPC operations: mainly the address (ip:port), the RPC name, and sending messages, both synchronously and asynchronously.
Now look at RpcEndpointRef's subclass NettyRpcEndpointRef.
Its constructor takes a SparkConf, an endpointAddress (ip:port :name), and the NettyRpcEnv.
Indeed, it mainly implements the parent class's abstract methods: sending messages, receiving replies, and so on. Note the field client: TransportClient = _ -- could it be that messages are mainly transmitted through this class?
Up to this point we roughly know what a Ref is: simply a class that is bound to an endpointAddress and can send messages over Netty RPC.
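To make that concrete, here is a minimal sketch of registering and using an endpoint. EchoEndpoint is a made-up example, and RpcEndpoint/RpcEndpointRef are private[spark] APIs, so this is conceptual rather than user-facing code:
// A hypothetical endpoint that answers "ping" (illustration only)
class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case "ping" => context.reply("pong")
  }
}

val ref: RpcEndpointRef = rpcEnv.setupEndpoint("echo", new EchoEndpoint(rpcEnv))
ref.send("fire-and-forget")             // one-way message, handled by receive
val reply = ref.askSync[String]("ping") // request-reply, handled by receiveAndReply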
BlockManager
2) Some block-related classes
1. The identifier: BlockId
Simply put, the concrete type of a BlockId tells you what kind of block it names.
- Naming rules (see the examples below)
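A few examples based on the case classes in org.apache.spark.storage.BlockId (values are illustrative):
import org.apache.spark.storage._

RDDBlockId(0, 1).name         // "rdd_0_1": RDD 0, partition 1
ShuffleBlockId(2, 5, 3).name  // "shuffle_2_5_3": shuffle 2, map 5, reduce 3
BroadcastBlockId(7).name      // "broadcast_7"
// The companion object parses a name back into a typed BlockId
BlockId("rdd_0_1")            // RDDBlockId(0, 1)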
2. Block metadata: BlockInfo
Key point: the classTag here records the stored type and is used for serialization and deserialization.
val tellMaster: Boolean -- notify the master when the block's storage changes. It is usually true, but it is set to false for broadcast blocks, since their data never changes after being written.
3. The bookkeeper: BlockInfoManager
4. Storage level: StorageLevel
Specifies where a block may be stored,
and also the number of replicas (quick example below).
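For instance, through the public API (a minimal sketch; rdd stands for any RDD):
import org.apache.spark.storage.StorageLevel

// Memory first, spill to disk, keep the data serialized, maintain 2 replicas
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER_2)
StorageLevel.MEMORY_AND_DISK_SER_2.replication // 2
StorageLevel.MEMORY_AND_DISK_SER_2.useDisk     // true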
5. Status: BlockStatus
Encapsulates the information returned when the BlockManager is queried about a Block:
- the storage level
- the memory occupied
- the disk space occupied
- whether the block is still in the storage system
6. Data and length: BlockResult
Wraps the result of reading a Block:
- the data
- the read method: Memory, Disk, Hadoop, Network
- the length in bytes
7. Serializable form: BlockData
Defines how a Block is exposed as serializable data.
BlockData has three implementations: ByteBufferBlockData, EncryptedBlockData, and DiskBlockData.
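The trait itself, paraphrased from org.apache.spark.storage.BlockData (comments mine; details may differ slightly across Spark versions):
private[spark] trait BlockData {
  def toInputStream(): InputStream       // read the block as a stream
  def toNetty(): Object                  // a Netty-friendly view (ByteBuf / FileRegion)
  def toChunkedByteBuffer(allocator: Int => ByteBuffer): ChunkedByteBuffer
  def toByteBuffer(): ByteBuffer         // materialize the block as a single buffer
  def size: Long
  def dispose(): Unit                    // release any underlying resources
}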
8. Persistence: the Stores
- Disk storage: DiskStore
- Memory storage: MemoryStore
3) DiskStore: reading and writing the local disk
- DiskStore uses DiskBlockManager to maintain the mapping between logical Blocks and the Block files on disk.
- DiskBlockManager
- The file-lookup method:
- org.apache.spark.storage.DiskBlockManager#getFile (sketched below)
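The core of getFile, paraphrased from the Spark source (directory creation and synchronization elided): the file name is hashed to pick a local directory and a sub-directory, spreading block files evenly across the configured local dirs.
def getFile(filename: String): File = {
  // Figure out which local directory the name hashes to, and which subdirectory in that
  val hash = Utils.nonNegativeHash(filename)
  val dirId = hash % localDirs.length
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
  // The real code caches and lazily creates the subdirectory here
  val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
  new File(subDir, filename)
}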
- DiskStore's write methods:
- org.apache.spark.storage.DiskStore#put and org.apache.spark.storage.DiskStore#putBytes
Let's look at put first:
def put(blockId: BlockId)(writeFunc: WritableByteChannel => Unit): Unit = {
  // Fail fast if the block already exists in the disk store (contains uses diskManager.getFile)
  if (contains(blockId)) {
    throw new IllegalStateException(s"Block $blockId is already present in the disk store")
  }
  logDebug(s"Attempting to put block $blockId")
  val startTime = System.currentTimeMillis
  // Point 1: obtain the target File via diskManager.getFile. At this moment the File is only a
  // logical handle; no data has been written yet
  val file = diskManager.getFile(blockId)
  // Open an output stream on the file and wrap its channel in a counting wrapper
  val out = new CountingWritableChannel(openForWrite(file))
  var threwException: Boolean = true
  try {
    // Point 2: write the data into the file through the channel.
    // The caller supplies writeFunc: WritableByteChannel => Unit; see putBytes for an example
    writeFunc(out)
    // Record the size of the block just written to disk
    blockSizes.put(blockId, out.getCount)
    threwException = false
  } finally {
    try {
      out.close()
    } catch {
      case ioe: IOException =>
        if (!threwException) {
          threwException = true
          throw ioe
        }
    } finally {
      if (threwException) {
        remove(blockId)
      }
    }
  }
  val finishTime = System.currentTimeMillis
  logDebug("Block %s stored as %s file on disk in %d ms".format(
    file.getName,
    Utils.bytesToString(file.length()),
    finishTime - startTime))
}
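putBytes, mentioned above, is then just a thin wrapper that supplies the writeFunc (paraphrased from the same class):
def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
  put(blockId) { channel =>
    // Write every chunk of the buffer into the channel opened by put
    bytes.writeFully(channel)
  }
}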
getBytes
getBytes resolves the blockId to its file and size through diskManager, then wraps them:
def getBytes(f: File, blockSize: Long): BlockData = securityManager.getIOEncryptionKey() match {
  case Some(key) =>
    // Encrypted blocks cannot be memory mapped; return a special object that does decryption
    // and provides InputStream / FileRegion implementations for reading the data.
    new EncryptedBlockData(f, blockSize, conf, key)

  case _ =>
    // Return the disk data wrapped as a DiskBlockData
    new DiskBlockData(minMemoryMapBytes, maxMemoryMapBytes, f, blockSize)
}
Notice that what comes back here is not the final bytes or file contents, but a wrapped block-data object.
4) MemoryStore: reading and writing memory
MemoryStore maintains a LinkedHashMap of BlockId to MemoryEntry (declaration sketched below), and it relies on MemoryManager to manage the memory model.
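The map is declared roughly like this (paraphrased from the Spark source). Note accessOrder = true: iteration follows least-recently-used order, which is what enables LRU-style eviction:
// BlockId -> MemoryEntry; 32 initial buckets, 0.75 load factor, access-ordered
private val entries = new java.util.LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)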
- putBytes: stores block data as bytes
- putIteratorAsValues: stores a block without serializing it
- putIteratorAsBytes: serializes the block's objects, then stores the bytes
- getBytes & getValues: the methods for reading data back out of memory
putBytes
def putBytes[T: ClassTag](
    blockId: BlockId,
    size: Long,
    memoryMode: MemoryMode,
    _bytes: () => ChunkedByteBuffer): Boolean = {
  // Make sure this block has not been stored yet
  require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
  // Point 1: ask the memory manager for `size` bytes of storage memory; proceed only if the
  // request is granted, otherwise return false right away
  if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
    // We acquired enough memory for the block, so go ahead and put it
    // Invoke the supplied function to materialize the block's bytes
    val bytes = _bytes()
    // The materialized data must match the declared size
    assert(bytes.size == size)
    // Point 2: wrap the block data in an entry object
    val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
    entries.synchronized {
      // Store the entry in the LinkedHashMap, i.e. "into memory"
      entries.put(blockId, entry)
    }
    logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
      blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
    true
  } else {
    false
  }
}
As you can see, the write path is simple, but two points deserve attention. First, the memory manager, memoryManager: this module governs how memory is granted and reclaimed. We will not dig into it here, but it is genuinely important. Second, the data is wrapped into an entry object before being stored. MemoryEntry has two implementations, SerializedMemoryEntry and DeserializedMemoryEntry, for serialized and deserialized data respectively. From this code path you can see that data stored via putBytes is always kept serialized, and after serialization it lives in the LinkedHashMap.
getBytes
def getBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
  // Look the entry up in the LinkedHashMap by blockId
  val entry = entries.synchronized { entries.get(blockId) }
  entry match {
    case null => None
    case e: DeserializedMemoryEntry[_] =>
      throw new IllegalArgumentException("should only call getBytes on serialized blocks")
    // Pattern match on the entry type and extract the serialized bytes
    case SerializedMemoryEntry(bytes, _, _) => Some(bytes)
  }
}
- MemoryEntry
DeserializedMemoryEntry stores objects that do not need to be serialized;
SerializedMemoryEntry stores objects after serialization.
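The hierarchy, paraphrased from the MemoryStore source (simplified; fields and modifiers may differ slightly between Spark versions):
private sealed trait MemoryEntry[T] {
  def size: Long               // bytes occupied in memory
  def memoryMode: MemoryMode   // ON_HEAP or OFF_HEAP
  def classTag: ClassTag[T]
}

// Raw objects; only ever stored on-heap
private case class DeserializedMemoryEntry[T](
    value: Array[T],
    size: Long,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  val memoryMode: MemoryMode = MemoryMode.ON_HEAP
}

// Serialized bytes; size comes from the underlying buffer
private case class SerializedMemoryEntry[T](
    buffer: ChunkedByteBuffer,
    memoryMode: MemoryMode,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  def size: Long = buffer.size
}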
Reference: spark源码(六)spark如何通过BlockManager控制数据的读写 (Interest1_wyt, CSDN blog)