Spark BlockManager Source Code Study

Author: 毛小聪

Last modified: 2023-06-18 13:33:15

Tags: spark, big data, distributed systems

First published: 2023-06-18 13:32:07

Article link: https://blog.csdn.net/weixin_45448034/article/details/131270462

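Background

Spark's block storage layer is made up of the BlockManager instances in the Driver and in the Executors. It uses a master-slave architecture, and the instances communicate over Spark RPC.

[Figure: rough flow diagram of the overall process]

BlockManager is a class that holds a BlockManagerMaster as a member variable; together they form a master-slave structure.

A BlockManager exists in every Executor and in the Driver, and is responsible for operations on blocks.

At this point you may wonder: what exactly is a block, and how is it managed?

And what are BlockManagerMasterEndpoint and BlockManagerSlaveEndpoint? They are also a master-slave pair: the master manages and the slave does the work, so it makes sense that BlockManagerMasterEndpoint exists only in the Driver.

I also want to know how the two cooperate as master and slave.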

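Component	Notes
BlockManager	Manages blocks; handles all data reads and writes during Spark runtime
BlockManagerMaster	Manages and coordinates the BlockManagers; acts as the proxy through which a BlockManager talks to the BlockManagerMasterEndpoint on the Driver
BlockManagerMasterEndpoint	Exists only on the Driver; issues commands to the BlockManagers
BlockManagerSlaveEndpoint	Receives the commands issued by BlockManagerMasterEndpoint, e.g. get a block's status, get a block by BlockId, remove a block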
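
To make the division of labour in the table concrete, here is a minimal in-memory sketch. It is not Spark's implementation: ToyMaster, ToyBlockManager and all of their methods are hypothetical names, and plain method calls stand in for the RPC layer.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-node BlockManager (the "slave" side).
final class ToyBlockManager(val id: String) {
  private val blocks = mutable.Set.empty[String]
  def put(blockId: String): Unit = blocks += blockId
  def has(blockId: String): Boolean = blocks.contains(blockId)
  // What a slave endpoint would do when told to remove a block.
  def removeBlock(blockId: String): Boolean = blocks.remove(blockId)
}

// Hypothetical stand-in for the driver-side master endpoint.
final class ToyMaster {
  private val managers = mutable.LinkedHashMap.empty[String, ToyBlockManager]
  def register(bm: ToyBlockManager): Unit = managers(bm.id) = bm
  def registeredIds: List[String] = managers.keys.toList
  // Send the remove command to every registered manager; count the ones that held the block.
  def removeBlock(blockId: String): Int =
    managers.values.count(_.removeBlock(blockId))
}

object MasterSlaveDemo {
  def run(): (List[String], Int, Boolean) = {
    val master = new ToyMaster
    val driverBm = new ToyBlockManager("driver")
    val execBm = new ToyBlockManager("executor-1")
    master.register(driverBm)
    master.register(execBm)
    execBm.put("rdd_0_0")
    val removed = master.removeBlock("rdd_0_0")
    (master.registeredIds, removed, execBm.has("rdd_0_0"))
  }
}
```

The point of the sketch is only the direction of control: managers register with the master, and the master pushes commands down to them.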

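Questions

With that background, my questions are:

1) How do BlockManagerMaster and BlockManager start up?

2) How does BlockManager write data to memory?

3) How does BlockManager write data to local disk?

4) How does BlockManager read and write remote data?

The rest of this post tries to answer them.

1) Startup of BlockManagerMaster and BlockManager

BlockManager is a class.

BlockManagerMaster is also a class, and it is a member variable of BlockManager.

Initialization entry points

On the driver, BlockManager is initialized in SparkContext.

Executors initialize their own BlockManager as well.
The rest of this section follows the driver-side initialization. SparkEnv creates BlockManagerMaster, NettyBlockTransferService, BlockManagerMasterEndpoint, and BlockManager.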

BlockManagerMaster

So BlockManagerMaster holds two RpcEndpointRefs:

// val DRIVER_ENDPOINT_NAME = "BlockManagerMaster", i.e. driverEndpoint
rpcEnv.setupEndpoint("BlockManagerMaster", BlockManagerMasterEndpoint)


// val DRIVER_HEARTBEAT_ENDPOINT_NAME = "BlockManagerMasterHeartbeat", i.e. driverHeartbeatEndPoint
rpcEnv.setupEndpoint("BlockManagerMasterHeartbeat", BlockManagerMasterHeartbeatEndpoint)

As for the names the endpoints are created under:

On the driver, the names are
the BlockManagerMaster and BlockManagerMasterHeartbeat shown above.

On an executor, the name is
host + ":" + port

NettyBlockTransferService

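What does it actually do? As the breakdown below shows, it is the service that operates on blocks,

and specifically on remote data.

Initialization

Its two key methods download a block and upload a block. Under the hood, both transfers are implemented on top of Netty.

Downloading remote data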

override def fetchBlocks(
      // The remote host and port must be passed in
      host: String,
      port: Int,
      execId: String,
      blockIds: Array[String],
      listener: BlockFetchingListener,
      tempFileManager: DownloadFileManager): Unit = {
    if (logger.isTraceEnabled) {
      logger.trace(s"Fetch blocks from $host:$port (executor id $execId)")
    }
    try {
      val maxRetries = transportConf.maxIORetries()
      // Create the starter for the remote block fetch and implement its start method
      val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
        override def createAndStart(blockIds: Array[String],
            listener: BlockFetchingListener): Unit = {
          try {
            // Create a transport client to connect to the remote node
            val client = clientFactory.createClient(host, port, maxRetries > 0)
            // Start a one-for-one block fetch
            new OneForOneBlockFetcher(client, appId, execId, blockIds, listener,
              transportConf, tempFileManager).start()
          } catch {
            case e: IOException =>
              Try {
                driverEndPointRef.askSync[Boolean](IsExecutorAlive(execId))
              } match {
                case Success(v) if v == false =>
                  throw new ExecutorDeadException(s"The relative remote executor(Id: $execId)," +
                    " which maintains the block data to fetch is dead.")
                case _ => throw e
              }
          }
        }
      }

      if (maxRetries > 0) {
        // Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
        // a bug in this code. We should remove the if statement once we're sure of the stability.
        new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
      } else {
        blockFetchStarter.createAndStart(blockIds, listener)
      }
    } catch {
      case e: Exception =>
        logger.error("Exception while beginning fetchBlocks", e)
        blockIds.foreach(listener.onBlockFetchFailure(_, e))
    }
  }
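
The retry branch above can be hard to picture from the call sites alone. The following is a minimal synchronous sketch of the idea only: fetchWithRetry and flakyFetch are hypothetical names, nothing here uses Spark's classes, and waiting between attempts is deliberately left out.

```scala
import java.io.IOException
import scala.util.{Failure, Try}

object RetryDemo {
  // Run `attempt` until it succeeds or `maxRetries` retries are used up.
  // Only IOException is treated as retryable; any other failure is returned at once.
  def fetchWithRetry[A](maxRetries: Int)(attempt: Int => A): Try[A] = {
    @annotation.tailrec
    def loop(n: Int): Try[A] = Try(attempt(n)) match {
      case Failure(_: IOException) if n < maxRetries => loop(n + 1)
      case other => other
    }
    loop(0)
  }

  // Hypothetical flaky fetch: fails twice with IOException, then succeeds.
  def flakyFetch(n: Int): String =
    if (n < 2) throw new IOException(s"connection reset on attempt $n")
    else s"block-data(attempt=$n)"
}
```

With maxRetries = 3 the flaky fetch succeeds on its third attempt; with maxRetries = 1 the last IOException is returned as a Failure. With maxRetries = 0 the function degenerates to a single attempt, which matches the role of the else branch in fetchBlocks.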

org.apache.spark.network.shuffle.OneForOneBlockFetcher#start

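The start method of the one-for-one fetcher sends an RPC request to the remote node to open the blocks, then receives the data in callbacks.

public void start() {
    // Send an RPC request to the remote node and listen for its response in the callback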
    client.sendRpc(message.toByteBuffer(), new RpcResponseCallback() {
      @Override
      public void onSuccess(ByteBuffer response) {
        try {
          // Decode the StreamHandle that the remote node returned for the requested blocks
          streamHandle = (StreamHandle) BlockTransferMessage.Decoder.fromByteBuffer(response);
          logger.trace("Successfully opened blocks {}, preparing to fetch chunks.", streamHandle);

          // Immediately request all chunks -- we expect that the total size of the request is
          // reasonable due to higher level chunking in [[ShuffleBlockFetcherIterator]].
          // Request every chunk of block data that the remote node offers
          for (int i = 0; i < streamHandle.numChunks; i++) {
            if (downloadFileManager != null) {
              client.stream(OneForOneStreamManager.genStreamChunkId(streamHandle.streamId, i),
                new DownloadCallback(i));
            } else {
              client.fetchChunk(streamHandle.streamId, i, chunkCallback);
            }
          }
        } catch (Exception e) {
          logger.error("Failed while starting block fetches after success", e);
          failRemainingBlocks(blockIds, e);
        }
      }

      @Override
      public void onFailure(Throwable e) {
        logger.error("Failed while starting block fetches", e);
        failRemainingBlocks(blockIds, e);
      }
    });
  }
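
The two-step shape of start() (open the blocks first, then request each chunk and report the results through callbacks) can be sketched without any networking. Everything below is a toy under stated assumptions: ToyRemote, ToyStreamHandle and fetchAll are hypothetical names, each block is modelled as exactly one chunk, and the calls are synchronous.

```scala
import scala.collection.mutable

object ChunkFetchDemo {
  // Hypothetical reply to the "open blocks" request: a stream id plus a chunk count.
  final case class ToyStreamHandle(streamId: Long, numChunks: Int)

  // Hypothetical remote side: one chunk per requested block id.
  final class ToyRemote(store: Map[String, String]) {
    private val streams = mutable.Map.empty[Long, Vector[String]]
    private var nextId = 0L
    def openBlocks(blockIds: Seq[String]): ToyStreamHandle = {
      val id = nextId
      nextId += 1
      streams(id) = blockIds.map(store).toVector // throws if a block is unknown
      ToyStreamHandle(id, blockIds.length)
    }
    def fetchChunk(streamId: Long, index: Int): String = streams(streamId)(index)
  }

  // Open first, then request every chunk; report results through the callbacks.
  def fetchAll(remote: ToyRemote, blockIds: Seq[String])(
      onSuccess: (String, String) => Unit,
      onFailure: (String, Throwable) => Unit): Unit =
    try {
      val handle = remote.openBlocks(blockIds)
      for (i <- 0 until handle.numChunks)
        onSuccess(blockIds(i), remote.fetchChunk(handle.streamId, i))
    } catch {
      case e: Exception => blockIds.foreach(onFailure(_, e))
    }

  def run(ids: Seq[String]): (List[String], List[String]) = {
    val remote = new ToyRemote(Map("b1" -> "data1", "b2" -> "data2"))
    val ok = mutable.ListBuffer.empty[String]
    val failed = mutable.ListBuffer.empty[String]
    fetchAll(remote, ids)((id, data) => ok += s"$id=$data", (id, _) => failed += id)
    (ok.toList, failed.toList)
  }
}
```

If opening the stream fails, every requested block is reported as failed, which is the same outcome the onFailure callback produces in the Java code above.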

Uploading remote data

org.apache.spark.network.netty.NettyBlockTransferService#uploadBlock

override def uploadBlock(
      hostname: String,
      port: Int,
      execId: String,
      blockId: BlockId,
      blockData: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Future[Unit] = {
    val result = Promise[Unit]()
    val client = clientFactory.createClient(hostname, port)

    // StorageLevel and ClassTag are serialized as bytes using our JavaSerializer.
    // Everything else is encoded using our binary protocol.
    // Serialize the metadata
    val metadata = JavaUtils.bufferToArray(serializer.newInstance().serialize((level, classTag)))

    // We always transfer shuffle blocks as a stream for simplicity with the receiving code since
    // they are always written to disk. Otherwise we check the block size.
    // Upload as a stream when the block is larger than the configured threshold
    val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) ||
      blockId.isShuffle)
    // Callback invoked when the upload succeeds or fails
    val callback = new RpcResponseCallback {
      override def onSuccess(response: ByteBuffer): Unit = {
        if (logger.isTraceEnabled) {
          logger.trace(s"Successfully uploaded block $blockId${if (asStream) " as stream" else ""}")
        }
        result.success((): Unit)
      }

      override def onFailure(e: Throwable): Unit = {
        logger.error(s"Error while uploading $blockId${if (asStream) " as stream" else ""}", e)
        result.failure(e)
      }
    }
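    // Take a different path depending on whether streaming is needed
    if (asStream) {
      // Streaming: wrap the header in a buffer and upload the block data as a stream
      val streamHeader = new UploadBlockStream(blockId.name, metadata).toByteBuffer
      client.uploadStream(new NioManagedBuffer(streamHeader), blockData, callback)
    } else {
      // Small block: send everything in a single RPC instead of streaming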
      // Convert or copy nio buffer into array in order to serialize it.
      val array = JavaUtils.bufferToArray(blockData.nioByteBuffer())

      client.sendRpc(new UploadBlock(appId, execId, blockId.name, metadata, array).toByteBuffer,
        callback)
    }

    result.future
  }
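
The branching rule in uploadBlock is small enough to isolate. The sketch below restates only the asStream condition from the code above; UploadPathDemo, choosePath and the threshold values used with it are illustrative assumptions, not Spark API.

```scala
object UploadPathDemo {
  sealed trait UploadPath
  case object AsStream extends UploadPath
  case object SingleRpc extends UploadPath

  // Same condition as `asStream` in uploadBlock: stream when the block is strictly
  // larger than the threshold, or when it is a shuffle block regardless of size.
  def choosePath(sizeBytes: Long, maxInMemBytes: Long, isShuffle: Boolean): UploadPath =
    if (sizeBytes > maxInMemBytes || isShuffle) AsStream else SingleRpc
}
```

For example, with a hypothetical 1024-byte threshold, a 2048-byte block is streamed, a 1024-byte block goes out as a single RPC, and a 10-byte shuffle block is still streamed.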

BlockManagerMasterEndpoint

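First, look at the fields of this class.

Depending on whether the code is running on the Driver node, BlockManagerMasterEndpoint is either registered there or looked up, yielding the corresponding RpcEndpointRef.

So what setupEndpoint really does is bind a name to an RpcEndpoint, producing an RpcEndpointRef (a NettyRpcEndpointRef).

Here is how the binding is done: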

def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
    val addr = RpcEndpointAddress(nettyEnv.address, name)
    val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
    synchronized {
      if (stopped) {
        throw new IllegalStateException("RpcEnv has been stopped")
      }
      if (endpoints.containsKey(name)) {
        throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
      }

      // This must be done before assigning RpcEndpoint to MessageLoop, as MessageLoop sets Inbox be
      // active when registering, and endpointRef must be put into endpointRefs before onStart is
      // called.
      endpointRefs.put(endpoint, endpointRef)

      var messageLoop: MessageLoop = null
      try {
        messageLoop = endpoint match {
          case e: IsolatedRpcEndpoint =>
            new DedicatedMessageLoop(name, e, this)
          case _ =>
            sharedLoop.register(name, endpoint)
            sharedLoop
        }
        endpoints.put(name, messageLoop)
      } catch {
        case NonFatal(e) =>
          endpointRefs.remove(endpoint)
          throw e
      }
    }
    endpointRef
  }
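
Stripped of message loops and lifecycle handling, the registration above keeps two maps: name to endpoint, and endpoint to ref. The sketch below models just that bookkeeping. ToyDispatcher, ToyEndpoint and ToyEndpointRef are hypothetical stand-ins, and the first map stores the endpoint directly where the real code stores a MessageLoop.

```scala
import scala.collection.mutable

object DispatcherDemo {
  // Hypothetical minimal types; the real ones carry far more state.
  trait ToyEndpoint
  final case class ToyEndpointRef(host: String, port: Int, name: String)

  final class ToyDispatcher(host: String, port: Int) {
    private val endpoints = mutable.Map.empty[String, ToyEndpoint]            // name -> endpoint
    private val endpointRefs = mutable.Map.empty[ToyEndpoint, ToyEndpointRef] // endpoint -> ref

    def register(name: String, endpoint: ToyEndpoint): ToyEndpointRef = synchronized {
      if (endpoints.contains(name))
        throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
      val ref = ToyEndpointRef(host, port, name)
      endpointRefs.put(endpoint, ref)
      endpoints.put(name, endpoint)
      ref
    }

    // Refs are looked up by endpoint instance, not by name.
    def refOf(endpoint: ToyEndpoint): Option[ToyEndpointRef] = synchronized {
      endpointRefs.get(endpoint)
    }
  }

  def demo(): (String, Boolean, Boolean) = {
    val dispatcher = new ToyDispatcher("localhost", 7077)
    val endpoint = new ToyEndpoint {}
    val ref = dispatcher.register("BlockManagerMaster", endpoint)
    val duplicateRejected =
      scala.util.Try(dispatcher.register("BlockManagerMaster", new ToyEndpoint {})).isFailure
    (ref.name, dispatcher.refOf(endpoint).contains(ref), duplicateRejected)
  }
}
```

The name is unique per dispatcher, and the ref carries that name, but the lookup key for a ref is the endpoint object itself.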

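A new discovery

org.apache.spark.rpc.netty.NettyRpcEnv#setupEndpoint

org.apache.spark.rpc.netty.Dispatcher

The Dispatcher maintains a map, populated by endpointRefs.put(endpoint, endpointRef).

org.apache.spark.rpc.netty.Dispatcher#registerRpcEndpoint

So an endpointRef is looked up in the Dispatcher's map by its endpoint (and the ref itself carries the endpoint's name).

At first I had assumed that an endpointRef was bound uniquely to a name.

And what is NettyRpcEnv?

It simply holds the state needed for Netty-based RPC communication.

What setupEndpoint returns is an RpcEndpointRef, a reference. RpcEndpointRef is an abstract class, so the actual work is done by an implementing class.

In other words, RpcEndpointRef requires its subclasses to override these methods and implement the RPC operations.

It mainly exposes the address (ip:port), the RPC name, and methods for sending messages, including asynchronous sends.

Now look at NettyRpcEndpointRef, the subclass of RpcEndpointRef.

Its constructor takes a SparkConf, an endpointAddress (ip:port plus name), and a NettyRpcEnv.

It mainly implements the parent's abstract methods for sending and receiving messages. Note the field client: TransportClient = _; presumably sending messages relies on this class for the actual transport.

By now it is fairly clear what a Ref is: just a class bound to an endpointAddress that can send messages over Netty RPC.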

BlockManager

2) Classes related to blocks

1. The identifier: BlockId

Put simply, the type of a BlockId tells you what kind of block it identifies.

Naming rules
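
As an illustration of such naming rules, the sketch below models three name patterns (rdd_, shuffle_ and broadcast_) and parses names back into typed ids. It is a simplified model, not Spark's BlockId hierarchy: ToyBlockId and its subclasses are hypothetical names, and Spark defines more kinds of block than these three.

```scala
object BlockIdDemo {
  sealed trait ToyBlockId { def name: String }
  final case class RddId(rddId: Int, splitIndex: Int) extends ToyBlockId {
    def name: String = s"rdd_${rddId}_$splitIndex"
  }
  final case class ShuffleId(shuffleId: Int, mapId: Long, reduceId: Int) extends ToyBlockId {
    def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
  }
  final case class BroadcastId(broadcastId: Long) extends ToyBlockId {
    def name: String = s"broadcast_$broadcastId"
  }

  private val Rdd = "rdd_([0-9]+)_([0-9]+)".r
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r
  private val Broadcast = "broadcast_([0-9]+)".r

  // Recover the typed id from its name; None when the name matches no known pattern.
  def parse(name: String): Option[ToyBlockId] = name match {
    case Rdd(r, s) => Some(RddId(r.toInt, s.toInt))
    case Shuffle(s, m, r) => Some(ShuffleId(s.toInt, m.toLong, r.toInt))
    case Broadcast(b) => Some(BroadcastId(b.toLong))
    case _ => None
  }
}
```

Because the prefix of the name encodes the kind, parsing the name is enough to tell an RDD block from a shuffle or broadcast block.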

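2. Block metadata: BlockInfo

Key point: classTag records the block's type and is used for serialization and deserialization.

val tellMaster: Boolean controls whether the master is notified when the block's storage changes. It is usually true; for broadcast blocks it is set to false because the data never changes.

3. The maintainer: BlockInfoManager

4. Storage level: StorageLevel

It specifies where a block may be stored,

and it can also specify the number of replicas.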
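
A storage level is essentially a small record of flags plus a replica count. The sketch below is a simplified model under that assumption: ToyStorageLevel and the three named values are hypothetical, and the real StorageLevel has additional fields (off-heap storage, for instance) that are omitted here.

```scala
object StorageLevelDemo {
  // Simplified model: where a block may live, in what form, and how many copies are kept.
  final case class ToyStorageLevel(
      useDisk: Boolean,
      useMemory: Boolean,
      deserialized: Boolean,
      replication: Int = 1) {
    require(replication >= 1, "replication must be at least 1")
    // A level that stores nothing anywhere is not usable.
    def isValid: Boolean = useDisk || useMemory
  }

  val MemoryOnly: ToyStorageLevel =
    ToyStorageLevel(useDisk = false, useMemory = true, deserialized = true)
  val DiskOnly: ToyStorageLevel =
    ToyStorageLevel(useDisk = true, useMemory = false, deserialized = false)
  val MemoryAndDiskSer2: ToyStorageLevel =
    ToyStorageLevel(useDisk = true, useMemory = true, deserialized = false, replication = 2)
}
```

The last value reads as: keep serialized bytes, use memory and fall back to disk, and hold two replicas.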

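5. Status information: BlockStatus

Wraps the information returned when a BlockManager is queried about a block:

the storage level

the memory used

the disk space used

whether the block is in the storage system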
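
The four items above fit in one small record. In this sketch the names ToyBlockStatus and Empty are hypothetical, the storage level is reduced to a label, and "in the storage system" is taken to mean that the block occupies any memory or disk space, which is an assumption of the sketch.

```scala
object BlockStatusDemo {
  // Simplified status record: a label for the level plus the two sizes in bytes.
  final case class ToyBlockStatus(levelName: String, memSize: Long, diskSize: Long) {
    // The block counts as stored when it occupies any memory or disk space.
    def isCached: Boolean = memSize + diskSize > 0
  }

  // Status of a block that is not stored anywhere.
  val Empty: ToyBlockStatus = ToyBlockStatus("NONE", 0L, 0L)
}
```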

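6. Data and length: BlockResult

The result of reading a block:

the data

the read method: Memory, Disk, Hadoop, or Network

the length in bytes

7. Serializable data: BlockData

How a block is converted into serializable data.

BlockData has three implementations: ByteBufferBlockData, EncryptedBlockData, and DiskBlockData.

8. Persistence: the Stores

Disk storage: DiskStore
Memory storage: MemoryStore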

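3) DiskStore: reading and writing local disk

DiskStore uses DiskBlockManager to maintain the mapping between logical blocks and their files on disk.
DiskBlockManager
The file lookup method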
e.spark.storage.DiskBlockManager#getFile
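
One common way to map a block's file name to a location is to hash the name, pick a local directory from the hash, and then pick a sub-directory inside it. The sketch below shows that scheme under stated assumptions: DiskLayoutDemo, nonNegativeHash and fileFor are names made up for illustration, String.hashCode stands in for whatever hash the real code uses, and no directories are created.

```scala
import java.io.File

object DiskLayoutDemo {
  // Assumption: String.hashCode, made non-negative, stands in for the real hash helper.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Map a block's file name to <localDir>/<two-hex-digit subDir>/<filename>.
  def fileFor(localDirs: IndexedSeq[File], subDirsPerLocalDir: Int, filename: String): File = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
    new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
  }
}
```

The mapping is deterministic, so the same block name always resolves to the same File, and the two levels of hashing spread files across directories instead of piling them into one.
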
The DiskStore write methods
org.apache.spark.storage.DiskStore#

put

putBytes

On to the next method:

  def put(blockId: BlockId)(writeFunc: WritableByteChannel => Unit): Unit = {
    // Check through diskManager whether the block's file already exists; throw if it does
    if (contains(blockId)) {
      throw new IllegalStateException(s"Block $blockId is already present in the disk store")
    }
    logDebug(s"Attempting to put block $blockId")
    val startTime = System.currentTimeMillis
    // Key point 1: get the File object to write to. At this stage it is only a
    // logical file; no data has been written yet.
    val file = diskManager.getFile(blockId)
    // Open the file for writing and wrap the channel so the bytes written are counted
    val out = new CountingWritableChannel(openForWrite(file))
    var threwException: Boolean = true
    try {
      // Key point 2: write the data to the file through the channel.
      // writeFunc has type WritableByteChannel => Unit; see putBytes for a caller.
      writeFunc(out)
      // Record the size of the block stored on disk
      blockSizes.put(blockId, out.getCount)
      threwException = false
    } finally {
      try {
        out.close()
      } catch {
        case ioe: IOException =>
          if (!threwException) {
            threwException = true
            throw ioe
          }
      } finally {
        if (threwException) {
          remove(blockId)
        }
      }
    }
    val finishTime = System.currentTimeMillis
    logDebug("Block %s stored as %s file on disk in %d ms".format(
      file.getName,
      Utils.bytesToString(file.length()),
      finishTime - startTime))
  }
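
The contract of put (refuse duplicates, let the caller write through a channel, record the size, and clean up on failure) can be tried out with plain java.nio. This is a sketch, not DiskStore itself: ToyDiskStore and ToyDiskStoreDemo are hypothetical, block ids are plain strings, and the file for a block is simply a file named after it in one directory.

```scala
import java.io.{File, FileOutputStream, IOException}
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel
import java.nio.file.Files
import scala.collection.mutable

final class ToyDiskStore(dir: File) {
  private val blockSizes = mutable.Map.empty[String, Long]
  private def fileOf(blockId: String): File = new File(dir, blockId)

  def contains(blockId: String): Boolean = fileOf(blockId).exists()
  def getSize(blockId: String): Long = blockSizes.getOrElse(blockId, 0L)

  def put(blockId: String)(writeFunc: WritableByteChannel => Unit): Unit = {
    if (contains(blockId))
      throw new IllegalStateException(s"Block $blockId is already present in the disk store")
    val file = fileOf(blockId)
    val out = new FileOutputStream(file).getChannel
    var succeeded = false
    try {
      writeFunc(out) // the caller decides what bytes go into the channel
      succeeded = true
    } finally {
      out.close()
      if (succeeded) blockSizes.put(blockId, file.length())
      else file.delete() // never leave a partial block behind
    }
  }
}

object ToyDiskStoreDemo {
  def run(): (Long, Boolean, Boolean) = {
    val dir = Files.createTempDirectory("toy-disk-store").toFile
    val store = new ToyDiskStore(dir)
    store.put("rdd_0_0") { ch =>
      ch.write(ByteBuffer.wrap("hello".getBytes("UTF-8")))
      ()
    }
    val failed =
      scala.util.Try(store.put("rdd_0_1")(_ => throw new IOException("disk full"))).isFailure
    (store.getSize("rdd_0_0"), failed, store.contains("rdd_0_1"))
  }
}
```

Passing the write logic as a function keeps the store in charge of opening, closing and cleanup, while the caller only supplies the bytes.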

getBytes

The file and the size for a given blockId are obtained through diskManager.

 def getBytes(f: File, blockSize: Long): BlockData = securityManager.getIOEncryptionKey() match {
    case Some(key) =>
      // Encrypted blocks cannot be memory mapped; return a special object that does decryption
      // and provides InputStream / FileRegion implementations for reading the data.
      // Encrypted data is not returned directly; this wrapper decrypts the block on read
      new EncryptedBlockData(f, blockSize, conf, key)

    case _ =>
      // Return a wrapper around the on-disk block
      new DiskBlockData(minMemoryMapBytes, maxMemoryMapBytes, f, blockSize)
  }

Note that what is returned here is not the final bytes or file contents, but a wrapper object that represents the block.

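4) MemoryStore: reading and writing memory

MemoryStore maintains a LinkedHashMap from BlockId to MemoryEntry, and relies on MemoryManager to manage the memory model.

putBytes: stores a block as bytes

putIteratorAsValues: stores a block as deserialized values

putIteratorAsBytes: serializes the block's objects and stores the result

getBytes & getValues: read data back from memory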

putBytes

def putBytes[T: ClassTag](
      blockId: BlockId,
      size: Long,
      memoryMode: MemoryMode,
      _bytes: () => ChunkedByteBuffer): Boolean = {
    // Make sure this block has not already been stored
    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
    // Key point 1: ask the memoryManager for the required amount of memory. Store the
    // block only if the request is granted; otherwise return false.
    if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
      // We acquired enough memory for the block, so go ahead and put it
      // Invoke the supplied function to obtain the block's data
      val bytes = _bytes()
      // Check that the data has the declared size
      assert(bytes.size == size)
      // Key point 2: wrap the block data in an entry object
      val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
      entries.synchronized {
        // Put the entry into the LinkedHashMap, i.e. into memory
        entries.put(blockId, entry)
      }
      logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
        blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
      true
    } else {
      false
    }
  }

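The write path is simple, but two points deserve attention. The first is the memoryManager, an important module that governs how memory is used and reclaimed; it is not covered in detail here. The second is that the data is wrapped in an entry object before it is stored. MemoryEntry has two implementations, SerializedMemoryEntry and DeserializedMemoryEntry, which hold serialized bytes and deserialized objects respectively. putBytes always creates a SerializedMemoryEntry and puts it into the LinkedHashMap; blocks stored through putIteratorAsValues are kept in deserialized form instead.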
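
The acquire-then-store order can be reproduced in a few lines. The sketch below is a toy, not MemoryStore: ToyMemoryStore is a hypothetical name, a running byte counter stands in for MemoryManager, entries are raw byte arrays, and eviction is not modelled at all.

```scala
import java.util.LinkedHashMap

final class ToyMemoryStore(maxMemory: Long) {
  // Stand-in for MemoryManager bookkeeping: just a running total of bytes in use.
  private var used = 0L
  // Access-ordered LinkedHashMap keyed by block id (third argument = true).
  private val entries = new LinkedHashMap[String, Array[Byte]](32, 0.75f, true)

  private def acquire(size: Long): Boolean = synchronized {
    if (used + size <= maxMemory) { used += size; true } else false
  }

  // `bytes` is by-name, so the data is only materialized after memory is granted.
  def putBytes(blockId: String, size: Long, bytes: => Array[Byte]): Boolean = {
    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
    if (acquire(size)) {
      val data = bytes
      assert(data.length == size)
      entries.synchronized { entries.put(blockId, data) }
      true
    } else false
  }

  def contains(blockId: String): Boolean =
    entries.synchronized { entries.containsKey(blockId) }
  def getBytes(blockId: String): Option[Array[Byte]] =
    entries.synchronized { Option(entries.get(blockId)) }
  def free: Long = synchronized { maxMemory - used }
}

object ToyMemoryStoreDemo {
  def run(): (Boolean, Boolean, Long) = {
    val store = new ToyMemoryStore(maxMemory = 10L)
    val first = store.putBytes("b1", 6L, Array.fill[Byte](6)(1))
    val second = store.putBytes("b2", 6L, Array.fill[Byte](6)(2)) // does not fit
    (first, second, store.free)
  }
}
```

Taking the data as a deferred value mirrors the _bytes: () => ChunkedByteBuffer parameter above: nothing is allocated for a block that is going to be rejected.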

getBytes

def getBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
    // Look up the entry for blockId in the LinkedHashMap
    val entry = entries.synchronized { entries.get(blockId) }
    entry match {
      case null => None
      case e: DeserializedMemoryEntry[_] =>
        throw new IllegalArgumentException("should only call getBytes on serialized blocks")
      // Pattern match on the entry type and extract the bytes from the serialized entry
      case SerializedMemoryEntry(bytes, _, _) => Some(bytes)
    }
  }

MemoryEntry

DeserializedMemoryEntry holds objects that are stored without being serialized.

SerializedMemoryEntry holds objects after they have been serialized.
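
The two entry kinds pair naturally with the two read methods, getBytes and getValues. The sketch below models that pairing with a sealed trait. ToyEntry, Serialized and Deserialized are hypothetical simplifications, and the getValues behaviour shown (rejecting serialized entries) is an assumption made by symmetry with the getBytes code above.

```scala
object MemoryEntryDemo {
  sealed trait ToyEntry { def size: Long }
  // Holds live objects; the size is an estimate supplied by the caller.
  final case class Deserialized(values: Vector[Any], size: Long) extends ToyEntry
  // Holds already-serialized bytes; the size is exact.
  final case class Serialized(bytes: Vector[Byte]) extends ToyEntry {
    def size: Long = bytes.length.toLong
  }

  def getBytes(entry: Option[ToyEntry]): Option[Vector[Byte]] = entry match {
    case None => None
    case Some(_: Deserialized) =>
      throw new IllegalArgumentException("should only call getBytes on serialized blocks")
    case Some(Serialized(bytes)) => Some(bytes)
  }

  def getValues(entry: Option[ToyEntry]): Option[Iterator[Any]] = entry match {
    case None => None
    case Some(_: Serialized) =>
      throw new IllegalArgumentException("should only call getValues on deserialized blocks")
    case Some(Deserialized(values, _)) => Some(values.iterator)
  }
}
```

Making the trait sealed lets the compiler check that both read methods handle every kind of entry.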

Reference: 《spark源码（六）spark如何通过BlockManager控制数据的读写》, Interest1_wyt's blog on CSDN
