spark-core_30: Source analysis of env.blockManager.initialize(conf.getAppId) during Executor initialization


The previous posts (spark-core_28 and spark-core_29: Executor initialization, env.blockManager.initialize(conf.getAppId), source analysis of NettyBlockTransferService.init()) showed that NettyBlockTransferService.init() does the following four things:

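/** NettyBlockTransferService.init(this) does the following:
  1. Creates the RpcServer, a NettyBlockRpcServer. It serves requests to open or upload any block registered in the BlockManager; each transport-layer chunk corresponds to one Spark-level shuffle block.
  2. Builds the TransportContext, which holds the context for creating the TransportServer (the Netty server) and the TransportClientFactory (which creates TransportClient instances), and sets up the Netty channel pipeline with a TransportChannelHandler.
  3. Creates the client factory, TransportClientFactory. Its createClient method creates TransportClient instances; the factory maintains a connection pool to other hosts and should return the same TransportClient for the same remote host. It also shares a single thread pool across all TransportClients.
  4. Creates the Netty server, TransportServer. The codecs and the inbound event handlers are all added to this Netty server (the classes above all exist to serve it).
  */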

1. Execution continues in blockManager.initialize:

def initialize(appId: String): Unit = {
  // blockTransferService is passed in by SparkEnv.create. It is a NettyBlockTransferService,
  // the block transfer service
  blockTransferService.init(this)

  // The external shuffle service (spark.shuffle.service.enabled) is off by default, so
  // shuffleClient is the NettyBlockTransferService itself. NettyBlockTransferService extends
  // BlockTransferService, which extends ShuffleClient, and init(appId) has an empty
  // implementation there, so this call does nothing
  shuffleClient.init(appId)

  /**
    * Initialize the BlockManagerId, the unique identifier of a BlockManager.
    * BlockManagerId.apply instantiates a BlockManagerId and stores it in a ConcurrentHashMap
    * whose key and value types are both BlockManagerId.
    *
    * executorId: "driver" when the driver creates the SparkEnv; a numeric string for a CoarseGrainedExecutorBackend
    * NettyBlockTransferService.hostName: the driver's hostname; the worker's IP for a CoarseGrainedExecutorBackend
    * NettyBlockTransferService.port: the port of the Netty server
    */
  blockManagerId = BlockManagerId(
    executorId, blockTransferService.hostName, blockTransferService.port)

===> Initialization of BlockManagerId:

private[spark] object BlockManagerId {

  /**
   * Returns a [[org.apache.spark.storage.BlockManagerId]] for the given configuration.
   *
   * @param execId ID of the executor. "driver" on the driver; a numeric string for a CoarseGrainedExecutorBackend
   * @param host Host name of the block manager. The worker's IP for a CoarseGrainedExecutorBackend
   * @param port Port of the block manager. This is the port of the Netty server
   * @return A new [[org.apache.spark.storage.BlockManagerId]].
   */
  def apply(execId: String, host: String, port: Int): BlockManagerId =
    // Instantiate a BlockManagerId and store it in a ConcurrentHashMap whose key and
    // value types are both BlockManagerId
    getCachedBlockManagerId(new BlockManagerId(execId, host, port))

  def apply(in: ObjectInput): BlockManagerId = {
    val obj = new BlockManagerId()
    obj.readExternal(in)
    getCachedBlockManagerId(obj)
  }

  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()

  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
    blockManagerIdCache.putIfAbsent(id, id)
    blockManagerIdCache.get(id)
  }
}
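The putIfAbsent-then-get pair in getCachedBlockManagerId is an interning pattern: equal ids end up sharing one cached instance. A minimal sketch of the idea follows. InternedId and InternCache are hypothetical stand-ins for illustration, not Spark classes, and a case class is assumed so that equality is value-based.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for BlockManagerId; a case class provides value-based equals/hashCode.
final case class InternedId(executorId: String, host: String, port: Int)

object InternCache {
  private val cache = new ConcurrentHashMap[InternedId, InternedId]()

  // Same two steps as getCachedBlockManagerId: insert only if no equal key exists,
  // then read back whichever instance is stored.
  def intern(id: InternedId): InternedId = {
    cache.putIfAbsent(id, id)
    cache.get(id)
  }

  def main(args: Array[String]): Unit = {
    val first = intern(InternedId("1", "10.0.0.5", 40123))
    val second = intern(InternedId("1", "10.0.0.5", 40123))
    // Two separately constructed but equal ids resolve to the same object.
    println(first eq second) // prints "true"
  }
}
```

Because the map key and value are the same object, a lookup by any equal id returns the canonical instance, which avoids keeping many duplicate id objects alive.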

2. Next, look at BlockManagerMaster.registerBlockManager, which is called near the end of initialize:

def initialize(appId: String): Unit = {
  // blockTransferService is passed in by SparkEnv.create. It is a NettyBlockTransferService,
  // the block transfer service
  blockTransferService.init(this)
  // The external shuffle service (spark.shuffle.service.enabled) is off by default, so
  // shuffleClient is the NettyBlockTransferService itself. NettyBlockTransferService extends
  // BlockTransferService, which extends ShuffleClient, and init(appId) has an empty
  // implementation there, so this call does nothing
  shuffleClient.init(appId)

  /**
    * Initialize the BlockManagerId, the unique identifier of a BlockManager. Each
    * BlockManagerId holds the executorId plus the IP and port of the CoarseGrainedExecutorBackend.
    * BlockManagerId.apply instantiates a BlockManagerId and stores it in a ConcurrentHashMap
    * whose key and value types are both BlockManagerId.
    *
    * executorId: "driver" when the driver creates the SparkEnv; a numeric string for a CoarseGrainedExecutorBackend
    * NettyBlockTransferService.hostName: the driver's hostname; the worker's IP for a CoarseGrainedExecutorBackend
    * NettyBlockTransferService.port: the port of the Netty server
    */
  blockManagerId = BlockManagerId(
    executorId, blockTransferService.hostName, blockTransferService.port)
  // A possible optimization: when a Spark cluster is shared by several applications, dynamic
  // resource allocation is very useful. It is off by default and easy to enable in standalone
  // mode: set spark.dynamicAllocation.enabled to true, then set spark.shuffle.service.enabled
  // to true
  shuffleServerId = if (externalShuffleServiceEnabled) {
    logInfo(s"external shuffle service port = $externalShuffleServicePort")
    BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
  } else {
    blockManagerId
  }
  /** Arguments:
    *   master: a BlockManagerMaster, created on both the driver and the executors. The
    *   BlockManagerMaster on the driver manages the BlockManagers on the executors. It holds a
    *   reference to BlockManagerMasterEndpoint; an executor obtains that reference and sends
    *   messages to it in order to talk to the driver
    *   slaveEndpoint: a BlockManagerSlaveEndpoint, which receives commands from the master and
    *   executes them, for example removing a block from the slave's BlockManager
    *   blockManagerId: BlockManagerId("driver" or the executor's numeric id, host of the
    *   driver or worker, port of the Netty server)
    */
  master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)

  // Register Executors' configuration with the local shuffle service, if one should exist.
  if (externalShuffleServiceEnabled && !blockManagerId.isDriver) {
    registerWithExternalShuffleServer()
  }
}
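The choice of shuffleServerId and the final registration check can be isolated in a small sketch. ShuffleNodeId and the two helper functions are hypothetical names introduced here. The driver check is simplified to the literal "driver", and 7337 is the documented default of spark.shuffle.service.port.

```scala
// Simplified, hypothetical stand-in for BlockManagerId.
final case class ShuffleNodeId(executorId: String, host: String, port: Int) {
  // Assumption for this sketch: the driver is identified by the literal "driver".
  def isDriver: Boolean = executorId == "driver"
}

object ShuffleServerIdSketch {
  // With the external shuffle service on, shuffle data is served from the service's port
  // on the same host; otherwise the BlockManager's own id is reused unchanged.
  def shuffleServerId(own: ShuffleNodeId, externalEnabled: Boolean, externalPort: Int): ShuffleNodeId =
    if (externalEnabled) own.copy(port = externalPort) else own

  // Mirrors the last check in initialize: only executors register with the external service.
  def registersWithExternalService(own: ShuffleNodeId, externalEnabled: Boolean): Boolean =
    externalEnabled && !own.isDriver

  def main(args: Array[String]): Unit = {
    val executor = ShuffleNodeId("3", "10.0.0.7", 41000)
    println(shuffleServerId(executor, externalEnabled = false, externalPort = 7337))
    println(shuffleServerId(executor, externalEnabled = true, externalPort = 7337))
    println(registersWithExternalService(executor, externalEnabled = true))
  }
}
```

The sketch shows that only the port differs between the two ids: the executor id and host stay the same, so readers of shuffle data are pointed at the external service running on the same machine.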

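3. registerBlockManager ultimately creates a BlockManagerInfo and stores it in blockManagerInfo, a HashMap[BlockManagerId, BlockManagerInfo] member of BlockManagerMasterEndpoint. That map tracks every registered BlockManagerId (the unique identifier of a BlockManager), and each BlockManagerInfo also holds the BlockManagerSlaveEndpoint reference (used for communication between the driver and the slave).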

/** Register the BlockManager's id with the driver.
  * Arguments:
  * slaveEndpoint: a BlockManagerSlaveEndpoint, which receives commands from the master and
  * executes them, for example removing a block from the slave's BlockManager. It holds a mapOutputTracker:
  *   - on an executor: a MapOutputTrackerWorker, which fetches map output information from the MapOutputTrackerMaster on the driver
  *   - on the driver: a MapOutputTrackerMaster, which uses a TimeStampedHashMap to track map output information
  * blockManagerId: BlockManagerId("driver" or the executor's numeric id, host of the driver or worker, port of the Netty server)
  *
  * Call flow:
  * 1. SparkContext or Executor initialization calls blockManager.initialize(_applicationId)
  * 2. blockManager.initialize calls BlockManagerMaster.registerBlockManager
  * 3. tell sends the RegisterBlockManager case class to BlockManagerMasterEndpoint, where receiveAndReply handles it
  * */
def registerBlockManager(
    blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = {
  logInfo("Trying to register BlockManager")
  tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
  logInfo("Registered BlockManager")
}

4. tell sends the message to BlockManagerMasterEndpoint

/** Send a one-way message to the master endpoint, to which we expect it to reply with true.
  * The message goes to driverEndpoint (the BlockManagerMasterEndpoint), and the reply must be true.
  * Example message: RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint)
  * */
private def tell(message: Any) {
  // driverEndpoint: BlockManagerMasterEndpoint
  if (!driverEndpoint.askWithRetry[Boolean](message)) {
    throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
  }
}
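Although it is named tell, the method blocks on a Boolean reply and fails loudly if the reply is not true. The sketch below illustrates that shape with plain Scala futures. Endpoint, askWithRetry and the retry policy are assumptions made for illustration and do not reproduce Spark's RpcEndpointRef.askWithRetry.

```scala
import scala.annotation.tailrec
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object TellSketch {
  // Hypothetical stand-in for an RPC endpoint: takes a message, eventually answers with a Boolean.
  type Endpoint = Any => Future[Boolean]

  // Blocking ask with a bounded number of attempts (an assumed policy, not Spark's own).
  @tailrec
  def askWithRetry(endpoint: Endpoint, message: Any, attemptsLeft: Int, timeout: FiniteDuration): Boolean =
    Try(Await.result(endpoint(message), timeout)) match {
      case Success(reply) => reply
      case Failure(_) if attemptsLeft > 1 =>
        askWithRetry(endpoint, message, attemptsLeft - 1, timeout)
      case Failure(e) =>
        throw new RuntimeException("No reply from endpoint", e)
    }

  // Same contract as the tell method above: anything other than `true` is an error.
  def tell(endpoint: Endpoint, message: Any): Unit = {
    if (!askWithRetry(endpoint, message, attemptsLeft = 3, timeout = 1.second)) {
      throw new IllegalStateException("Endpoint returned false, expected true.")
    }
  }

  def main(args: Array[String]): Unit = {
    val alwaysTrue: Endpoint = _ => Future.successful(true)
    tell(alwaysTrue, "RegisterBlockManager")
    println("registered")
  }
}
```

This is why the handler on the driver side has to reply true explicitly: the caller treats a false reply, or no reply at all, as a failed registration.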

5. The Executor talks to the BlockManagerMasterEndpoint on the driver to register its BlockManagerId

private[spark]
class BlockManagerMasterEndpoint(
    override val rpcEnv: RpcEnv,
    val isLocal: Boolean,
    conf: SparkConf,
    listenerBus: LiveListenerBus)
  extends ThreadSafeRpcEndpoint with Logging {

  // Mapping from block manager id to the block manager's information.
  // Caches every BlockManagerId with its BlockManagerInfo; a BlockManagerInfo holds
  // information about all blocks on its executor
  private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]

  // Mapping from executor ID to block manager ID.
  // Caches the mapping from an executorId to the BlockManagerId it owns
  private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]

  // Mapping from block id to the set of block managers that have the block.
  // Caches the mapping from a Block to its BlockManagerIds
  private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]

  private val askThreadPool = ThreadUtils.newDaemonCachedThreadPool("block-manager-ask-thread-pool")
  private implicit val askExecutionContext = ExecutionContext.fromExecutorService(askThreadPool)

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    /**
      * Arguments:
      * slaveEndpoint: a BlockManagerSlaveEndpoint, which receives commands from the master and
      * executes them, for example removing a block from the slave's BlockManager
      * Call flow:
      * 1. SparkContext or executor initialization calls _env.blockManager.initialize(_applicationId)
      * 2. blockManager.initialize calls BlockManagerMaster.registerBlockManager
      * 3. BlockManagerMaster.tell sends the RegisterBlockManager case class here, where receiveAndReply handles it
      * 4. BlockManagerMasterEndpoint then calls its own register method to record the blockManagerId in a BlockManagerInfo
      *
      * blockManagerId: BlockManagerId("driver", host of the driver, port of the Netty server)
      */
    case RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint) =>
      register(blockManagerId, maxMemSize, slaveEndpoint)
      // Must reply true; otherwise BlockManagerMaster.tell throws a SparkException
      context.reply(true)

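==> register is called to register the BlockManagerId. It ultimately creates a BlockManagerInfo and stores it in blockManagerInfo, a HashMap[BlockManagerId, BlockManagerInfo] member of BlockManagerMasterEndpoint. That map tracks every registered BlockManagerId (the unique identifier of a BlockManager), and each BlockManagerInfo also holds the BlockManagerSlaveEndpoint reference (used for communication between the driver and the slave).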

/**
  * Arguments:
  * slaveEndpoint: a BlockManagerSlaveEndpoint, which receives commands from the master and
  * executes them, for example removing a block from the slave's BlockManager.
  * It holds a mapOutputTracker:
  *   - on an executor: a MapOutputTrackerWorker, which fetches map output information from the MapOutputTrackerMaster on the driver
  *   - on the driver: a MapOutputTrackerMaster, which uses a TimeStampedHashMap to track map output information
  * blockManagerId: BlockManagerId("driver" or the executor's numeric id, host of the driver or worker, port of the Netty server)
  */
private def register(id: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef) {
  val time = System.currentTimeMillis()
  // blockManagerInfo is a HashMap[BlockManagerId, BlockManagerInfo] that caches every
  // BlockManagerId with its BlockManagerInfo; a BlockManagerInfo holds information about
  // all blocks on its executor
  if (!blockManagerInfo.contains(id)) {
    // blockManagerIdByExecutor is a HashMap[String, BlockManagerId] that caches the mapping
    // from an executorId to the BlockManagerId it owns
    blockManagerIdByExecutor.get(id.executorId) match {
      case Some(oldId) =>
        // A block manager of the same executor already exists, so remove it (assumed dead)
        logError("Got two different block manager registrations on same executor - "
          + s"will replace old one $oldId with new one $id")
        removeExecutor(id.executorId)
      case None =>
    }
    logInfo("Registering block manager %s with %s RAM, %s".format(
      id.hostPort, Utils.bytesToString(maxMemSize), id))
    // Store the BlockManagerId in the HashMap[String, BlockManagerId], keyed by the executor id
    blockManagerIdByExecutor(id.executorId) = id
    // Create the BlockManagerInfo for this BlockManagerId (the unique identifier of a
    // BlockManager); it also holds the BlockManagerSlaveEndpoint reference used for
    // communication between the driver and the slave
    blockManagerInfo(id) = new BlockManagerInfo(
      id, System.currentTimeMillis(), maxMemSize, slaveEndpoint)
  }
  // Post a SparkListenerBlockManagerAdded event to the LiveListenerBus
  listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxMemSize))
}
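The bookkeeping in register can be reduced to two maps that are kept in sync. The sketch below models only that part, under simplifying assumptions: RegistryId, RegistryInfo and BlockManagerRegistrySketch are hypothetical names, the slave endpoint and listener bus are left out, and the stale entry is removed inline rather than through removeExecutor (which in Spark has more state to clean up).

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for BlockManagerId and BlockManagerInfo.
final case class RegistryId(executorId: String, host: String, port: Int)
final case class RegistryInfo(id: RegistryId, registeredAtMs: Long, maxMemSize: Long)

class BlockManagerRegistrySketch {
  private val infoById = mutable.HashMap.empty[RegistryId, RegistryInfo]
  private val idByExecutor = mutable.HashMap.empty[String, RegistryId]

  def register(id: RegistryId, maxMemSize: Long): Unit = {
    if (!infoById.contains(id)) {
      // An older registration for the same executor is assumed dead and dropped first.
      idByExecutor.get(id.executorId).foreach { oldId =>
        infoById.remove(oldId)
        idByExecutor.remove(id.executorId)
      }
      idByExecutor(id.executorId) = id
      infoById(id) = RegistryInfo(id, System.currentTimeMillis(), maxMemSize)
    }
  }

  def registered: Set[RegistryId] = infoById.keySet.toSet
  def idFor(executorId: String): Option[RegistryId] = idByExecutor.get(executorId)
}

object BlockManagerRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new BlockManagerRegistrySketch
    registry.register(RegistryId("1", "worker-1", 4000), 512L)
    // The same executor registers again from a new port, e.g. after a restart.
    registry.register(RegistryId("1", "worker-1", 5000), 512L)
    println(registry.registered.size) // prints "1"
  }
}
```

Registering the same id twice is a no-op, while a different id for an already known executor replaces the old entry, so each executor maps to at most one live BlockManagerId.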

At this point, BlockManager.initialize() has finished executing.

