See (spark-core_25: how the Master notifies the Worker to launch the CoarseGrainedExecutorBackend process, and source analysis of CoarseGrainedExecutorBackend initialization)
// SparkContext also calls _env.blockManager.initialize(_applicationId) during its own initialization; the process is much the same
private[spark] class BlockManager(
    executorId: String, // "driver" for the driver; a numeric string for a CoarseGrainedExecutorBackend
    rpcEnv: RpcEnv,
    val master: BlockManagerMaster, // the BlockManagerMaster on the driver manages the BlockManagers on the executors;
    // it holds a reference to the BlockManagerMasterEndpoint; an executor obtains that reference and sends messages to the endpoint to talk to the driver
    defaultSerializer: Serializer,
    val conf: SparkConf,
    memoryManager: MemoryManager, // UnifiedMemoryManager by default
    mapOutputTracker: MapOutputTracker, // on an executor: MapOutputTrackerWorker, which fetches map output info from the driver's MapOutputTrackerMaster
    // on the driver: MapOutputTrackerMaster, which uses a TimeStampedHashMap to track map output info
    shuffleManager: ShuffleManager,
    blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with Logging {
… To keep the walkthrough focused, code that is not directly relevant has been removed; readers can follow along against the source.
/**
 * Initializes the BlockManager with the given appId. This is not performed in the constructor as
 * the appId may not be known at BlockManager instantiation time (in particular for the driver,
 * where it is only learned after registration with the TaskScheduler).
 *
 * This method initializes the BlockTransferService and ShuffleClient, registers with the
 * BlockManagerMaster, starts the BlockManagerWorker endpoint, and registers with a local shuffle
 * service if configured.
 *
 * This method is called during SparkContext or Executor initialization: _env.blockManager.initialize(_applicationId)
 * What it does:
 * 1. Initializes the BlockManager with the given appId (for the driver in particular, this happens only after registration with the TaskScheduler).
 * 2. blockTransferService.init(this) creates a Netty server.
 * 3. Builds a BlockManagerId("driver", driver host, Netty server port), the unique identifier of each BlockManager.
 * 3.1. Instantiates a BlockManagerSlaveEndpoint, which receives commands from the master, e.g. removing a block from a slave's BlockManager.
 * 4. master.registerBlockManager(): creates a BlockManagerInfo and puts it into the BlockManagerMasterEndpoint member blockManagerInfo, a HashMap[BlockManagerId, BlockManagerInfo].
 *    BlockManagerInfo tracks every BlockManagerId (the unique identifier of a BlockManager) and also holds the BlockManagerSlaveEndpoint (used for driver-slave interaction).
 *
 * The appId parameter looks like: app-20180404172558-0000
 */
def initialize(appId: String): Unit = {
  // Set up by SparkEnv.create: the BlockTransferService is a NettyBlockTransferService, the block transfer service
  /** NettyBlockTransferService.init(this) does the following:
      1. Creates the RPC server NettyBlockRpcServer, which opens or uploads, per request, arbitrary blocks registered in the BlockManager; each chunk transfer corresponds to one shuffle fetch.
      2. Builds a TransportContext: the context for creating the TransportServer (the Netty server) and the TransportClientFactory (which creates TransportClients), and for wiring the Netty channel pipeline with a TransportChannelHandler.
      3. Creates the client factory TransportClientFactory: this factory creates TransportClients via createClient. It maintains a connection pool to other hosts and returns the same TransportClient for the same remote host. It also shares a single worker thread pool across all TransportClients.
      4. Creates the Netty server TransportServer: the codec and the inbound handlers are all added to this Netty server (the classes above all revolve around it).
  */
  blockTransferService.init(this)
  ...
}
1. How NettyBlockTransferService bootstraps the Netty server: NettyBlockTransferService.init() does the four things described in the comment above
/**
 * A BlockTransferService that uses Netty to fetch a set of blocks at a time.
 * It is instantiated by SparkEnv.create().
 * blockTransferService defaults to NettyBlockTransferService (provides the server side and the client side, and fetches sets of blocks from remote nodes).
 * numCores: in local mode this is the number of CPU threads on the driver's node; in cluster mode it is 0.
 * When the SparkEnv is created by a CoarseGrainedExecutorBackend, numCores is determined by
 * "spark.executor.cores" in the SparkConf (I set it to 1, so it is 1 here); if unset, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores.
 */
class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int) extends BlockTransferService {
…
// Initializes the transfer service using the BlockDataManager passed in (a parent type of BlockManager); through the BlockDataManager we can read local blocks (getBlockData) and put local blocks (putBlockData)
// This method is called by BlockManager.initialize()
override def init(blockDataManager: BlockDataManager): Unit = {
  /** conf.getAppId: app-20180508234845-0000
   * serializer: JavaSerializer()
   * blockDataManager: the BlockManager instance
   */
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  …
}
2. Initializing NettyBlockRpcServer: it opens or uploads, per request, arbitrary blocks registered in the BlockManager; each chunk transfer corresponds to one shuffle fetch
class NettyBlockRpcServer(
    appId: String, // app-20180508234845-0000
    serializer: Serializer, // JavaSerializer
    blockManager: BlockDataManager) // the BlockManager instance
  extends RpcHandler with Logging {
  // A StreamManager that allows registering an Iterator[ManagedBuffer]; TransportClients can then fetch the individual chunks, each registered buffer being one chunk
  private val streamManager = new OneForOneStreamManager()
  // Via openBlocks and uploadBlock, blocks registered in the BlockManager can be opened and uploaded
  override def receive(
      client: TransportClient,
      rpcMessage: ByteBuffer,
      responseContext: RpcResponseCallback): Unit = {
    val message = BlockTransferMessage.Decoder.fromByteBuffer(rpcMessage)
    logTrace(s"Received request: $message")
    message match {
      case openBlocks: OpenBlocks =>
        val blocks: Seq[ManagedBuffer] =
        ...
      case uploadBlock: UploadBlock =>
        // StorageLevel is serialized as bytes using our JavaSerializer.
        ...
    }
  }
  override def getStreamManager(): StreamManager = streamManager
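The decode-then-branch shape of receive() can be sketched in isolation. This is a toy sketch, not Spark's actual BlockTransferMessage hierarchy: the message types and return strings here are hypothetical stand-ins, but the dispatch pattern (decode the raw bytes into a typed message, then branch on fetch vs. upload) is the one the method follows.

```java
// Toy sketch (hypothetical types) of NettyBlockRpcServer.receive() dispatch:
// branch on whether the decoded message is an "open blocks" request (fetch)
// or an "upload block" request (put/replication).
public class RpcDispatchDemo {
    interface BlockTransferMessage {}
    static class OpenBlocks implements BlockTransferMessage {
        final String[] blockIds;
        OpenBlocks(String... ids) { this.blockIds = ids; }
    }
    static class UploadBlock implements BlockTransferMessage {
        final String blockId;
        UploadBlock(String id) { this.blockId = id; }
    }

    static String receive(BlockTransferMessage message) {
        if (message instanceof OpenBlocks) {
            // would register the requested blocks as a stream of chunks
            return "stream of " + ((OpenBlocks) message).blockIds.length + " chunks";
        } else if (message instanceof UploadBlock) {
            // would hand the uploaded bytes to the BlockDataManager
            return "stored " + ((UploadBlock) message).blockId;
        }
        throw new IllegalArgumentException("Unknown message: " + message);
    }

    public static void main(String[] args) {
        System.out.println(receive(new OpenBlocks("rdd_0_0", "rdd_0_1")));
        System.out.println(receive(new UploadBlock("broadcast_0")));
    }
}
```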
// Now a look at OneForOneStreamManager, which getStreamManager() returns
/**
 * StreamManager which allows registration of an Iterator<ManagedBuffer>, which are individually fetched as chunks by the client. Each registered buffer is one chunk.
 */
public class OneForOneStreamManager extends StreamManager {
  private final Logger logger = LoggerFactory.getLogger(OneForOneStreamManager.class);
  private final AtomicLong nextStreamId;
  private final ConcurrentHashMap<Long, StreamState> streams;
  /** State of a single stream. */
  private static class StreamState {
    ...
  }
  // Called along the path BlockManager.initialize ==> NettyBlockTransferService.init() ==> new NettyBlockRpcServer
  // Sets the member nextStreamId (an AtomicLong) to a value below Integer.MAX_VALUE * 1000,
  // and streams to a new ConcurrentHashMap<Long, StreamState>()
  public OneForOneStreamManager() {
    // For debugging purposes, start with a random stream id to help identifying different streams.
    // This does not need to be globally unique, only unique to this class.
    nextStreamId = new AtomicLong((long) new Random().nextInt(Integer.MAX_VALUE) * 1000);
    streams = new ConcurrentHashMap<Long, StreamState>();
  }
...
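The stream-ID scheme in the constructor above is easy to verify on its own. A minimal sketch (not Spark's class; the map values here are placeholder strings rather than StreamState objects): the seed is a random int scaled by 1000, so IDs from different manager instances are unlikely to collide, and each registration simply takes the next counter value.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of OneForOneStreamManager's ID scheme: a random multiple of
// 1000 as the starting point, then monotonically increasing per stream.
public class StreamIdDemo {
    public static void main(String[] args) {
        AtomicLong nextStreamId =
            new AtomicLong((long) new Random().nextInt(Integer.MAX_VALUE) * 1000);
        ConcurrentHashMap<Long, String> streams = new ConcurrentHashMap<>();

        // Registering two streams hands out consecutive IDs.
        long id1 = nextStreamId.getAndIncrement();
        long id2 = nextStreamId.getAndIncrement();
        streams.put(id1, "stream-1");
        streams.put(id2, "stream-2");

        System.out.println(id2 - id1);        // consecutive IDs differ by 1
        System.out.println(id1 % 1000 == 0);  // the seed is a multiple of 1000
        System.out.println(streams.size());
    }
}
```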
3. Back to the NettyBlockTransferService.init method
override def init(blockDataManager: BlockDataManager): Unit = {
  /** conf.getAppId: app-20180508234845-0000
   * serializer: JavaSerializer()
   * blockDataManager: the BlockManager instance
   * NettyBlockRpcServer: opens or uploads, per request, arbitrary blocks registered in the BlockManager; each chunk transfer corresponds to one shuffle fetch
   */
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) { // false by default: authentication is not enabled
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  // TransportContext: the context for creating the TransportServer (the Netty server) and the TransportClientFactory (which creates TransportClients), and for wiring the Netty channel pipeline with a TransportChannelHandler.
  // Instantiating TransportContext assigns its members: conf (a TransportConf, tied to the SparkConf through a ConfigProvider subclass), rpcHandler (the NettyBlockRpcServer), closeIdleConnections (false), plus concrete instances of the outbound encoder and inbound decoder
  transportContext = new TransportContext(transportConf, rpcHandler)
==> First, a look at NettyBlockTransferService's member transportConf
  // numCores: in local mode this is the number of CPU threads on the driver's node; in cluster mode it is 0
  /** SparkTransportConf.fromSparkConf(): sets spark.shuffle.io.serverThreads and spark.shuffle.io.clientThreads on the SparkConf based on numCores;
      these thread counts are used by the Netty server and client; if the SparkConf has no value, the result is at most 8.
      fromSparkConf returns a TransportConf instance whose config key names are derived from the second parameter (the module name),
      e.g. SPARK_NETWORK_IO_MODE_KEY: spark.shuffle.io.mode, SPARK_NETWORK_IO_SERVERTHREADS_KEY: spark.shuffle.io.serverThreads
  */
  private val transportConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numCores)
===> As the source shows, SparkTransportConf connects to the SparkConf through a subclass of TransportConf's ConfigProvider member, and derives the config key names from the module parameter;
==> it also sets the thread counts for spark.shuffle.io.serverThreads and spark.shuffle.io.clientThreads to the value of the numUsableCores argument
/**
 * Provides a utility for transforming from a SparkConf inside a Spark JVM (e.g., Executor,
 * Driver, or a standalone shuffle service) into a TransportConf with details on our environment
 * like the number of cores that are allocated to this JVM.
 */
object SparkTransportConf {
  /**
   * Spark defaults to 8 Netty threads. In practice, 2-4 cores are enough to saturate a 10Gb/s link,
   * and each core initially needs more than 32MB of off-heap memory. The value can be tuned
   * manually via the serverThreads and clientThreads settings.
   */
  private val MAX_DEFAULT_NETTY_THREADS = 8
  /**
   * Utility for creating a [[TransportConf]] from a [[SparkConf]].
   * @param _conf the [[SparkConf]]
   * @param module the module name, e.g. shuffle
   * @param numUsableCores if nonzero, this will restrict the server and client threads to only use the given number of cores, rather than all of the machine's cores.
   * This restriction will only occur if these properties are not already set.
   *
   * Creates a TransportConf from the SparkConf.
   * numUsableCores: in local mode this is the number of CPU threads on the driver's node; in cluster mode it is 0.
   * When the SparkEnv is created by a CoarseGrainedExecutorBackend, numUsableCores is determined by
   * "spark.executor.cores" in the SparkConf (I set it to 1, so it is 1 here); if unset, only one CoarseGrainedExecutorBackend is launched and it gets all of the worker's available cores.
   */
  def fromSparkConf(_conf: SparkConf, module: String, numUsableCores: Int = 0): TransportConf = {
    val conf = _conf.clone
    // Specify thread configuration based on our JVM's allocation of cores (rather than necessarily
    // assuming we have all the machine's cores).
    // NB: Only set if serverThreads/clientThreads not already set.
    // defaultNumThreads: the thread-pool size for the Netty client and server; if numUsableCores is 0, returns the current core count capped at 8
    val numThreads = defaultNumThreads(numUsableCores)
    // I set spark.executor.cores to 1, so numThreads is 1, i.e. spark.shuffle.io.serverThreads and spark.shuffle.io.clientThreads are both 1
    conf.setIfMissing(s"spark.$module.io.serverThreads", numThreads.toString)
    conf.setIfMissing(s"spark.$module.io.clientThreads", numThreads.toString)
    // ConfigProvider is an abstract class whose abstract method must be implemented; its job is to help instantiate TransportConf
    // The module value determines the member key names, e.g. SPARK_NETWORK_IO_MODE_KEY: spark.shuffle.io.mode,
    // SPARK_NETWORK_IO_SERVERTHREADS_KEY: spark.shuffle.io.serverThreads
    new TransportConf(module, new ConfigProvider {
      override def get(name: String): String = conf.get(name)
    })
  }
  /**
   * Returns the default number of threads for both the Netty client and server thread pools.
   * If numUsableCores is 0, we will use Runtime to get an approximate number of available cores.
   */
  private def defaultNumThreads(numUsableCores: Int): Int = {
    val availableCores =
      if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
    math.min(availableCores, MAX_DEFAULT_NETTY_THREADS)
  }
}
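The thread-count rule above is self-contained enough to run standalone. A minimal sketch of defaultNumThreads (the class name here is made up; the logic mirrors the Scala method): use the caller's core count if positive, otherwise the JVM's available processors, capped at 8.

```java
// Minimal sketch of SparkTransportConf.defaultNumThreads: caller's cores if
// positive, else Runtime's available processors, capped at 8.
public class NettyThreadsDemo {
    private static final int MAX_DEFAULT_NETTY_THREADS = 8;

    static int defaultNumThreads(int numUsableCores) {
        int availableCores = numUsableCores > 0
            ? numUsableCores
            : Runtime.getRuntime().availableProcessors();
        return Math.min(availableCores, MAX_DEFAULT_NETTY_THREADS);
    }

    public static void main(String[] args) {
        System.out.println(defaultNumThreads(1));       // executor with spark.executor.cores=1
        System.out.println(defaultNumThreads(32));      // a big machine is still capped at 8
        System.out.println(defaultNumThreads(0) <= 8);  // local mode: machine cores, capped
    }
}
```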
===> Back in NettyBlockTransferService.init, a look at new TransportContext(transportConf, rpcHandler)
override def init(blockDataManager: BlockDataManager): Unit = {
  ...
  // TransportContext: the context for creating the TransportServer (the Netty server) and the TransportClientFactory (which creates TransportClients), and for wiring the Netty channel pipeline with a TransportChannelHandler.
  // Instantiating TransportContext assigns its members: conf (a TransportConf, tied to the SparkConf through a ConfigProvider subclass), rpcHandler (the NettyBlockRpcServer), closeIdleConnections (false), plus concrete instances of the outbound encoder and inbound decoder
  transportContext = new TransportContext(transportConf, rpcHandler)
===> Initializing TransportContext
/**
 * TransportContext contains the context for creating the TransportServer and the TransportClientFactory, and for setting up the Netty channel pipeline with a TransportChannelHandler.
 *
 * The TransportClient provides two communication protocols: control-plane RPCs and data-plane "chunk fetching".
 * The handling of RPCs is performed outside the scope of the TransportContext (i.e., by a user-provided handler), and it is responsible for setting up streams which can be streamed through the data plane in chunks using zero-copy IO.
 *
 * The TransportServer and TransportClientFactory both create a TransportChannelHandler for each channel.
 * As each TransportChannelHandler contains a TransportClient, this enables server processes to send messages back to the client on an existing channel.
 */
public class TransportContext {
  private final Logger logger = LoggerFactory.getLogger(TransportContext.class);
  // See the constructor below for how these members are set
  private final TransportConf conf;
  private final RpcHandler rpcHandler;
  private final boolean closeIdleConnections;
  private final MessageEncoder encoder;
  private final MessageDecoder decoder;
  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
    this(conf, rpcHandler, false);
  }
  /**
   * @param conf the TransportConf instance; its config key names are derived from fromSparkConf's second parameter (the module name), e.g. SPARK_NETWORK_IO_MODE_KEY: spark.shuffle.io.mode, SPARK_NETWORK_IO_SERVERTHREADS_KEY: spark.shuffle.io.serverThreads
   * @param rpcHandler the NettyBlockRpcServer: opens or uploads, per request, arbitrary blocks registered in the BlockManager; each chunk transfer corresponds to one shuffle fetch
   * @param closeIdleConnections false when coming through the two-argument constructor above
   */
  public TransportContext(
      TransportConf conf,
      RpcHandler rpcHandler,
      boolean closeIdleConnections) {
    this.conf = conf; // TransportConf, tied to the SparkConf through its ConfigProvider subclass member
    this.rpcHandler = rpcHandler; // NettyBlockRpcServer
    // a MessageToMessageEncoder, used for outbound events
    this.encoder = new MessageEncoder();
    // a MessageToMessageDecoder, used for inbound events
    this.decoder = new MessageDecoder();
    // false by default
    this.closeIdleConnections = closeIdleConnections;
  }
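To give a feel for what an outbound encoder / inbound decoder pair does on the pipeline, here is a toy sketch. This is not Spark's wire format (MessageEncoder/MessageDecoder handle typed frames with headers); it only illustrates the general idea of length-prefixed framing that such a pair implements, with a made-up class name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy length-prefixed framing: the "encoder" prefixes the payload with its
// length; the "decoder" reads the length back and slices out the payload.
public class FramingDemo {
    static ByteBuffer encode(String msg) {
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        ByteBuffer frame = ByteBuffer.allocate(4 + payload.length);
        frame.putInt(payload.length);
        frame.put(payload);
        frame.flip(); // switch from writing to reading
        return frame;
    }

    static String decode(ByteBuffer frame) {
        int len = frame.getInt();
        byte[] payload = new byte[len];
        frame.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // round-trip: encode a message, then decode it back
        System.out.println(decode(encode("OpenBlocks")));
    }
}
```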
4. Back again in NettyBlockTransferService.init: TransportContext.createClientFactory() creates the TransportClientFactory
override def init(blockDataManager: BlockDataManager): Unit = {
  /** conf.getAppId: app-20180508234845-0000
   * serializer: JavaSerializer()
   * blockDataManager: the BlockManager instance
   * NettyBlockRpcServer: opens or uploads, per request, arbitrary blocks registered in the BlockManager; each chunk transfer corresponds to one shuffle fetch
   */
  val rpcHandler = new NettyBlockRpcServer(conf.getAppId, serializer, blockDataManager)
  var serverBootstrap: Option[TransportServerBootstrap] = None
  var clientBootstrap: Option[TransportClientBootstrap] = None
  if (authEnabled) { // false by default: authentication is not enabled
    serverBootstrap = Some(new SaslServerBootstrap(transportConf, securityManager))
    clientBootstrap = Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager,
      securityManager.isSaslEncryptionEnabled()))
  }
  // TransportContext: the context for creating the TransportServer (the Netty server) and the TransportClientFactory (which creates TransportClients), and for wiring the Netty channel pipeline with a TransportChannelHandler.
  // Instantiating TransportContext assigns its members: conf (a TransportConf, tied to the SparkConf through a ConfigProvider subclass),
  // rpcHandler (the NettyBlockRpcServer), closeIdleConnections (false), plus concrete instances of the outbound encoder and inbound decoder
  transportContext = new TransportContext(transportConf, rpcHandler)
  /** SASL authentication is not enabled, so clientBootstrap is empty
   *
   * TransportClientFactory: this factory creates TransportClients via createClient.
   * It maintains a connection pool to other hosts and returns the same TransportClient for the same remote host. It also shares a single worker thread pool across all TransportClients.
   *
   * Initializes a ClientFactory which runs the given TransportClientBootstraps prior to returning a new client. Bootstraps are executed synchronously and must run successfully for a client to be created.
   * The TransportClientFactory instance is given these members:
   * context: the TransportContext,
   * conf: the TransportConf, tied to the SparkConf through its ConfigProvider subclass member,
   * plus Netty's NioSocketChannel.class, a NioEventLoopGroup thread group, and the pooled ByteBuf allocator PooledByteBufAllocator
   */
  clientFactory = transportContext.createClientFactory(clientBootstrap.toSeq.asJava)
==> TransportContext.createClientFactory simply instantiates the TransportClientFactory
/**
 * Initializes a ClientFactory which runs the given TransportClientBootstraps prior to returning a new Client. Bootstraps will be executed synchronously, and must run successfully in order to create a Client.
 */
public TransportClientFactory createClientFactory(List<TransportClientBootstrap> bootstraps) {
  return new TransportClientFactory(this, bootstraps);
}
===> The constructor stores the TransportContext and TransportConf, resolves the IOMode enum (NIO by default) to NioSocketChannel.class,
creates Netty's EventLoopGroup thread group based on the IOMode, and creates a pooled ByteBuf allocator (PooledByteBufAllocator), assigning them all to member fields.
public TransportClientFactory(
    TransportContext context,
    List<TransportClientBootstrap> clientBootstraps) {
  // ensure the TransportContext is non-null
  this.context = Preconditions.checkNotNull(context);
  // the TransportConf instance; its config key names are derived from the module parameter of SparkTransportConf.fromSparkConf,
  // e.g. SPARK_NETWORK_IO_MODE_KEY: spark.shuffle.io.mode, SPARK_NETWORK_IO_SERVERTHREADS_KEY: spark.shuffle.io.serverThreads
  this.conf = context.getConf();
  // an empty ArrayList
  this.clientBootstraps = Lists.newArrayList(Preconditions.checkNotNull(clientBootstraps));
  // a ConcurrentHashMap
  this.connectionPool = new ConcurrentHashMap<SocketAddress, ClientPool>();
  // numConnectionsPerPeer: spark.shuffle.io.numConnectionsPerPeer has no value in the SparkConf, so the default of 1 is used
  this.numConnectionsPerPeer = conf.numConnectionsPerPeer();
  this.rand = new Random();
  // conf.ioMode(): reads spark.shuffle.io.mode, one of the enum values NIO or EPOLL; here it returns the string "NIO", which becomes the NIO enum value
  IOMode ioMode = IOMode.valueOf(conf.ioMode());
  // for NIO, returns NioSocketChannel.class
  this.socketChannelClass = NettyUtils.getClientChannelClass(ioMode);
  // TODO: Make thread pool name configurable.
  /**
   * conf.clientThreads(): spark.shuffle.io.clientThreads, set when NettyBlockTransferService is initialized ==> SparkTransportConf.fromSparkConf;
   * the ConfigProvider get implementation inside SparkTransportConf boils down to SparkConf.get(SPARK_NETWORK_IO_CLIENTTHREADS_KEY).
   * This spark.shuffle.io.clientThreads corresponds to the CoarseGrainedExecutorBackend's core count; in my setup it is 1.
   *
   * NettyUtils.createEventLoop: creates Netty's EventLoopGroup thread group based on the IOMode enum
   */
  this.workerGroup = NettyUtils.createEventLoop(ioMode, conf.clientThreads(), "shuffle-client");
  /**
   * conf.preferDirectBufs(): looks up spark.shuffle.io.preferDirectBufs; the SparkConf has no such key, so it returns true
   * conf.clientThreads() is 1
   * NettyUtils.createPooledByteBufAllocator(): creates a pooled ByteBuf allocator, PooledByteBufAllocator
   */
  this.pooledAllocator = NettyUtils.createPooledByteBufAllocator(
      conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads());
}
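The connectionPool and numConnectionsPerPeer members set up above drive client reuse later in createClient. This is a toy sketch of that idea, with hypothetical names and plain strings standing in for real TransportClients: each remote address maps to a small fixed-size pool (numConnectionsPerPeer slots, default 1), a random slot is picked, and an existing entry is reused so the same peer gets the same cached client.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of TransportClientFactory's per-peer connection pool: one pool
// per remote address, random slot selection, reuse of cached entries.
public class ClientPoolDemo {
    static final int NUM_CONNECTIONS_PER_PEER = 1; // default when the key is unset
    static final ConcurrentHashMap<String, String[]> connectionPool = new ConcurrentHashMap<>();
    static final Random rand = new Random();

    static String createClient(String remoteAddress) {
        String[] pool = connectionPool.computeIfAbsent(
            remoteAddress, k -> new String[NUM_CONNECTIONS_PER_PEER]);
        int slot = rand.nextInt(NUM_CONNECTIONS_PER_PEER);
        if (pool[slot] == null) {
            pool[slot] = "client-to-" + remoteAddress; // stands in for a real TransportClient
        }
        return pool[slot];
    }

    public static void main(String[] args) {
        String a = createClient("host-a:7337");
        String b = createClient("host-a:7337");
        System.out.println(a == b);               // same peer, same pooled client instance
        System.out.println(connectionPool.size()); // one pool per remote address
    }
}
```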
Back in NettyBlockTransferService.init, the Netty server is created next
(see: spark-core_29: during Executor initialization, env.blockManager.initialize(conf.getAppId) -> NettyBlockTransferService.init() -> source analysis of NettyServer creation)