Spark is a distributed computing framework: when we submit a job, it is split into multiple tasks that are dispatched to the nodes of the cluster for execution. This raises a question: how does Spark pass messages, how does it distribute tasks to the nodes, and how does it gather the computed results back together?
In the Spark 1.x codebase examined here, Spark internally uses Akka for message passing, heartbeat reporting and the like, and uses Netty to provide the RPC service for uploading and downloading data. Flink takes a similar approach.
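To give a feel for the actor-style message passing, here is a generic Akka sketch; the Heartbeat message and actor names are invented for illustration and are not Spark's actual protocol:

import akka.actor.{Actor, ActorSystem, Props}

// A made-up message type; Spark's real protocol messages are richer.
case class Heartbeat(executorId: String)

// A toy driver-side actor that receives heartbeats from executors.
class DriverActor extends Actor {
  def receive = {
    case Heartbeat(id) => println(s"heartbeat from executor $id")
  }
}

val system = ActorSystem("sketch")
val driver = system.actorOf(Props[DriverActor], "driver")
driver ! Heartbeat("exec-1") // asynchronous, fire-and-forget send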
The block manager BlockManager is the core component of Spark's storage subsystem; one is created on the Driver and on each Executor. When a BlockManager is constructed, it initializes a number of components, as the following source shows:
private[spark] class BlockManager(
    executorId: String,
    actorSystem: ActorSystem,
    val master: BlockManagerMaster,
    defaultSerializer: Serializer,
    maxMemory: Long,
    val conf: SparkConf,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with Logging {

  // Disk block manager
  val diskBlockManager = new DiskBlockManager(this, conf)

  private val blockInfo = new TimeStampedHashMap[BlockId, BlockInfo]

  // Actual storage of where blocks are kept
  private var tachyonInitialized = false
  // In-memory store
  private[spark] val memoryStore = new MemoryStore(this, maxMemory)
  // On-disk store
  private[spark] val diskStore = new DiskStore(this, diskBlockManager)
  // Tachyon store
  private[spark] lazy val tachyonStore: TachyonStore = {
    val storeDir = conf.get("spark.tachyonStore.baseDir", "/tmp_spark_tachyon")
    val appFolderName = conf.get("spark.tachyonStore.folderName")
    val tachyonStorePath = s"$storeDir/$appFolderName/${this.executorId}"
    val tachyonMaster = conf.get("spark.tachyonStore.url", "tachyon://localhost:19998")
    val tachyonBlockManager =
      new TachyonBlockManager(this, tachyonStorePath, tachyonMaster)
    tachyonInitialized = true
    new TachyonStore(this, tachyonBlockManager)
  }

  private[spark]
  val externalShuffleServiceEnabled = conf.getBoolean("spark.shuffle.service.enabled", false)

  // Port used by the external shuffle service. In Yarn mode, this may already be
  // set through the Hadoop configuration as the server is launched in the Yarn NM.
  private val externalShuffleServicePort =
    Utils.getSparkOrYarnConfig(conf, "spark.shuffle.service.port", "7337").toInt

  // Check that we're not using external shuffle service with consolidated shuffle files.
  if (externalShuffleServiceEnabled
      && conf.getBoolean("spark.shuffle.consolidateFiles", false)
      && shuffleManager.isInstanceOf[HashShuffleManager]) {
    throw new UnsupportedOperationException("Cannot use external shuffle service with consolidated"
      + " shuffle files in hash-based shuffle. Please disable spark.shuffle.consolidateFiles or "
      + " switch to sort-based shuffle.")
  }

  var blockManagerId: BlockManagerId = _

  // Address of the server that serves this executor's shuffle files. This is either an external
  // service, or just our own Executor's BlockManager.
  private[spark] var shuffleServerId: BlockManagerId = _

  // Client to read other executors' shuffle files. This is either an external service, or just the
  // standard BlockTransferService to directly connect to other Executors.
  // Create the shuffleClient: if an external shuffle service is enabled, create an
  // ExternalShuffleClient; otherwise fall back to the blockTransferService itself.
  private[spark] val shuffleClient = if (externalShuffleServiceEnabled) {
    val transConf = SparkTransportConf.fromSparkConf(conf, numUsableCores)
    new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled())
  } else {
    blockTransferService
  }

  // Whether to compress broadcast variables that are stored
  private val compressBroadcast = conf.getBoolean("spark.broadcast.compress", true)
  // Whether to compress shuffle output that is stored
  private val compressShuffle = conf.getBoolean("spark.shuffle.compress", true)
  // Whether to compress RDD partitions that are stored serialized
  private val compressRdds = conf.getBoolean("spark.rdd.compress", false)
  // Whether to compress shuffle output temporarily spilled to disk
  private val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)

  private val slaveActor = actorSystem.actorOf(
    Props(new BlockManagerSlaveActor(this, mapOutputTracker)),
    name = "BlockManagerActor" + BlockManager.ID_GENERATOR.next)

  // Pending re-registration action being executed asynchronously or null if none is pending.
  // Accesses should synchronize on asyncReregisterLock.
  private var asyncReregisterTask: Future[Unit] = null
  private val asyncReregisterLock = new Object

  // Cleaner for non-broadcast blocks
  private val metadataCleaner = new MetadataCleaner(
    MetadataCleanerType.BLOCK_MANAGER, this.dropOldNonBroadcastBlocks, conf)
  // Cleaner for broadcast blocks
  private val broadcastCleaner = new MetadataCleaner(
    MetadataCleanerType.BROADCAST_VARS, this.dropOldBroadcastBlocks, conf)

  // Field related to peer block managers that are necessary for block replication
  @volatile private var cachedPeers: Seq[BlockManagerId] = _
  private val peerFetchLock = new Object
  private var lastPeerFetchTime = 0L

  /* The compression codec to use. Note that the "lazy" val is necessary because we want to delay
   * the initialization of the compression codec until it is first used. The reason is that a Spark
   * program could be using a user-defined codec in a third party jar, which is loaded in
   * Executor.updateDependencies. When the BlockManager is initialized, user level jars haven't been
   * loaded yet. */
  // Compression codec
  private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf)
The constructor above initializes the following components:
- The shuffle client shuffleClient
- BlockManagerMaster, which centrally manages the BlockManagers on all Executors
- The disk block manager DiskBlockManager
- The memory store MemoryStore
- The disk store DiskStore
- The Tachyon store TachyonStore
- The cleaner for non-broadcast blocks, metadataCleaner, and the cleaner for broadcast blocks, broadcastCleaner
- The compression codec implementation compressionCodec

When shuffleClient is initialized, the code checks whether the external shuffle service is enabled: if so, it creates an ExternalShuffleClient; otherwise it simply reuses the blockTransferService passed into the constructor.
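As a quick illustration of that switch, here is a minimal configuration sketch; the keys are the ones read in the source above, and the values are examples:

import org.apache.spark.SparkConf

// With this setting the BlockManager builds an ExternalShuffleClient;
// without it, shuffleClient is just the blockTransferService itself.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.service.port", "7337") // default port per the source above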
BlockTransferService is an abstract class whose default implementation is NettyBlockTransferService. Entering that class's init() method, the source is as follows:
override def init(blockDataManager: BlockDataManager): Unit = {
  val (rpcHandler: RpcHandler, bootstrap: Option[TransportClientBootstrap]) = {
    // Create the RPC handler that serves block data
    val nettyRpcHandler = new NettyBlockRpcServer(serializer, blockDataManager)
    if (!authEnabled) {
      (nettyRpcHandler, None)
    } else {
      (new SaslRpcHandler(nettyRpcHandler, securityManager),
        Some(new SaslClientBootstrap(transportConf, conf.getAppId, securityManager)))
    }
  }
  // Build the TransportContext
  transportContext = new TransportContext(transportConf, rpcHandler)
  // Create the RPC client factory TransportClientFactory
  clientFactory = transportContext.createClientFactory(bootstrap.toList)
  // Create the Netty server TransportServer
  server = transportContext.createServer(conf.getInt("spark.blockManager.port", 0))
  appId = conf.getAppId
  logInfo("Server created on " + server.getPort)
}
Its initialization performs the following tasks:
- Create the RPC server NettyBlockRpcServer
- Build the TransportContext
- Build the RPC client factory TransportClientFactory
- Create the Netty server TransportServer
Block RPC service
When a map task and a reduce task run on different nodes, the reduce side has to download the map task's intermediate output from the remote node, so NettyBlockRpcServer is needed: it serves block downloads. In addition, for fault tolerance we sometimes need to replicate a block's data to other nodes, so it also supports block uploads. The source is as follows:
class NettyBlockRpcServer(
    serializer: Serializer,
    blockManager: BlockDataManager)
  extends RpcHandler with Logging {

  private val streamManager = new OneForOneStreamManager()

  override def receive(
      client: TransportClient,
      messageBytes: Array[Byte],
      responseContext: RpcResponseCallback): Unit = {
    val message = BlockTransferMessage.Decoder.fromByteArray(messageBytes)
    logTrace(s"Received request: $message")

    message match {
      case openBlocks: OpenBlocks =>
        val blocks: Seq[ManagedBuffer] =
          openBlocks.blockIds.map(BlockId.apply).map(blockManager.getBlockData)
        val streamId = streamManager.registerStream(blocks.iterator)
        logTrace(s"Registered streamId $streamId with ${blocks.size} buffers")
        responseContext.onSuccess(new StreamHandle(streamId, blocks.size).toByteArray)

      case uploadBlock: UploadBlock =>
        // StorageLevel is serialized as bytes using our JavaSerializer.
        val level: StorageLevel =
          serializer.newInstance().deserialize(ByteBuffer.wrap(uploadBlock.metadata))
        val data = new NioManagedBuffer(ByteBuffer.wrap(uploadBlock.blockData))
        blockManager.putBlockData(BlockId(uploadBlock.blockId), data, level)
        responseContext.onSuccess(new Array[Byte](0))
    }
  }

  override def getStreamManager(): StreamManager = streamManager
}
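To make the round trip concrete, here is a minimal sketch of what a client does against this handler; it is essentially what OneForOneBlockFetcher does internally. The appId/execId/block ID values are placeholders, chunkCallback is an assumed ChunkReceivedCallback, and error handling is elided:

// Sketch: ask the server to open a stream over one block, then fetch chunk 0.
val open = new OpenBlocks("app-id", "exec-1", Array("shuffle_0_0_0"))
client.sendRpc(open.toByteArray, new RpcResponseCallback {
  override def onSuccess(response: Array[Byte]): Unit = {
    // The server replies with a StreamHandle (see the OpenBlocks case above).
    val handle = BlockTransferMessage.Decoder
      .fromByteArray(response).asInstanceOf[StreamHandle]
    // Each opened block is now addressable as one chunk of the registered stream.
    client.fetchChunk(handle.streamId, 0, chunkCallback)
  }
  override def onFailure(e: Throwable): Unit = { /* report or retry */ }
})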
Transport context: TransportContext
TransportContext maintains the transport context; it can create both the Netty server and the Netty client. Its constructor is as follows:
public class TransportContext {
  private final Logger logger = LoggerFactory.getLogger(TransportContext.class);

  // Controls, among other things, the number of client and server threads
  // Netty uses for shuffle IO
  private final TransportConf conf;
  // On the shuffle IO server side, handles a client's RPC request by opening
  // blocks for download or accepting uploaded blocks
  private final RpcHandler rpcHandler;
  // Encodes outgoing messages into the wire format, so the remote side can
  // parse them without loss or corruption
  private final MessageEncoder encoder;
  // Decodes incoming messages from the wire format
  private final MessageDecoder decoder;

  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
    this.conf = conf;
    this.rpcHandler = rpcHandler;
    this.encoder = new MessageEncoder();
    this.decoder = new MessageDecoder();
  }
RPC client factory: TransportClientFactory
TransportClientFactory is the factory for Netty clients. The clients it creates connect to the TransportServer inside a remote Executor's BlockManager and use its RPC service to download or upload blocks.
public TransportClientFactory(
    TransportContext context,
    // Bootstraps to run on every newly created client connection
    List<TransportClientBootstrap> clientBootstraps) {
  this.context = Preconditions.checkNotNull(context);
  this.conf = context.getConf();
  this.clientBootstraps = Lists.newArrayList(Preconditions.checkNotNull(clientBootstraps));
  // Cache of client connections, keyed by remote address
  this.connectionPool = new ConcurrentHashMap<SocketAddress, ClientPool>();
  // Number of pooled connections between a pair of nodes, configurable via
  // spark.shuffle.io.numConnectionsPerPeer (default 1)
  this.numConnectionsPerPeer = conf.numConnectionsPerPeer();
  this.rand = new Random();

  IOMode ioMode = IOMode.valueOf(conf.ioMode());
  // Channel class used when client channels are created, configurable via
  // spark.shuffle.io.mode (default NioSocketChannel)
  this.socketChannelClass = NettyUtils.getClientChannelClass(ioMode);
  // TODO: Make thread pool name configurable.
  this.workerGroup = NettyUtils.createEventLoop(ioMode, conf.clientThreads(), "shuffle-client");
  // Pooled allocator with thread-local caching disabled
  this.pooledAllocator = NettyUtils.createPooledByteBufAllocator(
      conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads());
}
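The two tunables mentioned in the comments can be set like any other Spark configuration; a minimal sketch with example values:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.io.numConnectionsPerPeer", "2") // pooled connections per remote node
  .set("spark.shuffle.io.mode", "nio")                // IO mode; "nio" maps to NioSocketChannel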
Netty server: TransportServer
TransportServer is the server through which remote nodes reach the RPC service of the local Executor's BlockManager to download or upload blocks.
/** Creates a TransportServer that binds to the given port, or to any available if 0. */
public TransportServer(TransportContext context, int portToBind) {
  this.context = context;
  this.conf = context.getConf();
  init(portToBind);
}
As the code above shows, the constructor calls init(portToBind); this is essentially standard Netty server initialization, so we won't walk through it in detail here.
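For readers unfamiliar with Netty, the following is a generic sketch of what such an initialization looks like. This is the plain Netty API, not Spark's actual init(); Spark additionally installs its MessageEncoder/MessageDecoder and RPC handler into the channel pipeline:

import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.ChannelInitializer
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioServerSocketChannel

val bossGroup = new NioEventLoopGroup(1)  // accepts incoming connections
val workerGroup = new NioEventLoopGroup() // handles established channels
val bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(classOf[NioServerSocketChannel])
  .childHandler(new ChannelInitializer[SocketChannel] {
    override def initChannel(ch: SocketChannel): Unit = {
      // Spark's TransportContext would add the encoder, decoder and
      // transport handler to ch.pipeline() here.
    }
  })
// Binding to port 0 asks the OS for any available port, matching the
// "or to any available if 0" comment in the constructor above.
val channel = bootstrap.bind(0).sync().channel()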
Fetching shuffle files remotely
NettyBlockTransferService's fetchBlocks method is used to fetch shuffle files from a remote node:
override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener): Unit = {
  logTrace(s"Fetch blocks from $host:$port (executor id $execId)")
  try {
    val blockFetchStarter = new RetryingBlockFetcher.BlockFetchStarter {
      override def createAndStart(blockIds: Array[String], listener: BlockFetchingListener) {
        val client = clientFactory.createClient(host, port)
        new OneForOneBlockFetcher(client, appId, execId, blockIds.toArray, listener).start()
      }
    }

    val maxRetries = transportConf.maxIORetries()
    if (maxRetries > 0) {
      // Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
      // a bug in this code. We should remove the if statement once we're sure of the stability.
      new RetryingBlockFetcher(transportConf, blockFetchStarter, blockIds, listener).start()
    } else {
      blockFetchStarter.createAndStart(blockIds, listener)
    }
  } catch {
    case e: Exception =>
      logError("Exception while beginning fetchBlocks", e)
      blockIds.foreach(listener.onBlockFetchFailure(_, e))
  }
}
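Callers receive results asynchronously through the BlockFetchingListener callbacks. A minimal usage sketch, assuming transferService is an initialized NettyBlockTransferService and using placeholder host, port and IDs:

// Fetch one shuffle block and react to the per-block callbacks.
transferService.fetchBlocks("worker-host", 7337, "exec-1",
  Array("shuffle_0_0_0"),
  new BlockFetchingListener {
    override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = {
      // data.nioByteBuffer() or data.createInputStream() exposes the bytes.
      println(s"fetched $blockId (${data.size()} bytes)")
    }
    override def onBlockFetchFailure(blockId: String, e: Throwable): Unit = {
      println(s"failed to fetch $blockId: $e")
    }
  })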
Uploading shuffle files
NettyBlockTransferService's uploadBlock method uploads a shuffle file to a remote executor. It first creates a client connected to the chosen BlockManager via hostname and port, serializes the block's storage level, copies the block data into a byte array, and sends both in a single UploadBlock RPC. The source is as follows:
override def uploadBlock(
    hostname: String,
    port: Int,
    execId: String,
    blockId: BlockId,
    blockData: ManagedBuffer,
    level: StorageLevel): Future[Unit] = {
  val result = Promise[Unit]()
  val client = clientFactory.createClient(hostname, port)

  // StorageLevel is serialized as bytes using our JavaSerializer. Everything else is encoded
  // using our binary protocol.
  val levelBytes = serializer.newInstance().serialize(level).array()

  // Convert or copy nio buffer into array in order to serialize it.
  val nioBuffer = blockData.nioByteBuffer()
  val array = if (nioBuffer.hasArray) {
    nioBuffer.array()
  } else {
    val data = new Array[Byte](nioBuffer.remaining())
    nioBuffer.get(data)
    data
  }

  client.sendRpc(new UploadBlock(appId, execId, blockId.toString, levelBytes, array).toByteArray,
    new RpcResponseCallback {
      override def onSuccess(response: Array[Byte]): Unit = {
        logTrace(s"Successfully uploaded block $blockId")
        result.success()
      }

      override def onFailure(e: Throwable): Unit = {
        logError(s"Error while uploading block $blockId", e)
        result.failure(e)
      }
    })

  result.future
}
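Since uploadBlock returns a Future, a caller that needs the replication to complete must wait on it. A minimal sketch, assuming transferService is an initialized NettyBlockTransferService and buffer is a ManagedBuffer holding the block; host, port and IDs are placeholders:

import scala.concurrent.Await
import scala.concurrent.duration._

// Upload one cached partition to a peer and block until it is acknowledged.
val upload = transferService.uploadBlock("worker-host", 7337, "exec-1",
  BlockId("rdd_0_0"), buffer, StorageLevel.MEMORY_AND_DISK_SER)
Await.result(upload, 30.seconds) // rethrows if the RPC failed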