SparkEnv
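Source version: Spark 2.4.7
SparkContext has a core field, _env, of type SparkEnv. It holds the environment objects of a running Spark instance: the serializers, RpcEnv, BlockManager, MapOutputTracker (which records shuffle output), and so on.
First, look at its basic fields: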
class SparkEnv (
    val executorId: String,
    private[spark] val rpcEnv: RpcEnv,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val serializerManager: SerializerManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleManager: ShuffleManager,
    val broadcastManager: BroadcastManager,
    val blockManager: BlockManager,
    val securityManager: SecurityManager,
    val metricsSystem: MetricsSystem,
    val memoryManager: MemoryManager,
    val outputCommitCoordinator: OutputCommitCoordinator,
    val conf: SparkConf)
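To make the role of such a component bundle concrete, here is a minimal sketch of the holder pattern. It is modelled on the set/get pair in the SparkEnv companion object, but ToyEnv, ToySerializer and ToyBlockManager are hypothetical stand-ins invented for illustration, not Spark classes.

```scala
// Toy stand-ins for two of the components; the real ones are far richer.
final case class ToySerializer(name: String)
final case class ToyBlockManager(executorId: String)

// Hypothetical, trimmed-down analogue of the SparkEnv class above.
final class ToyEnv(
    val executorId: String,
    val serializer: ToySerializer,
    val blockManager: ToyBlockManager)

// Process-wide holder: one environment per JVM, published through set and read through get.
object ToyEnv {
  @volatile private var env: ToyEnv = null

  def set(e: ToyEnv): Unit = { env = e }
  def get: ToyEnv = env
}

object ToyEnvDemo {
  def main(args: Array[String]): Unit = {
    // "driver" mirrors the value of SparkContext.DRIVER_IDENTIFIER.
    ToyEnv.set(new ToyEnv("driver", ToySerializer("java"), ToyBlockManager("driver")))
    // Any code in the same JVM can now reach the components without passing them around.
    println(ToyEnv.get.serializer.name)
  }
}
```

The point of the pattern is that tasks and internal services can look up the serializer or block manager from anywhere in the process, at the cost of relying on global mutable state.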
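RpcEnv: the execution environment through which the components communicate with each other.
SerializerManager: configures serialization, compression, and encryption for data that components send over the network or write to storage.
MapOutputTracker: tracks the output status of map-stage tasks, so that reduce-stage tasks can look up where the intermediate results are and fetch them.
ShuffleManager: manages shuffle operations on local and remote block data, supplying the writers and readers that map and reduce tasks use.
BroadcastManager: manages broadcast variables, through which configuration and serialized RDD, job, and ShuffleDependency data are shipped to each node and stored locally.
BlockManager: manages blocks and handles data reads and writes across the whole Spark runtime.
SecurityManager: configures and manages accounts, permissions, and authentication.
MemoryManager: an abstract memory manager that enforces how memory is shared between execution and storage.
MetricsSystem: the system that collects and reports runtime metrics.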
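As an illustration of the bookkeeping described for MapOutputTracker, the following is a simplified sketch, not Spark's implementation. ToyMapStatus, ToyMapOutputTracker and all method names are hypothetical. It assumes each finished map task reports one location and one byte count per reduce partition.

```scala
import scala.collection.mutable

// Hypothetical, simplified record of one finished map task:
// where its output lives and how many bytes it wrote for each reduce partition.
final case class ToyMapStatus(location: String, sizes: Array[Long])

// Hypothetical tracker; names and signatures are illustrative, not Spark's API.
final class ToyMapOutputTracker {
  // shuffleId -> one slot per map task, filled in as tasks finish
  private val statuses = mutable.Map.empty[Int, Array[Option[ToyMapStatus]]]

  def registerShuffle(shuffleId: Int, numMaps: Int): Unit =
    statuses(shuffleId) = Array.fill[Option[ToyMapStatus]](numMaps)(None)

  def registerMapOutput(shuffleId: Int, mapId: Int, status: ToyMapStatus): Unit =
    statuses(shuffleId)(mapId) = Some(status)

  // For one reduce partition, list (location, mapId, bytes) for every non-empty map output.
  def outputsFor(shuffleId: Int, reduceId: Int): Seq[(String, Int, Long)] = {
    val perMap = statuses(shuffleId)
    require(perMap.forall(_.isDefined), s"shuffle $shuffleId has missing map outputs")
    perMap.toSeq.zipWithIndex.collect {
      case (Some(s), mapId) if s.sizes(reduceId) > 0 => (s.location, mapId, s.sizes(reduceId))
    }
  }
}
```

A reduce task for partition r would call outputsFor(shuffleId, r) and fetch only from the locations returned, skipping map outputs that wrote nothing for that partition.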
How SparkEnv is created
/**
 * Helper method to create a SparkEnv for a driver or an executor.
 */
private def create(
    conf: SparkConf,
    executorId: String,
    bindAddress: String,
    advertiseAddress: String,
    port: Option[Int],
    isLocal: Boolean,
    numUsableCores: Int,
    ioEncryptionKey: Option[Array[Byte]],
    listenerBus: LiveListenerBus = null,
    mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv = {
  val isDriver = executorId == SparkContext.DRIVER_IDENTIFIER
  // Listener bus is only used on the driver
  if (isDriver) {
    assert(listenerBus != null, "Attempted to create driver SparkEnv with null listener bus!")
  }
  val securityManager = new SecurityManager(conf, ioEncryptionKey)
  if (isDriver) {
    securityManager.initializeAuth()
  }
  ioEncryptionKey.foreach { _ =>
    if (!securityManager.isEncryptionEnabled()) {
      logWarning("I/O encryption enabled without RPC encryption: keys will be visible on the " +
        "wire.")
    }
  }
  val systemName = if (isDriver) driverSystemName else executorSystemName
  val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port.getOrElse(-1), conf,
    securityManager, numUsableCores, !isDriver)
  // Figure out which port RpcEnv actually bound to in case the original port is 0 or occupied.
  if (isDriver) {
    conf.set("spark.driver.port", rpcEnv.address.port.toString)
  }
  // Create an instance of the class with the given name, possibly initializing it with our conf
  def instantiateClass[