TorrentBroadcast
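TorrentBroadcast is a subclass of Broadcast and implements Broadcast's getValue(), doUnpersist(), and doDestroy() methods.
The driver splits the serialized object into blocks of a configured size (4 MB by default) and stores them in its BlockManager. When a task running on an executor needs a block that is not in the local BlockManager, the executor fetches it from the BlockManager of the driver or of another executor.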
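To make the block-splitting and local-first lookup concrete, here is a minimal sketch in plain Scala. It does not use Spark's API: the names TorrentSketch, blockify, and fetch are hypothetical, and in-memory Maps stand in for BlockManagers.

```scala
object TorrentSketch {
  // Hypothetical helper, not Spark's API: cut a serialized payload into
  // consecutive chunks of at most blockSize bytes (the last chunk may be shorter).
  def blockify(bytes: Array[Byte], blockSize: Int): Array[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    bytes.grouped(blockSize).toArray
  }

  // Hypothetical lookup: try the local store first, then each remote store in order.
  // Plain Maps stand in for the local and remote BlockManagers.
  def fetch(
      pieceId: Int,
      local: Map[Int, Array[Byte]],
      remotes: Seq[Map[Int, Array[Byte]]]): Option[Array[Byte]] =
    local.get(pieceId).orElse(remotes.flatMap(_.get(pieceId)).headOption)

  def main(args: Array[String]): Unit = {
    val payload = Array.fill[Byte](10)(1)
    val blocks = blockify(payload, 4)
    // Print the size of every block produced from the 10-byte payload.
    println(blocks.map(_.length).mkString(","))
  }
}
```

With a 4 MB block size, the same splitting yields ceil(size / 4 MB) blocks, which is the count a broadcast variable would need to track.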
Source listing with my annotations
Variable initialization
/**
* Value of the broadcast object on executors. This is reconstructed by [[readBroadcastBlock]],
* which builds this value by reading blocks from the driver and/or other executors.
* On the driver, if the value is required, it is read lazily from the block manager.
*
* _value is the value of the broadcast variable on an executor; it is built by reading blocks from the driver and/or other executors
*/
@transient private lazy val _value: T = readBroadcastBlock()
/** The compression codec to use, or None if compression is disabled */
// Codec used to compress the broadcast blocks; None when spark.broadcast.compress is false
@transient private var compressionCodec: Option[CompressionCodec] = _
/** Size of each block. Default value is 4MB. This value is only read by the broadcaster. */
// Size of each block, 4 MB by default; only the broadcaster (the side that writes the blocks) reads this value
@transient private var blockSize: Int = _
private val broadcastId = BroadcastBlockId(id)
/** Total number of blocks this broadcast variable contains. */
// Total number of blocks that make up this broadcast variable, taken from the return value of writeBlocks(obj)
private val numBlocks: Int = writeBlocks(obj)
/** Whether to generate checksum for blocks or not. */
// Whether a checksum is generated for each block
private var checksumEnabled: Boolean = false
/** The checksum for all the blocks. */
// Checksums of all the blocks
private var checksums: Array[Int] = _
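The listing above only declares the checksum fields, so the following is a rough sketch of how a per-block checksum could be computed and verified. It assumes Adler-32 (java.util.zip.Adler32) as the algorithm, which the listing does not confirm, and the names ChecksumSketch, checksum, and verify are hypothetical.

```scala
import java.util.zip.Adler32

object ChecksumSketch {
  // Assumption: Adler-32 is used here for illustration; the listing above
  // does not say which algorithm produces the Int checksums.
  def checksum(block: Array[Byte]): Int = {
    val adler = new Adler32()
    adler.update(block, 0, block.length)
    adler.getValue.toInt
  }

  // A receiver recomputes the checksum and compares it with the expected one.
  def verify(block: Array[Byte], expected: Int): Boolean =
    checksum(block) == expected

  def main(args: Array[String]): Unit = {
    val block = "broadcast piece".getBytes("UTF-8")
    val sum = checksum(block)
    val corrupted = block.clone()
    corrupted(0) = (corrupted(0) + 1).toByte
    // Print whether the intact block and the corrupted copy pass verification.
    println(s"intact=${verify(block, sum)} corrupted=${verify(corrupted, sum)}")
  }
}
```

Storing one Int per block, as the checksums array does, lets a reader detect a corrupted block right after fetching it instead of failing later during deserialization.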
The values of compressionCodec, blockSize, and checksumEnabled are read from the configuration parameters in conf.
private def setConf(conf: SparkConf) {
  compressionCodec = if (conf.getBoolean("spark.broadcast.compress", true)) {
    Some(CompressionCodec.createCodec(conf))
  } else {
    None
  }
  // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
  blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
  checksumEnabled = conf.getBoolean("spark.broadcast.checksum", true)
}
setConf(SparkEnv.get.conf)
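The blockSize arithmetic in setConf can be checked by hand with a small sketch. The sizeAsKb function below is a simplified, hypothetical stand-in for SparkConf.getSizeAsKb: it handles only the k, m, and g suffixes and, like the original, treats a value without a unit as kibibytes.

```scala
object BlockSizeSketch {
  // Simplified, hypothetical stand-in for SparkConf.getSizeAsKb.
  // Supports only the suffixes k, m, g; a bare number is taken as kibibytes.
  def sizeAsKb(s: String): Long = {
    val t = s.trim.toLowerCase
    require(t.nonEmpty, "size string must not be empty")
    t.last match {
      case 'k' => t.init.toLong
      case 'm' => t.init.toLong * 1024
      case 'g' => t.init.toLong * 1024 * 1024
      case c if c.isDigit => t.toLong // no unit given: interpret as kibibytes
      case _ => throw new IllegalArgumentException(s"Unsupported size: $s")
    }
  }

  // Same arithmetic as setConf: a size in KB converted to bytes.
  def blockSizeBytes(s: String): Int = sizeAsKb(s).toInt * 1024

  def main(args: Array[String]): Unit = {
    // Print the block size in bytes for a suffixed and an unsuffixed setting.
    println(blockSizeBytes("4m"))
    println(blockSizeBytes("4096"))
  }
}
```

Under these assumptions the default "4m" becomes 4096 KB and then 4194304 bytes, and a setting of "4096" with no unit gives the same block size. This is the compatibility case the Note comment in setConf refers to.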
Destroying the broadcast variable
/**
* Divide the object