Spark Streaming Source Code Walkthrough: Executor Fault Tolerance
Executor-side fault tolerance for received data comes in two flavors:
1) write-ahead logging (WAL);
2) relying on the fault-tolerance mechanism built into Spark RDDs themselves (i.e. block replication).
The two approaches correspond to the two implementations of receivedBlockHandler (ReceiverSupervisorImpl.scala, lines 55-68):
```scala
private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
        "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
        "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, env.serializerManager,
      receiver.streamId, receiver.storageLevel, env.conf, hadoopConf,
      checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
```
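The decision logic above can be boiled down to a small, self-contained sketch. The types and the `chooseHandler` helper below are assumptions introduced for illustration (they mirror, but are not, the Spark classes): WAL on plus a checkpoint directory yields the WAL-based handler, WAL on without a checkpoint directory is an error, and WAL off falls back to the BlockManager-based handler.

```scala
// Hypothetical stand-ins for the two ReceivedBlockHandler implementations.
sealed trait BlockHandler
case class WriteAheadLogBasedHandler(checkpointDir: String) extends BlockHandler
case object BlockManagerBasedHandler extends BlockHandler

// Mirrors the branch structure of receivedBlockHandler in ReceiverSupervisorImpl.
def chooseHandler(walEnabled: Boolean, checkpointDir: Option[String]): BlockHandler =
  if (walEnabled) {
    checkpointDir match {
      case Some(dir) => WriteAheadLogBasedHandler(dir)
      case None => throw new IllegalStateException(
        "Cannot enable receiver write-ahead log without checkpoint directory set.")
    }
  } else {
    BlockManagerBasedHandler
  }
```

The key point the real code enforces the same way: enabling the receiver WAL without calling streamingContext.checkpoint() fails fast at receiver startup rather than silently losing durability.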
The first implementation is WriteAheadLogBasedBlockHandler (ReceivedBlockHandler.scala, lines 125-134):
```scala
private[streaming] class WriteAheadLogBasedBlockHandler(
    blockManager: BlockManager,
    serializerManager: SerializerManager,
    streamId: Int,
    storageLevel: StorageLevel,
    conf: SparkConf,
    hadoopConf: Configuration,
    checkpointDir: String,
    clock: Clock = new SystemClock
  )
```
Instantiating it requires the checkpoint directory to be set.
Note: the checkpoint directory usually lives on HDFS, which by default keeps 3 replicas of each file; when the WAL is enabled there is therefore no need to request additional replication in the storageLevel.
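To make the WAL setup concrete, here is a hedged sketch of the driver-side configuration (the app name, HDFS path, host, and port are placeholders; the API calls are the standard StreamingContext ones). It cannot run outside a Spark deployment, so treat it as a configuration fragment:

```scala
val conf = new SparkConf()
  .setAppName("wal-demo") // placeholder name
  // switches receivedBlockHandler to WriteAheadLogBasedBlockHandler
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(5))
// required when the WAL is enabled; HDFS already keeps 3 replicas of the log files
ssc.checkpoint("hdfs:///tmp/checkpoint") // placeholder path
// with the WAL on, a non-replicated storage level (no _2 suffix) is sufficient
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
```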
The second implementation is BlockManagerBasedBlockHandler (ReceivedBlockHandler.scala, lines 69-70):
```scala
private[streaming] class BlockManagerBasedBlockHandler(
    blockManager: BlockManager,
    storageLevel: StorageLevel)
```
When relying on the RDD's own fault tolerance, instantiation is simpler; the important parameter is storageLevel, which for receivers defaults to MEMORY_AND_DISK_SER_2 (serialized, spilling to disk, with 2 replicas).
The available storage levels are all built from the following constructor flags (StorageLevel.scala, lines 39-45, in Spark Core):
```scala
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable
```
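To show how the predefined levels map onto these five flags, here is a simplified plain-Scala mirror (the case class name `SimpleStorageLevel` is an assumption for illustration; the flag combinations follow Spark Core's predefined constants):

```scala
// Simplified mirror of StorageLevel's constructor flags.
case class SimpleStorageLevel(
    useDisk: Boolean,
    useMemory: Boolean,
    useOffHeap: Boolean,
    deserialized: Boolean,
    replication: Int = 1)

val MEMORY_ONLY           = SimpleStorageLevel(false, true, false, true)
val MEMORY_ONLY_SER       = SimpleStorageLevel(false, true, false, false)
val MEMORY_AND_DISK       = SimpleStorageLevel(true,  true, false, true)
// The receiver default: memory plus disk, serialized, 2 replicas.
val MEMORY_AND_DISK_SER_2 = SimpleStorageLevel(true,  true, false, false, 2)
```

Reading the default this way makes the trade-off explicit: the `_2` replication is what provides fault tolerance in the non-WAL path, which is exactly why it becomes redundant once the WAL (itself stored on replicated HDFS) is enabled.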