As the previous sections showed, Streaming continuously receives data while continuously generating jobs and submitting them to the cluster for execution.
Fault tolerance has two parts: fault tolerance of the data, and fault tolerance of the runtime computation.
As is well known, Streaming runs on top of Spark Core, and Spark Core's RDDs come with a very strong fault-tolerance mechanism, so Streaming can rely on Spark Core for runtime safety.
There is a prerequisite, however: the runtime fault-tolerance machinery can only do its job if the data itself is safe. So how is data safety guaranteed? Concretely, this means fault tolerance for the data received on the executors. Fault tolerance of the scheduling built on top of that safe data is mostly delegated to Spark Core, although Spark Streaming's own driver also contributes a part. Two mechanisms are involved:
- WAL (write-ahead log) on the executor
- Data replay at the source
For data fault tolerance, the simplest approach that comes to mind is to automatically keep several replicas of the data as it is received, and to fall back to another replica when the current one fails.
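In Spark Streaming this replication is exactly what the receiver's storage level provides. A minimal sketch, assuming a local socket source (host and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("replicated-receiver")
val ssc = new StreamingContext(conf, Seconds(10))
// the trailing _2 in the storage level asks for two replicas of every received
// block; MEMORY_AND_DISK_SER_2 is in fact the default for socketTextStream
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)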
The other approach is a data source that supports replay, meaning that already-delivered data can be read again. For example, if the last 10 seconds of data were read but the compute stage then failed, those 10 seconds can simply be read once more.
Take SocketReceiver as the example again:
// SocketInputDStream.scala line 76
while (!isStopped && iterator.hasNext) {
  store(iterator.next)
}
store hands the record to the supervisor's pushSingle:
// Receiver.scala line 118
def store(dataItem: T) {
  supervisor.pushSingle(dataItem)
}
pushSingle then calls the BlockGenerator:
// ReceiverSupervisorImpl.scala line 118
def pushSingle(data: Any) {
  defaultBlockGenerator.addData(data)
}
addData appends the record to the current buffer:
// BlockGenerator.scala line 160 spark 1.6.0
def addData(data: Any): Unit = {
  if (state == Active) {
    waitToPush()
    synchronized {
      if (state == Active) {
        currentBuffer += data
      } else {
        throw new SparkException(
          "Cannot add data as BlockGenerator has not been started or has been stopped")
      }
    }
  } else {
    throw new SparkException(
      "Cannot add data as BlockGenerator has not been started or has been stopped")
  }
}
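The waitToPush() call above comes from RateLimiter, which BlockGenerator extends: it blocks the receiving thread whenever the configured maximum ingest rate is exceeded. A sketch of the related configuration (the values are illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.receiver.maxRate", "10000")    // records per second, per receiver
  .set("spark.streaming.backpressure.enabled", "true") // let batch feedback adjust the rate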
Meanwhile, the timer created when the BlockGenerator was instantiated periodically moves the buffered data into the pending-push queue blocksForPushing:
// BlockGenerator.scala line 234
var newBlock: Block = null
synchronized {
  if (currentBuffer.nonEmpty) {
    val newBlockBuffer = currentBuffer
    currentBuffer = new ArrayBuffer[Any]
    val blockId = StreamBlockId(receiverId, time - blockIntervalMs)
    listener.onGenerateBlock(blockId)
    newBlock = new Block(blockId, newBlockBuffer)
  }
}
if (newBlock != null) {
  // blocksForPushing is a bounded blocking queue: put blocks when the queue is full
  blocksForPushing.put(newBlock)
}
At the same time, the block-pushing thread created when the BlockGenerator was instantiated keeps taking blocks off the queue and notifies the listener, which pushes them toward the BlockManager:
// BlockGenerator.scala line 109
private val blockPushingThread = new Thread() { override def run() { keepPushingBlocks() } }
// BlockGenerator.scala line 166
while (areBlocksBeingGenerated) {
  Option(blocksForPushing.poll(10, TimeUnit.MILLISECONDS)) match {
    case Some(block) => pushBlock(block)
    case None =>
  }
}
// BlockGenerator.scala line 295
private def pushBlock(block: Block) {
  listener.onPushBlock(block.id, block.buffer)
  logInfo("Pushed block " + block.id)
}
// ReceiverSupervisorImpl.scala line 108
def onPushBlock(blockId: StreamBlockId, arrayBuffer: ArrayBuffer[_]) {
  pushArrayBuffer(arrayBuffer, None, Some(blockId))
}
// ReceiverSupervisorImpl.scala line 123
def pushArrayBuffer(
    arrayBuffer: ArrayBuffer[_],
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  pushAndReportBlock(ArrayBufferBlock(arrayBuffer), metadataOption, blockIdOption)
}
pushAndReportBlock stores the block through receivedBlockHandler, whose concrete implementation depends on configuration; assume here that it is the WAL-based one.
// ReceiverSupervisorImpl.scala line 150
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  // report the stored block's metadata from the remote worker to the driver via RPC
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}
// ReceiverSupervisorImpl.scala line 53
private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
        "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
        "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
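WriteAheadLogUtils.enableReceiverLog reads spark.streaming.receiver.writeAheadLog.enable, so taking the WAL branch above requires configuration along these lines (the checkpoint path is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // required, as the exception above enforces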
Now the WriteAheadLogBasedBlockHandler implementation.
As you can see, in this case the data is stored into the BlockManager and written to the WAL at the same time. (The receiver's default storage level is MEMORY_AND_DISK_SER_2; with the WAL enabled, the handler lowers the replication factor to 1, the effectiveStorageLevel below, since the log already provides durability.)
Finally it returns a BlockStoreResult carrying the block's metadata:
// ReceivedBlockHandler.scala line 166 (class WriteAheadLogBasedBlockHandler)
def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {
  var numRecords = None: Option[Long]
  // Serialize the block so that it can be inserted into both
  val serializedBlock = block match {
    // here it is an ArrayBufferBlock; see ReceiverSupervisorImpl.scala line 123
    case ArrayBufferBlock(arrayBuffer) =>
      numRecords = Some(arrayBuffer.size.toLong)
      // serialized by Spark Core's BlockManager
      blockManager.dataSerialize(blockId, arrayBuffer.iterator)
    case IteratorBlock(iterator) =>
      val countIterator = new CountingIterator(iterator)
      val serializedBlock = blockManager.dataSerialize(blockId, countIterator)
      numRecords = countIterator.count
      serializedBlock
    case ByteBufferBlock(byteBuffer) =>
      byteBuffer
    case _ =>
      throw new Exception(s"Could not push $blockId to block manager, unexpected block type")
  }
  // Store the block in Spark Core's BlockManager
  val storeInBlockManagerFuture = Future {
    val putResult =
      blockManager.putBytes(blockId, serializedBlock, effectiveStorageLevel, tellMaster = true)
    if (!putResult.map { _._1 }.contains(blockId)) {
      throw new SparkException(
        s"Could not store $blockId to block manager with storage level $storageLevel")
    }
  }
  // Store the block in the write-ahead log; note that the data itself is written here
  val storeInWriteAheadLogFuture = Future {
    writeAheadLog.write(serializedBlock, clock.getTimeMillis())
  }
  // Combine the futures, wait for both to complete, and return the write ahead log record handle
  val combinedFuture = storeInBlockManagerFuture.zip(storeInWriteAheadLogFuture).map(_._2)
  val walRecordHandle = Await.result(combinedFuture, blockStoreTimeout)
  WriteAheadLogBasedStoreResult(blockId, numRecords, walRecordHandle)
}
The ReceiverTracker on the driver receives the AddBlock message:
// ReceiverTracker.scala line 495
case AddBlock(receivedBlockInfo) =>
  if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
    walBatchingThreadPool.execute(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        if (active) {
          // add the receivedBlockInfo metadata asynchronously
          context.reply(addBlock(receivedBlockInfo))
        } else {
          throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
        }
      }
    })
  } else {
    context.reply(addBlock(receivedBlockInfo))
  }
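Whether these AddBlock writes are batched on the driver is governed by spark.streaming.driver.writeAheadLog.allowBatching (in Spark 1.6 batching is on by default on the driver):

import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.streaming.driver.writeAheadLog.allowBatching", "true")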
// ReceiverTracker.scala line 320
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  receivedBlockTracker.addBlock(receivedBlockInfo)
}
// ReceivedBlockTracker.scala line 84
/** Add received block. This event will get written to the write ahead log (if enabled). */
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    // write the metadata to the WAL
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}
When recovery is needed, the state is rebuilt on the driver, according to whether a checkpoint exists:
// ReceivedBlockTracker.scala
private def recoverPastEvents(): Unit = synchronized {
  // Insert the recovered block information
  def insertAddedBlock(receivedBlockInfo: ReceivedBlockInfo) {
    logTrace(s"Recovery: Inserting added block $receivedBlockInfo")
    receivedBlockInfo.setBlockIdInvalid()
    getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
  }

  // Insert the recovered block-to-batch allocations and clear the queue of received blocks
  // (when the blocks were originally allocated to the batch, the queue must have been cleared).
  def insertAllocatedBatch(batchTime: Time, allocatedBlocks: AllocatedBlocks) {
    logTrace(s"Recovery: Inserting allocated batch for time $batchTime to " +
      s"${allocatedBlocks.streamIdToAllocatedBlocks}")
    streamIdToUnallocatedBlockQueues.values.foreach { _.clear() }
    timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
    lastAllocatedBatchTime = batchTime
  }

  // Cleanup the batch allocations
  def cleanupBatches(batchTimes: Seq[Time]) {
    logTrace(s"Recovery: Cleaning up batches $batchTimes")
    timeToAllocatedBlocks --= batchTimes
  }

  writeAheadLogOption.foreach { writeAheadLog =>
    logInfo(s"Recovering from write ahead logs in ${checkpointDirOption.get}")
    writeAheadLog.readAll().asScala.foreach { byteBuffer =>
      logTrace("Recovering record " + byteBuffer)
      Utils.deserialize[ReceivedBlockTrackerLogEvent](
        byteBuffer.array, Thread.currentThread().getContextClassLoader) match {
        case BlockAdditionEvent(receivedBlockInfo) =>
          insertAddedBlock(receivedBlockInfo)
        case BatchAllocationEvent(time, allocatedBlocks) =>
          insertAllocatedBatch(time, allocatedBlocks)
        case BatchCleanupEvent(batchTimes) =>
          cleanupBatches(batchTimes)
      }
    }
  }
}
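From the application's point of view, this recovery is triggered through StreamingContext.getOrCreate, which rebuilds the driver state from the checkpoint when one exists. A minimal sketch (the path and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf(), Seconds(10))
  ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")
  ssc // DStream setup would go here
}
// reads the checkpoint if present, otherwise calls createContext()
val ssc = StreamingContext.getOrCreate("hdfs:///tmp/streaming-checkpoint", createContext _)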
That is the WAL approach. BlockManagerBasedBlockHandler, by contrast, relies entirely on the BlockManager.
The other approach is data replay, which at present is used in combination with Kafka. In this scenario Kafka is no longer just a message queue; it largely acts as a data store, and its built-in replication saves Streaming the fault-tolerance cost of the data-receiving stage.
As long as Streaming has not acknowledged (committed the offsets of) the data it received, Kafka considers those messages unconsumed; if processing fails, the recovered job can resume from the unacknowledged data.
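A minimal sketch of this replay-based approach with the direct Kafka API (Spark 1.6 with the Kafka 0.8 integration; the broker address and topic are placeholders, and ssc is an existing StreamingContext):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
// no receiver and no WAL: offsets are tracked by the streaming job itself,
// and after a failure the same offset ranges are simply read again from Kafka
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))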