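In this installment:
1. An illustrated walkthrough of the Spark Streaming initialization source code
2. An illustrated walkthrough of the Spark Streaming shutdown source code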
First, let's look at how data is received. The input stream is created with the ssc.socketTextStream method:
/**
 * Create a input stream from TCP source hostname:port. Data is received using
 * a TCP socket and the receive bytes is interpreted as UTF8 encoded `\n` delimited
 * lines.
 * @param hostname      Hostname to connect to for receiving data
 * @param port          Port to connect to for receiving data
 * @param storageLevel  Storage level to use for storing the received objects
 *                      (default: StorageLevel.MEMORY_AND_DISK_SER_2)
 */
def socketTextStream(
    hostname: String,
    port: Int,
    storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
  ): ReceiverInputDStream[String] = withNamedScope("socket text stream") {
  socketStream[String](hostname, port, SocketReceiver.bytesToLines, storageLevel)
}
In fact, the overloaded socketStream method that this call delegates to ultimately creates a SocketInputDStream:
new SocketInputDStream[T](this, hostname, port, converter, storageLevel)
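To make the delegation concrete, here is a minimal sketch that needs no Spark dependency. MiniSocketStream and MiniStreams are hypothetical names invented for illustration, and the line-splitting converter built on scala.io.Source is an assumption standing in for SocketReceiver.bytesToLines. The sketch only shows the shape of the call chain: the text-specific method is the generic method with a converter plugged in.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

// Hypothetical stand-in for SocketInputDStream: it only records its arguments.
final case class MiniSocketStream[T](
    hostname: String,
    port: Int,
    converter: InputStream => Iterator[T])

object MiniStreams {
  // Generic entry point: the caller decides how raw bytes become records.
  def socketStream[T](
      hostname: String,
      port: Int,
      converter: InputStream => Iterator[T]): MiniSocketStream[T] =
    MiniSocketStream(hostname, port, converter)

  // Text convenience method: the same call with a line-splitting converter.
  // (Assumption: scala.io.Source is used here in place of bytesToLines.)
  def socketTextStream(hostname: String, port: Int): MiniSocketStream[String] =
    socketStream[String](
      hostname,
      port,
      in => Source.fromInputStream(in, "UTF-8").getLines())
}
```

Passing the converter as a function is what lets one receiver implementation serve any record type: only the function that turns an InputStream into an iterator changes.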
SocketReceiver.bytesToLines:
/**
 * This methods translates the data from an inputstream (say, from a socket)
 * to '\n' delimited strings and returns an iterator to access the strings.
 */
def bytesToLines(inputStream: InputStream): Iterator[String] = {
  val dataInputStream = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))
  new NextIterator[String] {
    protected override def getNext() = {
      val nextValue = dataInputStream.readLine()
      if (nextValue == null) {
        finished = true
      }
      nextValue
    }

    protected override def close() {
      dataInputStream.close()
    }
  }
}
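The same idea can be tried outside Spark. The sketch below is an approximation, not Spark's code: as far as I know NextIterator is internal to Spark, so this version is written against the plain Scala Iterator trait, and LineIterators is a hypothetical name. Like the original, it reads lazily (nothing is read until the consumer asks) and closes the reader once the stream is exhausted.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object LineIterators {
  // Standalone approximation of bytesToLines built on a plain Iterator.
  def bytesToLines(inputStream: InputStream): Iterator[String] = {
    val reader = new BufferedReader(
      new InputStreamReader(inputStream, StandardCharsets.UTF_8))

    new Iterator[String] {
      private var buffered: String = null // line read ahead but not yet returned
      private var fetched = false         // true while `buffered` holds a pending line
      private var finished = false        // true once readLine() has returned null

      // Read one line ahead, only when the consumer asks for it.
      private def fetch(): Unit = {
        if (!fetched && !finished) {
          buffered = reader.readLine()
          if (buffered == null) {
            finished = true
            reader.close() // end of stream: release the underlying resource
          } else {
            fetched = true
          }
        }
      }

      override def hasNext: Boolean = {
        fetch()
        !finished
      }

      override def next(): String = {
        fetch()
        if (finished) throw new NoSuchElementException("end of stream")
        fetched = false
        buffered
      }
    }
  }
}
```

Feeding it a ByteArrayInputStream built from "hello\nworld\n" yields the two lines "hello" and "world", after which hasNext returns false.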
ssc.awaitTermination:
/**
 * Wait for the execution to stop. Any exceptions that occurs during the execution
 * will be thrown in this thread.
 */
def awaitTermination() {
  waiter.waitForStopOrError()
}
This method blocks the calling thread until execution stops, and only then does the rest of the program finish.
ContextWaiter.waitForStopOrError:
/**
 * Return `true` if it's stopped; or throw the reported error if `notifyError` has been called; or
 * `false` if the waiting time detectably elapsed before return from the method.
 */
def waitForStopOrError(timeout: Long = -1): Boolean = {
  lock.lock()
  try {
    if (timeout < 0) {
      while (!stopped && error == null) {
        condition.await()
      }
    } else {
      var nanos = TimeUnit.MILLISECONDS.toNanos(timeout)
      while (!stopped && error == null && nanos > 0) {
        nanos = condition.awaitNanos(nanos)
      }
    }
    // If already had error, then throw it
    if (error != null) throw error
    // already stopped or timeout
    stopped
  } finally {
    lock.unlock()
  }
}
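As long as the context has not been stopped and no error has been reported, the calling thread keeps waiting. With the default timeout of -1, it waits indefinitely.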
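The wait only makes sense together with the code that wakes it up, which is not shown above. The following sketch is a simplified stand-in, assuming a ReentrantLock with a single Condition. MiniWaiter is a hypothetical name, and notifyStop and notifyError are written here as assumptions about what the signalling side might look like; only notifyError is named in the doc comment above.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// Simplified stand-in for ContextWaiter (hypothetical class, for illustration only).
class MiniWaiter {
  private val lock = new ReentrantLock()
  private val condition = lock.newCondition()

  // Both fields are guarded by `lock`.
  private var error: Throwable = null
  private var stopped = false

  // Assumed signalling side: record the error and wake every waiter.
  def notifyError(e: Throwable): Unit = {
    lock.lock()
    try {
      error = e
      condition.signalAll()
    } finally {
      lock.unlock()
    }
  }

  // Assumed signalling side: mark as stopped and wake every waiter.
  def notifyStop(): Unit = {
    lock.lock()
    try {
      stopped = true
      condition.signalAll()
    } finally {
      lock.unlock()
    }
  }

  // Same logic as the method quoted above.
  def waitForStopOrError(timeout: Long = -1): Boolean = {
    lock.lock()
    try {
      if (timeout < 0) {
        // Loop guards against spurious wakeups.
        while (!stopped && error == null) condition.await()
      } else {
        var nanos = TimeUnit.MILLISECONDS.toNanos(timeout)
        while (!stopped && error == null && nanos > 0) {
          nanos = condition.awaitNanos(nanos) // returns the remaining time
        }
      }
      if (error != null) throw error
      stopped // false here means the timeout elapsed first
    } finally {
      lock.unlock()
    }
  }
}
```

In this sketch, calling waitForStopOrError(50) on a fresh MiniWaiter returns false once the timeout elapses. If another thread calls notifyStop(), a blocked waitForStopOrError() returns true. If notifyError has been called, the reported exception is rethrown in the waiting thread, which matches the behaviour described in the awaitTermination doc comment.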