本期内容:
1. Spark Streaming初始化源码图解
2. Spark Streaming关闭源码图解
Spark Streaming的
StreamingContext是采用装饰器模式,对SparkContext的封装。是在Spark Core的基础上加了一些功能,所有的实际上还是用Spark Core去实现。
batchDuration是在Spark Core的基础上新加的参数。Driver上有一个定时器,用于定时产生job,就是使用了batchDuration。Executor上的定时器是用于定时接收和存储数据。
看看StreamGraph中的成员。
有生成的JobScheduler对象。
progressListener是StreamingListener、SparkListener的子类对象,用于处理进度。
uiTab用于Spark网页UI。
socketTextStream:
/**
* Create a input stream from TCP source hostname:port. Data is received using
* a TCP socket and the receive bytes is interpreted as UTF8 encoded `\n` delimited
* lines.
* @param hostname Hostname to connect to for receiving data
* @param port Port to connect to for receiving data
* @param storageLevel Storage level to use for storing the received objects
* (default: StorageLevel.MEMORY_AND_DISK_SER_2)
*/
def socketTextStream(
hostname: String,
port: Int,
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
): ReceiverInputDStream[String] = withNamedScope("socket text stream") {
socketStream[String](hostname, port, SocketReceiver.
bytesToLines
, storageLevel)
}
SocketReceiver.bytesToLines:
/**
* This methods translates the data from an inputstream (say, from a socket)
* to '\n' delimited strings and returns an iterator to access the strings.
*/
def bytesToLines(inputStream: InputStream): Iterator[String] = {
val dataInputStream = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))
new NextIterator[String] {
protected override def getNext() = {
val nextValue = dataInputStream.readLine()
if (nextValue == null) {
finished = true
}
nextValue
}
protected override def close() {
dataInputStream.close()
}
}
}
把接收的数据转成用行的迭代器来表示。
再说说ssc.awaittermination:
/**
* Wait for the execution to stop. Any exceptions that occurs during the execution
* will be thrown in this thread.
*/
def awaitTermination() {
waiter.waitForStopOrError()
}
用于等待执行的停止,再结束全部运行程序。
ContextWaiter.waitForStopOrError:
/**
* Return `true` if it's stopped; or throw the reported error if `notifyError` has been called; or
* `false` if the waiting time detectably elapsed before return from the method.
*/
def waitForStopOrError(timeout: Long = -1): Boolean = {
lock.lock()
try {
if (timeout < 0) {
while (!stopped && error == null) {
condition.await()
}
} else {
var nanos = TimeUnit.MILLISECONDS.toNanos(timeout)
while (!stopped && error == null && nanos > 0) {
nanos = condition.awaitNanos(nanos)
}
}
// If already had error, then throw it
if (error != null) throw error
// already stopped or timeout
stopped
} finally {
lock.unlock()
}
}
线程没停止,而且没有error时,会一直等待。