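The Spark Streaming UI part is an extra in this post and unrelated to the main topic; it is only recorded here for reference
What the group of statistics on the Spark Streaming UI means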
Streaming
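- Started at: 1433563238275 (the time at which Spark Streaming started running)
- Time since start: 3 minutes 51 seconds (how long Spark Streaming has been running)
- Network receivers: 2 (the number of Receivers)
- Batch interval: 1 second (the time interval of each Batch, i.e. the data received over this period forms one Batch, or in other words one RDD)
- Processed batches: 231 (the number of Batches already processed; a Batch is counted whether or not it contains any data)
- Waiting batches: 0 (the number of Batches waiting to be processed; if this value is large, Spark is processing data more slowly than it is being received, and you need to add computing capacity or lower the receive rate)
- Received records: 66 (the data received so far; all the data read in a single read is called one record)
- Processed records: 66 (the records already processed)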
(Processed batches + Waiting batches) * Batch Interval = Time Since Start
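As a quick check, the relation above can be applied to the figures in the list. The sketch below is plain Scala with no Spark dependency; the object and method names are made up for illustration. On a live UI the two sides may only agree approximately, since the counters and the clock keep moving.

```scala
object StreamingUiCheck {
  // Elapsed time implied by the batch counters, in seconds.
  def elapsedSeconds(processed: Int, waiting: Int, batchIntervalSeconds: Int): Int =
    (processed + waiting) * batchIntervalSeconds

  def main(args: Array[String]): Unit = {
    // Figures taken from the statistics listed above.
    val total = elapsedSeconds(processed = 231, waiting = 0, batchIntervalSeconds = 1)
    // prints "3 minutes 51 seconds", matching "Time since start"
    println(s"${total / 60} minutes ${total % 60} seconds")
  }
}
```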
A pitfall with Spark Streaming Checkpoint
Source code:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
object SparkStreamingCheckpointEnabledTest {
  def main(args: Array[String]): Unit = {
    val checkpointDirectory = "file:///d:/data/chk_streaming"
    // Factory that getOrCreate calls only when no checkpoint data exists yet
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      ssc
    }
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)
    // Problem: the streams are defined here, outside funcToCreateSSC
    val numStreams = 2
    val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
    val lines = ssc.union(streams)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
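The code above is wrong: once the Driver is stopped and then restarted, it fails to start. On a restart, getOrCreate rebuilds the StreamingContext from the checkpoint data and does not call funcToCreateSSC, so the stream operations written after getOrCreate are applied to a context that has already been restored. The fix is to move the operations on streams into the funcToCreateSSC function, before ssc is returned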
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
object SparkStreamingCheckpointEnabledTest {
  // All DStream operations live here so they can be set up inside the factory
  def process(streams: Seq[DStream[String]], ssc: StreamingContext): Unit = {
    val lines = ssc.union(streams)
    lines.print()
  }
  def main(args: Array[String]): Unit = {
    val checkpointDirectory = "file:///d:/data/chk_streaming"
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      val numStreams = 2
      val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
      // Define the streams before returning ssc
      process(streams, ssc)
      ssc
    }
    // After getOrCreate, only start the context and wait
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)
    ssc.start()
    ssc.awaitTermination()
  }
}
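Why the placement matters can be illustrated with a toy model of the getOrCreate contract. This is only a sketch in plain Scala, not Spark's implementation: ToyContext and this getOrCreate are hypothetical stand-ins, and the Option plays the role of the checkpoint directory. It shows that the factory runs on the first start but is skipped on a restart, so anything that must be part of the context has to be set up inside the factory.

```scala
object GetOrCreateSketch {
  // Hypothetical stand-in for a StreamingContext: it only remembers
  // which output operations were registered on it.
  final case class ToyContext(registeredOutputs: List[String])

  // Toy version of the contract: reuse the checkpointed context if there
  // is one, otherwise call the factory. getOrElse takes its argument by
  // name, so create() runs only when checkpoint is None.
  def getOrCreate(checkpoint: Option[ToyContext], create: () => ToyContext): ToyContext =
    checkpoint.getOrElse(create())

  def main(args: Array[String]): Unit = {
    var factoryCalls = 0
    val factory = () => { factoryCalls += 1; ToyContext(List("lines.print")) }

    // First run: no checkpoint yet, so the factory builds the context.
    val firstRun = getOrCreate(None, factory)
    // Restart: the checkpointed context is restored and the factory is skipped.
    val restarted = getOrCreate(Some(firstRun), factory)

    println(s"factory calls: $factoryCalls") // prints "factory calls: 1"
    println(s"restored outputs: ${restarted.registeredOutputs}")
  }
}
```

In the corrected program this corresponds to calling process(streams, ssc) inside funcToCreateSSC, so that the stream definitions are part of whatever gets checkpointed and restored.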