It can, through the data sources described earlier such as Kafka and Flume

Latest recommended article published on 2022-07-04 00:24:54

Taylor.tian


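A few points about StreamingContext are worth noting:

1. Once a StreamingContext has been started, no new operations can be added to it. All computation logic must therefore be defined before the StreamingContext is started.
2. Once a StreamingContext has been stopped, it cannot be restarted. To recompute, the whole program has to be run again.
3. Within a single JVM, only one StreamingContext can be active at any given time.
4. Calling stop on a StreamingContext also stops the SparkContext. To keep the SparkContext alive when the StreamingContext shuts down, pass stopSparkContext=false to stop: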
/**
 * Stop the execution of the streams immediately (does not wait for all received data
 * to be processed). By default, if stopSparkContext is not specified, the underlying
 * SparkContext will also be stopped. This implicit behavior can be configured using the
 * SparkConf configuration spark.streaming.stopSparkContextByDefault.
 *
 * @param stopSparkContext If true, stops the associated SparkContext. The underlying SparkContext
 *                         will be stopped regardless of whether this StreamingContext has been
 *                         started.
 */
def stop(
    stopSparkContext: Boolean = conf.getBoolean("spark.streaming.stopSparkContextByDefault", true)
): Unit = synchronized {
  stop(stopSparkContext, false)
}
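5. A SparkContext can be reused by multiple StreamingContexts, provided that the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.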
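
The notes above can be sketched in a small program. This is only an illustration under assumptions that are not in the original text: local mode with two threads, 1-second batches, a 3-second run per context, and a throwaway queue-based input (the object name StreamingContextLifecycle and the helper definePipeline are hypothetical).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, StreamingContextState}
import scala.collection.mutable

object StreamingContextLifecycle {
  // Hypothetical helper: registers a trivial pipeline on the given context.
  // At least one output operation (here print()) must exist before start().
  private def definePipeline(ssc: StreamingContext): Unit = {
    val queue = new mutable.Queue[RDD[Int]]()
    queue += ssc.sparkContext.makeRDD(1 to 10)
    ssc.queueStream(queue).map(_ * 2).print()
  }

  // Returns the state of the first context after stop() and whether the
  // shared SparkContext was stopped along with it.
  def run(): (StreamingContextState, Boolean) = {
    // Assumption: local mode, 2 threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingContextLifecycle")
    val sc = new SparkContext(conf)

    val first = new StreamingContext(sc, Seconds(1))
    definePipeline(first)                  // note 1: define everything before start()
    first.start()
    first.awaitTerminationOrTimeout(3000)  // let a few batches run
    first.stop(stopSparkContext = false)   // note 4: keep the SparkContext alive
    val observed = (first.getState(), sc.isStopped)

    // Notes 2 and 5: `first` cannot be restarted, but the same SparkContext
    // can back a new StreamingContext once the previous one is stopped.
    val second = new StreamingContext(sc, Seconds(1))
    definePipeline(second)
    second.start()
    second.awaitTerminationOrTimeout(3000)
    second.stop(stopSparkContext = true)   // now release the SparkContext too

    observed
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Creating the second StreamingContext only after the first has been stopped also respects note 3, since the two contexts are never active at the same time.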

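3. InputDStreams and Receivers
An InputDStream is the stream of input data received from a streaming source. In the StreamingWordCount program shown earlier, val lines = ssc.textFileStream(args(0)) is one such InputDStream. Except for file streams, every input DStream is associated with a Receiver object, which receives the data sent by the source and stores it in memory for Spark to process later.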

Spark Streaming provides two categories of natively supported streaming sources:

Basic sources. These are created directly through the StreamingContext API, for example file systems (local and distributed), socket connections, and Akka actors.
File streams are created in one of two ways:
a. streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)
b. streamingContext.textFileStream(dataDirectory)
In fact, textFileStream itself ends up calling fileStream:
def textFileStream(directory: String): DStream[String] = withNamedScope("text file stream") {
  fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
}
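
To make the relationship between the two forms concrete, here is a minimal sketch that reads the same directory both ways. The assumptions are mine, not the original author's: args(0) is a directory into which new files are moved while the job runs, batches are 5 seconds long, and the object name FileStreamSketch and helper splitWords are hypothetical.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamSketch {
  // Hypothetical helper: splits a line on whitespace and drops empty tokens.
  def splitWords(line: String): Seq[String] =
    line.trim.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Assumption: args(0) is the monitored directory.
    val dataDirectory = args(0)
    val conf = new SparkConf().setMaster("local[2]").setAppName("FileStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Form (b): every record is one line of text.
    val lines = ssc.textFileStream(dataDirectory)

    // Form (a): the same source with key, value and input format spelled out.
    // The key is the byte offset, the value is the line; only the value is kept.
    val sameLines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](dataDirectory)
      .map(_._2.toString)

    // Word count on the first stream, plain record count on the second.
    lines.flatMap(line => splitWords(line)).map(word => (word, 1)).reduceByKey(_ + _).print()
    sameLines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

File streams need no Receiver, so this job does not tie up a thread for receiving data.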

Streams based on Akka actors are created with:
streamingContext.actorStream(actorProps, actor-name)

Streams based on sockets are created with:
ssc.socketTextStream(hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2)
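
A socket-based word count might look like the following sketch. The default endpoint localhost:9999, the 1-second batch interval, the object name SocketStreamSketch and the helper parseEndpoint are all assumptions made for illustration. Because a socket stream uses a Receiver, local mode needs at least two threads: one for the Receiver and one for processing.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketStreamSketch {
  // Hypothetical helper: reads host and port from the arguments,
  // falling back to localhost:9999 when they are not given.
  def parseEndpoint(args: Array[String]): (String, Int) =
    if (args.length >= 2) (args(0), args(1).toInt) else ("localhost", 9999)

  def main(args: Array[String]): Unit = {
    val (host, port) = parseEndpoint(args)

    // local[2]: one thread runs the Receiver, the other processes the batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("SocketStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // The storage level is passed explicitly here; it is also the default value.
    val lines = ssc.socketTextStream(host, port, StorageLevel.MEMORY_AND_DISK_SER_2)

    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

For a quick local trial, a tool such as nc -lk 9999 can serve as the text source on the other end of the socket.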

Streams based on a queue of RDDs are created with:
streamingContext.queueStream(queueOfRDDs)
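
A queue of RDDs is mostly useful for testing a pipeline without an external source. The sketch below is modelled on that use, with assumptions of my own: 1-second batches, five RDDs pushed one per second, and a hypothetical object name QueueStreamSketch.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each RDD pushed into this queue is treated as one batch of the stream.
    val queueOfRDDs = new mutable.Queue[RDD[Int]]()
    val stream = ssc.queueStream(queueOfRDDs)

    // Count how many values fall into each remainder class modulo 10.
    stream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()

    ssc.start()

    // Feed the stream after start(): only the data arrives late,
    // the computation itself was fully defined beforehand.
    for (_ <- 1 to 5) {
      queueOfRDDs.synchronized {
        queueOfRDDs += ssc.sparkContext.makeRDD(1 to 100, 2)
      }
      Thread.sleep(1000)
    }

    ssc.stop()
  }
}
```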
