1. Spark Streaming quick example
Note: the code below is taken from the official Spark documentation: http://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
// Create a local StreamingContext with two working threads and a batch interval of 1 second.
// The master requires 2 cores to prevent a starvation scenario.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream that will connect to hostname:port, like localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print()
ssc.start() // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
Then open a terminal window to act as the data source and run: nc -lk 9999
From the Spark installation directory, run the real-time word count example: ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999
2. DStream data sources
1) TCP socket
As shown in the example above.
Files can also be read as a basic source through the StreamingContext API: streamingContext.textFileStream(dataDirectory)
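A minimal sketch of the file-based source, reusing the ssc from the example above; the directory path "/data/input" is a hypothetical placeholder:
// textFileStream monitors a directory and treats each file newly created
// in it as a batch of text lines ("/data/input" is a placeholder path).
val fileLines = ssc.textFileStream("/data/input")
val fileWordCounts = fileLines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
fileWordCounts.print()
Note that textFileStream only picks up files moved or created in the directory after the stream starts; files already present are ignored.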
2) Advanced Sources
You can also consume data from Kafka, Flume, or Kinesis (I have honestly never used Kinesis at work); this is the typical Spark Streaming real-time processing pipeline.
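As a hedged sketch, here is what consuming from Kafka looks like with the spark-streaming-kafka-0-10 integration; the broker address localhost:9092, the group id "wordcount-group", and the topic name "events" are placeholder assumptions, and the spark-streaming-kafka-0-10 artifact must be on the classpath:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
// Placeholder consumer settings; adjust brokers, group id, and topic to your cluster.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "wordcount-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("events")
// createDirectStream yields a DStream of Kafka ConsumerRecords.
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
val kafkaLines = kafkaStream.map(record => record.value)
From here kafkaLines can be processed exactly like the socket-based lines DStream in the first example.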
3) Custom Sources
Data sources customized for your business scenario.
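A minimal custom-receiver sketch, following the Receiver pattern from the Spark custom-receivers guide; the socket-reading body is only an illustration, and in practice receive() would wrap whatever system your business scenario requires:
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart() {
    // Start a background thread that pulls data from the source.
    new Thread("Custom Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
    // Nothing to do: receive() exits on its own once isStopped returns true.
  }

  private def receive() {
    try {
      val socket = new Socket(host, port)
      val reader = new BufferedReader(
        new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))
      var line = reader.readLine()
      while (!isStopped && line != null) {
        store(line) // hand each record to Spark Streaming
        line = reader.readLine()
      }
      reader.close()
      socket.close()
      restart("Trying to connect again")
    } catch {
      case t: Throwable => restart("Error receiving data", t)
    }
  }
}
The receiver is then plugged into a StreamingContext with receiverStream:
val customLines = ssc.receiverStream(new CustomReceiver("localhost", 9999))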
My previous work only touched Spark at a shallow level, and since I rarely use it in my current job, I am relearning it in my spare time. Hope this encourages you as well!