A First Word Count Program with Spark Streaming (Python Version)

donger__chen

Original article: https://blog.csdn.net/donger__chen/article/details/102246132

This example reads text from a socket and uses Spark Streaming to count the words it receives.

1. Start a listener on port 9999 with the following command (if the command is not found, install nc, i.e. netcat, first):

nc -lk 9999
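If nc is not available, the following is a minimal Python stand-in, offered as a sketch rather than a drop-in replacement. It assumes Python 3.8+ (for socket.create_server), accepts a single client (Spark's socket receiver), sends each line followed by a newline, and then closes that connection. Unlike nc -lk, it does not keep accepting new clients. The function name serve_lines is hypothetical.

```python
import socket


def serve_lines(server, lines):
    """Hypothetical stand-in for `nc -l`: wait for one client, then send each line.

    server: a listening socket, e.g. socket.create_server(("localhost", 9999))
    lines:  any iterable of strings without trailing newlines
    """
    conn, _addr = server.accept()  # blocks until Spark's receiver connects
    with conn:
        for line in lines:
            # socketTextStream reads newline-delimited UTF-8 text
            conn.sendall((line + "\n").encode("utf-8"))
    # leaving the with-block closes this connection; the listening socket stays open
```

To forward what you type, you could call serve_lines(socket.create_server(("localhost", 9999)), (line.rstrip("\n") for line in sys.stdin)) after importing sys, then start the Spark job and end input with Ctrl-D.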

2. Write the following code in sparkstreaming.py:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

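# Create a local StreamingContext with two worker threads and a batch interval of 1 second.
# At least 2 threads are needed: the socket receiver permanently occupies one, so another is needed to process the data.
sc = SparkContext("local[2]", "NetworkWordCount")
# Batch interval: cut the stream into a new batch every 1 second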
ssc = StreamingContext(sc, 1)


# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))

# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate

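Every second, this code takes the data that arrived on port 9999 during that interval and computes the word counts for it. The counts are per batch; they are not accumulated across batches.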
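To make the per-batch logic concrete, here is a plain-Python sketch (no Spark required) that mimics what the DStream pipeline computes: lines are grouped into 1-second batches by arrival time, and each batch goes through the same three steps as flatMap, map and reduceByKey. The (arrival_time, line) input format and both function names are hypothetical. Real Spark Streaming assigns records to batches roughly by when the receiver gets them, and it also produces empty batches for seconds with no input, which this sketch skips.

```python
import math


def count_words(batch_lines):
    """Word count for one batch, mirroring flatMap -> map -> reduceByKey."""
    # flatMap: split each line on single spaces, exactly like line.split(" ")
    words = [w for line in batch_lines for w in line.split(" ")]
    # map: turn each word into a (word, 1) pair
    pairs = [(w, 1) for w in words]
    # reduceByKey(lambda x, y: x + y): add up the values of identical words
    counts = {}
    for word, one in pairs:
        counts[word] = counts.get(word, 0) + one
    return counts


def micro_batch_word_counts(events, batch_interval=1.0):
    """events: iterable of (arrival_time_in_seconds, line) pairs (hypothetical format)."""
    batches = {}
    for arrival, line in events:
        batch_index = math.floor(arrival / batch_interval)
        batches.setdefault(batch_index, []).append(line)
    # each batch is counted independently; nothing carries over between batches
    return {i: count_words(lines) for i, lines in sorted(batches.items())}
```

Note that split(" ") turns consecutive spaces into empty-string "words", and the Spark code behaves the same way; line.split() without an argument would avoid that.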

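3. Run the code with spark-submit --master local sparkstreaming.py. The master "local[2]" passed to the SparkContext constructor takes precedence over the --master flag; if you remove it from the code, pass --master "local[2]" instead, because with a single thread the receiver leaves no thread free to process the data.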

Type some words into the terminal from step 1, and the word counts appear in the terminal running Spark.
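pprint() prints each batch under a short header. The sketch below reproduces that layout as I understand it from PySpark's DStream.pprint: a dashed rule, the batch time, another rule, up to num records (10 by default), "..." if the batch has more, then a blank line. Treat the exact format, including the rule length, as an assumption that may differ between Spark versions; format_batch is a hypothetical helper, not a Spark API.

```python
def format_batch(batch_time, records, num=10):
    """Hypothetical helper: render one batch the way DStream.pprint() lays it out."""
    rule = "-" * 43
    out = [rule, f"Time: {batch_time}", rule]
    out.extend(str(record) for record in records[:num])  # at most num records
    if len(records) > num:
        out.append("...")  # signals that the batch had more records
    out.append("")  # blank line after each batch
    return "\n".join(out)


print(format_batch("2019-10-05 12:00:01", [("hello", 2), ("world", 1)]))
```

The order of the (word, count) pairs within a batch is not guaranteed, since reduceByKey does not sort its output.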
