Code of structured_streaming.py:
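from pyspark.sql import SparkSession
from pyspark.sql.functions import split
from pyspark.sql.functions import explode
import sys

if __name__ == "__main__":
    # Usage: spark-submit structured_streaming.py <host> <port>
    if len(sys.argv) != 3:
        print("Usage: structured_streaming.py <host> <port>", file=sys.stderr)
        sys.exit(-1)

    spark = SparkSession \
        .builder \
        .appName("StructuredNetworkWordCount") \
        .getOrCreate()
    # Only show warnings and errors so the console output stays readable
    spark.sparkContext.setLogLevel("WARN")

    # Read lines of text from a TCP socket at <host>:<port>
    lines = spark \
        .readStream \
        .format("socket") \
        .option("host", sys.argv[1]) \
        .option("port", int(sys.argv[2])) \
        .load()

    # Split each line on spaces and turn every word into its own row
    words = lines.select(
        explode(
            split(lines.value, " ")
        ).alias("word")
    )

    # Running count per word
    wordCounts = words.groupBy("word").count()

    # Print the full result table to the console every 8 seconds
    query = wordCounts \
        .writeStream \
        .outputMode("complete") \
        .format("console") \
        .trigger(processingTime="8 seconds") \
        .start()

    query.awaitTermination()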
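To make the split → explode → groupBy → complete-mode pipeline concrete without a cluster, here is a plain-Python sketch of the tables the query emits after each trigger. It is an illustration only, not PySpark: the micro-batches are hard-coded lists standing in for lines arriving on the socket (an assumption), and the helper names split_and_explode and run_complete_mode are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_and_explode(lines: Iterable[str]) -> List[str]:
    """One output row per space-separated token, like explode(split(value, " "))."""
    words: List[str] = []
    for line in lines:
        # str.split(" ") keeps empty tokens for consecutive spaces
        words.extend(line.split(" "))
    return words


def run_complete_mode(batches: Iterable[List[str]]) -> List[List[Tuple[str, int]]]:
    """Return the whole result table emitted after each micro-batch."""
    state: Counter = Counter()  # running counts kept across triggers
    emitted: List[List[Tuple[str, int]]] = []
    for batch in batches:
        state.update(split_and_explode(batch))
        # Complete mode: emit every word seen so far, not only this batch's words
        emitted.append(sorted(state.items()))
    return emitted


if __name__ == "__main__":
    # Two hypothetical micro-batches of lines arriving on the socket
    batches = [["hello spark", "hello world"], ["spark streaming"]]
    for i, table in enumerate(run_complete_mode(batches)):
        print(f"Batch: {i}")
        for word, count in table:
            print(word, count)
```

Because the query uses outputMode("complete"), the second table still lists hello and world even though neither appeared in that batch; with "update" mode Spark would output only the rows whose counts changed.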
First window: spark-master
Set up the cluster.
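Open a second window, also on spark-master, and run:
spark-submit xxx.py spark-master 9006
i.e.:
spark-submit structured_streaming.py spark-master 9006
Here spark-master and 9006 are the host and port the script reads as sys.argv[1] and sys.argv[2].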
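With the socket source, the driver connects as a client to spark-master:9006, so some process must already be listening on that port before spark-submit runs (these notes do not show what runs there; `nc -lk 9006` is the usual choice). As a hypothetical pure-Python stand-in, serve_lines below listens on a port and sends each line, newline-terminated, to the first client that connects. Unlike `nc -lk`, it serves a single connection and then stops.

```python
import socket
from typing import Iterable


def serve_lines(lines: Iterable[str], host: str = "0.0.0.0", port: int = 9006) -> None:
    """Hypothetical helper: accept one client and send each line, newline-terminated."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Allow quick restarts on the same port
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()  # blocks until the client (e.g. Spark) connects
        with conn:
            for line in lines:
                conn.sendall((line.rstrip("\n") + "\n").encode("utf-8"))
```

To use it interactively in the first window, call serve_lines(sys.stdin, port=9006) before running spark-submit in the second window; each line typed afterwards is forwarded as it is entered.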
First window:
Second window output: