I. Setting up the Spark environment
Setup guide (in Chinese): https://blog.csdn.net/starkpan/article/details/86437089
II. Examples
1. Spark's built-in example
Open two terminals (or shell sessions). In the first one, use nc to send input over a socket (-l listens on the port, -k keeps listening after each client disconnects):
nc -lk 9999
a a a b b b c c c c c
In the second terminal, launch the Spark Streaming example that ships with Spark. If the submit fails with an exception, double-check the path to the examples jar:
spark-submit --master local[2] --class org.apache.spark.examples.streaming.NetworkWordCount --name NetworkWordCount /Users/panstark/Documents/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.1.0.jar localhost 9999
Source code on GitHub:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala
2. A hand-written example
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark Streaming word count over a network socket.
 * Feed it input with: nc -lk 9999
 */
object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Create the SparkConf. Streaming needs at least two local threads:
    // one for the socket receiver and one for processing the batches.
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    // Create the StreamingContext with a 5-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    // Create a DStream that reads lines of text from the socket
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words and count the words in each batch
    val result = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    // Print each batch's counts to the console
    result.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
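To see what the flatMap/map/reduceByKey pipeline computes for a single batch, here is the same per-batch logic sketched as a plain shell pipeline. This is only an illustration of the transformation, not part of the Spark job:

```shell
# Mirror flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) on one batch:
#   tr      -> flatMap: emit one word per line
#   sort    -> group identical words together
#   uniq -c -> reduceByKey: count each word
echo "a a a b b b c c c c c" | tr ' ' '\n' | sort | uniq -c
```

For the input above this yields counts a=3, b=3, c=5, which is exactly what each 5-second micro-batch of the streaming job produces for the same line.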
Then send data through nc:
nc -lk 9999
a a a b b b c c c c c
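With the 5-second batch interval, the driver console then prints each batch's counts in the standard DStream print() format, roughly like the following (the timestamp and the ordering of the pairs will vary):

```
-------------------------------------------
Time: 1548000000000 ms
-------------------------------------------
(a,3)
(b,3)
(c,5)
```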