Introduction to Spark Streaming:
Stream processing built on top of Spark Core (RDDs)
A stream: source ==> compute ==> store
Batch (offline) processing is a special case of streaming
Letting you write streaming jobs the same way you write batch jobs
Out of the box (OOTB): built in, ready to use without extra setup
To combine a DStream with an RDD, use the transform operator
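A minimal sketch of transform, assuming a running StreamingContext `ssc` and a hypothetical static blacklist RDD (names like `blacklist` and the port are illustrative, not from the source). transform exposes each batch of the DStream as an RDD, so it can be joined with a regular RDD:

```scala
// Static RDD of blacklisted words (illustrative data)
val blacklist = ssc.sparkContext.parallelize(Seq(("spam", true)))
val lines = ssc.socketTextStream("localhost", 9999)
val cleaned = lines.map(word => (word, 1)).transform { rdd =>
  rdd.leftOuterJoin(blacklist)                       // join this batch's RDD with the static RDD
     .filter { case (_, (_, flag)) => flag.isEmpty } // keep words not in the blacklist
     .map { case (word, (count, _)) => (word, count) }
}
```

This is why transform matters: DStream-to-DStream operators alone cannot reference a plain RDD, but transform drops down to the RDD API once per batch.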
Streaming vs. Core vs. SQL abstractions:
Streaming: DStream <= represents a continuous stream of data
Core: RDD
SQL: DF/DS
Entry points:
Streaming: StreamingContext
Core: SparkContext
SQL: SparkSession (newer), or SQLContext/HiveContext (older)
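The three entry points can be sketched as follows; a minimal example assuming Spark is on the classpath (the app name and `local[2]` master are illustrative choices):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setAppName("demo").setMaster("local[2]")
val sc = new SparkContext(conf)                 // Core entry point
val ssc = new StreamingContext(sc, Seconds(5))  // Streaming entry point: batch interval 5s
val spark = SparkSession.builder()              // SQL entry point (wraps SQLContext/HiveContext)
  .config(conf)
  .getOrCreate()
```

Note that a StreamingContext is built on top of a SparkContext, which is why spark-shell examples pass the existing `sc` into it.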
Programming model 1: socketTextStream
import org.apache.spark._
import org.apache.spark.streaming._
val ssc = new StreamingContext(sc, Seconds(5))   // sc: spark-shell's SparkContext; 5s batches
val lines = ssc.socketTextStream("localhost", 9999) // read lines from a socket source
val words = lines.flatMap(_.split(" "))          // split each line into words
val wordCounts = words.map((_, 1)).reduceByKey(_ + _) // count words per batch
wordCounts.print()                               // print the first results of each batch
ssc.start()                                      // start the computation
ssc.awaitTermination()                           // wait for it to terminate