Window Operations in Spark Streaming
- Window function definition
- 1. Create the topic
- 2. Start a console producer for the topic
- 3. window(windowLength, slideInterval)
- 4. countByWindow(windowLength, slideInterval)
- 5. countByValueAndWindow(windowLength, slideInterval, [numTasks])
- 6. reduceByWindow(func, windowLength, slideInterval)
- 7. reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
- 8. transform: when a business requirement calls for changing the data structure, transform can perform the conversion
- 9. transform: converting to Spark SQL for stream processing
- 10. Execution order of the Spark Streaming driver and executors, and an efficiency comparison of foreach vs. foreachPartition
- Maven dependencies
Window function definition
On a DStream, a window of configurable length slides forward at a configurable rate; each time the window moves, the chosen window operator is applied to the data that currently falls inside the window.
Note that both the window length and the slide interval must be integer multiples of the batch interval.
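For example, with the 2-second batch interval used in the demos below, a window of 8 seconds sliding every 6 seconds is legal, while a 7-second window would not be (a minimal sketch, assuming a DStream named stream already exists):
// batch interval is 2 seconds, so the window length and the slide interval must both be multiples of 2 seconds
val streamingContext = new StreamingContext(conf, Seconds(2))
stream.window(Seconds(8), Seconds(6)) // valid: 8 and 6 are multiples of 2
// stream.window(Seconds(7), Seconds(3)) // invalid: Spark Streaming throws an exception for durations that are not multiples of the batch interval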
1. Create the topic
kafka-topics.sh --create --zookeeper 192.168.116.60:2181 --topic sparkKafkaDemo --partitions 1 --replication-factor 1
2. Start a console producer for the topic
kafka-console-producer.sh --topic sparkKafkaDemo --broker-list 127.0.0.1:9092
Reminder: for each demo, change the object name, the setAppName value, and the consumer group ID (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2").
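Optionally (not part of the original steps), you can verify that messages are reaching the topic by starting a console consumer in another terminal:
kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic sparkKafkaDemo --from-beginning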
3. window(windowLength, slideInterval)
Called on a DStream with a window length and a slide interval; the elements that fall inside the current window are returned as a new DStream.
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
/**
* @Author Bright
* @Date 2020/12/22
* @Description window
*/
object SparkWindowDemo {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
// Note: the window length and the slide interval must be integer multiples of the batch interval
val numStream: DStream[(String, Int)] = kafkaStream.flatMap(line => line.value().toString.split("\\s+"))
.map((_, 1))
.window(Seconds(8), Seconds(6)) // window length 8 seconds, slide interval 6 seconds (consecutive windows overlap by 2 seconds)
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
3.1 Test
input:
output:
4. countByWindow(windowLength, slideInterval)
Returns the number of elements in the window of the given length as a DStream[Long].
Note: checkpointing must be enabled (the count is maintained incrementally, subtracting batches that leave the window).
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description Return the number of elements in the window
*/
object SparkWindowDemo2 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo2")
val streamingContext = new StreamingContext(conf,Seconds(2))
// set a checkpoint directory; create a folder named "checkpoint" in the project root (IDEA)
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
val numStream: DStream[Long] = kafkaStream.flatMap(line => line.value().toString.split("\\s+"))
.map((_, 1))
.countByWindow(Seconds(8), Seconds(6))
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
4.1 Test
input:
>java
>java
>scala
>scala
output:
-------------------------------------------
Time: 1608721234000 ms
-------------------------------------------
1
-------------------------------------------
Time: 1608721240000 ms
-------------------------------------------
1
-------------------------------------------
Time: 1608721246000 ms
-------------------------------------------
2
5. countByValueAndWindow(windowLength, slideInterval, [numTasks])
Counts, for each distinct value, how many times it occurs in the current window.
Note: checkpointing must be enabled.
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo3 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo3")
val streamingContext = new StreamingContext(conf,Seconds(2))
// set a checkpoint directory; create a folder named "checkpoint" in the project root (IDEA)
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup3")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
val numStream: DStream[(String,Long)] = kafkaStream
.flatMap(line => line.value().toString.split("\\s+"))
.countByValueAndWindow(Seconds(8), Seconds(6))
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
5.1 Test
input:
>java
>java
>java
>scala
>scala
>scala
output:
-------------------------------------------
Time: 1608721828000 ms
-------------------------------------------
(java,1)
-------------------------------------------
Time: 1608721834000 ms
-------------------------------------------
(scala,1)
(java,2)
-------------------------------------------
Time: 1608721840000 ms
-------------------------------------------
(scala,2)
6. reduceByWindow(func, windowLength, slideInterval)
First forms a new DStream from the elements of the current window of the calling DStream, then reduces those window elements with func.
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo4 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo4")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup4")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
val numStream: DStream[String] = kafkaStream
.flatMap(line => line.value().toString.split("\\s+"))
.reduceByWindow(_ + ":" + _, Seconds(8), Seconds(2))
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
6.1 Test
input:
>java
>java
>java
>scala
>scala
>scala
>scala
output:
-------------------------------------------
Time: 1608722104000 ms
-------------------------------------------
java:java:java
-------------------------------------------
Time: 1608722106000 ms
-------------------------------------------
java:java:java:scala:scala
-------------------------------------------
Time: 1608722108000 ms
-------------------------------------------
java:java:java:scala:scala:scala:scala
-------------------------------------------
Time: 1608722110000 ms
-------------------------------------------
java:java:java:scala:scala:scala:scala
-------------------------------------------
Time: 1608722112000 ms
-------------------------------------------
scala:scala:scala:scala
-------------------------------------------
Time: 1608722114000 ms
-------------------------------------------
scala:scala
7. reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
reduceByKeyAndWindow reduces, per key, over all of the data that falls inside the window of the calling DStream; the optional numTasks parameter controls the number of parallel tasks. There are two forms: the simple form recomputes the reduction over the whole window on every slide, while the form that also takes an inverse function updates the previous result incrementally by adding the batches that enter the window and subtracting (with the inverse function) the batches that leave it. The incremental form requires checkpointing, and its counts fall back toward 0 once the corresponding batches slide out of the window, as the test output below shows.
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo5 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo5")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup5")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
val numStream: DStream[(String,Int)] = kafkaStream
.flatMap(line => line.value().toString.split("\\s+"))
.map((_, 1))
// simple form: recompute the reduction over the whole window on every slide
//.reduceByKeyAndWindow((x:Int,y:Int)=>{x+y}, Seconds(8), Seconds(2))
// incremental form: add incoming batches, subtract outgoing batches with the inverse function (requires checkpointing)
.reduceByKeyAndWindow((x:Int,y:Int)=>{x+y}, (x:Int,y:Int)=>{x-y}, Seconds(8), Seconds(2))
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
7.1 Test
input:
>java
>java
>java
>java
>scala
>scala
>scala
>scala
output:
-------------------------------------------
Time: 1608722410000 ms
-------------------------------------------
(java,3)
-------------------------------------------
Time: 1608722412000 ms
-------------------------------------------
(java,4)
-------------------------------------------
Time: 1608722414000 ms
-------------------------------------------
(scala,4)
(java,4)
-------------------------------------------
Time: 1608722416000 ms
-------------------------------------------
(scala,4)
(java,4)
-------------------------------------------
Time: 1608722418000 ms
-------------------------------------------
(scala,4)
(java,1)
-------------------------------------------
Time: 1608722420000 ms
-------------------------------------------
(scala,4)
(java,0)
-------------------------------------------
Time: 1608722422000 ms
-------------------------------------------
(scala,0)
(java,0)
8. transform: when a business requirement calls for changing the data structure, transform can perform the conversion
package cn.bright.kafka.Spark
import java.text.SimpleDateFormat
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo6 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo6")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup6")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
// when the required data structure changes, transform exposes each batch's RDD so the conversion can be done with ordinary RDD operations
val numStream: DStream[((String, String), Int)] = kafkaStream.transform((rdd, timestamp) => {
val format: SimpleDateFormat = new SimpleDateFormat("yyyyMMdd HH:mm:ss")
val time: String = format.format(timestamp.milliseconds)
val value: RDD[((String, String), Int)] = rdd.flatMap(x => x.value().split("\\s+"))
.map(x => ((x, time), 1))
.reduceByKey((x,y)=>x+y)
.sortBy(x=>x._2,false)
value
})
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
8.1 Test
input:
>java
>java
>java
>java
>scala
>scala
>scala
>scala
output:
-------------------------------------------
Time: 1608722714000 ms
-------------------------------------------
((java,20201223 19:25:14),4)
-------------------------------------------
Time: 1608722716000 ms
-------------------------------------------
-------------------------------------------
Time: 1608722718000 ms
-------------------------------------------
-------------------------------------------
Time: 1608722720000 ms
-------------------------------------------
((scala,20201223 19:25:20),4)
9. transform: converting to Spark SQL for stream processing
Inside transform, each batch's RDD is turned into a DataFrame, registered as a temporary view, and queried with SQL; a singleton SQLContext is reused across batches.
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SQLContext}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo7 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo7")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup7")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
val numStream: DStream[Row] = kafkaStream.transform(rdd => {
val sqlContext: SQLContext = SQLContextSingleton.getInstance(rdd.sparkContext)
import sqlContext.implicits._
val words: RDD[String] = rdd.flatMap(_.value().toString.split("\\s+"))
val tupple2RDD: RDD[(String, Int)] = words.map((_, 1))
tupple2RDD.toDF("name", "cn").createOrReplaceTempView("tbwordcount")
val frame: DataFrame = sqlContext.sql("select name,count(cn) from tbwordcount group by name")
val dataset: Dataset[(String, Long)] = frame.map(row => {
val name: String = row.getAs[String]("name")
val count: Long = row.getAs[Long]("cn")
(name, count)
})
//dataset.rdd
frame.rdd
})
numStream.print()
streamingContext.start()
streamingContext.awaitTermination()
}
}
object SQLContextSingleton{
@transient private var instance:SQLContext=_
def getInstance(sparkContext: SparkContext):SQLContext={
synchronized(
if(instance==null){
instance=new SQLContext(sparkContext)
}
)
instance
}
}
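The SQLContextSingleton helper lazily creates a single SQLContext per JVM so that the transform closure, which runs once per micro-batch on the driver, reuses the same context instead of constructing a new one for every batch.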
9.1 Test
input:
>scala
>scala
>scala
>scala
>scala
>java
>java
>java
>java
>java
>java
output:
-------------------------------------------
Time: 1608723156000 ms
-------------------------------------------
[scala,5]
-------------------------------------------
Time: 1608723158000 ms
-------------------------------------------
-------------------------------------------
Time: 1608723160000 ms
-------------------------------------------
[java,4]
-------------------------------------------
Time: 1608723162000 ms
-------------------------------------------
[java,2]
10. Execution order of the Spark Streaming driver and executors, and an efficiency comparison of foreach vs. foreachPartition
package cn.bright.kafka.Spark
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* @Author Bright
* @Date 2020/12/22
* @Description
*/
object SparkWindowDemo8 {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo8")
val streamingContext = new StreamingContext(conf,Seconds(2))
streamingContext.checkpoint("checkpoint")
val kafkaParams = Map(
(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
(ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup8")
)
val kafkaStream:InputDStream[ConsumerRecord[String,String]]
= KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
)
// println("driver")
// val wordStream: DStream[String] = kafkaStream.flatMap(
// line => {
// println("executor")
// line.value().toString.split("\\s+")
// }
// )
//
// wordStream.print()
// println("driver") // 1 只执行一次
// val wordStream: DStream[String] = kafkaStream.transform(
// (rdd) => {
// println("采集周期开始") // n次 数据采集周期
// //println(rdd)
// val value: RDD[String] = rdd.flatMap(
// x => {
// println("执行器executor") //数据进入n次 就执行n次
// x.value().split("\\s+")
// }
// )
// value
// }
// )
// wordStream.print()
// foreach iterates over every element of each RDD one by one, so any per-element setup cost is paid for every record
// println("driver")
// kafkaStream.foreachRDD(
// (rdd)=>{
// println("bb")
// rdd.foreach(
// x=>{
// println("cc")
// val strings: Array[String] = x.value().toString.split("\\s+")
// println(strings.toList)
// }
// )
// }
// )
// foreachPartition on an RDD is more efficient than foreach, but holding a whole partition's data at once may cause OOM (out-of-memory) errors
println("driver")
kafkaStream.foreachRDD(
(rdd)=>{
println("bb")
rdd.foreachPartition(
x=>{
println("cc")
x.foreach(
y=>{
println("dd")
println(y.value().toString.split("\\s+")) // printing an Array directly shows its object reference, hence the [Ljava.lang.String;@... lines in the output below
}
)
}
)
}
)
streamingContext.start()
streamingContext.awaitTermination()
}
}
10.1 Test
input:
>java
>java
>java
>java
>scala
>scala
>scala
>scala
>scala
output:
bb
cc
bb
cc
dd
[Ljava.lang.String;@fa3b8f8
dd
[Ljava.lang.String;@41adc6d
dd
[Ljava.lang.String;@1cb77cd4
dd
[Ljava.lang.String;@45bf48f6
dd
[Ljava.lang.String;@57c94cc1
bb
cc
dd
[Ljava.lang.String;@19903d1d
dd
[Ljava.lang.String;@46a045f3
bb
cc
dd
[Ljava.lang.String;@e7ed727
bb
cc
dd
[Ljava.lang.String;@30395d92
dd
[Ljava.lang.String;@57e4cb4d
bb
cc
dd
[Ljava.lang.String;@d402d9
dd
[Ljava.lang.String;@6c775000
dd
[Ljava.lang.String;@654faaab
bb
cc
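The practical reason foreachPartition is usually preferred for output operations is that per-partition setup (for example, opening a database or Kafka producer connection) happens once per partition instead of once per record. A minimal sketch of that pattern, using hypothetical openConnection/saveRecord helpers that stand in for whatever sink is actually used:
kafkaStream.foreachRDD(rdd => {
  rdd.foreachPartition(records => {
    val conn = openConnection() // hypothetical helper: one connection per partition, not per record
    records.foreach(r => saveRecord(conn, r.value())) // hypothetical helper: reuse the connection for every record in the partition
    conn.close()
  })
})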
Maven dependencies
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.5</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.6.6</version>
</dependency>
</dependencies>