Window Operations in Spark Streaming (driver/executor execution order, and the efficiency of foreach vs. foreachPartition)

Window function definition

On a DStream, a window of configurable length slides forward at a configurable rate; at each position of the window, the operator defined by the window function is applied to the batch of data currently inside the window.

Note that both the window length and the slide interval must be integer multiples of the batch time.
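
For example, with the 2-second batch interval used in the demos below, a window of 8 seconds sliding every 6 seconds is valid (both are multiples of 2), whereas a value such as Seconds(5) would fail validation when the job starts. A minimal sketch of how the three durations relate (the variable names are illustrative only):

import org.apache.spark.streaming.Seconds

val batchDuration = Seconds(2)   // batch interval passed to StreamingContext
val windowLength  = Seconds(8)   // 8 = 2 * 4  -> valid window length
val slideInterval = Seconds(6)   // 6 = 2 * 3  -> valid slide interval
// Seconds(5) is not a multiple of the 2-second batch interval,
// so a windowed DStream built with it would be rejected at startup.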


1. Create the topic

kafka-topics.sh --create --zookeeper 192.168.116.60:2181 --topic sparkKafkaDemo  --partitions 1 --replication-factor 1

2. Start a console producer for the topic

kafka-console-producer.sh --topic  sparkKafkaDemo  --broker-list 127.0.0.1:9092

Reminder: for each new program below, change the object name, the setAppName value, and the consumer group, e.g. (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2").

3. window(windowLength, slideInterval)

This operation is called on a DStream with a window length parameter and a slide interval parameter, and returns a new DStream made up of the elements that fall inside the current window.

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description  window
  */
object SparkWindowDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

//  Note: the window length and slide interval must be integer multiples of the batch time
    val numStream: DStream[(String, Int)] = kafkaStream.flatMap(line => line.value().toString.split("\\s+"))
      .map((_, 1))
      .window(Seconds(8),Seconds(6))    // windowed stream: window length 8 seconds, slide interval 6 seconds

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

3.1 Test

input: (screenshot omitted)
output: (screenshot omitted)

4. countByWindow(windowLength,slideInterval)

Returns the number of elements in the window of the given length.
Note: checkpointing must be enabled.

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description  returns the number of elements in the given window
  */
object SparkWindowDemo2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo2")
    val streamingContext = new StreamingContext(conf,Seconds(2))
    
    // enable checkpointing; create a 'checkpoint' folder in the IDEA project root
    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup2")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    val numStream: DStream[Long] = kafkaStream.flatMap(line => line.value().toString.split("\\s+"))
      .map((_, 1))
      .countByWindow(Seconds(8), Seconds(6))

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

4.1 Test

input:

>java
>java
>scala
>scala

output:

-------------------------------------------
Time: 1608721234000 ms
-------------------------------------------
1

-------------------------------------------
Time: 1608721240000 ms
-------------------------------------------
1

-------------------------------------------
Time: 1608721246000 ms
-------------------------------------------
2

5. countByValueAndWindow(windowLength,slideInterval, [numTasks])

Counts how many times each distinct value occurs within the current window.
Note: checkpointing must be enabled.

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo3 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo3")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    // enable checkpointing; create a 'checkpoint' folder in the IDEA project root
    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup3")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    val numStream: DStream[(String,Long)] = kafkaStream
      .flatMap(line => line.value().toString.split("\\s+"))
      .countByValueAndWindow(Seconds(8), Seconds(6))

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

5.1 Test

input:

>java
>java
>java
>scala
>scala
>scala

output:

-------------------------------------------
Time: 1608721828000 ms
-------------------------------------------
(java,1)

-------------------------------------------
Time: 1608721834000 ms
-------------------------------------------
(scala,1)
(java,2)

-------------------------------------------
Time: 1608721840000 ms
-------------------------------------------
(scala,2)

6. reduceByWindow(func,windowLength,slideInterval)

On the calling DStream, the elements in the window are first taken to form a new DStream, and a reduce is then applied to the elements of that windowed DStream.

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo4 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo4")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup4")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    val numStream: DStream[String] = kafkaStream
      .flatMap(line => line.value().toString.split("\\s+"))
      .reduceByWindow(_ + ":" + _, Seconds(8), Seconds(2))

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

6.1 Test

input:

>java
>java
>java
>scala
>scala
>scala
>scala

output:

-------------------------------------------
Time: 1608722104000 ms
-------------------------------------------
java:java:java

-------------------------------------------
Time: 1608722106000 ms
-------------------------------------------
java:java:java:scala:scala

-------------------------------------------
Time: 1608722108000 ms
-------------------------------------------
java:java:java:scala:scala:scala:scala

-------------------------------------------
Time: 1608722110000 ms
-------------------------------------------
java:java:java:scala:scala:scala:scala

-------------------------------------------
Time: 1608722112000 ms
-------------------------------------------
scala:scala:scala:scala

-------------------------------------------
Time: 1608722114000 ms
-------------------------------------------
scala:scala

7. reduceByKeyAndWindow(func,windowLength, slideInterval, [numTasks])

reduceByKeyAndWindow computes over all of the data that falls within the DStream's window length. The operation takes an optional parallelism ([numTasks]) argument.

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo5 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo5")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup5")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    val numStream: DStream[(String,Int)] = kafkaStream
      .flatMap(line => line.value().toString.split("\\s+"))
      .map((_, 1))
      // simple form: recomputes the reduce over the full window on every slide
      //.reduceByKeyAndWindow((x:Int,y:Int)=>{x+y}, Seconds(8), Seconds(2))
      // incremental form: adds batches entering the window and subtracts batches leaving it (requires checkpointing)
      .reduceByKeyAndWindow((x:Int,y:Int)=>{x+y}, (x:Int,y:Int)=>{x-y},Seconds(8), Seconds(2))


    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

7.1 Test

input:

>java
>java
>java
>java
>scala
>scala
>scala
>scala

output:

-------------------------------------------
Time: 1608722410000 ms
-------------------------------------------
(java,3)

-------------------------------------------
Time: 1608722412000 ms
-------------------------------------------
(java,4)

-------------------------------------------
Time: 1608722414000 ms
-------------------------------------------
(scala,4)
(java,4)

-------------------------------------------
Time: 1608722416000 ms
-------------------------------------------
(scala,4)
(java,4)

-------------------------------------------
Time: 1608722418000 ms
-------------------------------------------
(scala,4)
(java,1)

-------------------------------------------
Time: 1608722420000 ms
-------------------------------------------
(scala,4)
(java,0)

-------------------------------------------
Time: 1608722422000 ms
-------------------------------------------
(scala,0)
(java,0)
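
In the incremental (inverse-function) form used above, keys whose windowed count has dropped back to 0, such as (java,0) and (scala,0) in the output, stay in the window state until they are filtered out. The same overload also accepts the optional parallelism argument mentioned earlier plus a filter function; a minimal sketch, assuming the same kafkaStream as in SparkWindowDemo5 (the variable name filtered is illustrative):

    val filtered: DStream[(String, Int)] = kafkaStream
      .flatMap(line => line.value().toString.split("\\s+"))
      .map((_, 1))
      .reduceByKeyAndWindow(
        (x: Int, y: Int) => x + y,          // reduce: add values entering the window
        (x: Int, y: Int) => x - y,          // inverse reduce: subtract values leaving the window
        Seconds(8),                         // window length
        Seconds(2),                         // slide interval
        2,                                  // [numTasks]: number of partitions for the reduce
        (kv: (String, Int)) => kv._2 > 0    // filter out keys whose count has fallen to 0
      )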

8. transform: when business requirements call for changing the data structure, transform can be used to do the conversion

package cn.bright.kafka.Spark

import java.text.SimpleDateFormat

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo6 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo6")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup6")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    // when business requirements call for changing the data structure, transform can be used to do the conversion
    val numStream: DStream[((String, String), Int)] = kafkaStream.transform((rdd, timestamp) => {
      val format: SimpleDateFormat = new SimpleDateFormat("yyyyMMdd HH:mm:ss")
      val time: String = format.format(timestamp.milliseconds)
      val value: RDD[((String, String), Int)] = rdd.flatMap(x => x.value().split("\\s+"))
        .map(x => ((x, time), 1))
          .reduceByKey((x,y)=>x+y)
          .sortBy(x=>x._2,false)
      value
    })

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

8.1 Test

input:

>java
>java
>java
>java
>scala
>scala
>scala
>scala

output:

-------------------------------------------
Time: 1608722714000 ms
-------------------------------------------
((java,20201223 19:25:14),4)

-------------------------------------------
Time: 1608722716000 ms
-------------------------------------------

-------------------------------------------
Time: 1608722718000 ms
-------------------------------------------

-------------------------------------------
Time: 1608722720000 ms
-------------------------------------------
((scala,20201223 19:25:20),4)

9. Converting to Spark SQL with transform for stream processing

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SQLContext}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo7 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo7")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup7")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

    val numStream: DStream[Row] = kafkaStream.transform(rdd => {
      val sqlContext: SQLContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._
      val words: RDD[String] = rdd.flatMap(_.value().toString.split("\\s+"))
      val tupple2RDD: RDD[(String, Int)] = words.map((_, 1))

      tupple2RDD.toDF("name", "cn").createOrReplaceTempView("tbwordcount")

      // alias the aggregate as "cn" so it can be read back with row.getAs[Long]("cn") below
      val frame: DataFrame = sqlContext.sql("select name, count(cn) as cn from tbwordcount group by name")

      val dataset: Dataset[(String, Long)] = frame.map(row => {
        val name: String = row.getAs[String]("name")
        val count: Long = row.getAs[Long]("cn")
        (name, count)
      })
      
      //dataset.rdd
      frame.rdd
    })

    numStream.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

object SQLContextSingleton {
  @transient private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    synchronized {
      if (instance == null) {
        instance = new SQLContext(sparkContext)
      }
    }
    instance
  }
}
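
SQLContext is the Spark 1.x entry point; with the Spark 2.4.5 dependencies listed at the end of this post, the same lazily-initialized singleton is usually written around SparkSession instead. A minimal sketch (the object name SparkSessionSingleton is illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object SparkSessionSingleton {
  @transient private var instance: SparkSession = _
  def getInstance(sparkContext: SparkContext): SparkSession = {
    if (instance == null) {
      // getOrCreate reuses the SparkContext already running inside the StreamingContext
      instance = SparkSession.builder().config(sparkContext.getConf).getOrCreate()
    }
    instance
  }
}

Inside transform, SparkSessionSingleton.getInstance(rdd.sparkContext) would then replace SQLContextSingleton.getInstance, and import spark.implicits._ would replace import sqlContext.implicits._.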

9.1 Test

input:

>scala
>scala
>scala
>scala
>scala
>java
>java
>java
>java
>java
>java

output:

-------------------------------------------
Time: 1608723156000 ms
-------------------------------------------
[scala,5]

-------------------------------------------
Time: 1608723158000 ms
-------------------------------------------

-------------------------------------------
Time: 1608723160000 ms
-------------------------------------------
[java,4]

-------------------------------------------
Time: 1608723162000 ms
-------------------------------------------
[java,2]

10. Driver and executor execution order in Spark Streaming, and a foreach vs. foreachPartition efficiency comparison

package cn.bright.kafka.Spark

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Bright
  * @Date 2020/12/22
  * @Description
  */
object SparkWindowDemo8 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkWindowDemo8")
    val streamingContext = new StreamingContext(conf,Seconds(2))

    streamingContext.checkpoint("checkpoint")

    val kafkaParams = Map(
      (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.116.60:9092"),
      (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer"),
      (ConsumerConfig.GROUP_ID_CONFIG, "kafkaGroup8")
    )


    val kafkaStream:InputDStream[ConsumerRecord[String,String]]
    = KafkaUtils.createDirectStream(
      streamingContext,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(Set("sparkKafkaDemo"), kafkaParams)
    )

//    println("driver")
//    val wordStream: DStream[String] = kafkaStream.flatMap(
//      line => {
//        println("executor")
//        line.value().toString.split("\\s+")
//      }
//    )
//
//    wordStream.print()


//    println("driver")   // 1 只执行一次
//    val wordStream: DStream[String] = kafkaStream.transform(
//      (rdd) => {
//        println("采集周期开始")   // n次  数据采集周期
//        //println(rdd)
//        val value: RDD[String] = rdd.flatMap(
//          x => {
//            println("执行器executor")   //数据进入n次 就执行n次
//            x.value().split("\\s+")
//          }
//        )
//        value
//      }
//    )
//    wordStream.print()

    // foreach: processes every element of each RDD one record at a time, so per-record overhead adds up
//    println("driver")
//    kafkaStream.foreachRDD(
//      (rdd)=>{
//        println("bb")
//        rdd.foreach(
//          x=>{
//            println("cc")
//            val strings: Array[String] = x.value().toString.split("\\s+")
//            println(strings.toList)
//          }
//        )
//      }
//    )


    // RDD foreachPartition is more efficient than foreach, but buffering a whole partition may cause OOM (out of memory)
    println("driver")
    kafkaStream.foreachRDD(
      (rdd)=>{
        println("bb")
        rdd.foreachPartition(
          x=>{
            println("cc")
            x.foreach(
              y=>{
                println("dd")
                println(y.value().toString.split("\\s+"))   // prints the array reference ([Ljava.lang.String;@...); use .toList to see the contents
              }
            )
          }
        )
      }
    )

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}
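
The usual reason foreachPartition wins is that expensive setup work (opening a database, Kafka-producer, or HTTP connection, for example) can be done once per partition on the executor instead of once per record; the risk is that code which buffers a whole partition in memory can hit OOM. A minimal sketch of the pattern, using a dummy sink class purely for illustration:

    // DummyConnection stands in for a real client (JDBC, Kafka producer, ...).
    class DummyConnection {
      def send(msg: String): Unit = println(s"sending: $msg")
      def close(): Unit = println("connection closed")
    }

    kafkaStream.foreachRDD(rdd =>
      rdd.foreachPartition(records => {
        val connection = new DummyConnection()   // opened once per partition, on the executor
        records.foreach(record => connection.send(record.value()))
        connection.close()                       // closed once per partition
      })
    )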

10.1 Test

input:

>java
>java
>java
>java
>scala
>scala
>scala
>scala
>scala

output:

bb
cc
bb
cc
dd
[Ljava.lang.String;@fa3b8f8
dd
[Ljava.lang.String;@41adc6d
dd
[Ljava.lang.String;@1cb77cd4
dd
[Ljava.lang.String;@45bf48f6
dd
[Ljava.lang.String;@57c94cc1
bb
cc
dd
[Ljava.lang.String;@19903d1d
dd
[Ljava.lang.String;@46a045f3
bb
cc
dd
[Ljava.lang.String;@e7ed727
bb
cc
dd
[Ljava.lang.String;@30395d92
dd
[Ljava.lang.String;@57e4cb4d
bb
cc
dd
[Ljava.lang.String;@d402d9
dd
[Ljava.lang.String;@6c775000
dd
[Ljava.lang.String;@654faaab
bb
cc

Maven dependencies

<dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
      <version>2.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>2.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.6.6</version>
    </dependency>
  </dependencies>