Spark Streaming Consuming Data from Kafka (Kafka 0.10)


Automatic offset management

  • In Kafka 0.10, a topic's offsets are maintained per consumer group in a Kafka internal topic, __consumer_offsets; the snippet below shows two consumer parameters that relate to this behavior.
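
Because offsets end up in __consumer_offsets, two extra consumer parameters are worth knowing: auto.offset.reset (where to start when the group has no committed offset yet) and enable.auto.commit (whether the underlying consumer commits offsets on its own). A minimal sketch of entries that could be added to the kafkaParams map defined later; the values shown ("latest", auto-commit enabled) are illustrative assumptions, not part of the original configuration:

// Optional additions to kafkaParams (values are illustrative assumptions)
// Start from the latest offset when the group has no committed offset yet
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
// Let the Kafka consumer commit offsets to __consumer_offsets automatically
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (true: java.lang.Boolean)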

Import the dependencies

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.1.1</version>
</dependency>

Code

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Author: panghu
  * Date: 2021-03-31
  * Description: Spark Streaming connecting to Kafka 0.10;
  * offsets are maintained in the Kafka topic __consumer_offsets
  **/
object SparkStreaming01_Kafka010 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(3))
    // Kafka connection parameters
    val kafkaParams: Map[String, Object] = Map[String, Object](
      // Kafka brokers
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "hadoop01:9092,hadoop02:9092,hadoop03:9092",
      // Consumer group id
      ConsumerConfig.GROUP_ID_CONFIG -> "spark0331",
      // Key deserializer (the fully-qualified class name string works as well)
      //ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      // Value deserializer
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
    )

    // Connect to Kafka and receive messages; the type parameters are the key/value types of the records
    val dataDStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream[String, String](
      ssc,
      // Location strategy: how Kafka partitions are scheduled onto executors; PreferConsistent fits most cases
      LocationStrategies.PreferConsistent,
      // Consumer strategy; the type parameters are the key/value types, argument 1 is the topic set, argument 2 the Kafka parameters
      ConsumerStrategies.Subscribe[String, String](Set("SparkStreaming"), kafkaParams)
    )
    // Extract the record values and do a word count
    val resDStream: DStream[(String, Int)] = dataDStream.map(_.value()).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_)
    resDStream.print()

    // Start the streaming computation
    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()

  }
}
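
The program above relies on the Kafka consumer committing offsets for the group spark0331 on its own. If you want to commit only after a batch has been processed successfully, the 0-10 integration also exposes the offsets through HasOffsetRanges and CanCommitOffsets. A minimal sketch (not part of the original program) of how the dataDStream defined above could commit offsets back to __consumer_offsets itself; it assumes enable.auto.commit is set to false and that this block is placed before ssc.start():

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, OffsetRange}

// Sketch: commit offsets manually after each batch has been processed
dataDStream.foreachRDD { rdd =>
  // Offset ranges are only available on the RDDs produced directly by createDirectStream
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... process rdd here ...

  // Asynchronously commit the processed offsets back to the __consumer_offsets topic
  dataDStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}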
