I. Functionality
Spark Streaming reads messages from Kafka. Different topics may carry logs with different structures, so each topic has to be routed to its own processing logic.
II. Environment
1. kafka_2.11-0.10.0.1
Special note: kafka_2.11-0.10.2.1 appears to be broken in this setup. The direct stream created by Streaming could not fetch any messages and kept throwing errors, which cost me two days, so avoid that version if you can; after switching to kafka_2.11-0.10.0.1 everything worked.
2. JDK 1.8
3. Scala 2.11.8
4. ZooKeeper 3.4.5-cdh5.7.0
5. CDH 5.7.0
III. Creating the Kafka topics
1. Kafka setup and startup
See: https://blog.csdn.net/u010886217/article/details/82973573
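The linked post covers the full setup. For quick reference, a single broker can be started with the stock script that ships with Kafka (the default config path is assumed here):
bin/kafka-server-start.sh -daemon config/server.properties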
2. Create three topics:
bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic
bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic2
bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic3
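To double-check that the three topics exist before starting the Streaming job, list them against the same ZooKeeper chroot:
bin/kafka-topics.sh --list --zookeeper hadoop:2181/kafka10_01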
IV. Code
1. Dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.1</version>
        <exclusions>
            <exclusion>
                <groupId>javax.servlet</groupId>
                <artifactId>servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Spark SQL -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- Spark Hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper -->
    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.4.5-cdh5.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
</dependencies>
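One caveat: the scala-library dependency references ${scala.version}, which is not defined in the snippet above and is presumably set elsewhere in the full pom. For the environment in section II it would be:
<properties>
    <scala.version>2.11.8</scala.version>
</properties>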
2. Implementation
import java.io.File
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

/**
 * Created by Administrator on 2019/12/7.
 */
object StreamingKafkaMutiTopics {
  def main(args: Array[String]): Unit = {
    // Quiet the noisy framework loggers so the batch output stays readable
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.WARN)
    Logger.getLogger("org.apache.kafka.clients.consumer").setLevel(Level.WARN)

    // Optional Hive-enabled SparkSession, kept for reference but not used here:
    // val warehouseLocation = new File("hdfs://cluster/hive/warehouse").getAbsolutePath
    // @transient
    // val spark = SparkSession
    //   .builder()
    //   .appName("Spark SQL To Hive")
    //   .config("spark.sql.warehouse.dir", warehouseLocation)
    //   .enableHiveSupport()
    //   .getOrCreate()
    // spark.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sparkConfig = new SparkConf()
      .setAppName("mutiTopics")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    @transient
    val sc = new SparkContext(sparkConfig)
    val scc = new StreamingContext(sc, Seconds(1)) // 1-second batches

    val kafkaParams = Map[String, Object](
      "auto.offset.reset" -> "latest", // or "earliest"
      "value.deserializer" -> classOf[StringDeserializer],
      "key.deserializer" -> classOf[StringDeserializer],
      "bootstrap.servers" -> "hadoop01:9092",
      "group.id" -> "test_jason",
      "enable.auto.commit" -> (false: java.lang.Boolean) // offsets are managed manually
    )

    // One direct stream subscribed to all three topics at once
    val topics = Array("hello_topic", "hello_topic2", "hello_topic3")
    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        scc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
      )

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Each RDD partition maps 1:1 to a Kafka topic-partition, so the
        // offset range at the same index tells us which topic this partition
        // belongs to, and we can branch to the matching processing logic.
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        rdd.foreachPartition { partition =>
          val o = offsetRanges(TaskContext.get.partitionId)
          if (o.topic == "hello_topic") {
            // processing logic for hello_topic
            println("hello_topic logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
          if (o.topic == "hello_topic2") {
            // processing logic for hello_topic2
            println("hello_topic2 logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
          if (o.topic == "hello_topic3") {
            // processing logic for hello_topic3
            println("hello_topic3 logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
        }
      }
    }

    stream.map(record => (record.key, record.value)).print()
    scc.start()
    scc.awaitTermination()
  }
}
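A note on offsets: with enable.auto.commit set to false, the code reads the offset ranges but never commits them, so the consumer group's progress is not stored anywhere and a restart with auto.offset.reset=latest simply resumes from the newest messages. If you want Kafka to track progress, the kafka-0-10 integration lets you commit the ranges yourself. A minimal sketch, to be placed at the end of the foreachRDD block above (plus the extra import):

import org.apache.spark.streaming.kafka010.CanCommitOffsets

// Commit the ranges consumed in this batch back to Kafka, asynchronously.
// The cast works because the DStream was created by createDirectStream.
stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)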
3. Start the console producers
bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic
bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic2
bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic3
Type a few test lines into each producer.
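If the Streaming job prints nothing, it helps to first rule out the producer side by attaching a console consumer to one of the topics (the old-consumer form, which works on 0.10.0.1):
bin/kafka-console-consumer.sh --zookeeper hadoop:2181/kafka10_01 --topic hello_topic --from-beginning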
4. Results
-------------------------------------------
Time: 1576397344000 ms
-------------------------------------------
hello_topic3 logic:hello_topic3 0 10 10
hello_topic2 logic:hello_topic2 0 26 27
hello_topic logic:hello_topic 0 2 2
-------------------------------------------
Time: 1576397345000 ms
-------------------------------------------
(null,sdf sdf sdf sdfwesdf sdf sdf sdfwe)
-------------------------------------------
Time: 1576397346000 ms
-------------------------------------------
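Two details worth noting in this output. A range such as "hello_topic3 0 10 10", where fromOffset equals untilOffset, means that partition contributed no new records in that batch; the branch still fires because the direct stream creates one RDD partition per Kafka topic-partition whether or not it holds data. And the (null,...) line comes from stream.map(record => (record.key, record.value)).print(): console-producer messages carry no key, so the key prints as null.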
V. References
1. https://blog.csdn.net/xianpanjia4616/article/details/90081537