Kafka (18): Consuming multiple topics from Spark Streaming and handling each topic's messages separately

I. Goal

Spark Streaming reads messages from Kafka, but different topics may carry different log structures, so the messages need to be processed according to the structure of the topic they came from.

II. Environment

1. kafka_2.11-0.10.0.1

Important note: kafka_2.11-0.10.2.1 appears to be problematic. With that version, a Streaming direct connection could never fetch any messages and kept throwing errors, which cost two days of troubleshooting. Avoid it if possible; switching to kafka_2.11-0.10.0.1 resolved the issue.

2. JDK 1.8
3. Scala 2.11.8
4. ZooKeeper 3.4.5-cdh5.7.0
5. CDH 5.7.0

III. Create the Kafka topics

1. Kafka setup and startup

See: https://blog.csdn.net/u010886217/article/details/82973573
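
With ZooKeeper (the CDH instance from the environment list) already running, the broker can be started with the standard script; the config path here is an assumption, adjust it to your install:

bin/kafka-server-start.sh -daemon config/server.properties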

2. Create three topics

bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic

bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic2

bin/kafka-topics.sh --create --zookeeper hadoop:2181/kafka10_01 --replication-factor 1 --partitions 1 --topic hello_topic3
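
To confirm that all three topics exist, list them against the same ZooKeeper chroot:

bin/kafka-topics.sh --list --zookeeper hadoop:2181/kafka10_01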

IV. Implementation

1. Dependencies

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>0.10.0.1</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>servlet-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Spark SQL -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.1.0</version>
      <!--<scope>compile</scope>-->
    </dependency>
    <!-- Spark Hive -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.1.0</version>
      <!--<scope>compile</scope>-->
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper -->
    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.5-cdh5.7.0</version>
    </dependency>

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    
  </dependencies>
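
The scala-library dependency references ${scala.version}, which is not defined in the snippet above; given the Scala 2.11.8 environment, the pom presumably carries a properties block along these lines:

  <properties>
    <scala.version>2.11.8</scala.version>
  </properties>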

2. Code

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

/**
  * Created by Administrator on 2019/12/7.
  */
object StreamingKafkaMutiTopics {
  def main(args: Array[String]): Unit = {
    // Quiet the noisier loggers so the per-batch output stays readable.
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.WARN)
    Logger.getLogger("org.apache.kafka.clients.consumer").setLevel(Level.WARN)

    val sparkConfig = new SparkConf()
      .setAppName("mutiTopics")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    @transient
    val sc = new SparkContext(sparkConfig)
    val scc = new StreamingContext(sc, Seconds(1))

    val kafkaParams = Map[String, Object](
      "auto.offset.reset" -> "latest", // latest or earliest
      "value.deserializer" -> classOf[StringDeserializer],
      "key.deserializer" -> classOf[StringDeserializer],
      "bootstrap.servers" -> "hadoop01:9092",
      "group.id" -> "test_jason",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One direct stream subscribed to all three topics.
    val topics = Array("hello_topic", "hello_topic2", "hello_topic3")
    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        scc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
      )

    stream.foreachRDD(rdd => {
      if (!rdd.isEmpty()) {
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        rdd.foreachPartition(partition => {
          // Each RDD partition corresponds to exactly one Kafka topic-partition,
          // so the offset range's topic tells us which logic to apply.
          // Note: only offset metadata is logged here; the records in
          // `partition` are not consumed.
          val o = offsetRanges(TaskContext.get.partitionId)
          if (o.topic == "hello_topic") {
            // processing logic for hello_topic
            println("hello_topic logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
          if (o.topic == "hello_topic2") {
            // processing logic for hello_topic2
            println("hello_topic2 logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
          if (o.topic == "hello_topic3") {
            // processing logic for hello_topic3
            println("hello_topic3 logic:" + s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
        })
      }
    })

    stream.map(record => (record.key, record.value)).print()

    scc.start()
    scc.awaitTermination()
  }
}
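
The foreachRDD above only logs offset metadata and never consumes the partition iterator. Below is a minimal sketch of dispatching on the records themselves, reusing `stream` from the code above; the `record.topic()` match and the println bodies are illustrative placeholders, not part of the original program:

    // Variant sketch: branch on each record's topic rather than the offset range.
    stream.foreachRDD(rdd =>
      rdd.foreachPartition(partition =>
        partition.foreach { record =>
          record.topic() match {
            case "hello_topic"  => println(s"hello_topic value: ${record.value}")   // hello_topic logic
            case "hello_topic2" => println(s"hello_topic2 value: ${record.value}")  // hello_topic2 logic
            case "hello_topic3" => println(s"hello_topic3 value: ${record.value}")  // hello_topic3 logic
            case _              => () // ignore unexpected topics
          }
        }
      )
    )

Because enable.auto.commit is false, neither version commits offsets; if that is needed, the consumed ranges can be committed back to Kafka inside foreachRDD via stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges), importing org.apache.spark.streaming.kafka010.CanCommitOffsets.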

3. Start the console producers

bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic

bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic2

bin/kafka-console-producer.sh --broker-list hadoop:9092 --topic hello_topic3

Type some test messages into each producer; the output follows below.
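
To double-check that the messages actually landed in a topic, a console consumer can be attached as well (old-consumer form, matching the ZooKeeper chroot used when the topics were created):

bin/kafka-console-consumer.sh --zookeeper hadoop:2181/kafka10_01 --topic hello_topic --from-beginning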

4. Output

-------------------------------------------
Time: 1576397344000 ms
-------------------------------------------

hello_topic3 logic:hello_topic3 0 10 10
hello_topic2 logic:hello_topic2 0 26 27
hello_topic logic:hello_topic 0 2 2
-------------------------------------------
Time: 1576397345000 ms
-------------------------------------------
(null,sdf sdf sdf sdfwesdf sdf sdf sdfwe)

-------------------------------------------
Time: 1576397346000 ms
-------------------------------------------

V. References

1. https://blog.csdn.net/xianpanjia4616/article/details/90081537
