Spark Integration with Kafka

1. Spark consuming data from Kafka

       Spark can start consuming from explicitly specified offsets in a topic. When starting offsets are given, they take precedence over the "auto.offset.reset" -> "earliest" setting in the consumer parameters.

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// consumer configuration; auto commit is disabled so offsets are committed manually later
val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "test",
    "auto.offset.reset" -> "earliest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
)
// start consuming each partition of the topic from the specified offset
val offsets = Map[TopicPartition, Long](
    new TopicPartition("topic-test", 0) -> 60,
    new TopicPartition("topic-test", 1) -> 20
)
val topics = Array("topic-test")
val dStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream[String, String](
    ssc, LocationStrategies.PreferConsistent, Subscribe[String, String](topics, kafkaParams, offsets))

After processing each batch, commit the consumed offsets back to Kafka, which records them in the __consumer_offsets topic:

dStream.foreachRDD(rdd => {
    // obtain the offset ranges carried by this batch before any transformation
    val offsets: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // process and persist the data
    rdd.foreach(l => println(l.value()))
    offsets.foreach(o => println(o.topic, o.partition, o.fromOffset, o.untilOffset))
    // commit the offsets back to Kafka asynchronously
    dStream.asInstanceOf[CanCommitOffsets].commitAsync(offsets)
})
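
The snippets above assume an existing StreamingContext named ssc. A minimal sketch of the surrounding driver program, with the application name, master and batch interval chosen here only for illustration:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// assumed setup: local master and a 5-second batch interval, for illustration only
val conf = new SparkConf().setAppName("spark-kafka-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

// ... define kafkaParams, create the direct stream and the foreachRDD logic shown above ...

ssc.start()
ssc.awaitTermination()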

2. Writing Spark data to Kafka

To send data to Kafka from executors through a producer, define a serializable producer wrapper. KafkaProducer itself is not serializable, so the wrapper holds only the Properties and creates the producer lazily on the executor:

import java.util.Properties
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

class KafkaSink[K, V](prop: Properties) extends Serializable {

    // created lazily, so the actual KafkaProducer is instantiated on the executor
    // instead of being serialized from the driver
    lazy val producer = new KafkaProducer[K, V](prop)

    def send(topic: String, key: K, value: V): Future[RecordMetadata] =
        producer.send(new ProducerRecord[K, V](topic, key, value))

    def send(topic: String, value: V): Future[RecordMetadata] =
        producer.send(new ProducerRecord[K, V](topic, value))

    def send(message: ProducerRecord[K, V]): Future[RecordMetadata] =
        producer.send(message)
}
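
A small convenience that can sit next to the class is a companion object that builds the Properties from a plain Scala map; this is a hypothetical helper, not required by the pattern:

object KafkaSink {
    // hypothetical helper: build a KafkaSink from a plain config map
    def apply[K, V](config: Map[String, String]): KafkaSink[K, V] = {
        val props = new Properties()
        config.foreach { case (k, v) => props.setProperty(k, v) }
        new KafkaSink[K, V](props)
    }
}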

Broadcast the producer wrapper so that every executor can write data to Kafka:

import org.apache.kafka.common.serialization.StringSerializer

val prop = new Properties()
prop.setProperty("bootstrap.servers", "localhost:9092")
prop.setProperty("key.serializer", classOf[StringSerializer].getName)
prop.setProperty("value.serializer", classOf[StringSerializer].getName)
val producer: KafkaSink[String, String] = new KafkaSink[String, String](prop)
// broadcast the serializable wrapper; each executor creates its own KafkaProducer lazily
val bc = sc.broadcast(producer)
user_info.foreach(line => {
    /*val cols: Array[String] = line.split("\\|")
    val values = cols(1) + "," + cols(2) + "," + cols(3)
    val message = new ProducerRecord[String, String]("topic-test", cols(0), values)*/

    val message = new ProducerRecord[String, String]("yjs-test", null, line)
    //println("key:" + message.key() + "  value: " + message.value())
    bc.value.send(message)
})
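
An alternative to broadcasting, sketched here assuming the RDD user_info and the prop configuration from above are in scope, is to build the sink inside foreachPartition; only the serializable Properties object is captured by the closure, at the cost of one producer per partition rather than one per executor:

// alternative sketch: create the sink per partition instead of broadcasting it;
// this opens one KafkaProducer per partition/task rather than one per executor
user_info.foreachPartition(partition => {
    val sink = new KafkaSink[String, String](prop)
    partition.foreach(line => sink.send("yjs-test", line))
})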

 
