Spark Streaming + Kafka 2.0.0: Maintaining Offsets for Multiple Topics and Partitions


Saving offsets to a database

1. Version Issues

After Kafka was upgraded to 2.0.0 we had no choice but to follow: the Kafka 1.0.0 interfaces no longer fit the previous tool at all, so the offset-maintenance code had to be rewritten.
For how offsets are managed and maintained under Kafka 1.0.x, see the previous article:
https://blog.csdn.net/qq_41922058/article/details/86478250

2. Code Changes

Compared with the kafka-1.0.x version, the method that fetches the offset range has to change, and one new method is added:
def listPartitionInfos()

/**
  * Get the offset range of the given topic.
  *
  * Imports assumed at the top of the enclosing file (this code follows
  * kafka.tools.GetOffsetShell): java.util.Properties, joptsimple.OptionParser,
  * kafka.utils.{Exit, ToolsUtils},
  * org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer},
  * org.apache.kafka.common.{PartitionInfo, TopicPartition},
  * org.apache.kafka.common.requests.ListOffsetRequest,
  * org.apache.kafka.common.serialization.ByteArrayDeserializer,
  * scala.collection.JavaConverters._
  *
  * @param topicName the topic name
  * @param MinOrMax  ListOffsetRequest.EARLIEST_TIMESTAMP (-2) for the earliest
  *                  offsets, ListOffsetRequest.LATEST_TIMESTAMP (-1) for the latest
  * @return a map from each TopicPartition to its offset
  */
def getTopicOffset(topicName: String, MinOrMax: Int): Map[TopicPartition, Long] = {
  val parser = new OptionParser(false)
  val brokerList = brokerListOpt
  ToolsUtils.validatePortOrDie(parser, brokerList)
  val topic = topicName
  val partitionIdsRequested: Set[Int] = Set.empty
  val listOffsetsTimestamp = MinOrMax.toLong.longValue()

  val config = new Properties
  config.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
  val consumer = new KafkaConsumer(config, new ByteArrayDeserializer, new ByteArrayDeserializer)
  val partitionInfos = listPartitionInfos(consumer, topic, partitionIdsRequested) match {
    case None =>
      System.err.println(s"Topic $topic does not exist")
      Exit.exit(1)
    case Some(p) if p.isEmpty =>
      if (partitionIdsRequested.isEmpty)
        System.err.println(s"Topic $topic has 0 partitions")
      else
        System.err.println(s"Topic $topic does not have any of the requested partitions ${partitionIdsRequested.mkString(",")}")
      Exit.exit(1)
    case Some(p) => p
  }
  if (partitionIdsRequested.nonEmpty) {
    (partitionIdsRequested -- partitionInfos.map(_.partition)).foreach { partitionId =>
      System.err.println(s"Error: partition $partitionId does not exist")
    }
  }
  val topicPartitions = partitionInfos.sortBy(_.partition).flatMap { p =>
    if (p.leader == null) {
      System.err.println(s"Error: partition ${p.partition} does not have a leader. Skip getting offsets")
      None
    } else
      Some(new TopicPartition(p.topic, p.partition))
  }
  /* Note that the value of the map can be null */
  val partitionOffsets: collection.Map[TopicPartition, java.lang.Long] = listOffsetsTimestamp match {
    case ListOffsetRequest.EARLIEST_TIMESTAMP => consumer.beginningOffsets(topicPartitions.asJava).asScala
    case ListOffsetRequest.LATEST_TIMESTAMP => consumer.endOffsets(topicPartitions.asJava).asScala
    case _ =>
      val timestampsToSearch = topicPartitions.map(tp => tp -> (listOffsetsTimestamp: java.lang.Long)).toMap.asJava
      consumer.offsetsForTimes(timestampsToSearch).asScala.mapValues(x => if (x == null) null else x.offset)
  }
  val fromOffsets = collection.mutable.HashMap.empty[TopicPartition, Long]
  partitionOffsets.toSeq.sortBy { case (tp, _) => tp.partition }.foreach { case (tp, offset) =>
    // offset can be null when offsetsForTimes finds no match; skip those partitions.
    // (The original Option(offset).getOrElse("").toString.toLong would throw a
    // NumberFormatException on null.)
    Option(offset).foreach(o => fromOffsets += (new TopicPartition(topic, tp.partition) -> o.longValue()))
  }
  consumer.close()
  fromOffsets.toMap
}
/**
  * Return the partition infos for `topic`. If the topic does not exist, `None` is returned.
  */
private def listPartitionInfos(consumer: KafkaConsumer[_, _], topic: String, partitionIds: Set[Int]): Option[Seq[PartitionInfo]] = {
  val partitionInfos = consumer.listTopics.asScala.filterKeys(_ == topic).values.flatMap(_.asScala).toBuffer
  if (partitionInfos.isEmpty)
    None
  else if (partitionIds.isEmpty)
    Some(partitionInfos)
  else
    Some(partitionInfos.filter(p => partitionIds.contains(p.partition)))
}
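The MinOrMax argument piggybacks on Kafka's sentinel timestamps. A minimal standalone sketch of that convention (the sentinel values mirror what I believe ListOffsetRequest defines in Kafka 2.0.0; `OffsetTimestampSketch` and `describe` are hypothetical names, not part of the tool):

```scala
object OffsetTimestampSketch {
  // Sentinels mirroring ListOffsetRequest.EARLIEST_TIMESTAMP / LATEST_TIMESTAMP
  val EarliestTimestamp: Long = -2L
  val LatestTimestamp: Long = -1L

  // Which consumer call getTopicOffset's match dispatches to for a given timestamp
  def describe(ts: Long): String = ts match {
    case EarliestTimestamp => "beginningOffsets (earliest available offset per partition)"
    case LatestTimestamp   => "endOffsets (next offset to be written per partition)"
    case millis            => s"offsetsForTimes at $millis ms"
  }

  def main(args: Array[String]): Unit = {
    println(describe(EarliestTimestamp))
    println(describe(1548950400000L))
  }
}
```

Anything other than the two sentinels is treated as a real millisecond timestamp and routed to offsetsForTimes, which is why the `case _` branch exists.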

3. Invoking the Methods

The previous article only covered how to configure the tool's storage layer; this one walks through how the Spark driver actually calls it.
First, fetch the current offsets for each partition of the topics:

val fromOffsets = OffsetUtils.getLastCommittedOffsets(topics, group_id)

Because OffsetUtils has already corrected the offsets automatically, there is no need to validate them here before createDirectStream; just create the stream directly:

KafkaUtils.createDirectStream[String, String](streamingContext, PreferConsistent, Subscribe[String, String](topics, kafkaParams, fromOffsets))
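As a rough illustration of what that correction amounts to (a minimal sketch under my reading of the tool; `OffsetClampSketch` and `clampOffsets` are hypothetical names, the real logic lives inside OffsetUtils.getLastCommittedOffsets): each saved offset is clamped into the topic's current [earliest, latest] range, so expired or out-of-range offsets never reach createDirectStream.

```scala
object OffsetClampSketch {
  // Clamp each partition's saved offset into [earliest, latest].
  def clampOffsets(saved: Map[Int, Long],
                   earliest: Map[Int, Long],
                   latest: Map[Int, Long]): Map[Int, Long] =
    saved.map { case (partition, offset) =>
      val lo = earliest.getOrElse(partition, 0L)
      val hi = latest.getOrElse(partition, Long.MaxValue)
      partition -> math.min(math.max(offset, lo), hi)
    }

  def main(args: Array[String]): Unit = {
    // partition 0: saved offset already expired (log start moved to 100)
    // partition 1: saved offset beyond the latest offset (e.g. topic recreated)
    val corrected = clampOffsets(
      saved = Map(0 -> 50L, 1 -> 999L),
      earliest = Map(0 -> 100L, 1 -> 0L),
      latest = Map(0 -> 500L, 1 -> 300L))
    println(corrected)
  }
}
```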

Next, iterate over the DStream:

stream.foreachRDD(rdd => {
  // Grab the offset ranges first; the rdd must not be transformed before this cast
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  /**
    * business logic goes here
    */
  // Synchronously persist the offsets
  rdd.foreachPartition(partition => {
    val con = OffsetUtils.getConn()
    try {
      con.setAutoCommit(false)
      val offsetRange = offsetRanges(TaskContext.get.partitionId)
      val update_offset = "UPDATE spark_offsets_manager SET lastsaveoffsets = ? WHERE topics = ? and partitions = ? and groups = ?"
      val state = con.prepareStatement(update_offset)
      state.setLong(1, offsetRange.untilOffset)
      state.setString(2, offsetRange.topic)
      state.setInt(3, offsetRange.partition)
      state.setString(4, group_id)
      state.execute()
      LOG.debug("topic: " + offsetRange.topic + " partition: " + offsetRange.partition + " fromOffset: " + offsetRange.fromOffset + " untilOffset: " + offsetRange.untilOffset)
      con.commit()
      LOG.info("Successful offset update!")
    } finally {
      con.close()
    }
  })
})

The offsets can also be persisted another way: iterate over offsetRanges directly on the driver, which does away with the foreachPartition:

for (offsetRange <- offsetRanges) {
  val con = OffsetUtils.getConn()
  try {
    con.setAutoCommit(false)
    val update_offset = "UPDATE spark_offsets_manager SET lastsaveoffsets = ? WHERE topics = ? and partitions = ? and groups = ?"
    val state = con.prepareStatement(update_offset)
    state.setLong(1, offsetRange.untilOffset)
    state.setString(2, offsetRange.topic)
    state.setInt(3, offsetRange.partition)
    state.setString(4, group_id)
    state.execute()
    LOG.debug("topic: " + offsetRange.topic + " partition: " + offsetRange.partition + " fromOffset: " + offsetRange.fromOffset + " untilOffset: " + offsetRange.untilOffset)
    con.commit()
    LOG.info("Successful offset update!")
  } finally {
    con.close()
  }
}
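Both variants ultimately write the same thing: one row per partition, keyed by (topic, partition, group), set to untilOffset. A minimal sketch of that write pattern against an in-memory stand-in for the spark_offsets_manager table (the nested `OffsetRange` case class here is a hypothetical stand-in for Spark's org.apache.spark.streaming.kafka010.OffsetRange, and `applyBatch` is an illustrative name), showing that replaying a batch is idempotent:

```scala
object OffsetUpdateSketch {
  // Stand-in for org.apache.spark.streaming.kafka010.OffsetRange
  final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

  // Apply one batch of ranges to a "table" keyed like spark_offsets_manager:
  // (topics, partitions, groups) -> lastsaveoffsets
  def applyBatch(table: Map[(String, Int, String), Long],
                 ranges: Seq[OffsetRange],
                 groupId: String): Map[(String, Int, String), Long] =
    ranges.foldLeft(table) { (t, r) =>
      t.updated((r.topic, r.partition, groupId), r.untilOffset)
    }

  def main(args: Array[String]): Unit = {
    val ranges = Seq(OffsetRange("logs", 0, 100L, 150L), OffsetRange("logs", 1, 80L, 90L))
    val once  = applyBatch(Map.empty, ranges, "g1")
    val twice = applyBatch(once, ranges, "g1") // replaying the same batch changes nothing
    assert(once == twice)
    println(once)
  }
}
```

This last-write-wins shape is why a crash between the business logic and the UPDATE yields at-least-once processing rather than data loss: on restart the stream resumes from the last committed untilOffset.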
