Saving Kafka Offsets to a Database
1. Version issues
After upgrading Kafka to 2.0.0 we had to stay compatible with the new version; the Kafka 1.0.0 interfaces no longer fit the previous tool, so the offset maintenance code was rewritten.
For offset management and maintenance under Kafka 1.0.x, see the previous article:
https://blog.csdn.net/qq_41922058/article/details/86478250
2. Code changes
Compared with kafka-1.0.x, the method that fetches the offset range needs to change, and one new method is added:
def listPartitionInfos()
/**
 * Get the offset range of the given topic.
 *
 * @param topicName the topic name
 * @param MinOrMax  pass ListOffsetRequest.EARLIEST_TIMESTAMP (-2) for the earliest
 *                  offsets or ListOffsetRequest.LATEST_TIMESTAMP (-1) for the latest
 * @return a map from TopicPartition to offset
 */
def getTopicOffset(topicName: String, MinOrMax: Int): Map[TopicPartition, Long] = {
  val parser = new OptionParser(false)
  val brokerList = brokerListOpt
  ToolsUtils.validatePortOrDie(parser, brokerList)
  val topic = topicName
  val partitionIdsRequested: Set[Int] = Set.empty
  val listOffsetsTimestamp = MinOrMax.toLong
  val config = new Properties
  config.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
  val consumer = new KafkaConsumer(config, new ByteArrayDeserializer, new ByteArrayDeserializer)
  val partitionInfos = listPartitionInfos(consumer, topic, partitionIdsRequested) match {
    case None =>
      System.err.println(s"Topic $topic does not exist")
      Exit.exit(1)
    case Some(p) if p.isEmpty =>
      if (partitionIdsRequested.isEmpty)
        System.err.println(s"Topic $topic has 0 partitions")
      else
        System.err.println(s"Topic $topic does not have any of the requested partitions ${partitionIdsRequested.mkString(",")}")
      Exit.exit(1)
    case Some(p) => p
  }
  if (partitionIdsRequested.nonEmpty) {
    (partitionIdsRequested -- partitionInfos.map(_.partition)).foreach { partitionId =>
      System.err.println(s"Error: partition $partitionId does not exist")
    }
  }
  // Skip partitions without a leader: their offsets cannot be fetched
  val topicPartitions = partitionInfos.sortBy(_.partition).flatMap { p =>
    if (p.leader == null) {
      System.err.println(s"Error: partition ${p.partition} does not have a leader. Skip getting offsets")
      None
    } else
      Some(new TopicPartition(p.topic, p.partition))
  }
  /* Note that the values of the map can be null */
  val partitionOffsets: collection.Map[TopicPartition, java.lang.Long] = listOffsetsTimestamp match {
    case ListOffsetRequest.EARLIEST_TIMESTAMP => consumer.beginningOffsets(topicPartitions.asJava).asScala
    case ListOffsetRequest.LATEST_TIMESTAMP => consumer.endOffsets(topicPartitions.asJava).asScala
    case _ =>
      val timestampsToSearch = topicPartitions.map(tp => tp -> (listOffsetsTimestamp: java.lang.Long)).toMap.asJava
      consumer.offsetsForTimes(timestampsToSearch).asScala.mapValues(x => if (x == null) null else x.offset)
  }
  consumer.close()
  val fromOffsets = collection.mutable.HashMap.empty[TopicPartition, Long]
  partitionOffsets.toSeq.sortBy { case (tp, _) => tp.partition }.foreach { case (tp, offset) =>
    // Option(offset) guards against null offsets returned by offsetsForTimes;
    // partitions without an offset are skipped instead of throwing
    Option(offset).foreach(o => fromOffsets += (new TopicPartition(topic, tp.partition) -> o.longValue))
  }
  fromOffsets.toMap
}
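The map produced by `offsetsForTimes` can contain `null` values when a partition has no offset for the requested timestamp, so the conversion into the final `Map[TopicPartition, Long]` has to be null-safe. A minimal standalone sketch of that conversion, using plain `Int` partition ids in place of `TopicPartition` keys for brevity:

```scala
// Sketch: drop partitions whose offset lookup returned null, instead of
// coercing null through toString (which would throw at parse time).
def toFromOffsets(offsets: Map[Int, java.lang.Long]): Map[Int, Long] =
  offsets.collect { case (p, o) if o != null => p -> o.longValue }
```

The same `collect`-with-guard shape works directly on `TopicPartition`-keyed maps.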
/**
 * Return the partition infos for `topic`. If the topic does not exist, `None` is returned.
 */
private def listPartitionInfos(consumer: KafkaConsumer[_, _], topic: String, partitionIds: Set[Int]): Option[Seq[PartitionInfo]] = {
  val partitionInfos = consumer.listTopics.asScala.filterKeys(_ == topic).values.flatMap(_.asScala).toBuffer
  if (partitionInfos.isEmpty)
    None
  else if (partitionIds.isEmpty)
    Some(partitionInfos)
  else
    Some(partitionInfos.filter(p => partitionIds.contains(p.partition)))
}
3. Calling the methods
The previous article only covered how to configure the tool's storage layer; this one walks through how the Spark driver calls it.
First, fetch the current offsets for each partition of the topic:
val fromOffsets = OffsetUtils.getLastCommittedOffsets(topics, group_id)
Because OffsetUtils has already corrected the offsets automatically, there is no need to validate them again when calling createDirectStream; just create the stream directly:
KafkaUtils.createDirectStream[String, String](streamingContext, PreferConsistent, Subscribe[String, String](topics, kafkaParams, fromOffsets))
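The correction that `getLastCommittedOffsets` performs is not shown in this article; a hypothetical sketch of what such a correction typically does is to clamp each saved offset into the broker's valid range (offsets fall out of range when retention deletes old segments) and to start newly added partitions from the earliest available offset. The function name and `Int`-keyed maps below are illustrative stand-ins:

```scala
// Hypothetical sketch of offset correction: clamp each saved offset into
// [earliest, latest] per partition; partitions with no saved offset
// (e.g. newly added ones) start from the earliest available offset.
def correctOffsets(saved: Map[Int, Long],
                   earliest: Map[Int, Long],
                   latest: Map[Int, Long]): Map[Int, Long] =
  earliest.map { case (p, lo) =>
    val hi = latest.getOrElse(p, lo)
    p -> math.min(math.max(saved.getOrElse(p, lo), lo), hi)
  }
```

With a correction like this in place, `createDirectStream` never receives an out-of-range offset, which is why no further validation is needed at the call site.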
Next, iterate over the DStream:
stream.foreachRDD(rdd => {
  // offsetRanges must be read from the original KafkaRDD, before any transformation
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  /**
   * business logic goes here
   */
  // Persist the offsets synchronously
  rdd.foreachPartition(partition => {
    val con = OffsetUtils.getConn()
    try {
      con.setAutoCommit(false)
      val offsetRange = offsetRanges(TaskContext.get.partitionId)
      val update_offset = "UPDATE spark_offsets_manager SET lastsaveoffsets = ? WHERE topics = ? and partitions = ? and groups = ?"
      val state = con.prepareStatement(update_offset)
      state.setLong(1, offsetRange.untilOffset)
      state.setString(2, offsetRange.topic)
      state.setInt(3, offsetRange.partition)
      state.setString(4, group_id)
      state.execute()
      LOG.debug("topic: " + offsetRange.topic + " partition: " + offsetRange.partition + " fromOffset: " + offsetRange.fromOffset + " untilOffset: " + offsetRange.untilOffset)
      con.commit()
      LOG.info("Successful offset update!")
    } finally {
      con.close()
    }
  })
})
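The lookup `offsetRanges(TaskContext.get.partitionId)` relies on the KafkaRDD's partition index lining up with the array index, which holds for an untransformed direct stream. A more defensive variant looks the range up by its Kafka partition number; `PartRange` below is an illustrative stand-in for Spark's `OffsetRange`:

```scala
// Defensive sketch: select the range by Kafka partition number rather than
// by array position, so reordering of the array cannot mismatch partitions.
case class PartRange(partition: Int, untilOffset: Long) // stand-in for OffsetRange
def rangeForPartition(ranges: Array[PartRange], partitionId: Int): Option[PartRange] =
  ranges.find(_.partition == partitionId)
```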
The offsets can also be persisted by iterating over offsetRanges directly, which avoids the foreachPartition entirely (this loop runs on the driver):
for (offsetRange <- offsetRanges) {
  val con = OffsetUtils.getConn()
  try {
    con.setAutoCommit(false)
    val update_offset = "UPDATE spark_offsets_manager SET lastsaveoffsets = ? WHERE topics = ? and partitions = ? and groups = ?"
    val state = con.prepareStatement(update_offset)
    state.setLong(1, offsetRange.untilOffset)
    state.setString(2, offsetRange.topic)
    state.setInt(3, offsetRange.partition)
    state.setString(4, group_id)
    state.execute()
    LOG.debug("topic: " + offsetRange.topic + " partition: " + offsetRange.partition + " fromOffset: " + offsetRange.fromOffset + " untilOffset: " + offsetRange.untilOffset)
    con.commit()
    LOG.info("Successful offset update!")
  } finally {
    con.close()
  }
}
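Both variants open a connection and commit once per partition, so a failure midway can leave some partitions updated and others not. A design alternative is to share one connection and one transaction across all partitions and commit once at the end. The sketch below abstracts the JDBC execution behind a function parameter so the batching logic stays plain Scala; `OffsetRange` here is a local stand-in for Spark's class, and `execUpdate` stands in for running the prepared statement:

```scala
// Sketch: bind and execute one update per partition through a single
// executor function (e.g. a PreparedStatement inside one transaction).
// execUpdate takes (untilOffset, topic, partition, groupId) and returns
// the number of rows it updated; the sum reports total rows touched.
case class OffsetRange(topic: String, partition: Int, untilOffset: Long)

def saveAllOffsets(ranges: Seq[OffsetRange], groupId: String)(
    execUpdate: (Long, String, Int, String) => Int): Int =
  ranges.map(r => execUpdate(r.untilOffset, r.topic, r.partition, groupId)).sum
```

In the JDBC version, the caller would open the connection, call this with a closure over the prepared statement, then `commit()` once (or roll back on failure), giving all-or-nothing semantics for the batch.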