Spark Streaming + Kafka 1.0.x: Maintaining Offsets for Multiple Topics and Partitions


Saving offsets to a database

I. Version Differences

In earlier versions of Kafka, consumer offsets were stored in ZooKeeper; current versions store them in Kafka itself, in a special internal topic named __consumer_offsets.

II. Maintenance Approach

Given the input topics and consumer group, first check whether the database already contains consumption records for that group. If not, this is the first consumption: fetch each partition's current offset for the topic and save it to the database. If records do exist, read the per-partition offset fields from the database, wrap them into a Map, and pass that map into the DStream-creation function. After each Spark batch completes, update the offset fields in the database, which completes the offset commit.
One possible problem: if the job is stopped for too long, the offsets stored in the database may no longer exist in Kafka (the data has been purged from the log), and an OffsetOutOfRangeException is thrown. To avoid this, every time the stream is created, check whether the stored offsets still exist in Kafka, and automatically correct them if they do not.
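The auto-correction rule described above can be sketched as a pure function. In this sketch, `(topic, partition)` tuples stand in for Kafka's `TopicPartition`, and both maps are assumed to have been fetched already (from the database and from Kafka, respectively):

```scala
// Sketch of the offset auto-correction rule described above.
// stored:   offsets read from the database, keyed by (topic, partition)
// earliest: the earliest offsets still retained in Kafka, same keys
// If a stored offset has fallen behind Kafka's retention window, reset it to
// the earliest available offset to avoid OffsetOutOfRangeException.
object OffsetCorrection {
  type TP = (String, Int) // hypothetical stand-in for org.apache.kafka.common.TopicPartition

  def correct(stored: Map[TP, Long], earliest: Map[TP, Long]): Map[TP, Long] =
    stored.map { case (tp, offset) =>
      // If Kafka has no earliest offset for this partition, keep the stored value.
      tp -> math.max(offset, earliest.getOrElse(tp, offset))
    }
}
```

A stored offset of 5 with an earliest retained offset of 100 is corrected to 100; a stored offset already inside the retention window is kept as-is.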

III. Code Implementation

First, we need a method that fetches a topic's current earliest (or latest) offset for every partition.

1. Fetching the offset range

import joptsimple.OptionParser
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.client.ClientUtils
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer
import kafka.utils.{Exit, ToolsUtils}
import org.apache.kafka.common.TopicPartition

def getTopicOffset(topicName: String, MinOrMax: Int): Map[TopicPartition, Long] = {
  val parser = new OptionParser(false)
  val clientId = "GetOffset"
  val brokerList = brokerListOpt // broker list string, e.g. "host1:9092,host2:9092", defined elsewhere
  ToolsUtils.validatePortOrDie(parser, brokerList)
  val metadataTargetBrokers = ClientUtils.parseBrokerList(brokerList)
  // -1 = latest offset, -2 = earliest offset
  val time = MinOrMax
  val topicsMetadata = ClientUtils.fetchTopicMetadata(Set(topicName), metadataTargetBrokers, clientId, 1000).topicsMetadata
  if (topicsMetadata.size != 1 || !topicsMetadata.head.topic.equals(topicName)) {
    System.err.println("Error: no valid topic metadata for topic: %s; the topic probably does not exist. Run kafka-topics.sh --list to verify.".format(topicName))
    Exit.exit(1)
  }
  val fromOffsets = collection.mutable.HashMap.empty[TopicPartition, Long]
  topicsMetadata.head.partitionsMetadata.foreach { metadata =>
    val partitionId = metadata.partitionId
    metadata.leader match {
      case Some(leader) =>
        // Ask the partition leader for a single offset at the requested time (-1/-2)
        val consumer = new SimpleConsumer(leader.host, leader.port, 10000, 100000, clientId)
        val topicAndPartition = TopicAndPartition(topicName, partitionId)
        val request = OffsetRequest(Map(topicAndPartition -> PartitionOffsetRequestInfo(time, 1)))
        val offsets = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(topicAndPartition).offsets
        consumer.close()
        fromOffsets += (new TopicPartition(topicName, partitionId) -> offsets.head)
      case None =>
        System.err.println("Error: partition %d does not have a leader. Skip getting offsets".format(partitionId))
    }
  }
  fromOffsets.toMap
}

Parameters:
topicName: String — the name of the topic
MinOrMax: Int — -1 fetches each partition's latest offset; -2 fetches the earliest
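These two magic numbers correspond to Kafka's special "time" values for offset requests (the old Scala API exposes them as `OffsetRequest.LatestTime` / `OffsetRequest.EarliestTime`). Naming them once avoids sign mistakes; the object and constant names below are my own:

```scala
// Named constants for Kafka's special offset-lookup "time" values.
object OffsetTimes {
  val Latest: Int   = -1 // the next offset to be written (log end offset)
  val Earliest: Int = -2 // the oldest offset still retained in the log
}
```

A call like `getTopicOffset(topic, OffsetTimes.Earliest)` then reads more clearly than a bare `-2`.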

2. Fetching the group's last committed offsets and auto-correcting them

def getLastCommittedOffsets(topicName: Array[String], groups: String): Map[TopicPartition, Long] = {
  if (LOG.isInfoEnabled())
    LOG.info("||--Topics:{}, getLastCommittedOffsets from PGSQL--||", topicName.mkString(","))
  // Fetch the previously stored offsets from PostgreSQL.
  // Build one placeholder per topic; note the surrounding parentheses and spaces.
  val sql_str = "SELECT * FROM spark_offsets_manager WHERE groups = ? AND (" +
    topicName.map(_ => "topics = ?").mkString(" OR ") + ")"
  val conn = getConn()
  // Use a transaction
  conn.setAutoCommit(false)
  val fromOffsets = collection.mutable.HashMap.empty[TopicPartition, Long]
  try {
    // First consumption of a topic by this group: seed one row per partition,
    // using the earliest offset currently retained in Kafka.
    for (topic <- topicName) {
      val check = conn.prepareStatement("SELECT 1 FROM spark_offsets_manager WHERE groups = ? AND topics = ?")
      check.setString(1, groups)
      check.setString(2, topic)
      val result = check.executeQuery()
      if (!result.next()) {
        val earliest = getTopicOffset(topic, -2)
        earliest.foreach { case (tp, offset) =>
          val insert = conn.prepareStatement("INSERT INTO spark_offsets_manager (topics,partitions,lastsaveoffsets,groups) VALUES (?,?,?,?)")
          insert.setString(1, topic)
          insert.setInt(2, tp.partition())
          insert.setLong(3, offset)
          insert.setString(4, groups)
          insert.execute()
        }
        conn.commit()
      }
    }
    // Read back the stored offsets and auto-correct any that have fallen out of
    // Kafka's retention window, to avoid OffsetOutOfRangeException on startup.
    val statement = conn.prepareStatement(sql_str)
    statement.setString(1, groups)
    for (x <- topicName.indices) statement.setString(x + 2, topicName(x))
    val rs = statement.executeQuery()
    while (rs.next) {
      val topic = rs.getString("topics")
      val partition = rs.getInt("partitions")
      val lastsaveoffset = rs.getString("lastsaveoffsets")
      val minOffset = getTopicOffset(topic, -2)(new TopicPartition(topic, partition))
      val lastOffset =
        if (lastsaveoffset == null || lastsaveoffset.toLong < minOffset) {
          // The stored offset is missing or has been purged from Kafka: reset it.
          val update = conn.prepareStatement("UPDATE spark_offsets_manager SET lastsaveoffsets = ? WHERE topics = ? AND partitions = ? AND groups = ?")
          update.setLong(1, minOffset)
          update.setString(2, topic)
          update.setInt(3, partition)
          update.setString(4, groups)
          update.execute()
          minOffset
        } else lastsaveoffset.toLong
      fromOffsets += (new TopicPartition(topic, partition) -> lastOffset)
    }
    conn.commit()
  } finally {
    conn.close()
  }
  fromOffsets.toMap
}

The code is not optimally refactored, but the logic is complete; feel free to improve it.
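Building the dynamic WHERE clause by string concatenation is easy to get wrong (a missing space before an `OR`, for example, produces invalid SQL). A small pure builder, assuming the same table and column names as above, keeps the clause correct for any number of topics:

```scala
object OffsetSql {
  // Builds "SELECT * FROM spark_offsets_manager WHERE groups = ? AND
  // (topics = ? OR topics = ? ...)" with one placeholder per topic.
  def selectOffsets(topicCount: Int): String = {
    require(topicCount >= 1, "at least one topic is required")
    val topicClause = Seq.fill(topicCount)("topics = ?").mkString(" OR ")
    s"SELECT * FROM spark_offsets_manager WHERE groups = ? AND ($topicClause)"
  }
}
```

The generated string matches the clause assembled inside getLastCommittedOffsets, and being a pure function it is trivially unit-testable.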

3. Obtaining a database connection

import java.sql.Connection

def getConn(): Connection = DatabaseUtils.getConn()

You can wrap your own DatabaseUtils. I use PostgreSQL; choose whatever database fits your needs. If you want to store offsets somewhere other than a database, you will need to implement that logic yourself.
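With the map returned by getLastCommittedOffsets, the stream can be created and offsets written back after each batch. A minimal sketch using the spark-streaming-kafka-0-10 API; here `ssc`, `kafkaParams`, the topic names, the group id, and the `updateOffsetInDb` helper are all assumptions or hypothetical placeholders:

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

// Sketch only: assumes an active StreamingContext `ssc` and a `kafkaParams`
// map (bootstrap.servers, deserializers, group.id, enable.auto.commit=false).
val topics = Array("topic_a", "topic_b") // placeholder topic names
val group  = "my_group"                  // placeholder consumer group
val fromOffsets: Map[TopicPartition, Long] = getLastCommittedOffsets(topics, group)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  // Assign pins the stream to exactly the partitions and offsets read from the DB
  ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch here ...
  // Only after the batch succeeds, persist untilOffset back to spark_offsets_manager
  offsetRanges.foreach { r =>
    updateOffsetInDb(r.topic, r.partition, r.untilOffset, group) // hypothetical helper
  }
}
```

Writing the offsets only after the batch's work has finished gives at-least-once semantics; records between the stored offset and the failure point may be reprocessed after a restart.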

IV. Table Schema

[Screenshot: the spark_offsets_manager table; columns include topics, partitions, lastsaveoffsets, and groups]
You can extend this schema with additional fields as needed.