GetOffsetShell, the class that fetches offsets for a specified time: a source-code walkthrough

小胖头鱼

Article link: https://blog.csdn.net/chilianyi/article/details/50947103

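Overview: the whole process makes two kinds of RPC calls, and the server side is the Kafka cluster (reached through the broker list passed as a parameter). The first call is made only once and fetches the topic's metadata, topicsMetadata, which lists the topic's partitions and, for each partition, the brokers that host it (including its leader). The second call is made once per partition (so the number of calls equals the number of partitions) and reads that partition's offsets for the specified time directly from the partition's leader.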

// Imports this excerpt relies on (Kafka 0.8.x client classes)
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.client.ClientUtils
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

// topic, metadataTargetBrokers, clientId, maxWaitMs, partitionList, time and nOffsets
// are set up earlier in GetOffsetShell.main (mostly from the command-line options)

// RPC #1 (made once): fetch the topic's metadata from the cluster
val topicsMetadata = ClientUtils.fetchTopicMetadata(Set(topic), metadataTargetBrokers, clientId, maxWaitMs).topicsMetadata

// The partition ids to query: every partition of the topic, or only those listed in partitionList
val partitions =
  if(partitionList == "") {
    topicsMetadata.head.partitionsMetadata.map(_.partitionId)
  } else {
    partitionList.split(",").map(_.toInt).toSeq
  }

// Iterate over the partitions
partitions.foreach { partitionId =>
  // Look up this partition's metadata (including the brokers hosting it) in the result of RPC #1
  val partitionMetadataOpt = topicsMetadata.head.partitionsMetadata.find(_.partitionId == partitionId)
  partitionMetadataOpt match {
    case Some(metadata) =>
      // The broker that is currently the leader for this partition
      metadata.leader match {
        case Some(leader) =>
          // Build a SimpleConsumer connected to the leader. Unlike the high-level consumer, it does not
          // register a consumer group in ZooKeeper.
          // Arguments: host, port, socket timeout (ms), buffer size (bytes), clientId
          val consumer = new SimpleConsumer(leader.host, leader.port, 10000, 100000, clientId)
          val topicAndPartition = TopicAndPartition(topic, partitionId)
          // OffsetRequest has a companion object and a case class; the case class extends RequestOrResponse
          // with requestId = OffsetsKey, which the server uses to pick the handler for this request
          val request = OffsetRequest(Map(topicAndPartition -> PartitionOffsetRequestInfo(time, nOffsets)))
          // RPC #2 (made once per partition): fetch the offsets for the given time
          val offsets = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(topicAndPartition).offsets

          println("%s:%d:%s".format(topic, partitionId, offsets.mkString(",")))
        case None => System.err.println("Error: partition %d does not have a leader. Skip getting offsets".format(partitionId))
      }
    case None => System.err.println("Error: partition %d does not exist".format(partitionId))
  }
}

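On the server side, Kafka builds a KafkaRequestHandlerPool at startup. Its size comes from the configuration property num.io.threads (default 8), and each thread in the pool is a KafkaRequestHandler that is given a KafkaApis instance when it is constructed. Whenever a handler thread picks up a request, it passes it to KafkaApis.handle.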
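
To make this threading model concrete, here is a minimal Scala sketch of the same pattern under heavily simplified assumptions: a fixed number of worker threads (the role of num.io.threads) take requests from one shared queue and hand each one to a single handler object whose handle method dispatches on requestId, as KafkaApis.handle does. Request, SketchRequestKeys, ApisSketch, HandlerPoolSketch and HandlerPoolDemo are hypothetical names; the key values and handler bodies are illustrative, and none of this is Kafka's actual code.

```scala
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}

// Hypothetical request type: only requestId matters for dispatch.
final case class Request(requestId: Short, payload: String)

// Illustrative request ids (hypothetical values, not Kafka's RequestKeys).
object SketchRequestKeys {
  val FetchKey: Short = 1
  val OffsetsKey: Short = 2
}

// Plays the role of KafkaApis.handle: choose a handler based on requestId.
class ApisSketch {
  def handle(req: Request): String = req.requestId match {
    case SketchRequestKeys.OffsetsKey => s"offsets for ${req.payload}"
    case SketchRequestKeys.FetchKey   => s"fetch for ${req.payload}"
    case other                        => s"unsupported request id $other"
  }
}

// Plays the role of KafkaRequestHandlerPool: numThreads workers (cf. num.io.threads)
// all take requests from one shared queue and pass each one to the same ApisSketch.
class HandlerPoolSketch(numThreads: Int,
                        requests: BlockingQueue[Request],
                        apis: ApisSketch,
                        responses: BlockingQueue[String]) {
  private val workers: Seq[Thread] = (0 until numThreads).map { i =>
    val t = new Thread(new Runnable {
      def run(): Unit =
        try {
          while (true) responses.put(apis.handle(requests.take()))
        } catch {
          case _: InterruptedException => () // interrupted by shutdown(): leave the loop
        }
    }, s"request-handler-$i")
    t.setDaemon(true)
    t.start()
    t
  }

  def shutdown(): Unit = workers.foreach(_.interrupt())
}

object HandlerPoolDemo {
  // Submits two requests to an 8-thread pool and returns the responses sorted,
  // because the order in which worker threads finish is not deterministic.
  def run(): List[String] = {
    val requests = new LinkedBlockingQueue[Request]()
    val responses = new LinkedBlockingQueue[String]()
    val pool = new HandlerPoolSketch(8, requests, new ApisSketch, responses)
    requests.put(Request(SketchRequestKeys.OffsetsKey, "test-0"))
    requests.put(Request(SketchRequestKeys.FetchKey, "test-1"))
    val out = List.fill(2)(responses.poll(5, TimeUnit.SECONDS))
    pool.shutdown()
    out.sorted
  }
}
```

In this model the pool size only controls how many requests are processed concurrently; all per-request-type logic lives in the one handler object.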

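KafkaApis.handle dispatches on the requestId, and for OffsetsKey the call chain is handleOffsetRequest -> fetchOffsets -> fetchOffsetsBefore. At this point the implementation turns out to be surprisingly simple: Kafka locates the offset for a given time using nothing more than the last-modified time of each log segment. (In a partition's directory, as shown in the figure, each file ending in .log is a log segment, and the matching .index file maps offsets to byte positions in that .log file, so the message at a given offset can be located quickly. That is a bit off-topic, though.)

The code of fetchOffsetsBefore follows, and it is easy to read: it finds the last segment whose last-modified time is <= the timestamp argument (so the next segment's last-modified time is > timestamp) and returns that segment's base offset. If more than one offset is requested (maxNumOffsets), it also returns the base offsets of earlier segments, in descending order.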

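// Imports this excerpt relies on (Kafka 0.8.x). debug(...) comes from the Logging trait that KafkaApis mixes in.
import kafka.api.OffsetRequest
import kafka.log.Log
import kafka.utils.SystemTime

def fetchOffsetsBefore(log: Log, timestamp: Long, maxNumOffsets: Int): Seq[Long] = {
  val segsArray = log.logSegments.toArray
  // One (baseOffset, lastModified) pair per segment, plus one extra slot for the log end
  // when the last (active) segment is non-empty
  var offsetTimeArray: Array[(Long, Long)] = null
  if(segsArray.last.size > 0)
    offsetTimeArray = new Array[(Long, Long)](segsArray.length + 1)
  else
    offsetTimeArray = new Array[(Long, Long)](segsArray.length)

  for(i <- 0 until segsArray.length)
    offsetTimeArray(i) = (segsArray(i).baseOffset, segsArray(i).lastModified)
  // The extra entry pairs the log end offset with the current time
  if(segsArray.last.size > 0)
    offsetTimeArray(segsArray.length) = (log.logEndOffset, SystemTime.milliseconds)

  var startIndex = -1
  timestamp match {
    case OffsetRequest.LatestTime =>   // newest entry
      startIndex = offsetTimeArray.length - 1
    case OffsetRequest.EarliestTime => // oldest segment
      startIndex = 0
    case _ =>
      var isFound = false
      // map + mkString so the array contents are actually logged (foreach would yield Unit)
      debug("Offset time array = " + offsetTimeArray.map(o => "%d, %d".format(o._1, o._2)).mkString("; "))
      // Scan backwards for the last entry whose last-modified time is <= timestamp
      startIndex = offsetTimeArray.length - 1
      while (startIndex >= 0 && !isFound) {
        if (offsetTimeArray(startIndex)._2 <= timestamp)
          isFound = true
        else
          startIndex -= 1
      }
  }

  // Return at most maxNumOffsets base offsets, walking back from startIndex
  // (startIndex == -1 means no entry is old enough, so nothing is returned)
  val retSize = maxNumOffsets.min(startIndex + 1)
  val ret = new Array[Long](retSize)
  for(j <- 0 until retSize) {
    ret(j) = offsetTimeArray(startIndex)._1
    startIndex -= 1
  }
  // ensure that the returned seq is in descending order of offsets
  ret.toSeq.sortBy(- _)
}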
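
To see what this returns without a running broker, the same logic can be replayed on plain data. The Scala sketch below is a hypothetical stand-in, not Kafka's API: SegmentInfo carries only the fields fetchOffsetsBefore reads, the current time is passed in instead of being read from SystemTime, and the LatestTime/EarliestTime sentinels (-1 and -2) mirror those of OffsetRequest. The backward while loop is replaced by lastIndexWhere, which finds the same index.

```scala
// Hypothetical stand-in for a log segment: only what fetchOffsetsBefore looks at.
final case class SegmentInfo(baseOffset: Long, lastModified: Long, sizeInBytes: Long)

object OffsetsBeforeSketch {
  // Sentinels mirroring OffsetRequest.LatestTime / OffsetRequest.EarliestTime.
  val LatestTime: Long = -1L
  val EarliestTime: Long = -2L

  def offsetsBefore(segments: Seq[SegmentInfo], logEndOffset: Long, now: Long,
                    timestamp: Long, maxNumOffsets: Int): Seq[Long] = {
    // (baseOffset, lastModified) per segment; if the last segment is non-empty,
    // append (logEndOffset, now) as the extra entry.
    val base = segments.map(s => (s.baseOffset, s.lastModified))
    val offsetTime = if (segments.last.sizeInBytes > 0) base :+ ((logEndOffset, now)) else base
    val startIndex = timestamp match {
      case LatestTime   => offsetTime.length - 1
      case EarliestTime => 0
      // last entry whose last-modified time is <= timestamp, or -1 if none
      case _            => offsetTime.lastIndexWhere(_._2 <= timestamp)
    }
    // Walk back from startIndex, taking at most maxNumOffsets base offsets, newest first.
    val n = math.min(maxNumOffsets, startIndex + 1)
    (0 until n).map(j => offsetTime(startIndex - j)._1).sortBy(o => -o)
  }

  // Three segments written up to times 1000, 2000 and 3000; log end offset 1200 at time 5000.
  val exampleSegments: Seq[SegmentInfo] =
    Seq(SegmentInfo(0L, 1000L, 100L), SegmentInfo(500L, 2000L, 100L), SegmentInfo(900L, 3000L, 50L))
  // offsetsBefore(exampleSegments, 1200L, 5000L, 2500L, 1)       == Seq(500)
  // offsetsBefore(exampleSegments, 1200L, 5000L, 2500L, 10)      == Seq(500, 0)
  // offsetsBefore(exampleSegments, 1200L, 5000L, LatestTime, 1)  == Seq(1200)
  // offsetsBefore(exampleSegments, 1200L, 5000L, 500L, 1)        == Seq() (no segment old enough)
}
```

With these segments, any timestamp from 1000 to 1999 resolves to offset 0 and any from 2000 to 2999 to offset 500: the answer only changes at segment boundaries.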

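Kafka resolves offsets for a given time with a surprisingly crude shortcut: because a time query always returns some segment's base offset, the answer can only be as precise as the segment boundaries. With sensible configuration, however, that precision can still be acceptable.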

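# Maximum time before a new log segment is rolled: even if the segment never reaches the size limit below, a new segment is started after log.roll.hours.
log.roll.hours=72
# Even if the time limit above has not been reached, a new segment is also started once the current one reaches this size (536870912 bytes = 512 MiB).
log.segment.bytes=536870912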

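Both settings can also be set per topic, overriding the server-wide values (at the topic level they are named segment.ms and segment.bytes). With these two options the trade-off is easy to tune case by case, for example by shortening the roll time or reducing the segment size for topics that need more precise time-based offset lookups.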
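
Because a time lookup can only land on a segment boundary, the time span covered by one segment is a rough measure of how coarse the answer can be. The back-of-the-envelope Scala sketch below estimates that span, assuming a steady write rate (bytesPerSecond is an assumed input and SegmentGranularitySketch is a hypothetical name) and treating rolling as happening at whichever of the two limits above is reached first, which is an idealization of the real behavior.

```scala
object SegmentGranularitySketch {
  // Approximate seconds covered by one segment: the segment rolls when it reaches
  // segmentBytes (at an assumed steady bytesPerSecond) or after rollHours,
  // whichever comes first.
  def segmentSpanSeconds(segmentBytes: Long, rollHours: Long, bytesPerSecond: Long): Long = {
    require(bytesPerSecond > 0, "write rate must be positive")
    val bySize = segmentBytes / bytesPerSecond // time to fill a segment
    val byTime = rollHours * 3600L             // time-based roll limit
    math.min(bySize, byTime)
  }
}
```

With the settings above (536870912-byte segments, 72-hour roll), an assumed write rate of 1 MiB/s (1048576 bytes/s) gives a segment every 512 seconds, while at 1 KiB/s the size limit would take 524288 seconds, so the 72-hour limit (259200 seconds) decides. The offset returned for a time query can be off from the requested time by something on the order of that span.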
