Kafka 2.2 Source Code Analysis: handleOffsetForLeaderEpochRequest

zhifeng687

Article link: https://blog.csdn.net/qq_26222859/article/details/52998121

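Overview

The OffsetsForLeaderEpoch API was initially used only for communication between brokers and required cluster permission. With KIP-320, the consumer also uses this API to check whether log truncation happened after a leader change.

First, here is an example of a follower replica requesting OffsetsForLeaderEpoch from the leader replica.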

A(leader, epoch=1): 1, 2, 3, 4, 5, 6
A cache: leaderEpoch = 1, startOffset = 1
B(follower): 1, 2, 3, 4
B cache: leaderEpoch = 1, startOffset = 1
=============================================

B(leader, epoch=2): 1, 2, 3, 4, 5, 6, 7
B cache:
leaderEpoch = 1, startOffset = 1
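leaderEpoch = 2, startOffset = 5
After A goes down, B becomes the new leader and appends new data, so B's leaderEpochCache gains a new entry (leaderEpoch=2, startOffset=5). A then comes back as a follower.
When A starts replicating from B, it sends a request for epoch 1. B looks up epoch 2 (the smallest epoch greater than 1) and returns its startOffset=5. On receiving the response, A truncates its own records with offset >= 5 (here, offsets 5 and 6), sets its fetch offset to 5, and fetches again. B returns the data (offsets 5, 6 and 7, epoch=2). While appending these records, A sees that their epoch is 2 and adds the entry (epoch=2, startOffset=5) to its own leaderEpochCache.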
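The walk-through above can be replayed with a small standalone model. This is a minimal sketch, not Kafka code: `EpochEntry`, `TruncationExample`, `endOffsetFor` and `truncateTo` are hypothetical names, the cache is assumed to be sorted by epoch, and the log end offset of B is assumed to be 8 (one past its last record, offset 7).

```scala
// Hypothetical, simplified model of a leader epoch cache entry (not Kafka's class).
final case class EpochEntry(epoch: Int, startOffset: Long)

object TruncationExample {
  // End offset of `requestedEpoch` on the leader: the start offset of the first
  // larger epoch, or the log end offset if no larger epoch exists.
  // Assumes `cache` is sorted by ascending epoch.
  def endOffsetFor(cache: Seq[EpochEntry], logEndOffset: Long, requestedEpoch: Int): Long =
    cache.find(_.epoch > requestedEpoch).map(_.startOffset).getOrElse(logEndOffset)

  // Keep only the records whose offset is below the truncation point.
  def truncateTo(log: Vector[Long], offset: Long): Vector[Long] =
    log.filter(_ < offset)

  def main(args: Array[String]): Unit = {
    val leaderCache = Seq(EpochEntry(1, 1L), EpochEntry(2, 5L)) // B's cache
    val followerLog = Vector(1L, 2L, 3L, 4L, 5L, 6L)            // A's log after it recovers
    val endOffset = endOffsetFor(leaderCache, logEndOffset = 8L, requestedEpoch = 1)
    val truncated = truncateTo(followerLog, endOffset)
    println(s"end offset = $endOffset, A keeps $truncated")
  }
}
```

With the sample data, the end offset for epoch 1 is 5, so A keeps offsets 1 to 4 and resumes fetching at offset 5.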

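When the leader replica handles an OffsetsForLeaderEpoch request, it returns the startOffset of the smallest epoch greater than requestedLeaderEpoch, or its log end offset (LEO) if requestedLeaderEpoch is the latest epoch. The source code below shows how.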

KafkaApis#handleOffsetForLeaderEpochRequest()

Decodes the OffsetsForLeaderEpochRequest body, checks authorization, calls ReplicaManager#lastOffsetForLeaderEpoch() to obtain, for each partition, the leader epoch and its end offset, and finally sends an OffsetsForLeaderEpochResponse back to the client.

def handleOffsetForLeaderEpochRequest(request: RequestChannel.Request): Unit = {
    val offsetForLeaderEpoch = request.body[OffsetsForLeaderEpochRequest]
    val requestInfo = offsetForLeaderEpoch.epochsByTopicPartition.asScala

    // The OffsetsForLeaderEpoch API was initially only used for inter-broker communication and required
    // cluster permission. With KIP-320, the consumer now also uses this API to check for log truncation
    // following a leader change, so we also allow topic describe permission.
    val (authorizedPartitions, unauthorizedPartitions) = if (isAuthorizedClusterAction(request)) {
      (requestInfo, Map.empty[TopicPartition, OffsetsForLeaderEpochRequest.PartitionData])
    } else {
      requestInfo.partition {
        case (tp, _) => authorize(request.session, Describe, Resource(Topic, tp.topic, LITERAL))
      }
    }
    // ReplicaManager#lastOffsetForLeaderEpoch() looks up the epoch and end offset for each authorized partition
    val endOffsetsForAuthorizedPartitions = replicaManager.lastOffsetForLeaderEpoch(authorizedPartitions)
    val endOffsetsForUnauthorizedPartitions = unauthorizedPartitions.mapValues(_ =>
      new EpochEndOffset(Errors.TOPIC_AUTHORIZATION_FAILED, EpochEndOffset.UNDEFINED_EPOCH,
        EpochEndOffset.UNDEFINED_EPOCH_OFFSET))

    val endOffsetsForAllPartitions = endOffsetsForAuthorizedPartitions ++ endOffsetsForUnauthorizedPartitions
    sendResponseMaybeThrottle(request, requestThrottleMs =>
      new OffsetsForLeaderEpochResponse(requestThrottleMs, endOffsetsForAllPartitions.asJava))
  }
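The authorization step above splits the requested partitions into two maps and answers the unauthorized ones with an error. The sketch below illustrates only that split, under simplifying assumptions: partitions are plain strings, the request payload is just the requested epoch, and `split` and `canDescribe` are hypothetical names rather than Kafka APIs.

```scala
object AuthSplitSketch {
  // With cluster permission every partition is authorized; otherwise each
  // partition is checked individually with the `canDescribe` predicate.
  def split[K, V](requested: Map[K, V], clusterAuthorized: Boolean)
                 (canDescribe: K => Boolean): (Map[K, V], Map[K, V]) =
    if (clusterAuthorized) (requested, Map.empty[K, V])
    else requested.partition { case (key, _) => canDescribe(key) }

  def main(args: Array[String]): Unit = {
    // Partition name -> requested epoch (illustrative data only).
    val requested = Map("orders-0" -> 3, "audit-0" -> 7)
    val (allowed, denied) = split(requested, clusterAuthorized = false)(_.startsWith("orders"))
    // Unauthorized partitions are answered with an error instead of an offset.
    val deniedResponses = denied.map { case (tp, _) => tp -> "TOPIC_AUTHORIZATION_FAILED" }
    println(s"allowed=$allowed denied=$deniedResponses")
  }
}
```

Only the authorized map is passed on to the lookup; the two result maps are merged again before the response is built, as in the handler above.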

ReplicaManager#lastOffsetForLeaderEpoch()

Iterates over requestedEpochInfo and returns a map from each partition to its EpochEndOffset.

def lastOffsetForLeaderEpoch(requestedEpochInfo: Map[TopicPartition, OffsetsForLeaderEpochRequest.PartitionData]): Map[TopicPartition, EpochEndOffset] = {
  // Iterate over requestedEpochInfo and build the EpochEndOffset for each partition
  requestedEpochInfo.map { case (tp, partitionData) =>
      val epochEndOffset = getPartition(tp) match {
        case Some(partition) =>
          if (partition eq ReplicaManager.OfflinePartition)
            new EpochEndOffset(Errors.KAFKA_STORAGE_ERROR, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          else
            partition.lastOffsetForLeaderEpoch(partitionData.currentLeaderEpoch, partitionData.leaderEpoch,
              fetchOnlyFromLeader = true)

        case None if metadataCache.contains(tp) =>
          new EpochEndOffset(Errors.NOT_LEADER_FOR_PARTITION, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)

        case None =>
          new EpochEndOffset(Errors.UNKNOWN_TOPIC_OR_PARTITION, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
      }
      tp -> epochEndOffset
    }
  }
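The pattern match above boils down to a small decision table. The following sketch restates it with hypothetical types (`PartitionState`, `resolve`) and plain strings in place of Kafka's `Errors` enum; the `DELEGATE` value only marks the branch where the real code calls Partition#lastOffsetForLeaderEpoch, which may itself still return an error.

```scala
object PartitionLookupSketch {
  sealed trait PartitionState
  case object Online extends PartitionState
  case object Offline extends PartitionState

  // `hosted` models the result of looking the partition up on this broker;
  // `inMetadataCache` models whether the cluster metadata knows the partition.
  def resolve(hosted: Option[PartitionState], inMetadataCache: Boolean): String =
    hosted match {
      case Some(Offline)           => "KAFKA_STORAGE_ERROR"
      case Some(Online)            => "DELEGATE"
      case None if inMetadataCache => "NOT_LEADER_FOR_PARTITION"
      case None                    => "UNKNOWN_TOPIC_OR_PARTITION"
    }

  def main(args: Array[String]): Unit = {
    println(resolve(Some(Offline), inMetadataCache = true))
    println(resolve(None, inMetadataCache = true))
    println(resolve(None, inMetadataCache = false))
  }
}
```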

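Partition#lastOffsetForLeaderEpoch()

Returns the (exclusive) end offset of the largest epoch less than or equal to requestedLeaderEpoch. The end offset of an epoch is defined as the startOffset of the first epoch larger than it, or, if requestedLeaderEpoch equals latestLeaderEpoch, as the log end offset (LEO).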

  /**
   * Find the (exclusive) last offset of the largest epoch less than or equal to the requested epoch.
   *
   * @param currentLeaderEpoch The expected epoch of the current leader (if known)
   * @param leaderEpoch Requested leader epoch
   * @param fetchOnlyFromLeader Whether or not to require servicing only from the leader
   *
   * @return The requested leader epoch and the end offset of this leader epoch, or if the requested
   *         leader epoch is unknown, the leader epoch less than the requested leader epoch and the end offset
   *         of this leader epoch. The end offset of a leader epoch is defined as the start
   *         offset of the first leader epoch larger than the leader epoch, or else the log end
   *         offset if the leader epoch is the latest leader epoch.
   */
  def lastOffsetForLeaderEpoch(currentLeaderEpoch: Optional[Integer],
                               leaderEpoch: Int,
                               fetchOnlyFromLeader: Boolean): EpochEndOffset = {
    inReadLock(leaderIsrUpdateLock) {
      // Resolve the local replica. An error is returned instead if the currentLeaderEpoch check
      // fails or, with fetchOnlyFromLeader = true, this broker is not the leader.
      val localReplicaOrError = getLocalReplica(localBrokerId, currentLeaderEpoch, fetchOnlyFromLeader)
      localReplicaOrError match {
        case Left(replica) =>
          // Look up the (leaderEpoch, endOffset) pair for the requested leader epoch
          replica.endOffsetForEpoch(leaderEpoch) match {
            case Some(epochAndOffset) => new EpochEndOffset(NONE, epochAndOffset.leaderEpoch, epochAndOffset.offset)
            case None => new EpochEndOffset(NONE, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          }
        case Right(error) =>
          new EpochEndOffset(error, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
      }
    }
  }

Replica#endOffsetForEpoch()

def endOffsetForEpoch(leaderEpoch: Int): Option[OffsetAndEpoch] = {
    if (isLocal) {
      log.get.endOffsetForEpoch(leaderEpoch)
    } else {
      throw new KafkaException(s"Cannot lookup end offset for epoch of non-local replica of $topicPartition")
    }
  }

Log#endOffsetForEpoch()

If the leader epoch cache exists, calls LeaderEpochFileCache#endOffsetFor().

 def endOffsetForEpoch(leaderEpoch: Int): Option[OffsetAndEpoch] = {
    // Only evaluated if leaderEpochCache is defined
    leaderEpochCache.flatMap { cache =>
      val (foundEpoch, foundOffset) = cache.endOffsetFor(leaderEpoch)
      if (foundOffset == EpochEndOffset.UNDEFINED_EPOCH_OFFSET)
        None
      else
        Some(OffsetAndEpoch(foundOffset, foundEpoch))
    }
  }

LeaderEpochFileCache#endOffsetFor()

Returns a (leaderEpoch, endOffset) pair for requestedLeaderEpoch.

The returned leaderEpoch is the largest epoch less than or equal to requestedLeaderEpoch, and endOffset is the end offset of that returned epoch.

The rules are as follows.

When requestedLeaderEpoch != latestEpoch:

return leaderEpoch = the largest cached epoch <= requestedLeaderEpoch (or requestedLeaderEpoch itself if no such epoch is cached)
return endOffset = the startOffset of the smallest cached epoch > requestedLeaderEpoch

When requestedLeaderEpoch == latestEpoch:

return leaderEpoch = requestedLeaderEpoch
return endOffset = the current log end offset (LEO)

 /**
    * Returns the Leader Epoch and the End Offset for a requested Leader Epoch.
    *
    * The Leader Epoch returned is the largest epoch less than or equal to the requested Leader
    * Epoch. The End Offset is the end offset of this epoch, which is defined as the start offset
    * of the first Leader Epoch larger than the Leader Epoch requested, or else the Log End
    * Offset if the latest epoch was requested.
    *
    * During the upgrade phase, where there are existing messages may not have a leader epoch,
    * if requestedEpoch is < the first epoch cached, UNSUPPORTED_EPOCH_OFFSET will be returned
    * so that the follower falls back to High Water Mark.
    *
    * @param requestedEpoch requested leader epoch
    * @return found leader epoch and end offset
    */
  def endOffsetFor(requestedEpoch: Int): (Int, Long) = {
    inReadLock(lock) {
      val epochAndOffset =
        if (requestedEpoch == UNDEFINED_EPOCH) {
          // This may happen if a bootstrapping follower sends a request with undefined epoch or
          // a follower is on the older message format where leader epochs are not recorded
          (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
        } else if (latestEpoch.contains(requestedEpoch)) {
          // For the leader, the latest epoch is always the current leader epoch that is still being written to.
          // Followers should not have any reason to query for the end offset of the current epoch, but a consumer
          // might if it is verifying its committed offset following a group rebalance. In this case, we return
          // the current log end offset which makes the truncation check work as expected.
          // The requested epoch is the latest epoch: return it together with the current log end offset.
          (requestedEpoch, logEndOffset())
        } else {
          // Split the cached epochs around requestedEpoch: subsequentEpochs holds the epochs greater than
          // requestedEpoch, previousEpochs holds the epochs less than or equal to it.
          val (subsequentEpochs, previousEpochs) = epochs.partition { e => e.epoch > requestedEpoch}
          if (subsequentEpochs.isEmpty) {
            // The requested epoch is larger than any known epoch. This case should never be hit because
            // the latest cached epoch is always the largest.
            // Defensive branch: no cached epoch is larger than requestedEpoch, so return undefined values.
            (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          } else if (previousEpochs.isEmpty) {
            // The requested epoch is smaller than any known epoch, so we return the start offset of the first
            // known epoch which is larger than it. This may be inaccurate as there could have been
            // epochs in between, but the point is that the data has already been removed from the log
            // and we want to ensure that the follower can replicate correctly beginning from the leader's
            // start offset.
            // No cached epoch is <= requestedEpoch: echo requestedEpoch with the start offset of the first larger epoch.
            (requestedEpoch, subsequentEpochs.head.startOffset)
          } else {
            // We have at least one previous epoch and one subsequent epoch. The result is the first
            // prior epoch and the starting offset of the first subsequent epoch.
            // Return the largest epoch <= requestedEpoch and the start offset of the first epoch > requestedEpoch.
            (previousEpochs.last.epoch, subsequentEpochs.head.startOffset)
          }
        }
      debug(s"Processed end offset request for epoch $requestedEpoch and returning epoch ${epochAndOffset._1} " +
        s"with end offset ${epochAndOffset._2} from epoch cache of size ${epochs.size}")
      epochAndOffset
    }
  }
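To see each branch of endOffsetFor in action, the lookup can be modelled as a pure function over an in-memory list. This is a sketch under stated assumptions, not the Kafka implementation: `EpochLookupSketch` and `Entry` are hypothetical names, locking and logging are omitted, the cache is assumed to be sorted by ascending epoch, and the undefined epoch and offset are both assumed to be -1.

```scala
object EpochLookupSketch {
  final case class Entry(epoch: Int, startOffset: Long)

  // Assumed sentinel values for "undefined" (illustrative constants).
  val UndefinedEpoch: Int = -1
  val UndefinedOffset: Long = -1L

  // Mirrors the branch structure of the method above on an in-memory,
  // epoch-sorted sequence instead of the file-backed cache.
  def endOffsetFor(epochs: Seq[Entry], logEndOffset: Long, requestedEpoch: Int): (Int, Long) =
    if (requestedEpoch == UndefinedEpoch) (UndefinedEpoch, UndefinedOffset)
    else if (epochs.lastOption.exists(_.epoch == requestedEpoch)) (requestedEpoch, logEndOffset)
    else {
      val (subsequent, previous) = epochs.partition(_.epoch > requestedEpoch)
      if (subsequent.isEmpty) (UndefinedEpoch, UndefinedOffset)
      else if (previous.isEmpty) (requestedEpoch, subsequent.head.startOffset)
      else (previous.last.epoch, subsequent.head.startOffset)
    }

  def main(args: Array[String]): Unit = {
    val cache = Seq(Entry(1, 1L), Entry(2, 5L), Entry(4, 9L))
    val logEndOffset = 12L
    // One request per branch: below the first epoch, an exact match, a gap,
    // the latest epoch, above the latest epoch, and the undefined epoch.
    Seq(0, 1, 3, 4, 5, UndefinedEpoch).foreach { e =>
      println(s"requested=$e -> ${endOffsetFor(cache, logEndOffset, e)}")
    }
  }
}
```

Requesting epoch 3, which is not in the sample cache, returns epoch 2 with end offset 9: the largest known epoch not above the request, paired with the start offset of the next epoch. This is the case a follower relies on when its own epoch history has diverged from the leader's.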
