Kafka 2.2 source code analysis: handleOffsetForLeaderEpochRequest

Overview

The OffsetsForLeaderEpoch API was originally used only for inter-broker communication and required cluster permission. With KIP-320, the consumer also uses this API to check for log truncation after a leader change.

First, an example of a follower replica issuing an OffsetsForLeaderEpoch request to the leader replica:

A(leader, epoch=1): 1, 2, 3, 4, 5, 6
A cache: leaderEpoch = 1, startOffset = 1
B(follower): 1, 2, 3, 4
B cache: leaderEpoch = 1, startOffset = 1
=============================================

B(leader, epoch=2): 1, 2, 3, 4, 5, 6, 7
B cache:
leaderEpoch = 1, startOffset = 1
leaderEpoch = 2, startOffset = 5

After A crashes, B becomes the new leader. By the time A recovers, B has appended new records under the new epoch, so B's leaderEpochCache has gained a new entry (leaderEpoch=2, startOffset=5).
When A starts fetching from B again, it sends an OffsetsForLeaderEpoch request with epoch=1. B finds epoch=2 (the smallest epoch greater than 1) and returns that epoch's startOffset=5. On receiving the response, A truncates its records at offset >= 5 (here offsets 5 and 6), resets its fetch offset to 5 and resumes replication. B returns the data (offsets 5, 6 and 7 with epoch=2), and when A appends these records it sees epoch=2 and adds the entry (epoch=2, startOffset=5) to its own leaderEpochCache.

When serving an OffsetForLeaderEpoch request, the leader replica returns the start offset of the smallest epoch greater than requestedLeaderEpoch (or its own log end offset when the requested epoch is the latest one). The source code below walks through how this works.
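
The sketch below is a simplified, standalone model of follower A's side of this exchange (EpochEndOffset, followerLog and the other names here are stand-ins for illustration, not the actual Kafka classes): given the (epoch, endOffset) pair returned by leader B, the follower truncates its log at the returned end offset and resumes fetching from there.

object FollowerTruncationSketch extends App {
  case class EpochEndOffset(leaderEpoch: Int, endOffset: Long)

  // Follower A's local log after it comes back: offsets 1..6, all written under epoch 1.
  val followerLog: Vector[(Long, Int)] = Vector.tabulate(6)(i => (i + 1L, 1)) // (offset, epoch)

  // Leader B answered A's request for epoch 1 with the start offset of epoch 2.
  val response = EpochEndOffset(leaderEpoch = 1, endOffset = 5L)

  // Truncate everything at or beyond the returned end offset (offsets 5 and 6)...
  val truncatedLog = followerLog.filter { case (offset, _) => offset < response.endOffset }
  // ...then continue fetching from the divergence point.
  val nextFetchOffset = response.endOffset

  println(s"log after truncation: $truncatedLog, next fetch offset: $nextFetchOffset")
}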

KafkaApis#handleOffsetForLeaderEpochRequest() method

This method decodes the OffsetsForLeaderEpochRequest, checks authorization, calls ReplicaManager#lastOffsetForLeaderEpoch() to obtain, for every requested partition, the last leader epoch less than or equal to the requested epoch together with its end offset, and finally sends an OffsetsForLeaderEpochResponse back to the client.

def handleOffsetForLeaderEpochRequest(request: RequestChannel.Request): Unit = {
    val offsetForLeaderEpoch = request.body[OffsetsForLeaderEpochRequest]
    val requestInfo = offsetForLeaderEpoch.epochsByTopicPartition.asScala

    // The OffsetsForLeaderEpoch API was initially only used for inter-broker communication and required
    // cluster permission. With KIP-320, the consumer now also uses this API to check for log truncation
    // following a leader change, so we also allow topic describe permission.
    val (authorizedPartitions, unauthorizedPartitions) = if (isAuthorizedClusterAction(request)) {
      (requestInfo, Map.empty[TopicPartition, OffsetsForLeaderEpochRequest.PartitionData])
    } else {
      requestInfo.partition {
        case (tp, _) => authorize(request.session, Describe, Resource(Topic, tp.topic, LITERAL))
      }
    }
    // ReplicaManager#lastOffsetForLeaderEpoch() returns the end offset of the last leader epoch for each authorized partition
    val endOffsetsForAuthorizedPartitions = replicaManager.lastOffsetForLeaderEpoch(authorizedPartitions)
    val endOffsetsForUnauthorizedPartitions = unauthorizedPartitions.mapValues(_ =>
      new EpochEndOffset(Errors.TOPIC_AUTHORIZATION_FAILED, EpochEndOffset.UNDEFINED_EPOCH,
        EpochEndOffset.UNDEFINED_EPOCH_OFFSET))

    val endOffsetsForAllPartitions = endOffsetsForAuthorizedPartitions ++ endOffsetsForUnauthorizedPartitions
    sendResponseMaybeThrottle(request, requestThrottleMs =>
      new OffsetsForLeaderEpochResponse(requestThrottleMs, endOffsetsForAllPartitions.asJava))
  }
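
As an aside, the authorized/unauthorized split above is just Scala's partition applied to the request map. A minimal, standalone illustration of that split follows (the topic names and the authorization rule here are made up for this example, not Kafka's authorizer):

object AuthorizationSplitSketch extends App {
  case class TopicPartition(topic: String, partition: Int)

  val requestInfo = Map(
    TopicPartition("orders", 0) -> 1, // the value stands in for PartitionData
    TopicPartition("secret", 0) -> 3
  )

  // Hypothetical Describe-permission check for the illustration.
  def authorizedToDescribe(tp: TopicPartition): Boolean = tp.topic != "secret"

  val (authorized, unauthorized) = requestInfo.partition { case (tp, _) => authorizedToDescribe(tp) }

  println(s"authorized=$authorized")     // only the "orders" partition
  println(s"unauthorized=$unauthorized") // the "secret" partition, later mapped to TOPIC_AUTHORIZATION_FAILED
}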

ReplicaManager#lastOffsetForLeaderEpoch() method

Iterates over the requestedEpochInfo collection and returns a map from each TopicPartition to its EpochEndOffset.

def lastOffsetForLeaderEpoch(requestedEpochInfo: Map[TopicPartition, OffsetsForLeaderEpochRequest.PartitionData]): Map[TopicPartition, EpochEndOffset] = {
  // Iterate over requestedEpochInfo and build a map from each TopicPartition to its EpochEndOffset
  requestedEpochInfo.map { case (tp, partitionData) =>
      val epochEndOffset = getPartition(tp) match {
        case Some(partition) =>
          if (partition eq ReplicaManager.OfflinePartition)
            new EpochEndOffset(Errors.KAFKA_STORAGE_ERROR, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          else
            partition.lastOffsetForLeaderEpoch(partitionData.currentLeaderEpoch, partitionData.leaderEpoch,
              fetchOnlyFromLeader = true)

        case None if metadataCache.contains(tp) =>
          new EpochEndOffset(Errors.NOT_LEADER_FOR_PARTITION, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)

        case None =>
          new EpochEndOffset(Errors.UNKNOWN_TOPIC_OR_PARTITION, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
      }
      tp -> epochEndOffset
    }
  }

Partition#lastOffsetForLeaderEpoch() method

Returns the end offset of the largest epoch less than or equal to requestedLeaderEpoch. The end offset is defined as the start offset of the first epoch larger than requestedLeaderEpoch, or as the log end offset if requestedLeaderEpoch is the latest leader epoch.

  /**
   * Find the (exclusive) last offset of the largest epoch less than or equal to the requested epoch.
   *
   * @param currentLeaderEpoch The expected epoch of the current leader (if known)
   * @param leaderEpoch Requested leader epoch
   * @param fetchOnlyFromLeader Whether or not to require servicing only from the leader
   *
   * @return The requested leader epoch and the end offset of this leader epoch, or if the requested
   *         leader epoch is unknown, the leader epoch less than the requested leader epoch and the end offset
   *         of this leader epoch. The end offset of a leader epoch is defined as the start
   *         offset of the first leader epoch larger than the leader epoch, or else the log end
   *         offset if the leader epoch is the latest leader epoch.
   */
  def lastOffsetForLeaderEpoch(currentLeaderEpoch: Optional[Integer],
                               leaderEpoch: Int,
                               fetchOnlyFromLeader: Boolean): EpochEndOffset = {
    inReadLock(leaderIsrUpdateLock) {
      // Get the local replica if it exists; the supplied currentLeaderEpoch (if present) must match the partition's current leader epoch, otherwise an error is returned
      val localReplicaOrError = getLocalReplica(localBrokerId, currentLeaderEpoch, fetchOnlyFromLeader)
      localReplicaOrError match {
        case Left(replica) =>
          // Based on the requested leader epoch, return a (leaderEpoch, endOffset) pair
          replica.endOffsetForEpoch(leaderEpoch) match {
            case Some(epochAndOffset) => new EpochEndOffset(NONE, epochAndOffset.leaderEpoch, epochAndOffset.offset)
            case None => new EpochEndOffset(NONE, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          }
        case Right(error) =>
          new EpochEndOffset(error, UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
      }
    }
  }
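
The getLocalReplica call above validates the caller-supplied currentLeaderEpoch before handing back the local replica; success comes back as the Left side of an Either and an error code as the Right side. Below is a minimal, standalone sketch of that validation pattern (the names and error objects are stand-ins, not the actual Kafka types); it assumes the Kafka 2.2 behavior that a stale epoch is fenced and a newer-than-known epoch is reported as unknown.

object EpochValidationSketch extends App {
  sealed trait EpochError
  case object FencedLeaderEpoch extends EpochError   // caller's epoch is older than ours
  case object UnknownLeaderEpoch extends EpochError  // caller's epoch is newer than ours

  val partitionLeaderEpoch = 2 // the epoch this partition currently believes it is on

  // Left = the epoch the lookup may proceed under, Right = the error to send back.
  def validate(currentLeaderEpoch: Option[Int]): Either[Int, EpochError] =
    currentLeaderEpoch match {
      case Some(e) if e < partitionLeaderEpoch => Right(FencedLeaderEpoch)
      case Some(e) if e > partitionLeaderEpoch => Right(UnknownLeaderEpoch)
      case _                                   => Left(partitionLeaderEpoch)
    }

  println(validate(Some(1))) // Right(FencedLeaderEpoch)
  println(validate(Some(3))) // Right(UnknownLeaderEpoch)
  println(validate(None))    // Left(2): no epoch supplied, so no validation is performed
}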

Replica#endOffsetForEpoch() method

def endOffsetForEpoch(leaderEpoch: Int): Option[OffsetAndEpoch] = {
    if (isLocal) {
      log.get.endOffsetForEpoch(leaderEpoch)
    } else {
      throw new KafkaException(s"Cannot lookup end offset for epoch of non-local replica of $topicPartition")
    }
  }

Log#endOffsetForEpoch() method

If the leaderEpochCache (a LeaderEpochFileCache) is present, delegate to LeaderEpochFileCache#endOffsetFor().

 def endOffsetForEpoch(leaderEpoch: Int): Option[OffsetAndEpoch] = {
    // If the leaderEpochCache is present
    leaderEpochCache.flatMap { cache =>
      val (foundEpoch, foundOffset) = cache.endOffsetFor(leaderEpoch)
      if (foundOffset == EpochEndOffset.UNDEFINED_EPOCH_OFFSET)
        None
      else
        Some(OffsetAndEpoch(foundOffset, foundEpoch))
    }
  }
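
Note how the sentinel UNDEFINED_EPOCH_OFFSET coming out of the cache is converted into an Option here: flatMap over the optional cache plus an explicit check turns "no usable answer" into None. A tiny standalone analogue of that conversion (the sentinel constant and fake lookup below are made up for illustration):

object SentinelToOptionSketch extends App {
  case class OffsetAndEpoch(offset: Long, leaderEpoch: Int)

  val UndefinedOffset = -1L // stands in for EpochEndOffset.UNDEFINED_EPOCH_OFFSET

  // A fake cache lookup: epoch 7 is known, everything else is not.
  def endOffsetFor(epoch: Int): (Int, Long) =
    if (epoch == 7) (7, 42L) else (-1, UndefinedOffset)

  // Optional cache, as in Log#endOffsetForEpoch: an absent cache or a sentinel offset both become None.
  val cacheOpt: Option[Int => (Int, Long)] = Some(endOffsetFor _)

  def lookup(epoch: Int): Option[OffsetAndEpoch] =
    cacheOpt.flatMap { cache =>
      val (foundEpoch, foundOffset) = cache(epoch)
      if (foundOffset == UndefinedOffset) None else Some(OffsetAndEpoch(foundOffset, foundEpoch))
    }

  println(lookup(7)) // Some(OffsetAndEpoch(42,7))
  println(lookup(3)) // None
}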

LeaderEpochFileCache#endOffsetFor() method

Based on the requested leader epoch, returns a (leaderEpoch, endOffset) pair.

The returned leaderEpoch is the largest cached epoch less than or equal to requestedLeaderEpoch, and the returned endOffset is the end offset of that epoch.

In other words:

When requestedLeaderEpoch != latestEpoch

  • returned leaderEpoch = the largest cached epoch <= requestedLeaderEpoch
  • returned endOffset = the startOffset of the smallest cached epoch > requestedLeaderEpoch

When requestedLeaderEpoch == latestEpoch

  • returned leaderEpoch = requestedLeaderEpoch
  • returned endOffset = the current log end offset

A worked example of these rules follows the source listing below.

 /**
    * Returns the Leader Epoch and the End Offset for a requested Leader Epoch.
    *
    * The Leader Epoch returned is the largest epoch less than or equal to the requested Leader
    * Epoch. The End Offset is the end offset of this epoch, which is defined as the start offset
    * of the first Leader Epoch larger than the Leader Epoch requested, or else the Log End
    * Offset if the latest epoch was requested.
    *
    * During the upgrade phase, where there are existing messages may not have a leader epoch,
    * if requestedEpoch is < the first epoch cached, UNSUPPORTED_EPOCH_OFFSET will be returned
    * so that the follower falls back to High Water Mark.
    *
    * @param requestedEpoch requested leader epoch
    * @return found leader epoch and end offset
    */
  def endOffsetFor(requestedEpoch: Int): (Int, Long) = {
    inReadLock(lock) {
      val epochAndOffset =
        if (requestedEpoch == UNDEFINED_EPOCH) {
          // This may happen if a bootstrapping follower sends a request with undefined epoch or
          // a follower is on the older message format where leader epochs are not recorded
          (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
        } else if (latestEpoch.contains(requestedEpoch)) {
          // For the leader, the latest epoch is always the current leader epoch that is still being written to.
          // Followers should not have any reason to query for the end offset of the current epoch, but a consumer
          // might if it is verifying its committed offset following a group rebalance. In this case, we return
          // the current log end offset which makes the truncation check work as expected.
          // For the leader replica, latestEpoch is always the leader epoch currently being written to.
          // A follower should never need to query the end offset of the current epoch, but a consumer may,
          // in order to validate its committed offset after a group rebalance.
          (requestedEpoch, logEndOffset())
        } else {
          // Split the epochs list around requestedEpoch: subsequentEpochs holds the epochs greater than requestedEpoch, previousEpochs holds the epochs less than or equal to it
          val (subsequentEpochs, previousEpochs) = epochs.partition { e => e.epoch > requestedEpoch}
          if (subsequentEpochs.isEmpty) {
            // The requested epoch is larger than any known epoch. This case should never be hit because
            // the latest cached epoch is always the largest.
            // requestedEpoch is larger than every cached epoch; this should never happen because latestEpoch is always the largest
            (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
          } else if (previousEpochs.isEmpty) {
            // The requested epoch is smaller than any known epoch, so we return the start offset of the first
            // known epoch which is larger than it. This may be inaccurate as there could have been
            // epochs in between, but the point is that the data has already been removed from the log
            // and we want to ensure that the follower can replicate correctly beginning from the leader's
            // start offset.
            // requestedEpoch is smaller than every cached epoch, so return the startOffset of the first epoch larger than it
            (requestedEpoch, subsequentEpochs.head.startOffset)
          } else {
            // We have at least one previous epoch and one subsequent epoch. The result is the first
            // prior epoch and the starting offset of the first subsequent epoch.
            // Return the largest epoch <= requestedEpoch, together with the startOffset of the first epoch > requestedEpoch
            (previousEpochs.last.epoch, subsequentEpochs.head.startOffset)
          }
        }
      debug(s"Processed end offset request for epoch $requestedEpoch and returning epoch ${epochAndOffset._1} " +
        s"with end offset ${epochAndOffset._2} from epoch cache of size ${epochs.size}")
      epochAndOffset
    }
  }
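
To make the branch logic concrete, the standalone sketch below re-implements it over a small cache that mirrors leader B from the overview example, i.e. entries (epoch=1, startOffset=1) and (epoch=2, startOffset=5) with a log end offset of 8. This is an assumed simplification for illustration, not the Kafka class itself.

object EndOffsetForSketch extends App {
  case class EpochEntry(epoch: Int, startOffset: Long)

  val UndefinedEpoch = -1
  val UndefinedOffset = -1L

  // Leader B's epoch cache and log end offset from the overview example.
  val epochs = Vector(EpochEntry(1, 1L), EpochEntry(2, 5L))
  val logEndOffset = 8L

  def endOffsetFor(requestedEpoch: Int): (Int, Long) =
    if (requestedEpoch == UndefinedEpoch)
      (UndefinedEpoch, UndefinedOffset)                     // bootstrapping follower or old message format
    else if (epochs.last.epoch == requestedEpoch)
      (requestedEpoch, logEndOffset)                        // latest epoch: answer with the log end offset
    else {
      val (subsequentEpochs, previousEpochs) = epochs.partition(_.epoch > requestedEpoch)
      if (subsequentEpochs.isEmpty)
        (UndefinedEpoch, UndefinedOffset)                   // requested epoch is newer than anything cached
      else if (previousEpochs.isEmpty)
        (requestedEpoch, subsequentEpochs.head.startOffset) // requested epoch is older than anything cached
      else
        (previousEpochs.last.epoch, subsequentEpochs.head.startOffset)
    }

  println(endOffsetFor(1)) // (1,5): the answer that made follower A truncate offsets 5 and 6
  println(endOffsetFor(2)) // (2,8): the latest epoch, so the current log end offset
  println(endOffsetFor(0)) // (0,1): older than any cached epoch, start offset of the first known epoch
  println(endOffsetFor(3)) // (-1,-1): epoch newer than the latest; should not happen in practice
}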

 
