Kafka RPC Protocol (Community Trunk Version) -- ListOffsetRequest

1. RPC Functionality and Use Cases

  1. Functionality: used by a consumer or a follower to obtain a starting offset for consumption. Besides the two sentinel values EARLIEST_TIMESTAMP (earliest) and LATEST_TIMESTAMP (latest), versions v1 and above can also resolve the offset for an arbitrary target timestamp (see the server-side handling in section 4).
  2. Use cases (a client-facing sketch of these paths follows this list)

         2.1 consumer: on first startup a consumer must determine a starting offset before it can consume. If the consumer's group has consumed Kafka data before and its committed offset has not expired (commits have a retention time, since Kafka data itself is also deleted on expiry), the consumer instead sends a different RPC (OffsetFetchRequest, covered in a later article).

         2.2 follower: while fetching data from the leader, a slow fetch or other abnormal condition can trigger OFFSET_OUT_OF_RANGE; the follower then needs to obtain a fresh offset from the leader before it can continue fetching.
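
For orientation, these public KafkaConsumer APIs all resolve offsets through a ListOffsetRequest under the hood. A minimal sketch (the topic name and timestamp are made up):

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ListOffsetExamples {
    // Prints the offsets that a ListOffsetRequest resolves for one partition.
    static void showOffsets(Consumer<byte[], byte[]> consumer) {
        TopicPartition tp = new TopicPartition("test-topic", 0); // hypothetical topic
        // EARLIEST_TIMESTAMP / LATEST_TIMESTAMP under the hood:
        long earliest = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
        long latest = consumer.endOffsets(Collections.singleton(tp)).get(tp);
        // Arbitrary-timestamp lookup (RPC v1 and above):
        Map<TopicPartition, OffsetAndTimestamp> byTime =
                consumer.offsetsForTimes(Collections.singletonMap(tp, 1546300800000L));
        System.out.printf("earliest=%d latest=%d atTimestamp=%s%n", earliest, latest, byTime.get(tp));
    }
}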

2. RPC Field Breakdown

    private final int replicaId;
    private final IsolationLevel isolationLevel;
    private final Map<TopicPartition, PartitionData> partitionTimestamps;
    private final Set<TopicPartition> duplicatePartitions;
  • replicaId: lets the server distinguish whether the request comes from a consumer or a follower (a value below 0 marks a consumer)
  • isolationLevel: transactional consumption control; it bounds the range of offsets visible to the request (e.g. under READ_COMMITTED the visible range ends at the last stable offset)
  • partitionTimestamps: the topic-partitions carried by the RPC, each with its target timestamp (a construction sketch follows this list)
  • duplicatePartitions: if one RPC contains a duplicate topic-partition, it is added to duplicatePartitions and the server ignores that partition when handling the request (this rarely happens in practice)
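
A hedged construction sketch of the consumer-side builder. The PartitionData shape of (timestamp, Optional currentLeaderEpoch), the IsolationLevel package location, and the forReplica variant are assumed from this trunk snapshot and may differ across revisions:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.requests.IsolationLevel;
import org.apache.kafka.common.requests.ListOffsetRequest;

public class BuildListOffsetRequest {
    static ListOffsetRequest.Builder consumerBuilder() {
        Map<TopicPartition, ListOffsetRequest.PartitionData> targetTimes = new HashMap<>();
        targetTimes.put(new TopicPartition("test-topic", 0), // hypothetical topic
                new ListOffsetRequest.PartitionData(ListOffsetRequest.LATEST_TIMESTAMP,
                        Optional.empty() /* currentLeaderEpoch unknown */));
        return ListOffsetRequest.Builder
                .forConsumer(false /* requireTimestamp */, IsolationLevel.READ_UNCOMMITTED)
                .setTargetTimes(targetTimes);
        // A follower would use the Builder.forReplica variant instead, which sets replicaId >= 0.
    }
}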

3. Client-Side Walkthrough

      3.1 consumer 

          The entry point for the consumer's ListOffset call is the following code (Fetcher.sendListOffsetRequest):

 private RequestFuture<ListOffsetResult> sendListOffsetRequest(final Node node,
                                                                  final Map<TopicPartition, ListOffsetRequest.PartitionData> timestampsToSearch,
                                                                  boolean requireTimestamp) {
        ListOffsetRequest.Builder builder = ListOffsetRequest.Builder
                .forConsumer(requireTimestamp, isolationLevel)
                .setTargetTimes(timestampsToSearch);

        log.debug("Sending ListOffsetRequest {} to broker {}", builder, node);
        return client.send(node, builder)
                .compose(new RequestFutureAdapter<ClientResponse, ListOffsetResult>() {
                    @Override
                    public void onSuccess(ClientResponse response, RequestFuture<ListOffsetResult> future) {
                        ListOffsetResponse lor = (ListOffsetResponse) response.responseBody();
                        log.trace("Received ListOffsetResponse {} from broker {}", lor, node);
                        handleListOffsetResponse(timestampsToSearch, lor, future);
                    }
                });
    }
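
handleListOffsetResponse (not shown above) walks the per-partition results of the response. A simplified sketch of that loop; the public error/timestamp/offset fields on ListOffsetResponse.PartitionData are assumed from this trunk snapshot:

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.protocol.Errors;
import org.apache.kafka.common.requests.ListOffsetResponse;

public class ListOffsetResponsePrinter {
    // Field names on PartitionData are assumed from this trunk snapshot.
    static void print(ListOffsetResponse response) {
        for (Map.Entry<TopicPartition, ListOffsetResponse.PartitionData> entry :
                response.responseData().entrySet()) {
            ListOffsetResponse.PartitionData data = entry.getValue();
            if (data.error == Errors.NONE) {
                // offset = first offset whose timestamp >= the target; for the EARLIEST/LATEST
                // sentinels it is the log start/end offset and timestamp is NO_TIMESTAMP (-1)
                System.out.printf("%s -> offset=%d timestamp=%d%n", entry.getKey(), data.offset, data.timestamp);
            } else {
                // e.g. UNSUPPORTED_FOR_MESSAGE_FORMAT, or a retriable leader error
                System.out.printf("%s -> error=%s%n", entry.getKey(), data.error);
            }
        }
    }
}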

     3.2 follower

          (1) When the broker becomes a follower (makeFollower) and the initial fetch offset is below 0, it must truncate its own data and obtain a fetch offset from the leader:

def addPartitions(initialFetchStates: Map[TopicPartition, OffsetAndEpoch]) {
    partitionMapLock.lockInterruptibly()
    try {
      initialFetchStates.foreach { case (tp, initialFetchState) =>
        // We can skip the truncation step iff the leader epoch matches the existing epoch
        val currentState = partitionStates.stateValue(tp)
        val updatedState = if (currentState != null && currentState.currentLeaderEpoch == initialFetchState.leaderEpoch) {
          currentState
        } else {
          val initialFetchOffset = if (initialFetchState.offset < 0)
            fetchOffsetAndTruncate(tp, initialFetchState.leaderEpoch)
          else
            initialFetchState.offset
          PartitionFetchState(initialFetchOffset, initialFetchState.leaderEpoch, state = Truncating)
        }
        partitionStates.updateAndMoveToEnd(tp, updatedState)
      }

      partitionMapCond.signalAll()
    } finally partitionMapLock.unlock()
  }

   (2) When a fetch from the leader returns OFFSET_OUT_OF_RANGE:

 private def handleOutOfRangeError(topicPartition: TopicPartition,
                                    fetchState: PartitionFetchState): Boolean = {
    try {
      val newOffset = fetchOffsetAndTruncate(topicPartition, fetchState.currentLeaderEpoch)
      val newFetchState = PartitionFetchState(newOffset, fetchState.currentLeaderEpoch, state = Fetching)
      partitionStates.updateAndMoveToEnd(topicPartition, newFetchState)
      info(s"Current offset ${fetchState.fetchOffset} for partition $topicPartition is " +
        s"out of range, which typically implies a leader change. Reset fetch offset to $newOffset")
      true
    } catch {
      // (per-exception handling elided from the original excerpt)
      case e: Throwable =>
        error(s"Error getting offset for partition $topicPartition", e)
        false
    }
  }
      fetchOffsetAndTruncate first fetches the leader's latest LEO (log end offset). If the leader's LEO is smaller than the follower's LEO, the follower truncates the out-of-range tail and resumes fetching from the leader's LEO.

      If the leader's LEO is larger than the follower's LEO, it then checks whether the leader's log start offset is larger than the follower's LEO; if so, the follower's data is entirely behind retention, so it truncates fully and resumes fetching from the leader's log start offset. Otherwise it simply resumes from its own LEO. A compact model of this decision follows.
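
A self-contained model of that decision, with the two ListOffsetRequest results (leader LEO and leader log start offset) passed in as plain longs; the helper name is ours, not Kafka's:

public class FetchOffsetModel {
    // Models the decision inside AbstractFetcherThread.fetchOffsetAndTruncate (Scala in the source).
    // Returns the offset the follower should fetch from next; truncation is noted in comments.
    static long resolveFetchOffset(long followerLeo, long leaderLeo, long leaderLogStartOffset) {
        if (leaderLeo < followerLeo) {
            // Follower is ahead of the current leader (e.g. after a leader change):
            // truncate back to the leader's LEO and fetch from there.
            return leaderLeo;
        }
        if (leaderLogStartOffset > followerLeo) {
            // Follower is entirely behind retention: truncate fully,
            // restart from the leader's log start offset.
            return leaderLogStartOffset;
        }
        // Otherwise just resume from the follower's own LEO.
        return followerLeo;
    }

    public static void main(String[] args) {
        System.out.println(resolveFetchOffset(100, 90, 0));    // 90: drop the out-of-range tail
        System.out.println(resolveFetchOffset(100, 500, 300)); // 300: fell behind retention
        System.out.println(resolveFetchOffset(100, 500, 50));  // 100: continue where we were
    }
}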

4. Server-Side Handling

 

def handleListOffsetRequest(request: RequestChannel.Request) {
    val version = request.header.apiVersion()

    val mergedResponseMap = if (version == 0)
      handleListOffsetRequestV0(request)
    else
      handleListOffsetRequestV1AndAbove(request)

    sendResponseMaybeThrottle(request, requestThrottleMs => new ListOffsetResponse(requestThrottleMs, mergedResponseMap.asJava))
  }
  1. After receiving the request, the server branches on the API version in the request header.
  2. The biggest difference between handleListOffsetRequestV0 and handleListOffsetRequestV1AndAbove is whether multiple offsets before the target can be returned (a v0-only parameter later deprecated by the community, so the difference is now minor). The focus here is handleListOffsetRequestV1AndAbove.
  3. The core processing logic lives in the following function (Log.fetchOffsetByTimestamp):
      def fetchOffsetByTimestamp(targetTimestamp: Long): Option[TimestampAndOffset] = {
        maybeHandleIOException(s"Error while fetching offset by timestamp for $topicPartition in dir ${dir.getParent}") {
          debug(s"Searching offset for timestamp $targetTimestamp")
    
          if (config.messageFormatVersion < KAFKA_0_10_0_IV0 &&
            targetTimestamp != ListOffsetRequest.EARLIEST_TIMESTAMP &&
            targetTimestamp != ListOffsetRequest.LATEST_TIMESTAMP)
            throw new UnsupportedForMessageFormatException(s"Cannot search offsets based on timestamp because message format version " +
              s"for partition $topicPartition is ${config.messageFormatVersion} which is earlier than the minimum " +
              s"required version $KAFKA_0_10_0_IV0")
    
          // Cache to avoid race conditions. `toBuffer` is faster than most alternatives and provides
          // constant time access while being safe to use with concurrent collections unlike `toArray`.
          val segmentsCopy = logSegments.toBuffer
          // For the earliest and latest, we do not need to return the timestamp.
          if (targetTimestamp == ListOffsetRequest.EARLIEST_TIMESTAMP) {
            // The first cached epoch usually corresponds to the log start offset, but we have to verify this since
            // it may not be true following a message format version bump as the epoch will not be available for
            // log entries written in the older format.
            val earliestEpochEntry = leaderEpochCache.flatMap(_.earliestEntry)
            val epochOpt = earliestEpochEntry match {
              case Some(entry) if entry.startOffset <= logStartOffset => Optional.of[Integer](entry.epoch)
              case _ => Optional.empty[Integer]()
            }
            return Some(new TimestampAndOffset(RecordBatch.NO_TIMESTAMP, logStartOffset, epochOpt))
          } else if (targetTimestamp == ListOffsetRequest.LATEST_TIMESTAMP) {
            val latestEpochOpt = leaderEpochCache.flatMap(_.latestEpoch).map(_.asInstanceOf[Integer])
            val epochOptional = Optional.ofNullable(latestEpochOpt.orNull)
            return Some(new TimestampAndOffset(RecordBatch.NO_TIMESTAMP, logEndOffset, epochOptional))
          }
    
          val targetSeg = {
            // Get all the segments whose largest timestamp is smaller than target timestamp
            val earlierSegs = segmentsCopy.takeWhile(_.largestTimestamp < targetTimestamp)
            // We need to search the first segment whose largest timestamp is greater than the target timestamp if there is one.
            if (earlierSegs.length < segmentsCopy.length)
              Some(segmentsCopy(earlierSegs.length))
            else
              None
          }
    
          targetSeg.flatMap(_.findOffsetByTimestamp(targetTimestamp, logStartOffset))
        }
      }
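
The targetSeg selection above is just a linear scan for the first segment whose largest timestamp reaches the target; a tiny self-contained illustration:

public class TargetSegmentDemo {
    public static void main(String[] args) {
        // One largest-timestamp per segment, in log order (made-up values).
        long[] largestTimestamps = {100L, 200L, 300L};
        long target = 150L;
        // Mirrors segmentsCopy.takeWhile(_.largestTimestamp < targetTimestamp):
        int idx = 0;
        while (idx < largestTimestamps.length && largestTimestamps[idx] < target)
            idx++;
        if (idx < largestTimestamps.length)
            System.out.println("search segment " + idx); // -> "search segment 1"
        else
            System.out.println("no segment reaches the target timestamp"); // result is None
    }
}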

    Two things speed this lookup up: each log segment maintains a time index (the .timeindex file), so findOffsetByTimestamp can jump close to the right position instead of scanning the whole segment; and the per-log leaderEpochCache lets the broker attach the correct leader epoch to the returned offset without reading the log itself.
