Kafka RPC Protocol (Community Trunk Version) -- ListOffsetRequest

1. RPC Functionality and Use Cases

  1. Functionality: used by a consumer or a follower to obtain a starting offset for consumption. Besides the two sentinel values EARLIEST_TIMESTAMP (earliest) and LATEST_TIMESTAMP (latest), versions v1 and above can also resolve the offset for an arbitrary target timestamp (see the server-side handling in section 4).
  2. Use cases (a client-facing sketch of these paths follows this list)

         2.1 consumer: on first startup a consumer must determine a starting offset before it can consume. If the consumer's group has consumed Kafka data before and its committed offset has not expired (commits have a retention time, since Kafka data itself is also deleted on expiry), the consumer instead sends a different RPC (OffsetFetchRequest, covered in a later article).

         2.2 follower: while fetching data from the leader, a slow fetch or other abnormal condition can trigger OFFSET_OUT_OF_RANGE; the follower then needs to obtain a fresh offset from the leader before it can continue fetching.
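
For orientation, these public KafkaConsumer APIs all resolve offsets through a ListOffsetRequest under the hood. A minimal sketch (the topic name and timestamp are made up):

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ListOffsetExamples {
    // Prints the offsets that a ListOffsetRequest resolves for one partition.
    static void showOffsets(Consumer<byte[], byte[]> consumer) {
        TopicPartition tp = new TopicPartition("test-topic", 0); // hypothetical topic
        // EARLIEST_TIMESTAMP / LATEST_TIMESTAMP under the hood:
        long earliest = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
        long latest = consumer.endOffsets(Collections.singleton(tp)).get(tp);
        // Arbitrary-timestamp lookup (RPC v1 and above):
        Map<TopicPartition, OffsetAndTimestamp> byTime =
                consumer.offsetsForTimes(Collections.singletonMap(tp, 1546300800000L));
        System.out.printf("earliest=%d latest=%d atTimestamp=%s%n", earliest, latest, byTime.get(tp));
    }
}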

2. RPC Field Breakdown

    private final int replicaId;
    private final IsolationLevel isolationLevel;
    private final Map<TopicPartition, PartitionData> partitionTimestamps;
    private final Set<TopicPartition> duplicatePartitions;
  • replicaId: lets the server distinguish whether the request comes from a consumer or a follower (a value below 0 marks a consumer)
  • isolationLevel: transactional consumption control; it bounds the range of offsets visible to the request (e.g. under READ_COMMITTED the visible range ends at the last stable offset)
  • partitionTimestamps: the topic-partitions carried by the RPC, each with its target timestamp (a construction sketch follows this list)
  • duplicatePartitions: if one RPC contains a duplicate topic-partition, it is added to duplicatePartitions and the server ignores that partition when handling the request (this rarely happens in practice)
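
A hedged construction sketch of the consumer-side builder. The PartitionData shape of (timestamp, Optional currentLeaderEpoch), the IsolationLevel package location, and the forReplica variant are assumed from this trunk snapshot and may differ across revisions:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.requests.IsolationLevel;
import org.apache.kafka.common.requests.ListOffsetRequest;

public class BuildListOffsetRequest {
    static ListOffsetRequest.Builder consumerBuilder() {
        Map<TopicPartition, ListOffsetRequest.PartitionData> targetTimes = new HashMap<>();
        targetTimes.put(new TopicPartition("test-topic", 0), // hypothetical topic
                new ListOffsetRequest.PartitionData(ListOffsetRequest.LATEST_TIMESTAMP,
                        Optional.empty() /* currentLeaderEpoch unknown */));
        return ListOffsetRequest.Builder
                .forConsumer(false /* requireTimestamp */, IsolationLevel.READ_UNCOMMITTED)
                .setTargetTimes(targetTimes);
        // A follower would use the Builder.forReplica variant instead, which sets replicaId >= 0.
    }
}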

3. Client-Side Walkthrough

      3.1 consumer 

          The entry point for the consumer's ListOffset call is the following code (Fetcher.sendListOffsetRequest):

 private RequestFuture<ListOffsetResult> sendListOffsetRequest(final Node node,
                                                                  final Map<TopicPartition, ListOffsetRequest.PartitionData> timestampsToSearch,
                                                                  boolean requireTimestamp) {
        ListOffsetRequest.Builder builder = ListOffsetRequest.Builder
                .forConsumer(requireTimestamp, isolationLevel)
                .setTargetTimes(timestampsToSearch);

        log.debug("Sending ListOffsetRequest {} to broker {}", builder, node);
        return client.send(node, builder)
                .compose(new RequestFutureAdapter<ClientResponse, ListOffsetResult>() {
                    @Override
                    public void onSuccess(ClientResponse response, RequestFuture<ListOffsetResult> future) {
                        ListOffsetResponse lor = (ListOffsetResponse) response.responseBody();
                        log.trace("Received ListOffsetResponse {} from broker {}", lor, node);
                        handleListOffsetResponse(timestampsToSearch, lor, future);
                    }
                });
    }
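
handleListOffsetResponse (not shown above) walks the per-partition results of the response. A simplified sketch of that loop; the public error/timestamp/offset fields on ListOffsetResponse.PartitionData are assumed from this trunk snapshot:

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.protocol.Errors;
import org.apache.kafka.common.requests.ListOffsetResponse;

public class ListOffsetResponsePrinter {
    // Field names on PartitionData are assumed from this trunk snapshot.
    static void print(ListOffsetResponse response) {
        for (Map.Entry<TopicPartition, ListOffsetResponse.PartitionData> entry :
                response.responseData().entrySet()) {
            ListOffsetResponse.PartitionData data = entry.getValue();
            if (data.error == Errors.NONE) {
                // offset = first offset whose timestamp >= the target; for the EARLIEST/LATEST
                // sentinels it is the log start/end offset and timestamp is NO_TIMESTAMP (-1)
                System.out.printf("%s -> offset=%d timestamp=%d%n", entry.getKey(), data.offset, data.timestamp);
            } else {
                // e.g. UNSUPPORTED_FOR_MESSAGE_FORMAT, or a retriable leader error
                System.out.printf("%s -> error=%s%n", entry.getKey(), data.error);
            }
        }
    }
}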

     3.2 follower

          (1) When the broker becomes a follower (makeFollower) and the initial fetch offset is below 0, it must truncate its own data and obtain a fetch offset from the leader:

def addPartitions(initialFetchStates: Map[TopicPartition, OffsetAndEpoch]) {
    partitionMapLock.lockInterruptibly()
    try {
      initialFetchStates.foreach { case (tp, initialFetchState) =>
        // We can skip the truncation step iff the leader epoch matches the existing epoch
        val currentState = partitionStates.stateValue(tp)
        val updatedState = if (currentState != null && currentState.currentLeaderEpoch == initialFetchState.leaderEpoch) {
          currentState
        } else {
          val initialFetchOffset = if (initialFetchState.offset < 0)
            fetchOffsetAndTruncate(tp, initialFetchState.leaderEpoch)
          else
            initialFetchState.offset
          PartitionFetchState(initialFetchOffset, initialFetchState.leaderEpoch, state = Truncating)
        }
        partitionStates.updateAndMoveToEnd(tp, updatedState)
      }

      partitionMapCond.signalAll()
    } finally partitionMapLock.unlock()
  }

   (2) When a fetch from the leader returns OFFSET_OUT_OF_RANGE:

 private def handleOutOfRangeError(topicPartition: TopicPartition,
                                    fetchState: PartitionFetchState): Boolean = {
    try {
      val newOffset = fetchOffsetAndTruncate(topicPartition, fetchState.currentLeaderEpoch)
      val newFetchState = PartitionFetchState(newOffset, fetchState.currentLeaderEpoch, state = Fetching)
      partitionStates.updateAndMoveToEnd(topicPartition, newFetchState)
      info(s"Current offset ${fetchState.fetchOffset} for partition $topicPartition is " +
        s"out of range, which typically implies a leader change. Reset fetch offset to $newOffset")
      true
    } catch {
      // (per-exception handling elided from the original excerpt)
      case e: Throwable =>
        error(s"Error getting offset for partition $topicPartition", e)
        false
    }
  }
      fetchOffsetAndTruncate first fetches the leader's latest LEO (log end offset). If the leader's LEO is smaller than the follower's LEO, the follower truncates the out-of-range tail and resumes fetching from the leader's LEO.

      If the leader's LEO is larger than the follower's LEO, it then checks whether the leader's log start offset is larger than the follower's LEO; if so, the follower's data is entirely behind retention, so it truncates fully and resumes fetching from the leader's log start offset. Otherwise it simply resumes from its own LEO. A compact model of this decision follows.
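
A self-contained model of that decision, with the two ListOffsetRequest results (leader LEO and leader log start offset) passed in as plain longs; the helper name is ours, not Kafka's:

public class FetchOffsetModel {
    // Models the decision inside AbstractFetcherThread.fetchOffsetAndTruncate (Scala in the source).
    // Returns the offset the follower should fetch from next; truncation is noted in comments.
    static long resolveFetchOffset(long followerLeo, long leaderLeo, long leaderLogStartOffset) {
        if (leaderLeo < followerLeo) {
            // Follower is ahead of the current leader (e.g. after a leader change):
            // truncate back to the leader's LEO and fetch from there.
            return leaderLeo;
        }
        if (leaderLogStartOffset > followerLeo) {
            // Follower is entirely behind retention: truncate fully,
            // restart from the leader's log start offset.
            return leaderLogStartOffset;
        }
        // Otherwise just resume from the follower's own LEO.
        return followerLeo;
    }

    public static void main(String[] args) {
        System.out.println(resolveFetchOffset(100, 90, 0));    // 90: drop the out-of-range tail
        System.out.println(resolveFetchOffset(100, 500, 300)); // 300: fell behind retention
        System.out.println(resolveFetchOffset(100, 500, 50));  // 100: continue where we were
    }
}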

4. Server-Side Handling

 

def handleListOffsetRequest(request: RequestChannel.Request) {
    val version = request.header.apiVersion()

    val mergedResponseMap = if (version == 0)
      handleListOffsetRequestV0(request)
    else
      handleListOffsetRequestV1AndAbove(request)

    sendResponseMaybeThrottle(request, requestThrottleMs => new ListOffsetResponse(requestThrottleMs, mergedResponseMap.asJava))
  }
  1. After receiving the request, the server branches on the API version in the request header.
  2. The biggest difference between handleListOffsetRequestV0 and handleListOffsetRequestV1AndAbove is whether multiple offsets before the target can be returned (a v0-only parameter later deprecated by the community, so the difference is now minor). The focus here is handleListOffsetRequestV1AndAbove.
  3. The core processing logic lives in the following function (Log.fetchOffsetByTimestamp):
      def fetchOffsetByTimestamp(targetTimestamp: Long): Option[TimestampAndOffset] = {
        maybeHandleIOException(s"Error while fetching offset by timestamp for $topicPartition in dir ${dir.getParent}") {
          debug(s"Searching offset for timestamp $targetTimestamp")
    
          if (config.messageFormatVersion < KAFKA_0_10_0_IV0 &&
            targetTimestamp != ListOffsetRequest.EARLIEST_TIMESTAMP &&
            targetTimestamp != ListOffsetRequest.LATEST_TIMESTAMP)
            throw new UnsupportedForMessageFormatException(s"Cannot search offsets based on timestamp because message format version " +
              s"for partition $topicPartition is ${config.messageFormatVersion} which is earlier than the minimum " +
              s"required version $KAFKA_0_10_0_IV0")
    
          // Cache to avoid race conditions. `toBuffer` is faster than most alternatives and provides
          // constant time access while being safe to use with concurrent collections unlike `toArray`.
          val segmentsCopy = logSegments.toBuffer
          // For the earliest and latest, we do not need to return the timestamp.
          if (targetTimestamp == ListOffsetRequest.EARLIEST_TIMESTAMP) {
            // The first cached epoch usually corresponds to the log start offset, but we have to verify this since
            // it may not be true following a message format version bump as the epoch will not be available for
            // log entries written in the older format.
            val earliestEpochEntry = leaderEpochCache.flatMap(_.earliestEntry)
            val epochOpt = earliestEpochEntry match {
              case Some(entry) if entry.startOffset <= logStartOffset => Optional.of[Integer](entry.epoch)
              case _ => Optional.empty[Integer]()
            }
            return Some(new TimestampAndOffset(RecordBatch.NO_TIMESTAMP, logStartOffset, epochOpt))
          } else if (targetTimestamp == ListOffsetRequest.LATEST_TIMESTAMP) {
            val latestEpochOpt = leaderEpochCache.flatMap(_.latestEpoch).map(_.asInstanceOf[Integer])
            val epochOptional = Optional.ofNullable(latestEpochOpt.orNull)
            return Some(new TimestampAndOffset(RecordBatch.NO_TIMESTAMP, logEndOffset, epochOptional))
          }
    
          val targetSeg = {
            // Get all the segments whose largest timestamp is smaller than target timestamp
            val earlierSegs = segmentsCopy.takeWhile(_.largestTimestamp < targetTimestamp)
            // We need to search the first segment whose largest timestamp is greater than the target timestamp if there is one.
            if (earlierSegs.length < segmentsCopy.length)
              Some(segmentsCopy(earlierSegs.length))
            else
              None
          }
    
          targetSeg.flatMap(_.findOffsetByTimestamp(targetTimestamp, logStartOffset))
        }
      }
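
The targetSeg selection above is just a linear scan for the first segment whose largest timestamp reaches the target; a tiny self-contained illustration:

public class TargetSegmentDemo {
    public static void main(String[] args) {
        // One largest-timestamp per segment, in log order (made-up values).
        long[] largestTimestamps = {100L, 200L, 300L};
        long target = 150L;
        // Mirrors segmentsCopy.takeWhile(_.largestTimestamp < targetTimestamp):
        int idx = 0;
        while (idx < largestTimestamps.length && largestTimestamps[idx] < target)
            idx++;
        if (idx < largestTimestamps.length)
            System.out.println("search segment " + idx); // -> "search segment 1"
        else
            System.out.println("no segment reaches the target timestamp"); // result is None
    }
}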

    Two things speed this lookup up: each log segment maintains a time index (the .timeindex file), so findOffsetByTimestamp can jump close to the right position instead of scanning the whole segment; and the per-log leaderEpochCache lets the broker attach the correct leader epoch to the returned offset without reading the log itself.
