Kafka 2.2 Source Code Analysis: Partition and Replica

Partition Overview

Partition records information about all the replicas of a partition, including the local replica and the leader replica.

Each Partition maintains a leaderEpoch, a TopicPartition, a localBrokerId, a leaderReplicaId, the AR (all replicas) set, and the ISR (in-sync replicas) set.

The TopicPartition identifies the Partition's partitionId and the topic it belongs to.

leaderReplicaId records the brokerId of the broker hosting the leader replica.

  @volatile private var leaderEpoch: Int = LeaderAndIsr.initialLeaderEpoch - 1

  // allReplicasMap includes both assigned replicas and the future replica if there is ongoing replica movement
  private val allReplicasMap = new Pool[Int, Replica]

  @volatile var inSyncReplicas: Set[Replica] = Set.empty[Replica]

A Partition can look up a Replica by using the given replicaId as the key into the AR map.

Using localBrokerId as the key, it can look up the local replica. If the lookup returns null, the partition has no replica on the local broker.

  def getReplica(replicaId: Int): Option[Replica] = Option(allReplicasMap.get(replicaId))

  def localReplica: Option[Replica] = getReplica(localBrokerId)

  def localReplicaOrException: Replica = localReplica.getOrElse {
    throw new ReplicaNotAvailableException(s"Replica for partition $topicPartition is not available " +
      s"on broker $localBrokerId")
  }
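
For illustration, the lookup pattern above can be reproduced in a few lines. The sketch below is a hypothetical, stripped-down stand-in (SimpleReplica and a plain ConcurrentHashMap instead of Kafka's Replica and Pool); it only shows how a null lookup becomes None.

import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for kafka.cluster.Replica, used only for this sketch.
case class SimpleReplica(brokerId: Int)

object ReplicaLookupSketch extends App {
  val localBrokerId = 1
  // stand-in for Partition.allReplicasMap (Kafka uses its own Pool wrapper)
  val allReplicasMap = new ConcurrentHashMap[Int, SimpleReplica]()
  allReplicasMap.put(1, SimpleReplica(1))
  allReplicasMap.put(2, SimpleReplica(2))

  // same pattern as Partition.getReplica: Option(...) turns a null lookup into None
  def getReplica(replicaId: Int): Option[SimpleReplica] = Option(allReplicasMap.get(replicaId))
  def localReplica: Option[SimpleReplica] = getReplica(localBrokerId)

  println(getReplica(2))  // Some(SimpleReplica(2))
  println(getReplica(3))  // None: broker 3 hosts no replica of this partition
  println(localReplica)   // Some(SimpleReplica(1))
}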

If the local replica exists, Partition provides a method to determine whether it is the leader replica:

 def leaderReplicaIfLocal: Option[Replica] = {
    if (leaderReplicaIdOpt.contains(localBrokerId))
      localReplica
    else
      None
  }

In addition, provided the local replica exists, Partition also offers a method that makes the local replica the leader replica.

 /**
   * Make the local replica the leader by resetting LogEndOffset for remote replicas (there could be old LogEndOffset
   * from the time when this broker was the leader last time) and setting the new leader and ISR.
   * If the leader replica id does not change, return false to indicate the replica manager.
   */
  def makeLeader(controllerId: Int, partitionStateInfo: LeaderAndIsrRequest.PartitionState, correlationId: Int): Boolean = {
    val (leaderHWIncremented, isNewLeader) = inWriteLock(leaderIsrUpdateLock) {
      val newAssignedReplicas = partitionStateInfo.basePartitionState.replicas.asScala.map(_.toInt)
      // record the epoch of the controller that made the leadership decision. This is useful while updating the isr
      // to maintain the decision maker controller's epoch in the zookeeper path
      controllerEpoch = partitionStateInfo.basePartitionState.controllerEpoch
      // add replicas that are new
      val newInSyncReplicas = partitionStateInfo.basePartitionState.isr.asScala.map(r => getOrCreateReplica(r, partitionStateInfo.isNew)).toSet
      // remove assigned replicas that have been removed by the controller
      (assignedReplicas.map(_.brokerId) -- newAssignedReplicas).foreach(removeReplica)
      // replace the ISR with the new set
      inSyncReplicas = newInSyncReplicas
      newAssignedReplicas.foreach(id => getOrCreateReplica(id, partitionStateInfo.isNew))
      // the leader replica must be the local replica; throw if the local replica does not exist
      val leaderReplica = localReplicaOrException
      // set leaderEpochStartOffset to the leader replica's LEO
      val leaderEpochStartOffset = leaderReplica.logEndOffset
      info(s"$topicPartition starts at Leader Epoch ${partitionStateInfo.basePartitionState.leaderEpoch} from " +
        s"offset $leaderEpochStartOffset. Previous Leader Epoch was: $leaderEpoch")

      //We cache the leader epoch here, persisting it only if it's local (hence having a log dir)
      // update the cached leaderEpoch, leaderEpochStartOffsetOpt and zkVersion
      leaderEpoch = partitionStateInfo.basePartitionState.leaderEpoch
      leaderEpochStartOffsetOpt = Some(leaderEpochStartOffset)
      zkVersion = partitionStateInfo.basePartitionState.zkVersion

      // In the case of successive leader elections in a short time period, a follower may have
      // entries in its log from a later epoch than any entry in the new leader's log. In order
      // to ensure that these followers can truncate to the right offset, we must cache the new
      // leader epoch and the start offset since it should be larger than any epoch that a follower
      // would try to query.
      // Cache the new leaderEpoch and leaderEpochStartOffset in the Log's LeaderEpochFileCache so that
      // followers, which query the leader's epoch, can truncate their logs to the correct offset.
      leaderReplica.log.foreach { log =>
        log.maybeAssignEpochStartOffset(leaderEpoch, leaderEpochStartOffset)
      }

      // this broker is the new leader only if leaderReplicaIdOpt did not already point at localBrokerId
      val isNewLeader = !leaderReplicaIdOpt.contains(localBrokerId)
      val curLeaderLogEndOffset = leaderReplica.logEndOffset
      val curTimeMs = time.milliseconds
      // initialize lastCaughtUpTime of replicas as well as their lastFetchTimeMs and lastFetchLeaderLogEndOffset.
      (assignedReplicas - leaderReplica).foreach { replica =>
        val lastCaughtUpTimeMs = if (inSyncReplicas.contains(replica)) curTimeMs else 0L
        replica.resetLastCaughtUpTime(curLeaderLogEndOffset, curTimeMs, lastCaughtUpTimeMs)
      }
      // the previous leader replica was not on this broker
      if (isNewLeader) {
        // construct the high watermark metadata for the new leader replica
        // convert the cached HW into full offset metadata for the new leader replica
        leaderReplica.convertHWToLocalOffsetMetadata()
        // mark local replica as the leader after converting hw
        // point leaderReplicaIdOpt at the local broker
        leaderReplicaIdOpt = Some(localBrokerId)
        // reset log end offset for remote replicas
        // reset the LEO this partition tracks for the remote replicas;
        // assignedReplicas is the set of valid replicas in allReplicasMap
        assignedReplicas.filter(_.brokerId != localBrokerId).foreach(_.updateLogReadResult(LogReadResult.UnknownLogReadResult))
      }
      // we may need to increment high watermark since ISR could be down to 1
      (maybeIncrementLeaderHW(leaderReplica), isNewLeader)
    }
    // some delayed operations may be unblocked after HW changed
    if (leaderHWIncremented)
      tryCompleteDelayedRequests()
    isNewLeader
  }
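
The comment above explains why makeLeader caches (leaderEpoch, leaderEpochStartOffset). The following standalone sketch (hypothetical names, an immutable TreeMap instead of Kafka's LeaderEpochFileCache) illustrates the idea: a follower asks the leader for the end offset of its latest epoch and truncates everything above the answer, so records it wrote under an epoch the new leader never had are discarded.

import scala.collection.immutable.TreeMap

object EpochTruncationSketch extends App {
  // leader side: epoch -> start offset of that epoch, as cached by makeLeader
  val leaderEpochStartOffsets = TreeMap(3 -> 0L, 5 -> 100L)
  val leaderLogEndOffset = 120L

  // simplified answer to an OffsetsForLeaderEpoch-style query: the end offset of the
  // requested epoch is the start offset of the next newer epoch, or the leader's LEO
  def endOffsetFor(requestedEpoch: Int): Long =
    leaderEpochStartOffsets
      .collectFirst { case (epoch, start) if epoch > requestedEpoch => start }
      .getOrElse(leaderLogEndOffset)

  // follower side: it wrote up to offset 130 under epoch 4, which the new leader never had
  val followerLogEndOffset = 130L
  val truncateTo = math.min(followerLogEndOffset, endOffsetFor(requestedEpoch = 4))
  println(s"follower truncates to offset $truncateTo")  // 100: the epoch-4 records are discarded
}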

If the local replica is the leader replica, records can be appended to the local log.

def appendRecordsToLeader(records: MemoryRecords, isFromClient: Boolean, requiredAcks: Int = 0): LogAppendInfo = {
    val (info, leaderHWIncremented) = inReadLock(leaderIsrUpdateLock) {
      // only proceed if the local replica is the leader replica
      leaderReplicaIfLocal match {
        case Some(leaderReplica) =>
          val log = leaderReplica.log.get
          // the minimum ISR size configured via min.insync.replicas
          val minIsr = log.config.minInSyncReplicas
          // the current size of the ISR
          val inSyncSize = inSyncReplicas.size

          // Avoid writing to leader if there are not enough insync replicas to make it safe
          // throw if the current ISR is smaller than the configured minimum and acks == -1
          if (inSyncSize < minIsr && requiredAcks == -1) {
            throw new NotEnoughReplicasException(s"The size of the current ISR ${inSyncReplicas.map(_.brokerId)} " +
              s"is insufficient to satisfy the min.isr requirement of $minIsr for partition $topicPartition")
          }
          // delegate the append to the underlying Log
          val info = log.appendAsLeader(records, leaderEpoch = this.leaderEpoch, isFromClient,
            interBrokerProtocolVersion)

          // we may need to increment high watermark since ISR could be down to 1
          (info, maybeIncrementLeaderHW(leaderReplica))

        case None =>
          throw new NotLeaderForPartitionException("Leader not local for partition %s on broker %d"
            .format(topicPartition, localBrokerId))
      }
    }

    // some delayed operations may be unblocked after HW changed
    if (leaderHWIncremented)
      tryCompleteDelayedRequests()
    else {
      // probably unblock some follower fetch requests since log end offset has been updated
      replicaManager.tryCompleteDelayedFetch(new TopicPartitionOperationKey(topicPartition))
    }

    info
  }
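
A small sketch of the guard in appendRecordsToLeader (hypothetical names, a plain IllegalStateException in place of Kafka's NotEnoughReplicasException): a write with acks=-1 is rejected up front when the ISR has shrunk below min.insync.replicas, so the producer fails fast instead of waiting for acknowledgements that can never arrive.

object MinIsrCheckSketch extends App {
  // mirrors the condition `inSyncSize < minIsr && requiredAcks == -1` from appendRecordsToLeader
  def checkBeforeAppend(inSyncSize: Int, minIsr: Int, requiredAcks: Int): Unit =
    if (inSyncSize < minIsr && requiredAcks == -1)
      throw new IllegalStateException(
        s"ISR size $inSyncSize is below min.insync.replicas=$minIsr")

  checkBeforeAppend(inSyncSize = 3, minIsr = 2, requiredAcks = -1)  // ok
  checkBeforeAppend(inSyncSize = 2, minIsr = 2, requiredAcks = -1)  // ok: size == min is allowed
  checkBeforeAppend(inSyncSize = 1, minIsr = 2, requiredAcks = 1)   // ok: acks=1 is not guarded here
  try checkBeforeAppend(inSyncSize = 1, minIsr = 2, requiredAcks = -1)
  catch { case e: IllegalStateException => println(s"rejected: ${e.getMessage}") }
}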

Replica Overview

Replica records the current log storage state of one partition's replica on a broker.

A Replica is identified by its brokerId and TopicPartition; the replicaId is equal to the brokerId.

class Replica(val brokerId: Int,
              val topicPartition: TopicPartition,
              time: Time = Time.SYSTEM,
              initialHighWatermarkValue: Long = 0L,
              @volatile var log: Option[Log] = None) extends Logging {

The log storage state recorded by a Replica includes the HW (high watermark), logStartOffset, logEndOffset and lastFetchLeaderLogEndOffset. lastFetchLeaderLogEndOffset is the leader's LEO at the time the leader received the last FetchRequest from this follower.

// the high watermark offset value, in non-leader replicas only its message offsets are kept
  @volatile private[this] var highWatermarkMetadata = new LogOffsetMetadata(initialHighWatermarkValue)
  // the log end offset value, kept in all replicas;
  // for local replica it is the log's end offset, for remote replicas its value is only updated by follower fetch
  @volatile private[this] var _logEndOffsetMetadata = LogOffsetMetadata.UnknownOffsetMetadata
  // the log start offset value, kept in all replicas;
  // for local replica it is the log's start offset, for remote replicas its value is only updated by follower fetch
  @volatile private[this] var _logStartOffset = Log.UnknownLogStartOffset

  // The log end offset value at the time the leader received the last FetchRequest from this follower
  // This is used to determine the lastCaughtUpTimeMs of the follower
  @volatile private[this] var lastFetchLeaderLogEndOffset = 0L

  // The time when the leader received the last FetchRequest from this follower
  // This is used to determine the lastCaughtUpTimeMs of the follower
  @volatile private[this] var lastFetchTimeMs = 0L
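
The two fields above feed the leader's view of how far each follower lags. As a rough, hypothetical sketch (not Kafka's exact implementation), a follower counts as in sync while its lastCaughtUpTimeMs is within replica.lag.time.max.ms of the current time; lastFetchLeaderLogEndOffset and lastFetchTimeMs are what let the leader decide when that timestamp may advance.

object ReplicaLagSketch extends App {
  // hypothetical, simplified follower state tracked by the leader
  final case class FollowerState(lastCaughtUpTimeMs: Long, lastFetchLeaderLogEndOffset: Long)

  // a follower stays in the ISR while it has recently caught up to the leader's LEO
  def isInSync(state: FollowerState, nowMs: Long, replicaLagTimeMaxMs: Long): Boolean =
    nowMs - state.lastCaughtUpTimeMs <= replicaLagTimeMaxMs

  val now = System.currentTimeMillis()
  val healthy = FollowerState(lastCaughtUpTimeMs = now - 1000L, lastFetchLeaderLogEndOffset = 500L)
  val lagging = FollowerState(lastCaughtUpTimeMs = now - 60000L, lastFetchLeaderLogEndOffset = 200L)

  println(isInSync(healthy, now, replicaLagTimeMaxMs = 10000L))  // true
  println(isInSync(lagging, now, replicaLagTimeMaxMs = 10000L))  // false: candidate for ISR shrink
}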

 
