PartitionStateMachine Analysis

莫言静好、

Category: Big Data/kafka/source code. Tags: kafka, PartitionStateMachin, source code

Article link: https://blog.csdn.net/zhanglh046/article/details/72822091

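PartitionStateMachine is the state machine that the Controller Leader uses to maintain partition state. A partition's state is represented by PartitionState, which has four subclasses: NonExistentPartition, NewPartition, OnlinePartition, and OfflinePartition.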

1. Partition state transitions

# NonExistentPartition -> NewPartition

Loads the partition's AR (assigned replicas) set from ZooKeeper into partitionReplicaAssignment in ControllerContext.

# NewPartition -> OnlinePartition

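First, the broker hosting the first live replica is chosen as the leader and all live replicas are placed in the ISR; the leader and ISR information is then written to ZooKeeper.

For this partition, a LeaderAndIsr request is sent to every live replica broker, and an UpdateMetadata request is sent to every live broker.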

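# OnlinePartition/OfflinePartition -> OnlinePartition

Selects a new leader replica and ISR set for the partition and writes the result to ZooKeeper. It then sends a LeaderAndIsrRequest to the replicas that need to change roles, instructing them to switch, and sends an UpdateMetadata request to all live brokers so that each broker updates its MetadataCache.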

# NewPartition/OnlinePartition -> OfflinePartition

Only marks the partition's state as OfflinePartition in the KafkaController.

# OfflinePartition -> NonExistentPartition

Only the state is switched; no other action is taken.
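The transition rules can be summarized as a small lookup from a target state to its allowed previous states. The following is only a sketch: the `PartitionTransitions` object and the `isValidTransition` helper are hypothetical names, the state objects are simplified stand-ins rather than the Kafka classes, and the sets follow the `assertValidPreviousStates` calls that appear in `handleStateChange` in section 3.2 (which also accept OfflinePartition as a previous state for OfflinePartition).

```scala
object PartitionTransitions {
  // Simplified stand-ins for the four partition states (not the Kafka classes themselves).
  sealed trait PartitionState
  case object NonExistentPartition extends PartitionState
  case object NewPartition extends PartitionState
  case object OnlinePartition extends PartitionState
  case object OfflinePartition extends PartitionState

  // For a target state, the states a partition may legally come from.
  def validPreviousStates(target: PartitionState): Set[PartitionState] = target match {
    case NewPartition         => Set(NonExistentPartition)
    case OnlinePartition      => Set(NewPartition, OnlinePartition, OfflinePartition)
    case OfflinePartition     => Set(NewPartition, OnlinePartition, OfflinePartition)
    case NonExistentPartition => Set(OfflinePartition)
  }

  // Hypothetical helper: true when moving from `from` to `to` is allowed.
  def isValidTransition(from: PartitionState, to: PartitionState): Boolean =
    validPreviousStates(to).contains(from)
}
```

For example, `PartitionTransitions.isValidTransition(NonExistentPartition, OnlinePartition)` is false in this sketch, because a partition has to pass through NewPartition first.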

2. Core fields

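controllerContext: ControllerContext. Maintains the context information of the KafkaController.

partitionState: Map[TopicAndPartition, PartitionState]. Stores the state of each partition.

brokerRequestBatch: ControllerBrokerRequestBatch. Sends requests in batches to the specified brokers.

noOpPartitionLeaderSelector: NoOpLeaderSelector. The default leader selector; it performs no actual election and simply returns the current leader replica, ISR set, and AR set.

topicChangeListener: TopicChangeListener. A ZooKeeper listener that watches for topic changes.

deleteTopicsListener: DeleteTopicsListener. A ZooKeeper listener that watches for topic deletions.

partitionModificationsListeners: Map[String, PartitionModificationsListener]. Listens for partition modifications.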
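To illustrate what "performs no actual election" means for the no-op selector, here is a minimal sketch of a pluggable selector interface. All names in it (`SelectorSketch`, `LeaderAndIsrSketch`, `LeaderSelector`, `NoOpSelector`) are hypothetical simplifications for illustration and are not Kafka's own signatures.

```scala
object SelectorSketch {
  // Hypothetical, simplified stand-in for Kafka's LeaderAndIsr.
  final case class LeaderAndIsrSketch(leader: Int, isr: List[Int])

  // A selector maps the current leader/ISR to a pair:
  // (leader/ISR to use from now on, replicas that should be notified).
  trait LeaderSelector {
    def selectLeader(current: LeaderAndIsrSketch, assignedReplicas: Seq[Int]): (LeaderAndIsrSketch, Seq[Int])
  }

  // No-op strategy: nothing is elected, the inputs are returned unchanged.
  object NoOpSelector extends LeaderSelector {
    def selectLeader(current: LeaderAndIsrSketch, assignedReplicas: Seq[Int]): (LeaderAndIsrSketch, Seq[Int]) =
      (current, assignedReplicas)
  }
}
```

Passing the strategy in as an object is what lets a single transition routine serve several election policies.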

3. Core methods

3.1 The startup method

When PartitionStateMachine starts up, it initializes the state of every partition and then tries to move the partitions in the NewPartition or OfflinePartition state to the OnlinePartition state.

def startup() {
  // initialize the partition states
  initializePartitionState()
  // set started flag
  hasStarted.set(true)
  // try to move partitions to the online state
  triggerOnlinePartitionStateChange()
}

# Initializes the state of each partition; the initial state is determined by partitionLeadershipInfo in ControllerContext

private def initializePartitionState() {
  // iterate over the partition -> replica assignment map held by ControllerContext
  for((topicPartition, replicaAssignment) <- controllerContext.partitionReplicaAssignment) {
    // check whether ControllerContext holds leader and ISR information for the partition:
    // if it does, the partition is not new; if it does not, this is a new partition
    controllerContext.partitionLeadershipInfo.get(topicPartition) match {
      case Some(currentLeaderIsrAndEpoch) =>
        // check whether the partition's leader is alive
        if (controllerContext.liveBrokerIds.contains(currentLeaderIsrAndEpoch.leaderAndIsr.leader))
          // leader is alive: initialize as OnlinePartition
          partitionState.put(topicPartition, OnlinePartition)
        else
          // leader is not alive: initialize as OfflinePartition
          partitionState.put(topicPartition, OfflinePartition)
      case None =>
        // no leader information: the partition is new, so its state is NewPartition
        partitionState.put(topicPartition, NewPartition)
    }
  }
}
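The decision made inside the loop can be isolated as a pure function, which makes the three outcomes easy to see. This is a sketch under simplifying assumptions: `InitialStateSketch`, `initialState`, and the `cachedLeader` parameter are hypothetical names, and the leader is reduced to an optional broker id.

```scala
object InitialStateSketch {
  // Simplified stand-ins for the states that initialization can produce.
  sealed trait PartitionState
  case object NewPartition extends PartitionState
  case object OnlinePartition extends PartitionState
  case object OfflinePartition extends PartitionState

  // `cachedLeader` is Some(brokerId) when leader/ISR information exists for the partition.
  def initialState(cachedLeader: Option[Int], liveBrokerIds: Set[Int]): PartitionState =
    cachedLeader match {
      case Some(leader) if liveBrokerIds.contains(leader) => OnlinePartition  // leader is alive
      case Some(_)                                        => OfflinePartition // leader is down
      case None                                           => NewPartition     // never had a leader
    }
}
```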

# Tries to move every partition in the NewPartition or OfflinePartition state to the OnlinePartition state

def triggerOnlinePartitionStateChange() {
  try {
    brokerRequestBatch.newBatch()
    // try to move all partitions in the NewPartition or OfflinePartition state to the OnlinePartition state
    // iterate over the partition -> state map
    for((topicAndPartition, partitionState) <- partitionState
        // skip partitions whose topic is queued up for deletion
        if !controller.deleteTopicManager.isTopicQueuedUpForDeletion(topicAndPartition.topic)) {
      // try to move OfflinePartition and NewPartition partitions to OnlinePartition
      if(partitionState.equals(OfflinePartition) || partitionState.equals(NewPartition))
        handleStateChange(topicAndPartition.topic, topicAndPartition.partition, OnlinePartition, controller.offlinePartitionSelector,
                          (new CallbackBuilder).build)
    }
    // send the batched requests to the target brokers
    brokerRequestBatch.sendRequestsToBrokers(controller.epoch)
  } catch {
    case e: Throwable => error("Error while moving some partitions to the online state", e)
  }
}

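3.2 The handleStateChange method

This is the core method for partition state transitions. It elects a leader according to the given leader selection strategy, and before every transition it checks that the partition's previous state is valid.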

private def handleStateChange(topic: String, partition: Int, targetState: PartitionState, leaderSelector: PartitionLeaderSelector,
    callbacks: Callbacks) {
  val topicAndPartition = TopicAndPartition(topic, partition)
  if (!hasStarted.get)
    throw new StateChangeFailedException(("Controller %d epoch %d initiated state change for partition %s to %s failed because " +
      "the partition state machine has not started").format(controllerId, controller.epoch, topicAndPartition, targetState))
  // look up the current state of the partition; default to NonExistentPartition if there is none
  val currState = partitionState.getOrElseUpdate(topicAndPartition, NonExistentPartition)
  try {
    targetState match {
      // target state: NewPartition
      case NewPartition =>
        // check the partition's previous state
        assertValidPreviousStates(topicAndPartition, List(NonExistentPartition), NewPartition)
        // update the partition state
        partitionState.put(topicAndPartition, NewPartition)
        // get the partition's AR set
        val assignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition).mkString(",")
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s with assigned replicas %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState, assignedReplicas))
      // target state: OnlinePartition
      case OnlinePartition =>
        // check the partition's previous state
        assertValidPreviousStates(topicAndPartition, List(NewPartition, OnlinePartition, OfflinePartition), OnlinePartition)
        partitionState(topicAndPartition) match {
          // current state is NewPartition
          case NewPartition =>
            // initialize the leader and ISR of the new partition
            initializeLeaderAndIsrForPartition(topicAndPartition)
          // current state is OfflinePartition
          case OfflinePartition =>
            // perform the OfflinePartition -> OnlinePartition transition
            electLeaderForPartition(topic, partition, leaderSelector)
          // already OnlinePartition, but a re-election is needed for some reason
          case OnlinePartition => // invoked when the leader needs to be re-elected
            // perform the OnlinePartition -> OnlinePartition transition
            electLeaderForPartition(topic, partition, leaderSelector)
          case _ => // should never come here since illegal previous states are checked above
        }
        // set the partition state to OnlinePartition
        partitionState.put(topicAndPartition, OnlinePartition)
        val leader = controllerContext.partitionLeadershipInfo(topicAndPartition).leaderAndIsr.leader
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s from %s to %s with leader %d"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState, leader))
      // target state: OfflinePartition
      case OfflinePartition =>
        // check the previous state
        assertValidPreviousStates(topicAndPartition, List(NewPartition, OnlinePartition, OfflinePartition), OfflinePartition)
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState))
        // set the partition state to OfflinePartition
        partitionState.put(topicAndPartition, OfflinePartition)
      // target state: NonExistentPartition
      case NonExistentPartition =>
        // check the previous state
        assertValidPreviousStates(topicAndPartition, List(OfflinePartition), NonExistentPartition)
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState))
        // set the partition state to NonExistentPartition
        partitionState.put(topicAndPartition, NonExistentPartition)
    }
  } catch {
    case t: Throwable =>
      stateChangeLogger.error("Controller %d epoch %d initiated state change for partition %s from %s to %s failed"
        .format(controllerId, controller.epoch, topicAndPartition, currState, targetState), t)
  }
}

3.3 initializeLeaderAndIsrForPartition

When a partition moves from NewPartition to OnlinePartition, this method initializes the partition's leader and ISR list.

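# Gets the partition's AR set and filters it down to the replicas that are currently live

# If there is no live replica, the transition fails

# Otherwise, creates a LeaderIsrAndControllerEpoch object, which wraps the leader, the ISR, and the controller epoch

# Converts the LeaderIsrAndControllerEpoch object and saves it under the corresponding ZooKeeper path:

/brokers/topics/[topic_name]/partitions/[partition_id]/state

# Updates the partition's leader information in partitionLeadershipInfo of ControllerContext

# Wraps the leader replica, the ISR list, the AR set, and related information into a LeaderAndIsrRequest and adds it to the pending queue to be sent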

private def initializeLeaderAndIsrForPartition(topicAndPartition: TopicAndPartition) {
  // get the partition's AR set
  val replicaAssignment = controllerContext.partitionReplicaAssignment(topicAndPartition)
  // keep only the replicas in the AR set that are currently live
  val liveAssignedReplicas = replicaAssignment.filter(r => controllerContext.liveBrokerIds.contains(r))
  liveAssignedReplicas.size match {
    // no live replica in the AR set: throw an exception signalling that the state change failed
    case 0 =>
      // ......
    case _ =>
      debug("Live assigned replicas for partition %s are: [%s]".format(topicAndPartition, liveAssignedReplicas))
      // take the first live replica in the AR set as the leader
      val leader = liveAssignedReplicas.head
      // create the LeaderIsrAndControllerEpoch object
      val leaderIsrAndControllerEpoch = new LeaderIsrAndControllerEpoch(new LeaderAndIsr(leader, liveAssignedReplicas.toList),
        controller.epoch)
      debug("Initializing leader and isr for partition %s to %s".format(topicAndPartition, leaderIsrAndControllerEpoch))
      try {
        // create /brokers/topics/[topic_name]/partitions/[partition_id]/state in ZooKeeper from leaderIsrAndControllerEpoch
        zkUtils.createPersistentPath(
          getTopicPartitionLeaderAndIsrPath(topicAndPartition.topic, topicAndPartition.partition),
          zkUtils.leaderAndIsrZkData(leaderIsrAndControllerEpoch.leaderAndIsr, controller.epoch))
        // update the partition's leader information in partitionLeadershipInfo of ControllerContext
        controllerContext.partitionLeadershipInfo.put(topicAndPartition, leaderIsrAndControllerEpoch)
        // queue a LeaderAndIsr request to be sent to the target brokers
        brokerRequestBatch.addLeaderAndIsrRequestForBrokers(liveAssignedReplicas, topicAndPartition.topic,
          topicAndPartition.partition, leaderIsrAndControllerEpoch, replicaAssignment)
      } catch {
        // ......
      }
  }
}
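The selection rule itself (first live assigned replica becomes leader, all live assigned replicas form the ISR) can be sketched without ZooKeeper or the request batch. `InitialLeaderAndIsr`, `LeaderAndIsrSketch`, and `initialize` are hypothetical names, and returning `None` is an assumption of this sketch standing in for the failed transition.

```scala
object InitialLeaderAndIsr {
  // Hypothetical, simplified stand-in for Kafka's LeaderAndIsr.
  final case class LeaderAndIsrSketch(leader: Int, isr: List[Int])

  // Returns None when no assigned replica is alive; the real method fails the transition instead.
  def initialize(assignedReplicas: Seq[Int], liveBrokerIds: Set[Int]): Option[LeaderAndIsrSketch] = {
    // keep the assignment order, dropping replicas whose broker is not alive
    val liveAssignedReplicas = assignedReplicas.filter(liveBrokerIds.contains)
    // the first surviving replica is the leader, all survivors form the ISR
    liveAssignedReplicas.headOption.map(leader => LeaderAndIsrSketch(leader, liveAssignedReplicas.toList))
  }
}
```

With an assignment of `Seq(1, 2, 3)` and live brokers `Set(2, 3)`, the sketch picks broker 2 as leader with ISR `List(2, 3)`, since order in the AR set decides who leads.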

3.4 electLeaderForPartition

This method is used when a partition moves from OfflinePartition or OnlinePartition to OnlinePartition.

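# Elects a new leader replica for the partition according to the given selection strategy

# Writes the leader and ISR information to the corresponding ZooKeeper path

# Updates the partition's leader information in partitionLeadershipInfo of ControllerContext

# Wraps the leader replica, the ISR list, the AR set, and related information into a LeaderAndIsrRequest and adds it to the pending queue to be sent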

def electLeaderForPartition(topic: String, partition: Int, leaderSelector: PartitionLeaderSelector) {
  val topicAndPartition = TopicAndPartition(topic, partition)
  // handle leader election for the partitions whose leader is no longer alive
  stateChangeLogger.trace("Controller %d epoch %d started leader election for partition %s"
                            .format(controllerId, controller.epoch, topicAndPartition))
  try {
    var zookeeperPathUpdateSucceeded: Boolean = false
    var newLeaderAndIsr: LeaderAndIsr = null
    var replicasForThisPartition: Seq[Int] = Seq.empty[Int]
    while(!zookeeperPathUpdateSucceeded) {
      // read the partition's current leader replica, ISR set, zkVersion, etc. from ZooKeeper; throw if they do not exist
      val currentLeaderIsrAndEpoch = getLeaderIsrAndEpochOrThrowException(topic, partition)
      val currentLeaderAndIsr = currentLeaderIsrAndEpoch.leaderAndIsr
      val controllerEpoch = currentLeaderIsrAndEpoch.controllerEpoch
      // if the controller epoch stored in ZooKeeper is greater than this controller's epoch,
      // a newer controller has already written the path, so throw an exception
      if (controllerEpoch > controller.epoch) {
        val failMsg = ("aborted leader election for partition [%s,%d] since the LeaderAndIsr path was " +
                       "already written by another controller. This probably means that the current controller %d went through " +
                       "a soft failure and another controller was elected with epoch %d.")
                         .format(topic, partition, controllerId, controllerEpoch)
        stateChangeLogger.error("Controller %d epoch %d ".format(controllerId, controller.epoch) + failMsg)
        throw new StateChangeFailedException(failMsg)
      }
      // use leaderSelector to elect the new leader replica and ISR list
      val (leaderAndIsr, replicas) = leaderSelector.selectLeader(topicAndPartition, currentLeaderAndIsr)
      // save the new LeaderAndIsr information to ZooKeeper
      val (updateSucceeded, newVersion) = ReplicationUtils.updateLeaderAndIsr(zkUtils, topic, partition,
        leaderAndIsr, controller.epoch, currentLeaderAndIsr.zkVersion)
      newLeaderAndIsr = leaderAndIsr
      newLeaderAndIsr.zkVersion = newVersion
      zookeeperPathUpdateSucceeded = updateSucceeded
      replicasForThisPartition = replicas
    }
    val newLeaderIsrAndControllerEpoch = new LeaderIsrAndControllerEpoch(newLeaderAndIsr, controller.epoch)
    // update the partition's leader information in partitionLeadershipInfo of ControllerContext
    controllerContext.partitionLeadershipInfo.put(TopicAndPartition(topic, partition), newLeaderIsrAndControllerEpoch)
    stateChangeLogger.trace("Controller %d epoch %d elected leader %d for Offline partition %s"
      .format(controllerId, controller.epoch, newLeaderAndIsr.leader, topicAndPartition))
    // get the partition's AR set
    val replicas = controllerContext.partitionReplicaAssignment(TopicAndPartition(topic, partition))
    // queue a LeaderAndIsrRequest to be sent to the target brokers
    brokerRequestBatch.addLeaderAndIsrRequestForBrokers(replicasForThisPartition, topic, partition,
      newLeaderIsrAndControllerEpoch, replicas)
  } catch {
    // ......
  }
  debug("After leader election, leader cache is updated to %s".format(controllerContext.partitionLeadershipInfo.map(l => (l._1, l._2))))
}
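The `while` loop above is an optimistic-concurrency retry: read the value together with its version, compute the new value, and write only if the version is unchanged, starting over otherwise. The sketch below shows that pattern against an in-memory versioned cell. `ConditionalUpdateSketch`, `VersionedNode`, and `updateWithRetry` are hypothetical names that stand in for the ZooKeeper node and are not part of Kafka or ZooKeeper.

```scala
object ConditionalUpdateSketch {
  // Hypothetical in-memory stand-in for one versioned ZooKeeper node.
  final class VersionedNode[A](initial: A) {
    private var value: A = initial
    private var version: Int = 0

    // Returns the current value together with the version it was read at.
    def read(): (A, Int) = synchronized { (value, version) }

    // Writes only if the caller saw the latest version; returns (succeeded, current version).
    def conditionalUpdate(newValue: A, expectedVersion: Int): (Boolean, Int) = synchronized {
      if (expectedVersion == version) {
        value = newValue
        version += 1
        (true, version)
      } else (false, version)
    }
  }

  // Re-reads and re-computes until the conditional write succeeds;
  // returns the value that was written and the version it received.
  def updateWithRetry[A](node: VersionedNode[A])(compute: A => A): (A, Int) = {
    var result: Option[(A, Int)] = None
    while (result.isEmpty) {
      val (current, version) = node.read()
      val candidate = compute(current)
      val (succeeded, newVersion) = node.conditionalUpdate(candidate, version)
      if (succeeded) result = Some((candidate, newVersion))
    }
    result.get
  }
}
```

Recomputing the candidate on every iteration matters: if another writer changed the node in between, the election has to run again on the fresh value rather than overwrite it.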
