Kafka Source Code (7): KafkaController, the Cluster Control Module


Table of Contents

1. Overview

2. Key Classes

3. Main Flows

3.1 Electing the Active Controller

3.2 Managing the Partition and Replica State Machines

3.3 Electing the Partition Leader

4. Topic Creation Source Code

4.1 How the Command Distributes Partitions Evenly Across Brokers

4.2 How the Kafka Controller Handles a Topic-Creation Event

4.3 How a Broker Becomes Leader or Follower

4.4 Summary


1. Overview

KafkaController is the control module of a Kafka cluster. Every broker runs a controller, but exactly one of them is the leader (the active controller) and the rest are followers. The active controller maintains the metadata of all nodes and, by registering various listeners on ZooKeeper, manages cluster membership, partition leader election, rebalancing, and so on. External events update the data stored in ZooKeeper, and whenever that data changes the controller reacts accordingly.

KafkaController handles these changes by registering a number of listeners:


class KafkaController(val config: KafkaConfig,
                      zkClient: KafkaZkClient,
                      time: Time,
                      metrics: Metrics,
                      initialBrokerInfo: BrokerInfo,
                      initialBrokerEpoch: Long,
                      tokenManager: DelegationTokenManager,
                      threadNamePrefix: Option[String] = None)
  extends ControllerEventProcessor with Logging with KafkaMetricsGroup {

  this.logIdent = s"[Controller id=${config.brokerId}] "

  @volatile private var brokerInfo = initialBrokerInfo
  @volatile private var _brokerEpoch = initialBrokerEpoch

  private val stateChangeLogger = new StateChangeLogger(config.brokerId, inControllerContext = true, None)
  val controllerContext = new ControllerContext
  var controllerChannelManager = new ControllerChannelManager(controllerContext, config, time, metrics,
    stateChangeLogger, threadNamePrefix)

  private[controller] val kafkaScheduler = new KafkaScheduler(1)

  private[controller] val eventManager = new ControllerEventManager(config.brokerId, this, time,
    controllerContext.stats.rateAndTimeMetrics)

  private val brokerRequestBatch = new ControllerBrokerRequestBatch(config, controllerChannelManager,
    eventManager, controllerContext, stateChangeLogger)
  val replicaStateMachine: ReplicaStateMachine = new ZkReplicaStateMachine(config, stateChangeLogger, controllerContext, zkClient,
    new ControllerBrokerRequestBatch(config, controllerChannelManager, eventManager, controllerContext, stateChangeLogger))
  val partitionStateMachine: PartitionStateMachine = new ZkPartitionStateMachine(config, stateChangeLogger, controllerContext, zkClient,
    new ControllerBrokerRequestBatch(config, controllerChannelManager, eventManager, controllerContext, stateChangeLogger))
  val topicDeletionManager = new TopicDeletionManager(config, controllerContext, replicaStateMachine,
    partitionStateMachine, new ControllerDeletionClient(this, zkClient))

  private val controllerChangeHandler = new ControllerChangeHandler(eventManager)
  private val brokerChangeHandler = new BrokerChangeHandler(eventManager)
  private val brokerModificationsHandlers: mutable.Map[Int, BrokerModificationsHandler] = mutable.Map.empty
  private val topicChangeHandler = new TopicChangeHandler(eventManager)
  private val topicDeletionHandler = new TopicDeletionHandler(eventManager)
  private val partitionModificationsHandlers: mutable.Map[String, PartitionModificationsHandler] = mutable.Map.empty
  private val partitionReassignmentHandler = new PartitionReassignmentHandler(eventManager)
  private val preferredReplicaElectionHandler = new PreferredReplicaElectionHandler(eventManager)
  private val isrChangeNotificationHandler = new IsrChangeNotificationHandler(eventManager)
  private val logDirEventNotificationHandler = new LogDirEventNotificationHandler(eventManager)

2. Key Classes

KafkaController

KafkaController is the controller of the Kafka cluster: there is exactly one leader and some number of followers. The leader controller can send concrete requests to the other brokers, such as RequestKeys.LeaderAndIsrKey, RequestKeys.StopReplicaKey and RequestKeys.UpdateMetadataKey (the LeaderAndIsr, StopReplica and UpdateMetadata requests).

The *Handler classes

When KafkaController starts it instantiates the various *Handler classes. Each handler declares the ZooKeeper path it needs to watch and, when that path changes, puts the ControllerEvent(s) it cares about onto the ControllerEventManager queue; a single handler may care about several ControllerEvents. The handlers include: ControllerChangeHandler, BrokerChangeHandler, BrokerModificationsHandler, TopicChangeHandler, TopicDeletionHandler, PartitionModificationsHandler, PartitionReassignmentHandler, PreferredReplicaElectionHandler, IsrChangeNotificationHandler and LogDirEventNotificationHandler.
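
For a concrete feel of how small these handlers are, here is a sketch of the topic-change handler: it only declares the path it watches and enqueues a TopicChange event when the children of that path change. This mirrors the BrokerChangeHandler excerpt shown in section 4.2; treat it as a simplified rendering rather than a verbatim quote of the source.

class TopicChangeHandler(eventManager: ControllerEventManager) extends ZNodeChildChangeHandler {
  // the /brokers/topics path this handler watches
  override val path: String = TopicsZNode.path

  // fired when a topic znode is added or removed; just enqueue a TopicChange event
  override def handleChildChange(): Unit = eventManager.put(TopicChange)
}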

ControllerEvent

These include MockEvent, ShutdownEventThread, ControllerChange, ReplicaLeaderElection, BrokerChange, and others.

ControllerEventManager

When KafkaController instantiates the handlers, and whenever a watcher observes a ZooKeeper node change, an event is put onto the ControllerEventManager queue. A thread inside ControllerEventManager takes events off the queue and calls the ControllerEventProcessor's process method, which performs the appropriate handling for each kind of event.

ControllerEventProcessor

Performs the appropriate handling for each ControllerEvent.

 

3. Main Flows

3.1 Electing the Active Controller

The election of the active controller relies on ZooKeeper's leader-election mechanism: every node takes part in the election, but only one can become the active controller. The other nodes take part again only when the active controller fails or its session expires. Each node, acting as a ZooKeeper client, tries to create the ephemeral /controller znode; in the end only one broker succeeds.

At startup every node competes for the controller role by trying to create the /controller znode, but only one becomes the active controller. All nodes register a session-expiration listener and a data-change listener on the /controller znode.

If the active controller's session expires, the ephemeral /controller znode is deleted. The other nodes then receive the /controller data-change event, and their electors try to recreate /controller to compete for the controller role again.
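
The heart of the election is simply "the first broker to create the ephemeral /controller znode wins". A minimal standalone sketch of that idea, using the plain ZooKeeper client instead of Kafka's KafkaZkClient (the payload format here is simplified and only illustrative):

import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}

// Sketch only: try to become the active controller by creating the ephemeral /controller znode.
def tryElect(zk: ZooKeeper, brokerId: Int): Boolean = {
  try {
    zk.create("/controller",
      s"""{"brokerid":$brokerId}""".getBytes("UTF-8"),   // simplified payload
      ZooDefs.Ids.OPEN_ACL_UNSAFE,
      CreateMode.EPHEMERAL)   // removed automatically when this broker's session expires
    true                      // we won the election
  } catch {
    case _: KeeperException.NodeExistsException =>
      false                   // another broker is already the active controller; keep watching
  }
}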

How does Kafka avoid split-brain?

  1. Every request the Controller sends to a broker carries the controller epoch. If a broker sees a request whose epoch is lower than the one it has cached, the request must come from an old Controller and is rejected (see the sketch after this list). Under normal circumstances this is sufficient.
  2. But what about abnormal cases, where a broker happens to process a request from the stale Controller first? As it stands, Kafka does not have a complete answer for this part.
  3. Normally, once a new Controller has been elected it sends a metadata (UpdateMetadata) request to all brokers, so every broker learns the latest controller epoch. This does not completely rule out the problem above, but the window is very small, and even if it occurs the impact is limited thanks to Kafka's highly reliable architecture; so far it has not been a serious issue.
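
The fencing in point 1 is just a comparison against the broker's cached epoch. A purely illustrative sketch (the field and method names here are made up for the example, not the actual ones in KafkaApis):

// Illustrative only: a broker rejects requests that carry a controller epoch
// lower than the newest epoch it has seen, because they must come from a stale controller.
@volatile var cachedControllerEpoch: Int = 0

def shouldAcceptControllerRequest(requestControllerEpoch: Int): Boolean = {
  if (requestControllerEpoch < cachedControllerEpoch) {
    false                                              // stale controller -> reject
  } else {
    cachedControllerEpoch = requestControllerEpoch     // remember the newest epoch seen so far
    true
  }
}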

3.2 Managing the Partition and Replica State Machines

This process is essentially a publish/subscribe system built on top of ZooKeeper. When a topic is created, for example, the publisher is the TopicCommand class and the subscriber is the KafkaController class. The KafkaController elects the partition leader (the first replica in the replica list) and sends a LeaderAndIsrRequest to the brokers that TopicCommand assigned; each of those brokers then starts its replica either as the leader (if the broker id assigned as leader equals its own id) or as a follower.

Both partitions and replicas have four states: new, online, offline, and non-existent. The four partition states are:

  • New (NewPartition)
  • Online (OnlinePartition)
  • Offline (OfflinePartition)
  • Non-existent (NonExistentPartition)

The four replica states are:

  • New (NewReplica)
  • Online (OnlineReplica)
  • Offline (OfflineReplica)
  • Non-existent (NonExistentReplica)

When an external event occurs, the corresponding state-transition method of the state machine is invoked and different actions are taken depending on the target state. Through the partition and replica state machines the controller manages the cluster nodes and implements leader election, rebalancing, and so on.
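
A condensed sketch of the legal partition-state transitions (in the real source each state object carries its validPreviousStates; this is a simplified rendering of the same rules):

sealed trait PartitionState
case object NewPartition extends PartitionState
case object OnlinePartition extends PartitionState
case object OfflinePartition extends PartitionState
case object NonExistentPartition extends PartitionState

// for each target state, the states a partition is allowed to come from
val validPreviousStates: Map[PartitionState, Set[PartitionState]] = Map(
  NewPartition         -> Set(NonExistentPartition),
  OnlinePartition      -> Set(NewPartition, OnlinePartition, OfflinePartition),
  OfflinePartition     -> Set(NewPartition, OnlinePartition, OfflinePartition),
  NonExistentPartition -> Set(OfflinePartition)
)

// a transition is legal only if the current state is a valid predecessor of the target
def isValidTransition(current: PartitionState, target: PartitionState): Boolean =
  validPreviousStates(target).contains(current)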

 

KafkaController listening flow:

3.3 Electing the Partition Leader

Whenever a partition transitions to the online state, whether from the offline state or from the online state (re-election), a new partition leader must be elected:

  • First read the partition's current leader and ISR.
  • Prefer the first replica in the ISR as the leader. If the first replica is down, choose another replica as the leader.
  • If the entire ISR is down, pick the first live replica from the AR (the full replica list) as the leader.

Preferred-replica election (making the first replica in the assignment the leader) exists to keep partitions balanced: Kafka's partition-assignment algorithm guarantees that the preferred (first) replicas are spread evenly across all brokers, so leadership stays evenly distributed when preferred replicas lead. A background partition-rebalance thread periodically checks whether the preferred replica (the first replica) is currently the leader; if not, it triggers a re-election to make the preferred replica the leader. The KafkaController then sends LeaderAndIsr requests to all live replicas of the partition so those brokers update their metadata.
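
Putting the three rules above into code, offline-partition leader election looks roughly like the following sketch (a simplified version of the logic in PartitionLeaderElectionAlgorithms; the fallback step only applies when unclean leader election is allowed):

// assignment = the full replica list (AR), isr = current in-sync replicas,
// liveReplicas = replicas whose brokers are currently alive
def electLeader(assignment: Seq[Int], isr: Set[Int], liveReplicas: Set[Int]): Option[Int] = {
  // prefer the first live replica that is still in the ISR
  assignment.find(replica => liveReplicas.contains(replica) && isr.contains(replica))
    // if the whole ISR is down, fall back to the first live replica in the AR (unclean election)
    .orElse(assignment.find(liveReplicas.contains))
}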

 

4. Topic Creation Source Code

4.1 How the Command Distributes Partitions Evenly Across Brokers

The concrete execution path in the source: TopicCommand.main uses an adminClient to send a CreateTopicsRequest.

val createResult = adminClient.createTopics(Collections.singleton(newTopic))
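
For reference, the same request can be issued directly with the AdminClient API; a minimal client-side example (broker address, topic name, partition count and replication factor are placeholders):

import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

val props = new Properties()
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder address
val adminClient = AdminClient.create(props)

// topic name, 3 partitions, replication factor 2: illustrative values
val newTopic = new NewTopic("demo-topic", 3, 2.toShort)
val createResult = adminClient.createTopics(Collections.singleton(newTopic))
createResult.all().get()  // block until the brokers report the result
adminClient.close()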

KafkaApis handles the CreateTopicsRequest:

class KafkaApis(val requestChannel: RequestChannel,
                val replicaManager: ReplicaManager,
                val adminManager: AdminManager,
                val groupCoordinator: GroupCoordinator,
                val txnCoordinator: TransactionCoordinator,
                val controller: KafkaController,
                val zkClient: KafkaZkClient,
                val brokerId: Int,
                val config: KafkaConfig,
                val metadataCache: MetadataCache,
                val metrics: Metrics,
                val authorizer: Option[Authorizer],
                val quotas: QuotaManagers,
                val fetchManager: FetchManager,
                brokerTopicStats: BrokerTopicStats,
                val clusterId: String,
                time: Time,
                val tokenManager: DelegationTokenManager) extends Logging {

  def handle(request: RequestChannel.Request): Unit = {
    try {
      request.header.apiKey match {
        // dispatch topic-creation requests
        case ApiKeys.CREATE_TOPICS => handleCreateTopicsRequest(request)
        // ... other request types elided ...

      // ... inside handleCreateTopicsRequest, the topic is actually created:
      adminManager.createTopics(createTopicsRequest.data.timeoutMs,
          createTopicsRequest.data.validateOnly,
          toCreate,
          authorizedForDescribeConfigs,
          handleCreateTopicsResults)

AdminManager then computes the partition and replica assignment:


class AdminManager(val config: KafkaConfig,
                   val metrics: Metrics,
                   val metadataCache: MetadataCache,
                   val zkClient: KafkaZkClient) extends Logging with KafkaMetricsGroup {

        // compute how the topic's partitions and replicas are assigned to brokers
          AdminUtils.assignReplicasToBrokers(
            brokers, resolvedNumPartitions, resolvedReplicationFactor)

        // create the topic in ZooKeeper with that assignment
          adminZkClient.createTopicWithAssignment(topic.name, configs, assignments)

The assignment of partitions and replicas works as follows:

1. Pick a random broker position as startIndex.

2. Initialize the current partition id to a value >= 0.

3. Pick a random shift within the broker count as the offset used for the next replica.

4. For each partition:

4.1. The broker of the first replica is (current partition id + startIndex) modulo the broker count.

4.2. For each remaining replica:

4.2.1. Compute its position with the replicaIndex helper.

4.3. Increment the current partition id.


  private def assignReplicasToBrokersRackUnaware(nPartitions: Int,
                                                 replicationFactor: Int,
                                                 brokerList: Seq[Int],
                                                 fixedStartIndex: Int,
                                                 startPartitionId: Int): Map[Int, Seq[Int]] = {
    val ret = mutable.Map[Int, Seq[Int]]()
    val brokerArray = brokerList.toArray
    // pick a random broker position as startIndex (unless a fixed one was supplied)
    val startIndex = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerArray.length)
    // start from startPartitionId, but never below 0
    var currentPartitionId = math.max(0, startPartitionId)
    // pick a random shift within the broker count
    var nextReplicaShift = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerArray.length)
    for (_ <- 0 until nPartitions) {
      // increase the shift only after every brokerArray.length partitions have been assigned
      if (currentPartitionId > 0 && (currentPartitionId % brokerArray.length == 0))
        nextReplicaShift += 1
      // (current partition id + startIndex) modulo the broker count gives the broker of the first replica
      val firstReplicaIndex = (currentPartitionId + startIndex) % brokerArray.length
      val replicaBuffer = mutable.ArrayBuffer(brokerArray(firstReplicaIndex))
      for (j <- 0 until replicationFactor - 1)
      // compute the position of each remaining replica with the replicaIndex helper
        replicaBuffer += brokerArray(replicaIndex(firstReplicaIndex, nextReplicaShift, j, brokerArray.length))
      ret.put(currentPartitionId, replicaBuffer)
      // move on to the next partition
      currentPartitionId += 1
    }
    ret
  }
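
The replicaIndex helper called in the loop is not shown above. In AdminUtils it is roughly the following (quoted from memory, so treat the exact form as approximate): each follower replica is shifted by at least one position from the first replica, so replicas of the same partition never land on the same broker.

  private def replicaIndex(firstReplicaIndex: Int, secondReplicaShift: Int,
                           replicaIndex: Int, nBrokers: Int): Int = {
    // the "+ 1" guarantees a non-zero shift, so a follower never collides with the first replica
    val shift = 1 + (secondReplicaShift + replicaIndex) % (nBrokers - 1)
    (firstReplicaIndex + shift) % nBrokers
  }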

 

Writing the config: the ZooKeeper path is /config/topics/TopicName.

Writing the topic's partition assignment: the ZooKeeper path is /brokers/topics/TopicName.


    // write out the config if there is any, this isn't transactional with the partition assignments
    zkClient.setOrCreateEntityConfigs(ConfigType.Topic, topic, config)

    // create the partition assignment
    writeTopicPartitionAssignment(topic, partitionReplicaAssignment.mapValues(ReplicaAssignment(_)).toMap, isUpdate = false)

 

4.2 How the Kafka Controller Handles a Topic-Creation Event

When KafkaController starts it instantiates the *Handler classes; each handler registers the path it watches and, when that path changes, puts the corresponding event onto the queue for processing.

class KafkaController(val config: KafkaConfig,
                      zkClient: KafkaZkClient,
                      time: Time,
                      metrics: Metrics,
                      initialBrokerInfo: BrokerInfo,
                      initialBrokerEpoch: Long,
                      tokenManager: DelegationTokenManager,
                      threadNamePrefix: Option[String] = None)
  extends ControllerEventProcessor with Logging with KafkaMetricsGroup {
  // instantiate the TopicChangeHandler
  private val topicChangeHandler = new TopicChangeHandler(eventManager)


class BrokerChangeHandler(eventManager: ControllerEventManager) extends ZNodeChildChangeHandler {
  // the ZooKeeper path this handler watches
  override val path: String = BrokerIdsZNode.path

  override def handleChildChange(): Unit = {
    // put a BrokerChange event onto the queue
    eventManager.put(BrokerChange)
  }
}

class ControllerEventManager(controllerId: Int,
                             processor: ControllerEventProcessor,
                             time: Time,
                             rateAndTimeMetrics: Map[ControllerState, KafkaTimer]) extends KafkaMetricsGroup {
  // events generated during KafkaController initialization and by ZooKeeper watchers are all put onto this queue
  def put(event: ControllerEvent): QueuedEvent = inLock(putLock) {
    val queuedEvent = new QueuedEvent(event, time.milliseconds())
    queue.put(queuedEvent)
    queuedEvent
  }

  class ControllerEventThread(name: String) extends ShutdownableThread(name = name, isInterruptible = false) {
    logIdent = s"[ControllerEventThread controllerId=$controllerId] "

    override def doWork(): Unit = {
      val dequeued = queue.take()
      dequeued.event match {
        case ShutdownEventThread => // The shutting down of the thread has been initiated at this point. Ignore this event.
        case controllerEvent =>
          _state = controllerEvent.state

          eventQueueTimeHist.update(time.milliseconds() - dequeued.enqueueTimeMs)

          try {
            // process the event that was taken off the queue
            def process(): Unit = dequeued.process(processor)

            rateAndTimeMetrics.get(state) match {
              case Some(timer) => timer.time { process() }
              case None => process()
            }
          } catch {
            case e: Throwable => error(s"Uncaught error processing event $controllerEvent", e)
          }

          _state = ControllerState.Idle
      }
    }
  }

}

The processor then handles the event:


  override def process(event: ControllerEvent): Unit = {
    try {
      event match {
        case event: MockEvent =>
          ............
        case BrokerChange =>
          processBrokerChange()
        case BrokerModifications(brokerId) =>
          processBrokerModification(brokerId)
        case ControllerChange =>
          processControllerChange()
        case Reelect =>
          processReelect()
        case RegisterBrokerAndReelect =>
          processRegisterBrokerAndReelect()
        case Expire =>
          processExpire()
        // handle topic-change events
        case TopicChange =>
          processTopicChange()
        ...........
        case Startup =>
          processStartup()
      }
    } catch {
        ..........
    } finally {
      updateMetrics()
    }
  }

The actual topic-change handling logic:


  private def processTopicChange(): Unit = {
    if (!isActive) return
    // read all topics currently registered in ZooKeeper
    val topics = zkClient.getAllTopicsInCluster(true)
    // topics that were added
    val newTopics = topics -- controllerContext.allTopics
    // topics that were deleted
    val deletedTopics = controllerContext.allTopics -- topics
    controllerContext.allTopics = topics
    // register PartitionModificationsHandlers for the new topics (adds them to zNodeChangeHandlers and registers the watchers)
    registerPartitionModificationsHandlers(newTopics.toSeq)
    // read the full replica assignment for the new topics
    val addedPartitionReplicaAssignment = zkClient.getFullReplicaAssignmentForTopics(newTopics)
    deletedTopics.foreach(controllerContext.removeTopic)
    addedPartitionReplicaAssignment.foreach {
      case (topicAndPartition, newReplicaAssignment) => controllerContext.updatePartitionFullReplicaAssignment(topicAndPartition, newReplicaAssignment)
    }
    info(s"New topics: [$newTopics], deleted topics: [$deletedTopics], new partition replica assignment " +
      s"[$addedPartitionReplicaAssignment]")
    if (addedPartitionReplicaAssignment.nonEmpty)
      // handle the newly created partitions
      onNewPartitionCreation(addedPartitionReplicaAssignment.keySet)
  }

The partition and replica state machines then run handleStateChanges:

  private def onNewPartitionCreation(newPartitions: Set[TopicPartition]): Unit = {
    info(s"New partition creation callback for ${newPartitions.mkString(",")}")
    // move the new partitions to the NewPartition state
    partitionStateMachine.handleStateChanges(newPartitions.toSeq, NewPartition)
    // move the new replicas to the NewReplica state
    replicaStateMachine.handleStateChanges(controllerContext.replicasForPartition(newPartitions).toSeq, NewReplica)
    // move the new partitions to the OnlinePartition state (this elects their leaders)
    partitionStateMachine.handleStateChanges(
      newPartitions.toSeq,
      OnlinePartition,
      Some(OfflinePartitionLeaderElectionStrategy(false))
    )
    // move the new replicas to the OnlineReplica state
    replicaStateMachine.handleStateChanges(controllerContext.replicasForPartition(newPartitions).toSeq, OnlineReplica)
  }

Moving the new partitions to NewPartition and then to OnlinePartition:

private def doHandleStateChanges(
    partitions: Seq[TopicPartition],
    targetState: PartitionState,
    partitionLeaderElectionStrategyOpt: Option[PartitionLeaderElectionStrategy]
  ): Map[TopicPartition, Either[Throwable, LeaderAndIsr]] = {
    val stateChangeLog = stateChangeLogger.withControllerEpoch(controllerContext.epoch)
    partitions.foreach(partition => controllerContext.putPartitionStateIfNotExists(partition, NonExistentPartition))
    val (validPartitions, invalidPartitions) = controllerContext.checkValidPartitionStateChange(partitions, targetState)
    invalidPartitions.foreach(partition => logInvalidTransition(partition, targetState))

    targetState match {
      // handle the NewPartition target state
      case NewPartition =>
        validPartitions.foreach { partition =>
          stateChangeLog.trace(s"Changed partition $partition state from ${partitionState(partition)} to $targetState with " +
            s"assigned replicas ${controllerContext.partitionReplicaAssignment(partition).mkString(",")}")
          controllerContext.putPartitionState(partition, NewPartition)
        }
        Map.empty
      // handle the OnlinePartition target state
      case OnlinePartition =>
        val uninitializedPartitions = validPartitions.filter(partition => partitionState(partition) == NewPartition)
        val partitionsToElectLeader = validPartitions.filter(partition => partitionState(partition) == OfflinePartition || partitionState(partition) == OnlinePartition)
        if (uninitializedPartitions.nonEmpty) {
          // initialize leader and ISR for the uninitialized (new) partitions
          val successfulInitializations = initializeLeaderAndIsrForPartitions(uninitializedPartitions)
          successfulInitializations.foreach { partition =>
            stateChangeLog.trace(s"Changed partition $partition from ${partitionState(partition)} to $targetState with state " +
              s"${controllerContext.partitionLeadershipInfo(partition).leaderAndIsr}")
            controllerContext.putPartitionState(partition, OnlinePartition)
          }
        }
        if (partitionsToElectLeader.nonEmpty) {
          val electionResults = electLeaderForPartitions(
            partitionsToElectLeader,
            partitionLeaderElectionStrategyOpt.getOrElse(
              throw new IllegalArgumentException("Election strategy is a required field when the target state is OnlinePartition")
            )
          )

          electionResults.foreach {
            case (partition, Right(leaderAndIsr)) =>
              stateChangeLog.trace(
                s"Changed partition $partition from ${partitionState(partition)} to $targetState with state $leaderAndIsr"
              )
              controllerContext.putPartitionState(partition, OnlinePartition)
            case (_, Left(_)) => // Ignore; no need to update partition state on election error
          }

          electionResults
        } else {
          Map.empty
        }
      case OfflinePartition =>
        validPartitions.foreach { partition =>
          stateChangeLog.trace(s"Changed partition $partition state from ${partitionState(partition)} to $targetState")
          controllerContext.putPartitionState(partition, OfflinePartition)
        }
        Map.empty
      case NonExistentPartition =>
        validPartitions.foreach { partition =>
          stateChangeLog.trace(s"Changed partition $partition state from ${partitionState(partition)} to $targetState")
          controllerContext.putPartitionState(partition, NonExistentPartition)
        }
        Map.empty
    }
  }

In initializeLeaderAndIsrForPartitions, the first broker in the replica sequence is chosen as the leader:

val leaderIsrAndControllerEpochs = partitionsWithLiveReplicas.map { case (partition, liveReplicas) =>
      // the first live replica becomes the leader; the live replicas form the ISR
      val leaderAndIsr = LeaderAndIsr(liveReplicas.head, liveReplicas.toList)
      val leaderIsrAndControllerEpoch = LeaderIsrAndControllerEpoch(leaderAndIsr, controllerContext.epoch)
      partition -> leaderIsrAndControllerEpoch
    }.toMap
    val createResponses = try {
      zkClient.createTopicPartitionStatesRaw(leaderIsrAndControllerEpochs, controllerContext.epochZkVersion)
    }

........

    // record the leadership info in the controllerContext
    controllerContext.partitionLeadershipInfo.put(partition, leaderIsrAndControllerEpoch)
    // queue a LeaderAndIsr request for the brokers in the ISR (goes into leaderAndIsrRequestMap)
    controllerBrokerRequestBatch.addLeaderAndIsrRequestForBrokers(leaderIsrAndControllerEpoch.leaderAndIsr.isr,
      partition, leaderIsrAndControllerEpoch, controllerContext.partitionFullReplicaAssignment(partition), isNew = true)

The topic, partition and leader/ISR information is put into leaderAndIsrRequestMap, keyed by broker id so the batch can later be looked up per broker:


  def addLeaderAndIsrRequestForBrokers(brokerIds: Seq[Int],
                                       topicPartition: TopicPartition,
                                       leaderIsrAndControllerEpoch: LeaderIsrAndControllerEpoch,
                                       replicaAssignment: ReplicaAssignment,
                                       isNew: Boolean): Unit = {

    brokerIds.filter(_ >= 0).foreach { brokerId =>
      val result = leaderAndIsrRequestMap.getOrElseUpdate(brokerId, mutable.Map.empty)
      val alreadyNew = result.get(topicPartition).exists(_.isNew)
      val leaderAndIsr = leaderIsrAndControllerEpoch.leaderAndIsr
      result.put(topicPartition, new LeaderAndIsrPartitionState()
        .setTopicName(topicPartition.topic)
        .setPartitionIndex(topicPartition.partition)
        .setControllerEpoch(leaderIsrAndControllerEpoch.controllerEpoch)
        .setLeader(leaderAndIsr.leader)
        .setLeaderEpoch(leaderAndIsr.leaderEpoch)
        .setIsr(leaderAndIsr.isr.map(Integer.valueOf).asJava)
        .setZkVersion(leaderAndIsr.zkVersion)
        .setReplicas(replicaAssignment.replicas.map(Integer.valueOf).asJava)
        .setAddingReplicas(replicaAssignment.addingReplicas.map(Integer.valueOf).asJava)
        .setRemovingReplicas(replicaAssignment.removingReplicas.map(Integer.valueOf).asJava)
        .setIsNew(isNew || alreadyNew))
    }

Finally, sendRequestsToBrokers sends the batched requests to the brokers that need to be notified:


  def sendRequestsToBrokers(controllerEpoch: Int): Unit = {
    try {
      val stateChangeLog = stateChangeLogger.withControllerEpoch(controllerEpoch)
      sendLeaderAndIsrRequest(controllerEpoch, stateChangeLog)
      sendUpdateMetadataRequests(controllerEpoch, stateChangeLog)
      sendStopReplicaRequests(controllerEpoch)
    } catch {
      case e: Throwable =>
        // error handling elided in this excerpt
        throw e
    }
  }

4.3 How a Broker Becomes Leader or Follower

When a broker receives the Controller's LeaderAndIsrRequest, it is handled by KafkaApis.handle:

case ApiKeys.LEADER_AND_ISR => handleLeaderAndIsrRequest(request)

The entry point for the broker to become the leader or a follower for a replica is replicaManager.becomeLeaderOrFollower.

Whether the broker becomes the leader for a partition depends on whether its broker id equals the broker id assigned as that partition's leader: if they match it becomes the leader, otherwise a follower.


        val highWatermarkCheckpoints = new LazyOffsetCheckpoints(this.highWatermarkCheckpoints)
        val partitionsBecomeLeader = if (partitionsTobeLeader.nonEmpty)
          // become leader for these partitions
          makeLeaders(controllerId, controllerEpoch, partitionsTobeLeader, correlationId, responseMap,
            highWatermarkCheckpoints)
        else
          Set.empty[Partition]
        val partitionsBecomeFollower = if (partitionsToBeFollower.nonEmpty)
        // become follower for these partitions
          makeFollowers(controllerId, controllerEpoch, partitionsToBeFollower, correlationId, responseMap,
            highWatermarkCheckpoints)
        else
          Set.empty[Partition]
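
The split into partitionsTobeLeader and partitionsToBeFollower happens a few lines earlier in becomeLeaderOrFollower and is simply a comparison against the local broker id; roughly:

        // roughly: the partitions whose assigned leader is this broker become leaders here,
        // every other partition in the request becomes a follower
        val partitionsTobeLeader = partitionStates.filter { case (_, partitionState) =>
          partitionState.leader == localBrokerId
        }
        val partitionsToBeFollower = partitionStates.filter { case (k, _) =>
          !partitionsTobeLeader.contains(k)
        }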

To make the current broker the leader for the given partitions, the following is done:

* 1. Stop the fetchers for these partitions.

* 2. Update the cached partition metadata to mark this broker as the leader.

* 3. Add the partitions to the leader-partition set.

 // First stop fetchers for all the partitions
      replicaFetcherManager.removeFetcherForPartitions(partitionStates.keySet.map(_.topicPartition))
      // Update the partition information to be the leader
      partitionStates.foreach { case (partition, partitionState) =>
        try {
          if (partition.makeLeader(controllerId, partitionState, correlationId, highWatermarkCheckpoints)) {

To make the current broker a follower for the given partitions, the following is done:

* 1. Remove the partitions from the leader-partition set.

* 2. Mark the replicas as followers so that producers can no longer write to them.

* 3. Stop all fetchers for these partitions so that the replica fetcher threads no longer write to them.

* 4. Truncate the logs of these partitions and checkpoint the offsets.

* 5. If the broker is not going down, add replica fetcher threads that fetch from the new leaders.

metadataCache.getAliveBrokers.find(_.id == newLeaderBrokerId) match {
            // Only change partition state when the leader is available
            case Some(_) =>
              if (partition.makeFollower(controllerId, partitionState, correlationId, highWatermarkCheckpoints))

.....
      replicaFetcherManager.removeFetcherForPartitions(partitionsToMakeFollower.map(_.topicPartition))

      partitionsToMakeFollower.foreach { partition =>
        completeDelayedFetchOrProduceRequests(partition.topicPartition)
      }



            replicaFetcherManager.addFetcherForPartitions(partitionsToMakeFollowerWithLeaderAndOffset)

4.4 Summary

This article walked through the source code of topic creation. The overall flow is roughly:

1. The admin client writes the new topic to ZooKeeper.

2. The controller's watcher observes the ZooKeeper change, elects the partition leaders, and notifies the brokers.

3. Each broker, depending on the election result, becomes either the leader or a follower for its replicas and acts accordingly.

 

 
