Analysis of the Kafka broker shutdown process

幽灵之使

Article link: https://blog.csdn.net/lizhitao/article/details/42266065

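This article walks through the Kafka broker shutdown flow: how it is driven through a JMX interface, and the logical steps involved, such as checking that the controller and the target broker are alive, collecting all partitions on the broker, and handling the shutdown for partitions where the broker is leader and where it is a follower.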

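Controlled shutdown stops a given broker by sending a command to the controller.

The implementation is rather odd: the controller does not expose any socket or HTTP interface for this. Instead it provides a JMX bean, and the command-line tool calls the controller's shutdownBroker operation through a JMX invoke.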

    val jmxUrl = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi".format(controllerHost, controllerJmxPort))
    info("Connecting to jmx url " + jmxUrl)
    val jmxc = JMXConnectorFactory.connect(jmxUrl, null)
    val mbsc = jmxc.getMBeanServerConnection
    val leaderPartitionsRemaining = mbsc.invoke(new ObjectName(KafkaController.MBeanName),
      "shutdownBroker",
      Array(params.brokerId),
      Array(classOf[Int].getName)).asInstanceOf[Set[TopicAndPartition]]
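To look at the mechanism in isolation, here is a minimal sketch of the same JMX pattern run against the in-process platform MBean server instead of a remote controller. The names `DemoControllerMBean`, `DemoController` and the object name `demo:type=Controller` are hypothetical and are not Kafka's; the returned number is made up for illustration. Only the `javax.management` calls are real.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

object JmxInvokeSketch {
  // Hypothetical management interface. A standard MBean interface must be
  // named <ImplementationClass>MBean, which is why the names are paired.
  trait DemoControllerMBean {
    def shutdownBroker(id: Int): Int
  }

  // Hypothetical implementation: pretends the broker still leads id * 2 partitions.
  class DemoController extends DemoControllerMBean {
    override def shutdownBroker(id: Int): Int = id * 2
  }

  def invokeShutdown(brokerId: Int): Int = {
    val mbs = ManagementFactory.getPlatformMBeanServer
    val name = new ObjectName("demo:type=Controller")
    if (!mbs.isRegistered(name)) mbs.registerMBean(new DemoController, name)
    // invoke takes the boxed arguments plus the JVM type names of the signature.
    val result = mbs.invoke(name, "shutdownBroker",
      Array[AnyRef](Int.box(brokerId)), Array(classOf[Int].getName))
    result.asInstanceOf[Int]
  }
}
```

The caller only needs the object name, the operation name and the parameter type names, which is why the command-line tool can drive the controller without any dedicated network protocol.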

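The shutdown broker logic

1. Check that the controller handling the request is still active.
2. Check whether this broker is still alive; if it is, add it to the shuttingDownBrokerIds list in controllerContext.
3. Get all partitions on this broker.
4. Process every partition. There are two cases:

(1) The partition's leader is this broker: call partitionStateMachine.handleStateChanges.

(2) The partition's leader is not this broker: send it a stopReplicaRequest and call replicaStateMachine.handleStateChanges.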

    def shutdownBroker(id: Int): Set[TopicAndPartition] = {
      if (!isActive()) {
        throw new ControllerMovedException("Controller moved to another broker. Aborting controlled shutdown")
      }

      controllerContext.brokerShutdownLock synchronized {
        info("Shutting down broker " + id)

        inLock(controllerContext.controllerLock) {
          if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
            throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
          controllerContext.shuttingDownBrokerIds.add(id)
          debug("All shutting down brokers: " + controllerContext.shuttingDownBrokerIds.mkString(","))
          debug("Live brokers: " + controllerContext.liveBrokerIds.mkString(","))
        }

        // Get the replication factor of every partition on this broker
        val allPartitionsAndReplicationFactorOnBroker: Set[(TopicAndPartition, Int)] =
          inLock(controllerContext.controllerLock) {
            controllerContext.partitionsOnBroker(id)
              .map(topicAndPartition => (topicAndPartition, controllerContext.partitionReplicaAssignment(topicAndPartition).size))
          }

        allPartitionsAndReplicationFactorOnBroker.foreach {
          case (topicAndPartition, replicationFactor) =>
            // Move leadership serially to relinquish lock.
            inLock(controllerContext.controllerLock) {
              controllerContext.partitionLeadershipInfo.get(topicAndPartition).foreach { currLeaderIsrAndControllerEpoch =>
                if (currLeaderIsrAndControllerEpoch.leaderAndIsr.leader == id) {
                  // If the broker leads the topic partition, transition the leader and update isr. Updates zk and
                  // notifies all affected brokers
                  partitionStateMachine.handleStateChanges(Set(topicAndPartition), OnlinePartition,
                    controlledShutdownPartitionLeaderSelector)
                }
                else {
                  // Stop the replica first. The state change below initiates ZK changes which should take some time
                  // before which the stop replica request should be completed (in most cases)
                  // All requests are sent in batches, grouped by broker
                  brokerRequestBatch.newBatch()
                  brokerRequestBatch.addStopReplicaRequestForBrokers(Seq(id), topicAndPartition.topic,
                    topicAndPartition.partition, deletePartition = false)
                  brokerRequestBatch.sendRequestsToBrokers(epoch, controllerContext.correlationId.getAndIncrement)
                  // If the broker is a follower, updates the isr in ZK and notifies the current leader
                  replicaStateMachine.handleStateChanges(Set(PartitionAndReplica(topicAndPartition.topic,
                    topicAndPartition.partition, id)), OfflineReplica)
                }
              }
            }
        }

        def replicatedPartitionsBrokerLeads() = inLock(controllerContext.controllerLock) {
          trace("All leaders = " + controllerContext.partitionLeadershipInfo.mkString(","))
          controllerContext.partitionLeadershipInfo.filter {
            case (topicAndPartition, leaderIsrAndControllerEpoch) =>
              leaderIsrAndControllerEpoch.leaderAndIsr.leader == id && controllerContext.partitionReplicaAssignment(topicAndPartition).size > 1
          }.map(_._1)
        }

        replicatedPartitionsBrokerLeads().toSet
      }
    }
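The branching in shutdownBroker can be modelled without any of the controller machinery. The sketch below is a simplified illustration under stated assumptions: `TopicAndPartition` here is a stand-in case class, `plan` and the `Action` values are hypothetical names, and the plain `Map` arguments replace the state that the real code reads from controllerContext under a lock.

```scala
object ShutdownPlanSketch {
  // Hypothetical, simplified stand-in for Kafka's TopicAndPartition.
  final case class TopicAndPartition(topic: String, partition: Int)

  sealed trait Action
  case object MoveLeadership extends Action // the broker leads the partition
  case object StopReplica extends Action    // the broker is only a follower

  /** For each partition hosted on `id`, pick the branch the controller would take. */
  def plan(id: Int,
           assignment: Map[TopicAndPartition, Seq[Int]],
           leaders: Map[TopicAndPartition, Int]): Map[TopicAndPartition, Action] =
    assignment.collect {
      case (tp, replicas) if replicas.contains(id) && leaders.contains(tp) =>
        val action: Action = if (leaders(tp) == id) MoveLeadership else StopReplica
        tp -> action
    }

  /** Partitions still led by `id` that have more than one replica (the return value). */
  def replicatedPartitionsBrokerLeads(id: Int,
                                      assignment: Map[TopicAndPartition, Seq[Int]],
                                      leaders: Map[TopicAndPartition, Int]): Set[TopicAndPartition] =
    leaders.collect {
      case (tp, leader) if leader == id && assignment.getOrElse(tp, Seq.empty).size > 1 => tp
    }.toSet
}
```

The return value is what the JMX caller receives as leaderPartitionsRemaining: the replicated partitions that the broker still leads after the pass.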

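How partitionStateMachine.handleStateChanges works

1. Read the partition's controller epoch from ZK. This guards against the case where the controller has changed and another controller has already updated the partition information.
2. Read the partition information from ZK, remove the brokers that are shutting down from the current ISR list, pick the first remaining broker as leader, and return the partition's latest state (leader, ISR, live replicas).
3. Write the new partition information to ZK.
4. Update the partition information cached in controllerContext.
5. Update the partition's state in partitionStateMachine to the online state (OnlinePartition).
6. Send a new leaderAndIsrRequest to the replicas of this partition that are currently available, telling them who the new leader is, and send an updateMetaRequest to all brokers. If this step fails, the state on other brokers may become inconsistent and another state change has to be triggered to fix it; this is still marked as a TODO.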
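Steps 1 and 2 above can be sketched as pure functions. This is an illustration, not the controller's actual selector: `LeaderAndIsr` is a simplified stand-in case class, `selectLeader` and `mayUpdate` are hypothetical names, and bumping the leader epoch by one on a leader change is an assumption of this sketch.

```scala
object ControlledShutdownSelectorSketch {
  // Hypothetical, simplified stand-in for the partition state kept in ZK.
  final case class LeaderAndIsr(leader: Int, leaderEpoch: Int, isr: List[Int])

  /** Step 1: refuse to write if ZK already shows a newer controller epoch. */
  def mayUpdate(controllerEpochInZk: Int, myEpoch: Int): Boolean =
    controllerEpochInZk <= myEpoch

  /**
   * Step 2: drop shutting-down brokers from the ISR and promote the first survivor.
   * Returns None when no ISR member remains, so no new leader can be chosen.
   */
  def selectLeader(current: LeaderAndIsr, shuttingDown: Set[Int]): Option[LeaderAndIsr] = {
    val newIsr = current.isr.filterNot(shuttingDown.contains)
    // Assumption: the leader epoch increases by one when leadership moves.
    newIsr.headOption.map(newLeader => LeaderAndIsr(newLeader, current.leaderEpoch + 1, newIsr))
  }
}
```

Taking the head of the filtered ISR keeps the choice deterministic and guarantees the new leader is an in-sync replica that is not itself shutting down.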

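How replicaStateMachine.handleStateChanges works

1. Send a stopReplicaRequest to this broker.
2. Call controller.removeReplicaFromIsr, which reads the partition's current state from ZK, removes this broker from the ISR, and writes the result back to ZK. (If this broker is the leader, the new leader is set to -1, meaning there is no leader. Why is another broker in the ISR not chosen as leader?)
3. Send a leaderAndIsrRequest to the partition's leader, and send an updateMetaRequest to all brokers.
4. Update the replica's state in ReplicaStateMachine.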
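The ISR update described in step 2 can be modelled as follows. This is a minimal sketch assuming a simplified `LeaderAndIsr` case class and a `NoLeader` constant of -1, both defined here for illustration; it leaves out the ZK read, the conditional write and the epoch handling.

```scala
object RemoveReplicaFromIsrSketch {
  // Hypothetical, simplified stand-in for the partition state kept in ZK.
  final case class LeaderAndIsr(leader: Int, isr: List[Int])

  // -1 marks a partition without a leader, as described in the text.
  val NoLeader: Int = -1

  def removeReplicaFromIsr(state: LeaderAndIsr, replicaId: Int): LeaderAndIsr =
    if (!state.isr.contains(replicaId)) state // nothing to do, the replica is not in the ISR
    else {
      // No re-election happens here: a removed leader is only replaced by NoLeader.
      val newLeader = if (state.leader == replicaId) NoLeader else state.leader
      LeaderAndIsr(newLeader, state.isr.filterNot(_ == replicaId))
    }
}
```

For example, removing follower 2 from `LeaderAndIsr(1, List(1, 2))` keeps leader 1, while removing broker 1 leaves the partition with leader -1 and ISR `List(2)`.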

Quoted from 陈尚安's wiki.