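An analysis of the Kafka broker shutdown process
A controlled shutdown stops a given broker by sending a command to the controller.
The implementation is rather odd: the controller does not expose any socket or HTTP interface for this. Instead it registers a JMX bean, and the command-line tool calls the controller's shutdownBroker operation through a JMX invoke: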

    // Connect to the controller's JMX endpoint
    val jmxUrl = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi".format(controllerHost, controllerJmxPort))
    info("Connecting to jmx url " + jmxUrl)
    val jmxc = JMXConnectorFactory.connect(jmxUrl, null)
    val mbsc = jmxc.getMBeanServerConnection
    // Invoke shutdownBroker on the controller MBean; the result is the set of
    // partitions that the broker still leads
    val leaderPartitionsRemaining = mbsc.invoke(new ObjectName(KafkaController.MBeanName),
                                                "shutdownBroker",
                                                Array(params.brokerId),
                                                Array(classOf[Int].getName)).asInstanceOf[Set[TopicAndPartition]]

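The shutdownBroker logic
- Check that the controller handling the request is still active; if not, a ControllerMovedException is thrown;
- Check that the broker is still alive (otherwise a BrokerNotAvailableException is thrown); if it is, add it to the shuttingDownBrokerIds list in controllerContext;
- Get all partitions hosted on this broker;
- Process every partition. There are two cases:
(1) The partition's leader is this broker: call partitionStateMachine.handleStateChanges to move leadership elsewhere;
(2) The partition's leader is not this broker (the broker is only a follower): send this broker a StopReplicaRequest, then call replicaStateMachine.handleStateChanges.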

    def shutdownBroker(id: Int): Set[TopicAndPartition] = {
      if (!isActive()) {
        throw new ControllerMovedException("Controller moved to another broker. Aborting controlled shutdown")
      }

      controllerContext.brokerShutdownLock synchronized {
        info("Shutting down broker " + id)

        inLock(controllerContext.controllerLock) {
          if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
            throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))

          controllerContext.shuttingDownBrokerIds.add(id)
          debug("All shutting down brokers: " + controllerContext.shuttingDownBrokerIds.mkString(","))
          debug("Live brokers: " + controllerContext.liveBrokerIds.mkString(","))
        }

        // Get the replication factor of every partition hosted on this broker
        val allPartitionsAndReplicationFactorOnBroker: Set[(TopicAndPartition, Int)] =
          inLock(controllerContext.controllerLock) {
            controllerContext.partitionsOnBroker(id)
              .map(topicAndPartition => (topicAndPartition, controllerContext.partitionReplicaAssignment(topicAndPartition).size))
          }

        allPartitionsAndReplicationFactorOnBroker.foreach {
          case (topicAndPartition, replicationFactor) =>
            // Move leadership serially to relinquish lock.
            inLock(controllerContext.controllerLock) {
              controllerContext.partitionLeadershipInfo.get(topicAndPartition).foreach { currLeaderIsrAndControllerEpoch =>
                if (currLeaderIsrAndControllerEpoch.leaderAndIsr.leader == id) {
                  // If the broker leads the topic partition, transition the leader and update isr. Updates zk and
                  // notifies all affected brokers
                  partitionStateMachine.handleStateChanges(Set(topicAndPartition), OnlinePartition,
                    controlledShutdownPartitionLeaderSelector)
                } else {
                  // Stop the replica first. The state change below initiates ZK changes which should take some time
                  // before which the stop replica request should be completed (in most cases)
                  // all requests are send in batch group by broker
                  brokerRequestBatch.newBatch()
                  brokerRequestBatch.addStopReplicaRequestForBrokers(Seq(id), topicAndPartition.topic,
                    topicAndPartition.partition, deletePartition = false)
                  brokerRequestBatch.sendRequestsToBrokers(epoch, controllerContext.correlationId.getAndIncrement)

                  // If the broker is a follower, updates the isr in ZK and notifies the current leader
                  replicaStateMachine.handleStateChanges(Set(PartitionAndReplica(topicAndPartition.topic,
                    topicAndPartition.partition, id)), OfflineReplica)
                }
              }
            }
        }

        // Replicated partitions that this broker still leads after the pass above
        def replicatedPartitionsBrokerLeads() = inLock(controllerContext.controllerLock) {
          trace("All leaders = " + controllerContext.partitionLeadershipInfo.mkString(","))
          controllerContext.partitionLeadershipInfo.filter {
            case (topicAndPartition, leaderIsrAndControllerEpoch) =>
              leaderIsrAndControllerEpoch.leaderAndIsr.leader == id &&
                controllerContext.partitionReplicaAssignment(topicAndPartition).size > 1
          }.map(_._1)
        }
        replicatedPartitionsBrokerLeads().toSet
      }
    }

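How partitionStateMachine.handleStateChanges processes the request
- Read the partition's controller epoch from ZooKeeper, to guard against the case where the controller has changed and another controller has already updated the partition information;
- Read the partition's state from ZooKeeper, remove every broker that is shutting down from the current ISR, pick the first remaining broker as the leader, and return the partition's new state (leader, ISR, live replicas);
- Write the new partition state back to ZooKeeper;
- Update the partition information cached in controllerContext;
- Update the partition's state in partitionStateMachine (to OnlinePartition);
- Send a new LeaderAndIsrRequest to the partition's currently available replicas (telling them who the new leader is), and send an UpdateMetadataRequest to all brokers. (If this step fails, the state on other brokers may become inconsistent, and another state change has to be triggered to repair it; this is still marked as a TODO.)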
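
The leader-selection step above can be sketched in a few lines of Scala. This is only an illustration of the rule described in the list, not Kafka's actual selector: `PartitionState` and `selectLeader` are hypothetical names, the sketch returns `None` where a real implementation would probably fail the state change, and bumping the leader epoch on every change is an assumption.

```scala
// Hypothetical, simplified stand-in for the leader/ISR state kept in ZooKeeper.
final case class PartitionState(leader: Int, leaderEpoch: Int, isr: List[Int])

object ControlledShutdownSketch {
  /** Drops shutting-down brokers from the ISR and picks the first survivor as leader.
    * Returns the new state plus the live replicas to notify, or None if the ISR
    * has no surviving member. */
  def selectLeader(current: PartitionState,
                   assignedReplicas: Seq[Int],
                   liveOrShuttingDown: Set[Int],
                   shuttingDown: Set[Int]): Option[(PartitionState, Seq[Int])] = {
    // Remove every broker that is shutting down from the current ISR.
    val newIsr = current.isr.filterNot(shuttingDown.contains)
    // Replicas that are still reachable and should receive the new leader info.
    val liveReplicas = assignedReplicas.filter(liveOrShuttingDown.contains)
    // The first remaining ISR member becomes the leader (epoch bump is an assumption).
    newIsr.headOption.map { newLeader =>
      (PartitionState(newLeader, current.leaderEpoch + 1, newIsr), liveReplicas)
    }
  }
}
```

For example, with an ISR of 1, 2, 3 led by broker 1, shutting down broker 1 leaves the ISR as 2, 3 and makes broker 2 the leader.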
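How replicaStateMachine.handleStateChanges processes the request
- Send a StopReplicaRequest to this broker;
- Call controller.removeReplicaFromIsr: read the partition's current state from ZooKeeper, remove this broker from the ISR, and write the result back to ZooKeeper. (If this broker is the leader, the new leader is set to -1, meaning "no leader". Why is another broker in the ISR not chosen as the leader instead?)
- Send a LeaderAndIsrRequest to the partition's leader, and send an UpdateMetadataRequest to all brokers;
- Update the replica's state in ReplicaStateMachine.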
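
The ISR update performed by removeReplicaFromIsr can be sketched as a pure function. Again this is a minimal illustration under stated assumptions rather than the controller's real code: `IsrState` and `RemoveReplicaSketch` are hypothetical names, the ZooKeeper read and conditional write are left out, and the leader epoch bump is an assumption.

```scala
// Hypothetical, simplified stand-in for the leader/ISR state kept in ZooKeeper.
final case class IsrState(leader: Int, leaderEpoch: Int, isr: List[Int])

object RemoveReplicaSketch {
  // Sentinel meaning "this partition currently has no leader".
  val NoLeader: Int = -1

  /** Removes replicaId from the ISR. If that replica was the leader, the leader
    * becomes NoLeader instead of being re-elected from the remaining ISR. */
  def removeReplicaFromIsr(current: IsrState, replicaId: Int): IsrState =
    if (!current.isr.contains(replicaId)) {
      current // nothing to remove, state is unchanged
    } else {
      val newLeader = if (current.leader == replicaId) NoLeader else current.leader
      IsrState(newLeader, current.leaderEpoch + 1, current.isr.filterNot(_ == replicaId))
    }
}
```

Removing follower 3 from an ISR of 1, 2, 3 led by broker 1 keeps broker 1 as leader, while removing broker 1 itself yields leader -1, which is the behaviour questioned above.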