[Kafka source] ReassignPartitionsCommand: source-code analysis

  1. Check that every broker id referenced in the configuration actually exists.

  2. If a replica reassignment is already in progress (i.e. the znode /admin/reassign_partitions exists), only check whether the throttles need updating: if --throttle or --replica-alter-log-dirs-throttle was supplied, apply the new limits; then stop without going any further.

  3. If no reassignment is currently running (the znode /admin/reassign_partitions does not exist), start the reassignment task.
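The dispatch above can be sketched as follows. This is a minimal Python sketch: the in-memory `zk` dict and the helper name `execute_assignment` are stand-ins for the real ZooKeeper client and the command's entry point.

```python
# Sketch of the top-level dispatch: the presence of the reassignment znode
# decides whether we start a new task or only update throttles.
REASSIGN_ZNODE = "/admin/reassign_partitions"

def execute_assignment(zk, throttle=None, alter_dir_throttle=None):
    """Return which action the command takes for a given cluster state."""
    if REASSIGN_ZNODE in zk:
        # A reassignment is already running: at most update the throttles.
        if throttle is not None or alter_dir_throttle is not None:
            return "update-throttles"
        return "noop"
    # No reassignment in flight: start one by creating the znode.
    zk[REASSIGN_ZNODE] = "...reassignment json..."
    return "start-reassignment"
```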

2.2.1 A task already exists: try to apply throttles

If the znode /admin/reassign_partitions exists in ZooKeeper, a reassignment is already running and the current operation goes no further. If either of the parameters

--throttle

--replica-alter-log-dirs-throttle

was supplied, the throttles are applied.

These throttle the ongoing replica movement. Note that while this command can change the throttles, it may not change all of the limits originally set if some brokers have already finished rebalancing. The throttles therefore have to be removed afterwards via --verify.

maybeLimit

```scala
def maybeLimit(throttle: Throttle): Unit = {
  if (throttle.interBrokerLimit >= 0 || throttle.replicaAlterLogDirsLimit >= 0) {
    // Brokers in the current assignment
    val existingBrokers = existingAssignment().values.flatten.toSeq
    // Brokers in the proposed assignment
    val proposedBrokers = proposedPartitionAssignment.values.flatten.toSeq ++ proposedReplicaAssignment.keys.toSeq.map(_.brokerId())
    // Union of the two, de-duplicated
    val brokers = (existingBrokers ++ proposedBrokers).distinct

    // For every broker involved, write the throttle configs to the zk node /config/brokers/{brokerId}
    for (id <- brokers) {
      // Fetch the broker's current config from /config/brokers/{brokerId}
      val configs = adminZkClient.fetchEntityConfig(ConfigType.Broker, id.toString)
      if (throttle.interBrokerLimit >= 0) {
        configs.put(DynamicConfig.Broker.LeaderReplicationThrottledRateProp, throttle.interBrokerLimit.toString)
        configs.put(DynamicConfig.Broker.FollowerReplicationThrottledRateProp, throttle.interBrokerLimit.toString)
      }
      if (throttle.replicaAlterLogDirsLimit >= 0)
        configs.put(DynamicConfig.Broker.ReplicaAlterLogDirsIoMaxBytesPerSecondProp, throttle.replicaAlterLogDirsLimit.toString)

      adminZkClient.changeBrokerConfig(Seq(id), configs)
    }
  }
}
```

The /config/brokers/{brokerId} znode holds broker-side dynamic configuration: changes take effect immediately, without restarting the broker.

  1. If the parameter --throttle was passed: fetch each broker's configuration from the znode /config/brokers/{brokerId}, add the following two properties, and write the result back to /config/brokers/{brokerId}:

leader.replication.throttled.rate: caps the rate at which leader replicas serve FETCH requests

follower.replication.throttled.rate: caps the rate at which follower replicas issue FETCH requests

  2. If the parameter --replica-alter-log-dirs-throttle was passed, this property is written as well:

replica.alter.log.dirs.io.max.bytes.per.second: caps the bandwidth used when a broker copies data between its own log directories

For example, the data after the write:

```json
{"version":1,"config":{"leader.replication.throttled.rate":"1","follower.replication.throttled.rate":"1"}}
```

Note: the throttle configuration is written for every broker involved in the reassignment.
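A minimal Python sketch of what maybeLimit computes. Plain dicts stand in for the ZooKeeper-backed broker configs; the property names are the real dynamic-config keys quoted above, everything else is illustrative.

```python
# Sketch of maybeLimit: compute the set of brokers touched by the move and
# merge the throttle properties into each broker's dynamic config.
LEADER_RATE = "leader.replication.throttled.rate"
FOLLOWER_RATE = "follower.replication.throttled.rate"
ALTER_DIR_RATE = "replica.alter.log.dirs.io.max.bytes.per.second"

def maybe_limit(existing, proposed, broker_configs,
                inter_broker_limit=-1, alter_dir_limit=-1):
    if inter_broker_limit < 0 and alter_dir_limit < 0:
        return
    # Union of brokers in the existing and the proposed assignments.
    brokers = {b for replicas in existing.values() for b in replicas}
    brokers |= {b for replicas in proposed.values() for b in replicas}
    for broker_id in sorted(brokers):
        cfg = broker_configs.setdefault(broker_id, {})
        if inter_broker_limit >= 0:
            cfg[LEADER_RATE] = str(inter_broker_limit)
            cfg[FOLLOWER_RATE] = str(inter_broker_limit)
        if alter_dir_limit >= 0:
            cfg[ALTER_DIR_RATE] = str(alter_dir_limit)
```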

2.2.2 No task in progress: start the reassignment

ReassignPartitionsCommand.reassignPartitions

```scala
def reassignPartitions(throttle: Throttle = NoThrottle, timeoutMs: Long = 10000L): Boolean = {
  // Write the throttle configs first
  maybeThrottle(throttle)
  try {
    // Validate that the partitions exist
    val validPartitions = proposedPartitionAssignment.groupBy(_._1.topic())
      .flatMap { case (topic, topicPartitionReplicas) =>
        validatePartition(zkClient, topic, topicPartitionReplicas)
      }
    if (validPartitions.isEmpty) false
    else {
      if (proposedReplicaAssignment.nonEmpty && adminClientOpt.isEmpty)
        throw new AdminCommandFailedException("bootstrap-server needs to be provided in order to reassign replica to the specified log directory")
      val startTimeMs = System.currentTimeMillis()

      // Send AlterReplicaLogDirsRequest to allow broker to create replica in the right log dir later if the replica has not been created yet.
      if (proposedReplicaAssignment.nonEmpty)
        alterReplicaLogDirsIgnoreReplicaNotAvailable(proposedReplicaAssignment, adminClientOpt.get, timeoutMs)

      // Create reassignment znode so that controller will send LeaderAndIsrRequest to create replica in the broker
      zkClient.createPartitionReassignment(validPartitions.map({ case (key, value) => (new TopicPartition(key.topic, key.partition), value) }).toMap)

      // Send AlterReplicaLogDirsRequest again to make sure broker will start to move replica to the specified log directory.
      // It may take some time for controller to create replica in the broker. Retry if the replica has not been created.
      var remainingTimeMs = startTimeMs + timeoutMs - System.currentTimeMillis()
      val replicasAssignedToFutureDir = mutable.Set.empty[TopicPartitionReplica]
      while (remainingTimeMs > 0 && replicasAssignedToFutureDir.size < proposedReplicaAssignment.size) {
        replicasAssignedToFutureDir ++= alterReplicaLogDirsIgnoreReplicaNotAvailable(
          proposedReplicaAssignment.filter { case (replica, _) => !replicasAssignedToFutureDir.contains(replica) },
          adminClientOpt.get, remainingTimeMs)
        Thread.sleep(100)
        remainingTimeMs = startTimeMs + timeoutMs - System.currentTimeMillis()
      }
      replicasAssignedToFutureDir.size == proposedReplicaAssignment.size
    }
  } catch {
    case _: NodeExistsException =>
      val partitionsBeingReassigned = zkClient.getPartitionReassignment()
      throw new AdminCommandFailedException("Partition reassignment currently in " +
        "progress for %s. Aborting operation".format(partitionsBeingReassigned))
  }
}
```

  1. maybeThrottle(throttle) writes the throttle configuration for the replica movement; this method is only used when the task is initialized.

```scala
private def maybeThrottle(throttle: Throttle): Unit = {
  if (throttle.interBrokerLimit >= 0)
    assignThrottledReplicas(existingAssignment(), proposedPartitionAssignment, adminZkClient)
  maybeLimit(throttle)
  if (throttle.interBrokerLimit >= 0 || throttle.replicaAlterLogDirsLimit >= 0)
    throttle.postUpdateAction()
  if (throttle.interBrokerLimit >= 0)
    println(s"The inter-broker throttle limit was set to ${throttle.interBrokerLimit} B/s")
  if (throttle.replicaAlterLogDirsLimit >= 0)
    println(s"The replica-alter-dir throttle limit was set to ${throttle.replicaAlterLogDirsLimit} B/s")
}
```

1.1 Write the topic-level throttle configs to the znode /config/topics/{topicName}

The computed leader and follower values are written to /config/topics/{topicName}:

leader: the partitions that have newly added replicas; the value is a list of partitionId:replicaId pairs

follower: iterate over the proposed partitions; the throttled replicas are the proposed replicas minus the existing replicas; the value is again a list of partitionId:replicaId pairs

leader.replication.throttled.replicas: the leader value

follower.replication.throttled.replicas: the follower value
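A hedged Python sketch of this derivation, assuming the `partition:replica` pair format described above (`throttled_replica_lists` is a made-up helper name, not Kafka code):

```python
# Derive the leader/follower throttled-replica lists from the existing and
# proposed assignments; both are dicts of partitionId -> list of broker ids.
def throttled_replica_lists(existing, proposed):
    leader, follower = [], []
    for partition, new_replicas in sorted(proposed.items()):
        old_replicas = existing.get(partition, [])
        moving = [r for r in new_replicas if r not in old_replicas]
        if moving:
            # Leader throttle covers the pre-existing replicas of a moving
            # partition (any of them may be serving as leader)...
            leader.extend(f"{partition}:{r}" for r in old_replicas)
            # ...while the follower throttle covers only the added replicas.
            follower.extend(f"{partition}:{r}" for r in moving)
    return ",".join(leader), ",".join(follower)
```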


1.2 Run the flow described in 2.2.1 (a task already exists: try to apply throttles).

  2. Fetch the data of /brokers/topics/{topicName} from ZooKeeper to verify that each given partition exists; a partition that does not exist is skipped, and the flow continues.

  3. If the JSON file specifies log dirs, send an AlterReplicaLogDirsRequest so that the broker can later create the replica in the right log directory.

For example, log_dirs here specifies the target directory:

```json
{"version":1,"partitions":[{"topic":"Topic1","partition":2,"replicas":[1],"log_dirs":["/Users/shirenchuang/work/IdeaPj/didi_source/kafka/k0"]}]}
```

In that case the AlterReplicaLogDirsRequest first creates the corresponding replica. For the details, see the dedicated article on cross-log-dir data migration.

For comparison, a reassignment JSON without log_dirs:

```json
{"version":1,"partitions":[{"topic":"test_create_topic1","partition":0,"replicas":[0,1,2,3]},{"topic":"test_create_topic1","partition":1,"replicas":[1,2,0,3]},{"topic":"test_create_topic1","partition":2,"replicas":[2,1,0,3]}]}
```

  4. Send the AlterReplicaLogDirsRequest again to make sure the broker starts moving the replica to the specified log directory. It may take some time for the controller to create the replica on the broker; retry if it has not been created yet.

  5. Send the alterReplicaLogDirs requests to the brokers.
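The retry logic in steps 4 and 5 boils down to a retry-until-deadline loop, sketched here in Python (the `attempt_once` callback stands in for sending AlterReplicaLogDirsRequest; names are illustrative):

```python
import time

# Retry-until-deadline: keep retrying the not-yet-created replicas until
# all succeed or the timeout budget is exhausted, mirroring the
# remainingTimeMs loop in reassignPartitions.
def retry_until_done(pending, attempt_once, timeout_s=10.0, poll_s=0.01):
    """attempt_once(remaining) returns the subset that succeeded this round."""
    deadline = time.monotonic() + timeout_s
    done = set()
    while time.monotonic() < deadline and len(done) < len(pending):
        done |= attempt_once(pending - done)
        time.sleep(poll_s)  # the real code sleeps 100 ms between retries
    return done == pending
```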

2.2.3 The Controller watches the /admin/reassign_partitions znode

KafkaController.processZkPartitionReassignment

```scala
private def processZkPartitionReassignment(): Set[TopicPartition] = {
  // We need to register the watcher if the path doesn't exist in order to detect future
  // reassignments and we get the path exists check for free
  if (isActive && zkClient.registerZNodeChangeHandlerAndCheckExistence(partitionReassignmentHandler)) {
    val reassignmentResults = mutable.Map.empty[TopicPartition, ApiError]
    val partitionsToReassign = mutable.Map.empty[TopicPartition, ReplicaAssignment]

    zkClient.getPartitionReassignment().foreach { case (tp, targetReplicas) =>
      maybeBuildReassignment(tp, Some(targetReplicas)) match {
        case Some(context) => partitionsToReassign.put(tp, context)
        case None => reassignmentResults.put(tp, new ApiError(Errors.NO_REASSIGNMENT_IN_PROGRESS))
      }
    }

    reassignmentResults ++= maybeTriggerPartitionReassignment(partitionsToReassign)
    val (partitionsReassigned, partitionsFailed) = reassignmentResults.partition(_._2.error == Errors.NONE)
    if (partitionsFailed.nonEmpty) {
      warn(s"Failed reassignment through zk with the following errors: $partitionsFailed")
      maybeRemoveFromZkReassignment((tp, _) => partitionsFailed.contains(tp))
    }
    partitionsReassigned.keySet
  } else {
    Set.empty
  }
}
```

  1. Check that this broker is the active Controller and that the znode /admin/reassign_partitions exists.

  2. maybeTriggerPartitionReassignment performs the reassignment; if a topic has already been marked for deletion, the flow stops for that topic.

  3. maybeRemoveFromZkReassignment removes the failed partitions from ZooKeeper (by overwriting the znode data).

onPartitionReassignment

KafkaController.onPartitionReassignment

```scala
private def onPartitionReassignment(topicPartition: TopicPartition, reassignment: ReplicaAssignment): Unit = {
  // Pause deletion for the topic while it is being reassigned
  topicDeletionManager.markTopicIneligibleForDeletion(Set(topicPartition.topic), reason = "topic reassignment in progress")

  // Update the current assignment
  updateCurrentReassignment(topicPartition, reassignment)

  val addingReplicas = reassignment.addingReplicas
  val removingReplicas = reassignment.removingReplicas

  if (!isReassignmentComplete(topicPartition, reassignment)) {
    // A1. Send LeaderAndIsr request to every replica in ORS + TRS (with the new RS, AR and RR).
    updateLeaderEpochAndSendRequest(topicPartition, reassignment)
    // A2. replicas in AR -> NewReplica
    startNewReplicasForReassignedPartition(topicPartition, addingReplicas)
  } else {
    // B1. replicas in AR -> OnlineReplica
    replicaStateMachine.handleStateChanges(addingReplicas.map(PartitionAndReplica(topicPartition, _)), OnlineReplica)
    // B2. Set RS = TRS, AR = [], RR = [] in memory.
    val completedReassignment = ReplicaAssignment(reassignment.targetReplicas)
    controllerContext.updatePartitionFullReplicaAssignment(topicPartition, completedReassignment)
    // B3. Send LeaderAndIsr request with a potential new leader (if current leader not in TRS) and
    //     a new RS (using TRS) and same isr to every broker in ORS + TRS or TRS
    moveReassignedPartitionLeaderIfRequired(topicPartition, completedReassignment)
    // B4. replicas in RR -> Offline (force those replicas out of isr)
    // B5. replicas in RR -> NonExistentReplica (force those replicas to be deleted)
    stopRemovedReplicasOfReassignedPartition(topicPartition, removingReplicas)
    // B6. Update ZK with RS = TRS, AR = [], RR = [].
    updateReplicaAssignmentForPartition(topicPartition, completedReassignment)
    // B7. Remove the ISR reassign listener and maybe update the /admin/reassign_partitions path in ZK to remove this partition from it.
    removePartitionFromReassigningPartitions(topicPartition, completedReassignment)
    // B8. After electing a leader in B3, the replicas and isr information changes, so resend the update metadata request to every broker
    sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq, Set(topicPartition))
    // signal delete topic thread if reassignment for some partitions belonging to topics being deleted just completed
    topicDeletionManager.resumeDeletionForTopics(Set(topicPartition.topic))
  }
}
```

  1. Pause deletion for topics that are currently being deleted.

  2. Update the znode brokers/topics/{topicName} and the current assignment state in memory. If a reassignment is already in progress, the new reassignment supersedes it and some replicas will be shut down.

2.1 Update the topic znode brokers/topics/{topicName}; the znode now records which replicas are being added (AR) and which are being removed (RR).

2.2 Update the in-memory state.

2.3 If a reassignment was already in progress, some of the replicas it was adding may now have to be deleted immediately; in that case those replicas are stopped.

2.4 Register PartitionReassignmentIsrChangeHandler, a handler that watches the znode /brokers/topics/{topicName}/partitions/{partitionId}/state for changes.

  3. If the reassignment of this partition is not yet complete (judged by whether the ISR in /brokers/topics/{topicName}/partitions/{partitionId}/state already contains the newly added broker ids), then:

The following abbreviations are used below:

ORS: OriginReplicas, the original replica set

TRS: targetReplicas, the target replica set the partition is moving to

AR: adding_replicas, the replicas being added

RR: removing_replicas, the replicas being removed

3.1 Send a LeaderAndIsr request (with the new RS, AR and RR) to every replica in ORS + TRS.

3.2 Transition the newly added AR replicas to the NewReplica state; this also involves sending LeaderAndIsrRequest. For details see [Kafka source] the state machines in the Controller.
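The ORS/TRS/AR/RR bookkeeping can be expressed compactly. A Python sketch under the definitions above; the order-preserving union reflects that during the move the full replica set is ORS + TRS:

```python
# Given ORS (original replicas) and TRS (target replicas), AR is what must
# be added, RR what must be removed, and RS the full set during the move.
def reassignment_sets(ors, trs):
    ar = [r for r in trs if r not in ors]          # adding_replicas
    rr = [r for r in ors if r not in trs]          # removing_replicas
    rs = trs + [r for r in ors if r not in trs]    # full replica set in flight
    return rs, ar, rr
```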

2.2.4 The Controller watches brokers/topics/{topicName} for newly added partitions

This step can safely be ignored, because nothing happens here.

Step 2 of 2.2.3 above wrote the added and removed replicas into ZooKeeper, for example:

```json
{"version":2,"partitions":{"2":[0,1],"1":[0,1],"0":[0,1]},"adding_replicas":{"2":[1],"1":[1],"0":[1]},"removing_replicas":{}}
```

When the Controller sees this znode change, it runs processPartitionModifications.

KafkaController.processPartitionModifications

```scala
private def processPartitionModifications(topic: String): Unit = {
  def restorePartitionReplicaAssignment(
    topic: String,
    newPartitionReplicaAssignment: Map[TopicPartition, ReplicaAssignment]
  ): Unit = {
    info("Restoring the partition replica assignment for topic %s".format(topic))
    // Fetch all partitions of the topic from ZooKeeper
    val existingPartitions = zkClient.getChildren(TopicPartitionsZNode.path(topic))
    // Keep only the partitions that already exist
    val existingPartitionReplicaAssignment = newPartitionReplicaAssignment
      .filter(p => existingPartitions.contains(p._1.partition.toString))
      .map { case (tp, _) =>
        tp -> controllerContext.partitionFullReplicaAssignment(tp)
      }.toMap

    zkClient.setTopicAssignment(topic,
      existingPartitionReplicaAssignment,
      controllerContext.epochZkVersion)
  }

  if (!isActive) return
  val partitionReplicaAssignment = zkClient.getFullReplicaAssignmentForTopics(immutable.Set(topic))
  val partitionsToBeAdded = partitionReplicaAssignment.filter { case (topicPartition, _) =>
    controllerContext.partitionReplicaAssignment(topicPartition).isEmpty
  }

  if (topicDeletionManager.isTopicQueuedUpForDeletion(topic)) {
    if (partitionsToBeAdded.nonEmpty) {
      warn("Skipping adding partitions %s for topic %s since it is currently being deleted"
        .format(partitionsToBeAdded.map(_._1.partition).mkString(","), topic))
      restorePartitionReplicaAssignment(topic, partitionReplicaAssignment)
    } else {
      // This can happen if existing partition replica assignment are restored to prevent increasing partition count during topic deletion
      info("Ignoring partition change during topic deletion as no new partitions are added")
    }
  } else if (partitionsToBeAdded.nonEmpty) {
    info(s"New partitions to be added $partitionsToBeAdded")
    partitionsToBeAdded.foreach { case (topicPartition, assignedReplicas) =>
      controllerContext.updatePartitionFullReplicaAssignment(topicPartition, assignedReplicas)
    }
    onNewPartitionCreation(partitionsToBeAdded.keySet)
  }
}
```

  1. Fetch the full assignment from brokers/topics/{topicName}, for example:

```json
{
  "version": 2,
  "partitions": {
    "2": [0, 1],
    "1": [0, 1],
    "0": [0, 1]
  },
  "adding_replicas": {
    "2": [1],
    "1": [1],
    "0": [1]
  },
  "removing_replicas": {}
}
```

  2. If there are partitions to be added:

2.1 If the topic happens to be queued for deletion, there is no point expanding its partitions; the brokers/topics/{topicName} data in ZooKeeper is restored to its previous value.

2.2 If the topic is not queued for deletion, the new-partition flow starts; it is covered in detail in [Kafka source] TopicCommand: creating a topic (search there for onNewPartitionCreation).

  3. If the topic is being deleted, it is skipped; and if it has AR (adding_replicas), the znode /brokers/topics/{topicName} is rewritten to restore the previous data, i.e. the AR entries are removed.

This step can be ignored entirely here, because a partition reassignment never adds new partitions.

2.2.5 The Controller watches the znode /brokers/topics/{topicName}/partitions/{partitionId}/state

Recall from step 2.4 of 2.2.3 that the handler PartitionReassignmentIsrChangeHandler was registered to watch /brokers/topics/{topicName}/partitions/{partitionId}/state.

When does this znode actually change? The LeaderAndIsr requests sent earlier make the newly added replicas start fetching from the leader; once a replica has caught up and joined the ISR, the state znode is updated, and the watch below fires.

This calls the same method as in 2.2.3, but this time after the LeaderAndIsr round trip:

KafkaController.processPartitionReassignmentIsrChange

```scala
private def processPartitionReassignmentIsrChange(topicPartition: TopicPartition): Unit = {
  if (!isActive) return

  if (controllerContext.partitionsBeingReassigned.contains(topicPartition)) {
    val reassignment = controllerContext.partitionFullReplicaAssignment(topicPartition)
    if (isReassignmentComplete(topicPartition, reassignment)) {
      // resume the partition reassignment process
      info(s"Target replicas ${reassignment.targetReplicas} have all caught up with the leader for " +
        s"reassigning partition $topicPartition")
      onPartitionReassignment(topicPartition, reassignment)
    }
  }
}
```
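The completion check used here reduces to a subset test on the ISR. A Python sketch, with the ISR read from the state znode modeled as a set:

```python
# The reassignment of a partition is complete once every target replica has
# made it into the ISR recorded at
# /brokers/topics/{topic}/partitions/{partition}/state.
def is_reassignment_complete(isr, target_replicas):
    return set(target_replicas).issubset(isr)
```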
