[Kafka source] ReassignPartitionsCommand: source-code analysis

  1. Check that every broker id referenced in the configuration actually exists.

  2. If a replica reassignment is already in progress (i.e. the znode /admin/reassign_partitions exists), only check whether the throttles need updating: if --throttle or --replica-alter-log-dirs-throttle was supplied, apply the new limits; then stop without going any further.

  3. If no reassignment is currently running (the znode /admin/reassign_partitions does not exist), start the reassignment task.
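The dispatch above can be sketched as follows. This is a minimal Python sketch: the in-memory `zk` dict and the helper name `execute_assignment` are stand-ins for the real ZooKeeper client and the command's entry point.

```python
# Sketch of the top-level dispatch: the presence of the reassignment znode
# decides whether we start a new task or only update throttles.
REASSIGN_ZNODE = "/admin/reassign_partitions"

def execute_assignment(zk, throttle=None, alter_dir_throttle=None):
    """Return which action the command takes for a given cluster state."""
    if REASSIGN_ZNODE in zk:
        # A reassignment is already running: at most update the throttles.
        if throttle is not None or alter_dir_throttle is not None:
            return "update-throttles"
        return "noop"
    # No reassignment in flight: start one by creating the znode.
    zk[REASSIGN_ZNODE] = "...reassignment json..."
    return "start-reassignment"
```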

2.2.1 A task already exists: try to apply throttles

If the znode /admin/reassign_partitions exists in ZooKeeper, a reassignment is already running and the current operation goes no further. If either of the parameters

--throttle

--replica-alter-log-dirs-throttle

was supplied, the throttles are applied.

These throttle the ongoing replica movement. Note that while this command can change the throttles, it may not change all of the limits originally set if some brokers have already finished rebalancing. The throttles therefore have to be removed afterwards via --verify.

maybeLimit

```scala
def maybeLimit(throttle: Throttle): Unit = {
  if (throttle.interBrokerLimit >= 0 || throttle.replicaAlterLogDirsLimit >= 0) {
    // Brokers in the current assignment
    val existingBrokers = existingAssignment().values.flatten.toSeq
    // Brokers in the proposed assignment
    val proposedBrokers = proposedPartitionAssignment.values.flatten.toSeq ++ proposedReplicaAssignment.keys.toSeq.map(_.brokerId())
    // Union of the two, de-duplicated
    val brokers = (existingBrokers ++ proposedBrokers).distinct

    // For every broker involved, write the throttle configs to the zk node /config/brokers/{brokerId}
    for (id <- brokers) {
      // Fetch the broker's current config from /config/brokers/{brokerId}
      val configs = adminZkClient.fetchEntityConfig(ConfigType.Broker, id.toString)
      if (throttle.interBrokerLimit >= 0) {
        configs.put(DynamicConfig.Broker.LeaderReplicationThrottledRateProp, throttle.interBrokerLimit.toString)
        configs.put(DynamicConfig.Broker.FollowerReplicationThrottledRateProp, throttle.interBrokerLimit.toString)
      }
      if (throttle.replicaAlterLogDirsLimit >= 0)
        configs.put(DynamicConfig.Broker.ReplicaAlterLogDirsIoMaxBytesPerSecondProp, throttle.replicaAlterLogDirsLimit.toString)

      adminZkClient.changeBrokerConfig(Seq(id), configs)
    }
  }
}
```

The /config/brokers/{brokerId} znode holds broker-side dynamic configuration: changes take effect immediately, without restarting the broker.

  1. If the parameter --throttle was passed: fetch each broker's configuration from the znode /config/brokers/{brokerId}, add the following two properties, and write the result back to /config/brokers/{brokerId}:

leader.replication.throttled.rate: caps the rate at which leader replicas serve FETCH requests

follower.replication.throttled.rate: caps the rate at which follower replicas issue FETCH requests

  2. If the parameter --replica-alter-log-dirs-throttle was passed, this property is written as well:

replica.alter.log.dirs.io.max.bytes.per.second: caps the bandwidth used when a broker copies data between its own log directories

For example, the data after the write:

```json
{"version":1,"config":{"leader.replication.throttled.rate":"1","follower.replication.throttled.rate":"1"}}
```

Note: the throttle configuration is written for every broker involved in the reassignment.
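A minimal Python sketch of what maybeLimit computes. Plain dicts stand in for the ZooKeeper-backed broker configs; the property names are the real dynamic-config keys quoted above, everything else is illustrative.

```python
# Sketch of maybeLimit: compute the set of brokers touched by the move and
# merge the throttle properties into each broker's dynamic config.
LEADER_RATE = "leader.replication.throttled.rate"
FOLLOWER_RATE = "follower.replication.throttled.rate"
ALTER_DIR_RATE = "replica.alter.log.dirs.io.max.bytes.per.second"

def maybe_limit(existing, proposed, broker_configs,
                inter_broker_limit=-1, alter_dir_limit=-1):
    if inter_broker_limit < 0 and alter_dir_limit < 0:
        return
    # Union of brokers in the existing and the proposed assignments.
    brokers = {b for replicas in existing.values() for b in replicas}
    brokers |= {b for replicas in proposed.values() for b in replicas}
    for broker_id in sorted(brokers):
        cfg = broker_configs.setdefault(broker_id, {})
        if inter_broker_limit >= 0:
            cfg[LEADER_RATE] = str(inter_broker_limit)
            cfg[FOLLOWER_RATE] = str(inter_broker_limit)
        if alter_dir_limit >= 0:
            cfg[ALTER_DIR_RATE] = str(alter_dir_limit)
```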

2.2.2 No task in progress: start the reassignment

ReassignPartitionsCommand.reassignPartitions

```scala
def reassignPartitions(throttle: Throttle = NoThrottle, timeoutMs: Long = 10000L): Boolean = {
  // Write the throttle configs first
  maybeThrottle(throttle)
  try {
    // Validate that the partitions exist
    val validPartitions = proposedPartitionAssignment.groupBy(_._1.topic())
      .flatMap { case (topic, topicPartitionReplicas) =>
        validatePartition(zkClient, topic, topicPartitionReplicas)
      }
    if (validPartitions.isEmpty) false
    else {
      if (proposedReplicaAssignment.nonEmpty && adminClientOpt.isEmpty)
        throw new AdminCommandFailedException("bootstrap-server needs to be provided in order to reassign replica to the specified log directory")
      val startTimeMs = System.currentTimeMillis()

      // Send AlterReplicaLogDirsRequest to allow broker to create replica in the right log dir later if the replica has not been created yet.
      if (proposedReplicaAssignment.nonEmpty)
        alterReplicaLogDirsIgnoreReplicaNotAvailable(proposedReplicaAssignment, adminClientOpt.get, timeoutMs)

      // Create reassignment znode so that controller will send LeaderAndIsrRequest to create replica in the broker
      zkClient.createPartitionReassignment(validPartitions.map({ case (key, value) => (new TopicPartition(key.topic, key.partition), value) }).toMap)

      // Send AlterReplicaLogDirsRequest again to make sure broker will start to move replica to the specified log directory.
      // It may take some time for controller to create replica in the broker. Retry if the replica has not been created.
      var remainingTimeMs = startTimeMs + timeoutMs - System.currentTimeMillis()
      val replicasAssignedToFutureDir = mutable.Set.empty[TopicPartitionReplica]
      while (remainingTimeMs > 0 && replicasAssignedToFutureDir.size < proposedReplicaAssignment.size) {
        replicasAssignedToFutureDir ++= alterReplicaLogDirsIgnoreReplicaNotAvailable(
          proposedReplicaAssignment.filter { case (replica, _) => !replicasAssignedToFutureDir.contains(replica) },
          adminClientOpt.get, remainingTimeMs)
        Thread.sleep(100)
        remainingTimeMs = startTimeMs + timeoutMs - System.currentTimeMillis()
      }
      replicasAssignedToFutureDir.size == proposedReplicaAssignment.size
    }
  } catch {
    case _: NodeExistsException =>
      val partitionsBeingReassigned = zkClient.getPartitionReassignment()
      throw new AdminCommandFailedException("Partition reassignment currently in " +
        "progress for %s. Aborting operation".format(partitionsBeingReassigned))
  }
}
```

  1. maybeThrottle(throttle) writes the throttle configuration for the replica movement; this method is only used when the task is initialized.

```scala
private def maybeThrottle(throttle: Throttle): Unit = {
  if (throttle.interBrokerLimit >= 0)
    assignThrottledReplicas(existingAssignment(), proposedPartitionAssignment, adminZkClient)
  maybeLimit(throttle)
  if (throttle.interBrokerLimit >= 0 || throttle.replicaAlterLogDirsLimit >= 0)
    throttle.postUpdateAction()
  if (throttle.interBrokerLimit >= 0)
    println(s"The inter-broker throttle limit was set to ${throttle.interBrokerLimit} B/s")
  if (throttle.replicaAlterLogDirsLimit >= 0)
    println(s"The replica-alter-dir throttle limit was set to ${throttle.replicaAlterLogDirsLimit} B/s")
}
```

1.1 Write the topic-level throttle configs to the znode /config/topics/{topicName}

The computed leader and follower values are written to /config/topics/{topicName}:

leader: the partitions that have newly added replicas; the value is a list of partitionId:replicaId pairs

follower: iterate over the proposed partitions; the throttled replicas are the proposed replicas minus the existing replicas; the value is again a list of partitionId:replicaId pairs

leader.replication.throttled.replicas: the leader value

follower.replication.throttled.replicas: the follower value
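A hedged Python sketch of this derivation, assuming the `partition:replica` pair format described above (`throttled_replica_lists` is a made-up helper name, not Kafka code):

```python
# Derive the leader/follower throttled-replica lists from the existing and
# proposed assignments; both are dicts of partitionId -> list of broker ids.
def throttled_replica_lists(existing, proposed):
    leader, follower = [], []
    for partition, new_replicas in sorted(proposed.items()):
        old_replicas = existing.get(partition, [])
        moving = [r for r in new_replicas if r not in old_replicas]
        if moving:
            # Leader throttle covers the pre-existing replicas of a moving
            # partition (any of them may be serving as leader)...
            leader.extend(f"{partition}:{r}" for r in old_replicas)
            # ...while the follower throttle covers only the added replicas.
            follower.extend(f"{partition}:{r}" for r in moving)
    return ",".join(leader), ",".join(follower)
```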


1.2 Run the flow described in 2.2.1 (a task already exists: try to apply throttles).

  2. Fetch the data of /brokers/topics/{topicName} from ZooKeeper to verify that each given partition exists; a partition that does not exist is skipped, and the flow continues.

  3. If the JSON file specifies log dirs, send an AlterReplicaLogDirsRequest so that the broker can later create the replica in the right log directory.

For example, log_dirs here specifies the target directory:

```json
{"version":1,"partitions":[{"topic":"Topic1","partition":2,"replicas":[1],"log_dirs":["/Users/shirenchuang/work/IdeaPj/didi_source/kafka/k0"]}]}
```

In that case the AlterReplicaLogDirsRequest first creates the corresponding replica. For the details, see the dedicated article on cross-log-dir data migration.

For comparison, a reassignment JSON without log_dirs:

```json
{"version":1,"partitions":[{"topic":"test_create_topic1","partition":0,"replicas":[0,1,2,3]},{"topic":"test_create_topic1","partition":1,"replicas":[1,2,0,3]},{"topic":"test_create_topic1","partition":2,"replicas":[2,1,0,3]}]}
```

  4. Send the AlterReplicaLogDirsRequest again to make sure the broker starts moving the replica to the specified log directory. It may take some time for the controller to create the replica on the broker; retry if it has not been created yet.

  5. Send the alterReplicaLogDirs requests to the brokers.
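The retry logic in steps 4 and 5 boils down to a retry-until-deadline loop, sketched here in Python (the `attempt_once` callback stands in for sending AlterReplicaLogDirsRequest; names are illustrative):

```python
import time

# Retry-until-deadline: keep retrying the not-yet-created replicas until
# all succeed or the timeout budget is exhausted, mirroring the
# remainingTimeMs loop in reassignPartitions.
def retry_until_done(pending, attempt_once, timeout_s=10.0, poll_s=0.01):
    """attempt_once(remaining) returns the subset that succeeded this round."""
    deadline = time.monotonic() + timeout_s
    done = set()
    while time.monotonic() < deadline and len(done) < len(pending):
        done |= attempt_once(pending - done)
        time.sleep(poll_s)  # the real code sleeps 100 ms between retries
    return done == pending
```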

2.2.3 The Controller watches the /admin/reassign_partitions znode

KafkaController.processZkPartitionReassignment

```scala
private def processZkPartitionReassignment(): Set[TopicPartition] = {
  // We need to register the watcher if the path doesn't exist in order to detect future
  // reassignments and we get the path exists check for free
  if (isActive && zkClient.registerZNodeChangeHandlerAndCheckExistence(partitionReassignmentHandler)) {
    val reassignmentResults = mutable.Map.empty[TopicPartition, ApiError]
    val partitionsToReassign = mutable.Map.empty[TopicPartition, ReplicaAssignment]

    zkClient.getPartitionReassignment().foreach { case (tp, targetReplicas) =>
      maybeBuildReassignment(tp, Some(targetReplicas)) match {
        case Some(context) => partitionsToReassign.put(tp, context)
        case None => reassignmentResults.put(tp, new ApiError(Errors.NO_REASSIGNMENT_IN_PROGRESS))
      }
    }

    reassignmentResults ++= maybeTriggerPartitionReassignment(partitionsToReassign)
    val (partitionsReassigned, partitionsFailed) = reassignmentResults.partition(_._2.error == Errors.NONE)
    if (partitionsFailed.nonEmpty) {
      warn(s"Failed reassignment through zk with the following errors: $partitionsFailed")
      maybeRemoveFromZkReassignment((tp, _) => partitionsFailed.contains(tp))
    }
    partitionsReassigned.keySet
  } else {
    Set.empty
  }
}
```

  1. Check that this broker is the active Controller and that the znode /admin/reassign_partitions exists.

  2. maybeTriggerPartitionReassignment performs the reassignment; if a topic has already been marked for deletion, the flow stops for that topic.

  3. maybeRemoveFromZkReassignment removes the failed partitions from ZooKeeper (by overwriting the znode data).

onPartitionReassignment

KafkaController.onPartitionReassignment

```scala
private def onPartitionReassignment(topicPartition: TopicPartition, reassignment: ReplicaAssignment): Unit = {
  // Pause deletion for the topic while it is being reassigned
  topicDeletionManager.markTopicIneligibleForDeletion(Set(topicPartition.topic), reason = "topic reassignment in progress")

  // Update the current assignment
  updateCurrentReassignment(topicPartition, reassignment)

  val addingReplicas = reassignment.addingReplicas
  val removingReplicas = reassignment.removingReplicas

  if (!isReassignmentComplete(topicPartition, reassignment)) {
    // A1. Send LeaderAndIsr request to every replica in ORS + TRS (with the new RS, AR and RR).
    updateLeaderEpochAndSendRequest(topicPartition, reassignment)
    // A2. replicas in AR -> NewReplica
    startNewReplicasForReassignedPartition(topicPartition, addingReplicas)
  } else {
    // B1. replicas in AR -> OnlineReplica
    replicaStateMachine.handleStateChanges(addingReplicas.map(PartitionAndReplica(topicPartition, _)), OnlineReplica)
    // B2. Set RS = TRS, AR = [], RR = [] in memory.
    val completedReassignment = ReplicaAssignment(reassignment.targetReplicas)
    controllerContext.updatePartitionFullReplicaAssignment(topicPartition, completedReassignment)
    // B3. Send LeaderAndIsr request with a potential new leader (if current leader not in TRS) and
    //     a new RS (using TRS) and same isr to every broker in ORS + TRS or TRS
    moveReassignedPartitionLeaderIfRequired(topicPartition, completedReassignment)
    // B4. replicas in RR -> Offline (force those replicas out of isr)
    // B5. replicas in RR -> NonExistentReplica (force those replicas to be deleted)
    stopRemovedReplicasOfReassignedPartition(topicPartition, removingReplicas)
    // B6. Update ZK with RS = TRS, AR = [], RR = [].
    updateReplicaAssignmentForPartition(topicPartition, completedReassignment)
    // B7. Remove the ISR reassign listener and maybe update the /admin/reassign_partitions path in ZK to remove this partition from it.
    removePartitionFromReassigningPartitions(topicPartition, completedReassignment)
    // B8. After electing a leader in B3, the replicas and isr information changes, so resend the update metadata request to every broker
    sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq, Set(topicPartition))
    // signal delete topic thread if reassignment for some partitions belonging to topics being deleted just completed
    topicDeletionManager.resumeDeletionForTopics(Set(topicPartition.topic))
  }
}
```

  1. Pause deletion for topics that are currently being deleted.

  2. Update the znode brokers/topics/{topicName} and the current assignment state in memory. If a reassignment is already in progress, the new reassignment supersedes it and some replicas will be shut down.

2.1 Update the topic znode brokers/topics/{topicName}; the znode now records which replicas are being added (AR) and which are being removed (RR).

2.2 Update the in-memory state.

2.3 If a reassignment was already in progress, some of the replicas it was adding may now have to be deleted immediately; in that case those replicas are stopped.

2.4 Register PartitionReassignmentIsrChangeHandler, a handler that watches the znode /brokers/topics/{topicName}/partitions/{partitionId}/state for changes.

  3. If the reassignment of this partition is not yet complete (judged by whether the ISR in /brokers/topics/{topicName}/partitions/{partitionId}/state already contains the newly added broker ids), then:

The following abbreviations are used below:

ORS: OriginReplicas, the original replica set

TRS: targetReplicas, the target replica set the partition is moving to

AR: adding_replicas, the replicas being added

RR: removing_replicas, the replicas being removed

3.1 Send a LeaderAndIsr request (with the new RS, AR and RR) to every replica in ORS + TRS.

3.2 Transition the newly added AR replicas to the NewReplica state; this also involves sending LeaderAndIsrRequest. For details see [Kafka source] the state machines in the Controller.
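The ORS/TRS/AR/RR bookkeeping can be expressed compactly. A Python sketch under the definitions above; the order-preserving union reflects that during the move the full replica set is ORS + TRS:

```python
# Given ORS (original replicas) and TRS (target replicas), AR is what must
# be added, RR what must be removed, and RS the full set during the move.
def reassignment_sets(ors, trs):
    ar = [r for r in trs if r not in ors]          # adding_replicas
    rr = [r for r in ors if r not in trs]          # removing_replicas
    rs = trs + [r for r in ors if r not in trs]    # full replica set in flight
    return rs, ar, rr
```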

2.2.4 The Controller watches brokers/topics/{topicName} for newly added partitions

This step can safely be ignored, because nothing happens here.

Step 2 of 2.2.3 above wrote the added and removed replicas into ZooKeeper, for example:

```json
{"version":2,"partitions":{"2":[0,1],"1":[0,1],"0":[0,1]},"adding_replicas":{"2":[1],"1":[1],"0":[1]},"removing_replicas":{}}
```

When the Controller sees this znode change, it runs processPartitionModifications.

KafkaController.processPartitionModifications

```scala
private def processPartitionModifications(topic: String): Unit = {
  def restorePartitionReplicaAssignment(
    topic: String,
    newPartitionReplicaAssignment: Map[TopicPartition, ReplicaAssignment]
  ): Unit = {
    info("Restoring the partition replica assignment for topic %s".format(topic))
    // Fetch all partitions of the topic from ZooKeeper
    val existingPartitions = zkClient.getChildren(TopicPartitionsZNode.path(topic))
    // Keep only the partitions that already exist
    val existingPartitionReplicaAssignment = newPartitionReplicaAssignment
      .filter(p => existingPartitions.contains(p._1.partition.toString))
      .map { case (tp, _) =>
        tp -> controllerContext.partitionFullReplicaAssignment(tp)
      }.toMap

    zkClient.setTopicAssignment(topic,
      existingPartitionReplicaAssignment,
      controllerContext.epochZkVersion)
  }

  if (!isActive) return
  val partitionReplicaAssignment = zkClient.getFullReplicaAssignmentForTopics(immutable.Set(topic))
  val partitionsToBeAdded = partitionReplicaAssignment.filter { case (topicPartition, _) =>
    controllerContext.partitionReplicaAssignment(topicPartition).isEmpty
  }

  if (topicDeletionManager.isTopicQueuedUpForDeletion(topic)) {
    if (partitionsToBeAdded.nonEmpty) {
      warn("Skipping adding partitions %s for topic %s since it is currently being deleted"
        .format(partitionsToBeAdded.map(_._1.partition).mkString(","), topic))
      restorePartitionReplicaAssignment(topic, partitionReplicaAssignment)
    } else {
      // This can happen if existing partition replica assignment are restored to prevent increasing partition count during topic deletion
      info("Ignoring partition change during topic deletion as no new partitions are added")
    }
  } else if (partitionsToBeAdded.nonEmpty) {
    info(s"New partitions to be added $partitionsToBeAdded")
    partitionsToBeAdded.foreach { case (topicPartition, assignedReplicas) =>
      controllerContext.updatePartitionFullReplicaAssignment(topicPartition, assignedReplicas)
    }
    onNewPartitionCreation(partitionsToBeAdded.keySet)
  }
}
```

  1. Fetch the full assignment from brokers/topics/{topicName}, for example:

```json
{
  "version": 2,
  "partitions": {
    "2": [0, 1],
    "1": [0, 1],
    "0": [0, 1]
  },
  "adding_replicas": {
    "2": [1],
    "1": [1],
    "0": [1]
  },
  "removing_replicas": {}
}
```

  2. If there are partitions to be added:

2.1 If the topic happens to be queued for deletion, there is no point expanding its partitions; the brokers/topics/{topicName} data in ZooKeeper is restored to its previous value.

2.2 If the topic is not queued for deletion, the new-partition flow starts; it is covered in detail in [Kafka source] TopicCommand: creating a topic (search there for onNewPartitionCreation).

  3. If the topic is being deleted, it is skipped; and if it has AR (adding_replicas), the znode /brokers/topics/{topicName} is rewritten to restore the previous data, i.e. the AR entries are removed.

This step can be ignored entirely here, because a partition reassignment never adds new partitions.

2.2.5 The Controller watches the znode /brokers/topics/{topicName}/partitions/{partitionId}/state

Recall from step 2.4 of 2.2.3 that the handler PartitionReassignmentIsrChangeHandler was registered to watch /brokers/topics/{topicName}/partitions/{partitionId}/state.

When does this znode actually change? The LeaderAndIsr requests sent earlier make the newly added replicas start fetching from the leader; once a replica has caught up and joined the ISR, the state znode is updated, and the watch below fires.

This calls the same method as in 2.2.3, but this time after the LeaderAndIsr round trip:

KafkaController.processPartitionReassignmentIsrChange

```scala
private def processPartitionReassignmentIsrChange(topicPartition: TopicPartition): Unit = {
  if (!isActive) return

  if (controllerContext.partitionsBeingReassigned.contains(topicPartition)) {
    val reassignment = controllerContext.partitionFullReplicaAssignment(topicPartition)
    if (isReassignmentComplete(topicPartition, reassignment)) {
      // resume the partition reassignment process
      info(s"Target replicas ${reassignment.targetReplicas} have all caught up with the leader for " +
        s"reassigning partition $topicPartition")
      onPartitionReassignment(topicPartition, reassignment)
    }
  }
}
```
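The completion check used here reduces to a subset test on the ISR. A Python sketch, with the ISR read from the state znode modeled as a set:

```python
# The reassignment of a partition is complete once every target replica has
# made it into the ISR recorded at
# /brokers/topics/{topic}/partitions/{partition}/state.
def is_reassignment_complete(isr, target_replicas):
    return set(target_replicas).issubset(isr)
```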
