Reading the Kafka Source Code: KafkaController (5)

Reassigning replicas

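When new machines are added to a cluster, you may need to adjust how the replicas of each topic's partitions are assigned. Kafka does not rebalance the replica assignment automatically based on load, so the cluster administrator has to adjust it manually.
The example below moves all replicas of two topics, foo1 and foo2, onto broker 5 and broker 6.
First, provide a file that names the topics to move: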

>cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
 "version":1
}
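
If the topic list is long, the file can be produced by a script instead of by hand. The following is a minimal sketch; the helper name build_topics_to_move is made up for illustration and is not part of Kafka.

```python
import json

def build_topics_to_move(topics):
    """Build the document shown above: a list of topic names plus a version field."""
    return {"topics": [{"topic": name} for name in topics], "version": 1}

doc = build_topics_to_move(["foo1", "foo2"])
text = json.dumps(doc, indent=1)
print(text)  # redirect this output to topics-to-move.json
```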

Once the file is ready, run the following command to generate the replica assignment that Kafka proposes:

> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
Current partition replica assignment

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
               {"topic":"foo1","partition":0,"replicas":[3,4]},
               {"topic":"foo2","partition":2,"replicas":[1,2]},
               {"topic":"foo2","partition":0,"replicas":[3,4]},
               {"topic":"foo1","partition":1,"replicas":[2,3]},
               {"topic":"foo2","partition":1,"replicas":[2,3]}]
}

Proposed partition reassignment configuration

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},
               {"topic":"foo1","partition":0,"replicas":[5,6]},
               {"topic":"foo2","partition":2,"replicas":[5,6]},
               {"topic":"foo2","partition":0,"replicas":[5,6]},
               {"topic":"foo1","partition":1,"replicas":[5,6]},
               {"topic":"foo2","partition":1,"replicas":[5,6]}]
}
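
Before executing a proposal it can be useful to see which brokers each partition gains and loses. The sketch below compares the two JSON documents printed by the tool. The function names are hypothetical, and only the foo1 partitions from the output above are included to keep it short.

```python
def index_by_partition(assignment):
    """Map (topic, partition) -> replica list for one assignment document."""
    return {(p["topic"], p["partition"]): p["replicas"]
            for p in assignment["partitions"]}

def diff_assignments(current, proposed):
    """For each partition, list the brokers that gain and lose a replica."""
    old = index_by_partition(current)
    changes = {}
    for key, new_replicas in index_by_partition(proposed).items():
        old_replicas = old.get(key, [])
        added = [b for b in new_replicas if b not in old_replicas]
        removed = [b for b in old_replicas if b not in new_replicas]
        if added or removed:
            changes[key] = {"add": added, "remove": removed}
    return changes

# Partitions of foo1 copied from the tool output above.
current = {"version": 1, "partitions": [
    {"topic": "foo1", "partition": 2, "replicas": [1, 2]},
    {"topic": "foo1", "partition": 0, "replicas": [3, 4]},
    {"topic": "foo1", "partition": 1, "replicas": [2, 3]}]}
proposed = {"version": 1, "partitions": [
    {"topic": "foo1", "partition": 2, "replicas": [5, 6]},
    {"topic": "foo1", "partition": 0, "replicas": [5, 6]},
    {"topic": "foo1", "partition": 1, "replicas": [5, 6]}]}

changes = diff_assignments(current, proposed)
for (topic, partition), change in sorted(changes.items()):
    print(topic, partition, change)
```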

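The tool above only generates a proposed replica assignment; it does not actually reassign anything. Save the proposed reassignment to a file named expand-cluster-reassignment.json, then execute it with the following command: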

> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
Current partition replica assignment

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
               {"topic":"foo1","partition":0,"replicas":[3,4]},
               {"topic":"foo2","partition":2,"replicas":[1,2]},
               {"topic":"foo2","partition":0,"replicas":[3,4]},
               {"topic":"foo1","partition":1,"replicas":[2,3]},
               {"topic":"foo2","partition":1,"replicas":[2,3]}]
}

Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions
{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},
               {"topic":"foo1","partition":0,"replicas":[5,6]},
               {"topic":"foo2","partition":2,"replicas":[5,6]},
               {"topic":"foo2","partition":0,"replicas":[5,6]},
               {"topic":"foo1","partition":1,"replicas":[5,6]},
               {"topic":"foo2","partition":1,"replicas":[5,6]}]
}

That is the whole procedure for moving replicas onto new brokers with the reassignment tool that Kafka provides.

How replica reassignment is implemented

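Generating the proposed replica assignment is fairly simple: the tool runs the replica assignment algorithm over the specified brokers. The assignment strategy was covered in "Reading the Kafka Source Code: KafkaController (3)".
When the reassignment is executed, the kafka-reassign-partitions.sh command writes the new assignment as JSON to the ZooKeeper path /admin/reassign_partitions, in the following format: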
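
To give a feel for what running an assignment algorithm over a broker list means, here is a deliberately simplified round-robin sketch. It is not Kafka's actual algorithm, which as far as I know also randomizes the starting broker and shifts follower placement; the function name and the fixed start_index are assumptions made for illustration, and the replica order may differ from the tool output shown earlier.

```python
def assign_replicas(brokers, num_partitions, replication_factor, start_index=0):
    """Simplified round-robin placement: partition p starts at broker (p + start_index)
    and its remaining replicas go to the brokers that follow in the list."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor exceeds the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        first = (p + start_index) % n
        assignment[p] = [brokers[(first + j) % n] for j in range(replication_factor)]
    return assignment

two_brokers = assign_replicas([5, 6], num_partitions=3, replication_factor=2)
three_brokers = assign_replicas([5, 6, 7], num_partitions=3, replication_factor=2)
print(two_brokers)
print(three_brokers)
```

With only brokers 5 and 6 and a replication factor of 2, every partition necessarily ends up on both brokers, which matches the proposed assignment above.
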

{"partitions":[{"topic":"t1","partition":"p1","replicas":"r1"},{"topic":"t2","partition":"p2","replicas":"r2"}]}
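
As a rough illustration of what a reader of this path has to do, the sketch below turns such a payload into a map from (topic, partition) to the new replica list. It assumes the stored data has the same shape as the reassignment file used earlier (integer partition ids, replicas as a list of broker ids); the function name parse_reassignment is hypothetical and this is not the controller's own parsing code.

```python
import json

def parse_reassignment(payload):
    """Return {(topic, partition): [broker ids]} from a reassignment JSON string."""
    data = json.loads(payload)
    return {(entry["topic"], int(entry["partition"])): [int(r) for r in entry["replicas"]]
            for entry in data["partitions"]}

# Example payload, assumed to mirror expand-cluster-reassignment.json.
payload = ('{"version":1,"partitions":['
           '{"topic":"foo1","partition":0,"replicas":[5,6]},'
           '{"topic":"foo2","partition":1,"replicas":[5,6]}]}')

new_replicas = parse_reassignment(payload)
print(new_replicas)
```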

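The PartitionsReassignedListener class is notified of the ZooKeeper change and its handleDataChange function is called. It reads the partition reassignment data from ZooKeeper and builds a ReassignedPartitionsContext, which contains two fields: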

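case class ReassignedPartitionsContext(
  // the new replica list
  var newReplicas: Seq[Int] = Seq.empty,
  // ISR listener, used to check whether the newly added replicas have joined the ISR
  var isrChangeListener: ReassignedPartitionsIsrChangeListener = null)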

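Reassigning replicas is a fairly complex process, and the source code explains it in some detail. First, a few abbreviations:
RAR = reassigned replicas, i.e. the replicas being assigned, which can be thought of as the new replica list
OAR = original list of replicas for the partition, i.e. the old replica list
AR = current assigned replicas, i.e. the current replica list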
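
The reassignment logic combines these lists in three ways: RAR+OAR, RAR-OAR and OAR-RAR. The sketch below spells them out on a made-up example in which the old and new lists overlap; the broker ids and helper names are hypothetical, and keeping list order is only for readability.

```python
def union(rar, oar):
    """RAR+OAR: every replica in either list, RAR first, without duplicates."""
    return rar + [b for b in oar if b not in rar]

def difference(a, b):
    """A-B: replicas that are in a but not in b."""
    return [x for x in a if x not in b]

oar = [1, 2, 3]   # old replica list (hypothetical)
rar = [3, 4, 5]   # new replica list (hypothetical)

print(union(rar, oar))       # [3, 4, 5, 1, 2]  AR while the reassignment is in progress
print(difference(rar, oar))  # [4, 5]           replicas that must be created and catch up
print(difference(oar, rar))  # [1, 2]           replicas that are removed at the end
```
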
[Figure: replica reassign]
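The whole process can be roughly divided into three main steps:
1. The initiateReassignReplicasForTopicPartition function starts the reassignment. It registers a listener on the ZooKeeper path /brokers/topics/[topic]/partitions/[partitionId]/state/ to watch for ISR changes.
2. First call to onPartitionReassignment: AR is updated to RAR+OAR, both in ZooKeeper and in the controller context, and a LeaderAndIsr request is then sent to every replica in AR. Recall what a broker does when it receives this request: a follower is added to the fetcher for its leader, and the leader updates its ISR and related state. The replicas in RAR-OAR start fetching messages from the leader (no election has happened yet, so the leader is still in OAR).
3. Once every replica in RAR has caught up with the leader and joined the ISR, ReassignedPartitionsIsrChangeListener fires and onPartitionReassignment is called a second time. If the current leader is not in RAR (in the example above it is still in OAR), a new leader is elected from RAR. The replicas in OAR-RAR are then removed from AR, going through the state changes OnlineReplica -> OfflineReplica -> NonExistentReplica. Finally, an UpdateMetadataRequest is sent to all brokers.
onPartitionReassignment is called twice here, which is a little hard to follow at first; it becomes clear once you are familiar with how LeaderAndIsrRequest is handled.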
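
The two calls can be modelled as a small state machine. The following is a toy sketch of the steps described above, not the controller's code: the PartitionState class, the function names and the rule of picking the first in-sync replica of RAR as the new leader are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PartitionState:
    assigned: list  # AR, the current replica list
    leader: int     # broker id of the current leader
    isr: set        # in-sync replicas

def start_reassignment(state, rar):
    """Models the first call: AR becomes RAR+OAR and the leader stays where it is."""
    oar = list(state.assigned)
    state.assigned = rar + [b for b in oar if b not in rar]
    return oar

def finish_reassignment(state, oar, rar):
    """Models the second call, which is only valid once all of RAR is in the ISR."""
    if not set(rar) <= state.isr:
        raise RuntimeError("replicas in RAR have not all joined the ISR yet")
    if state.leader not in rar:
        # Assumption: choose the first in-sync replica of RAR as the new leader.
        state.leader = next(b for b in rar if b in state.isr)
    removed = [b for b in oar if b not in rar]  # OAR-RAR, taken offline and deleted
    state.assigned = list(rar)
    state.isr = state.isr & set(rar)
    return removed

state = PartitionState(assigned=[1, 2], leader=1, isr={1, 2})
rar = [5, 6]

oar = start_reassignment(state, rar)           # AR is now [5, 6, 1, 2], leader is still 1
state.isr |= {5, 6}                            # the new replicas catch up with the leader
removed = finish_reassignment(state, oar, rar)
print(state, removed)
```

Between the two calls nothing happens in this model except the ISR growing, which mirrors why the ISR listener is what triggers the second call.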
