1 Stop the existing single-node Kafka cluster
Stop Kafka first, then ZooKeeper:
cd /usr/local/kafka
./bin/kafka-server-stop.sh
zkServer.sh stop
2 Install the software on the new node, single-node style
Required archive: zookeeper-3.4.8.tar.gz (and kafka_2.12-2.2.0.tgz for step 2-2)
(2-1) install_zookeeper.sh
#! /bin/bash
## author: zb
## date: 2020.10.24
## values that must be set in advance
myidvalue="1"
hostname=$(cat /etc/hostname)
echo 'Installing zookeeper'
echo '(1) extract the archive'
tar -xzvf zookeeper-3.4.8.tar.gz -C /usr/local
echo '(2) set the ZOOKEEPER_HOME environment variable'
ZOOKEEPER_HOME='/usr/local/zookeeper-3.4.8'
echo "export ZOOKEEPER_HOME=${ZOOKEEPER_HOME}" >> /root/.bashrc
echo 'export PATH=$PATH:$ZOOKEEPER_HOME/bin' >> /root/.bashrc
# takes effect in this script's shell only; re-source in new shells
source /root/.bashrc
echo '(3) create the data and log directories'
mkdir -p ${ZOOKEEPER_HOME}/data
mkdir -p ${ZOOKEEPER_HOME}/datalog
echo '(4) create the myid file'
cd ${ZOOKEEPER_HOME}/data
touch myid
echo ${myidvalue} >> myid
echo '(5) write the zoo.cfg configuration file'
cd ${ZOOKEEPER_HOME}/conf
echo "tickTime=2000" >> zoo.cfg
echo "initLimit=10" >> zoo.cfg
echo "syncLimit=5" >> zoo.cfg
echo "dataDir=${ZOOKEEPER_HOME}/data" >> zoo.cfg
echo "dataLogDir=${ZOOKEEPER_HOME}/datalog" >> zoo.cfg
echo "clientPort=2181" >> zoo.cfg
echo "server.1=${hostname}:2888:3888" >> zoo.cfg
(2-2) install_kafka.sh
#! /bin/bash
## author: zb
## date: 2020.10.24
## values that must be set in advance
brokeridvalue="1"
hostname=$(cat /etc/hostname)
echo 'Installing kafka'
echo '(1) extract the archive'
tar -xzvf kafka_2.12-2.2.0.tgz -C /usr/local
cd /usr/local
mv kafka_2.12-2.2.0 kafka
echo '(2) set the KAFKA_HOME environment variable'
KAFKA_HOME='/usr/local/kafka'
echo "export KAFKA_HOME=${KAFKA_HOME}" >> /root/.bashrc
echo 'export PATH=$PATH:$KAFKA_HOME/bin' >> /root/.bashrc
# takes effect in this script's shell only; re-source in new shells
source /root/.bashrc
echo '(3) create the data log directory'
mkdir -p ${KAFKA_HOME}/datalog
echo '(4) write the server0.properties configuration file'
cd ${KAFKA_HOME}/config
echo "broker.id=${brokeridvalue}" >> server0.properties
echo "listeners=PLAINTEXT://${hostname}:9092" >> server0.properties
# the key is log.dirs (plural); log.dir is a different, secondary setting
echo "log.dirs=${KAFKA_HOME}/datalog" >> server0.properties
echo "num.partitions=5" >> server0.properties
echo "log.retention.hours=24" >> server0.properties
echo "zookeeper.connect=${hostname}:2181" >> server0.properties
echo "zookeeper.connection.timeout.ms=60000" >> server0.properties
echo "offsets.topic.replication.factor=1" >> server0.properties
3 Modify the configuration files for cluster mode
(3-1) /etc/hosts file
192.168.43.48 pda1
192.168.43.54 pda2
Make sure the two nodes can reach each other.
(3-2) zookeeper
myid file (in /usr/local/zookeeper-3.4.8/data):
set to 1 and 2 respectively
zoo.cfg file (identical on both nodes):
server.1=pda1:2888:3888
server.2=pda2:2888:3888
(3-3) kafka
server0.properties file (set per node):
broker.id=1 and 2 respectively
offsets.topic.replication.factor=number of nodes in the cluster
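The per-node differences above can also be applied with a short script; the sketch below uses sed on local stand-in paths (not the real /usr/local directories) so it can be dry-run anywhere. NODE_ID, the stand-in paths, and the starting file contents are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: apply per-node cluster settings with sed.
# NODE_ID would be 1 on pda1 and 2 on pda2.
NODE_ID=2
ZK_DATA="./zk-data"            # stand-in for /usr/local/zookeeper-3.4.8/data
KAFKA_CONF="./kafka-config"    # stand-in for /usr/local/kafka/config
mkdir -p "$ZK_DATA" "$KAFKA_CONF"

# myid contains only this node's id
echo "$NODE_ID" > "$ZK_DATA/myid"

# start from a single-node server0.properties, as written by install_kafka.sh
cat > "$KAFKA_CONF/server0.properties" <<'EOF'
broker.id=1
offsets.topic.replication.factor=1
EOF

# broker.id must be unique per node; the offsets replication factor
# becomes the number of nodes in the cluster (2 here)
sed -i "s/^broker\.id=.*/broker.id=${NODE_ID}/" "$KAFKA_CONF/server0.properties"
sed -i "s/^offsets\.topic\.replication\.factor=.*/offsets.topic.replication.factor=2/" "$KAFKA_CONF/server0.properties"
```

Running it with NODE_ID=1 on pda1 and NODE_ID=2 on pda2 (with the real paths substituted) produces the per-node files described above.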
4 Check the Kafka data on disk before starting
Use du -sh to check the total size of the data directory (e.g. du -sh /usr/local/kafka/datalog):
pda1 already holds a batch of previously written data;
pda2 holds no data yet.
5 Start the two-node Kafka cluster
(5-1) Start the two-node zookeeper cluster first
Note: both nodes must be started.
zkServer.sh start
(5-2) Then start the two Kafka nodes
(5-2-1) Start pda2 first
cd /usr/local/kafka/
nohup ./bin/kafka-server-start.sh ./config/server0.properties >> /tmp/kafkaoutput.log 2>&1 &
At this point no topic information exists yet,
but the data directory on the new node pda2 now has content.
(5-2-2) Then start pda1
Now the topic information appears,
including the topics previously created on pda1.
6 Redistributing data after adding a node to a Kafka cluster
6.1 Steps to add a node
Copy server.properties from an existing node, then change the following parameters:
broker.id
log.dirs
zookeeper.connect
6.2 How data migration works
Only newly created topics place data on the new node. To spread existing data onto it, the data of existing topics must be migrated there.
The migration is started manually but then runs fully automatically: Kafka adds the new node as a follower of each partition being migrated and lets it fully replicate the partition's existing data. Once the new node has replicated the partition completely and joined the in-sync replicas, one of the existing replicas deletes its copy of the partition's data.
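For example, a third broker's file copied from pda1 might differ only in these lines (broker id 3 and the combined ZooKeeper connect string are illustrative; the paths and ports follow the earlier sections):

```
broker.id=3
log.dirs=/usr/local/kafka/datalog
zookeeper.connect=pda1:2181,pda2:2181
```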
6.3 The data migration tool
The partition reassignment tool moves partitions between brokers. An ideal assignment gives all brokers an even data load and evenly sized partition sets.
The tool cannot analyze the data distribution in the cluster on its own and shuffle partitions around to even out the load; the administrator has to decide which topics or partitions should move.
The tool runs in three modes:
(1) --generate: given a list of topics and a list of brokers, the tool produces a plan that reassigns all partitions of the given topics across all the given brokers. This option is simply a convenient way to produce a reassignment plan for the chosen topics and target brokers.
(2) --execute: the tool starts the reassignment according to a user-supplied plan (the --reassignment-json-file option). The plan can be hand-crafted by the administrator or produced with --generate.
(3) --verify: the tool checks the status of every partition listed during the last --execute: completed successfully, failed, or still in progress.
6.4 Examples
6.4.1 Example 1
The existing five brokers have broker_id 1, 2, 3, 4, 5; the new node has broker_id 6.
Topic test has 6 partitions and 5 replicas.
(1) Create a file listing the topics to migrate
topics-to-move.json
{"topics": [{"topic": "test"}],"version":1}
(2) Generate the reassignment plan
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --topics-to-move-json-file topics-to-move.json --broker-list "1,2,3,4,5,6" --generate
The command prints the following; save the part under "Proposed partition reassignment configuration" as test-reassign.json:
Current partition replica assignment
{"version":1,"partitions":[{"topic":"test","partition":0,"replicas":[5,4,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":5,"replicas":[5,2,3,4,1],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":1,"replicas":[1,5,2,3,4],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":4,"replicas":[4,5,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":3,"replicas":[3,4,5,1,2],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":2,"replicas":[2,1,3,4,5],"log_dirs":["any","any","any","any","any"]}]}
Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"test","partition":4,"replicas":[5,1,2,3,4],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":1,"replicas":[2,4,5,6,1],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":3,"replicas":[4,6,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":0,"replicas":[1,3,4,5,6],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":5,"replicas":[6,2,3,4,5],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":2,"replicas":[3,5,6,1,2],"log_dirs":["any","any","any","any","any"]}]}
(3) Execute the migration
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --reassignment-json-file test-reassign.json --execute
(4) Check the status of the reassigned partitions
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --reassignment-json-file test-reassign.json --verify
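Step (1) above is just a small JSON file; a minimal sketch that writes it and sanity-checks the syntax before handing it to the tool (python3 is assumed to be available for validation):

```shell
# Write the topic list consumed by --topics-to-move-json-file
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "test"}], "version": 1}
EOF

# Fail early on malformed JSON; the reassignment tool's own error
# messages for bad input are less friendly
python3 -m json.tool topics-to-move.json
```

The same check applies to a hand-edited test-reassign.json before running --execute.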
6.4.2 Example 2
The existing node has broker_id 1; the new node has broker_id 2.
Topic test has 1 partition and 1 replica.
(1) Create a file listing the topics to migrate
topics-to-move.json
{"topics": [{"topic": "test"}],"version":1}
(2) Generate the reassignment plan
kafka-reassign-partitions.sh --bootstrap-server pda1:9092 --zookeeper pda1:2181 --topics-to-move-json-file topics-to-move.json --broker-list "1,2" --generate
The command prints the following; save the part under "Proposed partition reassignment configuration" as test-reassign.json:
Current partition replica assignment
{"version":1,"partitions":[{"topic":"test","partition":0,"replicas":[1],"log_dirs":["any"]}]}
Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"test","partition":0,"replicas":[2],"log_dirs":["any"]}]}
(3) Execute the migration
kafka-reassign-partitions.sh --bootstrap-server pda1:9092 --zookeeper pda1:2181 --reassignment-json-file test-reassign.json --execute
(4) Check the status of the reassigned partitions
kafka-reassign-partitions.sh --bootstrap-server pda1:9092 --zookeeper pda1:2181 --reassignment-json-file test-reassign.json --verify
The output shows:
Status of partition reassignment:
Reassignment of partition test-0 completed successfully
Afterwards the data is on pda2,
and pda1 no longer holds any data (the single replica moved from broker 1 to broker 2).
7 Adding partitions to a Kafka topic online
Topic test has 1 partition and 1 replica.
(1) Check the current partitions and replicas
kafka-topics.sh --bootstrap-server pda1:9092 --describe --topic test
(2) Increase the partition count
kafka-topics.sh --bootstrap-server pda1:9092 --alter --topic test --partitions 3
Check the partition count again after the change:
only partition 0 has data; the new partitions 1 and 2 are still empty.
(3) Send data
Send messages to topic test; messages without a key are spread across all partitions, so the new partitions start receiving data. (Existing data is not rebalanced, and keyed messages may map to different partitions than before.)