Kafka Internals, Part 3: Partitions

yangyanping20108

Last modified: 2023-11-15 11:37:46

Category: Message Queues. Tags: kafka, queue, java, distributed

First published: 2020-12-10 14:55:46

Article link: https://blog.csdn.net/yangyanping20108/article/details/110927880


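Preface

The previous article, "Kafka Architecture, Part 1: Getting Started", introduced Kafka's architecture and showed how to set up a Kafka cluster. This article shows basic usage of Kafka, with a focus on partitions.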

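Core concepts

Partition (the partition count of a topic)
Kafka places the partitions of a topic on the brokers of a cluster according to an assignment strategy. Partitions are normally spread across different brokers; when there is only one broker, all partitions live on that broker. Producers spread messages across the partitions, and each consumer tracks its offset in the partitions assigned to it and pulls new messages from them. More partitions generally means higher throughput, up to a point. Because Kafka reads and writes files, more partitions also means more open file handles, which adds overhead. With too many partitions there are also many log segments, so batched writes that would otherwise be sequential are scattered over many files and effectively become random writes, and random I/O hurts performance badly. For this reason a Kafka cluster should not carry too many partitions.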
Push vs. Pull
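As a messaging system, Kafka follows the traditional design: producers push messages to the broker, and consumers pull messages from the broker. Push and pull each have advantages and drawbacks.
The push model copes poorly with consumers that process at different rates, because the delivery rate is decided by the broker. Push aims to deliver messages as fast as possible, which easily overwhelms a consumer that cannot keep up; the typical symptoms are denial of service and network congestion. With the pull model, each consumer fetches messages at a rate that matches its own processing capacity.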
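The difference can be illustrated with a minimal in-memory sketch. The `Broker` class and its `poll(maxRecords)` method below are hypothetical stand-ins, not Kafka APIs; the point is only that in a pull model the caller chooses how many records it takes per request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PullSketch {
    /** Hypothetical stand-in for a broker-side partition: an in-memory queue. */
    static final class Broker {
        private final Queue<String> log = new ArrayDeque<>();

        void publish(String message) {
            log.add(message);
        }

        /** Returns at most maxRecords messages; the caller decides how many it can handle. */
        List<String> poll(int maxRecords) {
            List<String> batch = new ArrayList<>();
            while (batch.size() < maxRecords && !log.isEmpty()) {
                batch.add(log.remove());
            }
            return batch;
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        for (int i = 0; i < 10; i++) {
            broker.publish("message-" + i);
        }
        // A slow consumer asks for 2 records per poll, a faster one for 5.
        System.out.println(broker.poll(2));
        System.out.println(broker.poll(5));
    }
}
```

A consumer that falls behind simply polls less often or asks for fewer records; nothing is forced onto it.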
Topic & Partition
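Logically, a topic can be thought of as a queue. Every message must specify its topic, which simply means stating which queue the message goes into. So that Kafka's throughput can scale horizontally, a topic is physically split into one or more partitions. Each partition corresponds to a directory on disk, and that directory stores all of the partition's message files and index files.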
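As a small illustration of the on-disk layout: Kafka names each partition directory `<topic>-<partition>` under the broker's log.dirs, and the segment files inside carry extensions such as .log and .index. The helper below is a hypothetical sketch that only builds the directory names; it does not touch the file system.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionDirs {
    /** Builds the directory names "<topic>-<partition>" for every partition of a topic. */
    static List<String> partitionDirNames(String topic, int numPartitions) {
        List<String> names = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            names.add(topic + "-" + p);
        }
        return names;
    }

    public static void main(String[] args) {
        // For the topic used in this article: [test1-0, test1-1, test1-2]
        System.out.println(partitionDirNames("test1", 3));
    }
}
```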

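Development setup

Creating the topic
Use kafka-topics.sh to create a topic named test1 with 3 partitions. The replication factor of 3 used here requires at least three brokers.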

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic test1

Maven dependencies
The development language here is Java and the build tool is Maven.
The Maven dependencies are:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>1.0.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>1.0.0</version>
</dependency>

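Partitioning strategy

If the message key is null, the producer's default partitioner (in the 1.0.0 client used here) spreads messages over the available partitions of the topic in round-robin order, starting from a random position.
If the key is not null and the default partitioner is used, Kafka hashes the key with its own hash algorithm (murmur2) and takes the hash modulo the number of partitions to choose the partition. Note: messages with the same key therefore go to the same partition, as long as the number of partitions does not change.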
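The keyed case can be sketched in plain Java. This is an illustration under stated assumptions, not Kafka's implementation: `Arrays.hashCode` stands in for murmur2, so the partition numbers will differ from what a real producer computes. `toPositive` mirrors the sign-bit masking that Kafka's `Utils.toPositive` performs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyedPartitionSketch {
    /** Clears the sign bit so the value is never negative. */
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    /** hash(key) mod numPartitions; Arrays.hashCode is a stand-in hash, NOT murmur2. */
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return toPositive(Arrays.hashCode(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition while the count stays at 3.
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
        // With a different partition count the same key may map elsewhere.
        System.out.println("order-1 with 4 partitions -> partition " + partitionFor("order-1", 4));
    }
}
```

Because the result depends on the partition count, adding partitions to a topic can move existing keys to other partitions.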

Default partitioning strategy

org.apache.kafka.clients.producer.internals.DefaultPartitioner
public class DefaultPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap();

    public DefaultPartitioner() {
    }

    public void configure(Map<String, ?> configs) {
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = this.nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return ((PartitionInfo)availablePartitions.get(part)).partition();
            } else {
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }

        return counter.getAndIncrement();
    }

    public void close() {
    }
}

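Custom partitioner: here we require that the key is never null. In a real project, messages with a null key could instead all be sent to one fixed partition.

import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
// Package as of kafka-clients 1.0.0
import org.apache.kafka.common.record.InvalidRecordException;
import org.apache.kafka.common.utils.Utils;

/**
 * Hash partitioning
 * @author yangyanping 
 * @date 2020-12-18
 */
public class HashPartitioner implements Partitioner {
    /**
     * Computes the partition
     *
     * @param topic      The topic name
     * @param key        The key to partition on (or null if no key)
     * @param keyBytes   serialized key to partition on (or null if no key)
     * @param value      The value to partition on or null
     * @param valueBytes serialized value to partition on or null
     * @param cluster    The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitionInfos = cluster.partitionsForTopic(topic);
        int numPartitions = partitionInfos.size();

        // We partition by key, so a null key is rejected here.
        // In a real project, null-key messages could all go to one fixed partition instead.
        if (keyBytes == null) {
            throw new InvalidRecordException("key cannot be null");
        }
        // Demo rule: the key "1" is pinned to partition 1 (only valid with at least 2 partitions).
        if ("1".equals(key) && numPartitions > 1) {
            return 1;
        }
        // Any other key: hash the key and take it modulo the partition count.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public void close() {
    }

    public void configure(Map<String, ?> map) {
    }
}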

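import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

/**
 * Round-robin partitioning
 * @author yangyanping
 * @date 2020-12-18
 */
public class RoundRobinPartitioner implements Partitioner {

    // Shared counter; each call takes the next value.
    private static final AtomicLong next = new AtomicLong(0);

    /*
     * @param topic      The topic name
     * @param key        The key to partition on (or null if no key)
     * @param keyBytes   serialized key to partition on (or null if no key)
     * @param value      The value to partition on or null
     * @param valueBytes serialized value to partition on or null
     * @param cluster    The current cluster metadata
     */
    public int partition(String topicName, Object o, byte[] bytes, Object o1, byte[] bytes1, Cluster cluster) {
        long nextIndex = next.incrementAndGet();
        List<PartitionInfo> partitionInfos = cluster.partitionsForTopic(topicName);
        int partitionCount = partitionInfos.size();
        // Take the remainder on the long first, then narrow to int.
        // "(int) nextIndex % partitionCount" would cast before the modulo and can go negative.
        return (int) (nextIndex % partitionCount);
    }

    public void close() {

    }

    public void configure(Map<String, ?> map) {

    }
}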
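One pitfall with counter-based round-robin is operator precedence: in Java a cast binds tighter than `%`, so `(int) counter % n` narrows the long first and can produce a negative partition once the counter exceeds Integer.MAX_VALUE. The stand-alone sketch below (class and method names are illustrative, not part of Kafka) shows the cycle and the difference between the two expressions.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinSketch {
    private final AtomicLong next = new AtomicLong(0);

    /** Picks partitions 0, 1, ..., n-1, 0, 1, ... in turn. */
    int nextPartition(int partitionCount) {
        long index = next.getAndIncrement();
        // floorMod keeps the result in [0, partitionCount) even if the counter wraps negative.
        return (int) Math.floorMod(index, (long) partitionCount);
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 7; i++) {
            sb.append(rr.nextPartition(3)).append(' ');
        }
        System.out.println(sb.toString().trim()); // 0 1 2 0 1 2 0

        long big = 2147483648L; // 2^31, one past Integer.MAX_VALUE
        System.out.println((int) big % 3);   // -2: the cast happens before the modulo
        System.out.println((int) (big % 3)); // 2: modulo on the long, then the cast
    }
}
```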

Set the partitioner class in the producer configuration:

// Use the custom partitioner
kafkaProps.put("partitioner.class", "com.study.kafka.HashPartitioner");
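For context, here is a minimal sketch of how that setting sits alongside the other producer properties. It only builds a java.util.Properties object and does not connect to Kafka; the helper name `producerProps` is hypothetical, while the property keys are the standard producer configuration names.

```java
import java.util.Properties;

public class PartitionerConfigSketch {
    /** Builds producer properties that select a custom partitioner by class name. */
    static Properties producerProps(String bootstrapServers, String partitionerClass) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Fully qualified name of a class implementing the Partitioner interface.
        props.put("partitioner.class", partitionerClass);
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("127.0.0.1:9092", "com.study.kafka.HashPartitioner");
        System.out.println(props.getProperty("partitioner.class"));
    }
}
```

The resulting Properties object would then be passed to the KafkaProducer constructor, as in the producer code later in this article.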

Kafka Producer

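Before writing the producer, here is a brief description of the main Kafka producer settings:

bootstrap.servers: the addresses of the Kafka brokers.
acks: the acknowledgement mode. The default is 1 in the 1.0.0 client used here (all since Kafka 3.0).
acks=0: the producer does not wait for any response from Kafka.
acks=1: the leader writes the message to its local log and responds without waiting for the followers.
acks=all: the leader waits until all in-sync replicas have the message. The message is not lost as long as at least one in-sync replica stays alive. This is the strongest durability guarantee.
retries: when set to a value greater than 0, the client resends a message whose send failed.
batch.size: when several messages are headed for the same partition, the producer tries to batch them into fewer requests, which improves efficiency on both the client and the server. The value is the batch size in bytes.
key.serializer: the serializer class for keys, for example org.apache.kafka.common.serialization.StringSerializer. There is no default.
value.serializer: the serializer class for values, for example org.apache.kafka.common.serialization.StringSerializer. There is no default.

There are many more settings; see the official documentation for the rest.
Our producer configuration is as follows (for the cluster setup, see "Kafka Architecture, Part 1: Getting Started"):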

        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9093,127.0.0.1:9094");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 200);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<String, String>(props);

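With the producer written, let's produce some messages.
The message sent here is built as follows (the text reads "Hello, this is record i"):

String message = "你好，这是第" + i + "条数据";
Only 10 messages are sent before the program exits. The output is: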

发送的信息:你好，这是第0条数据
发送的信息:你好，这是第1条数据
发送的信息:你好，这是第2条数据
发送的信息:你好，这是第3条数据
发送的信息:你好，这是第4条数据
发送的信息:你好，这是第5条数据
发送的信息:你好，这是第6条数据
发送的信息:你好，这是第7条数据
发送的信息:你好，这是第8条数据
发送的信息:你好，这是第9条数据

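The messages were printed successfully.
To check that the messages were actually sent, and that their contents are correct, without writing a consumer program, you can inspect the topic with the command-line tools on the Kafka server.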

Kafka Consumer

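Consumption is arguably the most important part, since most of the time we use Kafka to consume data.

The consumer settings are:

bootstrap.servers: the addresses of the Kafka brokers.
group.id: the consumer group name. Different groups consume the same data independently. For example, if group A has already consumed 1000 records and you want to consume those same 1000 records again without producing them again, you only need to change the group name.
enable.auto.commit: whether offsets are committed automatically. The default is true.
auto.commit.interval.ms: how often, in milliseconds, offsets are committed automatically when enable.auto.commit is true.
session.timeout.ms: the session timeout. If no heartbeat arrives from a consumer within this time, the consumer is considered dead and its partitions are reassigned.
max.poll.records: the maximum number of records returned by a single poll.
auto.offset.reset: what to do when a partition has no committed offset. The default is latest.
earliest: if a partition has a committed offset, consume from that offset; if not, consume from the beginning.
latest: if a partition has a committed offset, consume from that offset; if not, consume only data produced to that partition from now on.
none: if every partition of the topic has a committed offset, consume from those offsets; if any partition has none, throw an exception.
key.deserializer: the deserializer class for keys, for example org.apache.kafka.common.serialization.StringDeserializer. There is no default.
value.deserializer: the deserializer class for values, for example org.apache.kafka.common.serialization.StringDeserializer. There is no default.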
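The three auto.offset.reset rules amount to a small decision function. The sketch below is a simplified illustration with hypothetical names (`startOffset`, `ResetPolicy`); it ignores the case where a committed offset is out of range, which the real client also handles through the same setting.

```java
import java.util.OptionalLong;

public class OffsetResetSketch {
    enum ResetPolicy { EARLIEST, LATEST, NONE }

    /**
     * Decides where a consumer starts reading one partition.
     * logStart is the first offset still in the log; logEnd is the offset of the next record to be written.
     */
    static long startOffset(OptionalLong committed, ResetPolicy policy, long logStart, long logEnd) {
        if (committed.isPresent()) {
            return committed.getAsLong(); // a committed offset always wins
        }
        switch (policy) {
            case EARLIEST:
                return logStart; // read everything from the beginning
            case LATEST:
                return logEnd;   // read only records produced from now on
            default:
                throw new IllegalStateException("no committed offset and auto.offset.reset=none");
        }
    }

    public static void main(String[] args) {
        System.out.println(startOffset(OptionalLong.of(42), ResetPolicy.LATEST, 0, 100));  // 42
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.EARLIEST, 0, 100)); // 0
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.LATEST, 0, 100));   // 100
    }
}
```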

Our consumer configuration is as follows:

        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9093,127.0.0.1:9094");
        props.put("group.id", "GroupA");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<String, String>(props);
        this.topic = topicName;
        this.consumer.subscribe(Arrays.asList(topic));

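Because auto-commit is enabled here, the consuming code is simple.
First subscribe to a topic, that is, specify which topic to consume:

consumer.subscribe(Arrays.asList(topic));

After subscribing, pull data from Kafka:

ConsumerRecords<String, String> msgList=consumer.poll(100);

Consumers usually run as long-lived listeners. Here a for(;;) loop does the listening and keeps polling until the process is stopped.
The output is: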

---------开始消费---------
1=======receive: key = null, value = 你好，这是第1条数据 offset===0 partition==0
2=======receive: key = null, value = 你好，这是第4条数据 offset===1 partition==0
3=======receive: key = null, value = 你好，这是第7条数据 offset===2 partition==0
4=======receive: key = null, value = 你好，这是第0条数据 offset===0 partition==2
5=======receive: key = null, value = 你好，这是第3条数据 offset===1 partition==2
6=======receive: key = null, value = 你好，这是第6条数据 offset===2 partition==2
7=======receive: key = null, value = 你好，这是第9条数据 offset===3 partition==2
8=======receive: key = null, value = 你好，这是第2条数据 offset===0 partition==1
9=======receive: key = null, value = 你好，这是第5条数据 offset===1 partition==1
10=======receive: key = null, value = 你好，这是第8条数据 offset===2 partition==1

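The produced data was consumed successfully.

Consumer code

We write four consumer classes: ConsumerA, ConsumerB, ConsumerC and ConsumerD. The four classes are identical except for the class name.
The code of ConsumerA is: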

package com.study.kafka;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;

public class ConsumerA implements Runnable {
    private final KafkaConsumer<String, String> consumer;
    private ConsumerRecords<String, String> msgList;
    private final String topic;
    private static final String GROUPID = "GroupA";

    public ConsumerA(String topicName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9093");
        props.put("group.id", GROUPID);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<String, String>(props);
        this.topic = topicName;
        this.consumer.subscribe(Arrays.asList(topic));
    }

    public void run() {
        int messageNo = 1;
        System.out.println("---------开始消费---------");
        try {
            for (; ; ) {
                // Wait up to 1000 ms for new records
                msgList = consumer.poll(1000);
                if (null != msgList && msgList.count() > 0) {
                    for (ConsumerRecord<String, String> record : msgList) {
                        // Print every record; the order across partitions is not guaranteed
                        System.out.println(messageNo + "=======receive: key = " + record.key() + ", value = " + record.value() + " offset===" + record.offset() + " partition==" + record.partition());
                        messageNo++;
                    }
                } else {
                    Thread.sleep(1000);
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }

    public static void main(String args[]) {
        ConsumerA test1 = new ConsumerA("test1");
        Thread thread1 = new Thread(test1);
        thread1.start();
    }
}

Producer code

package com.study.kafka;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KafkaProducerTest implements Runnable {

    private final KafkaProducer<String, String> producer;
    private final String topic;

    public KafkaProducerTest(String topicName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9093,127.0.0.1:9094");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<String, String>(props);
        this.topic = topicName;
    }

    public void run() {

        try {
            for (int i = 0; i < 10; i++) {
                String messageStr = "你好，这是第" + i + "条数据";
                producer.send(new ProducerRecord<String, String>(topic,  messageStr));
                // Print each record as it is sent
                System.out.println("发送的信息:" + messageStr);
                Thread.sleep(100);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }

    public static void main(String args[]) {
        KafkaProducerTest test = new KafkaProducerTest("test1");
        Thread thread = new Thread(test);
        thread.start();
    }
}

Partition-to-consumer assignment

First start the producer KafkaProducerTest to produce 10 records. The output is:

发送的信息:你好，这是第0条数据
发送的信息:你好，这是第1条数据
发送的信息:你好，这是第2条数据
发送的信息:你好，这是第3条数据
发送的信息:你好，这是第4条数据
发送的信息:你好，这是第5条数据
发送的信息:你好，这是第6条数据
发送的信息:你好，这是第7条数据
发送的信息:你好，这是第8条数据
发送的信息:你好，这是第9条数据

Start only ConsumerA. Its output is below: ConsumerA alone consumed all 10 records from all three partitions.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第1条数据 offset===0 partition==0
2=======receive: key = null, value = 你好，这是第4条数据 offset===1 partition==0
3=======receive: key = null, value = 你好，这是第7条数据 offset===2 partition==0
4=======receive: key = null, value = 你好，这是第0条数据 offset===0 partition==2
5=======receive: key = null, value = 你好，这是第3条数据 offset===1 partition==2
6=======receive: key = null, value = 你好，这是第6条数据 offset===2 partition==2
7=======receive: key = null, value = 你好，这是第9条数据 offset===3 partition==2
8=======receive: key = null, value = 你好，这是第2条数据 offset===0 partition==1
9=======receive: key = null, value = 你好，这是第5条数据 offset===1 partition==1
10=======receive: key = null, value = 你好，这是第8条数据 offset===2 partition==1

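The relationship between the consumer and the partitions is shown in the figure:

Start two consumers: ConsumerA and ConsumerB
ConsumerA's output is below. ConsumerA consumed partition_0 and partition_1, 7 records in total.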

---------开始消费---------
1=======receive: key = null, value = 你好，这是第1条数据 offset===3 partition==0
2=======receive: key = null, value = 你好，这是第4条数据 offset===4 partition==0
3=======receive: key = null, value = 你好，这是第7条数据 offset===5 partition==0
4=======receive: key = null, value = 你好，这是第0条数据 offset===3 partition==1
5=======receive: key = null, value = 你好，这是第3条数据 offset===4 partition==1
6=======receive: key = null, value = 你好，这是第6条数据 offset===5 partition==1
7=======receive: key = null, value = 你好，这是第9条数据 offset===6 partition==1

ConsumerB's output is below. ConsumerB consumed partition_2, 3 records in total.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第2条数据 offset===4 partition==2
2=======receive: key = null, value = 你好，这是第5条数据 offset===5 partition==2
3=======receive: key = null, value = 你好，这是第8条数据 offset===6 partition==2

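The output shows that ConsumerA and ConsumerB together consumed all the messages.
The relationship between the consumers and the partitions is shown in the figure:

Start three consumers: ConsumerA, ConsumerB and ConsumerC
ConsumerA's output is below. ConsumerA consumed partition_0, 4 records in total.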

---------开始消费---------
1=======receive: key = null, value = 你好，这是第0条数据 offset===6 partition==0
2=======receive: key = null, value = 你好，这是第3条数据 offset===7 partition==0
3=======receive: key = null, value = 你好，这是第6条数据 offset===8 partition==0
4=======receive: key = null, value = 你好，这是第9条数据 offset===9 partition==0

ConsumerB's output is below. ConsumerB consumed partition_1, 3 records in total.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第2条数据 offset===7 partition==1
2=======receive: key = null, value = 你好，这是第5条数据 offset===8 partition==1
3=======receive: key = null, value = 你好，这是第8条数据 offset===9 partition==1

ConsumerC's output is below. ConsumerC consumed partition_2, 3 records in total.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第1条数据 offset===7 partition==2
2=======receive: key = null, value = 你好，这是第4条数据 offset===8 partition==2
3=======receive: key = null, value = 你好，这是第7条数据 offset===9 partition==2

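The output shows that ConsumerA, ConsumerB and ConsumerC together consumed all the messages. The relationship between the consumers and the partitions is shown in the figure:

Start four consumers: ConsumerA, ConsumerB, ConsumerC and ConsumerD
ConsumerA's output is below. ConsumerA consumed partition_1, 4 records in total.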

---------开始消费---------
1=======receive: key = null, value = 你好，这是第0条数据 offset===10 partition==1
2=======receive: key = null, value = 你好，这是第3条数据 offset===11 partition==1
3=======receive: key = null, value = 你好，这是第6条数据 offset===12 partition==1
4=======receive: key = null, value = 你好，这是第9条数据 offset===13 partition==1

ConsumerB's output is below. ConsumerB consumed partition_2, 3 records in total.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第2条数据 offset===10 partition==2
2=======receive: key = null, value = 你好，这是第5条数据 offset===11 partition==2
3=======receive: key = null, value = 你好，这是第8条数据 offset===12 partition==2

ConsumerC's output is below. ConsumerC consumed partition_0, 3 records in total.

---------开始消费---------
1=======receive: key = null, value = 你好，这是第1条数据 offset===10 partition==0
2=======receive: key = null, value = 你好，这是第4条数据 offset===11 partition==0
3=======receive: key = null, value = 你好，这是第7条数据 offset===12 partition==0

ConsumerD did not consume any data.

---------开始消费---------

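Conclusion
A topic can be configured with several partitions, and the messages sent by producers are distributed across them. Consumers receive data as a group: within one consumer group, Kafka guarantees that each partition is consumed by exactly one consumer. When a group has more consumers than the topic has partitions, the extra consumers stay idle, as ConsumerD did above. To consume the same messages again, use a different consumer group.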
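The assignment pattern seen in the experiments can be reproduced with a simplified, range-style assignment for a single topic. This is a sketch under stated assumptions, not Kafka's assignor: the `assign` method is hypothetical and takes consumers in list order, whereas Kafka orders group members by their member IDs, which is why the concrete partition each consumer received in the runs above differs from this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignSketch {
    /** Splits partitions 0..numPartitions-1 into contiguous ranges, one range per consumer. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perConsumer = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size(); // the first "extra" consumers get one more
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("ConsumerA", "ConsumerB", "ConsumerC", "ConsumerD");
        for (int n = 1; n <= all.size(); n++) {
            System.out.println(n + " consumer(s): " + assign(all.subList(0, n), 3));
        }
    }
}
```

With three partitions, the fourth consumer ends up with an empty list, matching the idle ConsumerD above.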

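Ordered consumption

Kafka offers stronger ordering guarantees than traditional messaging systems. It uses the topic partition as the unit of parallelism for message processing. The partition is the smallest granularity: each partition is assigned to exactly one consumer in a consumer group, so that consumer is the only reader of the partition. As long as the messages within a partition are ordered, the order in which the consumer processes them is guaranteed. A topic has multiple partitions and different consumers handle different partitions, so Kafka provides both ordering (within each partition, not across the partitions of a topic) and load balancing across consumers.
Sending with producer.send(new ProducerRecord<String, String>(topic, "test", messageStr)) gives every message the same key, so all messages go to the same partition.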

public class KafkaProducerTest implements Runnable {

    private final KafkaProducer<String, String> producer;
    private final String topic;

    public KafkaProducerTest(String topicName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9093,127.0.0.1:9094");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<String, String>(props);
        this.topic = topicName;
    }

    public void run() {

        try {
            for (int i = 1; i <= 10; i++) {
                String messageStr = "你好，这是第" + i + "条数据";
                producer.send(new ProducerRecord<String, String>(topic,"test", messageStr));
                // Print each record as it is sent
                System.out.println("发送的信息:" + messageStr);
                Thread.sleep(100);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }

    public static void main(String args[]) {
        KafkaProducerTest test = new KafkaProducerTest("test1");
        Thread thread = new Thread(test);
        thread.start();
    }
}

发送的信息:你好，这是第1条数据
发送的信息:你好，这是第2条数据
发送的信息:你好，这是第3条数据
发送的信息:你好，这是第4条数据
发送的信息:你好，这是第5条数据
发送的信息:你好，这是第6条数据
发送的信息:你好，这是第7条数据
发送的信息:你好，这是第8条数据
发送的信息:你好，这是第9条数据
发送的信息:你好，这是第10条数据

ConsumerA and ConsumerB did not consume any messages.

---------开始消费---------

ConsumerC consumed all 10 messages; all 10 were sent to partition 2.

---------开始消费---------
1=======receive: key = test, value = 你好，这是第1条数据 offset===333 partition==2
2=======receive: key = test, value = 你好，这是第2条数据 offset===334 partition==2
3=======receive: key = test, value = 你好，这是第3条数据 offset===335 partition==2
4=======receive: key = test, value = 你好，这是第4条数据 offset===336 partition==2
5=======receive: key = test, value = 你好，这是第5条数据 offset===337 partition==2
6=======receive: key = test, value = 你好，这是第6条数据 offset===338 partition==2
7=======receive: key = test, value = 你好，这是第7条数据 offset===339 partition==2
8=======receive: key = test, value = 你好，这是第8条数据 offset===340 partition==2
9=======receive: key = test, value = 你好，这是第9条数据 offset===341 partition==2
10=======receive: key = test, value = 你好，这是第10条数据 offset===342 partition==2
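Why a shared key preserves order can be seen in a small in-memory model. The sketch below is illustrative only: `OrderingSketch` is a hypothetical class, and `String.hashCode` stands in for Kafka's murmur2 hash. Each partition is an append-only list, so records with the same key land in one list and receive consecutive offsets in send order.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderingSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    OrderingSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends to the partition chosen from the key; returns {partition, offset}. */
    int[] append(String key, String value) {
        int partition = (key.hashCode() & 0x7fffffff) % partitions.size(); // stand-in hash
        List<String> log = partitions.get(partition);
        log.add(value);
        return new int[] {partition, log.size() - 1};
    }

    /** Returns the records of one partition in offset order. */
    List<String> read(int partition) {
        return partitions.get(partition);
    }

    public static void main(String[] args) {
        OrderingSketch log = new OrderingSketch(3);
        for (int i = 1; i <= 5; i++) {
            int[] where = log.append("test", "record-" + i);
            System.out.println("record-" + i + " -> partition " + where[0] + ", offset " + where[1]);
        }
    }
}
```

Records sent with different keys may land in different partitions, and no order is defined between those.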

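Scaling out a Kafka cluster

On the new machine, edit config/server.properties: broker.id must be unique within the cluster, the other required settings must be adjusted, and zookeeper.connect must point to the ZooKeeper ensemble that the Kafka cluster currently uses. Then start Kafka and the broker joins the cluster. However, the new broker only serves topics created after it joined. Until existing topics are explicitly reassigned, it hosts none of their partitions and therefore takes no load off the cluster.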

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=3
listeners=PLAINTEXT://:9095
log.dirs=/Users/yangyanping/Downloads/server/kafka/log/node4
offsets.topic.replication.factor=3

Create node4 under the log directory, then start the new broker:

****-2f32839f6:node4 yangyanping$ ./bin/kafka-server-start.sh config/server.properties

Use zkCli.sh to inspect the registered brokers; broker.id=3 has been added to the cluster.

[zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
[0, 1, 2, 3]

Describe the test1 topic: the new broker (broker.id=3) is not hosting any of its partitions.

ZBMAC-2f32839f6:node4 yangyanping$ ./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test1
Topic: test1	PartitionCount: 3	ReplicationFactor: 3	Configs: 
	Topic: test1	Partition: 0	Leader: 0	Replicas: 0,1,2	Isr: 0,1,2
	Topic: test1	Partition: 1	Leader: 1	Replicas: 1,2,0	Isr: 1,2,0
	Topic: test1	Partition: 2	Leader: 2	Replicas: 2,0,1	Isr: 2,0,1

Scaling in a Kafka cluster

To simulate a broker failure, find the process ID of the third broker (broker.id=2), which is 5393 here, and kill it:

***-2f32839f6:~ yangyanping$ ps -e |grep kafka
***-2f32839f6:~ yangyanping$ kill -9 5393

Describe the test1 topic again: broker 2 no longer leads any partition and has dropped out of every ISR.

****-2f32839f6:node4 yangyanping$ ./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test1
    Topic: test1	PartitionCount: 3	ReplicationFactor: 3	Configs: 
	Topic: test1	Partition: 0	Leader: 0	Replicas: 0,1,2	Isr: 0,1
	Topic: test1	Partition: 1	Leader: 1	Replicas: 1,2,0	Isr: 1,0
	Topic: test1	Partition: 2	Leader: 0	Replicas: 2,0,1	Isr: 0,1
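The new leaders in this output are consistent with a simple rule: pick the first replica, in assignment order, that is still in the ISR. The sketch below applies that rule to the three partitions shown. It is a simplified illustration with a hypothetical `electLeader` method, not the controller's actual code, and it ignores unclean leader election.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class LeaderElectionSketch {
    /** Returns the first replica in assignment order that is still in the ISR, if any. */
    static OptionalInt electLeader(List<Integer> replicas, List<Integer> isr) {
        for (int replica : replicas) {
            if (isr.contains(replica)) {
                return OptionalInt.of(replica);
            }
        }
        return OptionalInt.empty(); // no in-sync replica left: the partition is offline
    }

    public static void main(String[] args) {
        // Replica lists and ISRs taken from the describe output after broker 2 was killed.
        System.out.println("partition 0 -> " + electLeader(Arrays.asList(0, 1, 2), Arrays.asList(0, 1)));
        System.out.println("partition 1 -> " + electLeader(Arrays.asList(1, 2, 0), Arrays.asList(1, 0)));
        System.out.println("partition 2 -> " + electLeader(Arrays.asList(2, 0, 1), Arrays.asList(0, 1)));
    }
}
```

For partition 2 the first replica (broker 2) is gone, so the next one in the list, broker 0, becomes leader, as in the output above.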

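Problems when different topics share the same Group ID in Kafka

In Kafka, a consumer group is a set of consumers that share one group.id and jointly consume the topics they subscribe to. If consumers of different topics use the same Group ID, Kafka treats them all as members of one group. Each consumer is still assigned partitions only from the topics it subscribed to, but membership is managed for the group as a whole: whenever any member joins, leaves or times out, the entire group rebalances, so consumers of unrelated topics are interrupted and reassigned as well. Frequent rebalances of this kind can cause duplicate consumption (records whose offsets were not yet committed when a partition was revoked are consumed again) and uneven or stalled consumption, which lowers the throughput of the whole group. In practice, therefore, give the consumers of each topic their own Group ID so that every topic is consumed completely and independently.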

