In the distributed messaging system Kafka, the message partitioning strategy is critical to balanced data distribution and system scalability. This article works through a concrete example of how Kafka uses a message key to decide the target partition, and how partitions are assigned when no key is present.
Kafka Partition Basics
In Kafka, a topic can be split into multiple partitions, and each partition is backed by one or more replicas to provide high availability. When a producer sends a message, it can set a key and a value. If no partition is specified explicitly, the key can be used to decide which partition the message goes to, and all messages with the same key end up in the same partition.
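Concretely, for a non-null key the Java client's default partitioner hashes the serialized key bytes with murmur2 and takes the result modulo the partition count. The snippet below is a small illustration of that formula rather than the producer's internal code; it assumes keys are serialized with StringSerializer (UTF-8), as in the examples that follow:

import org.apache.kafka.common.utils.Utils;
import java.nio.charset.StandardCharsets;

public class KeyPartitionDemo {
    // Same formula the Java client applies to non-null keys:
    // partition = toPositive(murmur2(keyBytes)) % numPartitions
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // With 4 partitions, a given key always maps to the same partition id.
        for (String key : new String[]{"key-0", "key-1", "key-2", "key-3"}) {
            System.out.printf("key = %s -> partition %d%n", key, partitionForKey(key, 4));
        }
    }
}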
Creating the Topic
First, we create a topic with a given number of partitions. The following Java example uses the Kafka Admin API to create it:
package com.logicbig.example;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.stream.Collectors;

public class TopicCreator {

    public static void main(String[] args) throws Exception {
        createTopic("example-topic-2020-4-21", 4);
    }

    private static void createTopic(String topicName, int numPartitions) throws Exception {
        Properties config = new Properties();
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, ExampleHelper.BROKERS);
        AdminClient admin = AdminClient.create(config);

        boolean alreadyExists = admin.listTopics().names().get().stream()
                .anyMatch(existingTopicName -> existingTopicName.equals(topicName));
        if (alreadyExists) {
            System.out.printf("topic already exists: %s%n", topicName);
        } else {
            System.out.printf("creating topic: %s%n", topicName);
            NewTopic newTopic = new NewTopic(topicName, numPartitions, (short) 1);
            admin.createTopics(Collections.singleton(newTopic)).all().get();
        }

        System.out.println("-- describing topic --");
        admin.describeTopics(Collections.singleton(topicName)).all().get()
             .forEach((topic, desc) -> {
                 System.out.println("Topic: " + topic);
                 System.out.printf("Partitions: %s, partition ids: %s%n", desc.partitions().size(),
                         desc.partitions()
                             .stream()
                             .map(p -> Integer.toString(p.partition()))
                             .collect(Collectors.joining(",")));
             });
        admin.close();
    }
}
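Both code examples reference an ExampleHelper class that is not shown in the original. A minimal sketch, assuming a single local broker at localhost:9092, String (de)serializers, manual offset commits, and the topic name doubling as the consumer group id, might look like this:

package com.logicbig.example;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ExampleHelper {
    // assumption: a single broker running locally
    public static final String BROKERS = "localhost:9092";

    public static Properties getProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKERS);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return props;
    }

    public static Properties getConsumerProps(String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKERS);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // offsets are committed manually via commitSync() in the consumer loop
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        return props;
    }
}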
Partition Assignment Using Keys
When a message is sent without an explicit partition, Kafka uses its key to pick the partition. The example below sends keyed messages from one thread and consumes them on another, printing the partition id each record landed on:
package com.logicbig.example;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionAssignmentExample {
    private static final int PARTITION_COUNT = 4;
    private static final String TOPIC_NAME = "example-topic-2020-4-21";
    private static final int MSG_COUNT = 4;

    public static void main(String[] args) throws Exception {
        ExecutorService executorService = Executors.newFixedThreadPool(2);
        executorService.execute(PartitionAssignmentExample::startConsumer);
        executorService.execute(PartitionAssignmentExample::sendMessages);
        executorService.shutdown();
        executorService.awaitTermination(10, TimeUnit.MINUTES);
    }

    private static void startConsumer() {
        Properties consumerProps = ExampleHelper.getConsumerProps(TOPIC_NAME);
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
        consumer.subscribe(Collections.singleton(TOPIC_NAME));
        int numMsgReceived = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
            for (ConsumerRecord<String, String> record : records) {
                numMsgReceived++;
                System.out.printf("consumed: key = %s, value = %s, partition id = %s, offset = %s%n",
                        record.key(), record.value(), record.partition(), record.offset());
            }
            consumer.commitSync();
            // stop once every message produced by sendMessages() has been consumed
            if (numMsgReceived == MSG_COUNT * PARTITION_COUNT) {
                break;
            }
        }
        consumer.close();
    }

    private static void sendMessages() {
        Properties producerProps = ExampleHelper.getProducerProps();
        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
        for (int i = 0; i < MSG_COUNT; i++) {
            // send each key several times; no partition is specified, so the
            // partition is derived from the key and is the same for every repeat
            for (int partitionId = 0; partitionId < PARTITION_COUNT; partitionId++) {
                String value = "message-" + i;
                String key = "key-" + i;
                System.out.printf("Sending message topic: %s, key: %s, value: %s, partition id: %s%n",
                        TOPIC_NAME, key, value, "not-specified");
                producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
            }
        }
        // flush any pending records and release the producer's resources
        producer.close();
    }
}
Summary
As the example shows, key-based partition assignment guarantees that messages with the same key are sent to the same partition, which matters whenever per-key message ordering must be preserved. When a message has no key, Kafka spreads messages across the available partitions to balance load: older clients round-robin across partitions, while the default partitioner since Kafka 2.4 uses sticky partitioning, filling a batch for one partition before moving to another.
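To try the no-key case, a record can be created without a key so that the producer's default partitioner chooses the partition. The sketch below is illustrative only: the class name NoKeyExample is made up, it reuses the ExampleHelper sketch above, and the actual spread across partitions depends on the client's default partitioner.

package com.logicbig.example;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class NoKeyExample {
    public static void main(String[] args) throws Exception {
        KafkaProducer<String, String> producer =
                new KafkaProducer<>(ExampleHelper.getProducerProps());
        for (int i = 0; i < 8; i++) {
            // no key: the default partitioner picks the partition
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("example-topic-2020-4-21", "no-key-message-" + i);
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("sent value = %s to partition %d%n",
                    record.value(), metadata.partition());
        }
        producer.close();
    }
}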
Tech Stack
- Kafka version: Apache Kafka 2.4.1
- Java version: JDK 8
- Build tool: Maven 3.5.4
We hope this article helps you better understand Kafka's partitioning strategy and message delivery mechanics. If you have any questions or need further help, feel free to reach out.