Kafka reference material
- Official Chinese documentation: http://kafka.apachecn.org/
- Kafka Java API docs: http://kafka.apachecn.org/10/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
First, the core concepts:
- producer: the party that sends data.
- consumer: the party that consumes data.
- consumer group: a group of consumers. Once one consumer in the group has consumed a given message from a partition, no other consumer in that group will consume it again. Every consumer must specify a consumer group.
- partition: what lets Kafka scale horizontally. A topic can have multiple partitions; when a topic is created, Kafka spreads the partitions evenly across the brokers using an internal load-balancing algorithm. Partitions raise the system's throughput, and Kafka guarantees ordering only within a single partition.
- replica: what gives Kafka its fault tolerance. If a broker goes down, other brokers still hold copies of its data. Replication is defined per partition of a topic: a replication factor of n means each partition has n replicas in total.
- offset: the position within a partition from which consumption starts; it records which message to read next.
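To make partition and replica concrete, here is a small sketch using the AdminClient that ships in the same kafka-clients jar. The topic name and broker address are illustrative assumptions, not from these notes; the sketch creates a topic with 3 partitions, each replicated onto 2 brokers:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        // 3 partitions spread across the brokers; replication factor 2 means
        // every partition lives on 2 brokers, so one broker can fail safely
        NewTopic topic = new NewTopic("demo-topic", 3, (short) 2);

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```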
Kafka client jar:

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
</dependency>
```
Producer API

Class KafkaProducer<K,V>

A producer is created from a Properties object holding the Kafka configuration:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");             // wait for all in-sync replicas to acknowledge
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
producer.close();
```
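The loop above is fire-and-forget. `send()` also returns a `Future<RecordMetadata>` and accepts a `Callback`, which is how you find out which partition and offset a record actually landed on. A sketch reusing the `producer` defined above (topic and key are illustrative):

```java
producer.send(new ProducerRecord<>("my-topic", "key", "value"),
        (metadata, exception) -> {
            if (exception != null) {
                // the send failed after any configured retries
                exception.printStackTrace();
            } else {
                // metadata tells us where the record was written
                System.out.printf("partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
            }
        });
```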
A note on the ProducerRecord class:

When a message is sent to a topic, it is stored in one of the topic's partitions, so how does Kafka decide which partition a message lands in? If the send specifies a partition explicitly, the message goes to that partition. If no partition is given but a key is set, the message is mapped to a partition by the hash of the key. If neither is given, messages are distributed across the partitions in a round-robin fashion.
Class ProducerRecord<K,V>

Its constructors are:

```java
// Creates a record to be sent to a specified topic and partition
ProducerRecord(String topic, Integer partition, K key, V value)
// Same, with headers attached
ProducerRecord(String topic, Integer partition, K key, V value, Iterable<Header> headers)
// Creates a record with a specified timestamp, sent to a specified topic and partition
ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value)
// Same, with headers attached
ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable<Header> headers)
// Creates a record to be sent to Kafka; the partition is chosen from the key
ProducerRecord(String topic, K key, V value)
// Creates a record with no key
ProducerRecord(String topic, V value)
```
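The three routing rules described above map directly onto three of these constructors; a quick sketch (the topic name is illustrative):

```java
// 1. Explicit partition: the record always goes to partition 0
ProducerRecord<String, String> r1 = new ProducerRecord<>("my-topic", 0, "user-42", "payload");

// 2. Key only: partition() is null here; the producer hashes the key to pick one
ProducerRecord<String, String> r2 = new ProducerRecord<>("my-topic", "user-42", "payload");

// 3. Neither: no key, no partition; records are spread round-robin
ProducerRecord<String, String> r3 = new ProducerRecord<>("my-topic", "payload");
```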
Consumer API

Consuming with automatic offset commit:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
```
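The loop above never exits, so Ctrl-C would skip `consumer.close()`. The usual pattern, taken from the KafkaConsumer javadoc, is to call `wakeup()` from another thread, which makes the blocked `poll()` throw a `WakeupException`. A sketch reusing the `consumer` defined above:

```java
// From a shutdown hook (a different thread): interrupt any blocked poll()
final Thread mainThread = Thread.currentThread();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    consumer.wakeup();
    try { mainThread.join(); } catch (InterruptedException ignored) { }
}));

try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, value = %s%n", record.offset(), record.value());
    }
} catch (org.apache.kafka.common.errors.WakeupException e) {
    // expected: wakeup() was called, fall through to close()
} finally {
    consumer.close();  // leave the group cleanly
}
```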
Consuming with manual offset commit:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));

final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);   // application-defined persistence step
        consumer.commitSync();  // commit only after the batch is safely stored
        buffer.clear();
    }
}
```
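The no-argument `commitSync()` above commits every offset returned by the last `poll()`. For finer control there is `commitSync(Map)`, which commits an explicit offset per partition; a sketch reusing the `records` and `consumer` from above (note the +1: the committed offset is the offset of the NEXT record to read, an easy off-by-one to get wrong):

```java
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
for (ConsumerRecord<String, String> record : records) {
    offsets.put(new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)); // next offset to consume
}
consumer.commitSync(offsets);
```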
Consuming from a specific partition:

```java
Properties prop = new Properties();
prop.put("bootstrap.servers", "node:9092");
prop.put("group.id", "test8");
prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
Consumer<String, String> consumer = new KafkaConsumer<>(prop);

// Assign the consumer directly to partition 2 of topic "test2" (no group rebalancing)
TopicPartition p = new TopicPartition("test2", 2);
consumer.assign(Arrays.asList(p));
// Start consuming from offset 5 of that partition
consumer.seek(p, 5);

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("Key: " + record.key() + " Value: " + record.value()
                + " Offset: " + record.offset() + " Partition: " + record.partition());
    }
}
```
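Besides `seek()` to an absolute offset, an assigned consumer can jump to either end of a partition; a sketch reusing `consumer` and `p` from above:

```java
// Replay the partition from its earliest retained offset
consumer.seekToBeginning(Arrays.asList(p));

// Or skip history and read only records produced from now on
consumer.seekToEnd(Arrays.asList(p));
```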