Kafka idempotence applies to the producer and requires the following configuration:
1. enable.idempotence: true
2. retries: greater than 0; if it is 0 or less, a ConfigException is thrown:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Must set retries to non-zero when using the idempotent producer.
at org.apache.kafka.clients.producer.ProducerConfig.maybeOverrideAcksAndRetries(ProducerConfig.java:432)
at org.apache.kafka.clients.producer.ProducerConfig.postProcessParsedConfig(ProducerConfig.java:400)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:110)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129)
at org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:481)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:326)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:298)
at org.apache.kafka.clients.producer.MyProducer.main(MyProducer.java:18)
3. max.in.flight.requests.per.connection: at most 5; if it is greater than 5, a ConfigException is thrown:
Caused by: org.apache.kafka.common.config.ConfigException: Must set max.in.flight.requests.per.connection to at most 5 to use the idempotent producer.
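These two validation rules can be sketched in plain Java. This is a simplified illustration of the checks described above, not Kafka's actual ProducerConfig code (the real logic lives in ProducerConfig.postProcessParsedConfig); the class and method names here are made up:

```java
public class IdempotenceConfigCheck {
    // Simplified model of the idempotent-producer config checks.
    static void validate(boolean enableIdempotence, int retries, int maxInFlight) {
        if (!enableIdempotence) {
            return; // no extra constraints when idempotence is off
        }
        if (retries <= 0) {
            throw new IllegalStateException(
                "Must set retries to non-zero when using the idempotent producer.");
        }
        if (maxInFlight > 5) {
            throw new IllegalStateException(
                "Must set max.in.flight.requests.per.connection to at most 5 "
                + "to use the idempotent producer.");
        }
    }

    public static void main(String[] args) {
        validate(true, 3, 5); // passes: retries > 0 and maxInFlight <= 5
        try {
            validate(true, 0, 5); // rejected: retries must be > 0
        } catch (IllegalStateException e) {
            System.out.println("retries check: " + e.getMessage());
        }
        try {
            validate(true, 3, 6); // rejected: maxInFlight must be <= 5
        } catch (IllegalStateException e) {
            System.out.println("in-flight check: " + e.getMessage());
        }
    }
}
```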
Test code:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
    props.put("acks", "all");
    props.put("retries", 3);                                // must be > 0 for idempotence
    props.put("batch.size", 163840);
    props.put("linger.ms", 10);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("enable.idempotence", true);                  // turn on the idempotent producer
    props.put("max.in.flight.requests.per.connection", 5);  // must be <= 5 for idempotence

    Producer<String, String> producer = new KafkaProducer<>(props);
    for (int i = 0; i < 5; i++) {
        byte[] log = new byte[904800];  // large payload; smaller values are easier to inspect
        String slog = new String(log);
        producer.send(new ProducerRecord<>("test", 0, Integer.toString(i), slog));
    }
    producer.close();
}
You may want to make the messages smaller here; I made mine so large that the console could not even print them.
Next, you can inspect the stored messages with this command:
sh bin/kafka-dump-log.sh --files kafka-logs-0/test2-0/00000000000000000300.log
The message is stored in the format shown below. Two fields matter here: producerId and sequence. producerId is an internal producer ID assigned by the broker, and sequence is a per-message sequence number. Idempotence is implemented with the sequence: a message is accepted as new only if its sequence is exactly one greater than the sequence recorded on the server; in every other case an error is raised, which is how the producer achieves exactly-once delivery.
baseOffset: 15 lastOffset: 15 count: 1 baseSequence: 4 lastSequence: 4 producerId: 19000 producerEpoch: 0 partitionLeaderEpoch: 11 isTransactional: false isControl: false position: 3620163 CreateTime: 1629380795051 size: 904873 magic: 2 compresscodec: NONE crc: 1949418200 isvalid: true
| offset: 15 CreateTime: 1629380795051 keysize: 1 valuesize: 904800 sequence: 4 headerKeys: [] key: 4 payload:
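The broker-side sequence check described above can be sketched as follows. This is a simplified, self-contained model of the per-producer bookkeeping, not Kafka's actual broker code (real Kafka tracks sequences per producerId per partition, and distinguishes duplicate from out-of-order errors); the class and method names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceCheckSketch {
    // Last sequence number accepted from each producerId.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    // Accept a message only if its sequence is exactly one greater than the
    // last recorded sequence (or 0 for a producer we have never seen).
    public boolean tryAppend(long producerId, int sequence) {
        int expected = lastSequence.getOrDefault(producerId, -1) + 1;
        if (sequence != expected) {
            return false; // duplicate retry or out-of-order message: rejected
        }
        lastSequence.put(producerId, sequence);
        return true;
    }

    public static void main(String[] args) {
        SequenceCheckSketch log = new SequenceCheckSketch();
        System.out.println(log.tryAppend(19000, 0)); // true  - first message
        System.out.println(log.tryAppend(19000, 1)); // true  - next in order
        System.out.println(log.tryAppend(19000, 1)); // false - retried duplicate
        System.out.println(log.tryAppend(19000, 3)); // false - gap in the sequence
    }
}
```

Rejecting the retried duplicate (sequence 1 sent twice) is exactly what prevents a network retry from writing the same message to the log a second time.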