Manually Committing Topic Offsets in Kafka
Implementation Steps
- Disable auto commit
props.setProperty("enable.auto.commit", "false"); // whether to auto-commit offsets
//props.setProperty("auto.commit.interval.ms", "1000"); // auto-commit interval
- Commit the offset manually
// after processing is complete, commit the offset manually
consumer.commitSync();
Example Code
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

/**
 * @ClassName KafkaConsumerManualCommit
 * @Description TODO A custom consumer that commits offsets manually
 * @Date 2021/3/30 11:38
 * @Create By Frank
 */
public class KafkaConsumerManualCommit {
    public static void main(String[] args) {
        // todo:1 - build the connection and the consumer object
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "node1:9092"); // broker address
        props.setProperty("group.id", "test01");              // consumer group id
        props.setProperty("enable.auto.commit", "false");     // whether to auto-commit offsets
        // props.setProperty("auto.commit.interval.ms", "1000"); // auto-commit interval
        // deserializer classes for key and value
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // todo:2 - subscribe to the Topic and consume its data
        consumer.subscribe(Arrays.asList("bigdata01"));
        while (true) {
            // poll for data
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            // iterate over every record
            for (ConsumerRecord<String, String> record : records) {
                String topic = record.topic();       // topic
                int partition = record.partition();  // partition
                long offset = record.offset();       // offset
                String key = record.key();           // key
                String value = record.value();       // value
                System.out.println(topic + "\t" + partition + "\t" + offset + "\t" + key + "\t" + value);
            }
            // after processing is complete, commit the offsets manually
            consumer.commitSync();
        }
        // todo:3 - a streaming program runs continuously; there is no shutdown step
    }
}
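Note that commitSync() blocks the poll loop until the broker acknowledges the commit. The same client API also offers a non-blocking commitAsync() overload that takes a callback. The following is a minimal sketch of the loop above using the async variant; it assumes the same consumer object and is an alternative, not part of the original example:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.topic() + "\t" + record.partition() + "\t"
                + record.offset() + "\t" + record.key() + "\t" + record.value());
    }
    // commitAsync() returns immediately and does not retry on failure;
    // the callback only logs the error, and the next successful commit catches up
    consumer.commitAsync((offsets, exception) -> {
        if (exception != null) {
            System.err.println("offset commit failed: " + exception.getMessage());
        }
    });
}
A common pattern is to use commitAsync() inside the loop for throughput and one final commitSync() before closing the consumer; since this example never shuts down, only the async call is shown.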
Problems with Manual Commit
Overview
Committing Topic offsets manually can still produce duplicate data.
Cause
Offsets are tracked per partition, but the no-argument commitSync() commits the offsets of every partition returned by the poll in a single call. If processing fails for even one partition, the commit is never reached, even though some partitions have already been processed successfully; after a restart those partitions are consumed again, producing duplicates.
Example
One consumer consumes a Topic with three partitions.
First poll:
- part0
0 hadoop
1 hive
- part1
0 hive
1 spark
2 hue
- part2
0 spark
1 hadoop
Suppose part0 and part1 are processed successfully, but the program crashes while processing part2, before commitSync() is reached. No offsets were committed, so after a restart all three partitions are consumed from the beginning again, and the records of part0 and part1 are processed twice.
Solution
- Commit offsets per partition instead of for the whole poll
- As soon as one partition has been processed successfully, commit that partition's offset (see the full example below)
Example Code
A custom consumer that commits the offset of each partition individually:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.*;

/**
 * @ClassName KafkaConsumerManualPartitionCommit
 * @Description TODO A custom consumer that commits the offset of each partition individually
 * @Date 2021/3/30 11:38
 * @Create By Frank
 */
public class KafkaConsumerManualPartitionCommit {
    public static void main(String[] args) {
        // todo:1 - build the connection and the consumer object
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "node1:9092"); // broker address
        props.setProperty("group.id", "test01");              // consumer group id
        props.setProperty("enable.auto.commit", "false");     // whether to auto-commit offsets
        // props.setProperty("auto.commit.interval.ms", "1000"); // auto-commit interval
        // deserializer classes for key and value
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // todo:2 - subscribe to the Topic and consume its data
        consumer.subscribe(Arrays.asList("bigdata01"));
        while (true) {
            // poll for data
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            // the set of partitions that actually returned records in this poll
            Set<TopicPartition> partitions = records.partitions();
            // process one partition at a time
            for (TopicPartition partition : partitions) {
                // extract this partition's records from the poll result
                List<ConsumerRecord<String, String>> partRecords = records.records(partition);
                // iterate over every record of this partition,
                // remembering the offset of the last one processed
                long offset = 0;
                for (ConsumerRecord<String, String> record : partRecords) {
                    String topic = record.topic();  // topic
                    int part = record.partition();  // partition
                    offset = record.offset();       // offset
                    String key = record.key();      // key
                    String value = record.value();  // value
                    System.out.println(topic + "\t" + part + "\t" + offset + "\t" + key + "\t" + value);
                }
                // this partition is fully processed: commit the next offset to read (last offset + 1)
                Map<TopicPartition, OffsetAndMetadata> offsets =
                        Collections.singletonMap(partition, new OffsetAndMetadata(offset + 1));
                consumer.commitSync(offsets);
                // alternatively, the offset could be stored in MySQL or ZooKeeper
            }
        }
        // todo:3 - a streaming program runs continuously; there is no shutdown step
    }
}
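The last comment in the loop hints at storing offsets outside Kafka entirely. A minimal sketch of that idea follows; the loadOffset/saveOffset helpers and the class name are hypothetical stand-ins for a MySQL- or ZooKeeper-backed store, not part of the Kafka API. With externally managed offsets the consumer uses assign() and seek() instead of a group subscription:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class KafkaConsumerExternalOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "node1:9092");
        props.setProperty("group.id", "test01");
        props.setProperty("enable.auto.commit", "false"); // offsets live outside Kafka entirely
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // with externally stored offsets, assign the partition manually instead of subscribing
        TopicPartition tp = new TopicPartition("bigdata01", 0);
        consumer.assign(Arrays.asList(tp));
        // hypothetical: read the last processed offset from the external store,
        // then resume from the record right after it
        long lastOffset = loadOffset("bigdata01", 0);
        consumer.seek(tp, lastOffset + 1);

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + "\t" + record.value());
                // hypothetical: persist the offset together with the processing result
                saveOffset(record.topic(), record.partition(), record.offset());
            }
        }
    }

    // hypothetical stubs standing in for a MySQL- or ZooKeeper-backed store;
    // -1 means "nothing stored yet", so seek(tp, -1 + 1) starts at the beginning
    private static long loadOffset(String topic, int partition) { return -1L; }
    private static void saveOffset(String topic, int partition, long offset) { }
}
The appeal of this design is that if the record processing and the offset write happen in the same database transaction, a crash can no longer separate the two, which removes the duplicate-data window that Kafka-managed commits leave open.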