Why Kafka reads duplicate data

llflilongfei

Article link: https://blog.csdn.net/llflilongfei/article/details/81483509

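This article looks at why a Kafka consumer reads the same data more than once. The main cause is that processing the polled data takes longer than max.poll.interval.ms, so the automatic offset commit fails. The fixes are to increase max.poll.interval.ms, decrease max.poll.records, or switch to committing offsets manually.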
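The three fixes map onto three consumer settings. Below is a minimal sketch of them as a plain `java.util.Properties` object, so it runs without the Kafka client on the classpath. The class name `ConsumerTuning`, the broker address, and the concrete values are assumptions for illustration only. A real consumer would also need `key.deserializer` and `value.deserializer`, and the right numbers depend on how long one batch actually takes to process.

```java
import java.util.Properties;

final class ConsumerTuning {
    // Builds consumer settings for the three fixes. All values are illustrative.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "first");                   // group id seen in the log excerpt

        // Fix 1: allow more time between poll() calls (default is 300000 ms).
        props.setProperty("max.poll.interval.ms", "600000");

        // Fix 2: hand fewer records to each poll() so a batch finishes sooner (default is 500).
        props.setProperty("max.poll.records", "100");

        // Fix 3: turn off auto-commit; the application then commits offsets itself
        // after it has finished processing a batch.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With `enable.auto.commit` set to `false`, the consumer code has to call `commitSync()` or `commitAsync()` on the `KafkaConsumer` once a batch is done; otherwise no offsets are committed at all.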

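I had been running into duplicate reads from Kafka for a while and always found some way to work around them; today I set out to find the actual cause. The problem generally appears when Kafka offsets are committed automatically. The cause is that processing the data takes longer than max.poll.interval.ms (default 300 s), so the automatic offset commit fails and the offsets are never committed. The next time data is read, the consumer gets the records it had already consumed but whose offsets were not committed, which is how the duplicates arise. The specific error is as follows: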

[WARN]2018-08-07 16:31:51,502 method:org.apache.kafka.clients.consumer.internals.ConsumerCoordinator. maybeAutoCommitOffsetsSync (ConsumerCoordinator.java:664)
[Consumer clientId=consumer-1, groupId=first] Synchronous auto-commit of offsets {autoCommit3-0=OffsetAndMetadata{offset=44 , metadata=''}, autoCommit3-1=OffsetAndMetadata{offset=60, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much ti
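The mechanism behind this warning can be illustrated with a toy model. This is not Kafka client code: the class `DuplicateReadModel` and its parameters are hypothetical, and it simplifies things by treating the commit as happening right after each batch. It tracks one partition with two numbers, the committed offset and the consumer's read position, and shows what happens when one batch exceeds the poll interval and its commit is rejected.

```java
import java.util.ArrayList;
import java.util.List;

final class DuplicateReadModel {
    // Simulates consuming offsets 0..total-1 in batches of batchSize.
    // The poll with index slowPoll is assumed to exceed max.poll.interval.ms,
    // so its commit is rejected. Pass -1 for slowPoll to model no slow batch.
    static List<Integer> consume(int total, int batchSize, int slowPoll) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0; // last offset the group has stored
        int position = 0;  // next offset this consumer will fetch
        int poll = 0;      // index of the current poll
        while (position < total) {
            int end = Math.min(position + batchSize, total);
            for (int offset = position; offset < end; offset++) {
                processed.add(offset); // the application handles the record
            }
            if (poll == slowPoll) {
                // Commit rejected after the rebalance: reading resumes
                // from the last committed offset, so this batch is seen again.
                position = committed;
            } else {
                committed = end;
                position = end;
            }
            poll++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(consume(6, 2, 1)); // prints [0, 1, 2, 3, 2, 3, 4, 5]
    }
}
```

Offsets 2 and 3 show up twice because they were processed but never committed, which matches the duplicate reads described above.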