I had run into duplicate reads from Kafka several times and always worked around them one way or another; today I finally dug into the actual cause. The problem typically shows up when offset auto-commit is enabled: if processing a batch of records takes longer than max.poll.interval.ms (default 300 s), the consumer is considered failed and the group rebalances, so the automatic offset commit fails and the offsets are never recorded. On the next read, the consumer fetches the records it had already consumed but never committed offsets for, which is exactly the duplicate data. The error looks like this:
[WARN]2018-08-07 16:31:51,502 method:org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.maybeAutoCommitOffsetsSync (ConsumerCoordinator.java:664)
[Consumer clientId=consumer-1, groupId=first] Synchronous auto-commit of offsets {autoCommit3-0=OffsetAndMetadata{offset=44, metadata=''}, autoCommit3-1=OffsetAndMetadata{offset=60, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much ti
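To make the failure mode concrete, here is a minimal sketch of the kind of consumer setup that triggers it (not the original post's code; the broker address, topic name "autoCommit3", group id "first", and process() helper are placeholders taken from or added around the log above). Auto-commit is on, and a slow per-record loop can push the gap between poll() calls past max.poll.interval.ms:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "first");                      // group id seen in the log above
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Auto-commit: offsets are committed in the background on the poll() cycle.
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");

        // If the gap between two poll() calls exceeds this, the consumer is kicked out
        // of the group and its pending auto-commit fails (default 300000 ms = 5 min).
        props.put("max.poll.interval.ms", "300000");
        // Upper bound on records returned per poll(); a large batch plus slow
        // per-record work is what pushes the poll interval over the limit.
        props.put("max.poll.records", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("autoCommit3")); // topic seen in the log above
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // if this loop takes > max.poll.interval.ms, the group rebalances
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // stand-in for the slow per-record work
    }
}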

In short, the duplicate reads come from processing time exceeding max.poll.interval.ms, which makes the automatic offset commit fail. The fixes are to increase max.poll.interval.ms, reduce max.poll.records so each batch can be processed in time, or switch to committing offsets manually, as sketched below.
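The following sketch illustrates the manual-commit option, reusing the props and process() placeholders from the sketch above (again an assumption, not the original post's code). Auto-commit is turned off and commitSync() is called only after the whole batch has been handled, so an offset is never committed for records that were not actually processed:

props.put("enable.auto.commit", "false");   // turn off auto-commit
props.put("max.poll.records", "100");       // smaller batches finish within max.poll.interval.ms

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("autoCommit3"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        // Commit only after every record in the batch has been processed.
        // If processing crashes before this point the batch is re-read on restart,
        // but an offset is never committed ahead of work that was not done.
        if (!records.isEmpty()) {
            consumer.commitSync();
        }
    }
}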