Previous post: https://blog.csdn.net/a290450134/article/details/100519107
Business scenario: we consume records from Kafka and route each one to a different processing path based on its business type. Two of those types are expensive to process. With the default of 500 records per poll, batches that happened to be dominated by those two types took longer than max.poll.interval.ms to finish. The group then rebalanced, the offsets could never be committed, and the same data was consumed over and over in an endless loop. Working from the business volume, the peak total processing time for one batch came to roughly 750,000 ms, and our TPS could not be raised in the short term. Comparing rates, consumption was running at about 4x the production speed, and even if every record were one of the heavy types it would still be more than 2x.
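The numbers above can be put on the back of an envelope to show why the rebalance was inevitable. This is a sketch with assumed figures: 500 records per poll is the Kafka default for max.poll.records, 300,000 ms (5 minutes) is the default max.poll.interval.ms, and the ~750,000 ms peak batch time is the measurement from the post; the derived ~1,500 ms per-record cost is an estimate, not a measured value.

```java
public class PollBudget {
    public static void main(String[] args) {
        long maxPollRecords = 500;        // Kafka default max.poll.records
        long peakBatchMs = 750_000;       // peak processing time for one poll (measured)
        long maxPollIntervalMs = 300_000; // Kafka default max.poll.interval.ms (5 min)

        // Estimated per-record cost at peak: 750,000 / 500 = ~1,500 ms.
        long perRecordMs = peakBatchMs / maxPollRecords;
        System.out.println("estimated per-record ms: " + perRecordMs);

        // The whole batch blows past the poll interval, so the broker
        // assumes the consumer is dead and triggers a rebalance.
        System.out.println("batch exceeds max.poll.interval.ms: "
                + (peakBatchMs > maxPollIntervalMs));

        // Shrinking max.poll.records shrinks the worst-case batch time.
        // 100 is an illustrative value, not necessarily what was used.
        long smallerBatch = 100;
        long smallerBatchMs = smallerBatch * perRecordMs; // 150,000 ms
        System.out.println("worst-case batch ms at 100 records: " + smallerBatchMs);
    }
}
```

With 100 records per poll, even the worst-case batch finishes in half the allowed interval, which is the headroom the fix relies on.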
So in the end we lowered max.poll.records. After watching the logs for a while, the problem was gone.
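A minimal sketch of the consumer tuning described above, using plain string property keys so it needs no Kafka dependency. The group id and the exact values are illustrative assumptions, not the production configuration; the post only says max.poll.records was lowered from the default 500, and the Kafka log itself suggests raising the interval as the alternative remedy.

```java
import java.util.Properties;

public class ConsumerTuning {
    // Hypothetical consumer properties illustrating the fix.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("group.id", "report-consumers"); // hypothetical group name

        // The fix applied in the post: shrink the batch returned by each
        // poll() so even the slow business types finish well inside
        // max.poll.interval.ms. 100 is an example value, down from the
        // default of 500.
        props.put("max.poll.records", "100");

        // The alternative the Kafka error message suggests: raise the
        // interval itself (here 10 minutes instead of the default 5).
        props.put("max.poll.interval.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        System.out.println("max.poll.records = "
                + props.getProperty("max.poll.records"));
        System.out.println("max.poll.interval.ms = "
                + props.getProperty("max.poll.interval.ms"));
    }
}
```

In spring-kafka these keys would typically be passed into the consumer factory's configuration map (or set via `spring.kafka.consumer.properties.*` in Spring Boot).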
The spring-kafka log message:
Synchronous auto-commit of offsets {input_std_npanther-0=OffsetAndMetadata{offset=373646, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
Summary: the first symptom was duplicate consumption. In our reports the data timestamps stopped advancing while the row counts kept growing, which told us the same data was being consumed repeatedly. The Kafka logs then showed the consumer group joining a new generation over and over. At first we suspected the cause was one group consuming two topics; only when we checked the spring logs did we see the precise error message. In short, our TPS was too low, so Kafka decided the consumer had failed.