Using Kafka from Python 3.7: a rebalance problem triggered by kafka-python 1.4.7

weixin_39687621

Published 2020-12-10 09:45:34

Article tags: Using Kafka from Python 3.7

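Summary (auto-generated by CSDN): While using kafka-python 1.4.7, a CommitFailedError appeared, suggesting that the processing time had exceeded max_poll_interval_ms or that max_poll_records was too large. Adjusting max_poll_interval_ms and max_poll_records did not help, and the broker logs then showed frequent rebalances. The likely cause is the main poll thread blocking the heartbeat thread. The fixes include lowering metadata_max_age_ms to avoid the blocking and paying attention to heartbeat-thread switching. After consulting the related GitHub issues and discussions, the problem was mitigated.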

With the latest kafka-python release, 1.4.7, and the topic left at the broker's default configuration, the consumer reported an error like the following:

CommitFailedError

CommitFailedError: Commit cannot be completed since the group has already
rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll()
was longer than the configured max_poll_interval_ms, which
typically implies that the poll loop is spending too much
time message processing. You can address this either by
increasing the rebalance timeout with max_poll_interval_ms,
or by reducing the maximum size of batches returned in poll()
with max_poll_records.

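One thing to state up front: kafka-python versions from 1.4.0 onward use a separate heartbeat thread to send heartbeats.

The error roughly says that processing could not be completed within the default 300000 ms. By default, a single poll pulls up to 500 records, so those 500 records must be processed within 300 s. If they are not, this error can be triggered.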
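The arithmetic above can be made concrete with a small sketch. The two defaults are the ones quoted in the text; the helper names (`per_record_budget_ms`, `make_consumer`) and the tuning values in the comments are illustrative assumptions, not part of kafka-python. The `kafka` import is deferred so the budget calculation can be run without a broker or the library installed.

```python
# Defaults quoted in the text for kafka-python's KafkaConsumer.
DEFAULT_MAX_POLL_INTERVAL_MS = 300000
DEFAULT_MAX_POLL_RECORDS = 500


def per_record_budget_ms(max_poll_interval_ms=DEFAULT_MAX_POLL_INTERVAL_MS,
                         max_poll_records=DEFAULT_MAX_POLL_RECORDS):
    """Average time available per record if a full batch is returned."""
    return max_poll_interval_ms / max_poll_records


def make_consumer(topic, group_id, bootstrap_servers, **tuning):
    """Hypothetical helper: build a consumer with explicit poll tuning.

    Example tuning (illustrative values only):
        max_poll_interval_ms=600000, max_poll_records=100
    """
    from kafka import KafkaConsumer  # imported lazily; requires kafka-python

    return KafkaConsumer(
        topic,
        group_id=group_id,
        bootstrap_servers=bootstrap_servers,
        **tuning
    )


if __name__ == "__main__":
    print(per_record_budget_ms())  # budget with the defaults
```

With the defaults this works out to 600 ms per record on average, which is a generous budget for most handlers and already hints that slow processing may not be the whole story.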

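Because the error message is so explicit, I first tried to fix the problem in the direction it suggests. I started by raising max_poll_interval_ms, but that had no effect.

I then reduced the number of records per poll (max_poll_records), which did not help either; the error kept appearing. This made me suspect that the real trigger was not what the error message suggests.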
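One way to test that suspicion is to measure the actual gap between consecutive poll calls and compare it with max_poll_interval_ms. The following is a minimal sketch of such a check, not something from the original investigation; `PollGapTimer` is a hypothetical helper, and the injectable `clock` parameter exists only to make it easy to test.

```python
import time


class PollGapTimer:
    """Track the longest observed gap between consecutive poll calls."""

    def __init__(self, max_poll_interval_ms=300000, clock=time.monotonic):
        self.max_poll_interval_ms = max_poll_interval_ms
        self._clock = clock          # returns seconds
        self._last = None            # time of the previous mark_poll()
        self.worst_gap_ms = 0.0

    def mark_poll(self):
        """Call right before each poll; returns the gap since the last call in ms."""
        now = self._clock()
        gap_ms = 0.0
        if self._last is not None:
            gap_ms = (now - self._last) * 1000.0
            self.worst_gap_ms = max(self.worst_gap_ms, gap_ms)
        self._last = now
        return gap_ms

    def exceeded(self):
        """True if any observed gap was longer than the configured interval."""
        return self.worst_gap_ms > self.max_poll_interval_ms
```

In a consume loop, `timer.mark_poll()` would be called immediately before each `consumer.poll(...)`. If `exceeded()` stays false while CommitFailedError still occurs, the processing time is probably not the cause and the rebalance has to be explained some other way.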

So I turned my attention to the broker logs to see what was actually causing the error.

The logs showed that the corresponding consumer was rebalancing repeatedly:

[2019-08-18 09:19:29,556] INFO [GroupCoordinator 0]: Member kafka-python-1.4.6-05ed83f1-aa90-4950-b097-4cf467598082 in group sync_group_20180321 has fa
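To see how often this happens, GroupCoordinator lines like the one above can be tallied per group and member. This is a rough sketch: the regular expression is an assumption derived only from the visible prefix of that single log line (the rest of the line is cut off in the source), and `count_member_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches only the visible prefix of the log line:
# "[timestamp] INFO [GroupCoordinator N]: Member <id> in group <group> ..."
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \[GroupCoordinator (?P<broker>\d+)\]: "
    r"Member (?P<member>\S+) in group (?P<group>\S+)"
)


def count_member_events(lines):
    """Count GroupCoordinator member events per (group, member) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("group"), match.group("member"))] += 1
    return counts
```

Passing an open broker log file to `count_member_events` gives a quick view of which members the coordinator keeps reporting on. A high count for a single group suggests the repeated rebalancing described here.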
