The specific errors are as follows:
Heartbeat failed: local member_id was not recognized; resetting and re-joining group
Heartbeat session expired - marking coordinator dead
Marking the coordinator dead (node 112583)for group fba_inventory_summary_data_report_sync_to_user_db.
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
Auto offset commit failed: [Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed for group fba_inventory_summary_data_report_sync_to_user_db due to group error ([Error 25] UnknownMemberIdError: fba_inventory_summary_data_report_sync_to_user_db), will rejoin
OffsetCommit failed before join, ignoring: CommitFailedError: ('Commit cannot be completed since the group has already\n rebalanced and assigned the partitions to another member.\n This means that the time between subsequent calls to poll()\n was longer than the configured max_poll_interval_ms, which\n typically implies that the poll loop is spending too much\n time message processing. You can address this either by\n increasing the rebalance timeout with max_poll_interval_ms,\n or by reducing the maximum size of batches returned in poll()\n with max_poll_records.\n ', 'Commit cannot be completed since the group has already rebalanced and may have assigned the partitions to another member')
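Error analysis:
The error message suggests that max_poll_interval_ms is set too low, but raising it did not help, so that is not the cause.
Looking at the error messages again, I suspect that the consumer's heartbeats were either not sent or timed out, so the broker considered the consumer dead. When the consumer then sent a heartbeat again, the errors above appeared.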
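One way to check the first hypothesis is to measure how long the loop actually spends between two calls to poll(). The following is only a minimal sketch: PollGapTimer is a hypothetical helper written for this illustration, not part of kafka-python. If the largest measured gap stays well below max_poll_interval_ms, that setting is unlikely to be the cause, which matches what was observed here.

```python
import time


class PollGapTimer:
    """Hypothetical helper: records the time between successive poll() calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the helper can be tested
        self._last = None        # timestamp of the previous mark()
        self.max_gap_s = 0.0     # largest gap seen so far, in seconds

    def mark(self):
        """Call right before each poll(); returns the gap since the last call."""
        now = self._clock()
        gap = 0.0 if self._last is None else now - self._last
        self._last = now
        self.max_gap_s = max(self.max_gap_s, gap)
        return gap
```

In a consumer loop, timer.mark() would be called immediately before every consumer.poll(...), and timer.max_gap_s * 1000 can then be compared with the configured max_poll_interval_ms.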
Solution:
1. I changed many consumer settings, such as session_timeout_ms and max_poll_interval_ms, but none of them had any effect.
Setting session_timeout_ms too high also causes a different problem:
Error sending JoinGroupRequest_v2 to node 112581 [[Error 7] RequestTimedOutError] -- marking coordinator dead
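2. In the end I only found a workaround that treats the symptom rather than the cause: add a 0.1-second sleep in the code that is slow and can only run synchronously. That 0.1-second sleep gives the consumer a chance to send its heartbeat.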
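The two steps above can be sketched roughly as follows, assuming the kafka-python client. The function names, the numeric values, the broker address and the topic name are illustrative assumptions, not the original code. build_consumer_config collects the settings mentioned in step 1 and keeps request_timeout_ms above both timeouts (as far as I know, kafka-python requires it to be larger than session_timeout_ms). process_with_pauses shows the workaround from step 2. A possible explanation, which I have not verified, is that kafka-python sends heartbeats from a background thread and the slow synchronous code keeps that thread from running, while time.sleep yields to it.

```python
import time


def build_consumer_config(session_timeout_ms=10_000,
                          max_poll_interval_ms=300_000,
                          max_poll_records=500):
    """Keyword arguments for kafka-python's KafkaConsumer (illustrative values)."""
    return {
        "group_id": "fba_inventory_summary_data_report_sync_to_user_db",
        "session_timeout_ms": session_timeout_ms,
        # Common rule of thumb: heartbeat at about one third of the session timeout.
        "heartbeat_interval_ms": session_timeout_ms // 3,
        "max_poll_interval_ms": max_poll_interval_ms,
        "max_poll_records": max_poll_records,
        # Assumption: keep the request timeout above both timeouts.
        "request_timeout_ms": max(session_timeout_ms, max_poll_interval_ms) + 5_000,
    }


def process_with_pauses(items, handle, pause_s=0.1, sleep=time.sleep):
    """Run handle(item) for each item and sleep briefly after each one."""
    done = 0
    for item in items:
        handle(item)    # the slow step that can only run synchronously
        sleep(pause_s)  # short pause so other threads get a chance to run
        done += 1
    return done


def make_consumer(topic, bootstrap_servers="localhost:9092"):
    """Create a consumer; topic and broker address are placeholders."""
    from kafka import KafkaConsumer  # imported lazily; needs kafka-python installed
    return KafkaConsumer(topic, bootstrap_servers=bootstrap_servers,
                         **build_consumer_config())
```

A loop such as process_with_pauses(consumer, handle_message) would then apply the 0.1-second pause after every message; handle_message stands for whatever slow processing the application does.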
Other
If you know of a better solution, please share it in the comments.