1.Exception:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2.How Kafka determines whether a consumer is alive
Detecting Consumer Failures
After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the consumer sends periodic heartbeats to the server. If the consumer crashes or is unable to send heartbeats for a duration of session.timeout.ms, then the consumer will be considered dead and its partitions will be reassigned.
It is also possible that the consumer could encounter a "livelock" situation where it is continuing to send heartbeats, but no progress is being made. To prevent the consumer from holding onto its partitions indefinitely in this case, we provide a liveness detection mechanism using the max.poll.interval.ms setting. Basically if you don't call poll at least as frequently as the configured max interval, then the client will proactively leave the group so that another consumer can take over its partitions. When this happens, you may see an offset commit failure (as indicated by a CommitFailedException thrown from a call to commitSync()). This is a safety mechanism which guarantees that only active members of the group are able to commit offsets. So to stay in the group, you must continue to call poll.
The consumer provides two configuration settings to control the behavior of the poll loop:
max.poll.interval.ms: By increasing the interval between expected polls, you can give the consumer more time to handle a batch of records returned from poll(long). The drawback is that increasing this value may delay a group rebalance since the consumer will only join the rebalance inside the call to poll. You can use this setting to bound the time to finish a rebalance, but you risk slower progress if the consumer cannot actually call poll often enough.
max.poll.records: Use this setting to limit the total records returned from a single call to poll. This can make it easier to predict the maximum that must be handled within each poll interval. By tuning this value, you may be able to reduce the poll interval, which will reduce the impact of group rebalancing.
For use cases where message processing time varies unpredictably, neither of these options may be sufficient. The recommended way to handle these cases is to move message processing to another thread, which allows the consumer to continue calling poll while the processor is still working. Some care must be taken to ensure that committed offsets do not get ahead of the actual position. Typically, you must disable automatic commits and manually commit processed offsets for records only after the thread has finished handling them (depending on the delivery semantics you need). Note also that you will need to pause the partition so that no new records are received from poll until after the thread has finished handling those previously returned.
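The two checks described above (heartbeats bounded by session.timeout.ms, poll calls bounded by max.poll.interval.ms) can be pictured with a small toy model. This is only a sketch for illustration, not Kafka's actual implementation: the class ConsumerLivenessModel, its State enum, and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is deterministic.

```java
import java.time.Duration;

/** Toy model of the two liveness checks. Hypothetical, not Kafka code. */
final class ConsumerLivenessModel {
    enum State { ALIVE, SESSION_EXPIRED, POLL_INTERVAL_EXCEEDED }

    private final long sessionTimeoutMs;   // models session.timeout.ms
    private final long maxPollIntervalMs;  // models max.poll.interval.ms
    private long lastHeartbeatMs;
    private long lastPollMs;

    ConsumerLivenessModel(Duration sessionTimeout, Duration maxPollInterval, long nowMs) {
        this.sessionTimeoutMs = sessionTimeout.toMillis();
        this.maxPollIntervalMs = maxPollInterval.toMillis();
        this.lastHeartbeatMs = nowMs;
        this.lastPollMs = nowMs;
    }

    /** Record a heartbeat sent at the given time. */
    void onHeartbeat(long nowMs) { lastHeartbeatMs = nowMs; }

    /** Record a poll call made at the given time. */
    void onPoll(long nowMs) { lastPollMs = nowMs; }

    /** Classify the consumer at the given time. */
    State check(long nowMs) {
        // No heartbeat within the session timeout: treated as dead (crash case).
        if (nowMs - lastHeartbeatMs > sessionTimeoutMs) {
            return State.SESSION_EXPIRED;
        }
        // Heartbeats still arrive but poll was not called in time ("livelock" case).
        if (nowMs - lastPollMs > maxPollIntervalMs) {
            return State.POLL_INTERVAL_EXCEEDED;
        }
        return State.ALIVE;
    }
}
```

In this model, a consumer that keeps sending heartbeats but never calls onPoll ends up in POLL_INTERVAL_EXCEEDED, which corresponds to the situation in which commitSync() would throw CommitFailedException.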
Two parameters that determine whether a consumer is alive
session.timeout.ms applies to the heartbeat thread. If the coordinator does not receive a heartbeat from a consumer within this interval, it marks the consumer as failed and triggers a new round of rebalancing.
heartbeat.interval.ms lets the other, healthy consumers learn about a rebalance sooner. When the coordinator triggers a rebalance, the other consumers find out only from a heartbeat response carrying the REBALANCE_IN_PROGRESS error. The more frequently heartbeats are sent, the sooner a consumer knows it needs to rejoin the group. Keep this value relatively low, preferably no more than 1/3 of session.timeout.ms.
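The 1/3 rule of thumb can be applied mechanically when building the consumer configuration. The following is a minimal sketch under that assumption: the class HeartbeatConfig and the method withSessionTimeout are hypothetical helpers, and only the two property keys come from Kafka.

```java
import java.util.Properties;

/** Hypothetical helper that derives heartbeat.interval.ms from session.timeout.ms. */
final class HeartbeatConfig {
    private HeartbeatConfig() {}

    /** Returns properties with the heartbeat interval set to 1/3 of the session timeout. */
    static Properties withSessionTimeout(int sessionTimeoutMs) {
        if (sessionTimeoutMs < 3) {
            throw new IllegalArgumentException("session timeout too small: " + sessionTimeoutMs);
        }
        Properties props = new Properties();
        props.put("session.timeout.ms", String.valueOf(sessionTimeoutMs));
        // Integer division rounds down, so the interval never exceeds 1/3 of the timeout.
        props.put("heartbeat.interval.ms", String.valueOf(sessionTimeoutMs / 3));
        return props;
    }
}
```

With a session timeout of 30000 ms, for example, this yields a heartbeat interval of 10000 ms, so roughly three heartbeats can be missed before the coordinator marks the consumer as failed.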
Two parameters that determine whether a live consumer is actually making progress
max.poll.interval.ms: By increasing the interval between expected polls, you can give the consumer more time to handle a batch of records returned from poll(long). The drawback is that increasing this value may delay a group rebalance since the consumer will only join the rebalance inside the call to poll. You can use this setting to bound the time to finish a rebalance, but you risk slower progress if the consumer cannot actually call poll often enough.
max.poll.records: Use this setting to limit the total records returned from a single call to poll. This can make it easier to predict the maximum that must be handled within each poll interval. By tuning this value, you may be able to reduce the poll interval, which will reduce the impact of group rebalancing.
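How the two settings interact comes down to simple arithmetic: in the worst case a full batch of max.poll.records must be processed before the next poll, and that has to finish within max.poll.interval.ms. A rough sketch of that estimate follows. The class PollBudget, its methods, the per-record processing time, and the safety factor are all assumptions introduced for illustration. Real processing time has to be measured.

```java
/** Hypothetical helper for estimating whether a batch fits into the poll interval. */
final class PollBudget {
    private PollBudget() {}

    /** Worst-case time to process one full batch, assuming a fixed cost per record. */
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return Math.multiplyExact((long) maxPollRecords, perRecordMs);
    }

    /** True if a full batch can be processed before max.poll.interval.ms elapses. */
    static boolean fits(int maxPollRecords, long perRecordMs, long maxPollIntervalMs) {
        return worstCaseBatchMs(maxPollRecords, perRecordMs) <= maxPollIntervalMs;
    }

    /** Largest max.poll.records that uses at most safetyFactor of the interval (at least 1). */
    static int maxRecordsFor(long perRecordMs, long maxPollIntervalMs, double safetyFactor) {
        if (perRecordMs <= 0 || maxPollIntervalMs <= 0 || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long budgetMs = (long) (maxPollIntervalMs * safetyFactor);
        return (int) Math.max(1L, Math.min(Integer.MAX_VALUE, budgetMs / perRecordMs));
    }
}
```

Assuming each record takes about 1000 ms, PollBudget.fits(500, 1000, 300000) is false (the defaults leave too little time), while PollBudget.fits(300, 1000, 600000) is true. Under the same assumption, PollBudget.maxRecordsFor(1000, 300000, 0.5) gives 150 records when only half of the default interval is to be used.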
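3.Solution code
Modify the configuration: either increase the maximum interval between polls, or reduce the maximum number of records returned by a single poll.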
props.put("max.poll.records", 300); // default 500
props.put("max.poll.interval.ms", "600000"); // default 300000
References
http://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
https://blog.csdn.net/shibuwodai_/article/details/80678717
https://www.jianshu.com/p/1120e26244c2