Kafka exception: DefaultOffsetCommitCallback.onComplete(ConsumerCoordinator.java:537) - Offset commit failed
Exception details:
ConsumerCoordinator$DefaultOffsetCommitCallback.onComplete(ConsumerCoordinator.java:537) -Offset commit failed.
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:600)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:541)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
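Environment: HDP big data platform, Kafka 2.0.0, Spark 2.3.2.
Because of a change in business logic, the Spark Streaming batch interval was increased to 5 minutes. After the service was restarted, it began reporting the error above.
Judging from the error message, the time between the consumer's poll() calls exceeded the maximum idle time allowed for the consumer thread. The consumer group therefore treated the consumer as having left the group and triggered a rebalance, which in turn caused the Spark Streaming offset commit to fail.
Solution: increase that idle-time parameter (max.poll.interval.ms).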
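For reference, here is a minimal sketch of how the setting would be supplied from a Java driver. The class name ConsumerParamsSketch and the method kafkaParams() are hypothetical; the values are copied from the consumer log in this post. It assumes the map is later handed to the Kafka direct stream (for example through ConsumerStrategies.Subscribe in spark-streaming-kafka-0-10).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the consumer settings map for the streaming job.
class ConsumerParamsSketch {

    static Map<String, Object> kafkaParams() {
        Map<String, Object> params = new HashMap<>();
        // Broker list and group id as they appear in the ConsumerConfig log.
        params.put("bootstrap.servers",
                "192.168.1.177:6667,192.168.0.130:6667,192.168.0.117:6667");
        params.put("group.id", "tinyeyePortScan");
        params.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("enable.auto.commit", false);
        params.put("auto.offset.reset", "latest");
        // The setting under discussion: maximum allowed gap between poll() calls, in ms.
        params.put("max.poll.interval.ms", 450000);
        return params;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the ConsumerConfig log.
        kafkaParams().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Whether the client honors the key depends on the kafka-clients version on the classpath, so it is worth comparing this map against the "ConsumerConfig values" log printed at startup.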
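In practice, however, the error still occurred even after raising the value to one hour.
Looking further back in the logs, to the task startup phase, we find the following Kafka consumer log output.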
2022-03-29 14:09:49,285 INFO org.apache.kafka.common.config.AbstractConfig.logAll(AbstractConfig.java:178) -ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [192.168.1.177:6667, 192.168.0.130:6667, 192.168.0.117:6667]
ssl.keystore.type = JKS
enable.auto.commit = false
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id = consumer-1
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 600000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
group.id = tinyeyePortScan
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 480000
session.timeout.ms = 300000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = latest
2022-03-29 14:09:49,322 WARN org.apache.kafka.common.config.AbstractConfig.logUnused(AbstractConfig.java:186) -The configuration max.poll.interval.ms = 450000 was supplied but isn't a known config.
2022-03-29 14:09:49,326 INFO org.apache.kafka.common.utils.AppInfoParser$AppInfo.<init>(AppInfoParser.java:83) -Kafka version : 0.10.0.1
2022-03-29 14:09:49,326 INFO org.apache.kafka.common.utils.AppInfoParser$AppInfo.<init>(AppInfoParser.java:84) -Kafka commitId : a7a17cdec9eaa6c5
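Note that the Kafka log reports max.poll.interval.ms as an unknown config, and that the Kafka version shown is 0.10.0.1.
This tells us two things. First, the Kafka setting we supplied did not take effect. Second, the cluster actually runs Kafka 2.0.0, which does not match the version of the consumer API in use (0.10.0.1). The project's pom declares the Spark Kafka integration as follows: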
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
</dependency>
Referencing spark-streaming-kafka-0-10_ directly in a Maven project also pulls in the kafka-clients 0.10.0.1 API by default. In 0.10.0.1, max.poll.interval.ms is not yet a configurable option, which is why we see the behavior above.
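The version reasoning can be sketched as a small check. This assumes that max.poll.interval.ms first appeared in kafka-clients 0.10.1.0 (KIP-62); the class ClientVersionCheck and its methods are hypothetical names used only for illustration, and the parser handles only plain dotted numeric versions.

```java
// Hypothetical helper: decides whether a kafka-clients version knows max.poll.interval.ms.
class ClientVersionCheck {

    // Assumption: the setting was introduced in kafka-clients 0.10.1.0 (KIP-62).
    static final int[] FIRST_SUPPORTED = {0, 10, 1, 0};

    // Parses up to four dotted numeric components; missing components count as 0.
    // Non-numeric components (e.g. "-SNAPSHOT" suffixes) are not handled here.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] nums = new int[4];
        for (int i = 0; i < nums.length && i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // Compares component by component against the first supporting version.
    static boolean supportsMaxPollInterval(String clientVersion) {
        int[] v = parse(clientVersion);
        for (int i = 0; i < v.length; i++) {
            if (v[i] != FIRST_SUPPORTED[i]) {
                return v[i] > FIRST_SUPPORTED[i];
            }
        }
        return true; // exactly 0.10.1.0
    }

    public static void main(String[] args) {
        System.out.println("0.10.0.1 -> " + supportsMaxPollInterval("0.10.0.1"));
        System.out.println("2.0.0 -> " + supportsMaxPollInterval("2.0.0"));
    }
}
```

Under that assumption, the 0.10.0.1 client reported in the log falls below the threshold, while a 2.0.0 client is above it.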
Solution
Add the following to the pom:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.0.0</version>
</dependency>
After this change, the max.poll.interval.ms setting takes effect and the problem is resolved.