Problems encountered in Kafka operations

1. java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

The Kafka process crashed and failed to restart. Checking the logs turned up this error; the cause was a full disk. Clearing old segments out of the kafka-logs directory resolved it.
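To confirm the cause, it helps to check disk usage and find the largest log directories before clearing anything. A minimal sketch, assuming the broker's data directory is /data/kafka-logs (a placeholder; check log.dirs in server.properties for the real path):

# Check disk usage on the broker host
df -h
# Find the largest partition directories under the Kafka data dir
du -sh /data/kafka-logs/* | sort -rh | head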

2. Scheduled Kafka log cleanup does not take effect

Many documents online say to set log.retention.hours and similar parameters. The default retention is 7 days, but in my tests the logs did not change at all.

The reason is that Kafka only reclaims data from closed (rolled) log segments, never from the active one.

The configuration had no effect because the data was never rolled into a new segment, so nothing was eligible for deletion.

Adding log.roll.hours=12 to the configuration solves the problem: a new segment is rolled every 12 hours.

# Flush to disk after this many accumulated messages
log.flush.interval.messages=10000
# Flush to disk at least every second
log.flush.interval.ms=1000
# Delete segments older than 24 hours
log.retention.hours=24
# Roll a new segment every 12 hours so old segments become eligible for deletion
log.roll.hours=12
# Delete expired segments rather than compacting them
log.cleanup.policy=delete
# Cap retained data per partition at 5 GB
log.retention.bytes=5368709120

At a minimum, these settings need to be configured.
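Retention can also be overridden for a single topic without changing the broker config. A minimal sketch, assuming a broker at localhost:9092 and a topic named my-topic (both placeholders); older Kafka versions use --zookeeper instead of --bootstrap-server:

# 24 h retention with a 12 h segment roll, for one topic only
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config retention.ms=86400000,segment.ms=43200000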

3. Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user

If you have confirmed that your configuration files are correct, first run:

kdestroy

kinit -kt /var/lib/keytab/kafka.keytab kafka

This clears the cached Kerberos credentials and obtains a fresh ticket before reconnecting.
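To verify that a fresh ticket was actually obtained, list the ticket cache (klist is part of the standard Kerberos tooling):

# The kafka principal and its expiry time should appear in the output
klist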

4. Failed to elect leader for partition crawer-contact-info-23 under strategy

Cause: the offset of a newly added replica was newer than the leader's, so the leader election failed.
Solution: run the bundled preferred-replica election script from the bin directory under the Kafka home path:

kafka-preferred-replica-election.sh --zookeeper 192.168.1.66:2181

Then restart Kafka; the problem is resolved.
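Afterwards, partition leadership can be checked to confirm the election succeeded. A minimal sketch, assuming the topic name is crawer-contact-info (inferred from the partition name above) and the same ZooKeeper address; newer Kafka versions use --bootstrap-server instead of --zookeeper:

# Every partition should list a valid Leader that also appears in its Isr
kafka-topics.sh --zookeeper 192.168.1.66:2181 --describe --topic crawer-contact-info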

5. After ZooKeeper went down, Kafka reported: There are 60 offline partitions.

Cause: the topic metadata Kafka had previously written to ZooKeeper was still there, so recreating the topics conflicted with the stale data and failed.

Go into ZooKeeper, delete the stale data, then restart Kafka. Note that the command below removes the metadata for every topic, so use it with care.

# 1. Enter the ZooKeeper CLI
sh /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/zookeeper/bin/zkCli.sh
# 2. Delete the stale data
deleteall /brokers/topics
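Before deleting, it is worth inspecting the stale znodes inside zkCli.sh to confirm what is actually there:

# List the leftover topic znodes and the currently registered broker ids
ls /brokers/topics
ls /brokers/ids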

6. Kafka consumer exception

Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.

15:10:10.857 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] Failing OffsetCommit request since the consumer is not part of an active group
15:10:10.857 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:151)
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:103)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1241)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1002)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1109)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:976)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1511)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2149)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2134)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.timedAcks(KafkaMessageListenerContainer.java:1981)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:1961)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1036)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:970)
... 3 common frames omitted
15:10:10.858 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
15:10:10.858 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] Lost previously assigned partitions zxk-ann-5
15:10:10.858 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - info01: partitions lost: [zxk-ann-5]
15:10:10.858 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - info01: partitions revoked: [zxk-ann-5]
15:10:10.858 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] (Re-)joining group
15:10:10.871 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group
15:10:10.871 logback [org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-info01-3, groupId=info01] (Re-)joining group

Cause analysis:

1. A surge in data volume into Kafka: the program had run without errors over the previous two days, and no problems had been observed before that.

2. Kafka is co-deployed with other components on one cluster, and its performance is being eaten into by them.

3. Resolution: the data being consumed was annual-report data with very large message bodies. The per-poll batch size max.poll.records was too large, so pulling and processing a batch exceeded max.poll.interval.ms. After reducing max.poll.records to 50 (default 500), consumption returned to normal.

On the client side, increase max.poll.interval.ms or decrease max.poll.records so that consuming one batch of messages does not exceed max.poll.interval.ms.
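A minimal consumer-side sketch of these two settings (the values are illustrative; the defaults noted in the comments are the standard Kafka client defaults):

# max.poll.records: default 500; smaller batches for large message bodies
max.poll.records=50
# max.poll.interval.ms: default 300000 (5 min); more time to process one batch
max.poll.interval.ms=600000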

Parameter reference:

max.poll.interval.ms: The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on how long the consumer can be idle before fetching more records. If poll() is not called before this timeout expires, the consumer is considered failed and the group rebalances, reassigning its partitions to other members.

max.poll.records: The maximum number of records returned in a single call to poll().
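When this error appears, the group's membership and lag can also be checked from the command line. A minimal sketch, assuming a broker at localhost:9092 and the groupId info01 from the logs above:

# Shows each member's assigned partitions, current offset, and lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group info01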
