Kafka Platform Issues: A Summary

 

A summary of the problems we ran into while operating our Kafka platform.

Network throttling

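For unrelated reasons, traffic limiting had been configured on the switch for the newly added machines, so their outbound traffic could only reach a little over 300 Mbps. This problem is subtle and easy to overlook. The fix is simple: remove the rate limit.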
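A cap like this only shows up when you measure it. A minimal sketch, assuming Linux brokers and Python: sample the interface's transmit-byte counter from /proc/net/dev while the broker is under heavy load and compare the rate with the NIC's nominal speed. The interface name, the 1000 Mbps NIC speed and the 50% threshold in `looks_capped` are illustrative assumptions, not values from the original setup.

```python
import time

def parse_tx_bytes(proc_net_dev_text, iface):
    """Return the transmit-byte counter for `iface` from /proc/net/dev text (Linux)."""
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines, which contain no ':'
        name, data = line.split(":", 1)
        if name.strip() == iface:
            # 8 receive fields come first; the 9th number is transmitted bytes.
            return int(data.split()[8])
    raise KeyError(f"interface not found: {iface}")

def tx_mbps(bytes_before, bytes_after, seconds):
    """Average outbound rate in megabits per second between two counter samples."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000

def sample_tx_mbps(iface, seconds=5.0, path="/proc/net/dev"):
    """Measure outbound throughput of `iface` over `seconds` on a live Linux host."""
    with open(path) as f:
        before = parse_tx_bytes(f.read(), iface)
    time.sleep(seconds)
    with open(path) as f:
        after = parse_tx_bytes(f.read(), iface)
    return tx_mbps(before, after, seconds)

def looks_capped(samples_mbps, nic_mbps=1000, ratio=0.5):
    """Assumed heuristic: under sustained load, the peak rate stays far below NIC speed."""
    return max(samples_mbps) < nic_mbps * ratio
```

For example, calling `sample_tx_mbps("eth0")` (hypothetical interface name) repeatedly on a new broker and an older one under similar load makes a flat plateau around 300 Mbps on a 1 Gbps NIC stand out.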

 

 

 

 

ZooKeeper connection limit

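We built a wrapper layer on top of the Kafka client. Some versions of this wrapper had a bug that leaked ZooKeeper connections, so a single client easily exceeded ZooKeeper's limit of 60 connections per IP (maxClientCnxns), and access to ZooKeeper eventually failed.

Connection counts per client IP on one ZooKeeper node: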

a01.zookeeper.kafka.javagc
     60 192.168.200.36
     60 192.168.200.193
     60 192.168.200.19
     59 192.168.200.35
     59 192.168.200.194
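The counts above look like the output of a netstat/sort/uniq pipeline run on the ZooKeeper node. A minimal Python sketch of the same check, assuming `netstat -tn` style lines (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and ZooKeeper listening on port 2181: count established connections per client IP and report clients close to the limit. The port and the 5-connection warning margin are assumptions.

```python
from collections import Counter

ZK_PORT = 2181          # assumed client port (2181 is the conventional clientPort)
MAX_CLIENT_CNXNS = 60   # ZooKeeper's default maxClientCnxns (per client IP)

def count_zk_clients(netstat_lines, port=ZK_PORT):
    """Count ESTABLISHED connections to the ZooKeeper port, grouped by client IP."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header or non-TCP line
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.endswith(f":{port}"):
            counts[remote.rsplit(":", 1)[0]] += 1  # strip the client's source port
    return counts

def near_limit(counts, limit=MAX_CLIENT_CNXNS, margin=5):
    """Client IPs whose connection count is within `margin` of the limit, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= limit - margin)
```

On a ZooKeeper host the input could come from `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout.splitlines()`; a client whose count keeps climbing between runs is a leak candidate.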

 

 

 

ConsumerRebalanceFailedException

    Consumer rebalancing fails (you will see ConsumerRebalanceFailedException): This is due to conflicts when two consumers are trying to own the same topic partition. The log will show you what caused the conflict (search for "conflict in ").
    If your consumer subscribes to many topics and your ZK server is busy, this could be caused by consumers not having enough time to see a consistent view of all consumers in the same group. If this is the case, try increasing rebalance.max.retries and rebalance.backoff.ms.
    Another reason could be that one of the consumers is hard killed. During rebalancing, the other consumers won't realize that consumer is gone until zookeeper.session.timeout.ms has elapsed. In that case, make sure that rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms.

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?

Analysis of the problem: http://blog.csdn.net/lizhitao/article/details/49589825
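The last condition in the FAQ can be checked mechanically against the old (pre-0.9, ZooKeeper-based) consumer's configuration. A minimal Python sketch, assuming the settings live in a simple key=value properties file; the tiny parser ignores escapes and line continuations and is only meant for illustration.

```python
def parse_properties(text):
    """Minimal .properties reader: key=value lines, '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def rebalance_outlasts_session(props):
    """FAQ rule: rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms."""
    retries = int(props["rebalance.max.retries"])
    backoff_ms = int(props["rebalance.backoff.ms"])
    session_ms = int(props["zookeeper.session.timeout.ms"])
    return retries * backoff_ms > session_ms

def min_retries(backoff_ms, session_ms):
    """Smallest retry count whose total backoff strictly exceeds the session timeout."""
    return session_ms // backoff_ms + 1
```

Note the strict inequality: with a 2000 ms backoff and a 6000 ms session timeout, 3 retries give exactly 6000 ms and are not enough, so `min_retries` returns 4.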

 

The consumer client in Kafka versions before 0.9 was not well designed. We recommend upgrading to 0.9 or later, where the consumer was redesigned:

Consumer Client Re-Design

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design

 

 

 

NotLeaderForPartitionException

 

kafka.common.NotLeaderForPartitionException: null
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.7.0_76]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.7.0_76]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_76]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_76]
        at java.lang.Class.newInstance(Class.java:379) ~[na:1.7.0_76]
        at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:70) ~[stormjar.jar:na]
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157) ~[stormjar.jar:na]
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157) ~[stormjar.jar:na]
        at kafka.utils.Logging$class.warn(Logging.scala:88) [stormjar.jar:na]
        at kafka.utils.ShutdownableThread.warn(ShutdownableThread.scala:23) [stormjar.jar:na]
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:156) [stormjar.jar:na]
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:112) [stormjar.jar:na]
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:105) [stormjar.jar:na]
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:112) [stormjar.jar:na]
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) [stormjar.jar:na]
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) [stormjar.jar:na]

The Kafka server had hung. Check every path configured in the log.dirs parameter: a damaged disk will produce exactly this situation.
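A minimal sketch of that check, assuming Python on the broker host and a server.properties that sets log.dirs on a single line (the singular log.dir fallback is ignored): for each configured directory, confirm it exists and that a small file can be written and fsynced there. A disk that has gone read-only or returns I/O errors should surface as a write failure. The probe file prefix is a hypothetical choice; the file is removed immediately.

```python
import os
import tempfile

def log_dirs_from_properties(text):
    """Return the comma-separated paths from a `log.dirs=` line in server.properties text."""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("log.dirs") and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "log.dirs":
                return [p.strip() for p in value.split(",") if p.strip()]
    return []

def probe_dir(path):
    """Return None if `path` is a directory we can write and fsync a file in, else a reason."""
    if not os.path.isdir(path):
        return "missing or not a directory"
    try:
        # The probe file is deleted automatically when the `with` block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".disk-probe-") as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())
    except OSError as exc:
        return f"write failed: {exc}"
    return None

def check_log_dirs(server_properties_text):
    """Map each configured log directory to None (looks healthy) or a failure reason."""
    return {p: probe_dir(p) for p in log_dirs_from_properties(server_properties_text)}
```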

 

 

 

Periodic network interruptions

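One Kafka machine lost network communication for about 2 minutes in every 2-hour cycle, which caused leader switches. We could not track down the cause; most likely a switch or some other device was applying a restriction. We resolved it by simply taking the machine offline.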
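To confirm a pattern like this before pulling a machine, one option is to probe the broker from another host at a fixed interval and look at how the failures cluster. A minimal Python sketch, assuming probe results are already collected as time-sorted (timestamp in seconds, reachable) pairs; how the probes are taken (ping, TCP connect to the broker port, etc.) is left open.

```python
def outage_windows(samples):
    """Group consecutive failed probes into (first_failure_ts, last_failure_ts) windows.

    `samples` is a list of (timestamp_seconds, reachable) pairs sorted by time.
    """
    windows, start, last = [], None, None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # a new outage begins
            last = ts
        elif start is not None:
            windows.append((start, last))  # the outage ended at the previous probe
            start = None
    if start is not None:
        windows.append((start, last))  # still down at the last probe
    return windows

def start_intervals(windows):
    """Seconds between the starts of consecutive outage windows."""
    return [b[0] - a[0] for a, b in zip(windows, windows[1:])]
```

With one probe per minute, a machine that drops out for two minutes every two hours yields start intervals of about 7200 seconds; a steady interval like that suggests something scheduled rather than random packet loss.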

 

 

 
