A summary of the problems we ran into while operating our Kafka platform.
Network throttling
Machines added during a capacity expansion had, for unrelated reasons, a rate limit configured on the switch, so their outbound throughput topped out at roughly 300 Mb/s. This problem is very subtle and easy to miss. The fix is equally simple: remove the rate limit on the switch.
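To confirm a suspected link-level cap before blaming Kafka, a crude point-to-point throughput probe between two brokers is enough; tools like iperf do the same job. The sketch below is a hypothetical standalone helper (class name, port, and window are placeholders, not part of Kafka): it streams bytes over a socket for a fixed window and reports Mb/s. If the number plateaus near 300 Mb/s only for the new machines, the switch is the suspect.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class NetProbe {
    static final int PORT = 9999;    // assumed free port on both hosts
    static final int SECONDS = 10;   // measurement window

    public static void main(String[] args) throws Exception {
        if (args[0].equals("server")) {
            // Run "java NetProbe server" on the receiving broker.
            try (ServerSocket ss = new ServerSocket(PORT);
                 Socket s = ss.accept()) {
                InputStream in = s.getInputStream();
                byte[] buf = new byte[64 * 1024];
                long total = 0, start = System.nanoTime();
                int n;
                while ((n = in.read(buf)) > 0) total += n;   // count bytes until client closes
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("received %.1f Mb/s%n", total * 8 / secs / 1e6);
            }
        } else {
            // Run "java NetProbe client <serverHost>" on the sending broker.
            try (Socket s = new Socket(args[1], PORT)) {
                OutputStream out = s.getOutputStream();
                byte[] buf = new byte[64 * 1024];
                long end = System.nanoTime() + SECONDS * 1_000_000_000L;
                while (System.nanoTime() < end) out.write(buf);  // saturate the link
            }
        }
    }
}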
ZooKeeper connection limit
We run a thin wrapper around the Kafka client, and some versions of that wrapper had a bug that leaked ZooKeeper connections. A single IP quickly exceeded ZooKeeper's per-IP limit of 60 connections (the maxClientCnxns default), after which access to ZooKeeper failed. Connection counts per client IP on one ZooKeeper node:
a01.zookeeper.kafka.javagc
60 192.168.200.36
60 192.168.200.193
60 192.168.200.19
59 192.168.200.35
59 192.168.200.194
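One way to watch for this kind of leak is ZooKeeper's four-letter "cons" command, which lists every open client connection on a node. Below is a minimal sketch (host and port are assumptions; on newer ZooKeeper versions the command must be enabled via 4lw.commands.whitelist) that dumps the list, whose output can then be counted per IP:

import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkCons {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";  // assumed ZK host
        try (Socket s = new Socket(host, 2181)) {
            // Send the four-letter command; the server replies and closes the socket.
            s.getOutputStream().write("cons".getBytes(StandardCharsets.US_ASCII));
            s.shutdownOutput();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[8192];
            int n;
            StringBuilder sb = new StringBuilder();
            while ((n = in.read(buf)) > 0)
                sb.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
            // One line per connection; pipe through sort | uniq -c to count per IP.
            System.out.print(sb);
        }
    }
}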
ConsumerRebalanceFailedException
If your consumer subscribes to many topics and your ZK server is busy, this could be caused by consumers not having enough time to see a consistent view of all consumers in the same group. If this is the case, try increasing rebalance.max.retries and rebalance.backoff.ms.
Another reason could be that one of the consumers is hard-killed. Other consumers, during rebalancing, won't realize that consumer is gone until zookeeper.session.timeout.ms has passed. In that case, make sure that rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?
Problem analysis: http://blog.csdn.net/lizhitao/article/details/49589825
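As a concrete reading of the FAQ advice, a pre-0.9 high-level consumer could be tuned like this (a sketch with illustrative values; the ZooKeeper address and group id are placeholders), keeping rebalance.max.retries * rebalance.backoff.ms above zookeeper.session.timeout.ms:

import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class OldConsumerTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");   // assumed ensemble address
        props.put("group.id", "demo-group");          // assumed group id
        props.put("zookeeper.session.timeout.ms", "6000");
        props.put("rebalance.backoff.ms", "2000");
        props.put("rebalance.max.retries", "10");     // 10 * 2000 ms = 20 s > 6 s session timeout
        ConsumerConnector consumer =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }
}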
The consumer client design in Kafka before 0.9 is weak; we recommend upgrading to 0.9+, where this part was redesigned.
Consumer Client Re-Design
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
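For reference, the 0.9+ consumer moves group coordination off ZooKeeper and onto a broker-side coordinator, which is why upgrading sidesteps this class of rebalance failure. A minimal sketch of the new API (broker address and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // assumed broker address
        props.put("group.id", "demo-group");             // assumed group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // poll(long) is the 0.9-era signature; later releases add a Duration overload.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("%d: %s%n", record.offset(), record.value());
            }
        }
    }
}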
NotLeaderForPartitionException
kafka.common.NotLeaderForPartitionException: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.7.0_76]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.7.0_76]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_76]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_76]
at java.lang.Class.newInstance(Class.java:379) ~[na:1.7.0_76]
at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:70) ~[stormjar.jar:na]
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157) ~[stormjar.jar:na]
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157) ~[stormjar.jar:na]
at kafka.utils.Logging$class.warn(Logging.scala:88) [stormjar.jar:na]
at kafka.utils.ShutdownableThread.warn(ShutdownableThread.scala:23) [stormjar.jar:na]
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:156) [stormjar.jar:na]
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:112) [stormjar.jar:na]
at scala.collection.immutable.Map$Map1.foreach(Map.scala:105) [stormjar.jar:na]
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:112) [stormjar.jar:na]
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88) [stormjar.jar:na]
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) [stormjar.jar:na]
The Kafka server had hung. Check every path configured in log.dirs; a damaged disk can trigger this failure mode.
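A quick way to verify the disks is to probe every path in log.dirs with an actual write, since a disk that has flipped read-only after I/O errors can still pass a bare existence check. A hypothetical standalone checker (class name and default paths are placeholders):

import java.io.File;
import java.io.IOException;

public class LogDirsCheck {
    public static void main(String[] args) {
        // Pass the broker's log.dirs value as the first argument.
        String logDirs = args.length > 0 ? args[0] : "/data1/kafka,/data2/kafka"; // assumed paths
        for (String dir : logDirs.split(",")) {
            File d = new File(dir.trim());
            boolean ok = d.isDirectory() && d.canRead() && d.canWrite();
            if (ok) {
                try {
                    // Probe an actual write; read-only remounts after disk errors fail here.
                    File probe = File.createTempFile("probe", ".tmp", d);
                    probe.delete();
                } catch (IOException e) {
                    ok = false;
                }
            }
            System.out.println(dir + " -> " + (ok ? "OK" : "FAILED"));
        }
    }
}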
Periodic network interruptions
One Kafka machine lost network connectivity for about 2 minutes on a 2-hour cycle, triggering a leader switch each time. We never found the root cause; most likely a switch or some other device was imposing a restriction. We resolved it by simply taking the machine offline.
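When such a cycle is suspected but unproven, a timestamped connect probe against the broker port makes the pattern visible in the logs. A hypothetical sketch (host, port, and interval are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.LocalDateTime;

public class BrokerPing {
    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "broker1";  // assumed broker host
        while (true) {
            try (Socket s = new Socket()) {
                // A 3-second connect timeout; failures are logged with a timestamp
                // so the 2-hour / 2-minute pattern shows up directly.
                s.connect(new InetSocketAddress(host, 9092), 3000);
            } catch (IOException e) {
                System.out.println(LocalDateTime.now() + " connect failed: " + e.getMessage());
            }
            Thread.sleep(10_000);  // probe every 10 seconds
        }
    }
}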