Exception:
20160817 18:58:48 ERROR com.xxx.lac.service.impl.ComparePriceServiceImpl-307 kafka-producer-network-thread | lac_compare_price_service_producer_3 - sendComplete execption This server is not the leader for that topic-partition. |
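Reproducing the exception:
1. Start one producer, two brokers, and one topic (4 partitions, replication factor 2).
2. Shut down one of the brokers, start it again, and wait for the cluster to re-elect partition leaders (or trigger the election manually by running bin/kafka-preferred-replica-election.sh --zookeeper 192.168.9.161:2181,192.168.9.162:2181,192.168.9.163:2181/zhiweikafka).
3. The producer then reports the following exception: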
org.apache.kafka.common.errors.NotLeaderForPartitionException : This server is not the leader for that topic-partition.
[2016-11-01 11:07:44,756] DEBUG Trying to send metadata request to node 2 (org.apache.kafka.clients.NetworkClient)
[2016-11-01 11:07:44,756] DEBUG Sending metadata request ClientRequest(expectResponse=true, payload=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=122,client_id=producer-1}, body={topics=[test]})) to node 2 (org.apache.kafka.clients.NetworkClient)
[2016-11-01 11:07:44,762] DEBUG Updated cluster metadata version 7 to Cluster(nodes = [Node(2, 192.168.11.126, 9092), Node(1, 192.168.11.126, 9091)], partitions = [Partition(topic = test, partition = 1, leader = 1, replicas = [1,2,], isr = [2,1,], Partition(topic = test, partition = 0, leader = 2, replicas = [2,1,], isr = [2,], Partition(topic = test, partition = 3, leader = 1, replicas = [1,2,], isr = [2,1,], Partition(topic = test, partition = 2, leader = 2, replicas = [2,1,], isr = [2,]]) (org.apache.kafka.clients.producer.internals.Metadata)
[2016-11-01 11:07:47,751] DEBUG Initiating connection to node 1 at 192.168.11.126:9091. (org.apache.kafka.clients.NetworkClient)
[2016-11-01 11:07:47,752] DEBUG Completed connection to node 1 (org.apache.kafka.clients.NetworkClient)
kafka-0.8.2\core\src\main\scala\kafka\server\KafkaApis.scala
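The code in this file raises the "This server is not the leader for that topic-partition." exception and returns it to the caller.
P.S.: Even if you do not run kafka-preferred-replica-election.sh manually, the cluster re-elects leaders on its own by default (auto.leader.rebalance.enable=true in the broker configuration file).
The source file kafka-0.8.2\core\src\main\scala\kafka\server\KafkaConfig.scala defines how often this rebalance check runs: every 300 s, i.e. 5 minutes (leader.imbalance.check.interval.seconds).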
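Solution: this exception is caused by a change in the Kafka cluster's broker nodes (here, the partition leader moving to another broker). Configure a retry count on the producer side (retries=3; the default is retries=0).
See the related documentation for details.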
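As a rough illustration of the fix, the producer settings could be assembled as below. This is only a sketch: the class name ProducerRetryConfig and the method build are hypothetical, the broker addresses are the ones that appear in the logs above, and the retry.backoff.ms value of 500 is an assumed figure rather than something taken from the original setup. Only retries=3 comes from the note itself.

```java
import java.util.Properties;

/**
 * Hypothetical helper (not part of Kafka) that collects producer settings
 * so a send that fails with NotLeaderForPartitionException is retried
 * instead of being reported to the application straight away.
 */
final class ProducerRetryConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        // Brokers used to bootstrap the connection and fetch cluster metadata.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Retry a failed send up to 3 times (the default discussed above is 0).
        props.setProperty("retries", "3");
        // Assumed value: wait 500 ms between attempts so that the leader
        // election has some time to finish before the next try.
        props.setProperty("retry.backoff.ms", "500");
        // Serializers required by the Java producer; String is an assumption.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("192.168.11.126:9091,192.168.11.126:9092");
        // Print the resulting settings; in a real application this object
        // would be passed to the KafkaProducer constructor.
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With the kafka-clients library on the classpath, the returned Properties would typically be handed to new KafkaProducer<String, String>(props). Note that with retries enabled, a retried batch can arrive after a later one, so message order within a partition may change unless in-flight requests are limited.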