Symptom: Kafka brokers deployed on k8s restart frequently
[root@k8s-master-01 ~]# kubectl get po | grep kafka
Connection to 1 was disconnected before the response was read
[2023-05-29 06:50:28,983] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error sending fetch request (sessionId=799450652, epoch=20267) to node 1: java.io.IOException: Connection to 1 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
[2023-05-29 06:50:28,993] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=0, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={v2xRTData-0=(offset=103315, logStartOffset=0, maxBytes=1048576)}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=799450652, epoch=20267)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1 was disconnected before the response was read
at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:237)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:40)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2023-05-29 06:50:31,014] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-05-29 06:50:31,014] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error sending fetch request (sessionId=799450652, epoch=INITIAL) to node 1: java.io.IOException: Connection to 10.244.7.147:9092 (id: 1 rack: null) failed.. (org.apache.kafka.clients.FetchSessionHandler)
[2023-05-29 06:50:31,014] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=0, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={__consumer_offsets-17=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-28=(offset=2, logStartOffset=0, maxBytes=1048576), __consumer_offsets-32=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-10=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-47=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-44=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-14=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-40=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-29=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-11=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-41=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-22=(offset=0, logStartOffset=0, maxBytes=1048576), v2xRTData-0=(offset=103315, logStartOffset=0, maxBytes=1048576), __consumer_offsets-23=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-26=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-34=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-4=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-8=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-38=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-16=(offset=15, logStartOffset=0, maxBytes=1048576), __consumer_offsets-20=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-5=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-46=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-35=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-2=(offset=0, logStartOffset=0, maxBytes=1048576)}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=799450652, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 10.244.7.147:9092 (id: 1 rack: null) failed.
at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:70)
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:91)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:237)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:40)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2023-05-29 06:50:33,016] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-05-29 06:50:33,016] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error sending fetch request (sessionId=799450652, epoch=INITIAL) to node 1: java.io.IOException: Connection to 10.244.7.147:9092 (id: 1 rack: null) failed.. (org.apache.kafka.clients.FetchSessionHandler)
[2023-05-29 06:50:33,016] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=0, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={__consumer_offsets-17=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-28=(offset=2, logStartOffset=0, maxBytes=1048576), __consumer_offsets-32=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-10=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-47=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-44=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-14=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-40=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-29=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-11=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-41=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-22=(offset=0, logStartOffset=0, maxBytes=1048576), v2xRTData-0=(offset=103315, logStartOffset=0, maxBytes=1048576), __consumer_offsets-23=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-26=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-34=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-4=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-8=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-38=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-16=(offset=15, logStartOffset=0, maxBytes=1048576), __consumer_offsets-20=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-5=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-46=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-35=(offset=0, logStartOffset=0, maxBytes=1048576), __consumer_offsets-2=(offset=0, logStartOffset=0, maxBytes=1048576)}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=799450652, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 10.244.7.147:9092 (id: 1 rack: null) failed.
at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:70)
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:91)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:237)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:40)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2023-05-29 06:50:33,565] DEBUG [Controller id=0] Broker 2 has been elected as the controller, so stopping the election process. (kafka.controller.KafkaController)
https://stackoverflow.com/questions/55689109/kafka-broker-meet-fatal-issue-error-log-like-connection-to-x-was-disconnected
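When the broker log is long, it can help to tally which broker id the failed connections point at. The following is a minimal Python sketch, assuming the log has been saved as text lines (for example from `kubectl logs`); the function name `count_unreachable` and the `sample` list are made up for illustration, and the regular expression only matches the NetworkClient warning text seen in the log above.

```python
import re
from collections import Counter

# Matches the NetworkClient warning emitted when a broker cannot be reached.
UNREACHABLE = re.compile(r"Connection to node (\d+) could not be established")


def count_unreachable(log_lines):
    """Count 'could not be established' warnings per target broker id."""
    counts = Counter()
    for line in log_lines:
        match = UNREACHABLE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Two warning lines copied from the log above (hypothetical usage).
sample = [
    "[2023-05-29 06:50:31,014] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] "
    "Connection to node 1 could not be established. Broker may not be available. "
    "(org.apache.kafka.clients.NetworkClient)",
    "[2023-05-29 06:50:33,016] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] "
    "Connection to node 1 could not be established. Broker may not be available. "
    "(org.apache.kafka.clients.NetworkClient)",
]

print(count_unreachable(sample))
```

If one broker id dominates the counts, that broker (here node 1 at 10.244.7.147:9092) is the first place to look.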
failed to elect leader for partition xxx-0 under strategy OfflinePartitionLeaderElectionStrategy
[2023-05-29 07:07:13,675] INFO [PartitionStateMachine controllerId=1] Triggering online partition state changes (kafka.controller.PartitionStateMachine)
[2023-05-29 07:07:13,704] ERROR [Controller id=1 epoch=2568] Controller 1 epoch 2568 failed to change state for partition euhtTerminalData36p-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka.common.StateChangeFailedException: Failed to elect leader for partition euhtTerminalData36p-0 under strategy OfflinePartitionLeaderElectionStrategy
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:62)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:253)
at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1212)
at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1509)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2023-05-29 07:07:13,705] ERROR [Controller id=1 epoch=2568] Controller 1 epoch 2568 failed to change state for partition euhtTerminalData6683-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka.common.StateChangeFailedException: Failed to elect leader for partition euhtTerminalData6683-0 under strategy OfflinePartitionLeaderElectionStrategy
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:62)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:253)
at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1212)
at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1509)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2023-05-29 07:07:13,705] ERROR [Controller id=1 epoch=2568] Controller 1 epoch 2568 failed to change state for partition cdctest_topic-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka.common.StateChangeFailedException: Failed to elect leader for partition cdctest_topic-0 under strategy OfflinePartitionLeaderElectionStrategy
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:62)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:253)
at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1212)
at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1509)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
https://blog.csdn.net/h952520296/article/details/110871486
The offset of the newly added replica is ahead of the leader's, so the leader election runs into a problem.
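For context on when `Failed to elect leader ... under strategy OfflinePartitionLeaderElectionStrategy` is raised, here is a simplified Python model of the offline-partition election rule as I understand it: pick the first assigned replica that is both alive and in the ISR, and only fall back to any live replica when `unclean.leader.election.enable` is true. This is a sketch, not Kafka's actual code; the function name and parameters are hypothetical.

```python
def elect_offline_partition_leader(assignment, isr, live_brokers, unclean_enabled=False):
    """Return the broker id to elect as leader, or None if no leader can be chosen.

    assignment      -- replica broker ids in assignment order
    isr             -- broker ids currently in the in-sync replica set
    live_brokers    -- broker ids the controller considers alive
    unclean_enabled -- models unclean.leader.election.enable
    """
    # Preferred case: first assigned replica that is alive and in sync.
    for broker in assignment:
        if broker in live_brokers and broker in isr:
            return broker
    # Fallback: any live replica, only if unclean election is allowed.
    if unclean_enabled:
        for broker in assignment:
            if broker in live_brokers:
                return broker
    # No eligible replica: in this model, the case where the election fails.
    return None


# Hypothetical partition replicated on brokers 1 and 0, with only broker 1 in the ISR
# and broker 1 currently unreachable.
print(elect_offline_partition_leader([1, 0], isr={1}, live_brokers={0, 2}))
print(elect_offline_partition_leader([1, 0], isr={1}, live_brokers={0, 2}, unclean_enabled=True))
```

In this model, a partition whose only in-sync replica lives on an unreachable broker has no electable leader until that broker returns or unclean election is allowed, and allowing it risks losing messages.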
Controller 2’s connection to broker 10.244.7.147:9092 (id: 1 rack: null) was unsuccessful
[2023-05-29 07:17:10,449] WARN [Controller id=2, targetBrokerId=1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-05-29 07:17:10,450] WARN [RequestSendThread controllerId=2] Controller 2's connection to broker 10.244.7.147:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 10.244.7.147:9092 (id: 1 rack: null) failed.
at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:70)
at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:279)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Related Stack Overflow question: