[KAFKA-1724] Errors after reboot in single node setup


From the Apache issue tracker:

https://issues.apache.org/jira/browse/KAFKA-1724


Error description: the reporter hit an exception after restarting Kafka. The issue details are as follows:

Details
Type: Bug
Status: RESOLVED
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.2.0
Fix Version/s: 0.9.0.0
Component/s: None
Labels: newbie

Affected versions: Kafka 0.8.2.0 and earlier.


The error the reporter encountered:

In a single-node setup, the following problem appeared after a reboot:

[2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up (kafka.controller.KafkaController)
[2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete (kafka.controller.KafkaController)
[2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: {"jmx_port":-1,"timestamp":"1413995842465","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092} stored data: {"jmx_port":-1,"timestamp":"1413994171579","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092} (kafka.utils.ZkUtils$)
[2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node [{"jmx_port":-1,"timestamp":"1413995842465","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] (org.I0Itec.zkclient.ZkEventThread)
java.lang.IllegalStateException: Kafka scheduler has not been started
        at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
        at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
        at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
        at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
        at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
        at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
        at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
[2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$)
[2014-10-22 16:37:28,849] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)
[2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)
[2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)

The last log line repeats forever and is correlated with errors on the app side.
Restarting Kafka fixes the errors.

Steps to reproduce (with help from the mailing list):

  1. start zookeeper
  2. start kafka-broker
  3. create topic or start a producer writing to a topic
  4. stop zookeeper
  5. stop kafka-broker (the broker shutdown runs into WARN Session
    0x14938d9dc010001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) java.net.ConnectException: Connection refused)
  6. kill -9 kafka-broker
  7. restart zookeeper and then kafka-broker; this leads to the error above
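
The "conflicted ephemeral node" lines in the log can be illustrated with a toy model. The sketch below is not the ZooKeeper client or Kafka's ZkUtils; `ToyZk`, `registerWithBackoff`, and the tick-based expiry are hypothetical names and simplifications introduced only for illustration. It assumes that the node written by the killed broker's old session still exists when the restarted broker tries to register, and that it disappears once that old session expires.

```java
import java.util.HashMap;
import java.util.Map;

class EphemeralConflictDemo {
    // Toy in-memory model of ephemeral nodes: path -> id of the owning session.
    // This is an illustration only, not the ZooKeeper API.
    static class ToyZk {
        private final Map<String, Long> ephemeralOwner = new HashMap<>();

        // Returns true if the node was created, false if the path is already taken.
        boolean createEphemeral(String path, long sessionId) {
            return ephemeralOwner.putIfAbsent(path, sessionId) == null;
        }

        // Expiring a session removes every ephemeral node that session owns.
        void expireSession(long sessionId) {
            ephemeralOwner.values().removeIf(owner -> owner == sessionId);
        }
    }

    // Retries registration, backing off one tick per conflict. The stale session
    // is assumed to expire after ticksUntilExpiry ticks. Returns the attempt
    // number that succeeded, or -1 if maxAttempts was exhausted.
    static int registerWithBackoff(ToyZk zk, String path, long newSession,
                                   long staleSession, int ticksUntilExpiry, int maxAttempts) {
        int tick = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (zk.createEphemeral(path, newSession)) {
                return attempt;
            }
            // Conflict: wait one tick, during which the stale session may expire.
            tick++;
            if (tick == ticksUntilExpiry) {
                zk.expireSession(staleSession);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ToyZk zk = new ToyZk();
        String path = "/brokers/ids/0";
        zk.createEphemeral(path, 1L); // node left behind by the killed broker's session
        int attempts = registerWithBackoff(zk, path, 2L, 1L, 3, 10);
        System.out.println("registered after " + attempts + " attempts");
    }
}
```

In this model the restarted broker cannot register until the stale node is gone, which matches the gap in the log between the conflict message and "Registered broker 0".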



Otis Gospodnetic added a comment -  04/Nov/14 17:31

sriharsha chintalapani - I noticed you assigned yourself to this. Are you working on this by any chance?

Sriharsha Chintalapani added a comment -  04/Nov/14 17:39

Otis Gospodnetic I started working on this. Will send a patch soon.

Otis Gospodnetic added a comment -  04/Nov/14 18:07

Great, thanks! Still aiming for 0.8.2?

Sriharsha Chintalapani added a comment -  14/Nov/14 01:34

Created reviewboard https://reviews.apache.org/r/28027/diff/
against branch origin/trunk

Sriharsha Chintalapani added a comment -  14/Nov/14 01:39

Jun Rao Neha Narkhede
This issue happens in a single-node setup with the steps above. When the user brings up zookeeper and immediately starts a kafka broker, ZookeeperLeaderElector is able to read the /controller data from zookeeper. Because /controller is an ephemeral node, it then gets deleted, which triggers ZookeeperLeaderElector.handleDataDeleted, which in turn calls KafkaController.onControllerResignation. That method tries to shut down the KafkaScheduler, which has not been started yet, so it throws an IllegalStateException. Please check the patch. Thanks.
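
The failure mode described in this comment can be reproduced in miniature. The following is a simplified Java sketch, not Kafka's Scala source: `StrictScheduler` and `Controller` are hypothetical stand-ins whose shape is inferred only from the method names in the stack trace (`ensureStarted`, `shutdown`, `onControllerResignation`). It assumes that `shutdown` refuses to run unless the scheduler was started first.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

class SchedulerNotStartedDemo {
    // Hypothetical stand-in for a scheduler whose shutdown() requires a prior startup().
    static class StrictScheduler {
        private ScheduledThreadPoolExecutor executor; // stays null until startup()

        synchronized void startup() {
            executor = new ScheduledThreadPoolExecutor(1);
        }

        private void ensureStarted() {
            if (executor == null) {
                throw new IllegalStateException("Kafka scheduler has not been started");
            }
        }

        synchronized void shutdown() {
            ensureStarted(); // throws if startup() was never called
            executor.shutdown();
            executor = null;
        }
    }

    // Hypothetical stand-in for the controller: resignation always shuts the scheduler down.
    static class Controller {
        final StrictScheduler scheduler = new StrictScheduler();

        void onControllerResignation() {
            scheduler.shutdown();
        }
    }

    // Models the /controller node being deleted before this broker ever became
    // controller, so the scheduler was never started. Returns the error message,
    // or "ok" if no exception was thrown.
    static String resignBeforeStartup() {
        Controller controller = new Controller();
        try {
            controller.onControllerResignation();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(resignBeforeStartup());
    }
}
```

The point of the sketch is the ordering: the resignation callback fires on a broker whose scheduler was never started, so an unconditional "must be started" check turns a harmless no-op into an exception.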

Sriharsha Chintalapani added a comment -  02/Dec/14 17:18

Jun Rao can you please look at the reply to your review. Please let me know if this approach makes sense or not. I do see the kafka scheduler error in multi broker env too.

Sriharsha Chintalapani added a comment -  18/Jan/15 17:50

Jun Rao Can you please take a look at my reply to the review. Thanks.

Sriharsha Chintalapani added a comment -  23/Feb/15 23:38

Jun Rao Thanks for the comments on the patch. So it looks like this is already fixed in the trunk. We can close this JIRA.

Jun Rao added a comment -  24/Feb/15 00:09

Sriharsha Chintalapani, so, this is fixed as part of KAFKA-1760?

Sriharsha Chintalapani added a comment -  24/Feb/15 00:20

Jun Rao Yes. KafkaScheduler now has isStarted, which gets set once the scheduler is started; shutdown checks isStarted before going through the shutdown process.
I tested it in a cluster with the reproduction steps and don't see any errors.
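
The guard described in this last comment can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the actual Kafka code: `GuardedScheduler` is a hypothetical class, and only the idea of an `isStarted` check inside `shutdown` comes from the comment itself.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

class GuardedSchedulerDemo {
    // Hypothetical scheduler whose shutdown() is a no-op when it was never started.
    static class GuardedScheduler {
        private ScheduledThreadPoolExecutor executor; // null means "not started"

        synchronized boolean isStarted() {
            return executor != null;
        }

        synchronized void startup() {
            if (isStarted()) {
                throw new IllegalStateException("This scheduler has already been started");
            }
            executor = new ScheduledThreadPoolExecutor(1);
        }

        synchronized void shutdown() {
            // Only go through the shutdown process if there is something to shut down.
            if (isStarted()) {
                executor.shutdown();
                executor = null;
            }
        }
    }

    // Returns true if shutting down a never-started scheduler completes without throwing.
    static boolean shutdownBeforeStartupIsSafe() {
        GuardedScheduler scheduler = new GuardedScheduler();
        try {
            scheduler.shutdown();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown before startup is safe: " + shutdownBeforeStartupIsSafe());
    }
}
```

With this shape, a resignation callback that arrives before the scheduler has started simply does nothing, while a started scheduler is still shut down normally.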


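Sriharsha Chintalapani, who had taken the issue, wrote and submitted a patch for it. In the end, as the thread above shows, the problem turned out to be already fixed on trunk as part of KAFKA-1760.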



