[2] Storm Bug Fix: supervisor {taskid} still hasn't started

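This article explains how to fix a Storm cluster problem in which a topology stays in the assigned state indefinitely after it is submitted. The logs show that a ZMQ connection failure is the cause, and the failure is ultimately traced to a hostname resolution problem. The fix is to configure the hosts file correctly.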


Original article; reposting is welcome. When reposting, please credit the source: http://blog.csdn.net/jmppok/article/details/17073397


1. Problem Description

After a topology is submitted to Storm, it stays in the assigned state and never runs. The supervisor log shows:

2013-12-02 14:49:52 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:55 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started

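Note: if the message appears only a few times and then stops, the worker started successfully, or the task was reassigned to another supervisor.

Only if the message keeps repeating does it mean that the worker executing the task cannot start.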
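The rule of thumb above can be turned into a quick check over supervisor.log. The following Python sketch counts the "still hasn't started" messages per worker ID and reports the IDs that repeat at least a given number of times. The function name `stuck_workers` and the default threshold of 5 are hypothetical choices for illustration, and the pattern assumes the log line format shown above.

```python
import re
from collections import Counter

# Matches supervisor.log lines such as:
#   2013-12-02 14:49:52 supervisor [INFO] <worker-id> still hasn't started
# where <worker-id> is a 36-character UUID.
STILL_NOT_STARTED = re.compile(
    r"supervisor \[INFO\] ([0-9a-f-]{36}) still hasn't started"
)


def stuck_workers(lines, threshold=5):
    """Return the worker IDs whose "still hasn't started" message appears
    at least `threshold` times in `lines`, sorted alphabetically.

    `threshold` is a hypothetical cut-off chosen for illustration; it is
    not a Storm configuration value.
    """
    counts = Counter()
    for line in lines:
        match = STILL_NOT_STARTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(wid for wid, n in counts.items() if n >= threshold)
```

For example, `with open("logs/supervisor.log") as f: print(stuck_workers(f))` would print the candidate IDs (the log path is an assumption; use wherever your supervisor writes its log). A single snapshot cannot tell whether the message is still repeating, so running the check again a little later and seeing a higher count for the same ID is the more reliable signal.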


The worker log contains the detailed error:

2013-12-02 13:28:02 worker [ERROR] Error on initialization of server mk-worker
org.zeromq.ZMQException: Invalid argument(0x16)
        at org.zeromq.ZMQ$Socket.connect(Native Method)
        at zilch.mq$connect.invoke(mq.clj:74)
        at backtype.storm.messaging.zmq.ZMQContext.connect(zmq.clj:65)
        at backtype.storm.daemon.worker$mk_refresh_connections$this__4293$iter__4300__4304$fn__4305.invoke(worker.clj:244)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:473)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$dorun.invoke(core.clj:2725)
        at clojure.core$doall.invoke(core.clj:2741)
        at backtype.storm.daemon.worker$mk_refresh_connections$this__4293.invoke(worker.clj:238)
        at backtype.storm.daemon.worker$fn__4348$exec_fn__1228__auto____4349.invoke(worker.clj:351)
        at clojure.lang.AFn.applyToHelper(AFn.java:185)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at clojure.core$apply.invoke(core.clj:601)
        at backtype.storm.daemon.worker$fn__4348$mk_worker__4404.doInvoke(worker.clj:323)
        at clojure.lang.RestFn.invoke(RestFn.java:512)
        at backtype.storm.daemon.worker$_main.invoke(worker.clj:433)
        at clojure.lang.AFn.applyToHelper(AFn.java:172)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at backtype.storm.daemon.worker.main(Unknown Source)



To find out which worker log to look at, check the worker launch command in supervisor.log. For example, supervisor.log contains the following:

2013-12-02 14:49:51 supervisor [INFO] Launching worker with command: java -server -Xmx768m  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-6703.log -Dstorm.home=/opt/storm -Dlog4j.configuration=storm.log.properties -cp /opt/storm/storm-0.8.2.jar:/opt/storm/lib/commons-exec-1.1.jar:/opt/storm/lib/jetty-util-6.1.26.jar:/opt/storm/lib/minlog-1.2.jar:/opt/storm/lib/snakeyaml-1.9.jar:/opt/storm/lib/clj-time-0.4.1.jar:/opt/storm/lib/compojure-1.1.3.jar:/opt/storm/lib/curator-framework-1.0.1.jar:/opt/storm/lib/joda-time-2.0.jar:/opt/storm/lib/reflectasm-1.07-shaded.jar:/opt/storm/lib/log4j-1.2.16.jar:/opt/storm/lib/json-simple-1.1.jar:/opt/storm/lib/jline-0.9.94.jar:/opt/storm/lib/hiccup-0.3.6.jar:/opt/storm/lib/slf4j-log4j12-1.5.8.jar:/opt/storm/lib/clojure-1.4.0.jar:/opt/storm/lib/asm-4.0.jar:/opt/storm/lib/carbonite-1.5.0.jar:/opt/storm/lib/servlet-api-2.5.jar:/opt/storm/lib/servlet-api-2.5-20081211.jar:/opt/storm/lib/disruptor-2.10.1.jar:/opt/storm/lib/ring-servlet-0.3.11.jar:/opt/storm/lib/junit-3.8.1.jar:/opt/storm/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/lib/core.incubator-0.1.0.jar:/opt/storm/lib/tools.macro-0.1.0.jar:/opt/storm/lib/math.numeric-tower-0.0.1.jar:/opt/storm/lib/zookeeper-3.3.3.jar:/opt/storm/lib/curator-client-1.0.1.jar:/opt/storm/lib/libthrift7-0.7.0.jar:/opt/storm/lib/tools.cli-0.2.2.jar:/opt/storm/lib/tools.logging-0.2.3.jar:/opt/storm/lib/jgrapht-0.8.3.jar:/opt/storm/lib/kryo-2.17.jar:/opt/storm/lib/guava-13.0.jar:/opt/storm/lib/commons-logging-1.1.1.jar:/opt/storm/lib/ring-core-1.1.5.jar:/opt/storm/lib/commons-codec-1.4.jar:/opt/storm/lib/httpclient-4.1.1.jar:/opt/storm/lib/commons-lang-2.5.jar:/opt/storm/lib/commons-io-1.4.jar:/opt/storm/lib/slf4j-api-1.5.8.jar:/opt/storm/lib/jetty-6.1.26.jar:/opt/storm/lib/jzmq-2.1.0.jar:/opt/storm/lib/httpcore-4.1.jar:/opt/storm/lib/clout-1.0.1.jar:/opt/storm/lib/commons-fileupload-1.2.1.jar:/opt/storm/lib/objenesis-1.2.jar:/opt/storm/log4j:/opt/storm/conf:/tmp/storm_tmp/supervisor/stormdist/mytest-2-1385966991/stormjar.jar backtype.storm.daemon.worker mytest-2-1385966991 dc89a2b5-267f-4ed8-b94a-f900ed6300e4 6703 0916c7a9-c47d-43ae-9d88-13ec574ee5e6
2013-12-02 14:49:51 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started

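Note the worker-6703.log in the launch command on the first line: that is the log file to check. The worker ID at the end of that command (0916c7a9-c47d-43ae-9d88-13ec574ee5e6) is the same ID that appears in the "still hasn't started" messages that follow it.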
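When a supervisor launches many workers, matching worker IDs to log files by eye gets tedious. The Python sketch below pulls the `-Dlogfile.name=` value out of each "Launching worker with command:" line and keys it by the last token of the command, which in the example above is the worker ID. The function name `worker_log_files` is hypothetical, and the sketch assumes the Storm 0.8.2 log format shown above, with each launch command written on a single log line.

```python
import re

# Captures the value of -Dlogfile.name=... from a worker launch command.
LOGFILE_NAME = re.compile(r"-Dlogfile\.name=(\S+)")


def worker_log_files(lines):
    """Map each worker ID to its log file name, using the
    "Launching worker with command:" lines in supervisor.log.

    Assumes each launch command is on one log line and, as in the
    example above, ends with the worker ID.
    """
    result = {}
    for line in lines:
        if "Launching worker with command:" not in line:
            continue
        match = LOGFILE_NAME.search(line)
        if match is None:
            continue
        worker_id = line.split()[-1]  # last token of the command
        result[worker_id] = match.group(1)
    return result
```

Combined with the IDs that keep printing "still hasn't started", this points directly at the worker log that holds the real error.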


2. Solution

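Connection errors involving ZMQ and ZooKeeper in Storm are usually caused by a misconfigured hosts file on the local machine, which prevents the connection from being established. On every node in the Storm cluster, edit the hosts file (/etc/hosts on Linux) as follows:

1) Add the local machine's IP address and hostname, for example:

   192.168.0.2    node1

2) Add the IP addresses and hostnames of the other hosts in the Storm cluster, for example:

   192.168.0.3    node2
   192.168.0.4    node3


This ensures that ZMQ and ZooKeeper resolve each hostname to the correct host when connecting.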
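After editing the hosts file, it is worth confirming on each node that every cluster hostname actually resolves. The Python sketch below resolves a list of hostnames and flags names that fail to resolve or that resolve to a loopback address (which only the machine itself can reach). The function name `check_resolution` and its `resolve` parameter are hypothetical; the default resolver is the standard `socket.gethostbyname`, which on a typical Linux system consults /etc/hosts and DNS according to the system resolver configuration.

```python
import ipaddress
import socket


def check_resolution(hostnames, resolve=socket.gethostbyname):
    """Resolve each hostname and report any problem found.

    Returns {hostname: (ipv4_address_or_None, problem_or_None)}.
    `resolve` is injectable so the check can be tested without DNS.
    """
    report = {}
    for name in hostnames:
        try:
            address = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[name] = (None, f"cannot resolve: {exc}")
            continue
        if ipaddress.ip_address(address).is_loopback:
            # A loopback address is only reachable from the machine itself.
            report[name] = (address, "resolves to a loopback address")
        else:
            report[name] = (address, None)
    return report
```

For example, running `check_resolution([socket.gethostname(), "node1", "node2", "node3"])` on each node (hostnames taken from the example entries above) should return an address and no problem for every entry once the hosts file is correct.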
