[2] Storm Bug Fix: supervisor {taskid} still hasn't started

Original post, 2013-12-02 15:06:46

This is an original article and reposting is welcome. When reposting, please credit the source: http://blog.csdn.net/jmppok/article/details/17073397


1. Problem description

After a Topology is submitted to Storm, it stays in the assigning state indefinitely. The Supervisor log shows:

2013-12-02 14:49:52 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
2013-12-02 14:49:55 supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started

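Note: if the message appears only a few times and then stops, the worker started successfully, or the task was moved to another supervisor.

Only when the message keeps repeating without end does it mean that the worker that should run the task cannot start.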
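
To make the distinction above concrete, here is a minimal Python sketch that counts how often each worker id appears in "still hasn't started" lines. The function names `count_not_started` and `stuck_workers` and the threshold value are hypothetical choices for illustration, and the regular expression assumes the log format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches lines such as:
# 2013-12-02 14:49:52 supervisor [INFO] <worker-id> still hasn't started
NOT_STARTED = re.compile(
    r"supervisor \[INFO\] ([0-9a-f-]{36}) still hasn't started"
)


def count_not_started(lines):
    """Map each worker id to the number of 'still hasn't started' lines."""
    counts = Counter()
    for line in lines:
        match = NOT_STARTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def stuck_workers(lines, threshold=20):
    """Return worker ids reported at least `threshold` times.

    The threshold is an arbitrary guess; tune it to how long a healthy
    worker takes to start in your cluster.
    """
    return [wid for wid, n in count_not_started(lines).items() if n >= threshold]
```

A worker id that shows up only a handful of times is probably fine; one that keeps accumulating lines is the one whose worker log deserves a look.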


The worker log contains the detailed error message:

2013-12-02 13:28:02 worker [ERROR] Error on initialization of server mk-worker
org.zeromq.ZMQException: Invalid argument(0x16)
        at org.zeromq.ZMQ$Socket.connect(Native Method)
        at zilch.mq$connect.invoke(mq.clj:74)
        at backtype.storm.messaging.zmq.ZMQContext.connect(zmq.clj:65)
        at backtype.storm.daemon.worker$mk_refresh_connections$this__4293$iter__4300__4304$fn__4305.invoke(worker.clj:244)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:473)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$dorun.invoke(core.clj:2725)
        at clojure.core$doall.invoke(core.clj:2741)
        at backtype.storm.daemon.worker$mk_refresh_connections$this__4293.invoke(worker.clj:238)
        at backtype.storm.daemon.worker$fn__4348$exec_fn__1228__auto____4349.invoke(worker.clj:351)
        at clojure.lang.AFn.applyToHelper(AFn.java:185)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at clojure.core$apply.invoke(core.clj:601)
        at backtype.storm.daemon.worker$fn__4348$mk_worker__4404.doInvoke(worker.clj:323)
        at clojure.lang.RestFn.invoke(RestFn.java:512)
        at backtype.storm.daemon.worker$_main.invoke(worker.clj:433)
        at clojure.lang.AFn.applyToHelper(AFn.java:172)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at backtype.storm.daemon.worker.main(Unknown Source)



To find out which worker log to look at, check the launch command recorded in supervisor.log. For example, supervisor.log contains the following:

2013-12-02 14:49:51 supervisor [INFO] Launching worker with command: java -server -Xmx768m  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-6703.log -Dstorm.home=/opt/storm -Dlog4j.configuration=storm.log.properties -cp /opt/storm/storm-0.8.2.jar:/opt/storm/lib/commons-exec-1.1.jar:/opt/storm/lib/jetty-util-6.1.26.jar:/opt/storm/lib/minlog-1.2.jar:/opt/storm/lib/snakeyaml-1.9.jar:/opt/storm/lib/clj-time-0.4.1.jar:/opt/storm/lib/compojure-1.1.3.jar:/opt/storm/lib/curator-framework-1.0.1.jar:/opt/storm/lib/joda-time-2.0.jar:/opt/storm/lib/reflectasm-1.07-shaded.jar:/opt/storm/lib/log4j-1.2.16.jar:/opt/storm/lib/json-simple-1.1.jar:/opt/storm/lib/jline-0.9.94.jar:/opt/storm/lib/hiccup-0.3.6.jar:/opt/storm/lib/slf4j-log4j12-1.5.8.jar:/opt/storm/lib/clojure-1.4.0.jar:/opt/storm/lib/asm-4.0.jar:/opt/storm/lib/carbonite-1.5.0.jar:/opt/storm/lib/servlet-api-2.5.jar:/opt/storm/lib/servlet-api-2.5-20081211.jar:/opt/storm/lib/disruptor-2.10.1.jar:/opt/storm/lib/ring-servlet-0.3.11.jar:/opt/storm/lib/junit-3.8.1.jar:/opt/storm/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/lib/core.incubator-0.1.0.jar:/opt/storm/lib/tools.macro-0.1.0.jar:/opt/storm/lib/math.numeric-tower-0.0.1.jar:/opt/storm/lib/zookeeper-3.3.3.jar:/opt/storm/lib/curator-client-1.0.1.jar:/opt/storm/lib/libthrift7-0.7.0.jar:/opt/storm/lib/tools.cli-0.2.2.jar:/opt/storm/lib/tools.logging-0.2.3.jar:/opt/storm/lib/jgrapht-0.8.3.jar:/opt/storm/lib/kryo-2.17.jar:/opt/storm/lib/guava-13.0.jar:/opt/storm/lib/commons-logging-1.1.1.jar:/opt/storm/lib/ring-core-1.1.5.jar:/opt/storm/lib/commons-codec-1.4.jar:/opt/storm/lib/httpclient-4.1.1.jar:/opt/storm/lib/commons-lang-2.5.jar:/opt/storm/lib/commons-io-1.4.jar:/opt/storm/lib/slf4j-api-1.5.8.jar:/opt/storm/lib/jetty-6.1.26.jar:/opt/storm/lib/jzmq-2.1.0.jar:/opt/storm/lib/httpcore-4.1.jar:/opt/storm/lib/clout-1.0.1.jar:/opt/storm/lib/commons-fileupload-1.2.1.jar:/opt/storm/lib/objenesis-1.2.jar:/opt/storm/log4j:/opt/storm/conf:/tmp/storm_tmp/
supervisor/stormdist/mytest-2-1385966991/stormjar.jar backtype.storm.daemon.worker mytest-2-1385966991 dc89a2b5-267f-4ed8-b94a-f900ed6300e4 6703 0916c7a9-c47d-43ae-9d88-13ec574ee5e6
2013-12-02 14:49:51 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:52 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:53 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started
2013-12-02 14:49:54 supervisor [INFO] 0916c7a9-c47d-43ae-9d88-13ec574ee5e6 still hasn't started

Note the worker-6703.log in the command on the first line: that is the log file to check.
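
The lookup described above can be sketched in Python as follows. The function name `worker_log_from_command` is hypothetical; the sketch only assumes that the launch line contains "Launching worker with command:" and a `-Dlogfile.name=` option, as in the excerpt above.

```python
import re


def worker_log_from_command(line):
    """Return the -Dlogfile.name value from a 'Launching worker' line, or None."""
    if "Launching worker with command:" not in line:
        return None
    # The option value runs up to the next whitespace character.
    match = re.search(r"-Dlogfile\.name=(\S+)", line)
    return match.group(1) if match else None
```

Running this over each line of supervisor.log yields the worker log file names in the order the workers were launched.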


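2. Solution

ZMQ and ZooKeeper connection errors in Storm are usually caused by a faulty host configuration on the local machine, which makes the connection fail. Make the following changes on every node in the Storm cluster: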

1) Add the local machine's IP address and hostname, e.g. 192.168.0.2    node1

2) Add the other hosts in the Storm cluster, e.g. 192.168.0.3    node2 and 192.168.0.4    node3


This lets ZMQ and ZooKeeper resolve the correct host when connecting.
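
As a rough check of the two steps above, the following Python sketch parses hosts-style text (one IP address followed by one or more hostnames per line) and reports which expected entries are missing or point to a different IP. It assumes the entries are kept in a hosts-format file; the function names `parse_hosts` and `missing_hosts` are hypothetical, and letting the first entry for a name win is a simplification.

```python
def parse_hosts(text):
    """Parse hosts-file style text into a dict of hostname -> IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # simplification: first entry wins
    return mapping


def missing_hosts(text, expected):
    """Return (hostname, ip) pairs from `expected` that are absent or differ."""
    mapping = parse_hosts(text)
    return [(name, ip) for name, ip in expected.items() if mapping.get(name) != ip]
```

For example, calling `missing_hosts` on each node with `{"node1": "192.168.0.2", "node2": "192.168.0.3", "node3": "192.168.0.4"}` as `expected` should return an empty list once both steps have been applied there.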
