Cluster environment: 1 nimbus node and 1 supervisor node (passwordless SSH login between them)
org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)
    at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81)
    at backtype.storm.thrift$nimbus_client_and_conn.invoke(thrift.clj:75)
    at backtype.storm.ui.core$supervisor_summary.invoke(core.clj:479)
    at backtype.storm.ui.core$fn__8225.invoke(core.clj:791)
    at compojure.core$make_route$fn__3365.invoke(core.clj:93)
    at compojure.core$if_route$fn__3353.invoke(core.clj:39)
    at compojure.core$if_method$fn__3346.invoke(core.clj:24)
    at compojure.core$routing$fn__3371.invoke(core.clj:106)
    at clojure.core$some.invoke(core.clj:2443)
    at compojure.core$routing.doInvoke(core.clj:106)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invoke(core.clj:619)
    at compojure.core$routes$fn__3375.invoke(core.clj:111)
    at ring.middleware.reload$wrap_reload$fn__7540.invoke(reload.clj:14)
    at backtype.storm.ui.core$catch_errors$fn__8268.invoke(core.clj:858)
    at ring.middleware.keyword_params$wrap_keyword_params$fn__4029.invoke(keyword_params.clj:27)
    at ring.middleware.nested_params$wrap_nested_params$fn__4068.invoke(nested_params.clj:65)
    at ring.middleware.params$wrap_params$fn__4001.invoke(params.clj:55)
    at ring.middleware.multipart_params$wrap_multipart_params$fn__4096.invoke(multipart_params.clj:103)
    at ring.middleware.flash$wrap_flash$fn__4277.invoke(flash.clj:14)
    at ring.middleware.session$wrap_session$fn__4266.invoke(session.clj:43)
    at ring.middleware.cookies$wrap_cookies$fn__4197.invoke(cookies.clj:160)
    at ring.adapter.jetty$proxy_handler$fn__7179.invoke(jetty.clj:16)
    at ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown Source)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.thrift7.transport.TSocket.open(TSocket.java:178)
While Storm was running, the UI page started showing this error for no obvious reason.
1. Logged in to the cluster and pinged the master and slave servers from each other; basic network connectivity was fine.
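2. Checked the running processes: on the nimbus node, jps listed only Jps, Core, and QuorumPeerMain; on the supervisor node, jps listed only Jps. In other words, the nimbus and supervisor daemons were no longer running, which would explain why the UI (Core) got "Connection refused" when it tried to reach nimbus.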
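The process check in step 2 can be sketched as a small script. This is only an illustration: the sample output mirrors the nimbus node described above with invented PIDs, the expected daemon list is an assumption about what should run on that node, and names are compared case-insensitively because the capitalization jps prints may differ between Storm versions.

```python
def running_daemons(jps_output):
    """Return lower-cased process names from jps output ('<pid> <name>' per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if parts:
            # The process name is the last token on the line.
            names.add(parts[-1].lower())
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in the jps output."""
    return {name.lower() for name in expected} - running_daemons(jps_output)


# Hypothetical sample: same process names as in the post, PIDs invented.
NIMBUS_NODE_JPS = """\
2101 QuorumPeerMain
2345 Core
2980 Jps
"""

# Assumption: these three daemons are expected on the nimbus node.
EXPECTED_ON_NIMBUS = ["nimbus", "core", "QuorumPeerMain"]

if __name__ == "__main__":
    print(missing_daemons(NIMBUS_NODE_JPS, EXPECTED_ON_NIMBUS))
```

In practice the jps text would come from running jps on each node (for example over the passwordless SSH mentioned at the top) instead of a hard-coded string.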
First, checked the nimbus log file and found:
2014-09-19 13:41:30 o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x1488b886fe70001, likely server has closed socket, closing socket connection and attempting reconnect
2014-09-19 13:41:30 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2014-09-19 13:41:30 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-09-19 13:41:30 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-19 13:41:31 o.a.z.ClientCnxn [INFO] Opening socket connection to server slave2/192.168.195.202:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:41:37 o.a.z.ClientCnxn [WARN] Session 0x1488b886fe70001 for server null, unexpected error, closing socket connection and attempting reconnect
Next, checked the supervisor log file and found:
2014-09-19 13:41:45 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 17162ms for sessionid 0x2488b886ff10001, closing socket connection and attempting reconnect
2014-09-19 13:41:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2014-09-19 13:41:54 o.a.z.ClientCnxn [INFO] Opening socket connection to server master/192.168.195.199:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:41:54 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-19 13:41:55 o.a.z.ClientCnxn [INFO] Socket connection established to master/192.168.195.199:2181, initiating session
2014-09-19 13:41:57 o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x2488b886ff10001, likely server has closed socket, closing socket connection and attempting reconnect
2014-09-19 13:41:58 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-09-19 13:42:00 o.a.z.ClientCnxn [INFO] Opening socket connection to server slave2/192.168.195.202:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-09-19 13:42:05 o.a.z.ClientCnxn [WARN] Session 0x2488b886ff10001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
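Both Storm logs use the same "timestamp logger [LEVEL] message" layout, so the WARN entries can be pulled out mechanically. The following is a minimal sketch, assuming one log entry per line; the function names are made up for illustration, and the two sample entries are copied from the supervisor log above.

```python
import re
from datetime import datetime

# Matches entries such as:
# 2014-09-19 13:41:58 b.s.cluster [WARN] Received event ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) \[(\w+)\] (.*)$"
)


def parse_line(line):
    """Parse one Storm log entry; return None for lines that do not match
    (for example exception lines that follow an entry)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, logger, level, message = match.groups()
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"), logger, level, message


def warn_entries(lines):
    """Return the parsed entries whose level is WARN, in input order."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry[2] == "WARN"]


SAMPLE = [
    "2014-09-19 13:41:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED",
    "2014-09-19 13:41:58 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.",
    "java.net.NoRouteToHostException: No route to host",
]

if __name__ == "__main__":
    for when, logger, level, message in warn_entries(SAMPLE):
        print(when.isoformat(), logger, message)
```

Filtering both the nimbus and supervisor logs this way makes it easier to line up when each daemon lost its ZooKeeper connection.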
Then checked the ZooKeeper log file on master and found:
2014-09-19 13:41:29,400 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:1 took 4575ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2014-09-19 13:41:30,453 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2014-09-19 13:41:30,533 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
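The first WARN in this log reports a 4575 ms fsync of the write-ahead log, about one second before the follower's read from the leader timed out. Slow disk writes may therefore have contributed, although the log alone does not prove it. A rough sketch for spotting such entries in a ZooKeeper log follows; the 1000 ms threshold and the function names are assumptions chosen for illustration, not ZooKeeper settings.

```python
import re

# Matches the duration in ZooKeeper's slow-fsync warning, e.g.
# "fsync-ing the write ahead log in SyncThread:1 took 4575ms ..."
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def fsync_durations_ms(lines):
    """Return every fsync duration (in milliseconds) reported in the log lines."""
    durations = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


def slow_fsyncs(lines, threshold_ms=1000):
    """Return the reported durations above threshold_ms (illustrative threshold)."""
    return [d for d in fsync_durations_ms(lines) if d > threshold_ms]


SAMPLE = [
    "2014-09-19 13:41:29,400 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:1 took 4575ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
    "2014-09-19 13:41:30,533 [myid:1] - INFO "
    "[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called",
]

if __name__ == "__main__":
    print(slow_fsyncs(SAMPLE))
```

If many such warnings appear, checking disk I/O load on the ZooKeeper host is a reasonable next step.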
Then checked the ZooKeeper log file on slave and found:
2014-09-19 13:41:35,621 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@490] - Shutting down
2014-09-19 13:41:45,574 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@496] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 1
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
2014-09-19 13:41:43,820 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.195.199:58486
2014-09-19 13:41:42,856 [myid:2] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe70009, timeout of 20000ms exceeded
2014-09-19 13:41:45,576 [myid:2] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe70001, timeout of 20000ms exceeded
2014-09-19 13:41:45,576 [myid:2] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1488b886fe7000b, timeout of 20000ms exceeded
2014-09-19 13:41:46,003 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe70009
2014-09-19 13:41:46,003 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe70001
2014-09-19 13:41:46,004 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1488b886fe7000b
2014-09-19 13:41:46,003 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-09-19 13:41:46,005 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.195.199:58486 (no session established for client)
2014-09-19 13:41:46,247 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
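The leader shut itself down because it no longer had enough synced followers. With ZooKeeper's default majority quorum, an ensemble keeps serving only while more than half of its voting servers are in sync. The sketch below illustrates that rule only; the post does not state the ensemble size, so the value 3 is an assumption, and the function name is made up.

```python
def has_quorum(synced_servers, ensemble_size):
    """Majority rule: strictly more than half of the voting servers must be in sync.

    synced_servers counts the leader itself plus its synced followers.
    """
    if ensemble_size <= 0:
        raise ValueError("ensemble_size must be positive")
    return synced_servers > ensemble_size // 2


# Assumption: a 3-server ensemble (the post does not give the actual size).
ENSEMBLE_SIZE = 3

if __name__ == "__main__":
    for synced in range(ENSEMBLE_SIZE + 1):
        state = "quorum" if has_quorum(synced, ENSEMBLE_SIZE) else "no quorum"
        print(synced, "synced server(s):", state)
```

Under this rule a 3-server ensemble tolerates one failed server and a 2-server ensemble tolerates none, so once the follower on master shut down after its read timeout, the leader on slave may have been left without a majority.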