The TaskManager and JobManager went down for no apparent reason.
The log in flink-flink-taskexecutor-0-node2.log is as follows:
2021-04-04 10:03:15,058 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: /192.168.11.132, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-04-04 10:03:15,297 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 26 GB, usable 22 GB (84.62% usable)
2021-04-04 10:03:16,123 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2021-04-04 10:03:16,197 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
2021-04-04 10:03:16,252 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 52 ms).
2021-04-04 10:03:16,309 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 56 ms). Listening on SocketAddress /192.168.11.132:37718.
2021-04-04 10:03:16,310 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (641 MB), memory will be allocated lazily.
2021-04-04 10:03:16,314 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-5cb46d08-d7bd-41bb-91d0-e67a2ca8ab47 for spill files.
2021-04-04 10:03:16,409 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2021-04-04 10:03:16,421 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2021-04-04 10:03:16,438 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2021-04-04 10:03:16,439 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
2021-04-04 10:03:16,441 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-9bd42cb9-9f68-419a-9381-95693ff61ac5
2021-04-04 10:03:16,452 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@localhost:46715/user/resourcemanager(97844b5c0749ea747b4749fffa964081).
2021-04-04 10:03:16,570 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:03:16,577 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:46715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:46715]] Caused by: [拒绝连接: localhost/127.0.0.1:46715]
2021-04-04 10:03:16,583 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@localhost:46715/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@localhost:46715/user/resourcemanager..
2021-04-04 10:03:26,617 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:03:26,623 WARN akka.remote.ReliableDeliverySupervisor
......
2021-04-04 10:08:07,454 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:08:07,455 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:46715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:46715]] Caused by: [拒绝连接: localhost/127.0.0.1:46715]
2021-04-04 10:08:07,456 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@localhost:46715/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@localhost:46715/user/resourcemanager..
2021-04-04 10:08:16,468 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Fatal error occurred in TaskExecutor akka.tcp://flink@192.168.11.132:45382/user/taskmanager_0.
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1034)
at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1020)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor.aroundReceive(Actor.scala:502)
at akka.actor.Actor.aroundReceive$(Actor.scala:500)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2021-04-04 10:08:16,472 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1034)
at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1020)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor.aroundReceive(Actor.scala:502)
at akka.actor.Actor.aroundReceive$(Actor.scala:500)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2021-04-04 10:08:16,478 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://flink@192.168.11.132:45382/user/taskmanager_0.
2021-04-04 10:08:16,478 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2021-04-04 10:08:16,507 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2021-04-04 10:08:16,507 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
2021-04-04 10:08:16,514 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-5cb46d08-d7bd-41bb-91d0-e67a2ca8ab47
2021-04-04 10:08:16,514 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
2021-04-04 10:08:16,515 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 0 ms).
2021-04-04 10:08:16,518 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 1 ms).
2021-04-04 10:08:16,532 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2021-04-04 10:08:16,532 INFO org.apache.flink.runtime.filecache.FileCache - removed file cache directory /tmp/flink-dist-cache-9bd42cb9-9f68-419a-9381-95693ff61ac5
2021-04-04 10:08:16,539 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://flink@192.168.11.132:45382/user/taskmanager_0.
2021-04-04 10:08:16,540 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
2021-04-04 10:08:16,540 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2021-04-04 10:08:16,553 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2021-04-04 10:08:16,565 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Session: 0x10000007e9d0008 closed
2021-04-04 10:08:16,565 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2021-04-04 10:08:16,583 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2021-04-04 10:08:16,594 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000007e9d0008
2021-04-04 10:08:16,597 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2021-04-04 10:08:16,601 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2021-04-04 10:08:16,611 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2021-04-04 10:08:16,640 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2021-04-04 10:08:16,641 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2021-04-04 10:08:16,661 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
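One detail in the log above stands out: the TaskExecutor on 192.168.11.132 is told to connect to the ResourceManager at akka.tcp://flink@localhost:46715, so it dials its own loopback interface and gets 拒绝连接 (Connection refused) every 10 seconds until the 300000 ms registration timeout fires. The following is a minimal Python sketch for spotting this pattern in a TaskExecutor log. The function names and the set of loopback hosts are my own assumptions, not anything provided by Flink.

```python
import re

# Matches the "Connecting to ResourceManager akka.tcp://flink@HOST:PORT/..." log line.
RM_LINE = re.compile(
    r"Connecting to ResourceManager akka\.tcp://flink@([^:/\s]+):(\d+)/"
)

# Assumption: these host names count as "loopback" for this check.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_resource_manager_addresses(log_text):
    """Return every (host, port) the TaskExecutor tried to register with."""
    return [(m.group(1), int(m.group(2))) for m in RM_LINE.finditer(log_text)]


def loopback_addresses(log_text):
    """Return only the ResourceManager addresses that point at loopback."""
    return [
        (host, port)
        for host, port in find_resource_manager_addresses(log_text)
        if host in LOOPBACK_HOSTS
    ]
```

Reading the log file into a string and passing it to `loopback_addresses` returns a non-empty list when a TaskManager on a remote node was handed a `localhost` leader address, which is a hint to check the address the JobManager published rather than the TaskManager itself.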
Cause: the ZooKeeper quorum in flink-conf.yaml was misconfigured. The corrected setting is:
high-availability.zookeeper.quorum: node1:2181,node2:2181,node3:2181
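To catch a bad quorum string before starting the cluster, a small check like the one below can help. It is only a sketch: it assumes every entry is written as host:port (entries without an explicit port are rejected here, which is stricter than ZooKeeper itself), and the function names are hypothetical. It tests plain TCP reachability only, not whether the server is a healthy ZooKeeper node.

```python
import socket


def parse_quorum(quorum):
    """Split 'host1:port1,host2:port2' into [(host, port), ...]."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        # Assumption of this sketch: an explicit numeric port is required.
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_quorum(quorum, timeout=2.0):
    """Map each 'host:port' entry to whether it accepts TCP connections."""
    return {
        f"{host}:{port}": is_reachable(host, port, timeout)
        for host, port in parse_quorum(quorum)
    }
```

Running `check_quorum("node1:2181,node2:2181,node3:2181")` on each node shows which ZooKeeper servers that node can actually reach; a `False` entry points to a wrong host name, port, or firewall rule.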
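I also changed the permissions of the jars in lib to 755; after that, everything worked.
One more observation: rebooting the virtual machines directly, or starting the TaskManagers on all 3 machines at the same time, can also produce the error above. My guess is that the TaskManagers start too closely in sync.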
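If the startup-timing guess is right, one possible mitigation is to delay each TaskManager until the ZooKeeper servers are accepting connections. The sketch below is an assumption on my part, not a confirmed fix: `wait_for_port` and its parameters are hypothetical, and it only polls a TCP port until a deadline passes.

```python
import socket
import time


def wait_for_port(host, port, deadline_s=60.0, interval_s=2.0,
                  connect_timeout_s=1.0):
    """Poll host:port until it accepts a TCP connection or the deadline passes.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port),
                                          timeout=connect_timeout_s):
                return True
        except OSError:
            pass  # Not up yet (or unreachable); retry after a pause.
        # Give up if the next attempt would start after the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

A startup wrapper could call `wait_for_port("node1", 2181)` (and likewise for node2 and node3) and only then run `bin/taskmanager.sh start`, so that a TaskManager does not begin looking up the leader before ZooKeeper is available after a reboot.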