Flink job suddenly exits abnormally during execution
```
Sink: time-kafka(1/1) switched to SCHEDULED
04/29/2019 10:10:20 Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #10 (Source: source -> (Filter, Timestamps/Watermarks -> Filter) (12/12)) @ (unassigned) - [SCHEDULED] > with groupID < d460da9a057758d795825417554f0e72 > in sharing group < SlotSharingGroup [d460da9a057758d795825417554f0e72, 0f5d1bbb1c312ef7bcca697263389b15, 3b928584ed2bd5c041cea2f3dba3aa0e, a57d18a89c6c239247f95ebb9819ce1e, dabc4aa3951942f45c2de75c800930c3] >. Resources available to scheduler: Number of instances=11, total number of slots=11, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:263)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:142)
at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$1(Execution.java:440)
at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:438)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:503)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:891)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:845)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1193)
at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(1/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(2/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(3/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(4/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(5/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(6/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(7/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(8/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(9/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(10/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(11/12) switched to CANCELED
04/29/2019 10:10:20 Source: source -> (Filter, Timestamps/Watermarks -> Filter)(12/12) switched to CANCELED
04/29/2019 10:10:20 counter(1/12) switched to CANCELED
04/29/2019 10:10:20 counter(2/12) switched to CANCELED
04/29/2019 10:10:20 counter(3/12) switched to CANCELED
04/29/2019 10:10:20 counter(4/12) switched to CANCELED
04/29/2019 10:10:20 counter(5/12) switched to CANCELED
04/29/2019 10:10:20 counter(6/12) switched to CANCELED
04/29/2019 10:10:20 counter(7/12) switched to CANCELED
04/29/2019 10:10:20 counter(8/12) switched to CANCELED
04/29/2019 10:10:20 counter(9/12) switched to CANCELED
04/29/2019 10:10:20 counter(10/12) switched to CANCELED
04/29/2019 10:10:20 counter(11/12) switched to CANCELED
04/29/2019 10:10:20 counter(12/12) switched to CANCELED
04/29/2019 10:10:20 Sink: counter-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20 timer1(1/12) switched to CANCELED
04/29/2019 10:10:20 timer1(2/12) switched to CANCELED
04/29/2019 10:10:20 timer1(3/12) switched to CANCELED
04/29/2019 10:10:20 timer1(4/12) switched to CANCELED
04/29/2019 10:10:20 timer1(5/12) switched to CANCELED
04/29/2019 10:10:20 timer1(6/12) switched to CANCELED
04/29/2019 10:10:20 timer1(7/12) switched to CANCELED
04/29/2019 10:10:20 timer1(8/12) switched to CANCELED
04/29/2019 10:10:20 timer1(9/12) switched to CANCELED
04/29/2019 10:10:20 timer1(10/12) switched to CANCELED
04/29/2019 10:10:20 timer1(11/12) switched to CANCELED
04/29/2019 10:10:20 timer1(12/12) switched to CANCELED
04/29/2019 10:10:20 Sink: time-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20 Job execution switched to status FAILED.
2019-04-29 10:10:20,666 INFO org.apache.flink.yarn.YarnClusterClient - Sending shutdown request to the Application Master
2019-04-29 10:10:20,666 INFO org.apache.flink.yarn.YarnClusterClient - Start application client.
2019-04-29 10:10:20,859 INFO org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,868 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2019-04-29 10:10:20,869 INFO org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,869 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
2019-04-29 10:10:20,872 INFO org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager.
2019-04-29 10:10:20,878 INFO org.apache.flink.yarn.ApplicationClient - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager#-153942343]
2019-04-29 10:10:21,888 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2019-04-29 10:10:23,747 INFO org.apache.flink.yarn.YarnClusterClient - Application application_1556227576661_0231 finished with state FINISHED and final state SUCCEEDED at 1556503821989
2019-04-29 10:10:23,747 INFO org.apache.flink.yarn.YarnClusterClient - YARN Client is shutting down
2019-04-29 10:10:23,911 INFO org.apache.flink.yarn.ApplicationClient - Stopped Application client.
2019-04-29 10:10:23,911 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager Actor[akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager#-153942343].
2019-04-29 10:10:25.282 [main] ERROR c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Failed to execute command, exit code=1
2019-04-29 10:10:25.296 [main] INFO c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Finished command line, exit code=1.
Mon Apr 29 10:10:25 CST 2019 [JobLauncherRunner] INFO Closing job launcher ...
2019-04-29 10:10:25.298 [main] INFO c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.298 [main] INFO c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Stopping command executor ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Closing launcher am ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Emr flow launcher is quit.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.doMain(YarnJobLauncherAM.java:72)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.main(YarnJobLauncherAM.java:137)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.aliyun.emr.flow.agent.jobs.launcher.JobLauncherRunner.run(JobLauncherRunner.java:59)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.launchJob(YarnJobLauncherAM.java:104)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.access$000(YarnJobLauncherAM.java:32)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:75)
at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:72)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
... 2 more
Caused by: com.aliyun.emr.flow.agent.common.exceptions.EmrFlowRuntimeException: ###[E10012,JOB]: Execute job FNI-09F180DD19111D0F_0 failed, exit code: 1, message: .
at com.aliyun.emr.flow.agent.common.utils.Throwables.propagate(Throwables.java:68)
at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.doLaunch(CommonShellJobLauncherImpl.java:221)
at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.launch(CommonShellJobLauncherImpl.java:207)
... 14 more
2019-04-29 10:10:25.613 [Shutdown-FNI-09F180DD19111D0F_0] INFO c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Call shutdown hook.
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] This launcher is closed already, skip.
```
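Reading the exception: the `SlotSharingGroup` lists five job vertices, matching the five operators that are cancelled in the log (the source chain, `counter`, `timer1` and the two Kafka sinks). With slot sharing, one slot can hold one subtask of every operator in the group, so the group needs as many slots as its most parallel operator, here 12. The trace also shows the error was raised while `FixedDelayRestartStrategy` was re-scheduling the job (`Attempt #10`), at which point only 11 single-slot TaskManagers were registered. The sketch below reproduces that check as plain arithmetic; the function `required_slots` and the group key `"default"` are hypothetical, and the model ignores finer scheduling details such as co-location constraints.

```python
"""Rough slot-demand check under Flink slot sharing (illustrative sketch)."""


def required_slots(sharing_groups: dict[str, dict[str, int]]) -> int:
    """Each sharing group needs max(parallelism) slots, since one slot can
    host one subtask of every operator in the group; groups add up."""
    return sum(max(ops.values()) for ops in sharing_groups.values())


# Operator parallelism as it appears in the log (one sharing group).
# The key "default" is a hypothetical label for that group.
job = {
    "default": {
        "Source: source -> (Filter, Timestamps/Watermarks -> Filter)": 12,
        "counter": 12,
        "Sink: counter-kafka": 1,
        "timer1": 12,
        "Sink: time-kafka": 1,
    }
}

# "Number of instances=11, total number of slots=11" from the exception.
available = 11
needed = required_slots(job)

if __name__ == "__main__":
    print(f"needed={needed}, available={available}")
    if needed > available:
        print("not enough slots: lower the parallelism or add slots/TaskManagers")
```

The two remedies named in the exception map onto this check: lower the maximum parallelism to 11 or less, or make more slots available (more TaskManagers, or a higher `taskmanager.numberOfTaskSlots`).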
Bug: Flink fails to start after the number of slots is increased in the configuration
```
/2019 14:07:03 counter(62/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
at java.lang.Thread.run(Thread.java:748)
04/29/2019 14:07:03 counter(63/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
at java.lang.Thread.run(Thread.java:748)
04/29/2019 14:07:03 timer1(57/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
at java.lang.Thread.run(Thread.java:748)
04/29/2019 14:07:03 Job execution switched to status FAILING.
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
at java.lang.Thread.run(Thread.java:748)
```
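The numbers in this message can be checked directly: 2048 buffers of 32768 bytes is exactly 64 MiB, which matches the default `taskmanager.network.memory.min` (64 MB) in the releases that use these keys, suggesting the pool had been clamped to its minimum. Each failing subtask asked for 96 buffers, which appears to track the job parallelism of 96. The sketch below does this arithmetic; the 64 MiB default is stated as an assumption in a comment, and the "requests that fit" figure is a deliberately crude upper bound that ignores every other consumer of the pool.

```python
"""Arithmetic behind the 'Insufficient number of network buffers' message."""

MIB = 1024 * 1024

# Values copied from the log line.
total_buffers = 2048
segment_size = 32768          # bytes per network buffer (32 KiB)
required_per_request = 96     # "required 96"

pool_bytes = total_buffers * segment_size

# Assumed default of taskmanager.network.memory.min (64 MB), for comparison.
default_min_bytes = 64 * MIB

# Crude upper bound: how many 96-buffer requests the whole pool could serve
# if nothing else were holding buffers.
max_requests = total_buffers // required_per_request

if __name__ == "__main__":
    print(f"pool = {pool_bytes // MIB} MiB, equals default min: "
          f"{pool_bytes == default_min_bytes}")
    print(f"at most {max_requests} requests of {required_per_request} buffers")
```

Even under this optimistic bound only 21 such requests fit, so packing more 96-way subtasks onto each TaskManager by adding slots exhausts the pool quickly.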
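Solution: add the following parameters to flink-conf.yaml in the Flink configuration. They enlarge the memory reserved for network buffers, so the TaskManagers can support the increased number of slots: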
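```yaml
# Fraction of TaskManager JVM memory reserved for network buffers
taskmanager.network.memory.fraction: 0.1
# Lower bound for network buffer memory, in bytes (256 MiB)
taskmanager.network.memory.min: 268435456
# Upper bound for network buffer memory, in bytes (4 GiB)
taskmanager.network.memory.max: 4294967296
```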
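To see what these settings change, here is a rough model (an assumption based on the documented meaning of the keys, not on the exact code path of any Flink release): network memory is `fraction` times the TaskManager's JVM memory, clamped between `min` and `max`, and the buffer count is that amount divided by the 32 KiB segment size from the log. Since 0.1 is also the default fraction, the fix effectively raises the floor from 64 MiB to 256 MiB (2048 to 8192 buffers) and the ceiling from 1 GiB to 4 GiB (32768 to 131072 buffers). The function name `network_buffers`, the `OLD` defaults and the JVM sizes are hypothetical.

```python
"""Rough model of how the flink-conf.yaml network settings size the buffer pool."""

SEGMENT_SIZE = 32768  # bytes per buffer, as reported in the log above
MIB = 1024 * 1024


def network_buffers(jvm_bytes: int, fraction: float, min_bytes: int, max_bytes: int) -> int:
    """Clamp fraction * JVM memory into [min_bytes, max_bytes], then count
    how many whole segments fit. This is an assumed model, not Flink code."""
    network_bytes = min(max_bytes, max(min_bytes, int(fraction * jvm_bytes)))
    return network_bytes // SEGMENT_SIZE


# Settings from the fix above.
NEW = dict(fraction=0.1, min_bytes=268435456, max_bytes=4294967296)
# Assumed defaults (fraction 0.1, min 64 MiB, max 1 GiB) for comparison.
OLD = dict(fraction=0.1, min_bytes=64 * MIB, max_bytes=1024 * MIB)

if __name__ == "__main__":
    for jvm_mib in (512, 2048, 8192):  # hypothetical TaskManager JVM sizes
        jvm = jvm_mib * MIB
        print(f"{jvm_mib:>5} MiB JVM: old={network_buffers(jvm, **OLD):>6} "
              f"new={network_buffers(jvm, **NEW):>6} buffers")
```

Under this model the new settings matter most on small TaskManagers: once `fraction` times the JVM memory exceeds 256 MiB (and stays below 1 GiB), the old and new configurations produce the same pool.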