Summary of Flink Exception Issues

Flink exits abnormally in the middle of execution

Sink: time-kafka(1/1) switched to SCHEDULED
04/29/2019 10:10:20     Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase thenumber of slots per TaskManager in the configuration. Task to schedule: < Attempt #10 (Source: source -> (Filter, Timestamps/Watermarks -> Filter) (12/12)) @ (unassigned) - [SCHEDULED] > with groupID < d460da9a057758d795825417554f0e72 > in sharing group < SlotSharingGroup [d460da9a057758d795825417554f0e72, 0f5d1bbb1c312ef7bcca697263389b15, 3b928584ed2bd5c041cea2f3dba3aa0e, a57d18a89c6c239247f95ebb9819ce1e, dabc4aa3951942f45c2de75c800930c3] >. Resources available to scheduler: Number of instances=11, total number of slots=11, available slots=0
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:263)
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:142)
        at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$1(Execution.java:440)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:438)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:503)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:891)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:845)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1193)
        at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
        at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(1/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(2/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(3/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(4/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(5/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(6/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(7/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(8/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(9/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(10/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(11/12) switched to CANCELED
04/29/2019 10:10:20     Source: source -> (Filter, Timestamps/Watermarks -> Filter)(12/12) switched to CANCELED
04/29/2019 10:10:20     counter(1/12) switched to CANCELED
04/29/2019 10:10:20     counter(2/12) switched to CANCELED
04/29/2019 10:10:20     counter(3/12) switched to CANCELED
04/29/2019 10:10:20     counter(4/12) switched to CANCELED
04/29/2019 10:10:20     counter(5/12) switched to CANCELED
04/29/2019 10:10:20     counter(6/12) switched to CANCELED
04/29/2019 10:10:20     counter(7/12) switched to CANCELED
04/29/2019 10:10:20     counter(8/12) switched to CANCELED
04/29/2019 10:10:20     counter(9/12) switched to CANCELED
04/29/2019 10:10:20     counter(10/12) switched to CANCELED
04/29/2019 10:10:20     counter(11/12) switched to CANCELED
04/29/2019 10:10:20     counter(12/12) switched to CANCELED
04/29/2019 10:10:20     Sink: counter-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20     timer1(1/12) switched to CANCELED
04/29/2019 10:10:20     timer1(2/12) switched to CANCELED
04/29/2019 10:10:20     timer1(3/12) switched to CANCELED
04/29/2019 10:10:20     timer1(4/12) switched to CANCELED
04/29/2019 10:10:20     timer1(5/12) switched to CANCELED
04/29/2019 10:10:20     timer1(6/12) switched to CANCELED
04/29/2019 10:10:20     timer1(7/12) switched to CANCELED
04/29/2019 10:10:20     timer1(8/12) switched to CANCELED
04/29/2019 10:10:20     timer1(9/12) switched to CANCELED
04/29/2019 10:10:20     timer1(10/12) switched to CANCELED
04/29/2019 10:10:20     timer1(11/12) switched to CANCELED
04/29/2019 10:10:20     timer1(12/12) switched to CANCELED
04/29/2019 10:10:20     Sink: time-kafka(1/1) switched to CANCELED
04/29/2019 10:10:20     Job execution switched to status FAILED.
2019-04-29 10:10:20,666 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master
2019-04-29 10:10:20,666 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.
2019-04-29 10:10:20,859 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,868 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2019-04-29 10:10:20,869 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2019-04-29 10:10:20,869 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
2019-04-29 10:10:20,872 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager.
2019-04-29 10:10:20,878 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager#-153942343]
2019-04-29 10:10:21,888 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2019-04-29 10:10:23,747 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1556227576661_0231 finished with state FINISHED and final stateSUCCEEDED at 1556503821989
2019-04-29 10:10:23,747 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down
2019-04-29 10:10:23,911 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.
2019-04-29 10:10:23,911 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager Actor[akka.tcp://flink@emr-worker-3.cluster-70637:36513/user/jobmanager#-153942343].
2019-04-29 10:10:25.282 [main] ERROR c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Failed to execute command, exit code=1
2019-04-29 10:10:25.296 [main] INFO  c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Finished command line, exit code=1.
Mon Apr 29 10:10:25 CST 2019 [JobLauncherRunner] INFO Closing job launcher ...
2019-04-29 10:10:25.298 [main] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.298 [main] INFO  c.a.e.f.a.j.l.impl.CommonShellJobLauncherImpl - [FNI-09F180DD19111D0F_0] Stopping command executor ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Closing launcher am ...
Mon Apr 29 10:10:25 CST 2019 [YarnJobLauncherAM] INFO Emr flow launcher is quit.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.doMain(YarnJobLauncherAM.java:72)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.main(YarnJobLauncherAM.java:137)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.aliyun.emr.flow.agent.jobs.launcher.JobLauncherRunner.run(JobLauncherRunner.java:59)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.launchJob(YarnJobLauncherAM.java:104)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM.access$000(YarnJobLauncherAM.java:32)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:75)
        at com.aliyun.emr.flow.agent.jobs.launcher.yarn.YarnJobLauncherAM$1.run(YarnJobLauncherAM.java:72)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        ... 2 more
Caused by: com.aliyun.emr.flow.agent.common.exceptions.EmrFlowRuntimeException: ###[E10012,JOB]:  Execute job FNI-09F180DD19111D0F_0 failed, exit code: 1, message: .
        at com.aliyun.emr.flow.agent.common.utils.Throwables.propagate(Throwables.java:68)
        at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.doLaunch(CommonShellJobLauncherImpl.java:221)
        at com.aliyun.emr.flow.agent.jobs.launcher.impl.CommonShellJobLauncherImpl.launch(CommonShellJobLauncherImpl.java:207)
        ... 14 more
2019-04-29 10:10:25.613 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Call shutdown hook.
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] Closing ...
2019-04-29 10:10:25.614 [Shutdown-FNI-09F180DD19111D0F_0] INFO  c.a.emr.flow.agent.jobs.launcher.JobLauncherBase - [FNI-09F180DD19111D0F_0] This launcher is closed already, skip.
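A note on reading the log above: the scheduler reports 11 instances with 11 slots in total, while the source, counter and timer1 operators each run with parallelism 12. The stack trace passes through FixedDelayRestartStrategy and ExecutionGraph.restart, which suggests the failure happened while the job was being restarted, possibly after a TaskManager was lost, although the log does not confirm the cause. The following is a minimal sketch of the slot arithmetic, assuming all operators share one slot sharing group (as the log indicates), so that the job needs as many slots as its highest operator parallelism. The function names and the `job` dictionary are hypothetical and are not part of any Flink API.

```python
def slots_required(parallelism_by_operator):
    """Slots needed when every operator is in the same slot sharing group.

    One slot can then hold one subtask of each operator, so the job needs
    as many slots as the highest operator parallelism.
    """
    return max(parallelism_by_operator.values())


def missing_slots(parallelism_by_operator, task_managers, slots_per_tm):
    """How many slots the cluster is short of (0 if there are enough)."""
    total_slots = task_managers * slots_per_tm
    return max(0, slots_required(parallelism_by_operator) - total_slots)


# Parallelism values read from the log above (names are illustrative).
job = {
    "source": 12,
    "counter": 12,
    "timer1": 12,
    "counter-kafka sink": 1,
    "time-kafka sink": 1,
}

print(slots_required(job))        # 12
print(missing_slots(job, 11, 1))  # 1 -> 11 TaskManagers x 1 slot is one short
```

Under this reading, either lowering the parallelism to 11 or providing at least 12 slots would satisfy the scheduler, which matches the advice in the exception message itself.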

Flink fails to start after the slot count is increased in the configuration

/2019 14:07:03     counter(62/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     counter(63/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     timer1(57/96) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 96, but only 26 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
        at java.lang.Thread.run(Thread.java:748)

04/29/2019 14:07:03     Job execution switched to status FAILING.
java.io.IOException: Insufficient number of network buffers: required 96, but only 25 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
        at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
        at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:235)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
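        at java.lang.Thread.run(Thread.java:748)
Solution: add the following parameters to Flink's flink-conf.yaml to enlarge the network buffer memory, so that the larger number of slots can be supported: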

taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 268435456
taskmanager.network.memory.max: 4294967296
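To see why these values help, here is a rough sketch of how the settings translate into a buffer count. It assumes the simplified rule that network memory is `fraction` times the TaskManager memory, clamped between `min` and `max`, and then divided by the 32768-byte buffer size reported in the log. The real calculation in Flink has more details, and the 1 GiB TaskManager memory below is a made-up value for illustration only.

```python
SEGMENT_SIZE = 32768  # bytes per network buffer, as reported in the log


def network_buffers(tm_memory_bytes, fraction, min_bytes, max_bytes,
                    segment_size=SEGMENT_SIZE):
    """Approximate number of network buffers for one TaskManager.

    Simplified assumption: network memory = fraction * TaskManager memory,
    clamped to the range [min_bytes, max_bytes].
    """
    network_memory = int(tm_memory_bytes * fraction)
    network_memory = min(max_bytes, max(min_bytes, network_memory))
    return network_memory // segment_size


# The failing run had 2048 buffers of 32768 bytes each, i.e. 64 MiB in total.
print(2048 * SEGMENT_SIZE)  # 67108864

# Hypothetical TaskManager memory of 1 GiB with the settings above.
tm_memory = 1024 * 1024 * 1024
print(network_buffers(tm_memory, 0.1, 268435456, 4294967296))  # 8192
```

With `taskmanager.network.memory.min` raised to 268435456 bytes (256 MiB), each TaskManager gets at least 8192 buffers under this assumption, four times the 2048 shown in the error message.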