In production, Flink jobs often fail over, and the possible causes are many. This post summarizes the most common cases and how to resolve each of them.
Insufficient resources
2022-02-09 10:22:18
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 14000, slots allocated: 13965
at org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$3(SchedulingUtils.java:245)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633