Sum of configured Framework Heap Memory exceed configured Total Flink Memory
Exception in thread "main" org.apache.flink.configuration.IllegalConfigurationException: TaskManager memory configuration failed: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (512.000mb (536870920 bytes)) and Network Memory (1024.000mb (1073741824 bytes)) exceed configured Total Flink Memory (1.250gb (1342177280 bytes)).
at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:163)
at org.apache.flink.runtime.util.bash.BashJavaUtils.getTmResourceParams(BashJavaUtils.java:85)
at org.apache.flink.runtime.util.bash.BashJavaUtils.runCommand(BashJavaUtils.java:67)
at org.apache.flink.runtime.util.bash.BashJavaUtils.main(BashJavaUtils.java:56)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (512.000mb (536870920 bytes)) and Network Memory (1024.000mb (1073741824 bytes)) exceed configured Total Flink Memory (1.250gb (1342177280 bytes)).
at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:178)
at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:42)
at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:119)
at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:84)
at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:160)
... 3 more
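The TaskManager memory model is shown in the figure below (taken from the Flink UI), so any memory settings must satisfy the relationships between its components. In the error above, Network Memory (1024 MB) is too large: together with Framework Heap, Framework Off-Heap, Task Off-Heap and Managed Memory, the configured components add up to about 1792 MB, which exceeds the configured Total Flink Memory of 1.25 GB (1280 MB).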
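The failing check can be reproduced with plain arithmetic on the byte values printed in the exception. The sketch below is only an illustration of that sum; the names `COMPONENTS` and `check_budget` are made up here and are not part of Flink.

```python
# Byte values copied verbatim from the exception message above.
COMPONENTS = {
    "Framework Heap Memory": 134217728,
    "Framework Off-Heap Memory": 134217728,
    "Task Off-Heap Memory": 0,
    "Managed Memory": 536870920,
    "Network Memory": 1073741824,
}
TOTAL_FLINK_MEMORY = 1342177280  # "1.250gb" in the message

MB = 1024 * 1024


def check_budget(components, total):
    """Return (sum of components, excess over total).

    An excess greater than zero means the configuration cannot fit.
    """
    used = sum(components.values())
    return used, used - total


used, excess = check_budget(COMPONENTS, TOTAL_FLINK_MEMORY)
print(f"configured components: {used / MB:.1f} MB")
print(f"total Flink memory:    {TOTAL_FLINK_MEMORY / MB:.1f} MB")
print(f"over budget by:        {excess / MB:.1f} MB")
```

The sum is roughly 1792 MB against a 1280 MB budget, so either Network Memory has to shrink or Total Flink Memory has to grow by at least about 512 MB, and more than that if the tasks also need heap.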
Caused by: java.io.IOException: Insufficient number of network buffers
Caused by: java.io.IOException: Insufficient number of network buffers: required 1792, but only 0 available. The total number of network buffers is currently set to 4096 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalCreateBufferPool(NetworkBufferPool.java:416)
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:384)
at com.alibaba.flink.shuffle.plugin.transfer.RemoteShuffleResultPartitionFactory.lambda$createBufferPoolFactory$0(RemoteShuffleResultPartitionFactory.java:189)
at org.apache.flink.runtime.io.network.partition.ResultPartition.setup(ResultPartition.java:151)
at com.alibaba.flink.shuffle.plugin.transfer.RemoteShuffleResultPartition.setup(RemoteShuffleResultPartition.java:112)
at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:969)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:664)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
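Solution
The message means that the task needs many network buffers but none are currently available. Increase the three settings taskmanager.memory.network.fraction, taskmanager.memory.network.min and taskmanager.memory.network.max, for example to the values listed at the end of this section.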
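To get a feel for the sizes involved, the buffer counts in the message can be converted to megabytes. This is a minimal sketch using only the numbers printed in the exception; `buffers_to_mb` is a hypothetical helper, not a Flink function.

```python
# Numbers taken from the "Insufficient number of network buffers" message.
SEGMENT_SIZE = 32768     # bytes per network buffer
TOTAL_BUFFERS = 4096     # size of the whole buffer pool
REQUIRED_BUFFERS = 1792  # buffers requested by the failing task

MB = 1024 * 1024


def buffers_to_mb(count, segment_size=SEGMENT_SIZE):
    """Convert a number of network buffers to megabytes."""
    return count * segment_size / MB


pool_mb = buffers_to_mb(TOTAL_BUFFERS)
required_mb = buffers_to_mb(REQUIRED_BUFFERS)
print(f"whole pool:      {pool_mb:.0f} MB")
print(f"single request:  {required_mb:.0f} MB")
```

The pool is only 128 MB in total and this one request asks for 56 MB of it. Since 0 buffers were available, the rest of the pool was presumably already held by other tasks, which is why the network memory as a whole needs to grow.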
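Notes
- taskmanager.memory.network.max must not exceed the overall memory limit either; set it in line with the memory model described above.
- If memory is still insufficient after increasing taskmanager.memory.network.fraction, the TaskManager's total memory is too small and has to be raised as well (taskmanager.heap.size is the legacy key; Flink 1.10 and later normally use taskmanager.memory.flink.size or taskmanager.memory.process.size). For example: taskmanager.heap.size: 2048m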
taskmanager.memory.network.fraction: 0.4
taskmanager.memory.network.min: 1024mb
taskmanager.memory.network.max: 2048mb
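How the three network keys interact with the total can be sketched as follows. The sketch assumes the documented behaviour that Network Memory is a fraction of Total Flink Memory clamped to the configured minimum and maximum, and that the buffer count is the network memory divided by the 32768-byte segment size seen in the error message. It ignores the other derivation paths Flink has, and `derive_network_memory` and `buffer_count` are hypothetical names, not Flink API.

```python
MB = 1024 * 1024


def derive_network_memory(total_flink_bytes, fraction, min_bytes, max_bytes):
    """Fraction of Total Flink Memory, clamped to [min_bytes, max_bytes]."""
    wanted = int(total_flink_bytes * fraction)
    return max(min_bytes, min(max_bytes, wanted))


def buffer_count(network_bytes, segment_size=32768):
    """Number of network buffers that fit into the network memory."""
    return network_bytes // segment_size


# Settings from the example above: fraction 0.4, min 1024mb, max 2048mb.
for total_mb in (1280, 4096):
    network = derive_network_memory(total_mb * MB, 0.4, 1024 * MB, 2048 * MB)
    print(
        f"total Flink memory {total_mb} MB -> "
        f"network {network / MB:.1f} MB, {buffer_count(network)} buffers"
    )
```

With a Total Flink Memory of 1280 MB the fraction alone would give 512 MB, but the 1024mb minimum wins, which is the Network Memory figure in the first exception in these notes. This is why raising the network keys usually has to go together with raising the TaskManager's total memory.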