Today I adjusted the memory settings of our Flink on YARN cluster. I lowered the minimum memory allocated to a container, which triggered a series of problems; this post records them.
(1) yarn.nodemanager.resource.memory-mb
The total physical memory YARN may use on this node, default 8192 (MB). Note that YARN does not detect the node's physical memory on its own, so if the node has less than 8 GB you must lower this value yourself.
(2) yarn.nodemanager.vmem-pmem-ratio
The maximum amount of virtual memory a task may use for every 1 MB of physical memory it uses, default 2.1.
(3) yarn.nodemanager.pmem-check-enabled
Whether to start a thread that checks the physical memory each task is using and kills the task outright if it exceeds its allocation, default true.
(4) yarn.nodemanager.vmem-check-enabled
Whether to start a thread that checks the virtual memory each task is using and kills the task outright if it exceeds its allocation, default true.
(5) yarn.scheduler.minimum-allocation-mb
The minimum physical memory a single container may request, default 1024 (MB); if a task requests less than this, the request is raised to this value.
(6) yarn.scheduler.maximum-allocation-mb
The maximum physical memory a single container may request, default 8192 (MB).
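These parameters live in yarn-site.xml. As a sketch of what the relevant entries might look like with the values discussed in this post (the 256 MB minimum we set below; the other two values are just the defaults):

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
</property>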
Here, we adjusted yarn.scheduler.minimum-allocation-mb, the minimum physical memory a single container can request, from the default of 1024 MB down to 256 MB.
We did this because our JobManager uses 64 MB and our TaskManagers use 512 MB, so the 1024 MB minimum wastes part of every container. Hence the adjustment.
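To put a number on the waste: under the default 1024 MB minimum, the 64 MB JobManager still occupies a full 1024 MB container, leaving roughly 960 MB idle, and each 512 MB TaskManager likewise leaves about 512 MB unused.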
But after the change, a new problem appeared: starting a yarn-session task failed with the following error:
Caused by: java.lang.IllegalArgumentException: The configuration value 'containerized.heap-cutoff-min' is higher (600) than the requested amount of memory 256
Full stack trace:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:381)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:548)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: java.lang.IllegalArgumentException: The configuration value 'containerized.heap-cutoff-min' is higher (600) than the requested amount of memory 256
at org.apache.flink.runtime.clusterframework.BootstrapTools.calculateHeapSize(BootstrapTools.java:704)
at org.apache.flink.yarn.YarnClusterDescriptor.setupApplicationMasterContainer(YarnClusterDescriptor.java:1544)
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:895)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:374)
... 7 more
The cause: the default containerized.heap-cutoff-min is 600 MB, which is larger than the 256 MB we requested for the container. The fix is to edit conf/flink-conf.yaml and add the following configuration:
# allocated in MB
containerized.heap-cutoff-min: 64
This value is how much memory the YARN container hosting the TaskManager must reserve for logic outside the TaskManager. Note from the stack trace above (setupApplicationMasterContainer) that the same cutoff is also applied to the JobManager's container, which is where our 256 MB request tripped it.
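With the cutoff lowered, a small session can be requested again. As an illustration, a launch along these lines (the -jm/-tm flags are the yarn-session CLI's JobManager/TaskManager memory options; the values are the ones from this post):

./bin/yarn-session.sh -jm 256 -tm 512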
Let's look at the source code.
Reference: https://www.jianshu.com/p/4e4c188f5d7b
The deployment first calls the ContaineredTaskManagerParameters.calculateCutoffMB() method, which computes how much memory a YARN container hosting a TM must reserve for use by logic outside the TM:
public static long calculateCutoffMB(Configuration config, long containerMemoryMB) {
    Preconditions.checkArgument(containerMemoryMB > 0);

    // (1) check cutoff ratio
    final float memoryCutoffRatio = config.getFloat(
        ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_RATIO);

    if (memoryCutoffRatio >= 1 || memoryCutoffRatio <= 0) {
        throw new IllegalArgumentException("The configuration value '"
            + ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_RATIO.key() + "' must be between 0 and 1. Value given="
            + memoryCutoffRatio);
    }

    // (2) check min cutoff value
    final int minCutoff = config.getInteger(
        ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_MIN);

    if (minCutoff >= containerMemoryMB) {
        throw new IllegalArgumentException("The configuration value '"
            + ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_MIN.key() + "'='" + minCutoff
            + "' is larger than the total container memory " + containerMemoryMB);
    }

    // (3) check between heap and off-heap
    long cutoff = (long) (containerMemoryMB * memoryCutoffRatio);
    if (cutoff < minCutoff) {
        cutoff = minCutoff;
    }
    return cutoff;
}
The method proceeds as follows:
- Read containerized.heap-cutoff-ratio, the fraction of the configured TM memory that the container reserves as non-TM memory; default 0.25.
- Read containerized.heap-cutoff-min, the minimum amount of reserved non-TM memory; default 600 MB.
- Compute the reserve from the ratio, and make sure the result is not below the minimum.
This shows that with Flink on YARN, the TM memory we configure is really the container's memory. In other words, the total memory a TM can use (heap plus off-heap) is:
tm_total_memory = taskmanager.heap.size - max[containerized.heap-cutoff-min, taskmanager.heap.size * containerized.heap-cutoff-ratio]
Working through an example with a 4096 MB TaskManager and the defaults (the numbers used in the referenced article):
tm_total_memory = 4096 - max[600, 4096 * 0.25] = 3072
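And with the values from this post (512 MB TaskManager, cutoff-min lowered to 64):
tm_total_memory = 512 - max[64, 512 * 0.25] = 512 - 128 = 384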
Next, look at the TaskManagerServices.calculateHeapSizeMB() method.
public static long calculateHeapSizeMB(long totalJavaMemorySizeMB, Configuration config) {
    Preconditions.checkArgument(totalJavaMemorySizeMB > 0);

    // all values below here are in bytes
    final long totalProcessMemory = megabytesToBytes(totalJavaMemorySizeMB);
    final long networkReservedMemory = getReservedNetworkMemory(config, totalProcessMemory);
    final long heapAndManagedMemory = totalProcessMemory - networkReservedMemory;

    if (config.getBoolean(TaskManagerOptions.MEMORY_OFF_HEAP)) {
        final long managedMemorySize = getManagedMemoryFromHeapAndManaged(config, heapAndManagedMemory);

        ConfigurationParserUtils.checkConfigParameter(managedMemorySize < heapAndManagedMemory, managedMemorySize,
            TaskManagerOptions.MANAGED_MEMORY_SIZE.key(),
            "Managed memory size too large for " + (networkReservedMemory >> 20) +
                " MB network buffer memory and a total of " + totalJavaMemorySizeMB +
                " MB JVM memory");

        return bytesToMegabytes(heapAndManagedMemory - managedMemorySize);
    }
    else {
        return bytesToMegabytes(heapAndManagedMemory);
    }
}
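In short: the cutoff is subtracted first, then network buffer memory is carved out of what remains, and, if managed memory is configured off-heap, it is subtracted as well before the JVM heap size is fixed. Below is a minimal, self-contained sketch of that arithmetic; the 0.1 network fraction and 64 MB floor are the legacy defaults as I understand them, so treat them as assumptions and check your Flink version:

public class TmHeapSketch {
    public static void main(String[] args) {
        long containerMB = 512; // memory the TM container requests from YARN

        // step 1: containerized.heap-cutoff-* (this post: min = 64, ratio = 0.25)
        long cutoffMB = Math.max(64L, (long) (containerMB * 0.25));
        long totalJavaMemoryMB = containerMB - cutoffMB; // input to calculateHeapSizeMB

        // step 2: reserve network buffer memory (assumed defaults: fraction 0.1, floor 64 MB)
        long networkMB = Math.max(64L, (long) (totalJavaMemoryMB * 0.1));
        long heapAndManagedMB = totalJavaMemoryMB - networkMB;

        // with on-heap managed memory, this is the JVM heap handed to the TM
        System.out.println("cutoff=" + cutoffMB + " MB, network=" + networkMB
                + " MB, heap=" + heapAndManagedMB + " MB"); // prints 128 / 64 / 320
    }
}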
In addition, I also adjusted the container memory increment:
yarn.scheduler.increment-allocation-mb = 512M
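For reference, this increment is a Fair Scheduler setting: container requests are rounded up to a multiple of it, so with a 512 MB increment a 300 MB request would be granted 512 MB. It should be kept consistent with the small containers you are trying to run.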
In the end, I started the cluster in yarn-session mode, and streaming jobs ran on it without any further problems.
Summary
We should leave enough containerized.heap-cutoff-min headroom (the memory a TaskManager's YARN container reserves for logic outside the TaskManager).
Why? One diagram makes it clear.
As the diagram shows, Flink's JobManager and the YARN AppMaster run in the same container, and each needs its own space, so the container must be at least as large as the JobManager's and the YARN AppMaster's memory combined.
The YARN AppMaster is mainly responsible for requesting containers.
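As a sanity check with this post's numbers: a 256 MB JobManager container with containerized.heap-cutoff-min: 64 gives a cutoff of max[64, 256 * 0.25] = 64 MB, leaving 192 MB of JVM heap for the JobManager container, comfortably above the 64 MB our JobManager actually uses.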