Exploring TaskManager Memory Allocation in Flink on YARN Mode

1. Job Submission

We submitted a Flink on YARN job with the following parameters:

flink run -m yarn-cluster -p 5 -yjm 3072 -ytm 4096 -ynm flink-test -d -c com.test.Test flink-test-1.0.0-SNAPSHOT.jar

2. Inspecting the Memory

The job started one TaskManager with 5 slots and runs normally. Open the job's Web UI and go to the TaskManager page to look at its memory.

Although we set the TaskManager memory to 4 GB, the UI shows a JVM heap size of only 2.66 GB, plus an item called "Flink Managed Memory" of 1.79 GB. How are these values calculated?

3. TaskManager Memory Layout

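To reduce object overhead, Flink mainly stores objects in serialized form. The smallest unit of serialized storage is called a MemorySegment. It is backed by a byte array whose size is set by the taskmanager.memory.segment-size parameter (32 KB by default). Each memory region is described below: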

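  • Network Buffer: memory blocks used for network transfer and network-related operations (shuffle, broadcast, etc.), made up of MemorySegments. Since Flink 1.5, network buffers are always allocated off-heap, which makes it possible to take full advantage of techniques such as zero-copy. The three related parameters and our settings (the defaults) are: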
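# Fraction of TM memory used for network buffers, default 0.1
taskmanager.network.memory.fraction: 0.1
# Minimum and maximum network buffer memory, default 64 MB and 1 GB
taskmanager.network.memory.min: 64mb
taskmanager.network.memory.max: 1gb

See the Flink official documentation for details.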

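  • Flink Managed Memory: memory used by Flink's internal operators (for example sorting, hash tables, and caching of intermediate results) and for storing intermediate data. It is also made up of MemorySegments and is managed by Flink's MemoryManager component. By default it is allocated on the heap. If the off-heap switch is turned on, it is allocated off-heap instead. The two related parameters are: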
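# Fraction of the TM heap (after network buffers are subtracted) used for managed memory, default 0.7
taskmanager.memory.fraction: 0.7
# Whether managed memory is allocated off-heap, default false
taskmanager.memory.off-heap: false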

See the official documentation for details.

Unlike Spark, Flink does not split memory into Storage and Execution regions. The two are merged into one, which is more flexible.

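  • Free memory: despite the name, this region actually holds user code and data structures. It always lives on the heap and can be understood as what remains of the heap after managed memory has been carved out.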

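    The best way to find out how each of the memory sizes in the opening question is derived is, naturally, to read the source code. Below we explore it using the Flink 1.8.1 source as an example.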
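To make the idea of serialized, fixed-size storage from this section concrete, here is a toy sketch. ToySegment is a hypothetical class, not Flink's MemorySegment API. It only shows a 32 KB block (the default taskmanager.memory.segment-size) holding fixed-width serialized records, backed either by a heap byte[] (as on-heap managed memory is) or by direct memory outside the Java heap (as network buffers are).

```java
import java.nio.ByteBuffer;

/**
 * Toy fixed-size memory segment (hypothetical, NOT Flink's MemorySegment).
 * Records are written in serialized form at byte offsets inside one block,
 * backed either by a heap byte[] or by off-heap (direct) memory.
 */
final class ToySegment {
    // Mirrors the default of taskmanager.memory.segment-size (32 KB)
    static final int DEFAULT_SEGMENT_SIZE = 32 * 1024;
    // Each record is a long key followed by an int value: 12 bytes
    static final int RECORD_WIDTH = Long.BYTES + Integer.BYTES;

    private final ByteBuffer buffer;

    private ToySegment(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** On-heap segment: wraps a byte[] that the garbage collector manages. */
    static ToySegment onHeap(int size) {
        return new ToySegment(ByteBuffer.wrap(new byte[size]));
    }

    /** Off-heap segment: direct memory allocated outside the Java heap. */
    static ToySegment offHeap(int size) {
        return new ToySegment(ByteBuffer.allocateDirect(size));
    }

    int size() { return buffer.capacity(); }

    boolean isOffHeap() { return buffer.isDirect(); }

    /** Serializes a (key, value) record at the given byte offset. */
    void putRecord(int offset, long key, int value) {
        buffer.putLong(offset, key);
        buffer.putInt(offset + Long.BYTES, value);
    }

    long getKey(int offset) { return buffer.getLong(offset); }

    int getValue(int offset) { return buffer.getInt(offset + Long.BYTES); }

    /** Number of whole fixed-width records that fit in this segment. */
    int recordCapacity() { return size() / RECORD_WIDTH; }

    public static void main(String[] args) {
        ToySegment heap = onHeap(DEFAULT_SEGMENT_SIZE);
        ToySegment direct = offHeap(DEFAULT_SEGMENT_SIZE);
        heap.putRecord(0, 42L, 7);
        direct.putRecord(RECORD_WIDTH, 43L, 8);
        System.out.println(heap.getKey(0) + " " + heap.getValue(0)); // 42 7
        System.out.println(direct.isOffHeap());                      // true
        System.out.println(heap.recordCapacity());                   // 2730
    }
}
```

Storing records as bytes in large fixed blocks means the JVM sees a few big arrays (or no heap objects at all for direct memory) instead of many small objects, which is the object-overhead saving the section describes.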

4. JobManager Memory Settings

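When the JM starts, the memory that the YARN container reserves outside the heap (the cutoff) is subtracted, and what remains becomes the JM heap. The cutoff is 25% of the container memory, but no less than 600 MB:

cutoff = max[600, 3072 * 0.25] = 768

So the remaining JM heap is (3072 - 768) MB = 2304 MB.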
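The cutoff rule above fits in a few lines of Java. This is a sketch only: the class and method names are hypothetical, the defaults (ratio 0.25, minimum 600 MB) are the containerized.heap-cutoff-ratio and containerized.heap-cutoff-min values discussed in Section 5, and the argument checks follow calculateCutoffMB() as shown there.

```java
/**
 * Sketch of the YARN container cutoff rule applied to the JobManager.
 * Hypothetical names; not Flink's API.
 */
final class JobManagerHeap {
    static final double DEFAULT_CUTOFF_RATIO = 0.25; // containerized.heap-cutoff-ratio
    static final int DEFAULT_CUTOFF_MIN_MB = 600;    // containerized.heap-cutoff-min

    /** Memory reserved in the container outside the heap: max(min, container * ratio). */
    static long cutoffMB(long containerMB, double ratio, int minCutoffMB) {
        if (ratio <= 0 || ratio >= 1) {
            throw new IllegalArgumentException("cutoff ratio must be in (0, 1): " + ratio);
        }
        if (minCutoffMB >= containerMB) {
            throw new IllegalArgumentException(
                "cutoff min " + minCutoffMB + " MB >= container " + containerMB + " MB");
        }
        return Math.max(minCutoffMB, (long) (containerMB * ratio));
    }

    /** JobManager heap = container memory - cutoff. */
    static long heapMB(long containerMB) {
        return containerMB - cutoffMB(containerMB, DEFAULT_CUTOFF_RATIO, DEFAULT_CUTOFF_MIN_MB);
    }

    public static void main(String[] args) {
        System.out.println(heapMB(3072)); // -yjm 3072: cutoff 768, heap 2304
        System.out.println(heapMB(2048)); // 2048 * 0.25 = 512 < 600, so cutoff 600, heap 1448
    }
}
```

The second call shows when the 600 MB floor takes over: for containers below 2400 MB, 25% is less than 600 MB, so the fixed minimum is reserved instead.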

5. TaskManager Memory Allocation Logic

 org.apache.flink.client.deployment.ClusterSpecification:

/**
 * Description of the cluster to start by the {@link ClusterDescriptor}.
 */
public final class ClusterSpecification {
	private final int masterMemoryMB;
	private final int taskManagerMemoryMB;
	private final int numberTaskManagers;
	private final int slotsPerTaskManager;

	private ClusterSpecification(int masterMemoryMB, int taskManagerMemoryMB, int numberTaskManagers, int slotsPerTaskManager) {
		this.masterMemoryMB = masterMemoryMB;
		this.taskManagerMemoryMB = taskManagerMemoryMB;
		this.numberTaskManagers = numberTaskManagers;
		this.slotsPerTaskManager = slotsPerTaskManager;
	}
……
}

The ClusterSpecification object holds the cluster's four basic parameters: JobManager memory size, TaskManager memory size, number of TaskManagers, and number of slots per TaskManager.

org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters:

The calculateCutoffMB() method computes how much of a TM's YARN container memory must be reserved for uses other than the TM itself.


	public static long calculateCutoffMB(Configuration config, long containerMemoryMB) {
		Preconditions.checkArgument(containerMemoryMB > 0);

		// (1) check cutoff ratio
		final float memoryCutoffRatio = config.getFloat(
			ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_RATIO);

		if (memoryCutoffRatio >= 1 || memoryCutoffRatio <= 0) {
			throw new IllegalArgumentException("The configuration value '"
				+ ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_RATIO.key() + "' must be between 0 and 1. Value given="
				+ memoryCutoffRatio);
		}

		// (2) check min cutoff value
		final int minCutoff = config.getInteger(
			ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_MIN);

		if (minCutoff >= containerMemoryMB) {
			throw new IllegalArgumentException("The configuration value '"
				+ ResourceManagerOptions.CONTAINERIZED_HEAP_CUTOFF_MIN.key() + "'='" + minCutoff
				+ "' is larger than the total container memory " + containerMemoryMB);
		}

		// (3) check between heap and off-heap
		long cutoff = (long) (containerMemoryMB * memoryCutoffRatio);
		if (cutoff < minCutoff) {
			cutoff = minCutoff;
		}
		return cutoff;
	}

The method works as follows:

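  1. Read the containerized.heap-cutoff-ratio parameter: the fraction of the configured TM (container) memory that the container reserves for non-TM use, 0.25 by default;
  2. Read the containerized.heap-cutoff-min parameter: the minimum amount of memory the container reserves for non-TM use, 600 MB by default;
  3. Compute the reserved memory from the ratio, making sure the result is not below the minimum.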

So in Flink on YARN, the TM memory we set is actually the container memory. In other words, the total memory a TM can use (heap plus off-heap) is:

tm_total_memory = taskmanager.heap.size - max[containerized.heap-cutoff-min, taskmanager.heap.size * containerized.heap-cutoff-ratio]

Plugging in the parameters from the beginning of the article:

tm_total_memory = 4096 - max[600, 4096 * 0.25] = 3072

Next, look at the TaskManagerServices.calculateHeapSizeMB() method:


	public static long calculateHeapSizeMB(long totalJavaMemorySizeMB, Configuration config) {
		Preconditions.checkArgument(totalJavaMemorySizeMB > 0);

		// subtract the Java memory used for network buffers (always off-heap)
		final long networkBufMB =
			calculateNetworkBufferMemory(
				totalJavaMemorySizeMB << 20, // megabytes to bytes
				config) >> 20; // bytes to megabytes
		final long remainingJavaMemorySizeMB = totalJavaMemorySizeMB - networkBufMB;

		// split the available Java memory between heap and off-heap

		final boolean useOffHeap = config.getBoolean(TaskManagerOptions.MEMORY_OFF_HEAP);

		final long heapSizeMB;
		if (useOffHeap) {

			long offHeapSize;
			String managedMemorySizeDefaultVal = TaskManagerOptions.MANAGED_MEMORY_SIZE.defaultValue();
			if (!config.getString(TaskManagerOptions.MANAGED_MEMORY_SIZE).equals(managedMemorySizeDefaultVal)) {
				try {
					offHeapSize = MemorySize.parse(config.getString(TaskManagerOptions.MANAGED_MEMORY_SIZE), MEGA_BYTES).getMebiBytes();
				} catch (IllegalArgumentException e) {
					throw new IllegalConfigurationException(
						"Could not read " + TaskManagerOptions.MANAGED_MEMORY_SIZE.key(), e);
				}
			} else {
				offHeapSize = Long.valueOf(managedMemorySizeDefaultVal);
			}

			if (offHeapSize <= 0) {
				// calculate off-heap section via fraction
				double fraction = config.getFloat(TaskManagerOptions.MANAGED_MEMORY_FRACTION);
				offHeapSize = (long) (fraction * remainingJavaMemorySizeMB);
			}

			TaskManagerServicesConfiguration
				.checkConfigParameter(offHeapSize < remainingJavaMemorySizeMB, offHeapSize,
					TaskManagerOptions.MANAGED_MEMORY_SIZE.key(),
					"Managed memory size too large for " + networkBufMB +
						" MB network buffer memory and a total of " + totalJavaMemorySizeMB +
						" MB JVM memory");

			heapSizeMB = remainingJavaMemorySizeMB - offHeapSize;
		} else {
			heapSizeMB = remainingJavaMemorySizeMB;
		}

		return heapSizeMB;
	}

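To keep things simple and match our actual setup, we ignore the case where off-heap managed memory is enabled. The method above depends on the calculation of the network buffer size. Note that calculateHeapSizeMB() calls the calculateNetworkBufferMemory(long, Configuration) overload, which starts from the total memory; the overload shown below starts from the JVM heap size and, as its comments say, inverts the heap calculation to arrive at the same network buffer size.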


	public static long calculateNetworkBufferMemory(TaskManagerServicesConfiguration tmConfig, long maxJvmHeapMemory) {
		final NetworkEnvironmentConfiguration networkConfig = tmConfig.getNetworkConfig();

		final float networkBufFraction = networkConfig.networkBufFraction();
		final long networkBufMin = networkConfig.networkBufMin();
		final long networkBufMax = networkConfig.networkBufMax();

		if (networkBufMin == networkBufMax) {
			// fixed network buffer pool size
			return networkBufMin;
		}

		// relative network buffer pool size using the fraction...

		// The maximum heap memory has been adjusted as in
		// calculateHeapSizeMB(long totalJavaMemorySizeMB, Configuration config))
		// and we need to invert these calculations.

		final MemoryType memType = tmConfig.getMemoryType();

		final long jvmHeapNoNet;
		if (memType == MemoryType.HEAP) {
			jvmHeapNoNet = maxJvmHeapMemory;
		} else if (memType == MemoryType.OFF_HEAP) {

			// check if a value has been configured
			long configuredMemory = tmConfig.getConfiguredMemory() << 20; // megabytes to bytes

			if (configuredMemory > 0) {
				// The maximum heap memory has been adjusted according to configuredMemory, i.e.
				// maxJvmHeap = jvmHeapNoNet - configuredMemory

				jvmHeapNoNet = maxJvmHeapMemory + configuredMemory;
			} else {
				// The maximum heap memory has been adjusted according to the fraction, i.e.
				// maxJvmHeap = jvmHeapNoNet - jvmHeapNoNet * managedFraction = jvmHeapNoNet * (1 - managedFraction)

				final float managedFraction = tmConfig.getMemoryFraction();
				jvmHeapNoNet = (long) (maxJvmHeapMemory / (1.0 - managedFraction));
			}
		} else {
			throw new RuntimeException("No supported memory type detected.");
		}

		// finally extract the network buffer memory size again from:
		// jvmHeapNoNet = jvmHeap - networkBufBytes
		//              = jvmHeap - Math.min(networkBufMax, Math.max(networkBufMin, jvmHeap * netFraction)
		final long networkBufBytes = Math.min(networkBufMax, Math.max(networkBufMin,
			(long) (jvmHeapNoNet / (1.0 - networkBufFraction) * networkBufFraction)));

		TaskManagerServicesConfiguration
			.checkConfigParameter(networkBufBytes < maxJvmHeapMemory,
				"(" + networkBufFraction + ", " + networkBufMin + ", " + networkBufMax + ")",
				"(" + TaskManagerOptions.NETWORK_BUFFERS_MEMORY_FRACTION.key() + ", " +
					TaskManagerOptions.NETWORK_BUFFERS_MEMORY_MIN.key() + ", " +
					TaskManagerOptions.NETWORK_BUFFERS_MEMORY_MAX.key() + ")",
				"Network buffer memory size too large: " + networkBufBytes + " >= " +
					maxJvmHeapMemory + "(maximum JVM heap size)");

		return networkBufBytes;
	}

So the network buffer size is determined as follows:

network_buffer_memory = min[taskmanager.network.memory.max, max(taskmanager.network.memory.min, tm_total_memory * taskmanager.network.memory.fraction)]

Plugging in the parameters:

network_buffer_memory = min[1024, max(64, 3072 * 0.1)] = 307.2 MB (the code converts to whole megabytes, so 307 MB in practice)

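In other words, the heap memory the TM actually uses is:

tm_heap_memory = tm_total_memory - network_buffer_memory = 3072 - 307.2 = 2764.8 MB (2765 MB with whole-megabyte arithmetic), close to the 2.66 GB JVM heap shown in the UI.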
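Putting the steps together (container cutoff, network buffers, remaining heap), here is a minimal sketch of the on-heap path, assuming off-heap managed memory is disabled and all parameters are left at the defaults listed in Section 3. The class and method names are hypothetical. Like calculateHeapSizeMB(), it computes the network buffer size in bytes and shifts it back to whole megabytes, which is why it yields 307 MB and a 2765 MB heap rather than 307.2 and 2764.8.

```java
/**
 * Sketch of on-heap TaskManager sizing on YARN as described in this article
 * (off-heap managed memory disabled, default parameters). Hypothetical names.
 */
final class TaskManagerHeap {
    static final double CUTOFF_RATIO = 0.25;      // containerized.heap-cutoff-ratio
    static final long CUTOFF_MIN_MB = 600;        // containerized.heap-cutoff-min
    static final double NET_FRACTION = 0.1;       // taskmanager.network.memory.fraction
    static final long NET_MIN_BYTES = 64L << 20;  // taskmanager.network.memory.min (64 MB)
    static final long NET_MAX_BYTES = 1L << 30;   // taskmanager.network.memory.max (1 GB)

    /** tm_total_memory = container - max(cutoff-min, container * cutoff-ratio) */
    static long totalJavaMemoryMB(long containerMB) {
        long cutoff = Math.max(CUTOFF_MIN_MB, (long) (containerMB * CUTOFF_RATIO));
        return containerMB - cutoff;
    }

    /** min(max, max(min, total * fraction)), computed in bytes, truncated to whole MB. */
    static long networkBufferMB(long totalMB) {
        long totalBytes = totalMB << 20; // megabytes to bytes
        long bytes = Math.min(NET_MAX_BYTES,
                Math.max(NET_MIN_BYTES, (long) (totalBytes * NET_FRACTION)));
        return bytes >> 20; // bytes to megabytes
    }

    /** tm_heap_memory = tm_total_memory - network_buffer_memory */
    static long heapMB(long containerMB) {
        long total = totalJavaMemoryMB(containerMB);
        return total - networkBufferMB(total);
    }

    public static void main(String[] args) {
        long total = totalJavaMemoryMB(4096); // -ytm 4096
        long net = networkBufferMB(total);
        System.out.println(total + " " + net + " " + heapMB(4096)); // 3072 307 2765
    }
}
```

The min and max bounds matter at the extremes: a 512 MB total would get the 64 MB floor, and a 20 GB total would be capped at 1 GB of network buffers.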

Similarly, check the network buffer MemorySegment count in the TaskManager UI:

9425 * 32 KB / 1024 = 294.53125 MB
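The segment arithmetic above can be captured in a small helper, assuming the default 32 KB segment size; SegmentMath is a hypothetical name. For comparison, it also computes how many 32 KB segments a 307 MB pool would hold.

```java
/**
 * Converts between MemorySegment counts (as shown in the TaskManager UI)
 * and megabytes. Assumes the default 32 KB taskmanager.memory.segment-size.
 * Hypothetical helper, not Flink's API.
 */
final class SegmentMath {
    static final int SEGMENT_SIZE_BYTES = 32 * 1024;

    /** Total memory occupied by `count` segments, in MB. */
    static double segmentsToMB(long count, int segmentSizeBytes) {
        return count * (double) segmentSizeBytes / (1024 * 1024);
    }

    /** Number of whole segments that fit into `bytes`. */
    static long segmentsFor(long bytes, int segmentSizeBytes) {
        return bytes / segmentSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(segmentsToMB(9425, SEGMENT_SIZE_BYTES));      // 294.53125
        System.out.println(segmentsFor(307L << 20, SEGMENT_SIZE_BYTES)); // 9824
    }
}
```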

So the actual network buffer size is very close to the network_buffer_memory value computed above.

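So how is the on-heap managed memory calculated? As mentioned earlier, managed memory is handled by the MemoryManager. Let's look at TaskManagerServices.createMemoryManager(), which initializes a MemoryManager from the configured parameters.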

private static MemoryManager createMemoryManager(
			TaskManagerServicesConfiguration taskManagerServicesConfiguration,
			long freeHeapMemoryWithDefrag,
			long maxJvmHeapMemory) throws Exception {
		// computing the amount of memory to use depends on how much memory is available
		// it strictly needs to happen AFTER the network stack has been initialized

		// check if a value has been configured
		long configuredMemory = taskManagerServicesConfiguration.getConfiguredMemory();

		MemoryType memType = taskManagerServicesConfiguration.getMemoryType();

		final long memorySize;

		boolean preAllocateMemory = taskManagerServicesConfiguration.isPreAllocateMemory();

		if (configuredMemory > 0) {
			if (preAllocateMemory) {
				LOG.info("Using {} MB for managed memory." , configuredMemory);
			} else {
				LOG.info("Limiting managed memory to {} MB, memory will be allocated lazily." , configuredMemory);
			}
			memorySize = configuredMemory << 20; // megabytes to bytes
		} else {
			// similar to #calculateNetworkBufferMemory(TaskManagerServicesConfiguration tmConfig)
			float memoryFraction = taskManagerServicesConfiguration.getMemoryFraction();

			if (memType == MemoryType.HEAP) {
				// network buffers allocated off-heap -> use memoryFraction of the available heap:
				long relativeMemSize = (long) (freeHeapMemoryWithDefrag * memoryFraction);
				if (preAllocateMemory) {
					LOG.info("Using {} of the currently free heap space for managed heap memory ({} MB)." ,
						memoryFraction , relativeMemSize >> 20);
				} else {
					LOG.info("Limiting managed memory to {} of the currently free heap space ({} MB), " +
						"memory will be allocated lazily." , memoryFraction , relativeMemSize >> 20);
				}
				memorySize = relativeMemSize;
			} else if (memType == MemoryType.OFF_HEAP) {
				// The maximum heap memory has been adjusted according to the fraction (see
				// calculateHeapSizeMB(long totalJavaMemorySizeMB, Configuration config)), i.e.
				// maxJvmHeap = jvmTotalNoNet - jvmTotalNoNet * memoryFraction = jvmTotalNoNet * (1 - memoryFraction)
				// directMemorySize = jvmTotalNoNet * memoryFraction
				long directMemorySize = (long) (maxJvmHeapMemory / (1.0 - memoryFraction) * memoryFraction);
				if (preAllocateMemory) {
					LOG.info("Using {} of the maximum memory size for managed off-heap memory ({} MB)." ,
						memoryFraction, directMemorySize >> 20);
				} else {
					LOG.info("Limiting managed memory to {} of the maximum memory size ({} MB)," +
						" memory will be allocated lazily.", memoryFraction, directMemorySize >> 20);
				}
				memorySize = directMemorySize;
			} else {
				throw new RuntimeException("No supported memory type detected.");
			}
		}

		// now start the memory manager
		final MemoryManager memoryManager;
		try {
			memoryManager = new MemoryManager(
				memorySize,
				taskManagerServicesConfiguration.getNumberOfSlots(),
				taskManagerServicesConfiguration.getNetworkConfig().networkBufferSize(),
				memType,
				preAllocateMemory);
		} catch (OutOfMemoryError e) {
			if (memType == MemoryType.HEAP) {
				throw new Exception("OutOfMemory error (" + e.getMessage() +
					") while allocating the TaskManager heap memory (" + memorySize + " bytes).", e);
			} else if (memType == MemoryType.OFF_HEAP) {
				throw new Exception("OutOfMemory error (" + e.getMessage() +
					") while allocating the TaskManager off-heap memory (" + memorySize +
					" bytes).Try increasing the maximum direct memory (-XX:MaxDirectMemorySize)", e);
			} else {
				throw e;
			}
		}
		return memoryManager;
	}

A brief summary of the flow:

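  1. Read the taskmanager.memory.size parameter, which sets an absolute size for managed memory;
  2. If taskmanager.memory.size is not set, fall back to the taskmanager.memory.fraction parameter mentioned earlier;
  3. Considering only the on-heap case, apply that fraction to freeHeapMemoryWithDefrag, the amount of heap that is still free after a GC has been triggered (the caller obtains this value before invoking createMemoryManager());
  4. Compute the managed memory size and the other parameters, and return a MemoryManager instance.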

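In general we don't bluntly set taskmanager.memory.size directly, so:

flink_managed_memory ≈ tm_heap_memory * taskmanager.memory.fraction = 2764.8 * 0.7 = 1935.36 MB ≈ 1.89 GB

This is the upper bound for the managed memory shown in the TaskManager UI. The UI actually reports 1.79 GB, slightly less, because the fraction is applied to freeHeapMemoryWithDefrag (the heap that is still free after the GC) rather than to the full heap.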
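A minimal sketch of the on-heap managed memory rule, assuming taskmanager.memory.size is unset. ManagedMemory is a hypothetical class. Estimating the free heap as maxMemory() - totalMemory() + freeMemory() after calling System.gc() is an assumption made for illustration, not code taken from Flink. Applying the fraction to the full 2765 MB heap gives the 1935 MB upper bound; applying it to the heap that is actually free yields a smaller number, consistent with the 1.79 GB in the UI.

```java
/**
 * Sketch: managed = freeHeapAfterGc * taskmanager.memory.fraction.
 * Hypothetical names; the free-heap estimate is an illustrative assumption.
 */
final class ManagedMemory {
    static final double DEFAULT_FRACTION = 0.7; // taskmanager.memory.fraction

    /** Managed memory in bytes for a given free heap (bytes) and fraction. */
    static long managedBytes(long freeHeapBytes, double fraction) {
        if (fraction <= 0 || fraction >= 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1): " + fraction);
        }
        return (long) (freeHeapBytes * fraction);
    }

    /**
     * Estimated free heap of the current JVM: heap not yet reserved by the JVM
     * (max - total) plus the free part of the reserved heap.
     */
    static long freeHeapAfterGc() {
        System.gc(); // only a hint; the JVM may ignore it
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    public static void main(String[] args) {
        // Upper bound: pretend the whole 2765 MB heap is free
        System.out.println(managedBytes(2765L << 20, DEFAULT_FRACTION) >> 20); // 1935
        // What this JVM would get right now (depends on the running JVM)
        System.out.println(managedBytes(freeHeapAfterGc(), DEFAULT_FRACTION) >> 20);
    }
}
```

Because the live objects already on the heap at startup are excluded, the measured free heap is always below the configured heap, so the managed memory the UI shows is slightly below the formula's upper bound.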
