Any discussion of Flink's memory management has to start with one question: why does Flink manage memory itself instead of leaving it to the JVM?
The usual answer has three parts: 1) objects stored on the JVM heap have low memory density (object headers and padding waste space); 2) full GC pauses can severely hurt performance, sometimes lasting seconds or even minutes; 3) OOM errors are hard to predict and control.
1. The JobManager/TaskManager memory model
The memory option definitions live in the source at:
org.apache.flink.configuration.TaskManagerOptions
org.apache.flink.configuration.JobManagerOptions
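These option classes back the keys set in flink-conf.yaml. As an illustration (the sizes below are arbitrary examples, not recommendations), a coarse-grained configuration might look like:

```yaml
# Coarse-grained sizing: only the total process memory of each process is
# pinned; Flink derives all inner components from it.
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 4096m

# Fine-grained alternative (takes precedence over the totals above):
# taskmanager.memory.task.heap.size: 1024m
# taskmanager.memory.managed.size: 1024m
```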
2. Reading the memory-allocation source code
2.1 JobManager memory allocation
org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(){
...
final JobManagerProcessSpec processSpec = JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(
flinkConfiguration,
JobManagerOptions.TOTAL_PROCESS_MEMORY); // derive the JobManager spec from the total process memory option
final ContainerLaunchContext amContainer = setupApplicationMasterContainer(
yarnClusterEntrypoint,
hasKrb5,
processSpec);
...
}
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(){
if (options.getRequiredFineGrainedOptions().stream().allMatch(config::contains)) {
// all required fine-grained options (e.g. task heap and managed memory) are configured;
// use them to derive total Flink memory and total process memory
return deriveProcessSpecWithExplicitInternalMemory(config);
} else if (config.contains(options.getTotalFlinkMemoryOption())) {
// fine-grained options are not configured but total Flink memory is;
// derive the rest from total Flink memory
return deriveProcessSpecWithTotalFlinkMemory(config);
} else if (config.contains(options.getTotalProcessMemoryOption())) {
// only total process memory is configured; derive the rest from it
return deriveProcessSpecWithTotalProcessMemory(config);
}
return failBecauseRequiredOptionsNotConfigured();
}
// Each memory component is derived in org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.
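The fall-through above can be illustrated with a small standalone sketch. The keys and return values here are hypothetical stand-ins, not Flink's actual API; the point is only the precedence: fine-grained options win, then total Flink memory, then total process memory.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the three-way precedence in
// ProcessMemoryUtils.memoryProcessSpecFromConfig (hypothetical keys).
public class MemorySpecPrecedence {
    static String deriveSpec(Map<String, Long> config) {
        if (config.containsKey("task.heap") && config.containsKey("managed")) {
            return "fromFineGrained";   // deriveProcessSpecWithExplicitInternalMemory
        } else if (config.containsKey("flink.total")) {
            return "fromTotalFlink";    // deriveProcessSpecWithTotalFlinkMemory
        } else if (config.containsKey("process.total")) {
            return "fromTotalProcess";  // deriveProcessSpecWithTotalProcessMemory
        }
        throw new IllegalArgumentException("no memory option configured");
    }

    public static void main(String[] args) {
        Map<String, Long> cfg = new HashMap<>();
        cfg.put("process.total", 1024L);
        System.out.println(deriveSpec(cfg)); // fromTotalProcess
        cfg.put("flink.total", 768L);
        System.out.println(deriveSpec(cfg)); // fromTotalFlink (wins over process.total)
    }
}
```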
2.2 TaskManager memory allocation
org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(){
...
try {
// Context information used to start a TaskExecutor Java process
ContainerLaunchContext taskExecutorLaunchContext = createTaskExecutorLaunchContext(
resourceId.toString(),
container.getNodeId().getHost(),
TaskExecutorProcessUtils.processSpecFromWorkerResourceSpec(flinkConfig, workerResourceSpec)); // this is where the memory spec is built; on Kubernetes the corresponding class is org.apache.flink.kubernetes.KubernetesResourceManager
nodeManagerClient.startContainerAsync(container, taskExecutorLaunchContext);
} catch (Throwable t) {
releaseFailedContainerAndRequestNewContainerIfRequired(container.getId(), t);
}
}
org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromWorkerResourceSpec(
final Configuration config, final WorkerResourceSpec workerResourceSpec) {
final MemorySize frameworkHeapMemorySize = TaskExecutorFlinkMemoryUtils.getFrameworkHeapMemorySize(config);
final MemorySize frameworkOffHeapMemorySize = TaskExecutorFlinkMemoryUtils.getFrameworkOffHeapMemorySize(config);
final TaskExecutorFlinkMemory flinkMemory = new TaskExecutorFlinkMemory(
frameworkHeapMemorySize,
frameworkOffHeapMemorySize,
workerResourceSpec.getTaskHeapSize(),
workerResourceSpec.getTaskOffHeapSize(),
workerResourceSpec.getNetworkMemSize(),
workerResourceSpec.getManagedMemSize());
final JvmMetaspaceAndOverhead jvmMetaspaceAndOverhead =
PROCESS_MEMORY_UTILS.deriveJvmMetaspaceAndOverheadFromTotalFlinkMemory(
config, flinkMemory.getTotalFlinkMemorySize());
return new TaskExecutorProcessSpec(workerResourceSpec.getCpuCores(), flinkMemory, jvmMetaspaceAndOverhead);
}
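To see how the pieces combine, here is an arithmetic sketch with illustrative sizes in MB (these are example numbers, not Flink's defaults): the six components passed to TaskExecutorFlinkMemory sum to the total Flink memory, and the JVM metaspace and overhead are stacked on top to give the total process memory.

```java
// Illustrative TaskExecutor memory arithmetic (sizes in MB, made up).
public class TaskExecutorMemoryMath {
    public static void main(String[] args) {
        long frameworkHeap = 128, frameworkOffHeap = 128;   // framework memory
        long taskHeap = 384, taskOffHeap = 0;               // from WorkerResourceSpec
        long network = 128, managed = 512;                  // from WorkerResourceSpec
        long totalFlink = frameworkHeap + frameworkOffHeap + taskHeap
                + taskOffHeap + network + managed;          // 1280
        long metaspace = 256, overhead = 192;               // derived from totalFlink
        long totalProcess = totalFlink + metaspace + overhead; // 1728
        System.out.println(totalFlink + " " + totalProcess);
    }
}
```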
2.3 Reading the memory-release source code
Since Flink manages memory itself, it must also release it. Off-heap memory is released either when the memory is no longer in use or when a task stops (whether it fails or finishes normally).
org.apache.flink.util.JavaGcCleanerWrapper
// provides two cleaner implementations: one for Java 8 and earlier, one for Java 9 and later
createLegacyCleanerProvider() { // before Java 9
...
String cleanerClassName = "sun.misc.Cleaner";
...
}
createJava9CleanerProvider() { // Java 9 and later
...
String cleanerClassName = "java.lang.ref.Cleaner";
...
}
// JavaGcCleanerWrapper creates, for each owner, a Runnable wrapping a Cleaner; when a MemorySegment frees its memory, that Cleaner is invoked to perform the release.
// When the segments are closed, all of the memory they allocated is released and handed back to the operating system.
org.apache.flink.core.memory.MemorySegmentFactory.allocateOffHeapUnsafeMemory(){
long address = MemoryUtils.allocateUnsafe(size);
ByteBuffer offHeapBuffer = MemoryUtils.wrapUnsafeMemoryWithByteBuffer(address, size);
// create (or look up) the GC cleaner for this buffer
MemoryUtils.createMemoryGcCleaner(offHeapBuffer, address, customCleanupAction);
// all MemorySegments Flink allocates today are HybridMemorySegments
return new HybridMemorySegment(offHeapBuffer, owner);
}
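The Java 9+ path can be sketched with the standard java.lang.ref.Cleaner API. This is a simplified stand-in for what JavaGcCleanerWrapper does reflectively: a cleanup action is registered against an owner object and can be run eagerly (as Flink does when a segment is freed) or, as a safety net, by the GC when the owner becomes unreachable.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of the Java 9+ cleaner mechanism.
public class CleanerSketch {
    public static void main(String[] args) {
        Cleaner cleaner = Cleaner.create();
        AtomicBoolean released = new AtomicBoolean(false);
        Object owner = new Object(); // stands in for the off-heap ByteBuffer
        // the Runnable must not capture `owner`, or it would never be collected
        Cleaner.Cleanable cleanable =
                cleaner.register(owner, () -> released.set(true));
        cleanable.clean(); // eager release, runs the action at most once
        System.out.println(released.get()); // true
    }
}
```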
3. Memory structure
Flink's memory management is built on two abstractions: the memory segment (MemorySegment) and the memory page.
MemorySegment is Flink's smallest unit of memory allocation, 32 KB by default.
There are two implementations: HybridMemorySegment (which can back both heap and off-heap memory and covers everything HeapMemorySegment offers) and HeapMemorySegment (no longer used since Flink 1.7).
The memory page is a higher-level abstraction over memory segments: it hides the details of individual segments, so callers need not care which MemorySegment they touch, and it transparently handles reads and writes that cross segment boundaries.
Its interfaces are DataInputView and DataOutputView: DataInputView is the abstract view for reading from MemorySegments, and DataOutputView is the abstract view for writing to them.
DataStream operations are essentially all built on DataInputView and DataOutputView.
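The idea behind the page views can be shown with a dependency-free sketch (the class, names, and sizes are illustrative, not Flink's actual classes): the caller reads and writes at logical positions while the view maps them onto fixed-size segments, including runs that cross a segment boundary.

```java
import java.util.Arrays;

// Toy version of the DataOutputView / DataInputView roles over fixed-size segments.
public class PagedViewSketch {
    static final int SEGMENT_SIZE = 4;
    byte[][] segments = { new byte[SEGMENT_SIZE], new byte[SEGMENT_SIZE] };

    void write(int pos, byte[] data) {           // output-view role
        for (byte b : data) {
            segments[pos / SEGMENT_SIZE][pos % SEGMENT_SIZE] = b;
            pos++;
        }
    }

    byte[] read(int pos, int len) {              // input-view role
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++, pos++) {
            out[i] = segments[pos / SEGMENT_SIZE][pos % SEGMENT_SIZE];
        }
        return out;
    }

    public static void main(String[] args) {
        PagedViewSketch view = new PagedViewSketch();
        view.write(2, new byte[] {1, 2, 3, 4});  // crosses the segment boundary
        System.out.println(Arrays.toString(view.read(2, 4))); // [1, 2, 3, 4]
    }
}
```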