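Automatic inference of MapReduce memory parameters. In Hadoop 2.0, setting the memory parameters of a MapReduce job is tedious because two parameters are involved: mapreduce.{map,reduce}.memory.mb and mapreduce.{map,reduce}.java.opts. If they are set inconsistently, a lot of memory is wasted. For example, if the former is set to 4096 MB but the latter is "-Xmx2g", the remaining 2 GB of the container can never be used by the Java heap.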
Corresponding patch: MAPREDUCE-5785
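Background
mapreduce.map.java.opts and mapreduce.map.memory.mb
How are mapreduce.map.java.opts and mapreduce.map.memory.mb related?
mapreduce.{map,reduce}.memory.mb is the memory limit of the container that the task requests. mapreduce.{map,reduce}.java.opts holds the options, and therefore the limits, of the JVM that runs inside that container.
Under YARN, the JVM process runs inside the container, and mapreduce.{map,reduce}.java.opts can cap the JVM's maximum heap through -Xmx. The heap is usually set to about 0.75 times memory.mb, because some room has to be left for the JVM's own code and for non-heap memory usage.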
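To make the rule of thumb concrete, here is a minimal sketch. It is not Hadoop code: the class and method names are hypothetical, and 0.75 is simply the rule-of-thumb fraction mentioned above.

```java
// Illustrative only: hypothetical helper, not part of Hadoop.
class ContainerHeadroomSketch {

    // Heap size (MB) suggested by the 0.75 rule of thumb for a given container size.
    static int suggestedHeapMb(int containerMb) {
        return (int) Math.floor(containerMb * 0.75);
    }

    // Container memory (MB) that the Java heap can never grow into.
    static int headroomMb(int containerMb, int heapMb) {
        return containerMb - heapMb;
    }

    public static void main(String[] args) {
        int containerMb = 4096;
        System.out.println("suggested -Xmx: " + suggestedHeapMb(containerMb) + "m");
        System.out.println("headroom with -Xmx2g: " + headroomMb(containerMb, 2048) + " MB");
    }
}
```

For a 4096 MB container the rule suggests a heap of 3072 MB, whereas "-Xmx2g" leaves 2048 MB outside the heap, twice the 1024 MB the rule would reserve.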
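Detailed logic
If mapreduce.{map,reduce}.memory.mb is left at its default value of -1, it is now inferred automatically from the maximum heap size (-Xmx) given in mapreduce.{map,reduce}.java.opts.
The reverse also holds: if mapreduce.{map,reduce}.memory.mb is specified but mapreduce.{map,reduce}.java.opts contains no -Xmx, the -Xmx value is derived from the former.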
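If neither is specified, mapreduce.{map,reduce}.memory.mb falls back to the default of 1024 MB.
Both conversions use the scaling factor given by the property mapreduce.job.heap.memory-mb.ratio (default 0.8), which accounts for the overhead between heap usage and actual physical memory usage. Existing tasks or job code that already set both groups of properties explicitly are not affected by this inference.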
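Formula
-Xmx in mapreduce.{map,reduce}.java.opts = mapreduce.{map,reduce}.memory.mb * mapreduce.job.heap.memory-mb.ratio
mapreduce.{map,reduce}.memory.mb = -Xmx in mapreduce.{map,reduce}.java.opts / mapreduce.job.heap.memory-mb.ratio
(Both results are rounded up to a whole number of MB.)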
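The two directions of the inference can be sketched in a few lines of standalone Java. This is a simplified illustration under stated assumptions, not the Hadoop implementation: the class and method names are hypothetical, a null container size stands for "not configured", and a heap size of 0 or less stands for "no -Xmx given".

```java
// Illustrative only: hypothetical names, simplified from the logic described above.
class MemoryInferenceSketch {
    static final int DEFAULT_CONTAINER_MB = 1024;

    // Container size: use the configured value if present, else derive it from the heap,
    // else fall back to the default.
    static int containerMb(Integer configuredMb, int heapMb, float ratio) {
        if (configuredMb == null && heapMb > 0) {
            return (int) Math.ceil(heapMb / ratio);
        }
        return configuredMb != null ? configuredMb : DEFAULT_CONTAINER_MB;
    }

    // Heap size: use the configured -Xmx if present, else derive it from the container size.
    static int heapMb(int configuredHeapMb, int containerMb, float ratio) {
        return configuredHeapMb > 0 ? configuredHeapMb : (int) Math.ceil(containerMb * ratio);
    }

    public static void main(String[] args) {
        float ratio = 0.8f;
        // Only -Xmx2g given: the container size is derived from the heap.
        System.out.println(containerMb(null, 2048, ratio));
        // Only memory.mb=4096 given: the heap is derived from the container size.
        System.out.println(String.format("-Xmx%dm", heapMb(0, 4096, ratio)));
        // Neither given: default container, heap derived from it.
        int c = containerMb(null, 0, ratio);
        System.out.println(c + " / " + String.format("-Xmx%dm", heapMb(0, c, ratio)));
    }
}
```

With the default ratio of 0.8, "-Xmx2g" alone yields a 2560 MB container, a 4096 MB container alone yields "-Xmx3277m", and with neither set the sketch yields 1024 MB and "-Xmx820m".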
Parameters
<property>
  <name>mapreduce.job.heap.memory-mb.ratio</name>
  <value>0.8</value>
  <description>The ratio of heap-size to container-size. If no -Xmx is
    specified, it is calculated as
    (mapreduce.{map|reduce}.memory.mb * mapreduce.job.heap.memory-mb.ratio).
    If -Xmx is specified but not mapreduce.{map|reduce}.memory.mb, it is
    calculated as (heapSize / mapreduce.job.heap.memory-mb.ratio).
  </description>
</property>
Main code
public String getTaskJavaOpts(TaskType taskType) {
  String javaOpts = getConfiguredTaskJavaOpts(taskType);

  // Infer a heap size only when the user has not supplied -Xmx.
  if (!javaOpts.contains("-Xmx")) {
    float heapRatio = getFloat(MRJobConfig.HEAP_MEMORY_MB_RATIO,
        MRJobConfig.DEFAULT_HEAP_MEMORY_MB_RATIO);
    // Fall back to the default ratio if the configured one is out of range.
    if (heapRatio > 1.0f || heapRatio < 0) {
      LOG.warn("Invalid value for " + MRJobConfig.HEAP_MEMORY_MB_RATIO
          + ", using the default.");
      heapRatio = MRJobConfig.DEFAULT_HEAP_MEMORY_MB_RATIO;
    }

    // Heap size = container size * ratio, rounded up to whole MB.
    int taskContainerMb = getMemoryRequired(taskType);
    int taskHeapSize = (int) Math.ceil(taskContainerMb * heapRatio);
    String xmxArg = String.format("-Xmx%dm", taskHeapSize);
    LOG.info("Task java-opts do not specify heap size. Setting task attempt"
        + " jvm max heap size to " + xmxArg);
    javaOpts += " " + xmxArg;
  }
  return javaOpts;
}

@Private
public int getMemoryRequired(TaskType taskType) {
  int memory = 1024;
  // Heap size in MB parsed from -Xmx in the configured java opts, if any.
  int heapSize = parseMaximumHeapSizeMB(getConfiguredTaskJavaOpts(taskType));
  float heapRatio = getFloat(MRJobConfig.HEAP_MEMORY_MB_RATIO,
      MRJobConfig.DEFAULT_HEAP_MEMORY_MB_RATIO);

  if (taskType == TaskType.MAP) {
    if (get(MRJobConfig.MAP_MEMORY_MB) == null && heapSize > 0) {
      // memory.mb not set but -Xmx is: container size = heap / ratio, rounded up.
      memory = (int) Math.ceil(heapSize / heapRatio);
      LOG.info(MRJobConfig.MAP_MEMORY_MB
          + " not specified. Derived from javaOpts = " + memory);
    } else {
      // Use the configured value, or the default if nothing is set.
      memory = getInt(MRJobConfig.MAP_MEMORY_MB,
          MRJobConfig.DEFAULT_MAP_MEMORY_MB);
    }
  } else if (taskType == TaskType.REDUCE) {
    // Same logic as for map tasks, applied to the reduce properties.
    if (get(MRJobConfig.REDUCE_MEMORY_MB) == null && heapSize > 0) {
      memory = (int) Math.ceil(heapSize / heapRatio);
      LOG.info(MRJobConfig.REDUCE_MEMORY_MB
          + " not specified. Derived from javaOpts = " + memory);
    } else {
      memory = getInt(MRJobConfig.REDUCE_MEMORY_MB,
          MRJobConfig.DEFAULT_REDUCE_MEMORY_MB);
    }
  }
  return memory;
}
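
The code above calls parseMaximumHeapSizeMB, which is not shown in this excerpt. As a rough idea of what such a helper has to do, here is a standalone sketch. It is an assumption-laden illustration and not Hadoop's actual implementation: the class name, method name, and regular expression are hypothetical, it returns -1 when no -Xmx is present, and it takes the last -Xmx on the assumption that later flags override earlier ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical stand-in, not Hadoop's parseMaximumHeapSizeMB.
class XmxParseSketch {
    // Matches -Xmx followed by digits and an optional k/m/g unit suffix.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the maximum heap size in MB, or -1 if the options contain no -Xmx.
    static int parseXmxMb(String javaOpts) {
        if (javaOpts == null) {
            return -1;
        }
        long mb = -1;
        Matcher m = XMX.matcher(javaOpts);
        while (m.find()) {              // keep the last occurrence
            long value = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase()) {
                case "g": mb = value * 1024; break;
                case "m": mb = value; break;
                case "k": mb = value / 1024; break;
                default:  mb = value / (1024 * 1024); break; // no suffix: bytes
            }
        }
        return (int) mb;
    }

    public static void main(String[] args) {
        System.out.println(parseXmxMb("-Xmx2g"));
        System.out.println(parseXmxMb("-Xmx1024m -XX:+UseG1GC"));
        System.out.println(parseXmxMb("-Djava.net.preferIPv4Stack=true"));
    }
}
```

A result greater than 0 corresponds to the heapSize > 0 check in getMemoryRequired, which is what triggers deriving the container size from the heap.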