Hadoop Map/Reduce: A Summary of Execution Parameters

The Hadoop documentation home page already describes all of these parameters in detail; they are reproduced here simply for ease of future reference.

-------------------------

Configuring Memory Requirements For A Job

MapReduce tasks are launched with some default memory limits that are provided by the system or by the cluster's administrators. Memory intensive jobs might need to use more than these default values. Hadoop has some configuration options that allow these to be changed. Without such modifications, memory intensive jobs could fail due to OutOfMemory errors in tasks or could get killed when the limits are enforced by the system. This section describes the various options that can be used to configure specific memory requirements.

  • mapreduce.{map|reduce}.java.opts: If the task requires more Java heap space, this option must be used. The value of this option should pass the desired heap using the JVM option -Xmx. For example, to use 1G of heap space, the option should be passed in as -Xmx1024m. Note that other JVM options are also passed using the same option. Hence, append the heap space option along with other options already configured.
  • mapreduce.{map|reduce}.ulimit: The slaves where tasks are run could be configured with a ulimit value that applies a limit to every process that is launched on the slave. If the task, or any child that the task launches (like in streaming), requires more than the configured limit, this option must be used. The value is given in kilobytes. For example, to increase the ulimit to 1G, the option should be set to 1048576. Note that this value is a per process limit. Since it applies to the JVM as well, the heap space given to the JVM through the mapreduce.{map|reduce}.java.opts should be less than the value configured for the ulimit. Otherwise the JVM will not start.
  • mapreduce.{map|reduce}.memory.mb: In some environments, administrators might have configured a total limit on the virtual memory used by the entire process tree for a task, including all processes launched recursively by the task or its children, like in streaming. More details about this can be found in the section on Monitoring Task Memory Usage in the Cluster SetUp guide. If a task requires more virtual memory for its entire tree, this option must be used. The value is given in MB. For example, to set the limit to 1G, the option should be set to 1024. Note that this value does not automatically influence the per process ulimit or heap space. Hence, you may need to set those parameters as well (as described above) in order to give your tasks the right amount of memory.
  • mapreduce.{map|reduce}.memory.physical.mb: This parameter is similar to mapreduce.{map|reduce}.memory.mb, except it specifies how much physical memory is required by a task for its entire tree of processes. The parameter is applicable if administrators have configured a total limit on the physical memory used by all MapReduce tasks.

As seen above, each of the options can be specified separately for map and reduce tasks. It is typically the case that the different types of tasks have different memory requirements. Hence different values can be set for the corresponding options.
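
For illustration, here is a minimal sketch of setting these options programmatically through the Java Configuration API. The property names come from the list above; the specific values are arbitrary examples, and Job.getInstance is an assumption about the Hadoop version in use (older releases construct a Job directly):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Give map tasks a 512 MB heap and reduce tasks a 1 GB heap.
            // Append -Xmx to any JVM options already configured.
            conf.set("mapreduce.map.java.opts", "-Xmx512m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx1024m");

            // Per-process ulimit in kilobytes; it must exceed the JVM heap
            // above, or the JVM will not start (1048576 KB = 1 GB).
            conf.set("mapreduce.map.ulimit", "1048576");
            conf.set("mapreduce.reduce.ulimit", "2097152");

            // Virtual-memory limit for the task's entire process tree, in MB.
            conf.setInt("mapreduce.map.memory.mb", 1024);
            conf.setInt("mapreduce.reduce.memory.mb", 2048);

            Job job = Job.getInstance(conf, "memory-config-example");
            // ... set mapper, reducer, input/output paths, then submit.
        }
    }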

The memory available to some parts of the framework is also configurable. In map and reduce tasks, performance may be influenced by adjusting parameters that affect the concurrency of operations and the frequency with which data will hit disk. Monitoring the filesystem counters for a job, particularly relative to byte counts from the map and into the reduce, is invaluable to the tuning of these parameters.

Note: The memory related configuration options described above are used only for configuring the launched child tasks from the tasktracker. Configuring the memory options for daemons is documented under Configuring the Environment of the Hadoop Daemons (Cluster Setup).

Map Parameters

A record emitted from a map and its metadata will be serialized into a buffer. As described in the following options, when the record data exceed a threshold, the contents of this buffer will be sorted and written to disk in the background (a "spill") while the map continues to output records. If the remainder of the buffer fills during the spill, the map thread will block. When the map is finished, any buffered records are written to disk and all on-disk segments are merged into a single file. Minimizing the number of spills to disk can decrease map time, but a larger buffer also decreases the memory available to the mapper.

  • mapreduce.task.io.sort.mb (int): The cumulative size of the serialization and accounting buffers storing records emitted from the map, in megabytes.
  • mapreduce.map.sort.spill.percent (float): The threshold for the accounting and serialization buffer. When this percentage of io.sort.mb has filled, its contents will be spilled to disk in the background. Note that a higher value may decrease the number of merges, or even eliminate them, but will also increase the probability of the map task getting blocked. The lowest average map times are usually obtained by accurately estimating the size of the map output and preventing multiple spills.
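
As a rough sketch of tuning the map side (the values are illustrative, not recommendations):

    import org.apache.hadoop.conf.Configuration;

    public class MapSortTuning {
        public static Configuration tuned() {
            Configuration conf = new Configuration();
            // A 256 MB sort buffer, sized so that a typical map task's
            // output fits in memory and spills only once, when the map ends.
            conf.setInt("mapreduce.task.io.sort.mb", 256);
            // Begin background spilling at 80% full; the remaining 20% of
            // the buffer absorbs records emitted while the spill runs.
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
            return conf;
        }
    }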

Other notes

  • If the spill threshold is exceeded while a spill is in progress, collection will continue until the spill is finished. For example, if mapreduce.map.sort.spill.percent is set to 0.33, and the remainder of the buffer is filled while the spill runs, the next spill will include all the collected records, or 0.66 of the buffer, and will not generate additional spills. In other words, the thresholds are defining triggers, not blocking.
  • A record larger than the serialization buffer will first trigger a spill, then be spilled to a separate file. It is undefined whether or not this record will first pass through the combiner.
Shuffle/Reduce Parameters

As described previously, each reduce fetches the output assigned to it by the Partitioner via HTTP into memory and periodically merges these outputs to disk. If intermediate compression of map outputs is turned on, each output is decompressed into memory. The following options affect the frequency of these merges to disk prior to the reduce and the memory allocated to map output during the reduce.

  • mapreduce.task.io.sort.factor (int): Specifies the number of segments on disk to be merged at the same time. It limits the number of open files and compression codecs during the merge. If the number of files exceeds this limit, the merge will proceed in several passes. Though this limit also applies to the map, most jobs should be configured so that hitting this limit is unlikely there.
  • mapreduce.reduce.merge.inmem.threshold (int): The number of sorted map outputs fetched into memory before being merged to disk. Like the spill thresholds in the preceding note, this is not defining a unit of partition, but a trigger. In practice, this is usually set very high (1000) or disabled (0), since merging in-memory segments is often less expensive than merging from disk (see notes following this table). This threshold influences only the frequency of in-memory merges during the shuffle.
  • mapreduce.reduce.shuffle.merge.percent (float): The memory threshold for fetched map outputs before an in-memory merge is started, expressed as a percentage of the memory allocated to storing map outputs in memory. Since map outputs that can't fit in memory can be stalled, setting this high may decrease parallelism between the fetch and merge. Conversely, values as high as 1.0 have been effective for reduces whose input can fit entirely in memory. This parameter influences only the frequency of in-memory merges during the shuffle.
  • mapreduce.reduce.shuffle.input.buffer.percent (float): The percentage of memory, relative to the maximum heap size as typically specified in mapreduce.reduce.java.opts, that can be allocated to storing map outputs during the shuffle. Though some memory should be set aside for the framework, in general it is advantageous to set this high enough to store large and numerous map outputs.
  • mapreduce.reduce.input.buffer.percent (float): The percentage of memory, relative to the maximum heap size, in which map outputs may be retained during the reduce. When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines. By default, all map outputs are merged to disk before the reduce begins, to maximize the memory available to the reduce. For less memory-intensive reduces, this should be increased to avoid trips to disk.
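
A hedged sketch of shuffle-side tuning with the parameters above (again, the numbers are placeholders to show the mechanics, not recommendations):

    import org.apache.hadoop.conf.Configuration;

    public class ShuffleTuning {
        public static Configuration tuned() {
            Configuration conf = new Configuration();
            // Merge at most 50 on-disk segments per pass.
            conf.setInt("mapreduce.task.io.sort.factor", 50);
            // Let fetched map outputs occupy up to 70% of the reducer heap
            // during the shuffle.
            conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
            // Start an in-memory merge once that allocation is 66% full.
            conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);
            // Retain map outputs in memory during the reduce, up to 30% of
            // the heap, instead of merging everything to disk first.
            conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.30f);
            return conf;
        }
    }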

Other notes

  • If a map output is larger than 25 percent of the memory allocated to copying map outputs, it will be written directly to disk without first staging through memory.
  • When running with a combiner, the reasoning about high merge thresholds and large buffers may not hold. For merges started before all map outputs have been fetched, the combiner is run while spilling to disk. In some cases, one can obtain better reduce times by spending resources combining map outputs, making disk spills small and parallelizing spilling and fetching, rather than aggressively increasing buffer sizes.
  • When merging in-memory map outputs to disk to begin the reduce, if an intermediate merge is necessary because there are segments to spill and at least mapreduce.task.io.sort.factor segments already on disk, the in-memory map outputs will be part of the intermediate merge.
Directory Structure

The task tracker creates localized caches and localized jobs under a local directory, ${mapreduce.cluster.local.dir}/taskTracker/. Multiple local directories (spanning multiple disks) can be defined, and each filename is then assigned to a semi-random local directory. When the job starts, the task tracker creates a localized job directory relative to the local directory specified in the configuration. The task tracker's directory structure thus looks as follows:

  • ${mapreduce.cluster.local.dir}/taskTracker/distcache/ : The public distributed cache for the jobs of all users. This directory holds the localized public distributed cache, which is therefore shared among the tasks and jobs of all users.
  • ${mapreduce.cluster.local.dir}/taskTracker/$user/distcache/ : The private distributed cache for the jobs of the specific user. This directory holds the localized private distributed cache, which is therefore shared only among the tasks and jobs of that user and is not accessible to jobs of other users.
  • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/ : The localized job directory
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/work/ : The job-specific shared directory. The tasks can use this space as scratch space and share files among them. This directory is exposed to the users through the configuration property mapreduce.job.local.dir. It is also available as a system property, so users (streaming etc.) can call System.getProperty("mapreduce.job.local.dir") to access the directory (see the sketch after this list).
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/jars/ : The jars directory, which has the job jar file and expanded jar. The job.jar is the application's jar file that is automatically distributed to each machine. Any library jars that are dependencies of the application code may be packaged inside this jar in a lib/ directory. This directory is extracted from job.jar and its contents are automatically added to the classpath for each task. The job.jar location is accessible to the application through the API Job.getJar() . To access the unjarred directory, Job.getJar().getParent() can be called.
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/job.xml : The job.xml file, the generic job configuration, localized for the job.
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid : The task directory for each task attempt. Each task directory again has the following structure :
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/job.xml : A job.xml file containing the task-localized job configuration. Task localization means that properties have been set that are specific to this particular task within the job. The properties localized for each task are described below.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/output : A directory for intermediate output files. This contains the temporary map/reduce data generated by the framework, such as map output files.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work : The current working directory of the task. With JVM reuse enabled for tasks, this is the directory in which the JVM was started.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work/tmp : The temporary directory for the task. The user can set the property mapreduce.task.tmp.dir to choose the temporary directory for map and reduce tasks; it defaults to ./tmp. If the value is not an absolute path, it is prepended with the task's working directory; otherwise, it is used directly. The directory is created if it doesn't exist, and this work/tmp directory itself is created only when mapreduce.task.tmp.dir has the value ./tmp. Child Java tasks are then executed with the option -Djava.io.tmpdir='the absolute path of the tmp dir'; pipes and streaming tasks are given the environment variable TMPDIR='the absolute path of the tmp dir'.
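
A running task can discover these localized directories at runtime. The following is a minimal sketch of a hypothetical mapper that logs its scratch and temporary directories, using only the system properties described above:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DirectoryProbeMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            // Job-specific shared scratch space, exposed as a system property.
            String jobLocalDir = System.getProperty("mapreduce.job.local.dir");
            // The absolute tmp dir passed by the framework via -Djava.io.tmpdir.
            String tmpDir = System.getProperty("java.io.tmpdir");
            System.err.println("scratch space: " + jobLocalDir);
            System.err.println("temp dir: " + tmpDir);
        }
    }
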
Task JVM Reuse

Jobs can enable task JVMs to be reused by specifying the job configuration mapreduce.job.jvm.numtasks. If the value is 1 (the default), then JVMs are not reused (i.e. 1 task per JVM). If it is -1, there is no limit to the number of tasks a JVM can run (of the same job). One can also specify some value greater than 1 using the API Job.getConfiguration().setInt(Job.JVM_NUM_TASKS_TO_RUN, int).
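
For example, a minimal sketch of enabling JVM reuse via the property named above (Job.getInstance is an assumption about the Hadoop version; the commented line uses the constant mentioned in the text):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JvmReuseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Let each JVM run up to 10 tasks of the same job in sequence;
            // -1 removes the limit, and 1 (the default) disables reuse.
            conf.setInt("mapreduce.job.jvm.numtasks", 10);
            Job job = Job.getInstance(conf, "jvm-reuse-example");
            // Equivalent, per the API mentioned above:
            // job.getConfiguration().setInt(Job.JVM_NUM_TASKS_TO_RUN, 10);
        }
    }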

Configured Parameters

The following properties are localized in the job configuration for each task's execution:

  • mapreduce.job.id (String): The job id.
  • mapreduce.job.jar (String): The job.jar location in the job directory.
  • mapreduce.job.local.dir (String): The job-specific shared scratch space.
  • mapreduce.task.id (String): The task id.
  • mapreduce.task.attempt.id (String): The task attempt id.
  • mapreduce.task.ismap (boolean): Whether this is a map task.
  • mapreduce.task.partition (int): The id of the task within the job.
  • mapreduce.map.input.file (String): The file name that the map is reading from.
  • mapreduce.map.input.start (long): The offset of the start of the map input split.
  • mapreduce.map.input.length (long): The number of bytes in the map input split.
  • mapreduce.task.output.dir (String): The task's temporary output directory.

Note: During the execution of a streaming job, the names of the "mapred" parameters are transformed: the dots ( . ) become underscores ( _ ). For example, mapreduce.job.id becomes mapreduce_job_id and mapreduce.job.jar becomes mapreduce_job_jar. To get the values in a streaming job's mapper/reducer, use the parameter names with the underscores.
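
As a short illustration, a Java task can read these localized properties from its configuration. A minimal sketch, assuming the standard new-API Mapper:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class InputSplitAwareMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void setup(Context context) {
            Configuration conf = context.getConfiguration();
            // Localized per-task values from the table above.
            String inputFile = conf.get("mapreduce.map.input.file");
            long splitStart = conf.getLong("mapreduce.map.input.start", 0L);
            long splitLength = conf.getLong("mapreduce.map.input.length", 0L);
            System.err.println("reading " + inputFile + " ["
                    + splitStart + ", " + (splitStart + splitLength) + ")");
        }
    }

A streaming mapper/reducer would instead read the underscore forms, e.g. the environment variable mapreduce_map_input_file.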


