CCAH-500 Question 9: How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
9. You observed that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
A. For a 1GB child heap size an io.sort.mb of 128 MB will always maximize memory to disk I/O
B. Increase the io.sort.mb to 1GB
C. Decrease the io.sort.mb value to 0
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
- This sets the size of the memory buffer used during sort operations.
This buffer is contained within the map/reduce task’s JVM heap as defined in mapred.child.java.opts. If the buffer is too small for the amount of input data, records are spilled to intermediate files on disk, which must later be read back and merged.
Increasing this value will reduce or eliminate the number of intermediate spills going to disk and reduce the overall I/O load on your system.
Default value: 100 MB
Recommended value: Use 1/4 to 1/2 of the map/reduce task Java heap size setting (in mapred.child.java.opts).
Auto-tuned value: 1/2 of the map/reduce Java heap size
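The 1/4 to 1/2 rule of thumb above can be turned into a quick calculation. The following is only a minimal sketch of that arithmetic: the function name recommended_io_sort_mb is hypothetical and not part of Hadoop, and it assumes the heap size is already known in megabytes.

```python
from typing import Tuple


def recommended_io_sort_mb(heap_mb: int) -> Tuple[int, int]:
    """Return a (low, high) range for io.sort.mb in MB.

    Hypothetical helper: applies the 1/4 to 1/2 of task heap
    rule of thumb quoted in the text. It does not read any
    Hadoop configuration.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return heap_mb // 4, heap_mb // 2


if __name__ == "__main__":
    # A 1GB (1024 MB) child heap, as in the question.
    low, high = recommended_io_sort_mb(1024)
    print(f"io.sort.mb between {low} and {high} MB")
```

For a 1GB child heap this gives a range of 256 to 512 MB, which is a starting point for tuning rather than a guaranteed optimum.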
Reference from the book “Hadoop Operations” by Eric Sammer:
“The value of io.sort.mb is specified in megabytes and, by default, is 100.
Increasing the size of this buffer results in fewer spills to disk and, as a consequence, reduces the number of spill files that must be merged when the map task completes.
The io.sort.mb parameter is one way administrators and job developers can trade more memory for reduced disk IO.
The downside of this is that this buffer must be contained within the child task’s JVM heap allocation, as defined by mapred.child.java.opts.
For example, with a child heap size of 1GB and io.sort.mb set to 128, only 896MB is really available to the user’s code.
Remember that ultimately, all records output by map tasks must be spilled, so in the ideal scenario, these numbers are equal.”
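The two checks described in this excerpt, how much heap is left for user code and how far the spilled record count is from the map output record count, can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and the counter values in the example are made up; in practice they would be read by hand from the job's "Spilled Records" and "Map output records" counters for the map tasks.

```python
def heap_left_for_user_code(heap_mb: int, io_sort_mb: int) -> int:
    """Heap (MB) remaining for user code once the sort buffer is reserved.

    Hypothetical helper: the sort buffer lives inside the child JVM heap,
    so it must be smaller than the heap itself.
    """
    if io_sort_mb >= heap_mb:
        raise ValueError("io.sort.mb must be smaller than the child heap")
    return heap_mb - io_sort_mb


def spill_ratio(spilled_records: int, map_output_records: int) -> float:
    """Ratio of spilled records to map output records.

    1.0 is the ideal case (each record written to disk once); larger
    values mean records were spilled and re-merged more than once.
    """
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


if __name__ == "__main__":
    # The book's example: 1GB heap with io.sort.mb = 128.
    print(heap_left_for_user_code(1024, 128))  # 896

    # Made-up counter values for illustration only.
    print(spill_ratio(3_000_000, 1_000_000))  # 3.0
```

A ratio well above 1.0 suggests raising io.sort.mb within the available heap and rerunning the job until the ratio approaches 1.0.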