31. You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from a faster network fabric?
A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
B. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
C. When your workload consists of processor-intensive tasks
D. When your workload generates a large amount of intermediate data, on the order of the input data itself
Answer: D
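A: When the workload generates output data significantly larger than the intermediate data.
B: When the workload requires a large amount of input data relative to the entire capacity of HDFS.
C: When the workload consists of processor-intensive tasks.
A has some merit.
The question is about the network fabric, not about I/O-bound work, which is local to each node.
A large data output implies a large data shuffle across the network to the Reducers.
The case for D is stronger.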
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
“When we encounter applications that produce large amounts of intermediate data — outputting data on the same order as the amount read in — we recommend two ports on a single Ethernet card or two channel-bonded Ethernet cards to provide 2 Gbps per machine.”
Cloudera recommends:
Consider 10 Gb/sec in these cases:
- Clusters storing very large amounts of data
- Clusters in which typical MapReduce jobs produce large amounts of intermediate data.
Note that intermediate data is transferred across the network to the Reducers.
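To see why intermediate data volume drives the choice of network fabric, here is a rough back-of-the-envelope sketch in Python. It is not taken from the Cloudera post. The function name `shuffle_seconds`, the 10 TB of intermediate data, and the 20-node cluster are all hypothetical. It assumes map output is spread evenly over the nodes, that reducers are spread evenly too (so about (nodes - 1) / nodes of each node's map output has to leave the node), and it ignores protocol overhead, compression, and any overlap between the shuffle and the map phase.

```python
def shuffle_seconds(intermediate_bytes, nodes, link_gbps):
    """Rough time for one node to send its share of shuffle data.

    Assumptions (illustrative only): map output and reducers are spread
    evenly across nodes, and the link runs at its full nominal rate.
    """
    # Fraction of a node's map output destined for reducers on other nodes.
    remote_fraction = (nodes - 1) / nodes
    per_node_bytes = intermediate_bytes / nodes * remote_fraction
    # Convert the nominal link speed from gigabits/s to bytes/s.
    link_bytes_per_sec = link_gbps * 1e9 / 8
    return per_node_bytes / link_bytes_per_sec


TB = 10**12

# Hypothetical job: 10 TB of intermediate data on a 20-node cluster,
# compared across 1 Gbps, 2 Gbps (bonded), and 10 Gbps links.
for gbps in (1, 2, 10):
    seconds = shuffle_seconds(10 * TB, nodes=20, link_gbps=gbps)
    print(f"{gbps:>2} Gbps: about {seconds / 60:.1f} min per node")
```

Under these assumptions the per-node transfer time scales inversely with link speed, which is why jobs whose intermediate data is on the order of the input data (option D) gain the most from a faster fabric, while CPU-bound jobs (option C) gain little.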