Hadoop version: 2.7.3
Project: hadoop-mapreduce-client, hadoop-mapreduce-client-core
API version: old API, org.apache.hadoop.mapred
I. Class relationships
II. FileInputFormat
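Main function: compute the input splits, through the getSplits method.
Concepts:
goalSize: totalSize / numSplits, the target size of each split given the requested number of splits
minSize: minimum size of an InputSplit, set by a configuration parameter
blockSize: block size of the file
Formula: splitSize = max{minSize, min{goalSize, blockSize}}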
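The formula above can be tried out in isolation. The following is a minimal sketch, not the Hadoop source: the class name SplitSizeSketch and the method names are hypothetical, and treating numSplits == 0 as 1 is an assumption made here to avoid a division by zero. It takes the goal size as totalSize / numSplits and then applies max{minSize, min{goal size, blockSize}}.

```java
// Sketch of the split-size rule: splitSize = max(minSize, min(goalSize, blockSize)).
// SplitSizeSketch and its method names are hypothetical; they are not Hadoop classes.
final class SplitSizeSketch {

  /** Target size per split: totalSize / numSplits (numSplits == 0 is treated as 1, an assumption). */
  static long goalSize(long totalSize, int numSplits) {
    return totalSize / (numSplits == 0 ? 1 : numSplits);
  }

  /** Caps the goal size at the block size, then enforces the configured minimum. */
  static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long blockSize = 128 * mb;

    // 1024 MB of input, 4 requested splits: the goal size (256 MB) exceeds the block size.
    long large = computeSplitSize(goalSize(1024 * mb, 4), 1, blockSize);
    System.out.println(large / mb); // prints 128

    // 200 MB of input, 4 requested splits: the goal size (50 MB) is below the block size.
    long small = computeSplitSize(goalSize(200 * mb, 4), 1, blockSize);
    System.out.println(small / mb); // prints 50

    // A minimum size above the block size wins over both other values.
    long forced = computeSplitSize(goalSize(1024 * mb, 4), 256 * mb, blockSize);
    System.out.println(forced / mb); // prints 256
  }
}
```

With the default minimum, a split is therefore never larger than one block; raising minSize above blockSize is the way to get splits that span several blocks.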
/** Splits files returned by {@link #listStatus(JobConf)} when
 *  they're too big.*/
public InputSplit[] getSplits(JobConf job, int numSplits)
  throws IOException {
  StopWatch sw = new StopWatch().start();
  FileStatus[] files = listStatus(job);
  // Save the number of input files for metrics/loadgen
  job.setLong(NUM_INPUT_FILES, files.length);
  long totalSize = 0; // compute total size
  for (FileStatus file: