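Even if you delete the parts marked in bold below (the explicit job.setXxx(...) calls, which only restate defaults), the program still runs, and programs A and B (with and without those lines) behave identically.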
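import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MinimalMapReduceWithDefaults extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
        if (job == null) {
            return -1;
        }
        // Every setter below restates a Hadoop default, so removing it changes nothing.
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(Mapper.class);              // identity mapper
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        job.setReducerClass(Reducer.class);            // identity reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
        System.exit(exitCode);
    }
}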
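1. Code notes:
Almost every driver takes exactly two arguments, an input path and an output path, so that argument handling is factored out into a new helper class, JobBuilder.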
1) JobBuilder.java
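import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;

public class JobBuilder {
    // Builds a Job from "<input> <output>" arguments, or returns null after printing usage.
    public static Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws IOException {
        if (args.length != 2) {
            printUsage(tool, "<input> <output>");
            return null;
        }
        Job job = Job.getInstance(conf, "Demo");
        job.setJarByClass(tool.getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job;
    }

    public static void printUsage(Tool tool, String extraArgsUsage) {
        System.err.printf("Usage: %s [genericOptions] %s\n\n", tool.getClass().getSimpleName(), extraArgsUsage);
        GenericOptionsParser.printGenericCommandUsage(System.err);
    }
}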
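2) The default input format is TextInputFormat. The keys it produces are of type LongWritable and hold the byte offset at which each line starts in the file, which guarantees they are unique within that file.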
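To make the "offset as key" idea concrete, here is a small plain-Java sketch with no Hadoop dependency. LineOffsetDemo and offsetsOf are hypothetical names made up for this note, and the sketch assumes UTF-8 text with "\n" line endings; it only imitates the shape of the (offset, line) records, not how TextInputFormat actually reads files.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineOffsetDemo {
    // Hypothetical helper: maps the starting byte offset of each line to the line text.
    // Assumption: lines are separated by a single '\n' and encoded as UTF-8.
    static Map<Long, String> offsetsOf(String content) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus the one-byte newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Each entry resembles one (key, value) record handed to the Mapper.
        offsetsOf("hello\nworld\nfoo\n")
            .forEach((key, value) -> System.out.println(key + "\t" + value));
    }
}
```

Because a later line can never start at the same byte as an earlier one, the offsets are distinct within one file, which is where the uniqueness comes from.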
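3) The default Partitioner is HashPartitioner. It hashes each record's key to decide which partition the record belongs to. Each partition is handled by one Reducer task, so the number of partitions equals the number of Reducers!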
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    public int getPartition(K key, V value, int numPartitions) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
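4) By default there is only one Reducer, hence only one partition, i.e. numPartitions = 1. Any non-negative integer modulo 1 is 0, so getPartition() above always returns 0, and it makes no difference whether a Partitioner is set or not.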
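The arithmetic is easy to try outside Hadoop. The sketch below copies the same expression into a standalone class (PartitionDemo is a hypothetical name, and it takes a plain Object key instead of Hadoop's generic types) to show two things: with one partition the answer is always 0, and the mask keeps a negative hashCode() from producing a negative partition number.

```java
public class PartitionDemo {
    // Same expression as getPartition() above, minus the Hadoop types.
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Object[] keys = {"apple", "banana", 42L, -7};
        for (Object key : keys) {
            // With a single reducer every key lands in partition 0.
            System.out.printf("%s -> 1 reducer: %d, 4 reducers: %d%n",
                    key, getPartition(key, 1), getPartition(key, 4));
        }
        // Integer.valueOf(-7).hashCode() is -7; without the mask, -7 % 4 would be -3 in Java.
        System.out.println("unmasked: " + (Integer.valueOf(-7).hashCode() % 4));
    }
}
```

Once the number of Reducers goes above 1, the choice of Partitioner does start to matter, since it decides which Reducer sees which keys.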
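5) You may have noticed that the number of Mappers is never set. The reason: the number of Map tasks equals the number of splits the input is divided into, which depends on the size of the input files and the size of a file block.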
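As a rough illustration of how file size and block size turn into a split count, here is a plain-Java sketch for a single non-empty, splittable file. SplitCountDemo and its methods are hypothetical names; the rule "split size = max(minSize, min(maxSize, blockSize))" and the 10% slack on the last split are modelled on my understanding of FileInputFormat and should be treated as assumptions, not as the exact implementation.

```java
public class SplitCountDemo {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // Assumed rule: the block size, clamped between the configured min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits (and therefore Map tasks) for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("sketch only covers positive sizes");
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println(countSplits(300 * mb, splitSize)); // 128 MB + 128 MB + 44 MB
        System.out.println(countSplits(130 * mb, splitSize)); // within the slack, stays whole
    }
}
```

For a job with several input files the counts simply add up, one file at a time, which is why many small files lead to many Map tasks.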
That's it for now; I'm writing this down as a note for future reference~