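Taking WordCount as an example, a MapReduce job typically involves at least three classes: a Driver class, a Mapper class, and a Reducer class.
We'll set the Mapper and Reducer classes aside for now.
Writing the Driver class is a lot like composing an "eight-legged essay" (八股文): it follows a fixed format.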
① Configure the Job
Configuration configuration = new Configuration();
Job job = Job.getInstance(configuration);
② Set the Driver, Mapper, and Reducer classes
job.setJarByClass(WordCountDriver.class);
job.setMapperClass(WordcountMapper.class);
job.setReducerClass(WordcountReducer.class);
③ Set the key/value types of the Mapper output and of the final output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
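To see why the Mapper output and the final output share the same types in WordCount, it can help to trace the data flow without Hadoop at all. The following is a plain-Java sketch, not Hadoop code: `String` and `Integer` stand in for `Text` and `IntWritable`, and the names `WordCountTypesSketch`, `map`, `shuffle`, `reduce`, and `wordCount` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; String/Integer stand in for Text/IntWritable.
class WordCountTypesSketch {

    // Map phase: one input line -> a list of (word, 1) pairs.
    // These pairs correspond to the map output key/value types.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group the values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: (word, [1, 1, ...]) -> total count for that word.
    // The (word, total) pair corresponds to the final output key/value types.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs all three phases over the given lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Both the intermediate pairs and the final pairs are (word, number), which is why step ③ uses `Text`/`IntWritable` twice. As far as I know, Hadoop falls back to the final output types when the map output types are not set, so the two `setMapOutput*` calls are optional in this particular case, but stating them explicitly does no harm.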
④ Set the input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
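Step ④ reads `args[0]` and `args[1]` directly, so launching the job with too few arguments fails with an `ArrayIndexOutOfBoundsException`. A small guard that runs before step ④ can give a clearer message. This is a minimal sketch in plain Java; the class name `DriverArgs` and the usage text are hypothetical and not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop: validates the driver's command-line arguments.
class DriverArgs {
    final String inputPath;
    final String outputPath;

    private DriverArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Expects exactly two non-blank arguments: <input path> <output path>.
    static DriverArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: WordCountDriver <input path> <output path>");
        }
        if (args[0].trim().isEmpty() || args[1].trim().isEmpty()) {
            throw new IllegalArgumentException("Input and output paths must not be blank");
        }
        return new DriverArgs(args[0], args[1]);
    }
}
```

In the driver, one could call `DriverArgs parsed = DriverArgs.parse(args);` and then pass `new Path(parsed.inputPath)` and `new Path(parsed.outputPath)` to the two calls above. Keep in mind that the output directory must not already exist when the job is submitted; otherwise the job fails.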
⑤ Submit the job
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
The complete code is as follows:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Configure the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2. Set the driver, mapper, and reducer classes
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);

        // 3. Set the key/value types of the map output and of the final output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 4. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 5. Submit the job and wait for it to finish; exit with 0 on success, 1 on failure
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
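That's all for now. May you, dear reader, code up a fine "eight-legged essay" of your own and go on to pass the exam with top honors.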