After writing a few MapReduce programs, I noticed there are several different ways to submit a job. A web search turned up no post comparing these submission styles, so I decided to write one myself. It is fairly simple stuff, nothing fancy.
Staying within the new API, the first style everyone encounters is this most basic form of submission:
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.log4j.BasicConfigurator;

public static void main(String[] args) throws Exception {
    BasicConfigurator.configure();
    Configuration conf = new Configuration();
    LOGGER.info(conf.toString()); // LOGGER is a logger defined on the enclosing class

    // Wire up the job; DataMapper/DataReducer are inner classes defined elsewhere
    Job job = Job.getInstance(conf);
    job.setJarByClass(IdDataProcess.class);
    job.setMapperClass(IdDataProcess.DataMapper.class);
    job.setReducerClass(IdDataProcess.DataReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    String uri = "hdfs://hadoop:xxxxxx@xxxxxx:9000/cf/cust";
    FileInputFormat.addInputPath(job, new Path(uri));
    Path out = new Path("./certno");
    FileOutputFormat.setOutputPath(job, out);

    // The job fails if the output directory already exists, so remove it first
    FileSystem fileSystem = FileSystem.get(new URI(out.toString()), new Configuration());
    if (fileSystem.exists(out)) {
        fileSystem.delete(out, true);
    }

    boolean result = job.waitForCompletion(true); // submit the job and block until it finishes

    // Pull the single reducer's output down to a local file
    Path hdfsPath = new Path("./certno/part-r-00000");
    FSDataInputStream fsDataInputStream = fileSystem.open(hdfsPath);
    OutputStream outputStream = new FileOutputStream("./data/province/result");
    IOUtils.copyBytes(fsDataInputStream, outputStream, 4096, true);
    System.exit(result ? 0 : 1);
}
Under the hood, job.waitForCompletion calls Job.submit() to send the job to the cluster, then polls it until it finishes, printing progress along the way because we passed true.
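If you do not want the client to block, you can also call Job.submit() directly and poll for completion yourself. A minimal sketch, reusing the job object from the example above:

job.submit(); // returns as soon as the job has been handed to the cluster
while (!job.isComplete()) { // poll the job's status
    Thread.sleep(5000);     // check every five seconds
}
System.exit(job.isSuccessful() ? 0 : 1);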
Later I kept running into code in this second form:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SortDataPreprocessor extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // JobBuilder is a helper (this example comes from the
        // Hadoop: The Definitive Guide sample code) that builds a Job
        // from getConf() and the input/output paths passed in args
        Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
        if (job == null) {
            return -1;
        }
        job.setMapperClass(CleanerMapper.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0); // map-only job

        // Write block-compressed gzip SequenceFiles
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
        System.exit(exitCode);
    }
}
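The payoff of going through ToolRunner is that GenericOptionsParser runs before your code, so the driver understands Hadoop's generic command-line options out of the box. A hypothetical invocation (the jar name, queue name, and paths are placeholders) that overrides a configuration property at submit time:

hadoop jar sort.jar SortDataPreprocessor -D mapreduce.job.queuename=myqueue input/records output/records-seq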
Here Tool is a utility interface that Hadoop provides for exactly this driver pattern.
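For reference, the interface itself (org.apache.hadoop.util.Tool) is tiny; it extends Configurable and declares a single method:

public interface Tool extends Configurable {
    int run(String[] args) throws Exception;
}

Extending Configured simply supplies the getConf()/setConf() implementations that Configurable requires, which is why drivers conventionally extend Configured and implement Tool.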