Task: use MapReduce to deduplicate and sort the contents of file1.txt and file2.txt, then output the result.
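1. Mapper stage:
The mapper emits each input line (v1) as the output key k2, with an empty value v2. The sorting does not happen inside map(): the framework sorts and groups the <k2,v2> pairs by k2 during the shuffle that follows the map output.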
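import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DistinctMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the whole line as k2 with an empty v2; null cannot be written directly here
        context.write(value, new Text());
    }
}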
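To make the map and shuffle steps more concrete, here is a minimal plain-Java sketch of what happens to the mapper output. It does not use any Hadoop classes: the class name `ShuffleSketch` and the method `mapAndShuffle` are made up for illustration, and a `TreeMap` stands in for the framework's sort-and-group step, so treat it as an approximation of the idea rather than of Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the map + shuffle steps; no Hadoop classes are used.
class ShuffleSketch {

    // "Map": every input line becomes a key paired with an empty value.
    // "Shuffle": the TreeMap keeps the keys sorted and collects the values per key.
    static SortedMap<String, List<String>> mapAndShuffle(List<String> lines) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.computeIfAbsent(line, k -> new ArrayList<>()).add("");
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2012-3-3 c", "2012-3-1 a", "2012-3-3 c");
        // Print each key with the number of empty values grouped under it.
        mapAndShuffle(lines).forEach((k, v) -> System.out.println(k + " -> " + v.size()));
    }
}
```

In this sketch the duplicated line ends up as a single key holding two empty values, which is the shape of the input the reducer receives.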
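2. Reducer stage: at this point the <k2,v2> pairs are already sorted and grouped by key, so reduce() is called once per distinct line.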
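import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DistinctReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // k3 is already distinct at this stage, so v3 can simply be null
        context.write(key, null);
    }
}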
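The reason the reducer needs no deduplication logic of its own can be sketched in plain Java. The class `ReduceSketch` and the method `reduceAll` below are hypothetical names, and the sorted map is only an assumed model of the grouped input that Hadoop hands to reduce(); the point is that writing each key once and ignoring its values already yields distinct, sorted lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of the reduce step; no Hadoop classes are used.
class ReduceSketch {

    // One "reduce call" per key: output the key and ignore the grouped values.
    static List<String> reduceAll(SortedMap<String, List<String>> grouped) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.add(entry.getKey()); // the values carry no information here
        }
        return output;
    }

    public static void main(String[] args) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        grouped.put("2012-3-3 c", Arrays.asList("", "")); // line appeared twice
        grouped.put("2012-3-1 a", Arrays.asList(""));     // line appeared once
        // Each key is printed exactly once, in sorted key order.
        reduceAll(grouped).forEach(System.out::println);
    }
}
```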
3. Driver stage (main class)
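import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DistinctDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Remove any existing output directory; Hadoop refuses to run a job whose output path already exists
        Path outfile = new Path("file:///D:/outToDate");
        FileSystem fs = outfile.getFileSystem(conf);
        if (fs.exists(outfile)) {
            fs.delete(outfile, true);
        }
        Job job = Job.getInstance(conf);
        job.setJarByClass(DistinctDriver.class);
        job.setJobName("mysort");
        job.setMapperClass(DistinctMapper.class);   // mapper: reads the input lines
        job.setReducerClass(DistinctReducer.class); // reducer: computes the result
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input directory containing file1.txt and file2.txt
        FileInputFormat.addInputPath(job, new Path("file:///D:/quchong"));
        FileOutputFormat.setOutputPath(job, outfile);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}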
4. Results
【file1.txt】
2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c
【file2.txt】
2012-3-1 b
2012-3-2 a
2012-3-3 b
2012-3-4 d
2012-3-5 a
2012-3-6 c
2012-3-7 d
2012-3-3 c
part-r-00000 -- output of the job (the data after deduplication and sorting)
2012-3-1 a
2012-3-1 b
2012-3-2 a
2012-3-2 b
2012-3-3 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-6 c
2012-3-7 c
2012-3-7 d
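One thing worth keeping in mind about this ordering: Text keys are compared byte by byte, so the result is sorted as strings, not as dates. The sample data only has single-digit days, which hides the difference. The sketch below uses a plain `TreeSet<String>` as a stand-in (for ASCII-only lines like these, Java's string ordering gives the same order as a byte comparison); the class name `KeyOrderSketch` and the extra date `2012-3-10` are made up for illustration and are not part of the original data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of string ordering on date-like keys.
class KeyOrderSketch {

    // Deduplicate and sort lines the way a sorted set of strings would.
    static List<String> sortedDistinct(List<String> lines) {
        return new ArrayList<>(new TreeSet<>(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2012-3-10 a", // made-up line with a two-digit day
                "2012-3-2 b",
                "2012-3-1 c",
                "2012-3-2 b");
        // "2012-3-10 a" sorts before "2012-3-2 b" because '1' < '2' as characters.
        sortedDistinct(lines).forEach(System.out::println);
    }
}
```

If chronological order matters for real data, zero-padded dates such as 2012-03-02 would sort correctly under the same string comparison.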
A very simple little program for deduplicating and sorting!