Key Points for Implementing a Total Sort in MapReduce
There are two approaches: one is to implement the WritableComparable interface on the key class, the other is to supply your own Comparator.
The WritableComparable Interface
All key and value objects passed between Mappers and Reducers must implement a specific interface: Writable. In addition, keys on the Reducer side must implement WritableComparable.
The Writable interface has two methods:
- write: writes all of the instance's primitive field values to an output stream of type java.io.DataOutput. The methods of DataOutput can serialize the basic Java data types.
- readFields: re-creates a Writable instance from data read from an input stream of type java.io.DataInput. The methods of DataInput can deserialize the basic Java data types.
Note: the fields must be processed in the same order in write() and in readFields().
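The write/readFields contract can be illustrated with plain java.io streams, with no Hadoop dependency: fields must be read back in exactly the order they were written, or the values are silently scrambled. This is only an illustrative sketch; the class and method names are made up for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FieldOrderDemo {
    // Serialize two fields in a fixed order, then read them back in the SAME order.
    static int[] roundTrip(int month, int dayOfWeek) {
        try {
            // "write": month first, dayOfWeek second.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(month);
            out.writeInt(dayOfWeek);

            // "readFields": must read in the same order as write().
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
            return new int[] { in.readInt(), in.readInt() };
        } catch (IOException e) {
            // ByteArray streams do not actually throw here.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] fields = roundTrip(7, 3);
        System.out.println(fields[0] + " " + fields[1]); // 7 3
    }
}
```

Swapping the two readInt() calls would return {3, 7} without any error, which is why the ordering note above matters.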
Example program:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class MonthDoWWritable implements WritableComparable<MonthDoWWritable> {
    // Sort directions: 1 = ascending, -1 = descending.
    public int monthSort = 1;  // months ascending
    public int dowSort = -1;   // days of the week descending

    public IntWritable month = new IntWritable();
    public IntWritable dayOfWeek = new IntWritable();

    // Hadoop creates instances through this no-argument constructor.
    public MonthDoWWritable() {
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields must be written in the same order they are read in readFields().
        this.month.write(out);
        this.dayOfWeek.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.month.readFields(in);
        this.dayOfWeek.readFields(in);
    }

    @Override
    public int compareTo(MonthDoWWritable second) {
        if (this.month.get() == second.month.get()) {
            return dowSort * this.dayOfWeek.compareTo(second.dayOfWeek);
        } else {
            return monthSort * this.month.compareTo(second.month);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MonthDoWWritable)) {
            return false;
        }
        MonthDoWWritable other = (MonthDoWWritable) o;
        return this.month.get() == other.month.get()
                && this.dayOfWeek.get() == other.dayOfWeek.get();
    }

    @Override
    public int hashCode() {
        // Partitions by month: month 1 hashes to 0, month 12 to 11.
        return (this.month.get() - 1);
    }
}
Key points:
- Provide a no-argument constructor. The Hadoop framework creates instances through it.
- compareTo() compares two Writable instances and determines the sort order.
- hashCode() provides a hash value for MonthDoWWritable instances. Because hash values are not unique, equals() is implemented alongside hashCode(), so that two instances returning the same hashCode() can be checked for true equality. Note that the consumer of the hashCode() return value is the HashPartitioner class, Hadoop's default Partitioner.
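To make the role of hashCode() concrete, the formula HashPartitioner applies is `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The following plain-Java sketch (no Hadoop dependency; class and method names are illustrative, not Hadoop's) reproduces that logic:

```java
public class HashPartitionSketch {
    // Mirrors HashPartitioner's logic: mask off the sign bit so the result
    // is non-negative, then take the remainder modulo the reducer count.
    static int partitionFor(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // With hashCode() = month - 1 and 12 reducers, each month lands
        // in its own partition: January (month 1) -> partition 0.
        System.out.println(partitionFor(1 - 1, 12));  // 0
        System.out.println(partitionFor(12 - 1, 12)); // 11
        // A negative hashCode still yields a valid partition index.
        System.out.println(partitionFor(-7, 3));
    }
}
```

This is why the hashCode() above returns month - 1: with 12 reducers, every month maps to its own partition, which is what makes a total sort across output files possible.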
The Skeleton of a MapReduce Program
public static class MyMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyCombiner extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyPartitioner extends
        Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        return 0; // return the partition index for this key
    }
}

public int run(String[] allArgs) throws Exception {
    Job job = Job.getInstance(getConf());
    job.setJarByClass(getClass()); // the driver class
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setCombinerClass(MyCombiner.class);
    job.setPartitionerClass(MyPartitioner.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    String[] args = new GenericOptionsParser(getConf(), allArgs).getRemainingArgs();
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
    return 0;
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ToolRunner.run(conf, new MyDriver(), args); // MyDriver: the enclosing driver class implementing Tool
}
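The second approach mentioned at the top, supplying your own comparator, is registered in the driver with job.setSortComparatorClass(), passing a WritableComparator subclass. The ordering logic is the same as in MonthDoWWritable.compareTo(). The following plain-Java sketch (no Hadoop dependency; the class name and int[] key encoding are illustrative) shows that ordering as a java.util.Comparator over (month, dayOfWeek) pairs: months ascending, days of the week descending within a month.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MonthDoWComparatorSketch {
    // Same order as MonthDoWWritable.compareTo():
    // p[0] = month (ascending), p[1] = dayOfWeek (descending).
    static final Comparator<int[]> MONTH_ASC_DOW_DESC =
            Comparator.<int[]>comparingInt(p -> p[0])
                      .thenComparing(Comparator.<int[]>comparingInt(p -> p[1]).reversed());

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>();
        keys.add(new int[] {2, 1}); // February, Monday
        keys.add(new int[] {1, 3}); // January, Wednesday
        keys.add(new int[] {1, 5}); // January, Friday
        keys.sort(MONTH_ASC_DOW_DESC);
        for (int[] k : keys) {
            System.out.println(k[0] + "-" + k[1]);
        }
        // Sorted order: 1-5, 1-3, 2-1
    }
}
```

In a real job, the same comparison would be written inside a WritableComparator's compare() method over the deserialized keys, leaving the key class's own compareTo() untouched.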