Key Points for Implementing a Total Sort in MapReduce
There are two approaches: one is to implement the WritableComparable interface on the key class, the other is to supply your own Comparator.
The WritableComparable Interface
All key and value objects passed between Mappers and Reducers must implement a specific interface: Writable. In addition, keys on the Reducer side must implement WritableComparable.
The Writable interface has two methods:
- write: writes all of the instance's primitive field values to an output stream of type java.io.DataOutput. The methods of DataOutput can serialize the basic Java data types.
- readFields: re-creates a Writable instance from data read from an input stream of type java.io.DataInput. The methods of DataInput can deserialize the basic Java data types.
Note: the fields must be processed in the same order in write() and in readFields().
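The write/readFields contract can be illustrated with plain java.io streams, with no Hadoop dependency: fields must be read back in exactly the order they were written, or the values are silently scrambled. This is only an illustrative sketch; the class and method names are made up for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FieldOrderDemo {
    // Serialize two fields in a fixed order, then read them back in the SAME order.
    static int[] roundTrip(int month, int dayOfWeek) {
        try {
            // "write": month first, dayOfWeek second.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(month);
            out.writeInt(dayOfWeek);

            // "readFields": must read in the same order as write().
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
            return new int[] { in.readInt(), in.readInt() };
        } catch (IOException e) {
            // ByteArray streams do not actually throw here.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] fields = roundTrip(7, 3);
        System.out.println(fields[0] + " " + fields[1]); // 7 3
    }
}
```

Swapping the two readInt() calls would return {3, 7} without any error, which is why the ordering note above matters.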
Example program:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class MonthDoWWritable implements WritableComparable<MonthDoWWritable> {
    // Sort directions: 1 = ascending, -1 = descending.
    public int monthSort = 1;  // months ascending
    public int dowSort = -1;   // days of the week descending

    public IntWritable month = new IntWritable();
    public IntWritable dayOfWeek = new IntWritable();

    // Hadoop creates instances through this no-argument constructor.
    public MonthDoWWritable() {
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields must be written in the same order they are read in readFields().
        this.month.write(out);
        this.dayOfWeek.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.month.readFields(in);
        this.dayOfWeek.readFields(in);
    }

    @Override
    public int compareTo(MonthDoWWritable second) {
        if (this.month.get() == second.month.get()) {
            return dowSort * this.dayOfWeek.compareTo(second.dayOfWeek);
        } else {
            return monthSort * this.month.compareTo(second.month);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MonthDoWWritable)) {
            return false;
        }
        MonthDoWWritable other = (MonthDoWWritable) o;
        return this.month.get() == other.month.get()
                && this.dayOfWeek.get() == other.dayOfWeek.get();
    }

    @Override
    public int hashCode() {
        // Partitions by month: month 1 hashes to 0, month 12 to 11.
        return (this.month.get() - 1);
    }
}
Key points:
- Provide a no-argument constructor. The Hadoop framework creates instances through it.
- compareTo() compares two Writable instances and determines the sort order.
- hashCode() provides a hash value for MonthDoWWritable instances. Because hash values are not unique, equals() is implemented alongside hashCode(), so that two instances returning the same hashCode() can be checked for true equality. Note that the consumer of the hashCode() return value is the HashPartitioner class, Hadoop's default Partitioner.
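To make the role of hashCode() concrete, the formula HashPartitioner applies is `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The following plain-Java sketch (no Hadoop dependency; class and method names are illustrative, not Hadoop's) reproduces that logic:

```java
public class HashPartitionSketch {
    // Mirrors HashPartitioner's logic: mask off the sign bit so the result
    // is non-negative, then take the remainder modulo the reducer count.
    static int partitionFor(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // With hashCode() = month - 1 and 12 reducers, each month lands
        // in its own partition: January (month 1) -> partition 0.
        System.out.println(partitionFor(1 - 1, 12));  // 0
        System.out.println(partitionFor(12 - 1, 12)); // 11
        // A negative hashCode still yields a valid partition index.
        System.out.println(partitionFor(-7, 3));
    }
}
```

This is why the hashCode() above returns month - 1: with 12 reducers, every month maps to its own partition, which is what makes a total sort across output files possible.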
The Skeleton of a MapReduce Program
public static class MyMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyCombiner extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}

public static class MyPartitioner extends
        Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        return 0; // return the partition index for this key
    }
}

public int run(String[] allArgs) throws Exception {
    Job job = Job.getInstance(getConf());
    job.setJarByClass(getClass()); // the driver class
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setCombinerClass(MyCombiner.class);
    job.setPartitionerClass(MyPartitioner.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    String[] args = new GenericOptionsParser(getConf(), allArgs).getRemainingArgs();
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
    return 0;
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ToolRunner.run(conf, new MyDriver(), args); // MyDriver: the enclosing driver class implementing Tool
}
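The second approach mentioned at the top, supplying your own comparator, is registered in the driver with job.setSortComparatorClass(), passing a WritableComparator subclass. The ordering logic is the same as in MonthDoWWritable.compareTo(). The following plain-Java sketch (no Hadoop dependency; the class name and int[] key encoding are illustrative) shows that ordering as a java.util.Comparator over (month, dayOfWeek) pairs: months ascending, days of the week descending within a month.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MonthDoWComparatorSketch {
    // Same order as MonthDoWWritable.compareTo():
    // p[0] = month (ascending), p[1] = dayOfWeek (descending).
    static final Comparator<int[]> MONTH_ASC_DOW_DESC =
            Comparator.<int[]>comparingInt(p -> p[0])
                      .thenComparing(Comparator.<int[]>comparingInt(p -> p[1]).reversed());

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>();
        keys.add(new int[] {2, 1}); // February, Monday
        keys.add(new int[] {1, 3}); // January, Wednesday
        keys.add(new int[] {1, 5}); // January, Friday
        keys.sort(MONTH_ASC_DOW_DESC);
        for (int[] k : keys) {
            System.out.println(k[0] + "-" + k[1]);
        }
        // Sorted order: 1-5, 1-3, 2-1
    }
}
```

In a real job, the same comparison would be written inside a WritableComparator's compare() method over the deserialized keys, leaving the key class's own compareTo() untouched.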