mapreduce的自定义分组器

最新推荐文章于 2022-06-26 12:15:07 发布

lds_include

最新推荐文章于 2022-06-26 12:15:07 发布

阅读量768

点赞数

分类专栏：大数据 MapReduce Hadoop 文章标签： mapreduce的自定义分组器

本文链接：https://blog.csdn.net/lds_include/article/details/92797370

版权

大数据同时被 3 个专栏收录

70 篇文章 4 订阅

订阅专栏

Hadoop

22 篇文章 0 订阅

订阅专栏

MapReduce

8 篇文章 0 订阅

订阅专栏

Mapreduce自定义分组器

前提：有的时候我们想将符合条件的key值放在同一个组内；但是key的值是不同的将不会放进同一个组中。

举例：想将一个学生的进校以后不同时间段的数学成绩按进校考试的时间进行一个成绩排序。如下效果

//排序前的效果
 stu1 time1 core1
 stu1 time2 core
 stu1 time3 core3
 stu2 time1 core1
 stu2 time2 core3
 stu2 time3 core2

//排序后的效果
 stu1 core3,core2,core1
 stu2 core2,core3,core1

方法：

说明：提前就已经重新定义了数据类型，类型定义如下：

/**
 * @description 这是自己的key值
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 14:49:55
 **/
public class OneKey implements WritableComparable {
    private String first;
    private String secend;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        OneKey myKey = (OneKey) o;
        return Objects.equals(first, myKey.first) &&
                Objects.equals(secend, myKey.secend);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, secend);
    }

    public OneKey() {
        super();
    }

    public OneKey(String first, String secend) {
        this.first = first;
        this.secend = secend;
    }

    public void setFirst(String first) {
        this.first = first;
    }

    public void setSecend(String secend) {
        this.secend = secend;
    }

    public String getFirst() {
        return first;
    }

    public String getSecend() {
        return secend;
    }

    @Override
    public int compareTo(Object o) {//如果学生不同按学号先后排序，如果是同一个学生的话就将按考试的时间倒序来排。
        OneKey o1 = (OneKey)o;
        int ans = this.first.compareTo(o1.getFirst());
        if(ans != 0){
            return ans;
        }
        return o1.getSecend().compareTo(this.secend);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.first);
        out.writeUTF(this.secend);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.first = in.readUTF();
        this.secend = in.readUTF();
    }
}

首先：要重新定义一下分组的方法，代码如下：

/**
 * @description: 这个是用于将重新定义分组的key
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 16:10:18
 **/
public class OneComparator extends WritableComparator {
    public OneComparator() {//将要重新定义的分组的key的类型传进来
        super(OneKey.class,true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {//对应这道题来说就是只是按照这个组合key的学生的学号来分组
        OneKey a1 = (OneKey)a;
        OneKey b1 = (OneKey) b;
        return a1.getFirst().compareTo(b1.getFirst());
    }
}

编写mapper：

import java.io.IOException;
/**
 * @description
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 15:02:08
 **/
public class OneMapper extends Mapper<LongWritable, Text, OneKey,Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String lines = value.toString();
        String[] strings = lines.split(",");
        OneKey oneKey = new OneKey();
        oneKey.setFirst(strings[0]);
        oneKey.setSecend(strings[1]);
        context.write(oneKey, new Text(strings[2]));
    }
}

编写reducer：

/**
 * @description
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 15:19:31
 **/
public class OneReducer extends Reducer<OneKey, Text, Text, Text> {
    @Override
    protected void reduce(OneKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sb=new StringBuilder();
        for (Text text: values) {
            sb.append(text.toString()).append(",");
        }
        context.write(new Text(key.getFirst()), new Text(sb.substring(0, sb.length()-1)));
    }
}

编写driver：

public class OneDriver {
    /**
     *
     * @param args
     * @throws IOException
     * @throws ClassNotFoundException
     * @throws InterruptedException
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String inputPath = "E:/Test-workspace/test/input/2-1.txt";
        String outputPath = "E:/Test-workspace/test/output/2-1";
        Configuration configuration = new Configuration();
        Job instance = Job.getInstance(configuration);
        instance.setJarByClass(OneDriver.class);
        //设置自定分组
        instance.setGroupingComparatorClass(OneComparator.class);
        //instance.setNumReduceTasks(2);这是设置reduce的个数

        instance.setMapperClass(OneMapper.class);
        instance.setReducerClass(OneReducer.class);

        instance.setMapOutputKeyClass(OneKey.class);
        instance.setMapOutputValueClass(Text.class);

        instance.setOutputKeyClass(Text.class);
        instance.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(instance, inputPath);
        FileOutputFormat.setOutputPath(instance, new Path(outputPath));

        boolean waitForCompletion = instance.waitForCompletion(true);
        System.exit(waitForCompletion?0:1);
    }
}

lds_include

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
1
评论
mapreduce的自定义分组器

Mapreduce自定义分组器前提：有的时候我们想将符合条件的key值放在同一个组内；但是key的值是不同的将不会放进同一个组中。举例：想将一个学生的进校以后不同时间段的数学成绩按进校考试的时间进行一个成绩排序。如下效果//排序前的效果 stu1 time1 core1 stu1 time2 core stu1 time3 core3 stu2 time1 core1 stu2 t...
复制链接

扫一扫