Secondary sort algorithm (can find the Top N within each category)

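Here is one line of test data first:
math,xuzheng,54,52,86,91,42,85,75
Field layout: course name, student name, then a variable number of scores
(The full data set is at the end of the article.)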

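Requirement: for each course, find the student with the highest average score among those who took it, and output the course, the student's name, and the average score.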
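
To make the record format concrete, here is a minimal plain-Java sketch (no Hadoop dependency) that parses one line and computes the average rounded to two decimal places, using the same arithmetic as the mapper later in this article. The names `AverageDemo` and `average` are made up for this illustration.

```java
class AverageDemo {
    // Averages every score field (index 2 onward) and rounds to two decimals.
    static double average(String line) {
        String[] fields = line.split(",");
        int sum = 0;
        for (int i = 2; i < fields.length; i++) {
            sum += Integer.parseInt(fields[i].trim());
        }
        double avg = sum / (fields.length - 2.0);
        return Math.round(avg * 100) / 100.0;
    }

    public static void main(String[] args) {
        String line = "math,xuzheng,54,52,86,91,42,85,75";
        String[] fields = line.split(",");
        // Prints the course, the student, and the rounded average, tab separated.
        System.out.println(fields[0] + "\t" + fields[1] + "\t" + average(line));
    }
}
```

Dividing by `fields.length - 2.0` rather than an int keeps the division in floating point, which matters because the number of scores differs from line to line.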

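Approach:

  1. Create a course POJO class that implements the WritableComparable interface. Its compareTo method compares course names first and, when they are equal, compares the average scores.

  2. Create a grouping class that extends WritableComparator and overrides compare so that only course names are compared.

  3. In map(), split each input line, compute the average score, store the fields in a course object, and emit that object as the outKey to reduce().

  4. In reduce(), iterate over the values of each group and write the key on every step, which outputs the group's keys in sorted order.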

Implementation

Course class CourseScore

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class CourseScore implements WritableComparable<CourseScore>{
    private String courseName;
    private String studentName;
    private double avgScore;
    //Hadoop needs a no-argument constructor to create instances during deserialization
    public CourseScore() {
    }
    //toString matches the final output format
    @Override
    public String toString() {
        return courseName + "\t" + studentName + "\t" + avgScore;
    }
    //Serialize the fields
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(courseName);
        out.writeUTF(studentName);
        out.writeDouble(avgScore);
    }
    //Deserialize the fields, in the same order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        this.courseName = in.readUTF();
        this.studentName = in.readUTF();
        this.avgScore = in.readDouble();
    }
    //Sort order of the keys
    @Override
    public int compareTo(CourseScore o) {
        //Compare course names first (ascending, as in the sample output shown later)
        int result = this.courseName.compareTo(o.courseName);
        //Same course: compare average scores, higher average first
        if(result==0){
            return Double.compare(o.avgScore, this.avgScore);
        }
        return result;
    }

    public String getCourseName() {
        return courseName;
    }
    public void setCourseName(String courseName) {
        this.courseName = courseName;
    }
    public String getStudentName() {
        return studentName;
    }
    public void setStudentName(String studentName) {
        this.studentName = studentName;
    }
    public double getAvgScore() {
        return avgScore;
    }
    public void setAvgScore(double avgScore) {
        this.avgScore = avgScore;
    }

}
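
The write/readFields pair only works if both sides handle the fields in the same order. The sketch below checks that round trip with the standard `java.io` data streams, which implement the same `DataOutput` and `DataInput` interfaces that Hadoop passes in. `RoundTripDemo` and its methods are hypothetical helpers, not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTripDemo {
    // Writes the three fields in the same order as CourseScore.write().
    static byte[] serialize(String course, String student, double avg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(course);
            out.writeUTF(student);
            out.writeDouble(avg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order as CourseScore.readFields().
    static String deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String course = in.readUTF();
            String student = in.readUTF();
            double avg = in.readDouble();
            return course + "\t" + student + "\t" + avg;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize("computer", "huangbo", 65.25);
        System.out.println(data.length + " bytes: " + deserialize(data));
    }
}
```

Swapping two reads in `deserialize` would either throw or silently mix up the fields, which is why readFields must mirror write exactly.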

Grouping class

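import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class ClazzScoreGroupComparator extends WritableComparator {
    //The no-argument constructor must call the parent constructor;
    //true tells the parent to create CourseScore instances for deserialization
    public ClazzScoreGroupComparator() {
        super(CourseScore.class,true);
    }

    //Grouping is by course name, so only the course names are compared
    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        CourseScore cs1 = (CourseScore)a;
        CourseScore cs2 = (CourseScore)b;
        return cs1.getCourseName().compareTo(cs2.getCourseName());
    }
}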
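
The difference between the sort rule and the grouping rule can be tried out without a cluster. The following plain-Java sketch sorts the six sample records used in the data flow analysis by a full comparator and then cuts the sorted list into groups using a course-only comparator. It assumes ascending course names and descending averages; `GroupingDemo`, `Row`, `SORT`, and `GROUP` are illustrative names, and the code only imitates what the framework does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GroupingDemo {
    static final class Row {
        final String course;
        final String student;
        final double avg;
        Row(String course, String student, double avg) {
            this.course = course;
            this.student = student;
            this.avg = avg;
        }
    }

    // Sort rule: course name ascending, then average descending.
    static final Comparator<Row> SORT = (a, b) -> {
        int byCourse = a.course.compareTo(b.course);
        return byCourse != 0 ? byCourse : Double.compare(b.avg, a.avg);
    };

    // Grouping rule: only the course name decides whether two rows share a group.
    static final Comparator<Row> GROUP = (a, b) -> a.course.compareTo(b.course);

    // Sorts the rows, then starts a new group wherever GROUP reports a difference.
    static List<List<Row>> sortAndGroup(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(SORT);
        List<List<Row>> groups = new ArrayList<>();
        Row prev = null;
        for (Row row : sorted) {
            if (prev == null || GROUP.compare(prev, row) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(row);
            prev = row;
        }
        return groups;
    }

    static List<Row> sample() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("computer", "huangxiaoming", 72.43));
        rows.add(new Row("english", "liuyifei", 74.43));
        rows.add(new Row("english", "huangdatou", 56.0));
        rows.add(new Row("computer", "xuzheng", 65.0));
        rows.add(new Row("computer", "huangbo", 65.25));
        rows.add(new Row("english", "yangmi", 59.57));
        return rows;
    }

    public static void main(String[] args) {
        // The first row of each group is the student with the highest average.
        for (List<Row> group : sortAndGroup(sample())) {
            Row top = group.get(0);
            System.out.println(top.course + "\t" + top.student + "\t" + top.avg);
        }
    }
}
```

Every key is distinct under `SORT`, yet `GROUP` collapses them into one group per course, which is the effect setGroupingComparatorClass has on the reduce input.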

MapReduce

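import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondSort {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Job job = Job.getInstance(conf);

        job.setJarByClass(SecondSort.class);
        job.setMapperClass(MR_Mapper.class);
        job.setReducerClass(MR_Reducer.class);

        //The map output types and the final output types are the same here
        job.setOutputKeyClass(CourseScore.class);
        job.setOutputValueClass(NullWritable.class);

        //Set the grouping comparator
        job.setGroupingComparatorClass(ClazzScoreGroupComparator.class);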

        Path inputPath = new Path("D:\\bigdata\\flow\\input\\grad");
        Path outputPath = new Path("D:\\bigdata\\flow\\output\\q3");
        FileInputFormat.setInputPaths(job, inputPath);
        if(fs.exists(outputPath)){
            fs.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);

        boolean isDone = job.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }
    /**
     * Sample input lines:
     *  math,liujialing,85,86,41,75,93,42,85,75
     *  english,huangxiaoming,85,86,41,75,93,42,85
     */
    public static class MR_Mapper extends Mapper<LongWritable, Text, CourseScore, NullWritable>{
        //One CourseScore object is reused for every record; context.write serializes it immediately
        private final CourseScore cs = new CourseScore();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split(",");
            //Skip blank or malformed lines that carry no scores
            if(split.length<3){
                return;
            }
            String courseName = split[0];
            String studentName = split[1];
            int sum = 0;
            for(int i=2;i<split.length;i++){
                sum += Integer.parseInt(split[i].trim());
            }
            //Compute the average score
            double avgScore=sum/(split.length-2.0);
            avgScore =(Math.round(avgScore*100)/100.0);//Round to two decimal places
            //Store the average, course name, and student name in the object
            cs.setAvgScore(avgScore);
            cs.setCourseName(courseName);
            cs.setStudentName(studentName);
            context.write(cs, NullWritable.get());
        }
    }

    public static class MR_Reducer extends Reducer<CourseScore, NullWritable, CourseScore, NullWritable>{
        @Override
        protected void reduce(CourseScore key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
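            //To output only the first N records of a group, keep a counter (int count = 0)
            //and break out of the for loop once it reaches N
            //Iterate over the values of the group and write the current key on every step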
            for(NullWritable v:values){
                context.write(key, NullWritable.get());
            }
        }
    }
}
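
The counter idea mentioned in the reducer comment can be sketched in plain Java. The example below walks over records that are already sorted (course ascending, average descending) and keeps only the first N of each course. `TopNDemo`, `topN`, and `sample` are hypothetical names, and the sorted list merely stands in for the grouped reduce input.

```java
import java.util.ArrayList;
import java.util.List;

class TopNDemo {
    // Keeps the first n rows of each course from an already sorted list.
    static List<String> topN(List<String[]> sorted, int n) {
        List<String> out = new ArrayList<>();
        String currentCourse = null;
        int count = 0;
        for (String[] row : sorted) {
            if (!row[0].equals(currentCourse)) {
                // A new course starts, like a new reduce() call: reset the counter.
                currentCourse = row[0];
                count = 0;
            }
            if (count < n) {
                out.add(String.join("\t", row));
                count++;
            }
        }
        return out;
    }

    static List<String[]> sample() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"computer", "huangxiaoming", "72.43"});
        rows.add(new String[] {"computer", "huangbo", "65.25"});
        rows.add(new String[] {"computer", "xuzheng", "65.0"});
        rows.add(new String[] {"english", "liuyifei", "74.43"});
        rows.add(new String[] {"english", "yangmi", "59.57"});
        rows.add(new String[] {"english", "huangdatou", "56.0"});
        return rows;
    }

    public static void main(String[] args) {
        for (String line : topN(sample(), 2)) {
            System.out.println(line);
        }
    }
}
```

In the real reducer the counter would be a local variable inside reduce(), so it resets for every group automatically, and the loop can simply `break` once N keys have been written.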

Data flow analysis

  • Raw data:
computer,huangxiaoming,85,86,41,75,93,42,85
english,yangmi,85,41,75,21,85,96,14
english,huangdatou,48,58,67,86,15,33,85
computer,xuzheng,54,52,86,91,42
computer,huangbo,85,42,96,38
english,liuyifei,76,95,86,74,68,74,48
  • Data stored in the key objects after map splits each line:
computer    huangxiaoming    72.43
english    liuyifei    74.43
english    huangdatou    56.0
computer    xuzheng    65.0
computer    huangbo    65.25
english    yangmi    59.57
  • Data after sorting and grouping:
computer    huangxiaoming    72.43
computer    huangbo    65.25
computer    xuzheng    65.0

english    liuyifei    74.43
english    yangmi    59.57
english    huangdatou    56.0

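In general, the reduce phase aggregates the map output records that share the same key. In this example every key emitted by map is distinct, but once the grouping comparator is set, reduce treats all keys with the same course name as one group. Writing the key on every step of the iteration over the group's values therefore outputs each of the group's distinct keys, in their sorted order.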

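Hadoop reuses a single key object within a group and refreshes its contents each time the values iterator advances. If reduce() writes the key once without iterating over the values, only the first key of the group is written. With averages sorted in descending order that is the top student of each course, which is exactly what the requirement asks for:
computer    huangxiaoming    72.43
english    liuyifei    74.43

If the key is written only after the loop over the values has finished, it holds the last record of the group, because each earlier record has been overwritten by the one after it. With the sample data above that would output only:
computer    xuzheng    65.0
english    huangdatou    56.0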
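
The key reuse inside a group can be imitated in plain Java. This is only a simulation under the assumption that one mutable key object is refreshed each time the values iterator advances; it is not Hadoop code, and `KeyReuseDemo`, `MutableKey`, and `observe` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyReuseDemo {
    // One mutable object stands in for the key instance shared by a whole group.
    static final class MutableKey {
        String text;
        @Override
        public String toString() {
            return text;
        }
    }

    // Records what the shared key looks like before, during, and after the loop.
    static List<String> observe(List<String> groupRecords) {
        MutableKey key = new MutableKey();
        // Before reduce() starts, the key already holds the group's first record.
        key.text = groupRecords.get(0);
        List<String> seen = new ArrayList<>();
        seen.add("before loop: " + key);
        for (int i = 0; i < groupRecords.size(); i++) {
            if (i > 0) {
                // Advancing to the next value overwrites the same key object.
                key.text = groupRecords.get(i);
            }
            seen.add("in loop: " + key);
        }
        seen.add("after loop: " + key);
        return seen;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList(
                "computer\thuangxiaoming\t72.43",
                "computer\thuangbo\t65.25",
                "computer\txuzheng\t65.0");
        for (String line : observe(group)) {
            System.out.println(line);
        }
    }
}
```

The output shows the first record before the loop, each record in turn inside it, and the last record after it, so where reduce() writes the key decides which records appear in the result.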

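Note:

  • Suppose the sort rule compares the fields in this order: a b c d
  • Then the grouping rule can only compare fields from the front, in the same order, because reduce only groups records that are adjacent after sorting

    That leaves only the following choices:

    - a
    - a b //compare a, then b
    - a b c //compare a, then b, then c
    - a b c d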
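
A small experiment shows why the grouping fields must be a prefix of the sort fields. The sketch sorts two-field records by (a, b) and counts how many runs of adjacent equal records each grouping rule produces, which corresponds to the number of reduce() calls. `PrefixGroupingDemo` and its members are hypothetical, and the two-field records are a simplified stand-in for the a b c d case.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PrefixGroupingDemo {
    static final Comparator<String[]> BY_A = (p, q) -> p[0].compareTo(q[0]);
    static final Comparator<String[]> BY_B = (p, q) -> p[1].compareTo(q[1]);

    // Counts runs of adjacent rows that the grouping comparator treats as equal.
    static int countGroups(List<String[]> sorted, Comparator<String[]> grouping) {
        int groups = 0;
        String[] prev = null;
        for (String[] row : sorted) {
            if (prev == null || grouping.compare(prev, row) != 0) {
                groups++;
            }
            prev = row;
        }
        return groups;
    }

    // Four rows sorted by a first, then b: (x,1) (x,2) (y,1) (y,2).
    static List<String[]> sortedSample() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"x", "1"});
        rows.add(new String[] {"y", "1"});
        rows.add(new String[] {"x", "2"});
        rows.add(new String[] {"y", "2"});
        rows.sort(BY_A.thenComparing(BY_B));
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = sortedSample();
        System.out.println("group by a: " + countGroups(rows, BY_A) + " groups");
        System.out.println("group by b: " + countGroups(rows, BY_B) + " groups");
    }
}
```

Grouping by a yields two groups, one per distinct value. Grouping by b alone yields four, even though b has only two distinct values, because equal b values are not adjacent in a stream sorted by a first.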

Test data

computer,huangxiaoming,85,86,41,75,93,42,85
computer,xuzheng,54,52,86,91,42
computer,huangbo,85,42,96,38
english,zhaobenshan,54,52,86,91,42,85,75
english,liuyifei,85,41,75,21,85,96,14
algorithm,liuyifei,75,85,62,48,54,96,15
computer,huangjiaju,85,75,86,85,85
english,liuyifei,76,95,86,74,68,74,48
english,huangdatou,48,58,67,86,15,33,85
algorithm,huanglei,76,95,86,74,68,74,48
algorithm,huangjiaju,85,75,86,85,85,74,86
computer,huangdatou,48,58,67,86,15,33,85
english,zhouqi,85,86,41,75,93,42,85,75,55,47,22
english,huangbo,85,42,96,38,55,47,22
algorithm,liutao,85,75,85,99,66
computer,huangzitao,85,86,41,75,93,42,85
math,wangbaoqiang,85,86,41,75,93,42,85
computer,liujialing,85,41,75,21,85,96,14,74,86
computer,liuyifei,75,85,62,48,54,96,15
computer,liutao,85,75,85,99,66,88,75,91
computer,huanglei,76,95,86,74,68,74,48
english,liujialing,75,85,62,48,54,96,15
math,huanglei,76,95,86,74,68,74,48
math,huangjiaju,85,75,86,85,85,74,86
math,liutao,48,58,67,86,15,33,85
english,huanglei,85,75,85,99,66,88,75,91
math,xuzheng,54,52,86,91,42,85,75
math,huangxiaoming,85,75,85,99,66,88,75,91
math,liujialing,85,86,41,75,93,42,85,75
english,huangxiaoming,85,86,41,75,93,42,85
algorithm,huangdatou,48,58,67,86,15,33,85
algorithm,huangzitao,85,86,41,75,93,42,85,75