Auxiliary Sort and Secondary Sort Case Study (GroupingComparator)

This post shows how to use the Hadoop MapReduce framework to process order data with a Mapper and Reducer, using a GroupingComparator to find the highest-priced product in each order. It walks through the definition of the order-information bean, the Mapper that parses the data, the Reducer that aggregates the results, and the Driver program.

1) Requirement

Given the following order data:

    Order id    Product id    Price
    0000001     Pdt_01        222.8
    0000001     Pdt_06        25.8
    0000002     Pdt_03        522.8
    0000002     Pdt_04        122.4
    0000002     Pdt_05        722.4
    0000003     Pdt_01        222.8
    0000003     Pdt_02        33.8

Find the most expensive product in each order.

2) Expected output

One line per order: the order id and the highest price in that order (the id is parsed as a long, so leading zeros are dropped):

    1	222.8
    2	722.4
    3	222.8

3) Analysis

(1) Use "order id and price" together as the map output key, so that all order records read in the map phase are partitioned by order id, sorted by price, and sent to reduce.

(2) On the reduce side, use a GroupingComparator to group key-value pairs with the same order id; the first record in each group is then the maximum.
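Before looking at the MapReduce code, the two-step logic above can be sketched in plain Java on the sample data (the class and record names here are illustrative, not part of the job):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
    // One input record: order id, product id, price
    record Order(long orderId, String productId, double price) {}

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, "Pdt_01", 222.8), new Order(1, "Pdt_06", 25.8),
                new Order(2, "Pdt_03", 522.8), new Order(2, "Pdt_04", 122.4),
                new Order(2, "Pdt_05", 722.4),
                new Order(3, "Pdt_01", 222.8), new Order(3, "Pdt_02", 33.8));

        // Shuffle-sort analogue: order id ascending, price descending
        List<Order> sorted = new ArrayList<>(orders);
        sorted.sort(Comparator.comparingLong(Order::orderId)
                .thenComparing(Comparator.comparingDouble(Order::price).reversed()));

        // Grouping analogue: the first record seen for each order id is its maximum
        Map<Long, Double> maxPerOrder = new LinkedHashMap<>();
        for (Order o : sorted) {
            maxPerOrder.putIfAbsent(o.orderId(), o.price());
        }
        System.out.println(maxPerOrder); // {1=222.8, 2=722.4, 3=222.8}
    }
}
```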

4) Code implementation

(1) Define the order information bean

package bigdata.b13.groupSort;

import lombok.Getter;
import lombok.Setter;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

@Getter
@Setter
public class FlowBean implements WritableComparable<FlowBean> {
    private long id;      // order id
    private double price; // price

    // No-arg constructor, required for deserialization
    public FlowBean() {
    }

    // Convenience constructor
    public FlowBean(long id, double price) {
        this.id = id;
        this.price = price;
    }

    // Secondary sort: order id ascending, then price descending
    @Override
    public int compareTo(FlowBean o) {
        if (this.id != o.id) {
            return Long.compare(this.id, o.id);
        }
        // Higher price first within the same order; equal keys return 0,
        // which keeps the comparator contract consistent
        return Double.compare(o.price, this.price);
    }


    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(id);
        dataOutput.writeDouble(price);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.id= dataInput.readLong();
        this.price= dataInput.readDouble();
    }

    @Override
    public String toString() {
        return this.id+"\t"+this.price;
    }
}

(2) Write the Mapper

package bigdata.b13.groupSort;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WritableComparableMapper extends Mapper<LongWritable, Text, FlowBean, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, FlowBean, NullWritable>.Context context) throws IOException, InterruptedException {
        // Read one line
        String string = value.toString();
        // Split on tabs: order id, product id, price
        String[] split = string.split("\t");
        // Populate the key bean (the product id, split[1], is not needed)
        FlowBean flowBean = new FlowBean();
        flowBean.setId(Long.parseLong(split[0]));
        flowBean.setPrice(Double.parseDouble(split[2]));

        context.write(flowBean, NullWritable.get());
    }
}
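As a concrete check of the split indices, here is how one sample line parses (the class name is only for illustration):

```java
public class MapParseSketch {
    public static void main(String[] args) {
        // One tab-separated input line: order id, product id, price
        String line = "0000001\tPdt_01\t222.8";
        String[] split = line.split("\t");
        long id = Long.parseLong(split[0]);          // leading zeros drop: 1
        double price = Double.parseDouble(split[2]); // split[1] (product id) is skipped
        System.out.println(id + "\t" + price);       // 1	222.8
    }
}
```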

(3) Write the Reducer

package bigdata.b13.groupSort;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WritableComparableReducer extends Reducer<FlowBean, NullWritable, FlowBean, NullWritable> {
    @Override
    protected void reduce(FlowBean key, Iterable<NullWritable> values, Reducer<FlowBean, NullWritable, FlowBean, NullWritable>.Context context) throws IOException, InterruptedException {
        // With keys grouped by order id and sorted by price descending, the
        // first key of each group is the most expensive product, so writing
        // the key once per group emits the per-order maximum.
        context.write(key, NullWritable.get());
    }
}
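The analysis in step 3) relies on a GroupingComparator that groups keys by order id alone, but the original listings omit it. A minimal sketch follows (the class name OrderGroupingComparator is my own; the package declaration is omitted, but it belongs alongside FlowBean):

```java
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups map output keys by order id only, so that all FlowBeans from the
// same order land in one reduce() call even though their prices differ.
public class OrderGroupingComparator extends WritableComparator {

    public OrderGroupingComparator() {
        // true: have the framework create key instances for comparison
        super(FlowBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        FlowBean left = (FlowBean) a;
        FlowBean right = (FlowBean) b;
        // Ignore price: keys with the same order id belong to the same group
        return Long.compare(left.getId(), right.getId());
    }
}
```

It would also have to be registered in the driver before submission, e.g. `job.setGroupingComparatorClass(OrderGroupingComparator.class);` — without this call every distinct (id, price) key forms its own group and the job emits every record rather than one per order.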

(4) Write the Driver

package bigdata.b13.groupSort;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WritableComparableDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // Local test paths
        args = new String[]{"D:\\test\\GroupingComparator.txt","D:\\test\\groupingComparator"};
        // Get the job instance
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(WritableComparableDriver.class);

        // Wire up the mapper and reducer
        job.setMapperClass(WritableComparableMapper.class);
        job.setReducerClass(WritableComparableReducer.class);
        // Set the map and final output types
        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(FlowBean.class);
        job.setOutputValueClass(NullWritable.class);
        // Input and output paths
        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));
        // Submit the job
        job.waitForCompletion(true);
    }
}

(5) Output
