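GroupingComparator (Grouping Sort)
》》What is grouping sort?
In the Reduce phase, a GroupingComparator groups the incoming keys by one or more of their fields. All keys that it treats as equal are handled by a single reduce() call.
》》What are the steps?
(1) Define a custom class that extends WritableComparator
(2) Override the compare() method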
@Override
public int compare(WritableComparable a, WritableComparable b) {
// Comparison logic goes here
return result;
}
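(3) Create a constructor that passes the class of the key being compared to the parent class
protected OrderSortGroupingComparator() {
// true tells WritableComparator to create key instances for deserialization
super(OrderBean.class, true);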
}
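To make the role of compare() concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The names `GroupingContractSketch`, `Key`, `SORT_ORDER` and `GROUP_ORDER` are hypothetical and exist only for illustration; they stand in for the job's key class, its sort order and its grouping comparator. The point it tries to show is that a grouping comparator returns 0 for keys that the sort order still tells apart.

```java
import java.util.Comparator;

class GroupingContractSketch {
    // Hypothetical stand-in for a composite key (order id + price).
    static final class Key {
        final int orderId;
        final double price;

        Key(int orderId, double price) {
            this.orderId = orderId;
            this.price = price;
        }
    }

    // Sort order: order id ascending, then price descending.
    static final Comparator<Key> SORT_ORDER =
            Comparator.comparingInt((Key k) -> k.orderId)
                    .thenComparing(Comparator.comparingDouble((Key k) -> k.price).reversed());

    // Grouping order: only the order id is compared, the price is ignored.
    static final Comparator<Key> GROUP_ORDER =
            Comparator.comparingInt((Key k) -> k.orderId);

    public static void main(String[] args) {
        Key a = new Key(1, 222.8);
        Key b = new Key(1, 33.8);
        // The sort order places the more expensive item first.
        System.out.println(SORT_ORDER.compare(a, b) < 0);
        // The grouping order treats both keys as one group.
        System.out.println(GROUP_ORDER.compare(a, b) == 0);
    }
}
```

In the real job the grouping comparison lives in the WritableComparator subclass, while the sort order comes from the key's own compareTo().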
Requirements analysis
We need to find the most expensive product in each order.
(1) Input data (order id, product id, price)
0000001 Pdt_01 222.8
0000002 Pdt_05 722.4
0000001 Pdt_02 33.8
0000003 Pdt_06 232.8
0000003 Pdt_02 33.8
0000002 Pdt_03 522.8
0000002 Pdt_04 122.4
(2) Expected output (order id, highest price)
1 222.8
2 722.4
3 232.8
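(3) MapTask
》1: Read one line
》2: Split it into fields
》3: Wrap the line in a bean object, which becomes the output key
bean(0000001, 222.8) NullWritable
bean(0000002, 722.4) NullWritable
bean(0000001, 33.8) NullWritable
bean(0000003, 232.8) NullWritable
bean(0000003, 33.8) NullWritable
bean(0000002, 522.8) NullWritable
bean(0000002, 122.4) NullWritable
The keys are sorted by order id first; keys with the same order id are then
sorted by price in descending order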
0000001 222.8
0000001 33.8
0000002 722.4
0000002 522.8
0000002 122.4
0000003 232.8
0000003 33.8
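The sort described above can be reproduced outside Hadoop with a small sketch. This is only an illustration under stated assumptions: `ShuffleSortSketch` and `sortKeys` are hypothetical names, the input is split on any whitespace, and a plain in-memory sort stands in for the framework's shuffle sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ShuffleSortSketch {
    // Parses "orderId productId price" lines and returns "orderId price"
    // strings ordered by order id ascending, then price descending.
    static List<String> sortKeys(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.trim().split("\\s+"));
        }
        rows.sort(Comparator
                .comparingInt((String[] r) -> Integer.parseInt(r[0]))
                .thenComparing(Comparator
                        .comparingDouble((String[] r) -> Double.parseDouble(r[2]))
                        .reversed()));
        List<String> sorted = new ArrayList<>();
        for (String[] r : rows) {
            sorted.add(r[0] + " " + r[2]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "0000001 Pdt_01 222.8",
                "0000002 Pdt_05 722.4",
                "0000001 Pdt_02 33.8",
                "0000003 Pdt_06 232.8",
                "0000003 Pdt_02 33.8",
                "0000002 Pdt_03 522.8",
                "0000002 Pdt_04 122.4");
        // Prints the seven keys in the sorted order listed above.
        sortKeys(input).forEach(System.out::println);
    }
}
```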
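(4) ReduceTask
The data fetched from the Map side is sorted once more
Keys with the same order id are treated as the same key, so they form one group
The reduce method writes out only the first key of each group
1st call to reduce
0000001 222.8
2nd call to reduce
0000002 722.4
3rd call to reduce
0000003 232.8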
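(1) With "order id and transaction amount" as the key, all order records read in the Map phase are sorted by order id in ascending order
and, when the ids are equal, by amount in descending order before they are sent to Reduce.
(2) On the Reduce side, the GroupingComparator gathers the key-value pairs with the same order id into one group,
and the first key of each group is the most expensive product in that order.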
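The grouping step can be pictured with the following plain-Java sketch. It is a simulation under stated assumptions, not Hadoop's implementation: `ReduceGroupingSketch`, `Key` and `firstOfEachGroup` are hypothetical names, and the input list is assumed to be already sorted by order id ascending and price descending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReduceGroupingSketch {
    // Hypothetical stand-in for the composite key.
    static final class Key {
        final int orderId;
        final double price;

        Key(int orderId, double price) {
            this.orderId = orderId;
            this.price = price;
        }

        @Override
        public String toString() {
            return orderId + "\t" + price;
        }
    }

    // Walks the sorted keys and keeps the first key of every run of equal
    // order ids, which mirrors one reduce() call per group.
    static List<Key> firstOfEachGroup(List<Key> sortedKeys) {
        List<Key> firsts = new ArrayList<>();
        Key previous = null;
        for (Key current : sortedKeys) {
            boolean startsNewGroup =
                    previous == null || previous.orderId != current.orderId;
            if (startsNewGroup) {
                firsts.add(current);
            }
            previous = current;
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<Key> sorted = Arrays.asList(
                new Key(1, 222.8), new Key(1, 33.8),
                new Key(2, 722.4), new Key(2, 522.8), new Key(2, 122.4),
                new Key(3, 232.8), new Key(3, 33.8));
        // Prints one line per order: the id and its highest price.
        firstOfEachGroup(sorted).forEach(System.out::println);
    }
}
```

Because the keys arrive sorted by price in descending order within each order id, taking the first key of a group is enough and no maximum has to be computed in reduce().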
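Code implementation
(1) Define the OrderBean class that holds the order information
package com.dev1.order;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class OrderBean implements WritableComparable<OrderBean> {
private int order_id; // order id
private double price; // price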
public OrderBean() {
super();
}
public OrderBean(int order_id, double price) {
super();
this.order_id = order_id;
this.price = price;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(order_id);
out.writeDouble(price);
}
@Override
public void readFields(DataInput in) throws IOException {
order_id = in.readInt();
price = in.readDouble();
}
@Override
public String toString() {
return order_id + "\t" + price;
}
public int getOrder_id() {
return order_id;
}
public void setOrder_id(int order_id) {
this.order_id = order_id;
}
public double getPrice() {
return price;
}
public void setPrice(double price) {
this.price = price;
}
// Secondary sort: order id ascending, then price descending
@Override
public int compareTo(OrderBean o) {
int result = Integer.compare(order_id, o.getOrder_id());
if (result == 0) {
// Same order id: higher price first; equal prices compare as 0
result = Double.compare(o.getPrice(), price);
}
return result;
}
}
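The write() and readFields() methods must handle the fields in the same order, since the serialized form carries no field names. The sketch below illustrates this with the JDK's DataOutputStream and DataInputStream only. `OrderBeanRoundTripSketch`, `serialize` and `deserialize` are hypothetical helpers for illustration and are not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class OrderBeanRoundTripSketch {
    // Writes the fields in the same order as OrderBean.write(): int, then double.
    static byte[] serialize(int orderId, double price) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(orderId);
            out.writeDouble(price);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order as OrderBean.readFields().
    static String deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int orderId = in.readInt();
            double price = in.readDouble();
            return orderId + "\t" + price;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(1, 222.8);
        // 4 bytes for the int plus 8 bytes for the double.
        System.out.println(data.length);
        System.out.println(deserialize(data));
    }
}
```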
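(2) Write the OrderSortMapper class
package com.dev1.order;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class OrderSortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {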
OrderBean k = new OrderBean();
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// 1 Read one line
String line = value.toString();
// 2 Split into fields (the input file is expected to be tab-separated)
String[] fields = line.split("\t");
// 3 Populate the key object
k.setOrder_id(Integer.parseInt(fields[0]));
k.setPrice(Double.parseDouble(fields[2]));
// 4 Write out
context.write(k, NullWritable.get());
}
}
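The parsing done in map() can be tried in isolation with the sketch below. It is an illustration under assumptions: `OrderLineParserSketch` and `toKeyString` are hypothetical names, and the line is split on any whitespace rather than only on tabs so that space-separated samples also parse. It also shows why the expected output lists the order id as 1 rather than 0000001: Integer.parseInt drops the leading zeros.

```java
class OrderLineParserSketch {
    // Parses "orderId productId price" and returns the text that
    // OrderBean.toString() would produce for the resulting key.
    static String toKeyString(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected 3 fields but got: " + line);
        }
        int orderId = Integer.parseInt(fields[0]);    // leading zeros are dropped
        double price = Double.parseDouble(fields[2]); // fields[1] is the product id
        return orderId + "\t" + price;
    }

    public static void main(String[] args) {
        System.out.println(toKeyString("0000001 Pdt_01 222.8"));
        System.out.println(toKeyString("0000002\tPdt_05\t722.4"));
    }
}
```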
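(3) Write the OrderSortGroupingComparator class
package com.dev1.order;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class OrderSortGroupingComparator extends WritableComparator {
protected OrderSortGroupingComparator() {
super(OrderBean.class, true);
}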
@Override
public int compare(WritableComparable a, WritableComparable b) {
// Keys with the same order id belong to the same group
OrderBean aBean = (OrderBean) a;
OrderBean bBean = (OrderBean) b;
return Integer.compare(aBean.getOrder_id(), bBean.getOrder_id());
}
}
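(4) Write the OrderSortReducer class
package com.dev1.order;
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
public class OrderSortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {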
@Override
protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
// On entry, key is the first record of the group, i.e. the highest price for this order id
context.write(key, NullWritable.get());
}
}
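(5) Write the OrderSortDriver class
package com.dev1.order;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class OrderSortDriver {
public static void main(String[] args) throws Exception {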
// Pass the input and output paths for your own environment as args[0] and args[1]
// 1 Get the configuration and create the job
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
// 2 Set the jar by its driver class
job.setJarByClass(OrderSortDriver.class);
// 3 Set the mapper and reducer classes
job.setMapperClass(OrderSortMapper.class);
job.setReducerClass(OrderSortReducer.class);
// 4 Set the key and value types of the map output
job.setMapOutputKeyClass(OrderBean.class);
job.setMapOutputValueClass(NullWritable.class);
// 5 Set the key and value types of the final output
job.setOutputKeyClass(OrderBean.class);
job.setOutputValueClass(NullWritable.class);
// 6 Set the input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// 7 Set the grouping comparator used on the reduce side
job.setGroupingComparatorClass(OrderSortGroupingComparator.class);
// 8 Submit the job
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}
}