1. Introduction
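GroupingComparator is a reduce-side component of MapReduce. It decides which records belong to the same group, and the reduce logic is invoked once per group. By default, records with the same key form one group. With a custom GroupingComparator, records with different keys can be treated as one group and handled by a single reduce call.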
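To make the grouping rule concrete, here is a minimal plain-Java sketch that models it without any Hadoop classes. It assumes the framework walks the already sorted keys and starts a new group whenever the grouping comparator returns a non-zero value for two adjacent keys. The `Key` class, the `group` helper, and the sample data are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of reduce-side grouping; no Hadoop classes are involved.
class GroupingSketch {

    // Hypothetical stand-in for a composite map output key.
    static final class Key {
        final String orderId;
        final double price;

        Key(String orderId, double price) {
            this.orderId = orderId;
            this.price = price;
        }
    }

    // Sort order: orderId ascending, then price descending.
    static final Comparator<Key> SORT = (a, b) -> {
        int byOrder = a.orderId.compareTo(b.orderId);
        return byOrder != 0 ? byOrder : Double.compare(b.price, a.price);
    };

    // Custom grouping: only the orderId decides group membership.
    static final Comparator<Key> BY_ORDER_ID = (a, b) -> a.orderId.compareTo(b.orderId);

    // Walks sorted keys and opens a new group whenever two adjacent keys
    // compare as different under the given grouping comparator.
    static List<List<Key>> group(List<Key> sortedKeys, Comparator<Key> grouping) {
        List<List<Key>> groups = new ArrayList<>();
        Key previous = null;
        for (Key current : sortedKeys) {
            if (previous == null || grouping.compare(previous, current) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(current);
            previous = current;
        }
        return groups;
    }

    // Made-up sample keys, returned in sort order.
    static List<Key> sampleKeys() {
        List<Key> keys = new ArrayList<>(Arrays.asList(
                new Key("Order_0000001", 222.8), new Key("Order_0000001", 25.8),
                new Key("Order_0000002", 522.8), new Key("Order_0000002", 122.4),
                new Key("Order_0000002", 722.4), new Key("Order_0000003", 222.8)));
        keys.sort(SORT);
        return keys;
    }

    public static void main(String[] args) {
        List<Key> keys = sampleKeys();
        // Every key is distinct, so grouping by the full key gives one group per record.
        System.out.println("grouping by full key: " + group(keys, SORT).size()); // prints 6
        // Grouping by orderId alone merges records of the same order.
        System.out.println("grouping by orderId: " + group(keys, BY_ORDER_ID).size()); // prints 3
    }
}
```

In this model each group corresponds to one reduce call, which is why grouping on part of a composite key reduces six calls to three.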
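Grouping is the sixth step of the MapReduce flow. A custom grouping class is defined as follows:
- Define a class that extends WritableComparator
- Override the compare() method to define the grouping logic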
@Override
public int compare(WritableComparable a, WritableComparable b){
// Comparison logic that decides group membership
return result;
}
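- Add a constructor that passes the class of the key being compared to the parent class (the second argument, true, lets the parent create key instances to deserialize into)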
protected OrderGroupingComparator() {
super(OrderBean.class, true);
}
2. Example
2.1 Requirements:
Given the following order data:
Order ID | Product ID | Amount |
---|---|---|
Order_0000001 | Pdt_01 | 222.8 |
Order_0000001 | Pdt_05 | 25.8 |
Order_0000002 | Pdt_03 | 522.8 |
Order_0000002 | Pdt_04 | 122.4 |
Order_0000002 | Pdt_05 | 722.4 |
Order_0000003 | Pdt_01 | 222.8 |
The goal is to find the two highest-priced items (Top 2 by amount) in each order.
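As a reference point for checking the job's output, the expected result can be computed directly in plain Java. This is only a sketch for verifying the numbers, not the MapReduce solution; the class name `ExpectedTop2` and the `top2` helper are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Computes the expected Top 2 amounts per order from the sample table.
class ExpectedTop2 {

    // The sample rows from the table: order ID, product ID, amount.
    static final String[][] SAMPLE = {
            {"Order_0000001", "Pdt_01", "222.8"},
            {"Order_0000001", "Pdt_05", "25.8"},
            {"Order_0000002", "Pdt_03", "522.8"},
            {"Order_0000002", "Pdt_04", "122.4"},
            {"Order_0000002", "Pdt_05", "722.4"},
            {"Order_0000003", "Pdt_01", "222.8"},
    };

    // Returns, for each order ID, its amounts sorted descending and cut to at most two.
    static Map<String, List<Double>> top2(String[][] rows) {
        Map<String, List<Double>> byOrder = new TreeMap<>();
        for (String[] row : rows) {
            byOrder.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(Double.parseDouble(row[2]));
        }
        for (Map.Entry<String, List<Double>> entry : byOrder.entrySet()) {
            List<Double> amounts = entry.getValue();
            amounts.sort(Collections.reverseOrder());
            entry.setValue(new ArrayList<>(amounts.subList(0, Math.min(2, amounts.size()))));
        }
        return byOrder;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Double>> entry : top2(SAMPLE).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        // Order_0000001 -> [222.8, 25.8]
        // Order_0000002 -> [722.4, 522.8]
        // Order_0000003 -> [222.8]
    }
}
```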
2.2 Code
1. Define the custom OrderBean class
public class OrderBean implements WritableComparable<OrderBean> {
private String orderId;
private Double price;
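@Override
public int compareTo(OrderBean o) {
// Primary sort: orderId ascending, so records of the same order end up adjacent.
int ret = this.orderId.compareTo(o.orderId);
if (ret == 0){
// Secondary sort: price descending, so the highest price of an order comes first.
return o.price.compareTo(this.price);
}
return ret;
}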
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(orderId);
out.writeDouble(price);
}
@Override
public void readFields(DataInput in) throws IOException {
this.orderId = in.readUTF();
this.price = in.readDouble();
}
public String getOrderId() {
return orderId;
}
public void setOrderId(String orderId) {
this.orderId = orderId;
}
public Double getPrice() {
return price;
}
public void setPrice(Double price) {
this.price = price;
}
@Override
public String toString() {
return "OrderBean{" +
"orderId='" + orderId + '\'' +
", price=" + price +
'}';
}
}
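The `write` and `readFields` methods must handle the fields in the same order, because the serialized form is just a flat byte sequence. The sketch below shows a round trip using only `java.io`. The class `OrderBeanRoundTrip` is a simplified, hypothetical stand-in for OrderBean that drops the Hadoop interface so it runs on a bare JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified stand-in for OrderBean: same field layout, no Hadoop dependency.
class OrderBeanRoundTrip {
    String orderId;
    double price;

    // Serializes the fields: orderId first, then price.
    void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeDouble(price);
    }

    // Reads the fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        orderId = in.readUTF();
        price = in.readDouble();
    }

    // Writes a bean to an in-memory buffer and reads it into a fresh instance.
    static OrderBeanRoundTrip roundTrip(String orderId, double price) {
        try {
            OrderBeanRoundTrip original = new OrderBeanRoundTrip();
            original.orderId = orderId;
            original.price = price;

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));

            OrderBeanRoundTrip copy = new OrderBeanRoundTrip();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OrderBeanRoundTrip copy = roundTrip("Order_0000001", 222.8);
        System.out.println(copy.orderId + " " + copy.price); // prints "Order_0000001 222.8"
    }
}
```

Swapping the two reads in `readFields` would misinterpret the bytes, which is why the two methods are usually written and reviewed together.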
2. Define the custom Mapper class
public class GroupMapper extends Mapper<LongWritable, Text,OrderBean, NullWritable> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// Each input line is expected to be tab-separated: orderId, productId, amount
String[] fields = value.toString().split("\t");
OrderBean orderBean = new OrderBean();
orderBean.setOrderId(fields[0]);
orderBean.setPrice(Double.valueOf(fields[2]));
context.write(orderBean,NullWritable.get());
}
}
3. Define the custom Partitioner class
public class GroupPartition extends Partitioner<OrderBean, NullWritable> {
@Override
public int getPartition(OrderBean orderBean, NullWritable nullWritable, int numReduceTasks) {
// Partition on orderId only, so all records of one order reach the same reducer.
// The mask clears the sign bit, which keeps the result non-negative.
return (orderBean.getOrderId().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
}
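The partition formula can be tried out in isolation. The following sketch reproduces only the arithmetic from getPartition in plain Java; the class name `PartitionSketch` and the reducer count of 2 are assumptions chosen for the demonstration. `String.hashCode()` may be negative, and the `& Integer.MAX_VALUE` mask is what guarantees an index in the range 0 to numReduceTasks - 1.

```java
// Reproduces the partition arithmetic without Hadoop classes.
class PartitionSketch {

    // Maps an order ID to a reducer index in [0, numReduceTasks).
    static int partition(String orderId, int numReduceTasks) {
        // Clearing the sign bit avoids a negative remainder for negative hash codes.
        return (orderId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // assumed reducer count for this demo
        String[] orderIds = {"Order_0000001", "Order_0000002", "Order_0000003"};
        for (String orderId : orderIds) {
            System.out.println(orderId + " -> partition " + partition(orderId, numReduceTasks));
        }
    }
}
```

Because the result depends only on the order ID, every record of one order lands in the same partition regardless of its price.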
4. Define the custom grouping class
public class MyGroup extends WritableComparator {
public MyGroup() {
super(OrderBean.class,true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
OrderBean first = (OrderBean) a;
OrderBean second = (OrderBean) b;
// Compare on orderId only: records with the same orderId form one group
return first.getOrderId().compareTo(second.getOrderId()); // 0 means same group; any non-zero value means different groups
}
}
5. Define the custom Reducer class
// Emit the Top 2 order amounts of each group
public class GroupReducer extends Reducer<OrderBean, NullWritable,OrderBean,NullWritable> {
@Override
protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
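int count = 0;
// Hadoop reuses the key object: as the values iterator advances, the key is
// refilled with the current record, so each iteration sees the next price.
for (NullWritable value : values){
if (count >= 2){
break; // only the two highest prices are needed
}
context.write(key,NullWritable.get());
count++;
}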
}
}
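The reducer writes the same `key` variable on every iteration yet emits different prices, because the framework reuses one key object and refills it as the values iterator advances. The sketch below is a simplified plain-Java model of that behavior, not Hadoop's actual implementation; `MutableKey`, `topPrices`, and the placeholder value are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified model of key reuse while iterating the values of one reduce group.
class KeyReuseSketch {

    // Hypothetical mutable key, playing the role of OrderBean.
    static final class MutableKey {
        String orderId;
        double price;
    }

    // Iterates one group whose prices are already sorted descending and
    // collects the price seen through the shared key, up to `limit` entries.
    static List<Double> topPrices(String orderId, double[] sortedPrices, int limit) {
        MutableKey key = new MutableKey(); // single instance, reused for every record
        key.orderId = orderId;

        Iterator<Object> values = new Iterator<Object>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < sortedPrices.length;
            }

            @Override
            public Object next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                key.price = sortedPrices[index++]; // advancing the values refills the key
                return "placeholder";              // stands in for NullWritable.get()
            }
        };

        List<Double> written = new ArrayList<>();
        while (values.hasNext() && written.size() < limit) {
            values.next();
            written.add(key.price); // models context.write(key, NullWritable.get())
        }
        return written;
    }

    public static void main(String[] args) {
        double[] prices = {722.4, 522.8, 122.4};
        System.out.println(topPrices("Order_0000002", prices, 2)); // prints [722.4, 522.8]
    }
}
```

A practical consequence is that a reducer which needs to keep a key beyond the current iteration should copy its fields rather than store the reference.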
6. Define the driver class
public class GroupMain extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
// Get the Job object
Job job = Job.getInstance(super.getConf(), "group");
// Step 1: read the input files and parse them into key/value pairs
job.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(job, new Path("file:///E:\\大数据\\数据"));
// Step 2: custom map logic
job.setMapperClass(GroupMapper.class);
job.setMapOutputKeyClass(OrderBean.class);
job.setMapOutputValueClass(NullWritable.class);
// Step 3: partitioning
job.setPartitionerClass(GroupPartition.class);
// Step 4: sorting, already handled by OrderBean.compareTo()
// Step 5: combiner, omitted
// Step 6: grouping with the custom grouping logic
job.setGroupingComparatorClass(MyGroup.class);
// Step 7: reduce logic
job.setReducerClass(GroupReducer.class);
job.setOutputKeyClass(OrderBean.class);
job.setOutputValueClass(NullWritable.class);
// Step 8: output format and output path
job.setOutputFormatClass(TextOutputFormat.class);
TextOutputFormat.setOutputPath(job, new Path("file:///E:\\大数据\\数据\\out_top2"));
boolean b = job.waitForCompletion(true);
return b ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int run = ToolRunner.run(new Configuration(), new GroupMain(), args);
System.exit(run);
}
}