Chapter 9 Advanced MapReduce Programming Examples: ReduceJoin

Author: 晚雪

Last modified: 2022-12-12 12:39:16

First published: 2022-12-03 00:23:59

Article link: https://blog.csdn.net/weixin_51524477/article/details/128155318

I. Prerequisites

(1) Sample data

① File order_items.txt [large file]

1001,20200319,p0001,3
1002,20200319,p0002,1
1003,20200319,p0002,3
1004,20200319,p0001,2
1005,20200319,p0003,2
1006,20200319,p0002,4
1007,20200319,p0004,2
1008,20200319,p0003,2
1009,20200319,p0004,1

② File mobile.txt [large file]

p0001,华为P30,3388
p0002,iPhoneX,6148
p0003,OPPOR17,3499
p0004,三星GalaxyS10+,6399
p0005,vivoX30,3298

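(2) Field definitions

① order_items.txt

Description	Order ID	Date	Product ID	Quantity purchased
Field name	orderID	time	goodID	number

② mobile.txt

Description	Product ID	Phone model	Price
Field name	goodID	mobile	price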

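(3) Requirements

Use ReduceJoin (a reduce-side join) to join the two tables and report the store's mobile phone sales on March 19, 2020.

Here we assume that both mobile.txt and order_items.txt contain a large number of records.

The required output format is:

goodID orderID number mobile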
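
To make the target format concrete, here is a minimal sketch that produces rows in that shape from a few sample lines. It is an in-memory lookup join written with plain Java collections, not the reduce-side join this chapter builds; the class name `JoinPreview` and the method `join` are hypothetical, and the tab separator is assumed from the mapper and reducer code later in the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: previews the expected output rows without Hadoop.
class JoinPreview {

    // orderLines:  orderID,time,goodID,number
    // mobileLines: goodID,mobile,price
    // Returns rows of goodID, orderID, number, mobile separated by tabs.
    static List<String> join(List<String> orderLines, List<String> mobileLines) {
        // Index phone models by goodID (only feasible here because the sample is tiny)
        Map<String, String> mobileById = new HashMap<>();
        for (String line : mobileLines) {
            String[] f = line.split(",");
            mobileById.put(f[0].trim(), f[1].trim());
        }

        List<String> rows = new ArrayList<>();
        for (String line : orderLines) {
            String[] f = line.split(",");
            String goodID = f[2].trim();
            String mobile = mobileById.get(goodID);
            if (mobile != null) { // inner join: orders without a matching product are dropped
                rows.add(goodID + "\t" + f[0].trim() + "\t" + f[3].trim() + "\t" + mobile);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("1001,20200319,p0001,3", "1002,20200319,p0002,1");
        List<String> mobiles = Arrays.asList("p0001,华为P30,3388", "p0002,iPhoneX,6148");
        for (String row : join(orders, mobiles)) {
            System.out.println(row);
        }
    }
}
```

For the two sample orders above, the rows are `p0001 1001 3 华为P30` and `p0002 1002 1 iPhoneX` (tab-separated). The sketch says nothing about the order in which the real job writes its rows.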

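II. Implementation

1. Writing the map side

Create two mapper classes, one to process the records of the order_items table and one to process the records of the mobile table.

(1) Processing the order_items data

Read the records of the order table, which have the format: 1001,20200319,p0001,3

Emit every record as a <key,value> pair: ① the key is the join key goodID ② the value is "O_" + orderID + "\t" + number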

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class OrderItemsMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Tag that marks a value as coming from order_items
    public static final String LABEL = "O_";

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the line to a String, then split it on the "," delimiter
        String[] splits = value.toString().split(",");
        if (splits.length < 4) {
            return; // skip blank or malformed lines instead of failing the task
        }
        String orderID = splits[0].trim(); // order ID
        String goodID = splits[2].trim();  // product ID (the join key)
        String number = splits[3].trim();  // quantity purchased

        // Tag + the other required fields form the map output value
        String orderValue = LABEL + orderID + "\t" + number;
        // The join key is the map output key
        context.write(new Text(goodID), new Text(orderValue));
    }
}

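(2) Processing the mobile data

Read the records of the product table, which have the format: p0001,华为P30,3388

Emit every record as a <key,value> pair: ① the key is the join key goodID ② the value is "M_" + mobile (the phone model)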

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MobileMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Tag that marks a value as coming from mobile
    public static final String LABEL = "M_";

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the line to a String, then split it on the "," delimiter
        String[] splits = value.toString().split(",");
        if (splits.length < 2) {
            return; // skip blank or malformed lines instead of failing the task
        }
        String goodID = splits[0].trim(); // product ID (the join key)
        String mobile = splits[1].trim(); // phone model

        // Tag + mobile form the map output value
        String mobileValue = LABEL + mobile;

        // The join key is the map output key
        context.write(new Text(goodID), new Text(mobileValue));
    }
}
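
The parse-and-tag step of a mapper can also be factored into a plain helper, which makes it checkable without a cluster. The sketch below does this for the mobile table under a few assumptions: the class `MobileLineTagger` and its method `tag` are hypothetical names, and the choice to trim whitespace and to return an empty result for malformed lines is an illustration, not something the original job prescribes. Real input files often contain stray whitespace around fields, for example a line such as `p0002, iPhoneX,6148`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the tagging logic of MobileMapper as a pure function.
class MobileLineTagger {
    static final String LABEL = "M_"; // same tag value as MobileMapper.LABEL

    // Returns (goodID, "M_" + mobile), or Optional.empty() for a malformed line.
    static Optional<Map.Entry<String, String>> tag(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] fields = line.split(",");
        if (fields.length < 2) {
            return Optional.empty(); // not enough fields to hold goodID and mobile
        }
        String goodID = fields[0].trim();
        String mobile = fields[1].trim();
        if (goodID.isEmpty() || mobile.isEmpty()) {
            return Optional.empty();
        }
        Map.Entry<String, String> pair = new SimpleEntry<>(goodID, LABEL + mobile);
        return Optional.of(pair);
    }

    public static void main(String[] args) {
        // A line with a stray space after the comma still yields a clean value
        tag("p0002, iPhoneX,6148")
                .ifPresent(p -> System.out.println(p.getKey() + " -> " + p.getValue()));
        // A line without any delimiter is rejected
        System.out.println(tag("not a record").isPresent());
    }
}
```

Inside a mapper, a present result would be written with `context.write(new Text(pair.getKey()), new Text(pair.getValue()))`, and an empty one would simply be skipped.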

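2. Writing the reduce side

① Split the list of values under each key into two parts, those from the order_items table and those from the mobile table, and put them into two separate lists

② Then iterate over the two lists to form their Cartesian product, producing the final result rows one by one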

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReduceJoinReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        // Temporary list holding the (order ID, quantity) values
        List<String> orderList = new ArrayList<>();
        // Temporary list holding the phone models
        List<String> mobileList = new ArrayList<>();

        for (Text value : values) {
            // Hadoop reuses the Text instance across iterations, so copy it into a String
            String strValue = value.toString();

            // Check whether the value starts with "O_" or "M_"
            if (strValue.startsWith(OrderItemsMapper.LABEL)) {
                // Format: (order ID, quantity); strip the "O_" tag
                String orderItems = strValue.substring(OrderItemsMapper.LABEL.length());
                // Append (order ID, quantity) to orderList
                orderList.add(orderItems);

            } else if (strValue.startsWith(MobileMapper.LABEL)) {
                // Format: (phone model); strip the "M_" tag
                String mobile = strValue.substring(MobileMapper.LABEL.length());
                // Append the phone model to mobileList
                mobileList.add(mobile);
            }
        }

        // Iterate over both lists to build the Cartesian product
        for (String order : orderList) {
            for (String mobile : mobileList) {
                // Reduce output value
                String resultValue = order + "\t" + mobile;
                // Final output: (product ID, order ID, quantity, phone model)
                context.write(key, new Text(resultValue));
            }
        }
    }
}
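
The per-key logic of the reducer can be traced on a single group with plain Java. The sketch below assumes the tagged value formats produced by the two mappers ("O_" + orderID + "\t" + number and "M_" + mobile); the class `ReduceGroupSketch` and the method `joinGroup` are hypothetical names introduced only for illustration. It also shows one consequence of the Cartesian product: a key that has values from only one table, such as p0005 in the sample data, produces no output, so the result behaves like an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: replays the reducer's logic for one key without Hadoop.
class ReduceGroupSketch {

    // taggedValues is what the shuffle would hand to reduce() for this goodID.
    static List<String> joinGroup(String goodID, List<String> taggedValues) {
        List<String> orders = new ArrayList<>();
        List<String> mobiles = new ArrayList<>();

        // Step 1: separate the values by their source-table tag
        for (String v : taggedValues) {
            if (v.startsWith("O_")) {
                orders.add(v.substring(2));
            } else if (v.startsWith("M_")) {
                mobiles.add(v.substring(2));
            }
        }

        // Step 2: Cartesian product of the two lists
        List<String> rows = new ArrayList<>();
        for (String order : orders) {
            for (String mobile : mobiles) {
                rows.add(goodID + "\t" + order + "\t" + mobile);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // p0001 has two orders and one product record: two output rows
        System.out.println(joinGroup("p0001",
                Arrays.asList("O_1001\t3", "M_华为P30", "O_1004\t2")).size());
        // p0005 has a product record but no orders: no output rows
        System.out.println(joinGroup("p0005", Arrays.asList("M_vivoX30")).size());
    }
}
```

Because both lists are held in memory for each key, this approach assumes that no single goodID has more values than fit in the reducer's heap.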

3. Writing the driver

Use MultipleInputs.addInputPath to register multiple input paths; each path can be given its own mapper class.

The key lines are:

// Specify the mapper class and the input path for each table

MultipleInputs.addInputPath(job,new Path("/mapreduce/9.3/ReduceJoin/input/order_items.txt"),TextInputFormat.class,OrderItemsMapper.class);

MultipleInputs.addInputPath(job,new Path("/mapreduce/9.3/ReduceJoin/input/mobile.txt"),TextInputFormat.class,MobileMapper.class);

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class ReduceJoinDemo {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        // Address of the HDFS NameNode used in this example
        conf.set("fs.defaultFS", "hdfs://192.168.230.13:9000");
        Job job = Job.getInstance(conf);
        job.setJarByClass(ReduceJoinDemo.class);

        // Specify the mapper class and the input path for each table
        MultipleInputs.addInputPath(job, new Path("/mapreduce/9.3/ReduceJoin/input/order_items.txt"), TextInputFormat.class, OrderItemsMapper.class);
        MultipleInputs.addInputPath(job, new Path("/mapreduce/9.3/ReduceJoin/input/mobile.txt"), TextInputFormat.class, MobileMapper.class);

        job.setReducerClass(ReduceJoinReducer.class);
        // The map output types and the final output types are both <Text, Text>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Delete the output directory if it already exists, otherwise the job fails on startup
        Path outPath = new Path("/mapreduce/9.3/ReduceJoin/output");
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }

        FileOutputFormat.setOutputPath(job, outPath);
        // Submit the job, wait for it to finish, and exit with 0 on success
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}