In a MapReduce program you may have two files where a field in one file corresponds to a field in the other, and you want to present the data from both files together.
For example, file one:
the first column is the order ID, the second the product ID, the third the quantity sold.
File two:
the first column is the product ID, the second the product name, the third the unit price.
Scenario: compute the total price of each order (the sum of unit price × quantity).
Solution: join the two files.
Preparing the files:
place the order file and the product file in two separate directories.
On to the code:
Define a data class with the fields: order ID, product ID, quantity, product name, unit price.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.math.BigDecimal;
import org.apache.hadoop.io.Writable;

public class TableBean implements Writable {
/**
* Order ID
*/
private String orderId;
/**
* Product ID
*/
private String shopId;
/**
* Quantity
*/
private Integer shopNum;
/**
* Product name
*/
private String shopName;
/**
* Unit price
*/
private BigDecimal shopPrice;
//getters and setters omitted
@Override
public void write(DataOutput dataOutput) throws IOException {
dataOutput.writeUTF(orderId);
dataOutput.writeUTF(shopId);
dataOutput.writeInt(shopNum);
dataOutput.writeUTF(shopName);
//BigDecimal has no direct DataOutput support, so serialize it as its string form
dataOutput.writeUTF(shopPrice.toString());
}
@Override
public void readFields(DataInput dataInput) throws IOException {
//fields must be read back in exactly the order they were written
orderId = dataInput.readUTF();
shopId = dataInput.readUTF();
shopNum = dataInput.readInt();
shopName = dataInput.readUTF();
shopPrice = new BigDecimal(dataInput.readUTF());
}
}
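The only contract Writable imposes is that readFields consumes fields in exactly the order write produced them. That pattern can be exercised without Hadoop using plain java.io streams; this is a minimal sketch, and the class and method names (RoundTripDemo, write, read) are mine for illustration, not part of the original code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;

public class RoundTripDemo {
    // Mirrors TableBean.write: same field order, BigDecimal as a string
    public static byte[] write(String orderId, String shopId, int shopNum,
                               String shopName, BigDecimal shopPrice) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(orderId);
            out.writeUTF(shopId);
            out.writeInt(shopNum);
            out.writeUTF(shopName);
            out.writeUTF(shopPrice.toString());
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors TableBean.readFields: must read in the identical order
    public static String[] read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String orderId = in.readUTF();
            String shopId = in.readUTF();
            int shopNum = in.readInt();
            String shopName = in.readUTF();
            BigDecimal shopPrice = new BigDecimal(in.readUTF());
            return new String[]{orderId, shopId, String.valueOf(shopNum),
                                shopName, shopPrice.toString()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping the order of any two reads (or writes) corrupts every field after the mismatch, which is why the two methods of TableBean must stay in lock-step.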
Define the Mapper class.
Note: the product file must be preloaded into the distributed cache. Override the setup method in the Mapper class, and register the cache file's location in the driver class.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import com.alibaba.fastjson.JSONObject;

public class OrderShopMapper extends Mapper<LongWritable,Text,Text,TableBean> {
HashMap<String,String> pdMap = new HashMap<String, String>();
@Override
protected void setup(Context context) throws IOException, InterruptedException {
//1. Load the cache file; the bare file name works because addCacheFile symlinks it into the task's working directory
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("shopDetail.txt"), "UTF-8"));
String line;
//2. Read until the file is exhausted (readLine returns null at EOF, which isNotEmpty treats as empty)
while(StringUtils.isNotEmpty(line = br.readLine())){
//split a line such as: 01 ProductA 4
String[] fields = line.split(" ");
Map<String,Object> map = new HashMap<>(2);
map.put("name",fields[1]);
map.put("price",fields[2]);
//cache into the map as productId : {"name":"ProductA","price":4}
pdMap.put(fields[0], JSONObject.toJSONString(map));
}
br.close();
}
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
TableBean tb = new TableBean();
//1. Get the line of input
String line = value.toString();
//2. Split it into fields
String[] fields = line.split(" ");
//3. Look up the product name and price by product ID
String shopId = fields[1];
String shopJson = pdMap.get(shopId);
if (shopJson == null) {
//no matching product in the cache; skip the record (inner-join semantics)
return;
}
JSONObject detail = JSONObject.parseObject(shopJson);
String name = detail.getString("name");
String price = detail.getString("price");
//4. Populate the output bean
tb.setOrderId(fields[0]);
tb.setShopId(shopId);
tb.setShopNum(Integer.parseInt(fields[2]));
tb.setShopName(name);
tb.setShopPrice(new BigDecimal(price));
//5. Emit keyed by order ID
context.write(new Text(fields[0]),tb);
}
}
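Serializing each product into a JSON string only to parse it back in map() is a convenience; the cache itself is just an in-memory lookup table. A minimal stdlib-only sketch of the same table without the FastJSON round-trip (ProductCache and its method names are hypothetical, not from the original code):

```java
import java.util.HashMap;
import java.util.Map;

public class ProductCache {
    // productId -> {name, price}, parsed from lines like "01 ProductA 4"
    private final Map<String, String[]> pdMap = new HashMap<>();

    public void load(Iterable<String> lines) {
        for (String line : lines) {
            if (line == null || line.trim().isEmpty()) {
                continue; // skip blank lines, matching the isNotEmpty check in setup
            }
            String[] fields = line.split(" ");
            pdMap.put(fields[0], new String[]{fields[1], fields[2]});
        }
    }

    public String name(String shopId) {
        String[] p = pdMap.get(shopId);
        return p == null ? null : p[0];
    }

    public String price(String shopId) {
        String[] p = pdMap.get(shopId);
        return p == null ? null : p[1];
    }
}
```

Storing the split fields directly avoids one serialization/parse cycle per joined record, at the cost of the readable JSON values shown in the original.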
Define the Reducer class.
public class OrderShopReducer extends Reducer<Text,TableBean,Text,Text> {
@Override
protected void reduce(Text key, Iterable<TableBean> values, Context context) throws IOException, InterruptedException {
//sum unit price * quantity over every item in the order
BigDecimal total = BigDecimal.ZERO;
for(TableBean tb:values){
total = total.add(tb.getShopPrice().multiply(new BigDecimal(tb.getShopNum())));
}
context.write(key, new Text(total.toString()));
}
}
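The reducer's arithmetic can be checked in isolation: it is just a BigDecimal sum of unit price × quantity over the items sharing an order ID. A standalone sketch of that loop (OrderTotal is my name for it, not the post's):

```java
import java.math.BigDecimal;
import java.util.List;

public class OrderTotal {
    // Each element is a {unitPrice, quantity} pair; returns the order total,
    // mirroring the accumulation inside OrderShopReducer.reduce
    public static BigDecimal total(List<BigDecimal[]> priceQtyPairs) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal[] pair : priceQtyPairs) {
            total = total.add(pair[0].multiply(pair[1]));
        }
        return total;
    }
}
```

For example, an order containing 4 units at price 4 and 3 units at price 2 totals 16 + 6 = 22. BigDecimal is used rather than double so that prices are summed exactly, with no floating-point rounding.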
Define the driver class.
Note: this is where the cache file's path is registered.
public class OrderShopRunner {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
//tell Hadoop which jar contains the classes this job uses
job.setJarByClass(OrderShopRunner.class);
//the Mapper and Reducer classes for this job
job.setMapperClass(OrderShopMapper.class);
job.setReducerClass(OrderShopReducer.class);
//reducer output key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
//mapper output key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(TableBean.class);
//input path (the order files)
FileInputFormat.setInputPaths(job,new Path("d://shop//order"));
//output path (must not exist before the job runs)
FileOutputFormat.setOutputPath(job,new Path("d://shop//out"));
//register the product file in the distributed cache
job.addCacheFile(new URI("file:///d:/shop/detail/shopDetail.txt"));
//submit the job and wait for completion
boolean rs = job.waitForCompletion(true);
System.exit(rs?0:1);
}
}
Output:
After checking, the results are correct.
That's the end of this article; the code has been uploaded to GitHub.
My WeChat/QQ number (personal, not a work account): 806751350
GitHub: https://github.com/linminlm