(1) Requirement
Take the quantity column (1, 2, 6, etc.) from Figure 1 and the product id and name (e.g., Xiaomi, Huawei) from Figure 2, and join them into the format of Figure 3: order id + product name + quantity.
![Figure 1](https://i-blog.csdnimg.cn/blog_migrate/36370e855d21b9a7981c469f1945ff07.png)
![Figure 2](https://i-blog.csdnimg.cn/blog_migrate/c5d40d7af2813a7d2aeeacbc6bb0ec01.png)
![Figure 3](https://i-blog.csdnimg.cn/blog_migrate/28fca05bae707bc7005a03d536f0cd49.png)
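For concreteness, here is a hypothetical sample of the two inputs and the joined output. The tab-separated layout (three columns in order.txt, two in pd.txt) is inferred from the field indices used in the Mapper below; the specific ids are made up.

```text
# order.txt (Figure 1): order id \t product id \t quantity
1001	01	1
1002	02	2
1003	03	6

# pd.txt (Figure 2): product id \t product name
01	Xiaomi
02	Huawei

# joined output (Figure 3): order id \t product name \t quantity
1001	Xiaomi	1
1002	Huawei	2
```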
(2) Implementation approach
The join is implemented with MapReduce, but the table merge is moved into the Map phase to relieve pressure on the Reduce side: the number of Reduce tasks is set to 0, and the small product table (Figure 2) is shipped to every map task as a distributed-cache file, to be loaded into memory in setup().
(3) The Mapper
```java
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;

public class MapJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final HashMap<String, String> pdMap = new HashMap<>();
    private final Text outK = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Open the cached product table (pd.txt)
        URI[] cacheFiles = context.getCacheFiles();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        FSDataInputStream fis = fs.open(new Path(cacheFiles[0]));

        // Read the stream line by line; each line is: pid \t pname
        BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
        String line;
        while (StringUtils.isNotEmpty(line = reader.readLine())) {
            String[] fields = line.split("\t");
            pdMap.put(fields[0], fields[1]);
        }

        // Close the reader (this also closes the underlying input stream)
        IOUtils.closeStream(reader);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each line of order.txt is: order id \t pid \t quantity
        String line = value.toString();
        String[] fields = line.split("\t");

        // Look up the product name by pid
        String pname = pdMap.get(fields[1]);

        // Emit: order id \t product name \t quantity
        outK.set(fields[0] + "\t" + pname + "\t" + fields[2]);
        context.write(outK, NullWritable.get());
    }
}
```
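Note that pdMap lives entirely in each map task's heap, so this map-side join only works when the dimension table (pd.txt) is small enough to fit in memory; the large order table streams through map() record by record as usual.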
(4) The Driver
Everything else is the same as a normal Driver program, with these two additions:

```java
// Load the product table into the distributed cache
job.addCacheFile(new URI("file:///D:/hadoop/input/table/pd.txt"));
// Disable the Reduce phase by setting the task count to 0
job.setNumReduceTasks(0);
```
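For completeness, a minimal Driver sketch: the class name MapJoinDriver and the local input/output paths are placeholders, and only the two lines above are specific to the map join.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.net.URI;

public class MapJoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(MapJoinDriver.class);
        job.setMapperClass(MapJoinMapper.class);

        // Map-only job: the mapper's output types are also the job's final output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Load the product table into the distributed cache
        job.addCacheFile(new URI("file:///D:/hadoop/input/table/pd.txt"));
        // Disable the Reduce phase by setting the task count to 0
        job.setNumReduceTasks(0);

        // The input directory holds order.txt only; the cached pd.txt is read in setup()
        FileInputFormat.setInputPaths(job, new Path("D:/hadoop/input/order"));
        FileOutputFormat.setOutputPath(job, new Path("D:/hadoop/output/mapjoin"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```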
Reference: the 尚硅谷 Hadoop video tutorial.