1. Requirements
There are two tables, orders and products, located at H:/大数据/mapreduce/mapjoin/input/ and H:/大数据/mapreduce/mapjoin/ respectively. Their contents are:

orders (columns: id, pid, amount):
1001 pd001 300
1001 pd002 20
1002 pd003 40
1003 pd002 50

products (columns: id, name):
pd001,apple
pd002,banana
pd003,orange
2. Approach

We use a map-side join: the product table is cached into each task's working directory on the worker node (via job.addCacheFile), so when the mapper reads each order line from the orders file it can immediately look up the product name for that order's product id and emit the joined string.
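Stripped of the MapReduce scaffolding, the join itself is just a hash-map lookup keyed by product id. A minimal sketch in plain Java, using the sample data above (class name JoinSketch is ours, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class JoinSketch {
    public static void main(String[] args) {
        // Lookup table built from the products file: product id -> name.
        Map<String, String> pdInfoMap = new HashMap<String, String>();
        pdInfoMap.put("pd001", "apple");
        pdInfoMap.put("pd002", "banana");
        pdInfoMap.put("pd003", "orange");

        // Each order line is joined by looking up its pid (field 1).
        String[] orders = {
            "1001\tpd001\t300",
            "1001\tpd002\t20",
            "1002\tpd003\t40",
            "1003\tpd002\t50"
        };
        for (String orderLine : orders) {
            String[] fields = orderLine.split("\t");
            String pdName = pdInfoMap.get(fields[1]);
            System.out.println(orderLine + "\t" + pdName); // e.g. 1001	pd001	300	apple
        }
    }
}
```

In the real job, the map is populated once per task in setup() from the cached file, and the lookup runs once per input record in map().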
3. Code
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MJoin {

    static class MJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        // In-memory lookup table: product id -> product name.
        private Map<String, String> pdInfoMap = new HashMap<String, String>();

        // Runs once per map task: the cached product file (distributed via
        // job.addCacheFile) is available in the task's working directory
        // under its file name, so we load it into the lookup map here.
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream("pdts.txt")));
            String line;
            while (StringUtils.isNotEmpty(line = br.readLine())) {
                String[] fields = line.split(",");
                pdInfoMap.put(fields[0], fields[1]);
            }
            br.close();
        }

        // For each order line, look up the product name by pid (field 1)
        // and emit the order line joined with the product name.
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String orderLine = value.toString();
            String[] fields = orderLine.split("\t");
            String pdName = pdInfoMap.get(fields[1]);
            context.write(new Text(orderLine + "\t" + pdName), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(MJoin.class);
        job.setMapperClass(MJoinMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path("H:/大数据/mapreduce/mapjoin/input"));
        FileOutputFormat.setOutputPath(job, new Path("H:/大数据/mapreduce/mapjoin/output"));
        // Ship the product table to every task node as a cache file.
        job.addCacheFile(new URI("file:/H:/大数据/mapreduce/mapjoin/pdts.txt"));
        // Map-only job: a map-side join needs no reduce phase.
        job.setNumReduceTasks(0);
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
4. Output
1001 pd001 300 apple
1001 pd002 20 banana
1002 pd003 40 orange
1003 pd002 50 banana