1. Requirements
There are two tables, orders and products, located at H:/大数据/mapreduce/mapjoin/input/ and H:/大数据/mapreduce/mapjoin/ respectively. Their contents are:

orders (columns: id, pid, amount):
1001 pd001 300
1001 pd002 20
1002 pd003 40
1003 pd002 50

products (columns: id, name):
pd001,apple
pd002,banana
pd003,orange
2. Approach

We use a map-side join: the product table is cached into each task's working directory on the worker node (via job.addCacheFile), so when the mapper reads each order line from the orders file it can immediately look up the product name for that order's product id and emit the joined string.
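Stripped of the MapReduce scaffolding, the join itself is just a hash-map lookup keyed by product id. A minimal sketch in plain Java, using the sample data above (class name JoinSketch is ours, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class JoinSketch {
    public static void main(String[] args) {
        // Lookup table built from the products file: product id -> name.
        Map<String, String> pdInfoMap = new HashMap<String, String>();
        pdInfoMap.put("pd001", "apple");
        pdInfoMap.put("pd002", "banana");
        pdInfoMap.put("pd003", "orange");

        // Each order line is joined by looking up its pid (field 1).
        String[] orders = {
            "1001\tpd001\t300",
            "1001\tpd002\t20",
            "1002\tpd003\t40",
            "1003\tpd002\t50"
        };
        for (String orderLine : orders) {
            String[] fields = orderLine.split("\t");
            String pdName = pdInfoMap.get(fields[1]);
            System.out.println(orderLine + "\t" + pdName); // e.g. 1001	pd001	300	apple
        }
    }
}
```

In the real job, the map is populated once per task in setup() from the cached file, and the lookup runs once per input record in map().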
3. Code
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MJoin {

    static class MJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        // In-memory lookup table: product id -> product name.
        private Map<String, String> pdInfoMap = new HashMap<String, String>();

        // Runs once per map task: the cached product file (distributed via
        // job.addCacheFile) is available in the task's working directory
        // under its file name, so we load it into the lookup map here.
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream("pdts.txt")));
            String line;
            while (StringUtils.isNotEmpty(line = br.readLine())) {
                String[] fields = line.split(",");
                pdInfoMap.put(fields[0], fields[1]);
            }
            br.close();
        }

        // For each order line, look up the product name by pid (field 1)
        // and emit the order line joined with the product name.
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String orderLine = value.toString();
            String[] fields = orderLine.split("\t");
            String pdName = pdInfoMap.get(fields[1]);
            context.write(new Text(orderLine + "\t" + pdName), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(MJoin.class);
        job.setMapperClass(MJoinMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path("H:/大数据/mapreduce/mapjoin/input"));
        FileOutputFormat.setOutputPath(job, new Path("H:/大数据/mapreduce/mapjoin/output"));
        // Ship the product table to every task node as a cache file.
        job.addCacheFile(new URI("file:/H:/大数据/mapreduce/mapjoin/pdts.txt"));
        // Map-only job: a map-side join needs no reduce phase.
        job.setNumReduceTasks(0);
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
4. Output
1001 pd001 300 apple
1001 pd002 20 banana
1002 pd003 40 orange
1003 pd002 50 banana