Building an Inverted Index in Parallel with MapReduce

xugen12

Category: Hadoop Development. Tags: mapreduce, index, sorting

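MapReduce can build an inverted index in parallel. Suppose the input is a set of text files and the output is a list of tuples, where each tuple consists of a term and the list of files that contain it. The conventional approach joins every term with its file names, and it performs that join in memory. With a large amount of data this can exhaust the available memory. You could instead use an intermediate store such as a database, but that makes the job run more slowly.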
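To make the memory problem concrete, here is a minimal sketch of the conventional in-memory approach in plain Java. It is only an illustration: the class name `InMemoryInvertedIndex`, the `build` method, and the sample file names are hypothetical and do not come from the original article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InMemoryInvertedIndex {

    // Builds term -> sorted set of file names entirely on the heap.
    // `documents` maps a (hypothetical) file name to that file's text.
    public static Map<String, Set<String>> build(Map<String, String> documents) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String token : doc.getValue().trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank input yields a single empty token
                }
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("file1.txt", "cat sat mat");
        documents.put("file2.txt", "cat dog");
        // Every term and every file list lives in this one map, so the
        // memory footprint grows with the size of the whole corpus.
        System.out.println(build(documents));
    }
}
```

Because the whole index is held in a single map inside one process, this only works while the set of distinct terms and their file lists fits in memory, which is the limitation described above.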

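A better approach is to tokenize each line and write intermediate files that hold one term per line, then sort those intermediate files, and finally read through all the sorted files and call a function once for each distinct term. This is the approach MapReduce takes. The code is as follows: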

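import java.io.IOException;
import java.util.HashSet;

import org.apache.commons.lang.StringUtils; // assumed: Apache Commons Lang 2.x
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Both classes below are static nested classes of the job's outer class.
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    private Text documentID;
    private final Text word = new Text();

    // Runs once per input split: use the split's file name as the document ID.
    @Override
    protected void setup(Context context) {
        String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
        documentID = new Text(filename);
    }

    // Emits one (term, documentID) pair per whitespace-separated token.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : StringUtils.split(value.toString())) {
            word.set(token);
            context.write(word, documentID);
        }
    }
}

public static class Reduce extends Reducer<Text, Text, Text, Text> {
    private final Text docIds = new Text();

    // Receives every document ID emitted for one term and writes the distinct set.
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        HashSet<Text> uniqueDocIds = new HashSet<Text>();
        for (Text docId : values) {
            // Hadoop reuses the same Text instance across iterations, so copy it.
            uniqueDocIds.add(new Text(docId));
        }
        docIds.set(StringUtils.join(uniqueDocIds, ","));
        context.write(key, docIds);
    }
}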
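
The tokenize, sort, and group steps can also be tried without a cluster. The following is a rough single-process simulation in plain Java, not Hadoop code: the class `LocalInvertedIndexSketch`, its methods `mapPhase` and `sortAndReduce`, and the sample file names are all hypothetical. Unlike the `HashSet` in the reducer above, which gives no ordering guarantee, this sketch sorts the file names so that its output is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LocalInvertedIndexSketch {

    // "Map" step: emit one (term, file name) pair per whitespace-separated token.
    public static List<String[]> mapPhase(Map<String, String> documents) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String token : doc.getValue().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(new String[] {token, doc.getKey()});
                }
            }
        }
        return pairs;
    }

    // "Sort" + "reduce" steps: order the pairs by term (then file name), and
    // collapse each run of equal terms into a comma-separated list of
    // distinct file names. Note that this sorts the given list in place.
    public static Map<String, String> sortAndReduce(List<String[]> pairs) {
        pairs.sort((a, b) -> a[0].equals(b[0]) ? a[1].compareTo(b[1]) : a[0].compareTo(b[0]));
        Map<String, String> index = new LinkedHashMap<>();
        int i = 0;
        while (i < pairs.size()) {
            String term = pairs.get(i)[0];
            Set<String> files = new LinkedHashSet<>();
            while (i < pairs.size() && pairs.get(i)[0].equals(term)) {
                files.add(pairs.get(i)[1]);
                i++;
            }
            index.put(term, String.join(",", files));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("file1.txt", "cat sat mat cat");
        documents.put("file2.txt", "cat dog");
        sortAndReduce(mapPhase(documents))
            .forEach((term, files) -> System.out.println(term + "\t" + files));
    }
}
```

Once the pairs are sorted, each term's file names are adjacent, so the grouping pass only needs to hold one term's list at a time. In a real job the framework performs the sort between the map and reduce phases.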
