Learning Hadoop: An Inverted Index over the Shakespeare Documents

linluyisb

Category: Hadoop

Article link: https://blog.csdn.net/buring_/article/details/10150105

I. Task: build an inverted index over the Shakespeare documents

II. A simple implementation

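1) The Mapper class. This is also where the mapper's output format is defined.

	// These imports go at the top of the file and cover all four classes in this post.
	import java.io.IOException;
	import java.util.Iterator;
	import java.util.StringTokenizer;
	import org.apache.hadoop.io.LongWritable;
	import org.apache.hadoop.io.Text;
	import org.apache.hadoop.mapreduce.Mapper;
	import org.apache.hadoop.mapreduce.Reducer;
	import org.apache.hadoop.mapreduce.lib.input.FileSplit;
	import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

	public static class InvertedMapper extends Mapper<LongWritable,Text,Text,Text>{
		// With the default TextInputFormat the input key is a LongWritable (the byte
		// offset of the line), so the key type must be LongWritable, not Long.

		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			Text one = new Text("1");
			// The name of the file this split belongs to becomes part of the key
			FileSplit fs = (FileSplit) context.getInputSplit();
			String filename = fs.getPath().getName();
			Text word = new Text();
			StringTokenizer token = new StringTokenizer(value.toString());

			while(token.hasMoreTokens()){
				word.set(token.nextToken()+":"+filename);
				context.write(word, one); // emits <word:file, "1">
			}
		}
	}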
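To make the mapper's output format concrete, here is a minimal sketch of the same tokenize-and-emit logic without any Hadoop dependency. The class name `MapEmitSketch`, the method `emit`, and the sample line and file name are hypothetical and chosen only for illustration; plain `String` pairs stand in for the `Text` key and value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hadoop-free sketch of the mapper's emit logic (hypothetical names, not part of the job).
final class MapEmitSketch {

    // Returns the (key, value) pairs the mapper would write for one input line.
    static List<String[]> emit(String line, String filename) {
        List<String[]> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
        while (tokens.hasMoreTokens()) {
            // key = word:file, value = "1", one pair per token occurrence
            out.add(new String[] { tokens.nextToken() + ":" + filename, "1" });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("to be or not to be", "hamlet.txt")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Each occurrence of a word produces its own pair, so a word that appears twice on a line shows up as two identical keys, each with the value "1". Summing those is left to the next stage.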

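2) The Combiner class

This part confused me at first: I expected the combiner to have its own interface, so why does it extend Reducer? In the new API there is no separate combiner type. Job.setCombinerClass takes a Reducer subclass, and the combiner is simply a reducer that runs on the map-side output.

	// Combine stage: still a subclass of Reducer
	public static class InvertedCombiner extends Reducer<Text,Text,Text,Text>{

		@Override
		protected void reduce(Text key, Iterable<Text> values,Context context)
				throws IOException, InterruptedException {
			// Split on the last ':' so that a token which itself contains ':' stays whole
			String k = key.toString();
			int sep = k.lastIndexOf(':');
			String word = k.substring(0, sep);
			String filename = k.substring(sep + 1);
			int sum = 0;
			for(Text val:values){
				sum+=Integer.parseInt(val.toString());
			}
			// <word:file, [1, 1, ...]> becomes <word, file:sum>
			context.write(new Text(word), new Text(filename+":"+sum));
			// Caveat: Hadoop may run a combiner zero, one, or several times, so
			// re-keying here relies on it running exactly once per key.
		}

	}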
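The re-keying step is easier to follow on a single key. The following is a minimal, Hadoop-free sketch of what the combiner does to one `word:file` group; `CombineSketch` and `combine` are hypothetical names, and splitting on the last `:` is an assumption that the file name itself contains no colon.

```java
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch of the combine step (hypothetical names, not part of the job).
final class CombineSketch {

    // Sums the per-occurrence counts and moves the file name from the key into the value.
    static String[] combine(String key, List<String> values) {
        int sep = key.lastIndexOf(':'); // assumption: the file name contains no ':'
        if (sep < 0) {
            throw new IllegalArgumentException("expected a key of the form word:file, got " + key);
        }
        String word = key.substring(0, sep);
        String file = key.substring(sep + 1);
        int sum = 0;
        for (String v : values) {
            sum += Integer.parseInt(v);
        }
        return new String[] { word, file + ":" + sum };
    }

    public static void main(String[] args) {
        String[] kv = combine("be:hamlet.txt", Arrays.asList("1", "1"));
        System.out.println(kv[0] + "\t" + kv[1]);
    }
}
```

A key that has already been re-keyed (just `word`, with no colon) is rejected here with an exception, which makes the dependence on the combiner running once per key visible rather than silent.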

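3) The Partitioner class

	// Custom partitioner: partitions on the term alone, so that every
	// <term:file> key for the same term goes to the same reducer
	public static class InvertedPartioner extends HashPartitioner<Text, Text>{

		@Override
		public int getPartition(Text key, Text value, int numReduceTasks) {
			// Strip the ":file" suffix; split on the last ':' to match the combiner
			String k = key.toString();
			int sep = k.lastIndexOf(':');
			String term = sep >= 0 ? k.substring(0, sep) : k;

			return super.getPartition(new Text(term), value, numReduceTasks);
		}

	}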
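Why partitioning on the term matters can be shown with the arithmetic that `HashPartitioner` uses, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The sketch below is a stand-in under stated assumptions: `PartitionSketch` and `partitionFor` are hypothetical names, and `String.hashCode()` replaces `Text.hashCode()`, so the actual partition numbers in a real job would differ.

```java
// Hadoop-free sketch of term-based hash partitioning (hypothetical names).
final class PartitionSketch {

    // Applies the HashPartitioner formula to the term only, ignoring the ":file" suffix.
    static int partitionFor(String key, int numReduceTasks) {
        int sep = key.lastIndexOf(':');
        String term = sep >= 0 ? key.substring(0, sep) : key;
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
        // Assumption: String.hashCode() stands in for Text.hashCode().
        return (term.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        System.out.println(partitionFor("love:hamlet.txt", reducers));
        System.out.println(partitionFor("love:othello.txt", reducers));
    }
}
```

Both keys print the same partition number because only `love` is hashed. Hashing the full `word:file` key instead could scatter one word's postings across several reducers, and each reducer would then write its own partial line for that word.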

4) The Reducer class

	public static class InvertedReducer extends Reducer<Text,Text,Text,Text>{
		@Override
		protected void reduce(Text key, Iterable<Text> values,Context context)
				throws IOException, InterruptedException {
			// Join all "file:count" postings for this word with ';'
			Iterator<Text> it = values.iterator();
			StringBuilder sb = new StringBuilder();
			if(it.hasNext())sb.append(it.next().toString());
			while(it.hasNext()){
				sb.append(";");
				sb.append(it.next().toString());
			}
			// emits <word, "file1:n1;file2:n2;...">
			context.write(key, new Text(sb.toString()));
		}

	}

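III. Takeaway: the important thing is to be clear about the format of the map output and of the reduce input. Here the mapper emits <word:file, 1>, the combiner turns that into <word, file:count>, and the reducer receives <word, [file:count, ...]>.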
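The key and value shapes at each stage can be traced with a small in-memory simulation. This is only a sketch, not the Hadoop job itself: `InvertedIndexSketch`, `build`, and the two sample documents are hypothetical, a `TreeMap` stands in for the sorted shuffle, and the combine step is assumed to run exactly once per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory trace of map -> combine -> reduce (hypothetical names, no Hadoop involved).
final class InvertedIndexSketch {

    // docs maps a file name to its text; the result maps each word to its postings string.
    static Map<String, String> build(Map<String, String> docs) {
        // Map stage: <word:file, "1">, grouped by key as the shuffle would do.
        Map<String, List<String>> mapOut = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            StringTokenizer tokens = new StringTokenizer(doc.getValue());
            while (tokens.hasMoreTokens()) {
                String key = tokens.nextToken() + ":" + doc.getKey();
                mapOut.computeIfAbsent(key, k -> new ArrayList<>()).add("1");
            }
        }
        // Combine stage: <word:file, [1, 1, ...]> becomes <word, file:count>.
        Map<String, List<String>> reduceIn = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : mapOut.entrySet()) {
            int sep = e.getKey().lastIndexOf(':');
            String word = e.getKey().substring(0, sep);
            String file = e.getKey().substring(sep + 1);
            int count = 0;
            for (String v : e.getValue()) {
                count += Integer.parseInt(v);
            }
            reduceIn.computeIfAbsent(word, k -> new ArrayList<>()).add(file + ":" + count);
        }
        // Reduce stage: <word, [file:count, ...]> becomes <word, "file:count;file:count">.
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : reduceIn.entrySet()) {
            index.put(e.getKey(), String.join(";", e.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("hamlet.txt", "to be or not to be");
        docs.put("macbeth.txt", "what is done cannot be undone");
        build(docs).forEach((word, postings) -> System.out.println(word + "\t" + postings));
    }
}
```

With these two sample documents, the entry for `be` comes out as `hamlet.txt:2;macbeth.txt:1`: two occurrences in the first file and one in the second, joined by the reducer's `;` separator.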
