canopy java, netflixcanopyamalagator.java source code online viewer - a simple MapReduce implementation - resource download - 虫虫 (Chongchong) electronics download site...

// Author - Jack Hebert (jhebert@cs.washington.edu)
// Copyright 2007
// Distributed under GPLv3
//
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class NetflixCanopyAmalagator extends MapReduceBase implements Reducer {

    // So we will have one single reducer and it maintains a list of the final canopy centers.
    // Each center emitted from the mappers will only be added if none of the current centers
    // covers it (the same canopy selection algorithm as before, and as in the assigned reading).
    // NOTE: This is a mostly working algorithm and definitely worked for me in practice, though no
    // proof is attached. There might technically be a data point that is not selected as a canopy
    // center but is not within the near distance of any canopy center, due to the 2-level
    // map-reduce selection.
    // However, I argue that this still works as a k-means canopy selector and is not a
    // fundamental flaw.
    // To fix this you would just need to limit yourself to a single mapper, or get trickier about
    // also emitting points belonging to a canopy to the reducer.

    // Number of reduce() calls so far; only used in the status message.
    private int count = 0;
    // Canopy centers accepted so far.
    private ArrayList<NetflixMovie> canopyCenters;

    public void configure(JobConf conf) {
        this.canopyCenters = new ArrayList<NetflixMovie>();
    }

    // If we have multiple mappers then we might have to be tricky about
    // selecting disjoint canopy centers.
    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter) throws IOException {
        this.count += 1;
        //String canopy_id = ((Text) key).toString();
        while (values.hasNext()) {
            // Each value has the form "<movie_id>:<data>".
            String data = ((Text) values.next()).toString();
            int index = data.indexOf(":");
            String movie_id = data.substring(0, index);
            data = data.substring(index + 1);
            NetflixMovie curr = new NetflixMovie(movie_id, data);

            // A candidate is too close if it has more than 10 matches with an existing center.
            boolean too_close = false;
            for (NetflixMovie nm : this.canopyCenters) {
                int matchCount = nm.MatchCount(curr);
                if (matchCount > 10) {
                    too_close = true;
                    break;
                }
            }

            // Emit the candidate as a new canopy center and remember it.
            if (!too_close) {
                Text to_emit = new Text(data);
                output.collect(new Text(curr.movie_id), to_emit);
                this.canopyCenters.add(curr);
                String toShow = this.canopyCenters.size() + ":" + this.count;
                reporter.setStatus(toShow);
            }
        }
    }
}
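
The selection rule used by the reducer can be tried without Hadoop. The following is a minimal sketch, not the author's code: the `NetflixMovie` class is not shown in this listing, so `Movie` here is a hypothetical stand-in, and it assumes that `MatchCount` means the number of user ids shared by two movies. The class name `CanopyMergeSketch`, the method `selectCenters`, and the `threshold` parameter are likewise illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CanopyMergeSketch {

    // Hypothetical stand-in for NetflixMovie: an id plus the users who rated it.
    static final class Movie {
        final String id;
        final Set<Integer> raters;

        Movie(String id, Integer... raters) {
            this.id = id;
            this.raters = new HashSet<>(Arrays.asList(raters));
        }

        // Assumption: a "match" is a user id present in both movies.
        int matchCount(Movie other) {
            int shared = 0;
            for (Integer r : raters) {
                if (other.raters.contains(r)) {
                    shared++;
                }
            }
            return shared;
        }
    }

    // Greedy pass: keep a candidate only if no already accepted center
    // has more than `threshold` matches with it.
    static List<Movie> selectCenters(List<Movie> candidates, int threshold) {
        List<Movie> centers = new ArrayList<>();
        for (Movie candidate : candidates) {
            boolean tooClose = false;
            for (Movie center : centers) {
                if (center.matchCount(candidate) > threshold) {
                    tooClose = true;
                    break;
                }
            }
            if (!tooClose) {
                centers.add(candidate);
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        Movie a = new Movie("a", 1, 2, 3);
        Movie b = new Movie("b", 2, 3, 4); // shares two raters with "a"
        Movie c = new Movie("c", 7, 8, 9); // shares no raters with "a"
        for (Movie m : selectCenters(Arrays.asList(a, b, c), 1)) {
            System.out.println(m.id); // prints "a" then "c"
        }
    }
}
```

Because the pass is greedy, the result depends on the order in which candidates arrive: with a threshold of 1, "b" is rejected only because "a" was seen first. The reducer above has the same property, since it accepts centers in the order the values are delivered.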
