描述:
使用map reduce来计算单词频率
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v1.0
样例:
chunk1: "Google Bye GoodBye Hadoop code"
chunk2: "lintcode code Bye"
Get MapReduce result:
Bye: 2
GoodBye: 1
Google: 1
Hadoop: 1
code: 2
lintcode: 1
分析:
程序很简单,主要是对map reduce的理解,map对数据进行处理,建立键值对,reduce对map传输过来的数据进行处理,并返回结果
/**
* Definition of OutputCollector:
* class OutputCollector<K, V> {
* public void collect(K key, V value);
* // Adds a key/value pair to the output buffer
* }
*/
public class WordCount {
public static class Map {
public void map(String key, String value, OutputCollector<String, Integer> output) {
// Write your code here
// Output the results into output buffer.
// Ps. output.collect(String key, int value);
String[] result = value.split(" ");
for(int i = 0;i<result.length;i++){
output.collect(result[i] , 1);
}
}
}
public static class Reduce {
public void reduce(String key, Iterator<Integer> values,
OutputCollector<String, Integer> output) {
// Write your code here
// Output the results into output buffer.
// Ps. output.collect(String key, int value);
int count = 0;
while(values.hasNext()){
count += values.next();
}
output.collect(key , count);
}
}
}