Use the MapReduce framework to build a key-value index for Google Suggestion, where the key is a prefix of a query and the value is the top 10 searched queries for that prefix.
You don't need to go through all queries and count the searches yourself. Assume you are given a list of queries and their search counts, which is the output of another MapReduce problem, Word Count.
The key of the map function is the document id, which you can ignore. The value of the map function is a Document instance with two member variables, the word (the query) and its count. For example, "hello 100" means the query "hello" has been searched 100 times. The output of the map function depends on your algorithm. It is not checked, so you can emit any key-value pairs you want.
The key and value of the reduce function depend on what you output in the map function. The output of the reduce function is key-value pairs where the key is the prefix and the value is the top 10 queries and their counts. Use the Document class to wrap them (the code template below uses Pair for this).
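One natural choice for the map output is to emit the query once under each of its non-empty prefixes, so that the reducer for a prefix sees every query that starts with it. The prefix enumeration on its own can be sketched as follows; `PrefixSketch` and `prefixes` are hypothetical names used only for illustration and are not part of the problem's template.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the problem's template classes.
class PrefixSketch {
    // Returns every non-empty prefix of the query, shortest first.
    static List<String> prefixes(String query) {
        List<String> result = new ArrayList<>();
        for (int end = 1; end <= query.length(); end++) {
            result.add(query.substring(0, end));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("apple")); // [a, ap, app, appl, apple]
    }
}
```

A query of length n produces n key-value pairs, so the map output grows with the total number of characters across all queries.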
Example 1
Input:
[("apple",100), ("app",1200), ("app store",1200)]
Output:
"a": [("app", 1200), ("app store", 1200), ("apple", 100)]
"ap": [("app", 1200), ("app store", 1200), ("apple", 100)]
"app": [("app", 1200), ("app store", 1200), ("apple", 100)]
"app ": [("app store", 1200)]
"app s": [("app store", 1200)]
"app st": [("app store", 1200)]
"app sto": [("app store", 1200)]
"app stor": [("app store", 1200)]
"app store": [("app store", 1200)]
"appl": [("apple", 100)]
"apple": [("apple", 100)]
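Approach: use a min-heap to keep the ten largest entries. The map step only needs to emit the prefix as the key and the word with its count as the value.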
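The min-heap idea can be looked at in isolation from MapReduce. The sketch below keeps at most k items in a heap whose root is the smallest item kept so far, and replaces the root whenever a larger item arrives. `TopKSketch` and `topK` are hypothetical names, and the generic signature is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper illustrating top-k selection with a bounded min-heap.
class TopKSketch {
    // Returns the k largest items under `order`, largest first.
    static <T> List<T> topK(Iterable<T> items, int k, Comparator<T> order) {
        List<T> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // The heap root is always the smallest of the items kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k, order);
        for (T item : items) {
            if (heap.size() < k) {
                heap.offer(item);
            } else if (order.compare(item, heap.peek()) > 0) {
                heap.poll();      // drop the current smallest
                heap.offer(item); // keep the larger newcomer
            }
        }
        // Polling yields ascending order, so reverse to get largest first.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

With n items the heap never holds more than k of them, so the cost is O(n log k) time and O(k) extra space, compared with O(n log n) for sorting everything.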
/**
 * Definition of OutputCollector:
 * class OutputCollector<K, V> {
 *     public void collect(K key, V value);
 *     // Adds a key/value pair to the output buffer
 * }
 * Definition of Document:
 * class Document {
 *     public int count;
 *     public String content;
 * }
 *
 * class Pair {
 *     private String content;
 *     private int count;
 *
 *     Pair(String key, int value) {
 *         this.content = key;
 *         this.count = value;
 *     }
 *     public String getContent() {
 *         return this.content;
 *     }
 *     public int getCount() {
 *         return this.count;
 *     }
 * }
 */
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class GoogleSuggestion {
    public static class Map {
        public void map(Document value,
                        OutputCollector<String, Pair> output) {
            // Emit the query and its count once under every non-empty prefix.
            String str = value.content;
            for (int i = 0; i < str.length(); i++) {
                String substr = str.substring(0, i + 1);
                output.collect(substr, new Pair(value.content, value.count));
            }
        }
    }
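    public static class Reduce {
        // Orders pairs from weakest to strongest: lower count first; on equal counts,
        // the lexicographically larger query first, so ties favor the smaller query.
        private class PairComparator implements Comparator<Pair> {
            @Override
            public int compare(Pair a, Pair b) {
                if (a.getCount() != b.getCount()) {
                    // Integer.compare avoids the overflow that subtraction can cause.
                    return Integer.compare(a.getCount(), b.getCount());
                } else {
                    return b.getContent().compareTo(a.getContent());
                }
            }
        }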
        public void setup() {}

        public void reduce(String key, Iterator<Pair> values,
                           OutputCollector<String, Pair> output) {
            // Min-heap of at most 10 pairs; the root is the weakest pair kept so far.
            PairComparator pairCmp = new PairComparator();
            PriorityQueue<Pair> pq = new PriorityQueue<Pair>(10, pairCmp);
            while (values.hasNext()) {
                Pair cur = values.next();
                if (pq.size() < 10) {
                    pq.offer(cur);
                } else if (pairCmp.compare(cur, pq.peek()) > 0) {
                    // cur beats the weakest kept pair, so it takes its place.
                    pq.poll();
                    pq.offer(cur);
                }
            }
            // The heap pops the weakest pair first; insert at the front to get best first.
            List<Pair> list = new ArrayList<Pair>();
            while (!pq.isEmpty()) {
                list.add(0, pq.poll());
            }
            for (int i = 0; i < list.size(); i++) {
                output.collect(key, list.get(i));
            }
        }
    }
}
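
The solution above depends on the judge's OutputCollector, Document, and Pair classes, so it cannot be run on its own. To check the expected output from the example locally, the same pipeline can be simulated with plain Java collections. This is only a sketch under assumptions: `SuggestionIndexSketch`, `QueryCount`, and `build` are hypothetical names, the shuffle step is imitated with a TreeMap, and each group is sorted in full instead of using a heap, which is simpler but does more work per prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; it does not use the judge's template classes.
class SuggestionIndexSketch {
    // Hypothetical value type standing in for the template's Pair.
    static final class QueryCount {
        final String query;
        final int count;

        QueryCount(String query, int count) {
            this.query = query;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(\"" + query + "\", " + count + ")";
        }
    }

    static Map<String, List<QueryCount>> build(List<QueryCount> input, int k) {
        // "Map" and "shuffle": group each entry under every non-empty prefix of its query.
        Map<String, List<QueryCount>> groups = new TreeMap<>();
        for (QueryCount qc : input) {
            for (int end = 1; end <= qc.query.length(); end++) {
                String prefix = qc.query.substring(0, end);
                groups.computeIfAbsent(prefix, p -> new ArrayList<>()).add(qc);
            }
        }
        // "Reduce": best first means higher count, then lexicographically smaller query.
        Comparator<QueryCount> bestFirst = (a, b) -> a.count != b.count
                ? Integer.compare(b.count, a.count)
                : a.query.compareTo(b.query);
        Map<String, List<QueryCount>> index = new TreeMap<>();
        for (Map.Entry<String, List<QueryCount>> group : groups.entrySet()) {
            List<QueryCount> sorted = new ArrayList<>(group.getValue());
            sorted.sort(bestFirst);
            int keep = Math.min(k, sorted.size());
            index.put(group.getKey(), new ArrayList<>(sorted.subList(0, keep)));
        }
        return index;
    }

    public static void main(String[] args) {
        List<QueryCount> input = Arrays.asList(
                new QueryCount("apple", 100),
                new QueryCount("app", 1200),
                new QueryCount("app store", 1200));
        // Prints one line per prefix, in the same "prefix": [...] shape as the example.
        build(input, 10).forEach((prefix, top) ->
                System.out.println("\"" + prefix + "\": " + top));
    }
}
```

The tie-break used here (smaller query first when counts are equal) is inferred from the example, where "app" is listed before "app store", and it matches what PairComparator does in the solution.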