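MapReduce is the star of the aggregation tools: anything count, distinct, and group can do, MapReduce can do as well, and it is an aggregation method that parallelizes easily across multiple servers. Put simply, a large job (data set) is split up and processed (MAP), and the partial results are then merged into the final result (REDUCE). Because the work is split, many machines can compute in parallel, which shortens the total running time.
Map and reduce are very useful operations, especially in NoSQL. This post briefly summarizes how to use MapReduce in MongoDB and how to drive it from Java.
1 Start mongodb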
Just run mongo to start the shell.
2 Create the db
use test
3 Add some records
> book1 = {name : "Understanding JAVA", pages : 100}
> book2 = {name : "Understanding JSON", pages : 200}
> db.books.save(book1)
> db.books.save(book2)
Add a few more:
> book = {name : "Understanding XML", pages : 300}
> db.books.save(book)
> book = {name : "Understanding Web Services", pages : 400}
> db.books.save(book)
> book = {name : "Understanding Axis2", pages : 150}
> db.books.save(book)
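4 Start with the map step. It classifies each book by page count and emits a counter of 1 under the book's category:
> var map = function() {
    var category;
    if ( this.pages >= 250 )
      category = 'Big Books';
    else
      category = "Small Books";
    // Emit a counter rather than the name, so reduce only has to add values up
    emit(category, {books: 1});
  };
5 Then the reduce step totals the counters for each category:
> var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(doc) {
      // Add the counter, not 1, so partial results can be reduced again
      sum += doc.books;
    });
    return {books: sum};
  };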
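MongoDB may call reduce more than once for the same key, feeding an earlier result back in as one of the values, so the object reduce returns needs the same shape as the values map emits. The following is a minimal sketch in plain JavaScript (no MongoDB required) that compares a reduce which counts array entries with one which sums a counter field. The function names and the two-step split are assumptions chosen for illustration, not something the server exposes.

```javascript
// Counts array entries; gives a wrong total when its own output is fed back in.
function countEntries(key, values) {
  let sum = 0;
  values.forEach(() => { sum += 1; });
  return { books: sum };
}

// Sums a counter field; the output has the same shape as each input.
function sumCounters(key, values) {
  let sum = 0;
  values.forEach((doc) => { sum += doc.books; });
  return { books: sum };
}

// Three small books, each emitted as a counter of 1.
const emitted = [{ books: 1 }, { books: 1 }, { books: 1 }];

// Reduce the first two values, then reduce that partial result with the third.
const partialCount = countEntries("Small Books", emitted.slice(0, 2));
const wrong = countEntries("Small Books", [partialCount, emitted[2]]);

const partialSum = sumCounters("Small Books", emitted.slice(0, 2));
const right = sumCounters("Small Books", [partialSum, emitted[2]]);

console.log(wrong.books, right.books); // prints "2 3"
```

The entry-counting version loses a book because the partial result is counted as a single entry, while the summing version still arrives at 3.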
6 Run it and look at the result:
> var count = db.books.mapReduce(map, reduce, {out: "book_results"});
> db[count.result].find()
{ "_id" : "Big Books", "value" : { "books" : 2 } }
{ "_id" : "Small Books", "value" : { "books" : 3 } }
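To see how the two counts come about, the pipeline can be imitated in plain JavaScript without a server. This is only a sketch: `simulateMapReduce` is a hypothetical helper, not a MongoDB API, `emit` is passed to map as an argument instead of being a global, and the map used here emits a counter `{books: 1}`.

```javascript
// In-memory stand-in for db.books, holding the same five documents.
const books = [
  { name: "Understanding JAVA", pages: 100 },
  { name: "Understanding JSON", pages: 200 },
  { name: "Understanding XML", pages: 300 },
  { name: "Understanding Web Services", pages: 400 },
  { name: "Understanding Axis2", pages: 150 },
];

// Map: classify by page count and emit a counter under the category key.
function mapBooks(emit) {
  const category = this.pages >= 250 ? "Big Books" : "Small Books";
  emit(category, { books: 1 });
}

// Reduce: add up the counters collected for one key.
function reduceBooks(key, values) {
  let sum = 0;
  values.forEach((doc) => { sum += doc.books; });
  return { books: sum };
}

// Hypothetical helper: run map on every document, group the emitted
// values by key, then reduce each group to a single value.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, map runs with `this` bound to the current document.
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  // Sort by key so the order is stable.
  return results.sort((a, b) => (a._id < b._id ? -1 : 1));
}

const bookCounts = simulateMapReduce(books, mapBooks, reduceBooks);
for (const row of bookCounts) console.log(JSON.stringify(row));
```

The two rows it prints carry the same counts as the shell output above: 2 big books and 3 small books.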
7 Now do the same thing from Java. Download the MongoDB Java driver first; the code is as follows:
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.Mongo;

public class MongoClient {
    public static void main(String[] args) {
        Mongo mongo = null;
        try {
            mongo = new Mongo("localhost", 27017);
            // Note: this uses the "library" database, not "test" as in the shell steps
            DB db = mongo.getDB("library");
            DBCollection books = db.getCollection("books");

            // Insert the same five books as in the shell example
            BasicDBObject book = new BasicDBObject();
            book.put("name", "Understanding JAVA");
            book.put("pages", 100);
            books.insert(book);
            book = new BasicDBObject();
            book.put("name", "Understanding JSON");
            book.put("pages", 200);
            books.insert(book);
            book = new BasicDBObject();
            book.put("name", "Understanding XML");
            book.put("pages", 300);
            books.insert(book);
            book = new BasicDBObject();
            book.put("name", "Understanding Web Services");
            book.put("pages", 400);
            books.insert(book);
            book = new BasicDBObject();
            book.put("name", "Understanding Axis2");
            book.put("pages", 150);
            books.insert(book);

            // Map: emit a counter of 1 under the book's size category
            String map = "function() { " +
                    "var category; " +
                    "if ( this.pages >= 250 ) " +
                    "category = 'Big Books'; " +
                    "else " +
                    "category = 'Small Books'; " +
                    "emit(category, {books: 1});}";
            // Reduce: add up the counters; the result has the same shape as the emitted values
            String reduce = "function(key, values) { " +
                    "var sum = 0; " +
                    "values.forEach(function(doc) { " +
                    "sum += doc.books; " +
                    "}); " +
                    "return {books: sum};} ";

            // INLINE returns the results directly instead of writing them to a collection
            MapReduceCommand cmd = new MapReduceCommand(books, map, reduce,
                    null, MapReduceCommand.OutputType.INLINE, null);
            MapReduceOutput out = books.mapReduce(cmd);
            for (DBObject o : out.results()) {
                System.out.println(o.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (mongo != null) {
                mongo.close();
            }
        }
    }
}
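The final argument of the MapReduceCommand constructor, null in the listing above, is a query that selects which documents are passed to map. The effect can be sketched in plain JavaScript, under the assumption that a simple predicate stands in for a query such as `{pages: {$gte: 150}}`; the variable names are illustrative only.

```javascript
// Same five books as in the example, held in memory.
const library = [
  { name: "Understanding JAVA", pages: 100 },
  { name: "Understanding JSON", pages: 200 },
  { name: "Understanding XML", pages: 300 },
  { name: "Understanding Web Services", pages: 400 },
  { name: "Understanding Axis2", pages: 150 },
];

// Stand-in for the query: only documents with pages >= 150 reach map.
const selected = library.filter((book) => book.pages >= 150);

// Map and reduce collapsed into one pass: count books per category.
const filteredCounts = {};
for (const book of selected) {
  const category = book.pages >= 250 ? "Big Books" : "Small Books";
  filteredCounts[category] = (filteredCounts[category] || 0) + 1;
}

console.log(filteredCounts);
```

With the 100-page book filtered out, each category ends up with 2 books.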
1 Given a persons collection in which each document has a country field, use MapReduce to count the number of people in each country
map = function() {
  emit(this.country, {count: 1});
};
reduce = function(key, values) {
  var total = {count: 0};
  for (var i = 0; i < values.length; i++)
    total.count += values[i].count;
  // Return the same shape as the emitted values, not {count: total}
  return total;
};
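One detail worth knowing here: MongoDB does not call reduce for a key that has only one emitted value, so that value becomes the result as it is. That is another reason for map to emit `{count: 1}` in the same shape reduce returns. A minimal sketch in plain JavaScript, with made-up sample documents (the article's real data is not reproduced) and hypothetical names:

```javascript
// Hypothetical sample documents, for illustration only.
const persons = [
  { name: "p1", country: "China" },
  { name: "p2", country: "China" },
  { name: "p3", country: "Korea" },
];

// Stand-in for the map phase: collect an emitted {count: 1} per person,
// grouped by country.
const emittedByCountry = new Map();
for (const person of persons) {
  const values = emittedByCountry.get(person.country) || [];
  values.push({ count: 1 });
  emittedByCountry.set(person.country, values);
}

let reduceCalls = 0;
function reduceCounts(key, values) {
  reduceCalls += 1;
  const total = { count: 0 };
  for (const v of values) total.count += v.count;
  return total; // same shape as each emitted value
}

const countryCounts = {};
for (const [country, values] of emittedByCountry) {
  // A key with a single value skips reduce; the emitted value is the result.
  countryCounts[country] =
    values.length === 1 ? values[0] : reduceCounts(country, values);
}

console.log(JSON.stringify(countryCounts), reduceCalls);
```

In this sketch reduce runs once (for China), and Korea's result is simply the `{count: 1}` that map emitted.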
Running the same MapReduce from Java:
public static void main(String[] args) {
    // MongoDb is this example's own wrapper class, not a driver class;
    // its mapReduce method is listed after main
    MongoDb db = new MongoDb("foobar");
    String map = "function() { emit(this.country, {count: 1}); }";
    String reduce = "function(key, values) { var total = {count: 0}; "
            + "for (var i = 0; i < values.length; i++) total.count += values[i].count; "
            + "return total; }";
    MapReduceOutput mop = db.mapReduce("persons", map, reduce, null,
            MapReduceCommand.OutputType.INLINE, null);
    for (DBObject dbObject : mop.results()) {
        String _id = dbObject.get("_id").toString();
        // JavaScript numbers usually come back as Double; casting to Number
        // also covers integer types
        Number count = (Number) ((DBObject) dbObject.get("value")).get("count");
        System.out.println(_id + " " + count.longValue());
    }
}

// Method of the MongoDb wrapper class; db is its DB field
public MapReduceOutput mapReduce(String collName, String map, String reduce,
        String outputTarget, MapReduceCommand.OutputType outputType, DBObject queryObj) {
    DBCollection collection = db.getCollection(collName);
    // Pass the caller's output settings through instead of hard-coding them
    return collection.mapReduce(map, reduce, outputTarget, outputType, queryObj);
}
The output is:
American 2
China 5
Korea 1