Comparing group(), mapReduce(), and aggregate() in MongoDB

In SQL, listing the numbers of all members of each team in the users table can be written roughly as follows:

SELECT team, no FROM users GROUP BY team                             (1)

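In MongoDB, achieving the same result is more involved.

As of MongoDB 2.2, three functions can do this. In the order they were introduced, they are group(), mapReduce(), and aggregate().

How do they differ? The summary below is based on the Stack Overflow discussion at http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce: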

1.     db.collection.group().

Definition (the method takes a single document with these fields):

db.collection.group({
			key,
			reduce,
			initial,
			keyf,
			cond,
			finalize
})

Characteristics:

  • Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
  • Returns result set inline (as an array of grouped items).
  • Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
  • Current Limitations
    • Will not group into a result set with more than 20,000 keys (10,000 before MongoDB 2.2).
    • Results must fit within the limitations of a BSON document (currently 16 MB).
    • Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
    • Does not work with sharded collections.

Ex: statement (1) can be implemented as follows:

db.users.group({key: {team: 1}, initial: {members: []}, reduce: function(cur, result){result.members.push(cur.no);}});
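
To illustrate how key, initial, and reduce interact, here is a minimal plain-JavaScript sketch that emulates the grouping over an in-memory array. It is not MongoDB's implementation: the sample documents and the helper name emulateGroup are assumptions made for the example, and it runs in any JavaScript runtime without a database.

```javascript
// Hypothetical sample documents standing in for the users collection.
var users = [
  { team: "a", no: 1 },
  { team: "b", no: 2 },
  { team: "a", no: 3 }
];

// Emulation of group() semantics for a single key field (illustrative only).
function emulateGroup(docs, keyField, initial, reduce) {
  var buckets = {}; // one accumulator per distinct key value
  var order = [];   // first-seen order of the keys
  docs.forEach(function (doc) {
    var k = JSON.stringify(doc[keyField]);
    if (!(k in buckets)) {
      // Every group starts from the key plus its own deep copy of `initial`.
      var acc = {};
      acc[keyField] = doc[keyField];
      var init = JSON.parse(JSON.stringify(initial));
      Object.keys(init).forEach(function (f) { acc[f] = init[f]; });
      buckets[k] = acc;
      order.push(k);
    }
    reduce(doc, buckets[k]); // reduce mutates the accumulator in place
  });
  return order.map(function (k) { return buckets[k]; });
}

var grouped = emulateGroup(users, "team", { members: [] }, function (cur, result) {
  result.members.push(cur.no);
});
console.log(JSON.stringify(grouped));
// prints [{"team":"a","members":[1,3]},{"team":"b","members":[2]}]
```

Note that reduce returns nothing here: as in the group() call above, it updates the per-group accumulator that started as a copy of initial.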

2.     db.collection.mapReduce().

Reportedly, mapReduce was added to cater to the popularity of the MapReduce model.

db.collection.mapReduce(
                         <mapfunction>,
                         <reducefunction>,
                         {
                           out: <collection>,
                           query: <document>,
                           sort: <document>,
                           limit: <number>,
                           finalize: <function>,
                           scope: <document>,
                           jsMode: <boolean>,
                           verbose: <boolean>
                         }
                       )

Characteristics:

  • Implements the MapReduce model for processing large data sets.
  • Can choose from one of several output options (inline, new collection, merge, replace, reduce)
  • MapReduce functions are written in JavaScript.
  • Supports non-sharded and sharded input collections.
  • Can be used for incremental aggregation over large collections.
  • MongoDB 2.2 implements much better support for sharded map reduce output.
  • Current Limitations
    • There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a time; however, most steps of the MapReduce are very short, so locks can be yielded frequently.
    • MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
    • MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.

Because it relies on the JavaScript engine, it is relatively slow; see http://technicaldebt.com/?p=1157 for details.

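Ex: statement (1) can be implemented as follows:

// map emits the same shape that reduce returns, because reduce may run again on its own output.
var map = function(){ emit(this.team, {members: [this.no]}); };
var reduce = function(key, values){
  return {members: Array.prototype.concat.apply([], values.map(function(v){ return v.members; }))};
};
db.users.mapReduce(map, reduce, {out: "team_member"});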
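
MongoDB may call reduce more than once for the same key, feeding earlier outputs back in as inputs, so the value reduce returns needs the same shape as the values map emits. The following is a minimal plain-JavaScript sketch of that behaviour, not MongoDB's implementation: the sample data, the local emit function, and the helper reduceInChunks are all assumptions made for the example.

```javascript
// Hypothetical sample documents standing in for the users collection.
var users = [
  { team: "a", no: 1 },
  { team: "a", no: 3 },
  { team: "a", no: 5 },
  { team: "b", no: 2 }
];

// Local stand-in for the emit() that MongoDB provides inside map.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var map = function () { emit(this.team, { members: [this.no] }); };
var reduce = function (key, values) {
  var members = [];
  values.forEach(function (v) { members = members.concat(v.members); });
  return { members: members };
};

users.forEach(function (doc) { map.call(doc); });

// Reduce in chunks and re-reduce the partial results until one value is left.
// chunkSize is assumed to be at least 2; keys with a single value skip reduce.
function reduceInChunks(key, values, chunkSize) {
  while (values.length > 1) {
    var next = [];
    for (var i = 0; i < values.length; i += chunkSize) {
      next.push(reduce(key, values.slice(i, i + chunkSize)));
    }
    values = next;
  }
  return values[0];
}

var results = Object.keys(emitted).map(function (key) {
  return { _id: key, value: reduceInChunks(key, emitted[key], 2) };
});
console.log(JSON.stringify(results));
// prints [{"_id":"a","value":{"members":[1,3,5]}},{"_id":"b","value":{"members":[2]}}]
```

Because reduce both accepts and returns { members: [...] }, the result is the same however the values are chunked. A reduce that returned a different shape would break on the second pass.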

3.     db.collection.aggregate().

For simpler tasks, mapReduce is a big hammer. The aggregation framework avoids the overhead of the JavaScript engine and can also select matching subdocuments and arrays. It is implemented as a pipeline in C++.

The pipeline defines the following operators:

$match – query predicate as a filter.

$project – use a sample document to determine the shape of the result.

$unwind – hands out array elements one at a time.

$group – aggregates items into buckets defined by a key.

$sort – sort documents.

$limit – allow the specified number of documents to pass.

$skip – skip over the specified number of documents.
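
The pipeline idea can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array. This is only an emulation under stated assumptions: the sample documents, the stages object, and runPipeline are hypothetical names, and the real operators take documents such as {$match: {...}} rather than callbacks.

```javascript
// Hypothetical documents; each one carries an array of tags.
var docs = [
  { name: "ann", team: "a", tags: ["x", "z"] },
  { name: "bob", team: "b", tags: ["v"] },
  { name: "cat", team: "a", tags: ["w", "y"] }
];

// Each factory returns a stage: a function from documents to documents.
var stages = {
  match: function (pred) { return function (ds) { return ds.filter(pred); }; },
  unwind: function (field) {
    return function (ds) {
      var out = [];
      ds.forEach(function (d) {
        d[field].forEach(function (item) {
          var copy = {};
          Object.keys(d).forEach(function (k) { copy[k] = d[k]; });
          copy[field] = item; // one output document per array element
          out.push(copy);
        });
      });
      return out;
    };
  },
  sort: function (field) {
    return function (ds) {
      return ds.slice().sort(function (a, b) {
        return a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0;
      });
    };
  },
  skip: function (n) { return function (ds) { return ds.slice(n); }; },
  limit: function (n) { return function (ds) { return ds.slice(0, n); }; }
};

// Feed the output of each stage into the next one.
function runPipeline(ds, pipeline) {
  return pipeline.reduce(function (acc, stage) { return stage(acc); }, ds);
}

var out = runPipeline(docs, [
  stages.match(function (d) { return d.team === "a"; }),
  stages.unwind("tags"),
  stages.sort("tags"),
  stages.skip(1),
  stages.limit(2)
]);
console.log(out.map(function (d) { return d.name + ":" + d.tags; }).join(","));
// prints ann:x,cat:y
```

The unwind stage emits more documents than it receives and the match stage emits fewer, which is why a stage need not produce one output per input.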

Characteristics:

  • New feature in the MongoDB 2.2.0 production release (August, 2012).
  • Designed with specific goals of improving performance and usability.
  • Returns result set inline.
  • Supports non-sharded and sharded input collections.
  • Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
  • Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
  • Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
  • Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
  • Current Limitations
    • Results are returned inline, so are limited to the maximum document size supported by the server (16 MB).
    • Doesn't support as many output options as MapReduce
    • Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
    • Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.

Ex: statement (1) can be implemented as follows:

db.users.aggregate({$project: {team: 1, no: 1}}, {$group: { _id: "$team", members: {$addToSet: "$no"}}});
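
One detail worth noting is that $addToSet keeps each value once per group, whereas pushing onto an array (as in the group() example, or with $push) keeps duplicates. The plain-JavaScript sketch below emulates both behaviours. The sample data and the helper groupValues are assumptions made for the example, and MongoDB does not guarantee the order of elements produced by $addToSet.

```javascript
// Hypothetical sample documents; note the duplicate { team: "a", no: 1 }.
var users = [
  { team: "a", no: 1 },
  { team: "a", no: 1 },
  { team: "a", no: 3 },
  { team: "b", no: 2 }
];

// Collect valueField per keyField; dedupe=true mimics $addToSet, false mimics $push.
function groupValues(docs, keyField, valueField, dedupe) {
  var byKey = {};
  var order = [];
  docs.forEach(function (d) {
    var k = d[keyField];
    if (!(k in byKey)) {
      byKey[k] = [];
      order.push(k);
    }
    if (!dedupe || byKey[k].indexOf(d[valueField]) === -1) {
      byKey[k].push(d[valueField]);
    }
  });
  return order.map(function (k) { return { _id: k, members: byKey[k] }; });
}

var asSet = groupValues(users, "team", "no", true);
var asList = groupValues(users, "team", "no", false);
console.log(JSON.stringify(asSet));  // prints [{"_id":"a","members":[1,3]},{"_id":"b","members":[2]}]
console.log(JSON.stringify(asList)); // prints [{"_id":"a","members":[1,1,3]},{"_id":"b","members":[2]}]
```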

Refs:

http://docs.mongodb.org/manual/aggregation/#Aggregation-Examples

http://docs.mongodb.org/manual/reference/method/db.collection.group/

http://technicaldebt.com/?p=1157

http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce





