1. http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/
The most important points in this article are:
Reduce takes 2 parameters – 1) Key 2) An array of values (number of values outputted from Map step). Output of Reduce is an object. It is important to note that Reduce can be called multiple times on a single key! Yes, you read it correctly. It is not that difficult to think actually – consider a case where your data is huge and it lies on 2 different servers. It would be ideal to perform a Reduce on the given key on first server, and then perform a Reduce for the same key on second server. And then do a Reduce on the results of these two reduced values.
The picture above shows Reduce being called twice. This is just an example. To be frank, we don’t know how MongoDB executes Reduce. We don’t know which key it is going to be reduced first and which key last. We also don’t know how many times it is going to call reduce for a key. This optimization is better left with MongoDB itself as it finds the most suitable parallel execution for every MapReduce command.
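To make the "Reduce can be called multiple times on a single key" point concrete, here is a minimal sketch in plain JavaScript that runs without MongoDB. The function `reduceSum`, the two "servers", and the sample values are all hypothetical and made up for illustration. It only simulates the idea from the quote; it does not show how MongoDB actually schedules reduce calls.

```javascript
// Plain JavaScript simulation; all names and values here are hypothetical.

// Reduce for a per-key sum. The return value has the same type as each
// input value (a number), so the output can be fed back into reduce.
function reduceSum(key, values) {
  let total = 0;
  for (const v of values) {
    total += v;
  }
  return total;
}

// Values emitted by map for the same key, split across two imaginary servers.
const server1Values = [1, 2, 3];
const server2Values = [4, 5];

// Each server reduces its own share first...
const partial1 = reduceSum("a", server1Values);
const partial2 = reduceSum("a", server2Values);

// ...and the two partial results are then reduced once more.
const combined = reduceSum("a", [partial1, partial2]);

// A single pass over all values must give the same answer.
const singlePass = reduceSum("a", server1Values.concat(server2Values));

console.log(combined, singlePass); // prints: 15 15
```

Because the output of `reduceSum` has the same type as its inputs, it makes no difference whether reduce runs once or several times for the key.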
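Also worth noting are the second picture in the example, which illustrates how reduce works, and its accompanying explanation:
2. http://www.infoq.com/cn/articles/implementing-aggregation-functions-in-mongodb
This blog post is mainly useful for understanding mapReduce through concrete examples and statements.
These two articles explain how reduce works in a fair amount of detail. There is also the description of mapReduce in the official documentation: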
3. http://docs.mongodb.org/manual/reference/command/mapReduce/#dbcmd.mapReduce
In the official documentation, pay attention to the following parts:
- the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true:
  reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
- the reduce function must be idempotent. Ensure that the following statement is true:
  reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
- the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true:
  reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
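The three requirements above can be checked by hand for a candidate reduce function. The following is a minimal sketch in plain JavaScript, with no MongoDB involved. `reduceStats`, `sameStats`, and the sample values `A`, `B`, `C` are hypothetical names chosen for this illustration, and it assumes a map function that emits objects of the form `{ count, total }`.

```javascript
// Hypothetical reduce: each value is { count, total }, as a map function might emit.
// The result has the same shape, so it can be passed back into reduce.
function reduceStats(key, values) {
  const result = { count: 0, total: 0 };
  for (const v of values) {
    result.count += v.count;
    result.total += v.total;
  }
  return result;
}

// Field-by-field comparison, sufficient for these small objects
// (== on objects would only compare references).
function sameStats(x, y) {
  return x.count === y.count && x.total === y.total;
}

const A = { count: 1, total: 10 };
const B = { count: 1, total: 20 };
const C = { count: 1, total: 30 };
const key = "k";

// 1) Output type matches the emitted type, so nested calls agree with a flat call.
const reReducible = sameStats(
  reduceStats(key, [C, reduceStats(key, [A, B])]),
  reduceStats(key, [C, A, B])
);

// 2) Idempotence: reducing an already reduced value changes nothing.
const idempotent = sameStats(
  reduceStats(key, [reduceStats(key, [A, B, C])]),
  reduceStats(key, [A, B, C])
);

// 3) The order of the values does not matter.
const orderIndependent = sameStats(
  reduceStats(key, [A, B]),
  reduceStats(key, [B, A])
);

console.log({ reReducible, idempotent, orderIndependent });
```

All three checks come out `true` for this reducer. By contrast, a reducer that simply returned `values.length` would fail the first check: the nested call sees two elements and returns 2, while the flat call returns 3.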
Study these carefully; there is still a long way to go.