1.【count】
集合的count函数是最简单的聚合函数,返回集合中文档的数量,也可以接受一个查询文档,统计符合这个查询的文档数量:
01.
> db.user.find();
02.
{
"_id"
: ObjectId(
"5020faf5d6acd1b2a3fb316f"
),
"name"
:
"tim"
,
"age"
: 40,
"registered"
: ISODate(
"2007-03-02T16:00:00Z"
) }
03.
{
"_id"
: ObjectId(
"5020fb08d6acd1b2a3fb3170"
),
"name"
:
"tom"
,
"age"
: 29,
"registered"
: ISODate(
"2009-07-02T16:00:00Z"
) }
04.
{
"_id"
: ObjectId(
"5020fb27d6acd1b2a3fb3171"
),
"name"
:
"jimmy"
,
"age"
: 18,
"registered"
: ISODate(
"2009-09-02T16:00:00Z"
) }
05.
> db.user.count();
06.
3
07.
> db.user.count({
"name"
:
"tim"
});
08.
1
09.
>
如上例,不使用任何参数的count,不论集合有多大,都会很快返回结果!使用参数后,会让count函数执行变慢。
2.【distinct】
用于找出一个集合中,给定键的所有不同的值!
1.
> db.user.find();
2.
{
"_id"
: ObjectId(
"5020faf5d6acd1b2a3fb316f"
),
"name"
:
"tim"
,
"age"
: 40,
"registered"
: ISODate(
"2007-03-02T16:00:00Z"
) }
3.
{
"_id"
: ObjectId(
"5020fb08d6acd1b2a3fb3170"
),
"name"
:
"tom"
,
"age"
: 29,
"registered"
: ISODate(
"2009-07-02T16:00:00Z"
) }
4.
{
"_id"
: ObjectId(
"5020fb27d6acd1b2a3fb3171"
),
"name"
:
"jimmy"
,
"age"
: 18,
"registered"
: ISODate(
"2009-09-02T16:00:00Z"
) }
5.
> db.user.distinct(
"name"
);
6.
[
"jimmy"
,
"tim"
,
"tom"
]
7.
>
我们往集合的distinct函数中传递键的名称,返回一个数组,包含这个键在集合中的所有值!我们还可通过在数据库上运行命令,来执行distinct聚合函数,此时我们必须指定集合和键:
01.
> db.user.find();
02.
{
"_id"
: ObjectId(
"5020faf5d6acd1b2a3fb316f"
),
"name"
:
"tim"
,
"age"
: 40,
"registered"
: ISODate(
"2007-03-02T16:00:00Z"
) }
03.
{
"_id"
: ObjectId(
"5020fb08d6acd1b2a3fb3170"
),
"name"
:
"tom"
,
"age"
: 29,
"registered"
: ISODate(
"2009-07-02T16:00:00Z"
) }
04.
{
"_id"
: ObjectId(
"5020fb27d6acd1b2a3fb3171"
),
"name"
:
"jimmy"
,
"age"
: 18,
"registered"
: ISODate(
"2009-09-02T16:00:00Z"
) }
05.
> db.user.distinct(
"name"
);
06.
[
"jimmy"
,
"tim"
,
"tom"
]
07.
> db.runCommand({
"distinct"
:
"user"
,
"key"
:
"name"
});
08.
{
09.
"values"
: [
10.
"jimmy"
,
11.
"tim"
,
12.
"tom"
13.
],
14.
"stats"
: {
15.
"n"
: 3,
16.
"nscanned"
: 3,
17.
"nscannedObjects"
: 0,
18.
"timems"
: 78,
19.
"cursor"
:
"BtreeCursor name_1_age_1"
20.
},
21.
"ok"
: 1
22.
}
命令接受的文档参数中,键"distinct"指定统计的集合名称,键"key"指定统计的键的名称!返回一个文档,键“value”指定统计的键在该集合中的所有值!我们还可以看出,在统计时还使用了索引!
3.【group】
group聚合可以做稍微复杂一些的操作,其执行过程为:先按照指定的键对集合中的文档进行分组,然后通过聚合每一组中的所有文档,来产生最终的结果文档。我们现在有这样一个集合,其中记录了所有蔬菜的价格并会定期刷新,我们要统计一下各个蔬菜的最新价格,即我们首先需要按照蔬菜名称进行分组,然后对每一组文档进行处理,找出最新的价格,我们先通过运行数据库命令的方式来使用group:
01.
> db.vegetableprice.find();
02.
{
"_id"
: ObjectId(
"50271b4ae02ab93d5c5be795"
),
"name"
:
"tomato"
,
"price"
: 3.3,
"time"
: ISODate(
"2012-08-12T02:56:10.303Z"
) }
03.
{
"_id"
: ObjectId(
"50271b58e02ab93d5c5be796"
),
"name"
:
"tomato"
,
"price"
: 3.5,
"time"
: ISODate(
"2012-08-12T02:56:24.843Z"
) }
04.
{
"_id"
: ObjectId(
"50271b95e02ab93d5c5be797"
),
"name"
:
"eggplant"
,
"price"
: 5.6,
"time"
: ISODate(
"2012-08-12T02:57:25.605Z"
) }
05.
{
"_id"
: ObjectId(
"50271ba6e02ab93d5c5be798"
),
"name"
:
"cucumber"
,
"price"
: 4.7,
"time"
: ISODate(
"2012-08-12T02:57:42.031Z"
) }
06.
{
"_id"
: ObjectId(
"50271bafe02ab93d5c5be799"
),
"name"
:
"eggplant"
,
"price"
: 5.9,
"time"
: ISODate(
"2012-08-12T02:57:51.001Z"
) }
07.
{
"_id"
: ObjectId(
"50271bb7e02ab93d5c5be79a"
),
"name"
:
"cucumber"
,
"price"
: 4.3,
"time"
: ISODate(
"2012-08-12T02:57:59.363Z"
) }
08.
{
"_id"
: ObjectId(
"50271bdae02ab93d5c5be79b"
),
"name"
:
"bean"
,
"price"
: 8.9,
"time"
: ISODate(
"2012-08-12T02:58:34.931Z"
) }
09.
> db.runCommand({
"group"
: {
10.
...
"ns"
:
"vegetableprice"
,
11.
...
"key"
: {
"name"
:
true
},
12.
...
"initial"
: {
"time"
: 0},
13.
...
"$reduce"
:
function
(doc, prev) {
14.
...
if
(doc.time > prev.time) {
15.
... prev.time = doc.time;
16.
... prev.price = doc.price;
17.
... }
18.
... }
19.
... }});
20.
{
21.
"retval"
: [
22.
{
23.
"name"
:
"tomato"
,
24.
"time"
: ISODate(
"2012-08-12T02:56:24.843Z"
),
25.
"price"
: 3.5
26.
},
27.
{
28.
"name"
:
"eggplant"
,
29.
"time"
: ISODate(
"2012-08-12T02:57:51.001Z"
),
30.
"price"
: 5.9
31.
},
32.
{
33.
"name"
:
"cucumber"
,
34.
"time"
: ISODate(
"2012-08-12T02:57:59.363Z"
),
35.
"price"
: 4.3
36.
},
37.
{
38.
"name"
:
"bean"
,
39.
"time"
: ISODate(
"2012-08-12T02:58:34.931Z"
),
40.
"price"
: 8.9
41.
}
42.
],
43.
"count"
: 7,
44.
"keys"
: 4,
45.
"ok"
: 1
46.
}
47.
>
从返回的结果文档中,我们可以看到,键“retval”指向的文档数组就是我们需要的所有蔬菜最新价格。我们来看看,运行“group”命令的参数文档的所有键的意义:
1》 “ns” : 字符串,指定要进行操作的集合名称
2》 “key”: 文档,指定要进行分组所使用的键,此处可以指定多个键,分组将按照这些键的值进行,此处会区分大小写!对于某些应用,我们可能要分组不去区分值的大小写,这个后面会提到如何进行!
3》 “initial” : 累加器文档!分组后,会对每一组执行reduce函数,reduce函数会接受两个文档参数,第一个是分组当前遍历的文档,第二个就是initial键指定的累加器文档!每一组所有文档执行reduce都会使用这个文档,所以改变会一直保留在累加器文档中!
4》 “$reduce”: 函数。分组后,会通过这个函数对每个分组进行聚合!注意聚合时,每一组会有一个独立的累加器文档,结束后,累加器文档中即记录聚合结果!
5》 “condition”: 条件。在对集合进行分组时,我们可以过滤部分文档,这里的过滤条件就是我们前面提到的查询条件!可以使用前面提到的各种查询条件操作符!这个键也可以缩写为“cond” 或 “q” 。如上例,我们只想查看“tomato” 和“cucumber” 的最新价格,我们可以这样写:(分别演示了这个键的3种写法)
01.
> db.runCommand({
"group"
:{
02.
...
"ns"
:
"vegetableprice"
,
03.
...
"key"
:{
"name"
:
true
},
04.
...
"initial"
:{
"time"
:0},
05.
...
"$reduce"
:
function
(doc, prev){
06.
...
if
(doc.time > prev.time){
07.
... prev.time = doc.time;
08.
... prev.price = doc.price;
09.
... }},
10.
...
"condition"
:{
"name"
:{
"$in"
:[
"tomato"
,
"cucumber"
]}}
11.
... }});
12.
{
13.
"retval"
: [
14.
{
15.
"name"
:
"tomato"
,
16.
"time"
: ISODate(
"2012-08-12T02:56:24.843Z"
),
17.
"price"
: 3.5
18.
},
19.
{
20.
"name"
:
"cucumber"
,
21.
"time"
: ISODate(
"2012-08-12T02:57:59.363Z"
),
22.
"price"
: 4.3
23.
}
24.
],
25.
"count"
: 4,
26.
"keys"
: 2,
27.
"ok"
: 1
28.
}
29.
> db.runCommand({
"group"
: {
30.
...
"ns"
:
"vegetableprice"
,
31.
...
"key"
: {
"name"
:
true
},
32.
...
"initial"
: {
"time"
: 0},
33.
...
"$reduce"
:
function
(doc, prev){
34.
...
if
(doc.time > prev.time){
35.
... prev.time = doc.time;
36.
... prev.price = doc.price;
37.
... }},
38.
...
"cond"
:{
"name"
: {
"$in"
: [
"tomato"
,
"cucumber"
]}}
39.
... }});
40.
{
41.
"retval"
: [
42.
{
43.
"name"
:
"tomato"
,
44.
"time"
: ISODate(
"2012-08-12T02:56:24.843Z"
),
45.
"price"
: 3.5
46.
},
47.
{
48.
"name"
:
"cucumber"
,
49.
"time"
: ISODate(
"2012-08-12T02:57:59.363Z"
),
50.
"price"
: 4.3
51.
}
52.
],
53.
"count"
: 4,
54.
"keys"
: 2,
55.
"ok"
: 1
56.
}
57.
> db.runCommand({
"group"
: {
58.
...
"ns"
:
"vegetableprice"
,
59.
...
"key"
: {
"name"
:
true
},
60.
...
"initial"
: {
"time"
: 0},
61.
...
"$reduce"
:
function
(doc, prev){
62.
...
if
(doc.time > prev.time){
63.
... prev.time = doc.time;
64.
... prev.price = doc.price;
65.
... }},
66.
...
"q"
:{
"name"
:{
"$in"
:[
"tomato"
,
"cucumber"
]}}
67.
... }});
68.
{
69.
"retval"
: [
70.
{
71.
"name"
:
"tomato"
,
72.
"time"
: ISODate(
"2012-08-12T02:56:24.843Z"
),
73.
"price"
: 3.5
74.
},
75.
{
76.
"name"
:
"cucumber"
,
77.
"time"
: ISODate(
"2012-08-12T02:57:59.363Z"
),
78.
"price"
: 4.3
79.
}
80.
],
81.
"count"
: 4,
82.
"keys"
: 2,
83.
"ok"
: 1
84.
}
85.
>
【使用完成器】
上述我们可以看到group操作的返回值,其中键“retval”指向所有聚合后的文档,这些就是客户端需要的文档。group操作的返回值文档最大为2万条记录,因为有这个大小的限制和考虑到效率,我们有时需要对这些聚合后得到的文档进行进一步地修剪后再返回!比如,我们有这样一个blog集合,其中每一个文档是一个blog,我们这次要统计的是,每个作者在写博客时最常使用的tag(标签),我们先看看这个集合结构:
1.
> db.blog.find();
2.
{
"_id"
: ObjectId(
"50272f94e02ab93d5c5be79c"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"dog"
] }
3.
{
"_id"
: ObjectId(
"50272f9ee02ab93d5c5be79d"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"pig"
] }
4.
{
"_id"
: ObjectId(
"50272fcee02ab93d5c5be79e"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"hamster"
] }
5.
{
"_id"
: ObjectId(
"50272fe4e02ab93d5c5be79f"
),
"author"
:
"tom"
,
"content"
:
"..."
,
"tag"
: [
"db"
,
"oracle"
,
"mysql"
] }
6.
{
"_id"
: ObjectId(
"50272fede02ab93d5c5be7a0"
),
"author"
:
"tom"
,
"content"
:
"..."
,
"tag"
: [
"db"
,
"mongodb"
,
"mysql"
] }
7.
>
按照上述的统计方式,我们会这样做:
01.
> db.blog.find();
02.
{
"_id"
: ObjectId(
"50272f94e02ab93d5c5be79c"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"dog"
] }
03.
{
"_id"
: ObjectId(
"50272f9ee02ab93d5c5be79d"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"pig"
] }
04.
{
"_id"
: ObjectId(
"50272fcee02ab93d5c5be79e"
),
"author"
:
"jim"
,
"content"
:
"..."
,
"tag"
: [
"cat"
,
"pet"
,
"hamster"
] }
05.
{
"_id"
: ObjectId(
"50272fe4e02ab93d5c5be79f"
),
"author"
:
"tom"
,
"content"
:
"..."
,
"tag"
: [
"db"
,
"oracle"
,
"mysql"
] }
06.
{
"_id"
: ObjectId(
"50272fede02ab93d5c5be7a0"
),
"author"
:
"tom"
,
"content"
:
"..."
,
"tag"
: [
"db"
,
"mongodb"
,
"mysql"
] }
07.
> db.runCommand({
"group"
: {
08.
...
"ns"
:
"blog"
,
09.
...
"key"
: {
"author"
:
true
},
10.
...
"initial"
: {
"tag"
:{}},
11.
...
"$reduce"
:
function
(doc, prev){
12.
...
for
(
var
i
in
doc.tag){
13.
...
if
(doc.tag
in
prev.tag){
14.
... prev.tag[doc.tag]++;
15.
... }
else
{
16.
... prev.tag[doc.tag] = 1;
17.
... }
18.
... }
19.
... }
20.
... }});
21.
{
22.
"retval"
: [
23.
{
24.
"author"
:
"jim"
,
25.
"tag"
: {
26.
"cat"
: 3,
27.
"pet"
: 3,
28.
"dog"
: 1,
29.
"pig"
: 1,
30.
"hamster"
: 1
31.
}
32.
},
33.
{
34.
"author"
:
"tom"
,
35.
"tag"
: {
36.
"db"
: 2,
37.
"oracle"
: 1,
38.
"mysql"
: 2,
39.
"mongodb"
: 1
40.
}
41.
}
42.
],
43.
"count"
: 5,
44.
"keys"
: 2,
45.
"ok"
: 1
46.
}
47.
>
从上述结果我们可以看出,返回值中,包含所有作者使用的所有的tag的次数统计,我们可以将这个文档返回给客户端,然后由客户端去比较得到每个作者用得最多的tag。但这样做会产生很多开销影响效率,对于大的集合的统计尤为明显!我们此时可以通过在统计时使用“finalize”键来进一步处理reduce后的结果:
01.
> db.runCommand({
"group"
: {
02.
...
"ns"
:
"blog"
,
03.
...
"key"
: {
"author"
:
true
},
04.
...
"initial"
: {
"tag"
: {}},
05.
...
"$reduce"
:
function
(doc, prev) {
06.
...
for
(
var
i
in
doc.tag){
07.
...
if
(doc.tag
in
prev.tag){
08.
... prev.tag[doc.tag]++;
09.
... }
else
{
10.
... prev.tag[doc.tag] = 1;
11.
... }
12.
... }
13.
... },
14.
...
"finalize"
:
function
(reducedoc) {
15.
...
var
mostpopular = 0;
16.
...
for
(
var
i
in
reducedoc.tag) {
17.
...
if
(reducedoc.tag > mostpopular) {
18.
... mostpopular = reducedoc.tag;
19.
... reducedoc.usemosttag = i;
20.
... }
21.
... }
22.
...
delete
reducedoc.tag;
23.
... }
24.
... }});
25.
{
26.
"retval"
: [
27.
{
28.
"author"
:
"jim"
,
29.
"usemosttag"
:
"cat"
30.
},
31.
{
32.
"author"
:
"tom"
,
33.
"usemosttag"
:
"db"
34.
}
35.
],
36.
"count"
: 5,
37.
"keys"
: 2,
38.
"ok"
: 1
39.
}
40.
>
通过在group操作中使用键"finalize",我们看到最后返回的结果简洁清晰了很多,并且就是客户想要的结果。这个finalize函数写得有点问题就是,如果使用最多的tag有多个,他只会得到第一个!这个可以通过改进这个finalize函数来进行!
【将函数作为键使用】
上面我们都是通过直接指定某一个键,让其值作为分组的基准!我们也提到了,按照值分组会区分值得大小写!我们同样拿查询蔬菜最新价格来距离,我们先看不做任何处理的结果:
01.
> db.vegetableprice.find();
02.
{
"_id"
: ObjectId(
"50273c44e02ab93d5c5be7a1"
),
"name"
:
"tomato"
,
"price"
: 3.5,
"time"
: ISODate(
"2012-08-12T05:16:52.106Z"
) }
03.
{
"_id"
: ObjectId(
"50273c4ce02ab93d5c5be7a2"
),
"name"
:
"Tomato"
,
"price"
: 3.5,
"time"
: ISODate(
"2012-08-12T05:17:00.686Z"
) }
04.
> db.runCommand({
"group"
: {
05.
...
"ns"
:
"vegetableprice"
,
06.
...
"key"
: {
"name"
:
true
},
07.
...
"initial"
: {
"time"
: 0},
08.
...
"$reduce"
:
function
(doc, prev) {
09.
...
if
(doc.time > prev.time) {
10.
... prev.time = doc.time;
11.
... prev.price = doc.price;
12.
... }
13.
... }
14.
... }});
15.
{
16.
"retval"
: [
17.
{
18.
"name"
:
"tomato"
,
19.
"time"
: ISODate(
"2012-08-12T05:16:52.106Z"
),
20.
"price"
: 3.5
21.
},
22.
{
23.
"name"
:
"Tomato"
,
24.
"time"
: ISODate(
"2012-08-12T05:17:00.686Z"
),
25.
"price"
: 3.5
26.
}
27.
],
28.
"count"
: 2,
29.
"keys"
: 2,
30.
"ok"
: 1
31.
}
32.
>
我们看到这不是我们想要的结果,group将“tomato” 和“Tomato”作为两个组来对待!通过在group函数中使用键“$keyf” 代替键“key” 来指定分组依赖的键,会解决这个问题,使用方位为:
01.
> db.vegetableprice.find();
02.
{
"_id"
: ObjectId(
"50273c44e02ab93d5c5be7a1"
),
"name"
:
"tomato"
,
"price"
: 3.5,
"time"
: ISODate(
"2012-08-12T05:16:52.106Z"
) }
03.
{
"_id"
: ObjectId(
"50273c4ce02ab93d5c5be7a2"
),
"name"
:
"Tomato"
,
"price"
: 3.5,
"time"
: ISODate(
"2012-08-12T05:17:00.686Z"
) }
04.
> db.runCommand({
"group"
: {
05.
...
"ns"
:
"vegetableprice"
,
06.
...
"$keyf"
:
function
(doc){
return
{
"name"
: doc.name.toLowerCase()};},
07.
...
"initial"
: {
"time"
: 0},
08.
...
"$reduce"
:
function
(doc, prev){
09.
...
if
(doc.time > prev.time){
10.
... prev.time = doc.time;
11.
... prev.price = doc.price;
12.
... }
13.
... }
14.
... }});
15.
{
16.
"retval"
: [
17.
{
18.
"name"
:
"tomato"
,
19.
"time"
: ISODate(
"2012-08-12T05:17:00.686Z"
),
20.
"price"
: 3.5
21.
}
22.
],
23.
"count"
: 2,
24.
"keys"
: 1,
25.
"ok"
: 1
26.
}
27.
>
注意键“$keyf”后面是一个函数,function(doc){....},参数doc就表示当前需要提取分组键的文档!这个函数会对这个文档的特定分组键的值进行一些处理!此处进行的是,将键“name”的值全部转化为小写!注意这个函数最后也必须返回一个文档!
我们上述演示的所有group操作,全是通过运行数据库命令实现的!实际上,MongoDB为集合也提供了group函数!使用方法同上,这里就不做演示了。