数据很简单,如下:
> db.t2.find();
{ "country" : "china", "province" : "sh", "userid" : "a" }
{ "country" : "china", "province" : "sh", "userid" : "b" }
{ "country" : "china", "province" : "sh", "userid" : "a" }
{ "country" : "china", "province" : "sh", "userid" : "c" }
{ "country" : "china", "province" : "bj", "userid" : "da" }
{ "country" : "china", "province" : "bj", "userid" : "fa" }
需求是统计出每个country/province下的userid的数量(同一个userid只统计一次)
过程如下。
首先试着这样来统计:
> db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "prov": "$province"} , "number":{$sum:1}} } ])
但是这样的结果是错误的:
{ "_id" : { "country" : "china", "prov" : "bj" }, "number" : 2 }
{ "_id" : { "country" : "china", "prov" : "sh" }, "number" : 4 }
原因是,这样来统计不能区分userid相同的情况 (上面的数据中sh有两个 userid = a)
为了解决这个问题,首先执行一个group,其id 是 country, province, userid三个field:
> db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ])
结果为
{ "_id" : { "country" : "china", "province" : "bj", "uid" : "fa" } }
{ "_id" : { "country" : "china", "province" : "bj", "uid" : "da" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "c" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "b" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "a" } }
可以看出,这步的目的是把相同的userid只剩下一个。
然后第二步,再第一步的结果之上再执行统计:
>db.t2.aggregate([
{ $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ,
{ $group: {"_id": { "country" : "$_id.country", "province": "$_id.province" }, count : { $sum : 1 } } }
])
这回就对了:
{ "_id" : { "country" : "china", "province" : "sh" }, "count" : 3 }
{ "_id" : { "country" : "china", "province" : "bj" }, "count" : 2 }
为了让结果好看点,加入一个$project操作符,把_id分开:
>db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ,
{ $group: {"_id": { "country" : "$_id.country", "province": "$_id.province" }, count: { $sum : 1 } } },
{ $project : {"_id": 0, "country" : "$_id.country", "province" : "$_id.province", "count" : 1}}
])
{ "count" : 3, "country" : "china", "province" : "sh" }
{ "count" : 2, "country" : "china", "province" : "bj" }
---------------
聚合报错 $group,but didn't allow external Sort.Pass allowDiskuse:true_我家小宝_朱朱的博客-CSDN博客
原文:https://blog.csdn.net/lff0305/article/details/50034735