Three ways to group data in MongoDB
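- group (filters first, then groups; does not work on sharded collections, limits the size of the result, and is slow; deprecated in MongoDB 3.4 and removed in 4.2) [simple grouping over 1.5 million documents, measured: 12.5 s]
- mapreduce (runs on the JavaScript engine in a single thread, so it is relatively slow; suited to background statistics jobs) [simple grouping over 1.5 million documents, measured: 28.5 s]
- aggregate (recommended; considerably faster and simpler to use) [simple grouping over 1.5 million documents, measured: 2.6 s]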
group
db.ad_play_log.group({
    // https://docs.mongodb.org/manual/reference/method/db.collection.group/
    // https://docs.mongodb.org/manual/reference/command/group/#dbcmd.group
    key: {
        // field to group by
        ad_position_id: 1
    },
    cond: {
        // filter, like a WHERE clause
        ord_dt: {
            $gt: new Date('01/01/2012')
        }
    },
    reduce: function (curr, result) {
        // called once per matching document; counts the documents in each group
        result.count++;
    },
    initial: {
        // starting value of each group's result
        count: 0
    }
});
// Equivalent SQL:
// SELECT ad_position_id, COUNT(*) AS count
// FROM ad_play_log
// WHERE ord_dt > '01/01/2012'
// GROUP BY ad_position_id
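To make the roles of key, cond, initial, and reduce concrete, here is a minimal in-memory sketch of how group() combines them. It is plain JavaScript and needs no MongoDB server. The groupBy helper and the sample documents are hypothetical; they only imitate the semantics, not the server's implementation.

```javascript
// Hypothetical in-memory imitation of db.collection.group(); not MongoDB code.
function groupBy(docs, { key, cond, reduce, initial }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;                  // cond: filter first (WHERE)
    const k = doc[key];                        // key: the field to group by
    if (!groups.has(k)) {
      // initial: each group starts from its own copy of the initial result
      groups.set(k, { [key]: k, ...initial });
    }
    reduce(doc, groups.get(k));                // reduce: fold the document into the result
  }
  return [...groups.values()];
}

// Made-up sample data.
const logs = [
  { ad_position_id: 613, ord_dt: new Date("2012-03-01") },
  { ad_position_id: 613, ord_dt: new Date("2012-05-01") },
  { ad_position_id: 700, ord_dt: new Date("2011-12-31") }, // rejected by cond
  { ad_position_id: 701, ord_dt: new Date("2013-01-15") },
];

const grouped = groupBy(logs, {
  key: "ad_position_id",
  cond: (doc) => doc.ord_dt > new Date("2012-01-01"),
  reduce: (curr, result) => { result.count++; },
  initial: { count: 0 },
});

console.log(grouped);
```

With this sample data the document for position 700 is dropped by cond, so only two groups remain.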
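mapreduce
db.runCommand({
    mapReduce: "ad_play_log",
    map: function Map() {
        var key = {
            ad_position_id: this.ad_position_id
        };
        var value = {
            count: 1
        };
        /**
         * Pass key and value on to the reduce function.
         * @param key
         * @param value
         */
        emit(key, value);
    },
    reduce: function Reduce(key, values) {
        var ret = {
            count: 0
        };
        // Add up the emitted counts; values may already contain partial
        // results from an earlier reduce call, so do not just count elements.
        for (var i = 0; i < values.length; i++) {
            ret.count += values[i].count;
        }
        return ret;
    },
    out: {
        // return the results inline instead of writing them to a collection
        inline: 1
    }
});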
The MongoDB documentation describes MapReduce as follows:
Map/reduce in MongoDB is useful for batch processing of data and aggregation operations. It is similar in spirit to using something like Hadoop with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.
In short, map/reduce plays roughly the role that GROUP BY plays in SQL: it reads all of its input from a collection and writes its results to a collection (or returns them inline with out: {inline: 1}, as in the example above).
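To use MapReduce you implement two functions, Map and Reduce. Map is run once for every document in the collection and calls emit(key, value); the emitted values are grouped by key and passed to Reduce for processing. Both functions are written in JavaScript, and the operation is run through db.runCommand or the mapReduce helper.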
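One consequence is worth spelling out: MongoDB may call Reduce more than once for the same key, feeding the output of an earlier call back in as one of the values. Reduce therefore has to add up the emitted counts rather than count the array elements. The sketch below imitates this in plain JavaScript. runMapReduce, the chunk size, and the sample data are hypothetical, and emit is passed in as an argument here whereas MongoDB provides it as a global.

```javascript
// Hypothetical in-memory imitation of map/reduce; not MongoDB's implementation.
function runMapReduce(docs, map, reduce, chunkSize) {
  const emitted = new Map();                   // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // map sees the document as `this`

  const out = {};
  for (const [key, values] of emitted) {
    // Reduce in chunks, then reduce the partial results again (re-reduce).
    let partials = values;
    while (partials.length > 1) {
      const next = [];
      for (let i = 0; i < partials.length; i += chunkSize) {
        next.push(reduce(key, partials.slice(i, i + chunkSize)));
      }
      partials = next;
    }
    out[key] = partials[0];
  }
  return out;
}

function map(emit) { emit(this.ad_position_id, { count: 1 }); }

// Correct: add the counts carried by each value.
const sumCounts = (key, values) => {
  let count = 0;
  for (const v of values) count += v.count;
  return { count };
};
// Buggy: counts array elements, which breaks as soon as re-reduce happens.
const countElements = (key, values) => ({ count: values.length });

const docs = Array.from({ length: 5 }, () => ({ ad_position_id: 613 }));
const correct = runMapReduce(docs, map, sumCounts, 2);
const wrong = runMapReduce(docs, map, countElements, 2);
console.log(correct[613].count, wrong[613].count);
```

The summing variant reports all five documents, while the element-counting variant undercounts once partial results are reduced again.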
Aggregate
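db.ad_play_log.aggregate([
    {
        // https://docs.mongodb.org/manual/reference/method/db.collection.aggregate/
        // grouping stage
        $group: {
            // group by ad_position_id
            _id: "$ad_position_id",
            count: {
                // number of documents in each group
                $sum: 1
            },
            total: {
                // sum of material_id within each group
                $sum: "$material_id"
            }
        }
    },
    {
        $sort: {
            // sort by ad_position_id (the group _id); -1 means descending
            _id: -1
        }
    },
    {
        // maximum number of results (optional)
        $limit: 10
    },
    {
        // filter (optional): keeps groups whose _id equals 613 and whose count is below 700;
        // placed before $group it would filter documents before grouping instead
        $match: {_id: 613, count: {$lt: 700}}
    }
]);
// Without the optional stages this is equivalent to:
// SELECT ad_position_id, COUNT(1) AS count, SUM(material_id) AS total FROM ad_play_log GROUP BY ad_position_id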
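The position of $match changes its meaning: before $group it filters the input documents (like WHERE), after $group it filters the grouped results (like HAVING). A rough in-memory illustration in plain JavaScript follows. The match, groupCount, and pipeline helpers and the sample rows are hypothetical stand-ins for the pipeline stages, not MongoDB APIs.

```javascript
// Hypothetical stand-ins for pipeline stages, operating on plain arrays.
const match = (pred) => (rows) => rows.filter(pred);
const groupCount = (field) => (rows) => {
  const counts = new Map();
  for (const row of rows) counts.set(row[field], (counts.get(row[field]) || 0) + 1);
  return [...counts].map(([_id, count]) => ({ _id, count }));
};
// Run the stages in order, feeding each stage's output to the next.
const pipeline = (rows, stages) => stages.reduce((acc, stage) => stage(acc), rows);

// Made-up sample data.
const rows = [
  { ad_position_id: 613, material_id: 1 },
  { ad_position_id: 613, material_id: 2 },
  { ad_position_id: 613, material_id: 2 },
  { ad_position_id: 700, material_id: 2 },
];

// $match before $group: only documents with material_id 2 are counted.
const matchFirst = pipeline(rows, [
  match((r) => r.material_id === 2),
  groupCount("ad_position_id"),
]);

// $match after $group: every document is counted, then the groups are filtered.
const matchLast = pipeline(rows, [
  groupCount("ad_position_id"),
  match((g) => g._id === 613 && g.count < 700),
]);

console.log(matchFirst, matchLast);
```

Filtering early also means fewer documents reach the grouping step, which is usually the cheaper arrangement when the condition applies to raw fields.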
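// Written against the 3.x MongoDB Java driver, which provides com.mongodb.Block.
import static java.util.Arrays.asList;

import com.mongodb.Block;
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public void test_aggregate() {
    // MongoUtil is presumably the author's own helper class (not shown here)
    MongoCollection<Document> collection = MongoUtil.getCollection("ad_play_log");
    // $group stage: count documents per ad_position_id
    AggregateIterable<Document> iterable = collection.aggregate(asList(
            new Document("$group",
                    new Document("_id", "$ad_position_id")
                            .append("count", new Document("$sum", 1)))));
    // Print each group as JSON
    iterable.forEach(new Block<Document>() {
        @Override
        public void apply(final Document document) {
            System.out.println(document.toJson());
        }
    });
}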
- Use $project to save tag and count into tmp
- Use $push or $addToSet to store tmp into your data list.
Code:
db.test.aggregate([
    {$unwind: '$tags'},
    {$group: {_id: '$tags', count: {$sum: 1}}},
    {$project: {tmp: {tag: '$_id', count: '$count'}}},
    {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
])
Output:
{"result":[{"_id":null,"total":5,"data":[{"tag":"SOME","count":1},{"tag":"RANDOM","count":2},{"tag":"TAGS1","count":1},{"tag":"TAGS","count":1},{"tag":"SOME1","count":1}]}],"ok":1}
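The pipeline above can be traced stage by stage on a small in-memory array. The sketch below is plain JavaScript, and its sample documents are invented so that they yield the same tag counts as the output above. It mirrors what each stage does rather than how MongoDB executes it, and the order of the data array may differ because $addToSet does not guarantee ordering.

```javascript
// Made-up documents chosen to yield the tag counts shown in the output above.
const docs = [
  { tags: ["SOME", "RANDOM", "TAGS"] },
  { tags: ["SOME1", "RANDOM", "TAGS1"] },
];

// $unwind: one output row per array element.
const unwound = docs.flatMap((doc) => doc.tags.map((tag) => ({ ...doc, tags: tag })));

// First $group: count rows per tag.
const counts = new Map();
for (const row of unwound) counts.set(row.tags, (counts.get(row.tags) || 0) + 1);

// $project: reshape each group into a { tag, count } sub-document.
const projected = [...counts].map(([tag, count]) => ({ tmp: { tag, count } }));

// Second $group with _id: null: fold everything into a single result.
const result = {
  _id: null,
  total: projected.length,             // $sum: 1 counts the distinct tags
  data: projected.map((p) => p.tmp),   // every tmp is unique here, so this matches $addToSet
};

console.log(JSON.stringify(result));
```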