最近在做一个分享统计,分享记录user_share_log结构如下:
- 使用pipeline统计每个用户分享的文章个数及访问次数(相当于distinct+group by)
思路:先获得每个用户每个文章的访问次数,然后再分组统计每个用户分享的文章个数和累计访问次数。这两步统计我们直接使用pipeline进行两个group操作。如下:
db.runCommand({
"aggregate": "user_share_log",
"pipeline": [
{ "$match": { "day": 20181016}},
{ "$group": { "_id": { "userId": "$userId", "shareObjectId": "$shareObjectId"}, "shareCount": { "$sum": 1}}},
{ "$group": { "_id": "$_id.userId", "userCount": { "$sum": 1}, "shareCount": { "$sum": "$shareCount"}}},
{ "$project": { "userId": "$_id.userId", "userCount": "$userCount", "shareCount": "$shareCount"} }
]
});
结果:
- 统计每个用户分享文章a的用户数及访问次数($cond相当于case when)
db.runCommand({
"aggregate": "user_share_log",
"pipeline": [
{ "$match": { "day": 20181016}},
{ "$group": { "_id": { "userId": "$userId", "shareObjectId": "$shareObjectId"}, "shareCount": { "$sum": 1}}},
{ "$group": { "_id": "$_id.userId", "userACount": { "$sum" : { "$cond" : [ { "$eq" : [ "$_id.shareObjectId" , 'a']} , 1 , 0]}} , "shareACount": { "$sum" : { "$cond" : [ { "$eq" : [ "$_id.shareObjectId" , 'a']} , "$shareCount" , 0]}} }},
{ "$project": { "userId": "$_id.userId", "userACount": "$userACount", "shareACount": "$shareACount"} }
]
});
结果:
附:java代码:
Criteria criteria1 = Criteria.where("day").is(20181016);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(criteria1),
Aggregation.group("userId","shareObjectId")
.count().as("shareCount"),
Aggregation.group("$_id.userId")
.count().as("userCount").sum("shareCount").as("shareCount")
);
AggregationResults<DBObject> aggregationResults = shopStatisticsMongoTemplate.aggregate(aggregation, UserVisit.class, DBObject.class);
List<DBObject> mappedResults = aggregationResults.getMappedResults();
for(DBObject dbObject : mappedResults){
logger.warn(String.format("userId=%s userCount=%s shareCount=%s", dbObject.get("_id"), dbObject.get("userCount"), dbObject.get("shareCount")));
}
相关知识链接:MongoDB 聚合管道(Aggregation Pipeline):https://www.cnblogs.com/shanyou/p/3494854.html