MongoDB Aggregation: Getting $group to Use the Right Index When Grouping and Summing

BinaryBoss

Article link: https://blog.csdn.net/maxi1234/article/details/127778663

In one project I needed to group and total the documents in a MongoDB collection. The collection's main fields are:

// Date
private Integer date;

// Account ID
private Long accountId;

// Title
private String title;

// Status
private Integer status;

// Status code
private String statusCode;

// Count
private Integer count;

The collection has the following two indexes:

    {
        "v" : 2,
        "key" : {
            "date" : -1.0,
            "accountId" : 1.0,
            "status" : 1.0
        },
        "name" : "date_-1_accountId_1_status_1",
        "background" : true
    },

    {
        "v" : 2,
        "key" : {
            "date" : -1.0,
            "accountId" : 1.0,
            "title" : 1.0
        },
        "name" : "date_-1_accountId_1_title_1",
        "background" : true
    },

The goal is to total the count per title for a given date and a given account. My first attempt was:

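import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.query.Criteria;

// $match on date and accountId, then $group by title and statusCode, summing count
Aggregation aggregation = Aggregation.newAggregation(
    Aggregation.match(Criteria.where("date").is(date)
        .and("accountId").is(accountId)),
    Aggregation.group("title", "statusCode")
        .sum("count").as("totalCount"));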
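
To make clear what this pipeline computes, here is a minimal in-memory sketch in plain Java. It is not MongoDB or Spring Data code: the `Doc` class, the sample values, and the `title|statusCode` composite key are assumptions made for illustration, and the sketch assumes titles do not contain `|`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupSumSketch {

    /** Hypothetical in-memory stand-in for one document of the collection. */
    static final class Doc {
        final int date;
        final long accountId;
        final String title;
        final String statusCode;
        final int count;

        Doc(int date, long accountId, String title, String statusCode, int count) {
            this.date = date;
            this.accountId = accountId;
            this.title = title;
            this.statusCode = statusCode;
            this.count = count;
        }
    }

    /** Mirrors $match on (date, accountId), then $group on (title, statusCode) with $sum of count. */
    static Map<String, Integer> totals(List<Doc> docs, int date, long accountId) {
        return docs.stream()
                .filter(d -> d.date == date && d.accountId == accountId)  // $match
                .collect(Collectors.groupingBy(
                        d -> d.title + "|" + d.statusCode,                // group key
                        TreeMap::new,
                        Collectors.summingInt(d -> d.count)));            // $sum
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(20221109, 1L, "a", "OK", 2),
                new Doc(20221109, 1L, "a", "OK", 3),
                new Doc(20221109, 1L, "a", "FAIL", 1),
                new Doc(20221109, 1L, "b", "OK", 4),
                new Doc(20221109, 2L, "a", "OK", 7),   // other account, filtered out
                new Doc(20221108, 1L, "a", "OK", 9));  // other date, filtered out
        System.out.println(totals(docs, 20221109, 1L));
        // prints {a|FAIL=1, a|OK=5, b|OK=4}
    }
}
```

Because the group key includes statusCode, one title can yield several result rows, one per status code.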

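In testing against a collection with tens of millions of documents, this pipeline was slow. The MongoDB log showed that it used the date_-1_accountId_1_status_1 index rather than the date_-1_accountId_1_title_1 index.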

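So I checked the official MongoDB documentation (Aggregation Pipeline, MongoDB Manual) to find out how to get $group to use an index on the grouped field, and found the following: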

$group
The $group stage can sometimes use an index to find the first document in each group if all of the following criteria are met:

    The $group stage is preceded by a $sort stage that sorts the field to group by,

    There is an index on the grouped field which matches the sort order and

    The only accumulator used in the $group stage is $first

See Optimization to Return the First Document of Each Group for an example.
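
The idea behind the quoted optimization can be sketched in plain Java: when entries are kept in order of the grouped field, the first entry of each group can be reached by seeking past the rest of the group instead of reading every entry. This is only a conceptual model, not how MongoDB is implemented. The `Entry` class, the `position` tie-breaker, and the use of a `TreeSet` as a stand-in for an index are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class FirstPerGroupSketch {

    /** Hypothetical index entry: the grouped key plus a position that breaks ties. */
    static final class Entry {
        final String title;
        final int position;

        Entry(String title, int position) {
            this.title = title;
            this.position = position;
        }
    }

    /** Order of the simulated index: title ascending, then position ascending. */
    static final Comparator<Entry> INDEX_ORDER =
            Comparator.comparing((Entry e) -> e.title).thenComparingInt(e -> e.position);

    /** Returns the first entry of every title by seeking, not by visiting every entry. */
    static List<Entry> firstOfEachGroup(TreeSet<Entry> index) {
        List<Entry> firsts = new ArrayList<>();
        Entry current = index.isEmpty() ? null : index.first();
        while (current != null) {
            firsts.add(current);
            // Seek past every remaining entry that shares this title.
            current = index.higher(new Entry(current.title, Integer.MAX_VALUE));
        }
        return firsts;
    }

    public static void main(String[] args) {
        TreeSet<Entry> index = new TreeSet<>(INDEX_ORDER);
        index.add(new Entry("c", 4));
        index.add(new Entry("a", 0));
        index.add(new Entry("b", 2));
        index.add(new Entry("a", 1));
        index.add(new Entry("c", 3));
        for (Entry e : firstOfEachGroup(index)) {
            System.out.println(e.title + " -> " + e.position);
        }
        // prints a -> 0, b -> 2, c -> 3 (one per line)
    }
}
```

The loop runs once per distinct title, which is why this shortcut only pays off when the accumulator needs nothing but the first document of each group.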

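Read the original for the exact wording. My reading is that sorting on the field used for grouping lets the pipeline use an index on that field. Applied to this case, sorting by title should let the query use the date_-1_accountId_1_title_1 index and make the aggregation faster. Note that the pipeline here uses $sum rather than $first, so the first-document optimization quoted above does not strictly apply. The more likely benefit is that a $sort on title placed right after the $match lets the planner serve both stages from that index, because the match is an equality on its date and accountId prefix.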
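
Why an index on (date, accountId, title) can supply titles in sorted order for a fixed date and account can be shown with a small plain-Java model. This is a sketch only: the `Doc` class, the sample values, and the sorted list standing in for the index are assumptions, and it says nothing about how MongoDB stores or scans indexes internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CompoundIndexOrderSketch {

    /** Hypothetical in-memory stand-in for the indexed fields of one document. */
    static final class Doc {
        final int date;
        final long accountId;
        final String title;

        Doc(int date, long accountId, String title) {
            this.date = date;
            this.accountId = accountId;
            this.title = title;
        }
    }

    /** Same key order as the index { date: -1, accountId: 1, title: 1 }. */
    static final Comparator<Doc> INDEX_ORDER = (a, b) -> {
        int c = Integer.compare(b.date, a.date);        // date descending
        if (c != 0) return c;
        c = Long.compare(a.accountId, b.accountId);     // accountId ascending
        if (c != 0) return c;
        return a.title.compareTo(b.title);              // title ascending
    };

    /** Walks the simulated index and keeps the titles of one (date, accountId) pair. */
    static List<String> titlesInIndexOrder(List<Doc> docs, int date, long accountId) {
        List<Doc> index = new ArrayList<>(docs);
        index.sort(INDEX_ORDER);                        // build the simulated index once
        List<String> titles = new ArrayList<>();
        for (Doc d : index) {
            if (d.date == date && d.accountId == accountId) {
                titles.add(d.title);                    // no separate sort on title is needed
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(20221109, 1L, "c"),
                new Doc(20221108, 1L, "a"),
                new Doc(20221109, 2L, "b"),
                new Doc(20221109, 1L, "a"),
                new Doc(20221109, 1L, "b"));
        System.out.println(titlesInIndexOrder(docs, 20221109, 1L));
        // prints [a, b, c]
    }
}
```

With date and accountId fixed by equality, the matching entries are contiguous and already ordered by title. The other index ends in status instead, so it cannot provide that ordering.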

That was only a guess, so I tried it:

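import org.springframework.data.domain.Sort;

// Sort.by(...) is used here because the public Sort constructor was deprecated in later Spring Data releases
Sort sort = Sort.by(Sort.Direction.ASC, "title");

Aggregation aggregation = Aggregation.newAggregation(
    Aggregation.match(Criteria.where("date").is(date)
        .and("accountId").is(accountId)),
    // Sort on the grouped field so the (date, accountId, title) index can be chosen
    Aggregation.sort(sort),
    Aggregation.group("title").sum("count").as("totalCount")
);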

The MongoDB log confirmed that the query now uses the date_-1_accountId_1_title_1 index, and the aggregation is noticeably faster than before.