MongoDB 常用 Aggregation Pipeline Stages - group and lookup etc

磐石区

已于 2024-09-14 10:14:35 修改

阅读量1k

点赞数

分类专栏： MongoDB 文章标签： MongoDB aggregate group lookup stage

于 2020-02-06 11:17:25 首次发布

本文链接：https://blog.csdn.net/panshiqu/article/details/104193436

版权

MongoDB 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

本文介绍了将小项目从MySQL迁移到MongoDB时，如何使用Aggregation Pipeline Stages，重点关注$match、$sort、$limit、$group（包括Accumulator Operator）和$lookup。建议$match尽早应用以利用索引，$sort在$limit前可合并以优化性能，$group的_id字段是必需的，$lookup用于执行集合间的连接。

摘要由CSDN通过智能技术生成

最近有把一个小项目的底层数据库由 MySQL 修改成 MongoDB，借此更进一步熟悉了 Aggregation Pipeline Stages

本文仅包含翻译小项目 SQL 版本所需的 MongoDB Aggregation Pipeline Stages 知识，成文时 MongoDB 最新版本为 V4.2

db.collection.aggregate( [ { <stage> }, ... ] )

注：除了 $o u t 、$ merge、$geoNear 所有 stage（阶段）均可在 pipeline（管道）中出现多次，意味着本文所提这些均可出现多次

Examples

示例仅演示本文相关 stage 主要用法，详细请访问各个 stage 的官方文档

db.message.insertMany([
    {"id": 1,"num": 2},
    {"id": 1,"num": 4},
    {"id": 2,"num": 6},
    {"id": 2,"num": 4},
    {"id": 2,"num": 2},
    {"id": 3,"num": 2}
]);

db.user.insertMany([
    {"id": 1,"name": "n1"},
    {"id": 2,"name": "n2"},
    {"id": 3,"name": "n3"}
]);

db.message.aggregate([{
    $match: {
        id: 2
    }
}, {
    $sort: {
        num: 1
    }
}, {
    $limit: 2
}, {
    $project: {
        _id: 0
    }
}]);

[
    {"id": 2,"num": 2},
    {"id": 2,"num": 4}
]

db.message.aggregate([{
    $group: {
        _id: "$id",
        max: {
            $max: "$num"
        },
        sum: {
            $sum: "$num"
        },
        count: {
            $sum: 1
        }
    }
}]);

[
    {"_id": 3,"max": 2,"sum": 2,"count": 1},
    {"_id": 2,"max": 6,"sum": 12,"count": 3},
    {"_id": 1,"max": 4,"sum": 6,"count": 2}
]

db.user.aggregate([{
    $match: {
        id: 3
    }
}, {
    $lookup: {
        from: "message",
        localField: "id",
        foreignField: "id",
        as: "messages"
    }
}, {
    $project: {
        _id: 0,
        "messages._id": 0
    }
}]);

{"id": 3,"name": "n3","messages": [
        {"id": 3,"num": 2}
    ]
}

$match

{ $match: { <query> } }

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe

If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes

$sort

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

set the sort order to 1 or -1 to specify an ascending or descending sort respectively

When a $sort precedes a $limit and there are no intervening stages that modify the number of documents, the optimizer can coalesce the $limit into the $sort

The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the $project, $unwind, and $group aggregation operators. If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes

$limit

{ $limit: <positive integer> }

$limit takes a positive integer that specifies the maximum number of documents to pass along

$group

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

The _id field is mandatory; however, you can specify an _id value of null, or any other constant value, to calculate accumulated values for all the input documents as a whole

The remaining computed fields are optional and computed using the <accumulator> operators

Accumulator Operator

$avg Returns an average of numerical values. Ignores non-numeric values

$max Returns the highest expression value for each group

$min Returns the lowest expression value for each group

$sum Returns a sum of numerical values. Ignores non-numeric values

$lookup

Equality Match

To perform an equality match between a field from the input documents with a field from the documents of the “joined” collection

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from Specifies the collection in the same database to perform the join with. The from collection cannot be sharded

localField Specifies the field from the documents input to the $lookup stage

foreignField Specifies the field from the documents in the from collection

as Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten

if does not contain the localField or foreignField, the $lookup treats the value as null for matching purposes

Join Conditions and Uncorrelated Sub-queries

略

$skip

{ $skip: <positive integer> }

$skip takes a positive integer that specifies the maximum number of documents to skip

$sample

{ $sample: { size: <positive integer> } }

Randomly selects the specified number of documents from its input

$sample may output the same document more than once in its result set

$count

{ $count: <string> }

<string> is the name of the output field which has the count as its value. <string> must be a non-empty string, must not start with $ and must not contain the . character

The $count stage is equivalent to the following $group + $project sequence

db.collection.aggregate( [
   { $group: { _id: null, myCount: { $sum: 1 } } },
   { $project: { _id: 0 } }
] )

磐石区

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录