aggregate聚合管道

最新推荐文章于 2023-07-31 17:09:04 发布

w54a3te

最新推荐文章于 2023-07-31 17:09:04 发布

阅读量535

点赞数 1

分类专栏： MongoDB 文章标签： mongodb

本文链接：https://blog.csdn.net/w54a3teg64c7bd/article/details/88425896

版权

MongoDB 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

文章目录

SQL聚合映射表
$addFields
$bucket
$bucketAuto(待)
$count
$group
$limit
$lookup
- 连接条件和不相关的子查询
$match
- 限制
$project

SQL聚合映射表

下表概述了常见的SQL聚合术语、函数和概念以及相应的MongoDB聚合操作符:

SQL	MongoDB Aggregation
WHERE	$match
GROUP BY	$group
HAVING	$match
SELECT	$project
ORDER BY	$sort
LIMIT	$limit
SUM()	$sum
COUNT()	$sum $sortByCount
join	$lookup

$addFields

使用两次$addFields管道

Scores 集合：

{
  _id: 1,
  student: "Maya",
  homework: [ 10, 5, 10 ],
  quiz: [ 10, 8 ],
  extraCredit: 0
}
{
  _id: 2,
  student: "Ryan",
  homework: [ 5, 6, 5 ],
  quiz: [ 8, 8 ],
  extraCredit: 8
}

下面聚合操作在结果集中新增三列：

db.scores.aggregate( [
   {
     $addFields: {
       totalHomework: { $sum: "$homework" } ,
       totalQuiz: { $sum: "$quiz" }
     }
   },
   {
     $addFields: { totalScore:
       { $add: [ "$totalHomework", "$totalQuiz", "$extraCredit" ] } }
   }
] )

返回的结果集：

{
  "_id" : 1,
  "student" : "Maya",
  "homework" : [ 10, 5, 10 ],
  "quiz" : [ 10, 8 ],
  "extraCredit" : 0,
  "totalHomework" : 25,
  "totalQuiz" : 18,
  "totalScore" : 43
}
{
  "_id" : 2,
  "student" : "Ryan",
  "homework" : [ 5, 6, 5 ],
  "quiz" : [ 8, 8 ],
  "extraCredit" : 8,
  "totalHomework" : 16,
  "totalQuiz" : 16,
  "totalScore" : 40
}

嵌入

vehicles 集合：

{ _id: 1, type: "car", specs: { doors: 4, wheels: 4 } }
{ _id: 2, type: "motorcycle", specs: { doors: 0, wheels: 2 } }
{ _id: 3, type: "jet ski" }

向specs字段新增fuel_type字段

db.vehicles.aggregate( [
        {
           $addFields: {
              "specs.fuel_type": "unleaded"
           }
        }
   ] )

返回的结果集：

{ _id: 1, type: "car",
   specs: { doors: 4, wheels: 4, fuel_type: "unleaded" } }
{ _id: 2, type: "motorcycle",
   specs: { doors: 0, wheels: 2, fuel_type: "unleaded" } }
{ _id: 3, type: "jet ski",
   specs: { fuel_type: "unleaded" } }

覆盖现有字段

在$addFields操作中指定现有字段名会导致替换原来的字段的值

animals集合：

{ _id: 1, dogs: 10, cats: 15 }

下面聚合操作指定 cats 字段：

db.animals.aggregate( [
  {
    $addFields: { "cats": 20 }
  }
] )

返回的结果集：

{ _id: 1, dogs: 10, cats: 20 }

用集合中的一个字段去替换另一个字段也是可以的，下面例子是用item字段替换_id字段。

fruit集合：

{ "_id" : 1, "item" : "tangerine", "type" : "citrus" }
{ "_id" : 2, "item" : "lemon", "type" : "citrus" }
{ "_id" : 3, "item" : "grapefruit", "type" : "citrus" }

下面聚合操作将_id字段使用item字段的值替换，再用静态值替换item字段的值：

db.fruit.aggregate( [
  {
    $addFields: {
      _id : "$item",
      item: "fruit"
    }
  }
] )

结果集：

{ "_id" : "tangerine", "item" : "fruit", "type" : "citrus" }
{ "_id" : "lemon", "item" : "fruit", "type" : "citrus" }
{ "_id" : "grapefruit", "item" : "fruit", "type" : "citrus" }

$bucket

Field	Type	Desctiption
group by	expression	对文档进行分组的表达式。要指定字段路径. `Example: groupBy: "$price",`,$bucket包含了default的情况下可为空
boundaries	array	基于groupBy的表达式，指定了每个边界的值，必须指定至少2个值，`Example: boundaries: [0,200,400],`结果集中的`_id`为`_id: 0`,`_id: 200`。
default	literal	可选项，集合了所有不在边界内的内容。结果集中的`_id`为default指定的值
output	document	除了_id字段之外，指定要包含在输出文档中的字段的文档

Example：
artword集合：

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
    /* No price*/ }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42") }

以价格为分组，0-200一组，200-400一组，不在0-400的一组(“other”)，输出组中的price合计，以及title名称。聚合操作：

db.artwork.aggregate( [
  {
    $bucket: {
      groupBy: "$price",
      boundaries: [ 0, 200, 400 ],
      default: "Other",
      output: {
        "count": { $sum: 1 },
        "titles" : { $push: "$title" }
      }
    }
  }
] )

结果集：

{
  "_id" : 0,
  "count" : 4,
  "titles" : [
    "The Pillars of Society",
    "Dancer",
    "The Great Wave off Kanagawa",
    "Blue Flower"
  ]
}
{
  "_id" : 200,
  "count" : 2,
  "titles" : [
    "Melancholy III",
    "Composition VII"
  ]
}
{
  "_id" : "Other",
  "count" : 2,
  "titles" : [
    "The Persistence of Memory",
    "The Scream"
  ]
}

$bucketAuto(待)

平均分配

$count

返回聚合操作结果集的数量

{ $count: <string> }

<string>是结果集中输出的表名

Example：

*scores**集合：

{ "_id" : 1, "subject" : "History", "score" : 88 }
{ "_id" : 2, "subject" : "History", "score" : 92 }
{ "_id" : 3, "subject" : "History", "score" : 97 }
{ "_id" : 4, "subject" : "History", "score" : 71 }
{ "_id" : 5, "subject" : "History", "score" : 79 }
{ "_id" : 6, "subject" : "History", "score" : 83 }

查找分数大于80的集合：

db.scores.aggregate(
  [
    {
      $match: {
        score: {
          $gt: 80
        }
      }
    },
    {
      $count: "passing_scores"
    }
  ]
)

结果集：

{ "passing_scores" : 4 }

$group

按照指定的表达式和输出将文档分组到下一阶段，为每个不同的分组创建一个文档。输出文档包含一个_id字段，该字段包含按键分隔的组。输出文档还可以包含一些计算字段，这些字段包含一些蓄电池表达式的值，这些值按 $group的_id字段分组。$ group不会对其输出文档进行排序

分组，为不同组创建一个文档，文档包含一个_id字段，还可包含一些计算字段。

原型：

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

<accumulator1>必须是以下计算操作：

Name	Description
$addToSet	将文档指定字段的值去重
$avg	指定字段求平均
$first
$last
$max
$mergeObjects
$min
$push
$stdDevPop
$stdDevSamp
$sum

Example:

sales集合：

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

下面的聚合操作使用$group阶段按月、日和年对文档进行分组，计算总价和平均数量，并计算每组的文档数

db.sales.aggregate(
   [
      {
        $group : {
           _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           averageQuantity: { $avg: "$quantity" },
           count: { $sum: 1 }
        }
      }
   ]
)

返回的结果集：

{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 }, "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }
{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 }, "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }
{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 }, "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }

$limit

限制查询出的数量

db.article.aggregate(
    { $limit : 5 }
);

$lookup

表关联

原型：

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

Field	Description
from	关联的表
localField	当前表要关联的字段名
foreignField	from集合中的字段名
as	指定要添加到输入文档的新数组字段的名称。新数组字段包含来自from集合的匹配文档。如果指定的名称已经存在于输入文档中，则覆盖现有字段

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
                               FROM <collection to join>
                               WHERE <foreignField>= <collection.localField>);

连接条件和不相关的子查询

原型：

{
   $lookup:
     {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the collection to join> ],
       as: <output array field>
     }
}

Field	Description
from	关联的表
let	可选，指定要在pipeline中要使用的字段名
pipeline	指定要在已连接的集合上运行的管道。管道确定连接集合产生的文档
as	指定要添加到输入文档的新数组字段的名称。新数组字段包含来自from集合的匹配文档。如果指定的名称已经存在于输入文档中，则覆盖现有字段

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT <documents as determined from the pipeline>
                               FROM <collection to join>
                               WHERE <pipeline> );

Example:

orders 集合：

db.orders.insert([
  { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
])

warehouses 集合：

db.warehouses.insert([
  { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
])

以下操作将订单收货与仓库收货按项目和库存数量是否足以覆盖订单数量进行连接：

db.orders.aggregate([
   {
      $lookup:
         {
           from: "warehouses",
           let: { order_item: "$item", order_qty: "$ordered" },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         { $eq: [ "$stock_item",  "$$order_item" ] },
                         { $gte: [ "$instock", "$$order_qty" ] }
                       ]
                    }
                 }
              },
              { $project: { stock_item: 0, _id: 0 } }
           ],
           as: "stockdata"
         }
    }
])

结果集：

{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2,
   "stockdata" : [ { "warehouse" : "A", "instock" : 120 }, { "warehouse" : "B", "instock" : 60 } ] }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1,
   "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60,
   "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }

伪sql语句：

SELECT *, stockdata
FROM orders
WHERE stockdata IN (SELECT warehouse, instock
                    FROM warehouses
                    WHERE stock_item= orders.item
                    AND instock >= orders.ordered );

$match

只将匹配指定条件的文档传递到下一个管道阶段

限制

$m a t c h 不接受原始聚合表达式。若要在$ match中包含聚合表达式，请使用$expr查询表达式：

{ $match: { $expr: { <aggregation expression> } } }

在 $m a t c h 查询中不能使用$ where作为聚合管道的一部分
在 $m a t c h 查询中，不能使用$ near或$near sphere作为聚合管道的一部分。作为一种选择，你可以选择：
- 使用 $g e o N e a r 代替$ match
- 在 $m a t c h 阶段使用$ center或 $c e n t e r S p h e r e 的$ geoWithin查询操作符（Use $geoWithin query operator with $center or $centerSphere in the $match stage）
要在 $m a t c h 段使用$ text， $match段必须是管道的第一阶段

$project

将带请求字段的文档传递到管道中的下一阶段。指定的字段可以是输入文档中的现有字段，也可以是新计算的字段

原型：

{ $project: { <specification(s)> } }

$project接受一个文档，该文档可以指定字段的包含、_id字段的禁用、新字段的添加和现有字段值的重置。或者，您可以指定字段的排除

Form	Description
: <1 or true>	指定字段的包含
_id: <0 or false>	指定 _id字段的禁用
:	添加新字段或重置现有字段的值。
:<0 or false>	指定字段的排除

Example：

books集合：

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
}

结果默认包含_id的查询：

db.books.aggregate( [ { $project : { title : 1 , author : 1 } } ] )

结果集：

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

取消_id的查询

db.books.aggregate( [ { $project : { _id: 0, title : 1 , author : 1 } } ] )

结果集：

{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

w54a3te

关注

1
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
aggregate聚合管道

文章目录SQL聚合映射表$addFields使用两次$addFields管道嵌入覆盖现有字段$bucket$bucketAuto(待)$count$group$limit$lookup连接条件和不相关的子查询$match限制$projectSQL聚合映射表下表概述了常见的SQL聚合术语、函数和概念以及相应的MongoDB聚合操作符:SQLMongoDB Aggregation...
复制链接

扫一扫

专栏目录