aggregate聚合管道


image

SQL聚合映射表

下表概述了常见的SQL聚合术语、函数和概念以及相应的MongoDB聚合操作符:

SQLMongoDB Aggregation
WHERE$match
GROUP BY$group
HAVING$match
SELECT$project
ORDER BY$sort
LIMIT$limit
SUM()$sum
COUNT()$sum $sortByCount
join$lookup

$addFields

使用两次$addFields管道

Scores 集合:

{
  _id: 1,
  student: "Maya",
  homework: [ 10, 5, 10 ],
  quiz: [ 10, 8 ],
  extraCredit: 0
}
{
  _id: 2,
  student: "Ryan",
  homework: [ 5, 6, 5 ],
  quiz: [ 8, 8 ],
  extraCredit: 8
}

下面聚合操作在结果集中新增三列:

db.scores.aggregate( [
   {
     $addFields: {
       totalHomework: { $sum: "$homework" } ,
       totalQuiz: { $sum: "$quiz" }
     }
   },
   {
     $addFields: { totalScore:
       { $add: [ "$totalHomework", "$totalQuiz", "$extraCredit" ] } }
   }
] )

返回的结果集:

{
  "_id" : 1,
  "student" : "Maya",
  "homework" : [ 10, 5, 10 ],
  "quiz" : [ 10, 8 ],
  "extraCredit" : 0,
  "totalHomework" : 25,
  "totalQuiz" : 18,
  "totalScore" : 43
}
{
  "_id" : 2,
  "student" : "Ryan",
  "homework" : [ 5, 6, 5 ],
  "quiz" : [ 8, 8 ],
  "extraCredit" : 8,
  "totalHomework" : 16,
  "totalQuiz" : 16,
  "totalScore" : 40
}

嵌入

vehicles 集合:

{ _id: 1, type: "car", specs: { doors: 4, wheels: 4 } }
{ _id: 2, type: "motorcycle", specs: { doors: 0, wheels: 2 } }
{ _id: 3, type: "jet ski" }

向specs字段新增fuel_type字段

db.vehicles.aggregate( [
        {
           $addFields: {
              "specs.fuel_type": "unleaded"
           }
        }
   ] )

返回的结果集:

{ _id: 1, type: "car",
   specs: { doors: 4, wheels: 4, fuel_type: "unleaded" } }
{ _id: 2, type: "motorcycle",
   specs: { doors: 0, wheels: 2, fuel_type: "unleaded" } }
{ _id: 3, type: "jet ski",
   specs: { fuel_type: "unleaded" } }

覆盖现有字段

在$addFields操作中指定现有字段名会导致替换原来的字段的值

animals集合:

{ _id: 1, dogs: 10, cats: 15 }

下面聚合操作指定 cats 字段:

db.animals.aggregate( [
  {
    $addFields: { "cats": 20 }
  }
] )

返回的结果集:

{ _id: 1, dogs: 10, cats: 20 }

用集合中的一个字段去替换另一个字段也是可以的,下面例子是用item字段替换_id字段。

fruit集合:

{ "_id" : 1, "item" : "tangerine", "type" : "citrus" }
{ "_id" : 2, "item" : "lemon", "type" : "citrus" }
{ "_id" : 3, "item" : "grapefruit", "type" : "citrus" }

下面聚合操作将_id字段使用item字段的值替换,再用静态值替换item字段的值:

db.fruit.aggregate( [
  {
    $addFields: {
      _id : "$item",
      item: "fruit"
    }
  }
] )

结果集:

{ "_id" : "tangerine", "item" : "fruit", "type" : "citrus" }
{ "_id" : "lemon", "item" : "fruit", "type" : "citrus" }
{ "_id" : "grapefruit", "item" : "fruit", "type" : "citrus" }

$bucket

FieldTypeDesctiption
group byexpression对文档进行分组的表达式。要指定字段路径. Example: groupBy: "$price",,$bucket包含了default的情况下可为空
boundariesarray基于groupBy的表达式,指定了每个边界的值,必须指定至少2个值,Example: boundaries: [0,200,400],结果集中的_id_id: 0,_id: 200
defaultliteral可选项,集合了所有不在边界内的内容。结果集中的_id为default指定的值
outputdocument除了_id字段之外,指定要包含在输出文档中的字段的文档

Example:
artword集合:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
    /* No price*/ }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42") }

以价格为分组,0-200一组,200-400一组,不在0-400的一组(“other”),输出组中的price合计,以及title名称。聚合操作:

db.artwork.aggregate( [
  {
    $bucket: {
      groupBy: "$price",
      boundaries: [ 0, 200, 400 ],
      default: "Other",
      output: {
        "count": { $sum: 1 },
        "titles" : { $push: "$title" }
      }
    }
  }
] )

结果集:

{
  "_id" : 0,
  "count" : 4,
  "titles" : [
    "The Pillars of Society",
    "Dancer",
    "The Great Wave off Kanagawa",
    "Blue Flower"
  ]
}
{
  "_id" : 200,
  "count" : 2,
  "titles" : [
    "Melancholy III",
    "Composition VII"
  ]
}
{
  "_id" : "Other",
  "count" : 2,
  "titles" : [
    "The Persistence of Memory",
    "The Scream"
  ]
}

$bucketAuto(待)

平均分配

$count

返回聚合操作结果集的数量

{ $count: <string> }

<string>是结果集中输出的表名

Example:

*scores**集合:

{ "_id" : 1, "subject" : "History", "score" : 88 }
{ "_id" : 2, "subject" : "History", "score" : 92 }
{ "_id" : 3, "subject" : "History", "score" : 97 }
{ "_id" : 4, "subject" : "History", "score" : 71 }
{ "_id" : 5, "subject" : "History", "score" : 79 }
{ "_id" : 6, "subject" : "History", "score" : 83 }

查找分数大于80的集合:

db.scores.aggregate(
  [
    {
      $match: {
        score: {
          $gt: 80
        }
      }
    },
    {
      $count: "passing_scores"
    }
  ]
)

结果集:

{ "passing_scores" : 4 }

$group

按照指定的表达式和输出将文档分组到下一阶段,为每个不同的分组创建一个文档。输出文档包含一个_id字段,该字段包含按键分隔的组。输出文档还可以包含一些计算字段,这些字段包含一些蓄电池表达式的值,这些值按 g r o u p 的 i d 字 段 分 组 。 group的_id字段分组。 groupidgroup不会对其输出文档进行排序

分组,为不同组创建一个文档,文档包含一个_id字段,还可包含一些计算字段

原型:

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

<accumulator1>必须是以下计算操作:

NameDescription
$addToSet将文档指定字段的值去重
$avg指定字段求平均
$first
$last
$max
$mergeObjects
$min
$push
$stdDevPop
$stdDevSamp
$sum

Example:

sales集合:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

下面的聚合操作使用$group阶段按月、日和年对文档进行分组,计算总价和平均数量,并计算每组的文档数

db.sales.aggregate(
   [
      {
        $group : {
           _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           averageQuantity: { $avg: "$quantity" },
           count: { $sum: 1 }
        }
      }
   ]
)

返回的结果集:

{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 }, "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }
{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 }, "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }
{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 }, "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }

$limit

限制查询出的数量

db.article.aggregate(
    { $limit : 5 }
);

$lookup

表关联

原型:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}
FieldDescription
from关联的表
localField当前表要关联的字段名
foreignFieldfrom集合中的字段名
as指定要添加到输入文档的新数组字段的名称。新数组字段包含来自from集合的匹配文档。如果指定的名称已经存在于输入文档中,则覆盖现有字段
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
                               FROM <collection to join>
                               WHERE <foreignField>= <collection.localField>);

连接条件和不相关的子查询

原型:

{
   $lookup:
     {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the collection to join> ],
       as: <output array field>
     }
}
FieldDescription
from关联的表
let可选,指定要在pipeline中要使用的字段名
pipeline指定要在已连接的集合上运行的管道。管道确定连接集合产生的文档
as指定要添加到输入文档的新数组字段的名称。新数组字段包含来自from集合的匹配文档。如果指定的名称已经存在于输入文档中,则覆盖现有字段
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT <documents as determined from the pipeline>
                               FROM <collection to join>
                               WHERE <pipeline> );

Example:

orders 集合:

db.orders.insert([
  { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
])

warehouses 集合:

db.warehouses.insert([
  { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
])

以下操作将订单收货与仓库收货按项目和库存数量是否足以覆盖订单数量进行连接:

db.orders.aggregate([
   {
      $lookup:
         {
           from: "warehouses",
           let: { order_item: "$item", order_qty: "$ordered" },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         { $eq: [ "$stock_item",  "$$order_item" ] },
                         { $gte: [ "$instock", "$$order_qty" ] }
                       ]
                    }
                 }
              },
              { $project: { stock_item: 0, _id: 0 } }
           ],
           as: "stockdata"
         }
    }
])

结果集:

{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2,
   "stockdata" : [ { "warehouse" : "A", "instock" : 120 }, { "warehouse" : "B", "instock" : 60 } ] }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1,
   "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60,
   "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }

伪sql语句:

SELECT *, stockdata
FROM orders
WHERE stockdata IN (SELECT warehouse, instock
                    FROM warehouses
                    WHERE stock_item= orders.item
                    AND instock >= orders.ordered );

$match

只将匹配指定条件的文档传递到下一个管道阶段

限制

  • m a t c h 不 接 受 原 始 聚 合 表 达 式 。 若 要 在 match不接受原始聚合表达式。若要在 matchmatch中包含聚合表达式,请使用$expr查询表达式:
{ $match: { $expr: { <aggregation expression> } } }
  • m a t c h 查 询 中 不 能 使 用 match查询中不能使用 match使where作为聚合管道的一部分
  • m a t c h 查 询 中 , 不 能 使 用 match查询中,不能使用 match使near或$near sphere作为聚合管道的一部分。作为一种选择,你可以选择:
    • 使用 g e o N e a r 代 替 geoNear代替 geoNearmatch
    • m a t c h 阶 段 使 用 match阶段使用 match使center或 c e n t e r S p h e r e 的 centerSphere的 centerSpheregeoWithin查询操作符(Use $geoWithin query operator with $center or $centerSphere in the $match stage)
  • 要在 m a t c h 段 使 用 match段使用 match使text, $match段必须是管道的第一阶段

$project

将带请求字段的文档传递到管道中的下一阶段。指定的字段可以是输入文档中的现有字段,也可以是新计算的字段

原型:

{ $project: { <specification(s)> } }

$project接受一个文档,该文档可以指定字段的包含、_id字段的禁用、新字段的添加和现有字段值的重置。或者,您可以指定字段的排除

FormDescription
: <1 or true>指定字段的包含
_id: <0 or false>指定 _id字段的禁用
: 添加新字段或重置现有字段的值。
:<0 or false>指定字段的排除

Example:

books集合:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
}

结果默认包含_id的查询:

db.books.aggregate( [ { $project : { title : 1 , author : 1 } } ] )

结果集:

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

取消_id的查询

db.books.aggregate( [ { $project : { _id: 0, title : 1 , author : 1 } } ] )

结果集:

{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值