The Beginner’s Guide to MongoDB Aggregation

Original article: https://studio3t.com/knowledge-base/articles/mongodb-aggregation-framework/#introduction

1. Introduction

When you start with MongoDB, you will use the find() command to query data, and it will probably be sufficient. As soon as you need anything more advanced than plain data retrieval, though, you will need to know more about the MongoDB Aggregation Framework.

I will explain the main principles of building working queries and how to take advantage of indexes to speed them up.

Furthermore, I will introduce the most important stages of the aggregation pipeline, with a short example for each one, and show how to apply them in a pipeline.

2. The MongoDB Aggregation Framework

Here is a diagram to illustrate a typical pipeline.

[Image: diagram of a typical aggregation pipeline]

The $match stage filters the documents we need to work with, keeping only those that fit our needs.

The $group stage does the aggregation job and, finally, we $sort the resulting documents the way we require. To store the documents obtained, we use the $out stage.

The input of the pipeline is a single collection; other collections can be brought in at later stages. The pipeline then performs successive transformations on the data until our goal is achieved. This way, we can break a complex query down into easier stages, each of which completes a different operation on the data. By the end of the pipeline, we will have achieved all that we wanted.

This approach lets us check whether the query is working properly at every stage by examining both its input and its output. The output of each stage is the input of the next.

In practice, we can use as many stages as a query needs and combine them freely. To achieve optimum query performance, there are a number of best practices to take into account; we will come to those later in the article.
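The stage-by-stage flow described above can be imitated in plain JavaScript. This is only a sketch of the idea, not how MongoDB executes a pipeline: each stage is modelled as a hypothetical function over an array of documents, and the `students2017` field is a simplification made up for this illustration (its values are the 2017 figures from the ‘universities’ sample data used later in the article). Every intermediate result is kept so that it can be inspected.

```javascript
// Plain-JavaScript imitation of a pipeline; NOT MongoDB's implementation.
// students2017 is a simplified, hypothetical field used only for this sketch.
const docs = [
  { name: 'USAL', students2017: 21715 },
  { name: 'UPSA', students2017: 6125 }
];

// Each "stage" takes an array of documents and returns a new array.
const stages = [
  (input) => input.filter((d) => d.students2017 > 10000), // plays the role of $match
  (input) => input.map((d) => ({ name: d.name }))         // plays the role of $project
];

// Feed the output of each stage into the next and keep every intermediate result.
let current = docs;
const intermediate = [];
for (const stage of stages) {
  current = stage(current);
  intermediate.push(current);
}

console.log(JSON.stringify(intermediate));
```

Inspecting `intermediate[0]` and `intermediate[1]` separately mirrors the practice of running a pipeline one stage at a time while debugging it.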

3. Syntax

This is an example of how we build the aggregation query:

pipeline = [
  { $match : { … } },
  { $group : { … } },
  { $sort : { … } },
  ...
]
db.collectionName.aggregate(pipeline, options)
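As a concrete but hedged illustration of this shape, here is one way the placeholders could be filled in. The field names come from the ‘universities’ collection introduced later in the article; the particular combination of stages and the option value are assumptions chosen for illustration, not something taken from the original text. The pipeline is an ordinary JavaScript array, so it can be built and inspected before it is sent to the server.

```javascript
// A pipeline is just an array of stage documents, executed in order.
const pipeline = [
  { $match: { country: 'Spain' } },                      // keep only Spanish universities
  { $group: { _id: '$city', totaldocs: { $sum: 1 } } },  // count documents per city
  { $sort: { totaldocs: -1 } }                           // largest count first
];

// Options are passed as a second argument (value chosen for illustration).
const options = { allowDiskUse: false };

// In the mongo shell, with the sample data loaded:
//   db.universities.aggregate(pipeline, options)
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(' -> '));
```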

4. Regarding Limits

Up to 100 MB of RAM can be used per stage. You will get an error from the database if you exceed this limit. If that becomes an unavoidable problem, you can let the stage write temporary files to disk; the only disadvantage is that you will wait a little longer, because working on disk is slower than working in memory. To do so, set the allowDiskUse option like this:

db.collectionName.aggregate(pipeline, { allowDiskUse : true })

Each document returned by the aggregation query, whether through a cursor or stored via $out in another collection, is limited to 16 MB, the maximum size of a MongoDB document.

If the result as a whole is likely to exceed this limit, make sure the aggregation returns its output as a cursor rather than as a single document.

5. Our Collections

I will give examples for each of the more important stages. To illustrate them, I am going to use two collections. The first is called ‘universities’ and is made up of these documents (the data is not real):

{
  country : 'Spain',
  city : 'Salamanca',
  name : 'USAL',
  location : {
    type : 'Point',
    coordinates : [ -5.6722512,17, 40.9607792 ]
  },
  students : [
    { year : 2014, number : 24774 },
    { year : 2015, number : 23166 },
    { year : 2016, number : 21913 },
    { year : 2017, number : 21715 }
  ]
}
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'UPSA',
  location : {
    type : 'Point',
    coordinates : [ -5.6691191,17, 40.9631732 ]
  },
  students : [
    { year : 2014, number : 4788 },
    { year : 2015, number : 4821 },
    { year : 2016, number : 6550 },
    { year : 2017, number : 6125 }
  ]
}

If you would like to test these examples on your own installation, you can insert them with this bulk command:

use 3tdb
db.universities.insertMany([
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'USAL',
  location : {
    type : 'Point',
    coordinates : [ -5.6722512,17, 40.9607792 ]
  },
  students : [
    { year : 2014, number : 24774 },
    { year : 2015, number : 23166 },
    { year : 2016, number : 21913 },
    { year : 2017, number : 21715 }
  ]
},
{
  country : 'Spain',
  city : 'Salamanca',
  name : 'UPSA',
  location : {
    type : 'Point',
    coordinates : [ -5.6691191,17, 40.9631732 ]
  },
  students : [
    { year : 2014, number : 4788 },
    { year : 2015, number : 4821 },
    { year : 2016, number : 6550 },
    { year : 2017, number : 6125 }
  ]
}
])

The second and last collection is called ‘courses’ and looks like this:

{
  university : 'USAL',
  name : 'Computer Science',
  level : 'Excellent'
}
{
  university : 'USAL',
  name : 'Electronics',
  level : 'Intermediate'
}
{
  university : 'USAL',
  name : 'Communication',
  level : 'Excellent'
}

Again, you can insert them in the same way, using the following code:

db.courses.insertMany([
{
  university : 'USAL',
  name : 'Computer Science',
  level : 'Excellent'
},
{
  university : 'USAL',
  name : 'Electronics',
  level : 'Intermediate'
},
{
  university : 'USAL',
  name : 'Communication',
  level : 'Excellent'
}
])

6. Aggregation Stages

$match

This stage lets us choose just those documents from a collection that we want to work with. It does this by filtering out those that do not meet our requirements.

In the following example, we only want to work with those documents whose country field is Spain and whose city field is Salamanca.

To get a readable output, I am going to add .pretty() at the end of all the commands.

db.universities.aggregate([
  { $match : { country : 'Spain', city : 'Salamanca' } }
]).pretty()

The output is:

{
    "_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
    "country" : "Spain",
    "city" : "Salamanca",
    "name" : "USAL",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -5.6722512,
            17,
            40.9607792
        ]
    },
    "students" : [
		{
			"year" : 2014,
			"number" : 24774
		},
		{
			"year" : 2015,
			"number" : 23166
		},
		{
			"year" : 2016,
			"number" : 21913
		},
		{
			"year" : 2017,
			"number" : 21715
		}
	]
}
{
	"_id" : ObjectId("5b7d9d9efbc9884f689cdbaa"),
	"country" : "Spain",
	"city" : "Salamanca",
	"name" : "UPSA",
	"location" : {
		"type" : "Point",
		"coordinates" : [
			-5.6691191,
			17,
			40.9631732
		]
	},
	"students" : [
		{
			"year" : 2014,
			"number" : 4788
		},
		{
			"year" : 2015,
			"number" : 4821
		},
		{
			"year" : 2016,
			"number" : 6550
		},
		{
			"year" : 2017,
			"number" : 6125
		}
	]
}
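When several fields are listed in a $match document, as in the query above, a document has to satisfy all of them to pass. The following plain-JavaScript sketch imitates that behaviour for simple equality conditions only; `matchEq` is a hypothetical helper written for this illustration and does not cover query operators, nested fields or arrays.

```javascript
// Trimmed copies of the two sample documents (only the fields used here).
const universities = [
  { country: 'Spain', city: 'Salamanca', name: 'USAL' },
  { country: 'Spain', city: 'Salamanca', name: 'UPSA' }
];

// Hypothetical equality-only matcher: every field in the query must equal
// the corresponding field in the document (a logical AND of the conditions).
const matchEq = (query) => (doc) =>
  Object.entries(query).every(([field, value]) => doc[field] === value);

const both = universities.filter(matchEq({ country: 'Spain', city: 'Salamanca' }));
const none = universities.filter(matchEq({ country: 'Spain', city: 'Madrid' }));

console.log(both.length, none.length); // prints "2 0"
```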

$project

You rarely need to retrieve all the fields in your documents. It is good practice to return only the fields you need, so as to avoid processing more data than necessary. The $project stage is used to do this and to add any calculated fields that you need.

In this example, we only need the fields country, city and name. In the code that follows, please note that:

  • We must explicitly write _id : 0 when this field is not required
  • Apart from the _id field, it is sufficient to specify only those fields we need to obtain as a result of the query

This stage …

db.universities.aggregate([
  { $project : { _id : 0, country : 1, city : 1, name : 1 } }
]).pretty()

…will give the result …

{ "country" : "Spain", "city" : "Salamanca", "name" : "USAL" }
{ "country" : "Spain", "city" : "Salamanca", "name" : "UPSA" }
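Besides selecting fields, $project can add calculated ones. The sketch below is an assumption-laden illustration rather than part of the original article: the field name `yearsRecorded` is made up, and the $size array operator is used to count the entries of the students array. A plain-JavaScript equivalent for one sample document follows the pipeline so that the intended result can be checked without a server.

```javascript
// Pipeline for the mongo shell: keep name and add a computed field.
// yearsRecorded is a hypothetical field name chosen for this sketch.
const pipeline = [
  { $project: { _id: 0, name: 1, yearsRecorded: { $size: '$students' } } }
];
// In the mongo shell: db.universities.aggregate(pipeline).pretty()

// Plain-JavaScript equivalent of the same projection for the USAL document.
const usal = {
  name: 'USAL',
  students: [
    { year: 2014, number: 24774 },
    { year: 2015, number: 23166 },
    { year: 2016, number: 21913 },
    { year: 2017, number: 21715 }
  ]
};
const projected = { name: usal.name, yearsRecorded: usal.students.length };

console.log(JSON.stringify(projected));
```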

Use the $filter operator with $project to filter elements from MongoDB arrays.
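A minimal sketch of that combination, assuming we only want the student figures from 2016 onwards: the variable name `entry` and the cut-off year are choices made for this illustration. The pipeline is shown first, followed by a plain-JavaScript equivalent of the condition applied to the USAL figures.

```javascript
// $filter keeps only the array elements for which cond evaluates to true.
const pipeline = [
  {
    $project: {
      _id: 0,
      name: 1,
      students: {
        $filter: {
          input: '$students',                      // array to filter
          as: 'entry',                             // name for each element (illustrative)
          cond: { $gte: ['$$entry.year', 2016] }   // keep entries from 2016 onwards
        }
      }
    }
  }
];
// In the mongo shell: db.universities.aggregate(pipeline).pretty()

// Plain-JavaScript equivalent of the condition, applied to the USAL figures.
const students = [
  { year: 2014, number: 24774 },
  { year: 2015, number: 23166 },
  { year: 2016, number: 21913 },
  { year: 2017, number: 21715 }
];
const recent = students.filter((entry) => entry.year >= 2016);

console.log(JSON.stringify(recent));
```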

$group

With the $group stage, we can perform all the aggregation or summary queries that we need, such as finding counts, totals, averages or maximums.

In this example, we want to know the number of documents per university in our ‘universities’ collection:

The query …

db.universities.aggregate([
  { $group : { _id : '$name', totaldocs : { $sum : 1 } } }
]).pretty()

…will produce this result …

{ "_id" : "UPSA", "totaldocs" : 1 }
{ "_id" : "USAL", "totaldocs" : 1 }
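To make the counting explicit, the grouping above can be imitated in plain JavaScript. This is a sketch of the idea only: the Map-based loop is an assumption about how one might reproduce `_id : '$name'` with `$sum : 1` by hand, and it says nothing about how MongoDB groups documents internally or in which order it returns the groups.

```javascript
// Trimmed copies of the sample documents (only the grouping key is needed).
const universities = [{ name: 'USAL' }, { name: 'UPSA' }];

// Group by name and add 1 per document, like { $sum : 1 }.
const counts = new Map();
for (const doc of universities) {
  counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
}

// Shape the result like the documents produced by $group.
const result = [...counts].map(([_id, totaldocs]) => ({ _id, totaldocs }));

console.log(JSON.stringify(result));
```

Each name appears once in the sample data, which is why every group has a count of 1.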

$out

This is an unusual type of stage because it lets you carry the results of your aggregation over into a new collection, or into an existing one after dropping it, or even add them to the existing documents (new in version 4.1.2). From MongoDB 4.2 onward, adding results to existing documents is handled by the separate $merge stage.

The $out stage must be the last stage in the pipeline. For the first time, we are using an aggregation with more than one stage. We now have two: a $group and an $out.
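A two-stage pipeline of this kind might look like the sketch below. It reuses the $group stage from the previous example; the target collection name `aggResults` is a hypothetical name chosen for this illustration.

```javascript
// $group followed by $out; $out has to come last.
const pipeline = [
  { $group: { _id: '$name', totaldocs: { $sum: 1 } } },
  { $out: 'aggResults' } // hypothetical target collection name
];

// In the mongo shell, run the aggregation and then read the new collection:
//   db.universities.aggregate(pipeline)
//   db.aggResults.find().pretty()
const lastStage = Object.keys(pipeline[pipeline.length - 1])[0];
console.log(lastStage);
```

Because the results are written to a collection, the aggregate call itself returns no documents; they are read back with an ordinary find() on the target collection.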
