mongodb nosql_在MongoDB NoSQL数据库中使用JOIN

最新推荐文章于 2020-09-11 16:11:55 发布

culi4814

最新推荐文章于 2020-09-11 16:11:55 发布

阅读量379

点赞数

文章标签：数据库 java mysql python 大数据

原文链接：https://www.sitepoint.com/using-joins-in-mongodb-nosql-databases/

版权

mongodb nosql

Thanks to Julian Motz for kindly helping to peer review this article.

感谢朱利安·莫茨 ( Julian Motz)的帮助，以帮助同行审阅本文。

One of the biggest differences between SQL and NoSQL databases is JOIN. In relational databases, the SQL JOIN clause allows you to combine rows from two or more tables using a common field between them. For example, if you have tables of books and publishers, you can write SQL commands such as:

SQL和NoSQL数据库之间最大的区别之一是JOIN。在关系数据库中，SQL JOIN子句允许您使用两个或多个表之间的公共字段来合并它们之间的行。例如，如果有books和publishers表，则可以编写SQL命令，例如：

SELECT book.title, publisher.name
FROM book
LEFT JOIN book.publisher_id ON publisher.id;

In other words, the book table has a publisher_id field which references the id field in the publisher table.

换句话说， book表具有一个publisher_id字段，该字段引用了publisher表中的id字段。

This is practical, since a single publisher could offer thousands of books. If we ever need to update a publisher’s details, we can change a single record. Data redundancy is minimized, since we don’t need to repeat the publisher information for every book. The technique is known as normalization.

这是可行的，因为单个出版商可以提供数千本书。如果我们需要更新发布者的详细信息，则可以更改单个记录。由于我们不需要为每本书重复出版商信息，因此最大限度地减少了数据冗余。该技术称为标准化 。

SQL databases offer a range of normalization and constraint features to ensure relationships are maintained.

SQL数据库提供了一系列规范化和约束功能，以确保维持关系。

NoSQL ==没有加入？ (NoSQL == No JOIN?)

Not always …

不总是 …

Document-oriented databases such as MongoDB are designed to store denormalized data. Ideally, there should be no relationship between collections. If the same data is required in two or more documents, it must be repeated.

面向文档的数据库(例如MongoDB)旨在存储非规范化数据。理想情况下，集合之间应该没有关系。如果两个或更多个文档中需要相同的数据，则必须重复该数据。

This can be frustrating, since there are few situations where you never need relational data. Fortunately, MongoDB 3.2 introduces a new $lookup operator which can perform a LEFT-OUTER-JOIN-like operation on two or more collections. But there’s a catch …

这可能令人沮丧，因为在极少数情况下，您永远不需要关系数据。幸运的是，MongoDB 3.2引入了新的$lookup运算符，该运算符可以对两个或多个集合执行类似于LEFT-OUTER-JOIN的操作。但是有一个陷阱……

MongoDB聚合 (MongoDB Aggregation)

$lookup is only permitted in aggregation operations. Think of these as a pipeline of operators which query, filter and group a result. The output of one operator is used as the input for the next.

$lookup仅在聚合操作中允许。可以将它们视为查询，过滤和分组结果的运算符管道。一个运算符的输出用作下一个运算符的输入。

Aggregation is more difficult to understand than simpler find queries and will generally run slower. However, they are powerful and an invaluable option for complex search operations.

聚合比简单的find查询更难理解，并且通常运行速度较慢。但是，它们功能强大，是复杂搜索操作的宝贵选择。

Aggregation is best explained with an example. Presume we’re creating a social media platform with a user collection. It stores every user’s details in separate documents. For example:

最好用一个例子来解释聚合。假设我们正在创建一个包含user集合的社交媒体平台。它将每个用户的详细信息存储在单独的文档中。例如：

{
  "_id": ObjectID("45b83bda421238c76f5c1969"),
  "name": "User One",
  "email: "userone@email.com",
  "country": "UK",
  "dob": ISODate("1999-09-13T00:00:00.000Z")
}

We can add as many fields as necessary, but all MongoDB documents require an _id field which has a unique value. The _id is similar to an SQL primary key, and will be inserted automatically if necessary.

我们可以根据需要添加任意多个字段，但是所有MongoDB文档都需要一个具有唯一值的_id字段。 _id与SQL主键类似，如有必要，将自动插入。

Our social network now requires a post collection, which stores numerous insightful updates from users. The documents store the text, date, a rating and a reference to the user who wrote it in a user_id field:

我们的社交网络现在需要一个post集，该post集可以存储来自用户的大量有见地的更新。这些文档在user_id字段中存储文本，日期，等级和对编写该文本的用户的引用：

{
  "_id": ObjectID("17c9812acff9ac0bba018cc1"),
  "user_id": ObjectID("45b83bda421238c76f5c1969"),
  "date: ISODate("2016-09-05T03:05:00.123Z"),
  "text": "My life story so far",
  "rating": "important"
}

We now want to show the last twenty posts with an “important” rating from all users in reverse chronological order. Each returned document should contain the text, the time of the post and the associated user’s name and country.

现在，我们要按相反的时间顺序显示所有用户的“重要”评级的最后二十篇帖子。每个返回的文档应包含文本，发布时间以及关联的用户名和国家/地区。

The MongoDB aggregate query is passed an array of pipeline operators which define each operation in order. First, we need to extract all documents from the post collection which have the correct rating using the $match filter:

MongoDB 聚合查询传递给管道操作符数组，这些操作符按顺序定义每个操作。首先，我们需要使用$match过滤器从post集中提取所有具有正确评分的文档：

{ "$match": { "rating": "important" } }

We must now sort the matched items into reverse date order using the $sort operator:

现在，我们必须使用$sort运算符将匹配项按逆序$sort ：

{ "$sort": { "date": -1 } }

Since we only require twenty posts, we can apply a $limit stage so MongoDB only needs to process data we want:

由于我们只需要二十个帖子，因此我们可以应用$limit阶段，因此MongoDB只需要处理我们想要的数据：

{ "$limit": 20 }

We can now join data from the user collection using the new $lookup operator. It requires an object with four parameters:

现在，我们可以使用新的$lookup运算符$lookup user集合中的数据。它需要一个具有四个参数的对象：

localField: the lookup field in the input document
localField ：输入文档中的查找字段
from: the collection to join
from ：收藏加盟
foreignField: the field to lookup in the from collection
foreignField ：要在from集合中查找的字段
as: the name of the output field.
as ：输出字段的名称。

Our operator is therefore:

因此，我们的运营商是：

{ "$lookup": {
  "localField": "user_id",
  "from": "user",
  "foreignField": "_id",
  "as": "userinfo"
} }

This will create a new field in our output named userinfo. It contains an array where each value is the matching the user document:

这将在我们的输出中创建一个名为userinfo的新字段。它包含一个数组，其中每个值都是与user文档匹配的数组：

"userinfo": [
  { "name": "User One", ... }
]

We have a one-to-one relationship between the post.user_id and user._id, since a post can only have one author. Therefore, our userinfo array will only ever contain one item. We can use the $unwind operator to deconstruct it into a sub-document:

由于post只能有一位作者，因此post.user_id和user._id之间存在一对一的关系。因此，我们的userinfo数组将仅包含一项。我们可以使用$unwind运算符将其解构为子文档：

{ "$unwind": "$userinfo" }

The output will now be converted to a more practical format which can have further operators applied:

现在，输出将转换为更实用的格式，可以应用更多运算符：

"userinfo": {
  "name": "User One",
  "email: "userone@email.com",
  …
}

Finally, we can return the text, the time of the post, the user’s name and country using a $project stage in the pipeline:

最后，我们可以使用管道中的$project阶段返回文本，发布时间，用户名和国家/地区：

{ "$project": {
  "text": 1,
  "date": 1,
  "userinfo.name": 1,
  "userinfo.country": 1
} }

放在一起 (Putting It All Together)

Our final aggregate query matches posts, sorts into order, limits to the latest twenty items, joins user data, flattens the user array and returns necessary fields only. The full command:

我们的最终汇总查询将匹配帖子，进行排序，限制最新的20个项目，加入用户数据，展平用户数组并仅返回必要的字段。完整命令：

db.post.aggregate([
  { "$match": { "rating": "important" } },
  { "$sort": { "date": -1 } },
  { "$limit": 20 },
  { "$lookup": {
    "localField": "user_id",
    "from": "user",
    "foreignField": "_id",
    "as": "userinfo"
  } },
  { "$unwind": "$userinfo" },
  { "$project": {
    "text": 1,
    "date": 1,
    "userinfo.name": 1,
    "userinfo.country": 1
  } }
]);

The result is a collection of up to twenty documents. For example:

结果是最多收集二十个文档。例如：

[
  {
    "text": "The latest post",
    "date: ISODate("2016-09-27T00:00:00.000Z"),
    "userinfo": {
      "name": "User One",
      "country": "UK"
    }
  },
  {
    "text": "Another post",
    "date: ISODate("2016-09-26T00:00:00.000Z"),
    "userinfo": {
      "name": "User One",
      "country": "UK"
    }
  }
  ...
]

大！我终于可以切换到NoSQL了！ (Great! I Can Finally Switch to NoSQL!)

MongoDB $lookup is useful and powerful, but even this basic example requires a complex aggregate query. It’s not a substitute for the more powerful JOIN clause offered in SQL. Neither does MongoDB offer constraints; if a user document is deleted, orphan post documents would remain.

MongoDB $lookup是有用且强大的，但是即使是这个基本示例也需要复杂的聚合查询。它不能替代SQL中提供的功能更强大的JOIN子句。 MongoDB也不提供约束；如果删除user文档，则将保留孤立的post文档。

Ideally, the $lookup operator should be required infrequently. If you need it a lot, you’re possibly using the wrong data store …

理想情况下，应该很少需要$lookup运算符。如果您非常需要它，则可能使用了错误的数据存储…

If you have relational data, use a relational (SQL) database!

如果您有关系数据，请使用关系(SQL)数据库！

That said, $lookup is a welcome addition to MongoDB 3.2. It can overcome some of the more frustrating issues when using small amounts of relational data in a NoSQL database.

也就是说， $lookup是MongoDB 3.2的一个受欢迎的功能。当在NoSQL数据库中使用少量关系数据时，它可以克服一些更令人沮丧的问题。