SQL vs NoSQL: How to Choose

Latest recommended article published on 2024-09-22 11:17:00

culi3118


Original article: https://www.sitepoint.com/sql-vs-nosql-choose/


In the previous article we discussed the primary differences between SQL and NoSQL databases. In this follow-up, we’ll apply our knowledge to specific scenarios and determine the best option.

To recap:

SQL databases:

store related data in tables
require a schema which defines tables prior to use
encourage normalization to reduce data redundancy
support table JOINs to retrieve related data from multiple tables in a single command
implement data integrity rules
provide transactions to guarantee two or more updates succeed or fail as an atomic unit
can be scaled (with some effort)
use a powerful declarative language for querying
offer plenty of support, expertise and tools.

NoSQL databases:

store related data in JSON-like, name-value documents
can store data without specifying a schema
must usually be denormalized so information about an item is contained in a single document
should not require JOINs (presuming denormalized documents are used)
permit any data to be saved anywhere at any time without verification
guarantee updates to a single document, but not to multiple documents
provide excellent performance and scalability
use JSON data objects for querying
are a newer, exciting technology.

SQL databases are ideal for projects where requirements can be determined and robust data integrity is essential. NoSQL databases are ideal for unrelated, indeterminate or evolving data requirements where speed and scalability are more important. In simpler terms:

SQL is digital. It works best for clearly defined, discrete items with exact specifications. Typical use cases are online stores and banking systems.
NoSQL is analog. It works best for organic data with fluid requirements. Typical use cases are social networks, customer management and web analytics systems.

Few projects will be an exact fit. Either option could be viable if you have shallower or naturally denormalized data. But please be aware that these are simplified example scenarios with sweeping generalizations! You know more about your project than I do, and I wouldn’t recommend switching from SQL to NoSQL or vice versa unless it offers considerable benefits. It’s your choice. Consider the pros and cons at the start of your project and you can’t go wrong.

Scenario One: a Contact List

Let’s re-invent the wheel and implement an SQL-based address book system. Our initial naive contact table is defined with the following fields:

id
title
firstname
lastname
gender
telephone
email
address1
address2
address3
city
region
zipcode
country

Problem one: few people have a single telephone number. We probably need at least three for land-line, mobile and workplace, but it doesn’t matter how many we allocate — someone, somewhere will want more. Let’s create a separate telephone table so contacts can have as many as they like. This also normalizes our data — we don’t need a NULL for contacts without a number:

contact_id
name (text such as land-line, work mobile, etc.)
number

Problem two: we have the same issue with email addresses, so let’s create a similar email table:

contact_id
name (text such as home email, work email, etc.)
address

Problem three: we may not wish to enter a (geographic) address, or we may want to enter multiple addresses for work, home, holiday homes, etc. We therefore need a new address table:

contact_id
name (text such as home, office, etc.)
address1
address2
address3
city
region
zipcode
country

Our original contact table has been reduced to:

id
title
firstname
lastname
gender
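
The four tables described so far can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the column types, the `NOT NULL` choices and the `create_address_book` helper are assumptions, not part of the original design.

```python
import sqlite3

# Hypothetical DDL for the normalized address book.
# Column types and NOT NULL constraints are assumptions for illustration.
SCHEMA = """
CREATE TABLE contact (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    firstname TEXT NOT NULL,
    lastname  TEXT NOT NULL,
    gender    TEXT
);
CREATE TABLE telephone (
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    name       TEXT NOT NULL,   -- e.g. 'land-line', 'work mobile'
    number     TEXT NOT NULL
);
CREATE TABLE email (
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    name       TEXT NOT NULL,   -- e.g. 'home email', 'work email'
    address    TEXT NOT NULL
);
CREATE TABLE address (
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    name       TEXT NOT NULL,   -- e.g. 'home', 'office'
    address1 TEXT, address2 TEXT, address3 TEXT,
    city TEXT, region TEXT, zipcode TEXT, country TEXT
);
"""


def create_address_book(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


conn = create_address_book()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Every table must exist before the first contact can be saved, which is the "schema first" property listed in the recap.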

Great — we have a normalized database which can store any number of telephone numbers, email addresses and addresses for any contact. Unfortunately …

The schema is rigid. We’ve not considered the contact’s middle name(s), date of birth, company or job role. It doesn’t matter how many fields we add, we’ll soon receive update requests for notes, anniversaries, relationship statuses, social media accounts, inside leg measurements, favorite type of cheese etc. It’s impossible to foresee every option, so we’d possibly create an otherdata table with name-value pairs to cope.

The data is fragmented. It’s not easy for developers or system administrators to examine the database. The program logic will also become slower and more complex, because it’s not practical to retrieve a contact’s data in a single SELECT statement with multiple JOIN clauses. (You could, but the result would contain every combination of telephone, email and address: if someone had three telephone numbers, five emails and two addresses, the SQL query would generate thirty results.)
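
The row multiplication is easy to reproduce. The sketch below uses cut-down versions of the tables (one data column each, invented sample values) and SQLite through Python's `sqlite3` module. It joins all three child tables in one SELECT, then shows the alternative of one query per table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact   (id INTEGER PRIMARY KEY, firstname TEXT);
CREATE TABLE telephone (contact_id INTEGER, number TEXT);
CREATE TABLE email     (contact_id INTEGER, address TEXT);
CREATE TABLE address   (contact_id INTEGER, city TEXT);
""")

# One contact with three numbers, five emails and two addresses (made-up data).
conn.execute("INSERT INTO contact VALUES (1, 'Bob')")
conn.executemany("INSERT INTO telephone VALUES (1, ?)",
                 [(f"555-000{i}",) for i in range(3)])
conn.executemany("INSERT INTO email VALUES (1, ?)",
                 [(f"bob{i}@example.com",) for i in range(5)])
conn.executemany("INSERT INTO address VALUES (1, ?)",
                 [("Nowhere",), ("Elsewhere",)])

# A single SELECT with three JOINs returns every combination of the child rows.
joined = conn.execute("""
    SELECT t.number, e.address, a.city
    FROM contact c
    JOIN telephone t ON t.contact_id = c.id
    JOIN email e     ON e.contact_id = c.id
    JOIN address a   ON a.contact_id = c.id
    WHERE c.id = 1
""").fetchall()
print(len(joined))  # 3 * 5 * 2 = 30 rows

# The practical alternative: one query per child table, merged in program logic.
phones = conn.execute("SELECT number FROM telephone WHERE contact_id = 1").fetchall()
emails = conn.execute("SELECT address FROM email WHERE contact_id = 1").fetchall()
places = conn.execute("SELECT city FROM address WHERE contact_id = 1").fetchall()
print(len(phones), len(emails), len(places))  # 3 5 2
```

The second approach avoids the duplication but needs four round trips and extra code to assemble one contact.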

Finally, full-text search is difficult. If someone enters the string “SitePoint”, we must check all four tables to see if it’s part of a contact name, telephone, email or address and rank the result accordingly. If you’ve ever used WordPress’s search, you’ll understand how frustrating that can be.
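
To show what "check all four tables" can look like, here is one possible query using `LIKE` and `UNION ALL` in SQLite. The table layout is simplified, the sample rows are invented, and the `search` helper is hypothetical. It only reports where a match was found. Ranking the hits would need further logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact   (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE telephone (contact_id INTEGER, number TEXT);
CREATE TABLE email     (contact_id INTEGER, address TEXT);
CREATE TABLE address   (contact_id INTEGER, address1 TEXT, city TEXT);
INSERT INTO contact VALUES (1, 'Billy', 'Jones'), (2, 'Ann', 'Smith');
INSERT INTO email   VALUES (1, 'bob@sitepoint.example');
INSERT INTO address VALUES (2, '1 SitePoint Way', 'Nowhere');
""")

# One sub-query per table; the 'source' column records which table matched.
SEARCH = """
SELECT id AS contact_id, 'name' AS source FROM contact
 WHERE firstname LIKE :q OR lastname LIKE :q
UNION ALL
SELECT contact_id, 'telephone' FROM telephone WHERE number LIKE :q
UNION ALL
SELECT contact_id, 'email' FROM email WHERE address LIKE :q
UNION ALL
SELECT contact_id, 'address' FROM address WHERE address1 LIKE :q OR city LIKE :q
"""


def search(conn, term):
    """Return (contact_id, source) pairs for every table containing the term."""
    return conn.execute(SEARCH, {"q": f"%{term}%"}).fetchall()


hits = sorted(search(conn, "SitePoint"))
print(hits)  # [(1, 'email'), (2, 'address')]
```

Each new table or searchable column means another clause in this query, which is part of why the normalized design makes search awkward.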

The NoSQL Alternative

Our contact data concerns people. They are unpredictable and have differing requirements at different times. The contact list would benefit from using a NoSQL database, which stores all data about an individual in a single document in the contacts collection:

{
  name: [
    "Billy", "Bob", "Jones"
  ],
  company: "Fake Goods Corp",
  jobtitle: "Vice President of Data Management",
  telephone: {
    home: "0123456789",
    mobile: "9876543210",
    work: "2244668800"
  },
  email: {
    personal: "bob@myhomeemail.net",
    work: "bob@myworkemail.com"
  },
  address: {
    home: {
      line1: "10 Non-Existent Street",
      city: "Nowhere",
      country: "Australia"
    }
  },
  birthdate: ISODate("1980-01-01T00:00:00.000Z"),
  twitter: '@bobsfakeaccount',
  note: "Don't trust this guy",
  weight: "200lb",
  photo: "52e86ad749e0b817d25c8892.jpg"
}

In this example, we haven’t stored the contact’s title or gender, and we’ve added data which need not apply to anyone else. It doesn’t matter — our NoSQL database won’t mind, and we can add or remove fields at will.
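
The trade-off is that program logic must tolerate missing fields. As a rough sketch, with plain Python dictionaries standing in for documents and a hypothetical `get_path` helper, reading code can fall back to a default when a field is absent:

```python
# Two hypothetical contact documents with different sets of fields.
contacts = [
    {
        "name": ["Billy", "Bob", "Jones"],
        "telephone": {"home": "0123456789", "mobile": "9876543210"},
        "twitter": "@bobsfakeaccount",
    },
    {
        "name": ["Ann", "Smith"],
        "email": {"work": "ann@example.com"},
    },
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'telephone.mobile'.

    Returns `default` as soon as any part of the path is missing.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


mobiles = [get_path(c, "telephone.mobile", "n/a") for c in contacts]
print(mobiles)  # ['9876543210', 'n/a']
```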

Because the contact’s data is contained in a single document, we can retrieve some or all information using a single query. A full-text search is also simpler; in MongoDB we can define an index on all contact text fields using:

db.contact.createIndex({ "$**": "text" });

then perform a full-text search using:

db.contact.find({
  $text: { $search: "something" }
});
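
To make the idea of "every text field" concrete, the sketch below walks a nested document in plain Python and checks each string it finds. It is not how MongoDB implements text search, which relies on an index and matches on words. This version does a simple case-insensitive substring test, and the function names are invented for the example.

```python
def iter_strings(value):
    """Yield every string found anywhere inside a nested document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def matches(doc, term):
    """True if any string field in the document contains the term."""
    needle = term.casefold()
    return any(needle in text.casefold() for text in iter_strings(doc))


contact = {
    "name": ["Billy", "Bob", "Jones"],
    "company": "Fake Goods Corp",
    "address": {"home": {"line1": "10 Non-Existent Street", "city": "Nowhere"}},
    "weight": "200lb",
}

found = matches(contact, "nowhere")
missing = matches(contact, "sitepoint")
print(found, missing)  # True False
```

Because all the contact's data lives in one document, a single traversal covers names, addresses and any field added later.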

Scenario Two: a Social Network

A social network may use similar contact data stores, but it expands on the feature set with options such as relationship links, status updates, messaging and “likes”. These facilities may be implemented and dropped in response to user demand, so it’s impossible to predict how they will evolve.

In addition:

Most data updates have a single point of origin: the user. It’s unlikely we’ll need to update two or more records at any one time, so transaction-like functionality is not required.
Despite what some users may think, a failed status update is unlikely to cause a global meltdown or financial loss. The application’s interface and performance take a higher priority than robust data integrity.

NoSQL appears to be a good fit. The database allows us to quickly implement features storing different types of data. For example, all the user’s dated status updates could be placed in a single document in the status collection:

{
  user_id: ObjectId("65f82bda42e7b8c76f5c1969"),
  update: [
    {
      date: ISODate("2015-09-18T10:02:47.620Z"),
      text: "feeling more positive today"
    },
    {
      date: ISODate("2015-09-17T13:14:20.789Z"),
      text: "spending far too much time here"
    },
    {
      date: ISODate("2015-09-17T12:33:02.132Z"),
      text: "considering my life choices"
    }
  ]
}

While this document could become long, we can fetch a subset of the array, such as the most recent update. The whole status history for every user can also be searched quickly.
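
As an illustration of fetching a subset, the following Python sketch picks the most recent entries from a status document held as a dictionary. The `latest_updates` helper is hypothetical and runs client-side. In MongoDB itself, a `$slice` projection could return part of the array from the server instead.

```python
from datetime import datetime, timezone

# The status document from above, with Python datetimes in place of ISODate.
status = {
    "user_id": "65f82bda42e7b8c76f5c1969",
    "update": [
        {"date": datetime(2015, 9, 18, 10, 2, 47, 620000, tzinfo=timezone.utc),
         "text": "feeling more positive today"},
        {"date": datetime(2015, 9, 17, 13, 14, 20, 789000, tzinfo=timezone.utc),
         "text": "spending far too much time here"},
        {"date": datetime(2015, 9, 17, 12, 33, 2, 132000, tzinfo=timezone.utc),
         "text": "considering my life choices"},
    ],
}


def latest_updates(doc, count=1):
    """Return the `count` newest entries of the update array, newest first."""
    ordered = sorted(doc["update"], key=lambda entry: entry["date"], reverse=True)
    return ordered[:count]


newest = latest_updates(status)[0]["text"]
print(newest)  # feeling more positive today
```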

Now presume we wanted to introduce an emoticon choice when posting an update. This would be a matter of adding a graphic reference to new entries in the update array. Unlike an SQL store, there’s no need to set previous message emoticons to NULL — our program logic can show a default or no image if an emoticon isn’t set.
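
A minimal sketch of that program logic, assuming a hypothetical `emoticon` field and invented image file names:

```python
DEFAULT_EMOTICON = "neutral.png"  # hypothetical fallback image


def emoticon_for(update, default=DEFAULT_EMOTICON):
    """Return the update's emoticon, or the default for older entries without one."""
    return update.get("emoticon", default)


old_update = {"text": "considering my life choices"}  # saved before the feature existed
new_update = {"text": "feeling more positive today", "emoticon": "smile.png"}

shown = [emoticon_for(old_update), emoticon_for(new_update)]
print(shown)  # ['neutral.png', 'smile.png']
```

Older entries are never rewritten. The fallback lives entirely in the reading code.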

Scenario Three: a Warehouse Management System

Consider a system which monitors warehoused goods. We need to record:

products arriving at the warehouse and being allocated to a specific location/bay
movements of goods within the warehouse, e.g. rearranging stock so the same products are in adjacent locations
orders and the subsequent removal of products from the warehouse for delivery.

Our data requirements:

Generic product information such as box quantities, dimensions and color can be stored, but it’s discrete data we can identify and apply to anything. We’re unlikely to be concerned with specifics, such as laptop processor speed or estimated smartphone battery life.
It’s imperative to minimize mistakes. We can’t have products disappearing or being moved to a location where different products are already being stored.
In its simplest form, we’re recording the transfer of items from one physical area to another: removing from location A and placing in location B. That’s two updates for the same action.

We need a robust store with enforced data integrity and transaction support. Only an SQL database will (currently) satisfy those requirements.
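
The "two updates for the same action" requirement can be sketched with SQLite and Python's `sqlite3` module. The `stock` table, its `CHECK` constraint and the `move_stock` helper are assumptions made for this example. A real warehouse schema would be far richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    location TEXT NOT NULL,
    product  TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity >= 0),  -- stock can never go negative
    PRIMARY KEY (location, product)
);
INSERT INTO stock VALUES ('A', 'widget', 10), ('B', 'widget', 0);
""")


def move_stock(conn, product, source, target, quantity):
    """Move stock between locations; both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE stock SET quantity = quantity + ? WHERE location = ? AND product = ?",
            (quantity, target, product),
        )
        conn.execute(
            "UPDATE stock SET quantity = quantity - ? WHERE location = ? AND product = ?",
            (quantity, source, product),
        )


def quantities(conn):
    """Return {location: quantity} for all stock rows."""
    return dict(conn.execute("SELECT location, quantity FROM stock ORDER BY location"))


move_stock(conn, "widget", "A", "B", 4)
after_valid_move = quantities(conn)
print(after_valid_move)  # {'A': 6, 'B': 4}

try:
    move_stock(conn, "widget", "A", "B", 100)  # would leave location A negative
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the second update

after_failed_move = quantities(conn)
print(after_failed_move)  # {'A': 6, 'B': 4}, the first update was rolled back too
```

In the failed move, the addition to location B had already run when the constraint fired, yet the rollback undoes it, so no products appear from nowhere.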

Expose Yourself!

I hope these scenarios help, but every project is different and, ultimately, you need to make your own decision. (Although, we developers are adept at justifying our technological choices, regardless of how good they are!)

The best advice: expose yourself to as many technologies as possible. That knowledge will allow you to make a reasoned and emotionally impartial judgment regarding SQL or NoSQL. Best of luck.
