SQL or NoSQL: Google App Engine - Part 1

NoSQL is a trending topic, and pretty much everyone from Google to Facebook runs some flavor of it. In this two-part post we try to resolve the SQL-or-NoSQL dilemma. The first part explains the benefits and mechanics of both plain old SQL and shiny new NoSQL. In the second part we will look specifically at the Google App Engine Datastore and ask whether it is the better choice for a given business problem.

The Traditional RDBMS

It’s probably safe to say that the majority of real-world applications rely on some type of RDBMS to store and retrieve their data. There may be plenty of reasons for your application to stick with an RDBMS rather than choose NoSQL. Let’s take a look at the key things an RDBMS excels at:

Query flexibility
Maintaining consistency across the dataset
Managing transactions
Separation of concerns / Dealing with the ever-evolving applications built on top of it

To maintain a consistent dataset, the RDBMS enforces integrity constraints. Every action on the dataset takes place within a transaction with ACID properties (atomicity, consistency, isolation, durability). This guarantees that whatever happens inside a transaction will never break the consistency of your data.
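
As a minimal sketch of that guarantee, the snippet below uses Python's built-in sqlite3 module. The `accounts` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are made up for illustration and are not taken from any particular application. The second transfer would leave a negative balance, so the database rejects it and the whole transaction is rolled back, including the statement that had already succeeded.

```python
import sqlite3

# In-memory database; table name and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity constraint
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit above was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30))   # within alice's balance
print(transfer(conn, "bob", "alice", 500))  # would overdraw bob
print(balances(conn))
```

The credit is deliberately issued before the debit so that the rollback has something visible to undo.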

Whatever new challenges your application might face, as long as your data model stays the same, the RDBMS will happily process your queries. This separation between database and application is ideal when multiple applications access the same database. The structure (the relations) in your data is handled by the RDBMS. This of course requires complete knowledge of how your data is connected (provided by the relations you create), as well as strict compliance with the 12 principles of the relational model (commonly known as Codd's 12 rules). These principles provide a solid theoretical foundation for simple and highly structured storage of (ideally) well-defined data.

A Disturbance in the Force

Unfortunately, all these great features come with downsides. To mention a few significant ones:

Entities with variable or complex attributes are not well supported
Weak support for hierarchical or graph data
No easy way to scale

The first and second are both functional deficits: an RDBMS needs structure, and it can only structure what fits logically inside a relation. You could store complex attributes as binary strings, for example, but the RDBMS won’t be able to operate on them efficiently. Variable attributes don’t go well with a static schema, where every row is forced to contain every attribute. Schema updates can be slow and may require scheduled downtime. From a “relational point of view”, this makes absolute sense. Remember, the RDBMS handles tasks that require knowledge about the structure of your data. It has to be “informed” whenever you intend to change that structure (by adding or removing attributes, for example).
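
To illustrate the static-schema point, here is a small sketch using Python's sqlite3 module. The `products` table and its columns are hypothetical. Books have an ISBN and shirts have a size, yet every row carries both columns, and adding a new attribute means changing the schema first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical product catalogue: books have an ISBN, shirts have a size.
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY, name TEXT, isbn TEXT, size TEXT)"
)
conn.execute(
    "INSERT INTO products (name, isbn) VALUES ('SQL Primer', '000-0000000000')"
)
conn.execute("INSERT INTO products (name, size) VALUES ('T-Shirt', 'M')")

# Every row carries every column; attributes that do not apply are NULL (None).
rows = conn.execute(
    "SELECT name, isbn, size FROM products ORDER BY id"
).fetchall()
print(rows)

# A new attribute has to be announced to the database before it can be stored.
conn.execute("ALTER TABLE products ADD COLUMN voltage INTEGER")
columns = [info[1] for info in conn.execute("PRAGMA table_info(products)")]
print(columns)
```

On a toy table the `ALTER TABLE` is instant. How costly it is on a large production table depends on the database engine, which is where the concern about slow schema updates comes from.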

Relational databases are simply not great for hierarchical or graph data. Modelling these requires lots of one-to-many and many-to-many relationships, which are awkward to store and expensive to traverse in a relational database. Figuratively speaking, you are trying to fit a tree or a mesh into what is essentially a table.
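
A common way to squeeze a tree into a table is an adjacency list, where each row points at its parent. The sketch below (Python with sqlite3, using a made-up `category` table and a hypothetical `descendants` helper) shows what that costs: fetching a subtree needs a recursive query that joins the table against itself once per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Adjacency list: each row stores the id of its parent. Names are illustrative.
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "books"),
        (3, 2, "databases"),
        (4, 3, "nosql"),
        (5, 1, "music"),
    ],
)


def descendants(conn, root_id):
    """Return (name, depth) for every node below root_id."""
    query = """
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, s.depth + 1
            FROM category AS c JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT name, depth FROM subtree WHERE depth > 0 ORDER BY depth, id
    """
    return conn.execute(query, (root_id,)).fetchall()


print(descendants(conn, 2))
print(descendants(conn, 1))
```

It works, but the shape of the data (a tree) is only recoverable by repeatedly walking `parent_id` links, which is the mismatch described above.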

The third weakness is performance related. An RDBMS can’t easily scale out horizontally, a huge (quite literally) problem for today’s multi-million-user “web-scale” applications.

At a very basic level, databases scale by sharding. If one machine can’t handle the volume anymore, the dataset is split into subsets (shards) that can then be stored on multiple machines. A master server handles load balancing and routes each request to the appropriate machine (“slave”). This master-slave configuration is not the only one possible, but for the sake of brevity I won’t go into other models.
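
The routing idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `ShardRouter` class is hypothetical, each "machine" is a plain dict, and keys are assigned by hashing modulo the shard count, which is one of several possible placement schemes.

```python
import hashlib


class ShardRouter:
    """Plays the master role: maps each key to one of N shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate machine holding one shard.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # Use a stable hash; Python's built-in hash() of str varies per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


router = ShardRouter(shard_count=4)
for user in ("alice", "bob", "carol", "dave"):
    router.put(user, {"name": user})

print(router.get("carol"))
print([len(shard) for shard in router.shards])  # how the keys were spread
```

Note that with plain modulo placement, changing the number of shards moves most keys to a different machine, which is one reason real systems use more elaborate schemes.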

And this is the point: an RDBMS can’t automatically shard data into subsets, because the information for one application entity is (usually) stored across several database relations. If you are not familiar with relational data modelling, normalization is a good place to start. Yes, some giant web-scale applications like Twitter run on an RDBMS, but their developers have to implement an application-specific sharding layer; the RDBMS does not handle this automatically.
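
What such an application-specific layer has to do can be sketched as follows. Everything here is an assumption made for illustration: the `ShardedStore` class, the `users` and `orders` relations, and the choice of `user_id` as the sharding key. It is not how Twitter or any other real system does it. The point is that the application, not the database, decides that all rows belonging to one user live on the same shard, so that joins for that user still work.

```python
import sqlite3

# One normalized entity ("a user and their orders") spans two relations.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER NOT NULL REFERENCES users(id),
                     item TEXT NOT NULL);
"""


class ShardedStore:
    """Hypothetical application-level sharding layer keyed on user id."""

    def __init__(self, shard_count):
        self.shards = []
        for _ in range(shard_count):
            db = sqlite3.connect(":memory:")  # stands in for a separate server
            db.executescript(SCHEMA)
            self.shards.append(db)

    def _shard(self, user_id):
        # The application picks the sharding key; the RDBMS knows nothing of it.
        return self.shards[user_id % len(self.shards)]

    def add_user(self, user_id, name):
        with self._shard(user_id) as db:
            db.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))

    def add_order(self, user_id, item):
        with self._shard(user_id) as db:
            db.execute(
                "INSERT INTO orders (user_id, item) VALUES (?, ?)",
                (user_id, item),
            )

    def orders_for(self, user_id):
        # Works because the user's rows from both relations share a shard.
        rows = self._shard(user_id).execute(
            "SELECT u.name, o.item FROM users AS u "
            "JOIN orders AS o ON o.user_id = u.id "
            "WHERE u.id = ? ORDER BY o.id",
            (user_id,),
        )
        return rows.fetchall()


store = ShardedStore(shard_count=3)
store.add_user(1, "ann")
store.add_user(2, "bo")
store.add_order(1, "book")
store.add_order(1, "pen")
store.add_order(2, "cd")
print(store.orders_for(1))
```

Queries that cut across users (for example, "all orders for item X") would have to be sent to every shard and merged in application code, which is exactly the work the RDBMS no longer does for you.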

A New Breed: NoSQL

NoSQL (“not only SQL”) is a generic term used to describe a variety of databases, many of which are by no means new but have been experiencing a kind of renaissance lately. NoSQL databases aim to solve the performance problems of the RDBMS by putting the structuring work back into the hands of the application programmer: you. Fortunately, applications don’t always require aggregation and structure at the database level, and those that don’t can safely benefit from the vast performance improvements NoSQL databases can provide. Our goal will be to find out whether your app is one of those, but first let me present a few kinds of NoSQL database (roughly in order of complexity):

(Ordered) Key-Value stores (Apache Cassandra, Dynamo, Project Voldemort)
Object Stores (AppEngine Datastore)
Document stores (CouchDB)
Tree & Graph databases (Neo4J, Twitter’s FlockDB)

All of these solve some of the problems, and of course bring some of their own.
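
To make the categories a little more concrete, the sketch below models the same tiny dataset ("ann follows bo") in three of these shapes using plain Python data structures. These are stand-ins only: none of the names or layouts are the actual APIs or storage formats of Cassandra, CouchDB, Neo4J, or any other product. The `followed_names_kv` helper shows what "putting the structuring work back into the hands of the programmer" means in practice: the store only does key lookups, and the application performs the join.

```python
import json

# 1. Key-value: opaque values looked up by key; the application parses them.
kv_store = {
    "user:1": json.dumps({"name": "ann"}),
    "user:2": json.dumps({"name": "bo"}),
    "follows:1": json.dumps([2]),
}

# 2. Document: one self-describing, nested record per entity.
document = {
    "_id": "user:1",
    "name": "ann",
    "follows": [{"_id": "user:2", "name": "bo"}],
}

# 3. Graph: nodes plus explicit, labelled edges.
nodes = {1: {"name": "ann"}, 2: {"name": "bo"}}
edges = [(1, "FOLLOWS", 2)]


def followed_names_kv(store, user_id):
    """Application-side 'join': fetch the id list, then one lookup per id."""
    followed_ids = json.loads(store.get(f"follows:{user_id}", "[]"))
    return [json.loads(store[f"user:{fid}"])["name"] for fid in followed_ids]


def followed_names_graph(nodes, edges, user_id):
    """Follow outgoing FOLLOWS edges from the given node."""
    return [
        nodes[dst]["name"]
        for src, label, dst in edges
        if src == user_id and label == "FOLLOWS"
    ]


print(followed_names_kv(kv_store, 1))
print([d["name"] for d in document["follows"]])
print(followed_names_graph(nodes, edges, 1))
```

All three answer the same question, but they differ in who holds the structure: the application code (key-value), the record itself (document), or the edges (graph).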

We now have a fairly solid understanding of the mechanics and constraints of both relational and NoSQL databases. With that as context, we will take a look at the Google App Engine Datastore in the next part of this post. Stay tuned.
