Non-Relational Databases and Supporting Mixed Workloads

culi4814

Original article: https://www.sitepoint.com/non-relational-databases-and-supporting-mixed-workloads/

This article was created in partnership with MongoDB. Thank you for supporting the partners who make SitePoint possible.

Suppose that you’re building an e-commerce platform and as part of the exercise, you need to come up with a new data architecture for inventory management. You need to support fast, transactional workloads to actually keep track of inventory in near real-time.

The business would also like to be able to answer questions such as “based on historical data, when should we restock on widgets and gizmos?” and “who are the people that are buying widgets and generally, where are they located?” Your data architecture needs to support mixed workloads.

Where would you start?

For the transactional component, you would likely realize that you need an operational database — that is, one that allows you to conduct read, write, and update operations on your data. This should make sense as you would need to not only know how many widgets you have in your inventory but also be able to update that number when a customer purchases a widget. And you’d also need to make sure that your data layer is able to serve up a consistent view of the data to any connected applications. Otherwise, your soon-to-be unhappy customers would find themselves putting items in their carts that are not actually available.
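
The requirement above comes down to making the stock check and the stock update a single atomic step. Here is a minimal in-memory sketch of that idea, assuming a hypothetical `Inventory` class that stands in for the data layer (it is not any particular database's API):

```python
import threading


class Inventory:
    """Toy in-memory stand-in for an operational data store (hypothetical)."""

    def __init__(self, stock):
        self._stock = dict(stock)      # item name -> units on hand
        self._lock = threading.Lock()  # serializes check-and-update

    def available(self, item):
        """Return the number of units currently on hand."""
        with self._lock:
            return self._stock.get(item, 0)

    def purchase(self, item, quantity=1):
        """Decrement stock only if enough units remain; return True on success."""
        with self._lock:
            on_hand = self._stock.get(item, 0)
            if on_hand < quantity:
                return False  # refuse rather than oversell
            self._stock[item] = on_hand - quantity
            return True


inventory = Inventory({"widget": 2})
results = [inventory.purchase("widget") for _ in range(3)]
print(results, inventory.available("widget"))  # [True, True, False] 0
```

The third purchase is rejected because the check and the decrement happen under the same lock, which is the behavior that keeps unavailable items out of a customer's cart.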

To support your transactional workload, there is no shortage of operational databases to choose from as the underlying technologies go back 40 years. For applications that need to handle a variety of data types and data structures, such as our inventory application, many companies have opted for newer non-relational options in lieu of relational databases such as Oracle, MySQL, or SQL Server.

This is because non-relational databases, which do not store data in rows and columns as relational databases do, offer more flexibility in their ability to ingest and process data of various formats and shapes, saving significant amounts of time and effort during both app development and iteration cycles. Designed to scale vertically (“get a bigger machine”), traditional relational databases also have a difficult time supporting distributed requests with low latency and can run into performance limitations. This could be problematic if we have geographically distributed customers or unexpected peaks in application usage.

For the purposes of discussing data architectures to support mixed workloads, let’s compare implementations built on two popular non-relational operational databases: DynamoDB, a non-relational database service developed by AWS, and MongoDB, one of the most popular non-relational databases.

Mixed Workloads with DynamoDB

DynamoDB is a fully managed cloud database service that stores data as a collection of key-value pairs in which a key serves as a unique identifier. Values can range from simple scalars to complex nested structures such as maps and lists, while the key attributes themselves must be scalar (string, number, or binary). This makes the ingestion and persistence of a large variety of data far simpler compared to using a relational database.
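
To make the key-value item model concrete, here is a small sketch that converts a plain Python dictionary into the typed attribute-value layout DynamoDB uses in its low-level JSON (`S`, `N`, `BOOL`, `NULL`, `L`, `M`). The helper name `to_attribute_value` and the inventory fields are hypothetical, and in practice the AWS SDKs usually handle this conversion for you:

```python
def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB-style typed attribute (illustrative helper)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical inventory item; "sku" would play the role of the key.
item = {
    "sku": "WIDGET-001",
    "quantity": 42,
    "dimensions": {"width_cm": 10, "height_cm": 4},
    "tags": ["blue", "small"],
}
wire_item = {name: to_attribute_value(v) for name, v in item.items()}
print(wire_item["quantity"])  # {'N': '42'}
```

Note how the nested `dimensions` map and the `tags` list are stored alongside scalar attributes in the same item, with no schema declared up front.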

However, for anything beyond simple queries such as the analytics we want our data architecture to support, AWS recommends that you use additional products such as Amazon EMR, Amazon Redshift, and others.

Source: https://aws.amazon.com/dynamodb/

This is because the expressive power of DynamoDB’s query language (in simpler terms, the breadth of ideas that can be represented and communicated with it) is somewhat limited. This quality is quite common among non-relational databases, sometimes referred to as “NoSQL” databases, which optimize for data model flexibility and scalability, oftentimes at the expense of core database functionality.

As you can tell from the recommended pattern above, data is stored in DynamoDB, then moved to Amazon EMR, which provides a managed big data framework, for processing. The data is then piped to Amazon Redshift, a managed data warehouse, for aggregation. Finally, Amazon QuickSight, a business intelligence tool, can use the aggregated data to create charts and dashboards that business users can leverage.

There are quite a few moving parts in this data architecture, and each one adds complexity and cost: you have to learn, build on, and operate multiple components (managed services offset some of that effort compared with building everything yourself). And since data is being moved from system to system, there is a very good possibility that the data shown in the charts and dashboards at one end is out of date relative to the actual state of things in the source database.
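
To get a feel for how stale a dashboard can become in a multi-hop pipeline, here is a back-of-the-envelope sketch. It assumes each hop is a batch job that starts on a fixed schedule, snapshots its input at the start of a run, and runs independently of the other hops; the `Hop` class and all interval and duration figures are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    interval_min: float  # minutes between the start of one run and the next
    duration_min: float  # minutes one run takes to finish


def worst_case_staleness(hops):
    """Upper bound, in minutes, on the age of data at the end of the pipeline.

    A write that lands just after a run has started must wait a full
    interval for the next run, plus that run's duration, at every hop.
    """
    return sum(hop.interval_min + hop.duration_min for hop in hops)


# Hypothetical schedule for the pattern described above.
pipeline = [
    Hop("DynamoDB -> EMR export", interval_min=60, duration_min=15),
    Hop("EMR -> Redshift load", interval_min=30, duration_min=10),
]
print(worst_case_staleness(pipeline))  # 115
```

Under these assumed numbers, a dashboard could be showing inventory that is almost two hours old, and every additional hop adds its own interval and run time to that bound.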

There’s nothing fundamentally wrong with this approach as long as you’re okay with the caveats above, but let’s look at another one.

Mixed Workloads with MongoDB

MongoDB is similar to DynamoDB in a few ways:

It’s a non-relational database
It’s available as a fully managed cloud database through MongoDB Atlas

For the most part, that’s where the similarities end. Unlike DynamoDB, MongoDB stores data in JSON-like documents. Documents can contain as many key-value pairs or complex nested structures as an application requires. MongoDB also has an expressive query language, which differentiates it from other non-relational databases. Not only is it easy to get data into the database, but it’s also easy to get data back out in ways that can serve a variety of use cases. For example, the database has an aggregation framework that allows you to perform analytics in place without moving data to another system.
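
As an illustration of in-place analytics, here is a sketch of an aggregation pipeline that could help answer the widget questions from the start of the article: units sold per product and customer region since a given date. The stages `$match`, `$group`, and `$sort` and the `$sum` accumulator are standard MongoDB operators; the field names (`sku`, `quantity`, `purchased_at`, `customer.region`) and the shape of the orders collection are assumptions made for this example:

```python
from datetime import datetime


def restock_pipeline(since):
    """Build a pipeline: units sold per (sku, region) since a given date."""
    return [
        # Keep only purchases made on or after the cutoff.
        {"$match": {"purchased_at": {"$gte": since}}},
        # Total the quantity for each product and customer region.
        {"$group": {
            "_id": {"sku": "$sku", "region": "$customer.region"},
            "units_sold": {"$sum": "$quantity"},
        }},
        # Best sellers first.
        {"$sort": {"units_sold": -1}},
    ]


def top_sellers(collection, since):
    """Run the pipeline; `collection` is expected to be a pymongo Collection."""
    return list(collection.aggregate(restock_pipeline(since)))


pipeline = restock_pipeline(datetime(2024, 1, 1))
print([next(iter(stage)) for stage in pipeline])  # ['$match', '$group', '$sort']
```

With a live connection you would pass something like `db.orders` (a hypothetical collection) to `top_sellers`; the whole computation then runs inside the database rather than in a separate processing system.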

This means our data architecture for supporting mixed workloads can be a lot simpler. If we remove Amazon EMR and Amazon Redshift (or the equivalent services from your cloud provider), we’re left with the database and our business intelligence or dashboarding tool of choice.

We do have another thing to consider, however: how do we ensure that analytical queries, which are typically longer-running than those supporting a transactional workload, do not impact the performance of the overall system? Luckily, MongoDB has an answer for that as well. The database natively supports replication and automated failover to ensure high availability, but replica nodes can also be added and used to isolate specific workloads and queries.

Atlas, the fully managed service for MongoDB, allows you to create a database cluster and add extra replica nodes for workload isolation, called ‘analytics’ nodes, with the click of a button or a simple API call. Any long-running analytical queries would be routed to these analytics nodes, so they do not compete with transactional workloads for the same resources.
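
Routing a reporting client to the analytics nodes is typically done through connection string options rather than application logic. The sketch below builds such a URI, assuming (as we understand the Atlas documentation, so please verify it for your cluster) that analytics nodes carry the replica set tag `nodeType:ANALYTICS`. The host name and database name are placeholders, and credentials are omitted:

```python
from urllib.parse import urlencode


def analytics_uri(host, database):
    """Build a MongoDB URI whose reads target tagged analytics secondaries."""
    options = {
        "readPreference": "secondary",               # never read from the primary
        "readPreferenceTags": "nodeType:ANALYTICS",  # assumed Atlas node tag
    }
    # Keep ':' unescaped so the tag stays in its name:value form.
    return f"mongodb+srv://{host}/{database}?{urlencode(options, safe=':')}"


uri = analytics_uri("cluster0.example.mongodb.net", "inventory")
print(uri)
```

The transactional application keeps its ordinary connection string, so only clients that opt in with these options send their queries to the analytics nodes.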

Atlas also provides a self-service analytics tool in the cloud called MongoDB Charts, which runs natively on MongoDB data with no data movement or transformations. This gives you more accurate information about the true state of things because the BI tool leverages live data.

Note that because you’d be running analytical queries against a replica, the results are eventually consistent: they can briefly trail the primary. The “lag” in this scenario is likely to be shorter than in the previous architecture, because it is tied only to the delay between an operation on the “primary” replica and the application of that operation to the analytics replica, rather than to physically moving data across multiple disparate systems.
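
The following toy model shows where that lag comes from. It is loosely inspired by log-based replication and is not MongoDB's actual replication protocol; the `Primary` and `Replica` classes are hypothetical:

```python
class Primary:
    """Accepts writes and records them in an ordered operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Replica:
    """Serves reads from its own copy, which is updated by replaying the log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


primary, analytics = Primary(), Replica()
primary.write("widget", 10)
analytics.catch_up(primary)

primary.write("widget", 9)        # a purchase lands on the primary
stale = analytics.data["widget"]  # the replica has not replayed it yet
analytics.catch_up(primary)
fresh = analytics.data["widget"]  # now the replica agrees with the primary
print(stale, fresh)  # 10 9
```

The window in which the replica reports 10 instead of 9 is the replication lag; it closes as soon as the replica replays the outstanding log entry, with no export or load job in between.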

There you have it — two different data architectures for supporting mixed workloads using non-relational databases. Each has its trade-offs. If you require complex analytics on your transactional data, it may be worth the added complexity, latency, and cost to transform your data and move it through Amazon EMR and Amazon Redshift.

However, the analytics questions raised at the beginning of this article don’t call for this level of complexity. By selecting a database that lets you run analytics in place and isolate those workloads so that they have minimal performance impact on real-time operations, your architecture can be much simpler and easier to work with.
