Using a Graph Database with Microservices

weixin_26755331

Original article: https://medium.com/@yoad.gidron/using-a-graph-database-with-microservices-1029a8d28d03

Microservices architecture doesn’t require an introduction. It has been widely adopted in recent years by companies of all sizes and in many domains. Some companies have redesigned their monolithic architecture and broken it up into microservices, while others have built their platform from the ground up as a pure microservices architecture.

In any case, a true microservices platform requires each service to be responsible for its own data (also known as isolated persistence or decentralized data management). This concept is implemented by the database-per-service pattern, which means that each microservice should have its own private database instance, collections or tables that are not shared with other services.

If we allow database sharing between microservices, our data model is managed by multiple services, which makes it hard to guarantee consistency and to enforce invariants. Every other service should either request the data through the API of the responsible service or keep a read-only, non-canonical (possibly materialized) copy of it.

Now, let’s start with a contrived example that is often used in the context of graph databases. In our data model we have movies and actors, and we want to describe the relations between them: for each movie we want to know the cast, and for each actor we want to know their filmography. This is a simplified version of the Internet Movie Database, better known as IMDb.com. For simplicity, we will store only the title, genre and release year of each movie. Similarly, for actors we will store their name, picture URL and year of birth. In addition, we want to store the name of the character that each actor played in each movie.

With this simple data model, our microservices architecture is straightforward. We probably need one microservice for managing movies and another one for actors. Following the database-per-service pattern that was introduced earlier, we will need a database collection for movies and a separate collection for actors. If we choose a document DB (like MongoDB or Couchbase), our movie data model would look like the following example:

{
  "id": "m1",
  "title": "The Matrix",
  "released": 1999,
  "genre": "Sci-Fi",
  "actors": [{
    "actor_id": "a1",
    "actor_name": "Keanu Reeves",
    "character": "Neo"
  }, {
    "actor_id": "a2",
    "actor_name": "Laurence Fishburne",
    "character": "Morpheus"
  }, {
    "actor_id": "a3",
    "actor_name": "Carrie-Anne Moss",
    "character": "Trinity"
  }]
}

And for an actor:

{
  "id": "a1",
  "name": "Keanu Reeves",
  "born": 1964,
  "picture": "http://www.media.com/actors/keanu_reeves.jpg",
  "movies": [{
    "movie_id": "m1",
    "title": "The Matrix",
    "character": "Neo"
  }, {
    "movie_id": "m2",
    "title": "The Matrix Reloaded",
    "character": "Neo"
  }, {
    "movie_id": "m3",
    "title": "The Matrix Revolutions",
    "character": "Neo"
  }]
}

In this example we use a kind of “foreign key” as a reference from movies to actors and vice versa. This key is useful when we want to fetch the full data about a specific movie or actor. We also keep actor names in a movie document and movie titles in an actor document. The main motivation is to keep these documents human readable, but it can also be useful for producing quick views without the need to query multiple services or collections. The downside is that we store multiple copies of the same data, but this is not an issue here because this data is effectively immutable. Another interesting observation is that the character name is neither an attribute of a movie nor of an actor. It is actually an attribute of the movie-actor relationship. In this data model it is kept as an embedded document inside both movies and actors.

Figure: Microservices architecture

Now that we have our data model and microservices defined, let’s look at some use cases.

In the first use case, we should display a simple view of a movie with its title, release year, genre and cast (actor and character names). Let’s assume that we already know the movie ID (e.g. we got a link in search results). The implementation of this use case is easy. The client app calls the movies service with the movie ID. The movies service performs a single read from the database and returns the movie document as a JSON object with all the required data.
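
As a rough illustration of this first use case, the following Python sketch stands in for the movies service. The `MOVIES` dictionary is a hypothetical in-memory substitute for the movies collection, and `get_movie_view` is an illustrative name, not part of any real API.

```python
# Hypothetical in-memory stand-in for the movies collection of a document DB.
MOVIES = {
    "m1": {
        "id": "m1",
        "title": "The Matrix",
        "released": 1999,
        "genre": "Sci-Fi",
        "actors": [
            {"actor_id": "a1", "actor_name": "Keanu Reeves", "character": "Neo"},
            {"actor_id": "a2", "actor_name": "Laurence Fishburne", "character": "Morpheus"},
            {"actor_id": "a3", "actor_name": "Carrie-Anne Moss", "character": "Trinity"},
        ],
    }
}


def get_movie_view(movie_id):
    """Build the simple movie view from a single document lookup."""
    movie = MOVIES.get(movie_id)  # the only "database read" in this use case
    if movie is None:
        return None
    return {
        "title": movie["title"],
        "released": movie["released"],
        "genre": movie["genre"],
        # Names and characters are embedded, so no call to the actors service.
        "cast": [
            {"actor": ref["actor_name"], "character": ref["character"]}
            for ref in movie["actors"]
        ],
    }


view = get_movie_view("m1")
```

Because actor names and characters are embedded in the movie document, the view needs nothing from the actors service.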

Great, but what if we also need to display the picture of each actor? Here it is slightly more complicated because picture URLs are not stored in the movie document. We can think of several solutions to this problem:

1. Add picture URLs to the actor references within the movie document. This solution would work, but it has two major disadvantages: (a) each time we update the picture of an actor (assuming that the new picture has a different URL), we need to update all the references to that actor inside movie documents; and (b) the more data we need to display about each actor, the more data we need to duplicate. What happens if such a requirement is introduced after we have already populated the database?
2. Let the client app handle it by fetching the data of each actor from the actors service. A basic implementation of this approach would clearly add a lot of API calls between the client and the backend. The efficiency of this approach can be improved if we add an endpoint to the actors service that receives a list of actor IDs instead of a single ID per call. Still, the actors service will need to fetch all these documents from the database and send them back to the client.
3. Let the backend handle it by adding this logic to the movies service. With this approach, the movies service calls the actors service, fetches the extra data (the picture URL in this case) and combines it into the response. As suggested in the previous solution (2), it can be improved by fetching the data of multiple actors in a single call. The downside is that we have now introduced a dependency between the movies and actors services.
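
The third option, combined with the batched endpoint from the second, could look roughly like the Python sketch below. Both classes, their method names and the `calls` counter are hypothetical and exist only for illustration. The services are plain in-memory objects, whereas a real system would make a network call here.

```python
class ActorsService:
    """Hypothetical actors service backed by an in-memory collection."""

    def __init__(self, actors):
        self._actors = actors
        self.calls = 0  # counts incoming calls, for illustration only

    def get_actors(self, actor_ids):
        """Batched endpoint: one call returns the documents of many actors."""
        self.calls += 1
        return {i: self._actors[i] for i in actor_ids if i in self._actors}


class MoviesService:
    """Hypothetical movies service that now depends on the actors service."""

    def __init__(self, movies, actors_service):
        self._movies = movies
        self._actors_service = actors_service

    def get_movie_with_pictures(self, movie_id):
        movie = self._movies[movie_id]
        ids = [ref["actor_id"] for ref in movie["actors"]]
        details = self._actors_service.get_actors(ids)  # single batched call
        cast = [
            {
                "name": ref["actor_name"],
                "character": ref["character"],
                "picture": details.get(ref["actor_id"], {}).get("picture"),
            }
            for ref in movie["actors"]
        ]
        return {"title": movie["title"], "cast": cast}


actors_service = ActorsService({
    "a1": {"name": "Keanu Reeves",
           "picture": "http://www.media.com/actors/keanu_reeves.jpg"},
    "a2": {"name": "Laurence Fishburne", "picture": None},  # no picture stored in this sketch
})
movies_service = MoviesService(
    {"m1": {"title": "The Matrix", "actors": [
        {"actor_id": "a1", "actor_name": "Keanu Reeves", "character": "Neo"},
        {"actor_id": "a2", "actor_name": "Laurence Fishburne", "character": "Morpheus"},
    ]}},
    actors_service,
)
result = movies_service.get_movie_with_pictures("m1")
```

The whole cast is resolved with one call to the actors service, but the movies service can no longer answer this request when the actors service is unavailable.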

Now let’s look at a more complicated use case. Given a specific actor, we want to find all the other actors who played with her or him in the same movie. How can we implement this requirement? The basic approach would be to query the actors service first and get a list of all the movies that the original actor has played in. Then we need to query the movies service and get the list of all the unique actors (without duplicates) who played in these movies. If we want their pictures or any other data that is not in the movies service, we finally need to query the actors service again in order to get their details. Yes, it would work, but is there a more efficient way? For anyone who has dealt with relational databases, the answer is simple: SQL.
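
The three-step approach described above can be sketched in Python as follows. The two dictionaries are hypothetical stand-ins for the actors and movies services, and each lookup in `co_actors` corresponds to what would be a separate service call.

```python
# Hypothetical stand-ins for the data held by each service.
ACTORS = {
    "a1": {"name": "Keanu Reeves", "movies": ["m1", "m2"]},
    "a2": {"name": "Laurence Fishburne", "movies": ["m1", "m2"]},
    "a3": {"name": "Carrie-Anne Moss", "movies": ["m1", "m2"]},
}
MOVIES = {
    "m1": {"title": "The Matrix", "actors": ["a1", "a2", "a3"]},
    "m2": {"title": "The Matrix Reloaded", "actors": ["a1", "a2", "a3"]},
}


def co_actors(actor_id):
    """Find the names of everyone who shared a movie with the given actor."""
    # Step 1: ask the actors service for the actor's movies.
    movie_ids = ACTORS[actor_id]["movies"]

    # Step 2: ask the movies service for the cast of each movie.
    # A set removes duplicates across movies.
    found = set()
    for movie_id in movie_ids:
        found.update(MOVIES[movie_id]["actors"])
    found.discard(actor_id)  # the actor is not their own co-actor

    # Step 3: ask the actors service again for the details of each co-actor.
    return sorted(ACTORS[other_id]["name"] for other_id in found)


names = co_actors("a1")
```

All of the joining and de-duplication happens in the service layer, which is the work the following paragraphs try to push into the database.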

It’s true: with SQL we could move this logic from the service layer into the database layer and efficiently run a single SQL query to get the results. But before we do that, we need to restructure our data as a relational data model. Without getting into too much detail, the basic data model that was presented above requires 3 tables: movies, actors and characters (the actor-movie relationship). At this stage, we can already see that we might have an issue with data ownership. Who owns the characters table? Is it the movies service, the actors service, or should we add a third microservice? Even if we resolve this issue, we still want to run a single SQL query over 3 tables (using SQL joins). This would break the concept of data isolation, because a single service needs to access tables that are owned by other services. This is not the only price we have to pay for choosing a relational database. It also means that we must adhere to a strict data schema, and we might run into scaling issues when these tables grow beyond a certain size. As we add more entities and relations to our data model (e.g., directors, writers, reviews), the number of tables will grow.
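
To make the relational variant concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are assumptions derived from the three tables named above, and the rows are a small sample. Note that the query crosses the ownership boundary between services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id TEXT PRIMARY KEY, title TEXT, released INTEGER, genre TEXT);
CREATE TABLE actors (id TEXT PRIMARY KEY, name TEXT, born INTEGER, picture TEXT);
CREATE TABLE characters (
    movie_id TEXT REFERENCES movies(id),
    actor_id TEXT REFERENCES actors(id),
    name TEXT
);
""")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("m1", "The Matrix", 1999, "Sci-Fi"),
    ("m2", "The Matrix Reloaded", 2003, "Sci-Fi"),
])
conn.executemany("INSERT INTO actors (id, name) VALUES (?, ?)", [
    ("a1", "Keanu Reeves"),
    ("a2", "Laurence Fishburne"),
    ("a3", "Carrie-Anne Moss"),
])
conn.executemany("INSERT INTO characters VALUES (?, ?, ?)", [
    ("m1", "a1", "Neo"), ("m1", "a2", "Morpheus"), ("m1", "a3", "Trinity"),
    ("m2", "a1", "Neo"), ("m2", "a2", "Morpheus"), ("m2", "a3", "Trinity"),
])

# Self-join on the relationship table: "mine" holds the given actor's roles,
# "theirs" holds every other role in the same movies. DISTINCT removes the
# duplicates that arise when two actors share more than one movie.
CO_ACTORS_SQL = """
SELECT DISTINCT other.name
FROM characters AS mine
JOIN characters AS theirs ON theirs.movie_id = mine.movie_id
JOIN actors AS other ON other.id = theirs.actor_id
WHERE mine.actor_id = ? AND theirs.actor_id <> ?
ORDER BY other.name
"""
co_actors = [row[0] for row in conn.execute(CO_ACTORS_SQL, ("a1", "a1"))]
```

This particular query only needs the characters and actors tables. Adding movie titles to the result would join the movies table as well, which is exactly the cross-service access discussed above.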

A graph database, as the name implies, is a (NoSQL) database that uses a graph data model for semantic queries. By graph data model we mean a structure that consists of nodes and the edges (i.e. relationships) that connect them. The main advantage of a graph database is that it stores a graph data model in a natural and efficient manner. As we saw earlier, trying to represent a graph data model in a document or relational database may become cumbersome, as it requires a lot of duplication (document) or many tables (relational). With a graph database we don’t need to worry about the representation of the graph model, as the database handles it for us.

How can we use a graph database for our model? This is quite simple. We need 2 node types: Movie and Actor. And we need one relationship (edge) between movies and actors. Let’s call it Character (i.e. role). Movie nodes will have the properties: title, release_year and genre. Actor nodes will have the properties: name, birth_year and picture. Character edges will have the property name. That’s all we need. A graph database makes it easy to quickly find all the neighbors of a given node. Thus finding all the actors that played in a movie or all the movies that an actor played in is easy. Also when we need to find actors who played in the same movie, a simple query to the graph DB will return the unique results. We just need to start with a specific Actor node and look for other actors that are connected through a Movie node and 2 Character edges (we won’t go into the details of the query syntax in this article, as this is graph DB vendor specific).
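
Since query syntax is vendor specific, the sketch below does not use any real graph database. It is a small in-memory property graph in plain Python, meant only to show the shape of the model and of the traversal. The `Graph` class and its method names are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Minimal in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., properties}
        self.edges = defaultdict(list)  # node id -> [(neighbor id, edge properties)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_character(self, actor_id, movie_id, name):
        """Connect an Actor and a Movie with a Character edge."""
        props = {"type": "Character", "name": name}
        self.edges[actor_id].append((movie_id, props))
        self.edges[movie_id].append((actor_id, props))

    def neighbors(self, node_id):
        return [neighbor for neighbor, _ in self.edges[node_id]]

    def co_actors(self, actor_id):
        """Walk Actor -> Movie -> Actor (two Character edges), without duplicates."""
        found = set()
        for movie_id in self.neighbors(actor_id):
            found.update(self.neighbors(movie_id))
        found.discard(actor_id)
        return sorted(self.nodes[a]["name"] for a in found)


g = Graph()
g.add_node("m1", "Movie", title="The Matrix", release_year=1999, genre="Sci-Fi")
g.add_node("m2", "Movie", title="The Matrix Reloaded", release_year=2003, genre="Sci-Fi")
g.add_node("a1", "Actor", name="Keanu Reeves", birth_year=1964)
g.add_node("a2", "Actor", name="Laurence Fishburne")
g.add_node("a3", "Actor", name="Carrie-Anne Moss")
for movie_id in ("m1", "m2"):
    g.add_character("a1", movie_id, "Neo")
    g.add_character("a2", movie_id, "Morpheus")
    g.add_character("a3", movie_id, "Trinity")

cast = sorted(g.nodes[a]["name"] for a in g.neighbors("m1"))
together = g.co_actors("a1")
```

The character name lives on the edge itself, so it is stored once instead of being embedded in both the movie and the actor documents.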

Now that we have realized that a graph database would perfectly match our data model, it’s time to redesign our microservices. Previously we had 2 microservices: movies and actors. But how can we manage data ownership in a graph DB? It doesn’t have any collections or tables, so we may try to split it based on node types. Naturally, the movies service will own Movie nodes and the actors service will own Actor nodes. That makes sense, but who owns the Character relationship? Do we need a third service for it, or is it shared between movies and actors? This is where our data ownership concept starts to break. And think about queries: many of them will return nodes of different types. For example, if we look for all the actors who played in a specific movie, we start with a Movie node and find Actor nodes. If this query is executed by the movies service, it will need to access Actor nodes, and vice versa. If we want to take full advantage of the graph model, where each node can be connected to any other node, we lose the ability to manage data ownership and isolation between microservices.

Unfortunately, there is no clean solution to this problem. It seems that there is an inherent contradiction between a graph database model and the database-per-service pattern. One option is to ditch this pattern and allow data sharing between microservices. We should still make sure that each node type is created and modified by a single service, but other services can read it. This could be a reasonable compromise. Another solution is to implement a separate microservice for managing the graph. This service will be the only one with access to the graph database, and all other services will go through it. The graph service will manage the entire data model and will be responsible for all reads from and writes to the graph DB. This approach doesn’t provide real data isolation, but it removes the risk of 2 different microservices accessing the same data entity directly in the database.
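
The two compromises can be combined, as the following Python sketch suggests: a dedicated graph service is the only component that touches the graph, and it enforces the rule that each node type is written by a single service while any service may read. The `GraphService` class, the `OWNERS` table and the caller names are assumptions made for this illustration. How callers are identified in a real deployment is out of scope here.

```python
class GraphService:
    """Hypothetical graph service: sole gateway to the graph store."""

    # Assumed ownership rule: which service may create or modify each node type.
    OWNERS = {"Movie": "movies", "Actor": "actors"}

    def __init__(self):
        self._nodes = {}  # in-memory stand-in for the graph database

    def write_node(self, caller, node_id, label, **props):
        """Create or update a node, but only on behalf of the owning service."""
        if self.OWNERS.get(label) != caller:
            raise PermissionError(f"{caller} may not write {label} nodes")
        self._nodes[node_id] = {"label": label, **props}

    def read_node(self, caller, node_id):
        """Any service may read any node; a copy keeps the stored data private."""
        return dict(self._nodes[node_id])


graph = GraphService()
graph.write_node("movies", "m1", "Movie", title="The Matrix")
graph.write_node("actors", "a1", "Actor", name="Keanu Reeves")

# The actors service can read a Movie node but cannot modify it.
title_seen_by_actors = graph.read_node("actors", "m1")["title"]
try:
    graph.write_node("actors", "m1", "Movie", title="changed")
    write_rejected = False
except PermissionError:
    write_rejected = True
```

This sketch leaves the question of who owns the Character relationship open, just as the text does. One possible choice would be to add an entry for it to `OWNERS`, but that is a design decision rather than something the model dictates.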

To summarize, we discussed the advantages of using a graph database, especially for graph-based data models. However, using a graph database in a microservices architecture can be challenging with respect to data ownership and isolation. In order to take full advantage of the capabilities of a graph database, we may need to relax some of the microservices guidelines.
