Cloud Application Design Patterns (Part 1) 云应用设计模式(一)

Design Patterns 设计模式

  • 项目 Project
  • 2015/08/26
  • 6 分钟可看完 6 minutes to read

The guide contains twenty-four design patterns that are useful in cloud-hosted applications. Each pattern is provided in a common format that describes the context and problem, the solution, issues and considerations for applying the pattern, and an example based on Microsoft Azure. Each pattern also includes links to other related patterns.

该指南包含24种在云托管应用程序中有用的设计模式。每个模式都以一种通用的格式提供,这种格式描述了上下文和问题、解决方案、应用该模式的问题和注意事项,以及一个基于 Microsoft Azure 的示例。每个模式还包括到其他相关模式的链接。

The design patterns are allocated to one or more of the following eight categories: availability, data management, design and implementation, messaging, management and monitoring, performance and scalability, resiliency, and security.

设计模式被分配到以下八个类别中的一个或多个: 可用性、数据管理、设计和实现、消息传递、管理和监视、性能和可伸缩性、弹性和安全性。

1. Cache-Aside Pattern 缓存旁置模式

Load data on demand into a cache from a data store. This pattern can improve performance and also helps to maintain consistency between data held in the cache and the data in the underlying data store.

根据需要将数据从数据存储区加载到缓存中。此模式可以提高性能,还有助于维护缓存中保存的数据与底层数据存储区中的数据之间的一致性。

For more info, see the Cache-aside Pattern.

有关更多信息,请参见 Cache-Aside 模式。

2. Circuit Breaker Pattern 断路器模式

Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.

处理在连接到远程服务或资源时可能需要花费不同时间来纠正的错误。此模式可以提高应用程序的稳定性和弹性。

For more info, see the Circuit Breaker Pattern.

有关更多信息,请参见断路器模式。

3. Compensating Transaction Pattern 补偿事务模式

Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the operations fails. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.

撤消由一系列步骤执行的工作,如果一个或多个操作失败,这些步骤将共同定义最终一致的操作。遵循最终一致性模式的操作通常存在于实现复杂业务流程和工作流的云托管应用程序中。

For more info, see the Compensating Transaction Pattern.

有关更多信息,请参见补偿事务模式。

4. Competing Consumers Pattern 竞争消费者模式

Enable multiple concurrent consumers to process messages received on the same messaging channel. This pattern enables a system to process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload.

允许多个并发使用者处理在同一消息传递通道上接收的消息。此模式使系统能够并发处理多个消息,以优化吞吐量,提高可伸缩性和可用性,并平衡工作负载。

For more info, see the Competing Consumers Pattern.

有关更多信息,请参见竞争消费者模式。

5. Compute Resource Consolidation Pattern 计算资源合并模式

Consolidate multiple tasks or operations into a single computational unit. This pattern can increase compute resource utilization, and reduce the costs and management overhead associated with performing compute processing in cloud-hosted applications.

将多个任务或操作合并为一个计算单元。这种模式可以提高计算资源的利用率,并减少与在云托管应用程序中执行计算处理相关的成本和管理开销。
For more info, see the Compute Resource Consolidation Pattern.

有关更多信息,请参见计算资源合并模式。

6. Command and Query Responsibility Segregation (CQRS) Pattern 命令和查询责任分离(CQRS)模式

Segregate operations that read data from operations that update data by using separate interfaces. This pattern can maximize performance, scalability, and security; support evolution of the system over time through higher flexibility; and prevent update commands from causing merge conflicts at the domain level.

将读取数据的操作与使用单独接口更新数据的操作分离。此模式可以最大限度地提高性能、可伸缩性和安全性; 通过更高的灵活性支持系统随时间的演变; 并防止更新命令导致域级别的合并冲突。
For more info, see the Command and Query Responsibility Segregation (CQRS) Pattern.

有关更多信息,请参见命令和查询责任分离(CQRS)模式。

7. Event Sourcing Pattern 事件源模式

Use an append-only store to record the full series of events that describe actions taken on data in a domain, rather than storing just the current state, so that the store can be used to materialize the domain objects. This pattern can simplify tasks in complex domains by avoiding the requirement to synchronize the data model and the business domain; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions.

使用仅追加存储记录描述对域中数据采取的操作的完整系列事件,而不是仅存储当前状态,这样存储就可以用于具体化域对象。通过避免同步数据模型和业务领域的需求,这种模式可以简化复杂领域中的任务; 提高性能、可伸缩性和响应性; 为事务数据提供一致性; 以及维护可能支持补偿操作的完整审计跟踪和历史记录。

For more info, see the Event Sourcing Pattern.

有关更多信息,请参见事件源模式。

8. External Configuration Store Pattern 外部配置存储模式

Move configuration information out of the application deployment package to a centralized location. This pattern can provide opportunities for easier management and control of configuration data, and for sharing configuration data across applications and application instances.

将配置信息从应用程序部署包移到集中的位置。此模式可以提供更容易地管理和控制配置数据的机会,以及跨应用程序和应用程序实例共享配置数据的机会。

For more info, see the External Configuration Store Pattern.

有关更多信息,请参见外部配置存储模式。

9. Federated Identity Pattern 联邦身份模式

Delegate authentication to an external identity provider. This pattern can simplify development, minimize the requirement for user administration, and improve the user experience of the application.

将身份验证委托给外部标识提供程序。此模式可以简化开发,最大限度地减少用户管理需求,并改善应用程序的用户体验。

For more info, see the Federated Identity Pattern.

有关更多信息,请参见联邦身份模式。

10. Gatekeeper Pattern 守门人模式

Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This pattern can provide an additional layer of security, and limit the attack surface of the system.

通过使用专用的主机实例来保护应用程序和服务,该主机实例充当客户机与应用程序或服务之间的代理,验证和清理请求,并在它们之间传递请求和数据。此模式可以提供额外的安全层,并限制系统的攻击面。

For more info, see the Gatekeeper Pattern.

有关更多信息,请参见“守门人模式”。

11. Health Endpoint Monitoring Pattern 健康端点监视模式

Implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern can help to verify that applications and services are performing correctly.

在应用程序中实现功能检查,外部工具可以定期通过公开的端点访问这些检查。此模式有助于验证应用程序和服务是否正确执行。

For more info, see the Health Endpoint Monitoring Pattern.

有关更多信息,请参见健康端点监视模式。

12. Index Table Pattern 索引表模式

Create indexes over the fields in data stores that are frequently referenced by query criteria. This pattern can improve query performance by allowing applications to more quickly retrieve data from a data store.

在数据存储区中经常被查询条件引用的字段上创建索引。通过允许应用程序更快地从数据存储区检索数据,此模式可以提高查询性能。
For more info, see the Index Table Pattern.

有关更多信息,请参见索引表模式。

13. Leader Election Pattern 领袖选举模式

Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. This pattern can help to ensure that tasks do not conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other task instances are performing.

通过选择一个实例作为负责管理其他实例的领导,协调分布式应用程序中协作任务实例集合执行的操作。此模式可以帮助确保任务不会相互冲突,不会引起共享资源的争用,或者不经意地干扰其他任务实例正在执行的工作。

For more info, see the Leader Election Pattern.

有关更多信息,请参见领袖选举模式。

14. Materialized View Pattern 物化视图模式

Generate pre-populated views over the data in one or more data stores when the data is formatted in a way that does not favor the required query operations. This pattern can help to support efficient querying and data extraction, and improve application performance.

当数据的格式化方式不利于所需的查询操作时,通过一个或多个数据存储区中的数据生成预填充视图。此模式有助于支持高效的查询和数据提取,并提高应用程序的性能。
For more info, see the Materialized View Pattern.

有关更多信息,请参见物化视图模式。

15. Pipes and Filters Pattern 管道及过滤器模式

Decompose a task that performs complex processing into a series of discrete elements that can be reused. This pattern can improve performance, scalability, and reusability by allowing task elements that perform the processing to be deployed and scaled independently.

将执行复杂处理的任务分解为一系列可重用的离散元素。通过允许独立地部署和扩展执行处理的任务元素,该模式可以提高性能、可伸缩性和可重用性。

For more info, see the Pipes and Filters Pattern.

有关更多信息,请参见管道和过滤器模式。

16. Priority Queue Pattern 优先队列模式

Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those of a lower priority. This pattern is useful in applications that offer different service level guarantees to individual types of client.

对发送给服务的请求进行优先排序,以便优先级较高的请求比优先级较低的请求更快地得到接收和处理。这种模式在为不同类型的客户机提供不同服务水平保证的应用程序中非常有用。
For more info, see the Priority Queue Pattern.

有关更多信息,请参见优先队列模式。

17. Queue-based Load Leveling Pattern 基于队列的负载均衡模式

Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads that may otherwise cause the service to fail or the task to time out. This pattern can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.

使用一个队列作为任务和调用的服务之间的缓冲区,以平滑间歇性重负载,否则可能导致服务失败或任务超时。此模式可以帮助最小化需求高峰对任务和服务的可用性和响应性的影响。
For more info, see the Queue-based Load Leveling Pattern.

有关更多信息,请参见基于队列的负载均衡模式。

18. Retry Pattern 重试模式

Enable an application to handle temporary failures when connecting to a service or network resource by transparently retrying the operation in the expectation that the failure is transient. This pattern can improve the stability of the application.

使应用程序能够在连接到服务或网络资源时处理临时故障,方法是透明地重试操作,以期望故障是临时的。此模式可以提高应用程序的稳定性。
For more info, see the Retry Pattern.

有关更多信息,请参见重试模式。

19. Runtime Reconfiguration Pattern 运行时重新配置模式

Design an application so that it can be reconfigured without requiring redeployment or restarting the application. This helps to maintain availability and minimize downtime.

设计一个应用程序,使其可以重新配置,而不需要重新部署或重新启动应用程序。这有助于维护可用性和最小化停机时间。
For more info, see the Runtime Reconfiguration Pattern.

有关更多信息,请参见运行时重新配置模式。

20. Scheduler Agent Supervisor Pattern 调度代理主管模式

Coordinate a set of actions across a distributed set of services and other remote resources, attempt to transparently handle faults if any of these actions fail, or undo the effects of the work performed if the system cannot recover from a fault. This pattern can add resiliency to a distributed system by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.

跨分布式服务和其他远程资源协调一组操作,如果这些操作中的任何一个失败,尝试透明地处理错误,或者如果系统无法从错误中恢复,则撤消所执行工作的影响。此模式通过使分布式系统能够恢复和重试由于暂时异常、长期故障和进程失败而失败的操作,从而为分布式系统增加了弹性。
For more info, see the Scheduler Agent Supervisor Pattern.

有关更多信息,请参见调度代理主管模式。

21. Sharding Pattern 分片模式

Divide a data store into a set of horizontal partitions, or shards. This pattern can improve scalability when storing and accessing large volumes of data.

将数据存储区划分为一组水平分区(即分片)。这种模式可以提高存储和访问大量数据时的可伸缩性。
For more info, see the Sharding Pattern.

有关更多信息,请参见分片模式。

22. Static Content Hosting Pattern 静态内容托管模式

Deploy static content to a cloud-based storage service that can deliver it directly to the client. This pattern can reduce the requirement for potentially expensive compute instances.

将静态内容部署到基于云的存储服务,该服务可以将静态内容直接交付给客户机。这种模式可以减少对可能开销很大的计算实例的需求。
For more info, see the Static Content Hosting Pattern.

有关更多信息,请参见静态内容托管模式。

23. Throttling Pattern 节流模式

Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.

控制应用程序实例、单个租户或整个服务使用的资源的消耗。这种模式可以使系统继续运行并满足服务水平协议,即使需求的增加对资源造成了极大的负担。

For more info, see the Throttling Pattern.

有关更多信息,请参见节流模式。

24. Valet Key Pattern 代客泊车钥匙模式

Use a token or key that provides clients with restricted direct access to a specific resource or service in order to offload data transfer operations from the application code. This pattern is particularly useful in applications that use cloud-hosted storage systems or queues, and can minimize cost and maximize scalability and performance.

使用令牌或密钥为客户端提供对特定资源或服务的受限制的直接访问,以便从应用程序代码中卸载数据传输操作。此模式在使用云托管存储系统或队列的应用程序中特别有用,可以最小化成本并最大限度地提高可伸缩性和性能。
For more info, see the Valet Key Pattern.

有关更多信息,请参见代客泊车钥匙模式。

Cache-Aside Pattern 缓存旁置模式

  • 项目 Project
  • 2015/08/26
  • 7 分钟可看完 7 minutes to read

Load data on demand into a cache from a data store. This pattern can improve performance and also helps to maintain consistency between data held in the cache and the data in the underlying data store.

根据需要将数据从数据存储区加载到缓存中。此模式可以提高性能,还有助于维护缓存中保存的数据与底层数据存储区中的数据之间的一致性。

Context and Problem 背景与问题

Applications use a cache to optimize repeated access to information held in a data store. However, it is usually impractical to expect that cached data will always be completely consistent with the data in the data store. Applications should implement a strategy that helps to ensure that the data in the cache is up to date as far as possible, but can also detect and handle situations that arise when the data in the cache has become stale.

应用程序使用缓存优化对数据存储中保存的信息的重复访问。但是,期望缓存的数据始终与数据存储区中的数据完全一致通常是不切实际的。应用程序应该实现一种策略,该策略有助于确保缓存中的数据尽可能最新,但也可以检测和处理缓存中的数据变得过时时出现的情况。

Solution 解决方案

Many commercial caching systems provide read-through and write-through/write-behind operations. In these systems, an application retrieves data by referencing the cache. If the data is not in the cache, it is transparently retrieved from the data store and added to the cache. Any modifications to data held in the cache are automatically written back to the data store as well.

许多商业缓存系统提供了通读(read-through)和通写/后写(write-through/write-behind)操作。在这些系统中,应用程序通过引用缓存来检索数据。如果数据不在缓存中,则透明地从数据存储区检索该数据并将其添加到缓存中。对缓存中保存的数据的任何修改也会自动写回数据存储。

For caches that do not provide this functionality, it is the responsibility of the applications that use the cache to maintain the data in the cache.

对于不提供此功能的缓存,由使用该缓存的应用程序负责维护缓存中的数据。

An application can emulate the functionality of read-through caching by implementing the cache-aside strategy. This strategy effectively loads data into the cache on demand. Figure 1 summarizes the steps in this process.

应用程序可以通过实现缓存旁置策略来模拟通读缓存的功能。此策略根据需要有效地将数据加载到缓存中。图1总结了这个过程中的步骤。

Figure 1 - Using the Cache-Aside pattern to store data in the cache

图1-使用 Cache-Aside 模式在缓存中存储数据

If an application updates information, it can emulate the write-through strategy as follows:

如果应用程序更新信息,它可以模拟 write-through 策略,如下所示:

  1. Make the modification to the data store 对数据存储区进行修改
  2. Invalidate the corresponding item in the cache. 使缓存中的相应项无效

When the item is next required, using the cache-aside strategy will cause the updated data to be retrieved from the data store and added back into the cache.

当下一次需要该项时,使用缓存旁置策略将导致从数据存储区检索更新的数据并将其添加回缓存中。
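
The read and update steps described above can be sketched in C# as follows. This is a minimal illustration under stated assumptions, not the guide's implementation: the `CacheAsideStore` class, its `loadFromStore` and `saveToStore` delegates, and the in-memory `ConcurrentDictionary` standing in for the cache are all hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: an in-memory dictionary stands in for the cache,
// and two delegates stand in for the underlying data store.
public class CacheAsideStore<TKey, TValue> where TValue : class
{
    private readonly ConcurrentDictionary<TKey, TValue> cache =
        new ConcurrentDictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> loadFromStore;
    private readonly Action<TKey, TValue> saveToStore;

    public CacheAsideStore(Func<TKey, TValue> loadFromStore, Action<TKey, TValue> saveToStore)
    {
        this.loadFromStore = loadFromStore;
        this.saveToStore = saveToStore;
    }

    // Read path: try the cache first; on a miss, load from the store and cache the result.
    public TValue Get(TKey key)
    {
        if (cache.TryGetValue(key, out var cached))
        {
            return cached;
        }

        var value = loadFromStore(key);
        if (value != null)
        {
            // Avoid caching a null value.
            cache[key] = value;
        }
        return value;
    }

    // Write path: update the data store first, then invalidate the cached copy.
    public void Update(TKey key, TValue value)
    {
        saveToStore(key, value);
        cache.TryRemove(key, out _);
    }
}
```

After `Update` runs, the next `Get` for the same key misses the cache and reloads the new value from the store, which is the behavior the two numbered steps rely on.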

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • Lifetime of Cached Data 缓存数据的生存期. Many caches implement an expiration policy that causes data to be invalidated and removed from the cache if it is not accessed for a specified period. For cache-aside to be effective, ensure that the expiration policy matches the pattern of access for applications that use the data. Do not make the expiration period too short because this can cause applications to continually retrieve data from the data store and add it to the cache. Similarly, do not make the expiration period so long that the cached data is likely to become stale. Remember that caching is most effective for relatively static data, or data that is read frequently.

    许多缓存实现了一个过期策略,如果在指定的时间段内没有访问数据,该策略将导致数据失效并从缓存中删除。为了使缓存旁置模式有效,请确保过期策略与使用数据的应用程序的访问模式匹配。不要让过期时间太短,因为这会导致应用程序不断地从数据存储中检索数据并将其添加到缓存中。类似地,不要使过期时间太长,以致缓存的数据可能变得过时。请记住,缓存对于相对静态的数据或经常读取的数据是最有效的。

  • Evicting Data 驱逐数据. Most caches have only a limited size compared to the data store from where the data originates, and they will evict data if necessary. Most caches adopt a least-recently-used policy for selecting items to evict, but this may be customizable. Configure the global expiration property and other properties of the cache, and the expiration property of each cached item, to help ensure that the cache is cost effective. It may not always be appropriate to apply a global eviction policy to every item in the cache. For example, if a cached item is very expensive to retrieve from the data store, it may be beneficial to retain this item in cache at the expense of more frequently accessed but less costly items.

    与数据来源的数据存储区相比,大多数缓存只有有限的大小,如果需要,它们将驱逐数据。大多数缓存采用最近最少使用(LRU)策略来选择要驱逐的项,但这可能是可定制的。配置缓存的全局过期属性和其他属性以及每个缓存项的过期属性,以帮助确保缓存的成本效益。对缓存中的每个项应用全局驱逐策略可能并不总是合适的。例如,如果从数据存储区检索某个缓存项的成本非常高,那么以牺牲访问更频繁但成本较低的项为代价将该项保留在缓存中可能是有益的。

  • Priming the Cache 预热缓存. Many solutions prepopulate the cache with the data that an application is likely to need as part of the startup processing. The Cache-Aside pattern may still be useful if some of this data expires or is evicted.

    许多解决方案在启动处理过程中用应用程序可能需要的数据预填充缓存。如果其中一些数据过期或被驱逐,Cache-Aside 模式可能仍然有用。

  • Consistency 一致性. Implementing the Cache-Aside pattern does not guarantee consistency between the data store and the cache. An item in the data store may be changed at any time by an external process, and this change might not be reflected in the cache until the next time the item is loaded into the cache. In a system that replicates data across data stores, this problem may become especially acute if synchronization occurs very frequently.

    实现 Cache-Aside 模式并不能保证数据存储和缓存之间的一致性。外部进程可以随时更改数据存储区中的项,这种更改可能在下次将该项加载到缓存中之前不会反映在缓存中。在跨数据存储区复制数据的系统中,如果同步发生得非常频繁,这个问题可能会变得特别严重。

  • Local (In-Memory) Caching 本地(内存中)缓存. A cache could be local to an application instance and stored in-memory. Cache-aside can be useful in this environment if an application repeatedly accesses the same data. However, a local cache is private and so different application instances could each have a copy of the same cached data. This data could quickly become inconsistent between caches, so it may be necessary to expire data held in a private cache and refresh it more frequently. In these scenarios it may be appropriate to investigate the use of a shared or a distributed caching mechanism.

    缓存可以位于应用程序实例本地并存储在内存中。如果应用程序重复访问相同的数据,则缓存旁置在此环境中非常有用。但是,本地缓存是私有的,因此不同的应用程序实例可以各自拥有相同缓存数据的副本。这些数据可能很快在缓存之间变得不一致,因此可能需要让私有缓存中保存的数据过期并更频繁地刷新它。在这些场景中,研究共享缓存机制或分布式缓存机制的使用可能是合适的。
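
To make the expiration and local caching points above concrete, here is a small sketch of a private in-memory cache whose items expire after a fixed time to live. It is an assumption-laden illustration only: the `ExpiringCache` class, its `GetOrLoad` and `Invalidate` methods, and the injectable `clock` delegate are hypothetical names, and a real cache would usually also bound its size and evict items.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical local (in-memory) cache whose items expire after a fixed time to live.
public class ExpiringCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset ExpiresAt)> items =
        new ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset ExpiresAt)>();
    private readonly TimeSpan timeToLive;
    private readonly Func<DateTimeOffset> clock;

    // The clock is injectable so that expiration can be exercised without waiting.
    public ExpiringCache(TimeSpan timeToLive, Func<DateTimeOffset> clock = null)
    {
        this.timeToLive = timeToLive;
        this.clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached value while it is still fresh; otherwise loads it and caches it again.
    public TValue GetOrLoad(TKey key, Func<TKey, TValue> load)
    {
        var now = clock();
        if (items.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
        {
            return entry.Value;
        }

        var value = load(key);
        items[key] = (value, now + timeToLive);
        return value;
    }

    // Removes an item explicitly, for example after the data store has been updated.
    public void Invalidate(TKey key)
    {
        items.TryRemove(key, out _);
    }
}
```

The time to live is the knob discussed under "Lifetime of Cached Data": a short value sends more reads to the data store, while a long value raises the chance of serving stale data.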

When to Use this Pattern 何时使用此模式

Use this pattern when:

在以下情况下使用这种模式:

  • A cache does not provide native read-through and write-through operations. 缓存不提供本机通读和通写操作
  • Resource demand is unpredictable. This pattern enables applications to load data on demand. It makes no assumptions about which data an application will require in advance. 资源需求是不可预测的。此模式使应用程序能够根据需要加载数据。它不假设应用程序预先需要哪些数据

This pattern might not be suitable:

这种模式可能并不合适:

  • When the cached data set is static. If the data will fit into the available cache space, prime the cache with the data on startup and apply a policy that prevents the data from expiring. 当缓存的数据集是静态的时候。如果数据适合可用的缓存空间,那么在启动时用数据填充缓存,并应用防止数据过期的策略
  • For caching session state information in a web application hosted in a web farm. In this environment, you should avoid introducing dependencies based on client-server affinity. 用于在承载于 Web 场中的 Web 应用程序中缓存会话状态信息。在这种环境中,应该避免引入基于客户机-服务器关联的依赖关系

Example 例子

In Microsoft Azure you can use Azure Cache to create a distributed cache that can be shared by multiple instances of an application. The GetMyEntityAsync method in the following code example shows an implementation of the Cache-Aside pattern based on Azure Cache. This method retrieves an object from the cache using the read-through approach.

在 Microsoft Azure 中,你可以使用 Azure Cache 来创建一个分布式缓存,这个缓存可以被一个应用程序的多个实例共享。下面的代码示例中的 GetMyEntityAsync 方法显示了基于 Azure Cache 的 Cache-Aside 模式的实现。此方法使用通读(read-through)方式从缓存中检索对象。

An object is identified by using an integer ID as the key. The GetMyEntityAsync method generates a string value based on this key (the Azure Cache API uses strings for key values) and attempts to retrieve an item with this key from the cache. If a matching item is found, it is returned. If there is no match in the cache, the GetMyEntityAsync method retrieves the object from a data store, adds it to the cache, and then returns it (the code that actually retrieves the data from the data store has been omitted because it is data store dependent). Note that the cached item is configured to expire in order to prevent it from becoming stale if it is updated elsewhere.

对象通过使用整数 ID 作为键来标识。GetMyEntityAsync 方法根据这个键生成一个字符串值(Azure Cache API 使用字符串作为键值) ,并尝试从缓存中检索具有这个键的项。如果找到匹配项,则返回该项。如果缓存中没有匹配,GetMyEntityAsync 方法将从数据存储中检索对象,将其添加到缓存中,然后返回该对象(实际从数据存储中检索数据的代码被省略,因为它与数据存储相关)。请注意,将缓存的项配置为过期,是为了防止它在其他地方更新时变得过时。

private DataCache cache;
...

public async Task<MyEntity> GetMyEntityAsync(int id)
{
  // Define a unique key for this method and its parameters.
  var key = string.Format("StoreWithCache_GetAsync_{0}", id);
  var expiration = TimeSpan.FromMinutes(3);
  bool cacheException = false;

  try
  {
    // Try to get the entity from the cache.
    var cacheItem = cache.GetCacheItem(key);
    if (cacheItem != null)
    {
      return cacheItem.Value as MyEntity;
    }
  }
  catch (DataCacheException)
  {
    // If there is a cache related issue, raise an exception
    // and avoid using the cache for the rest of the call.
    cacheException = true;
  }

  // If there is a cache miss, get the entity from the original store and cache it.
  // Code has been omitted because it is data store dependent.
  var entity = ...;

  if (!cacheException)
  {
    try
    {
      // Avoid caching a null value.
      if (entity != null)
      {
        // Put the item in the cache with a custom expiration time that
        // depends on how critical it might be to have stale data.
        cache.Put(key, entity, timeout: expiration);
      }
    }
    catch (DataCacheException)
    {
      // If there is a cache related issue, ignore it
      // and just return the entity.
    }
  }

  return entity;
}

Note 备注

The examples use the Azure Cache API to access the store and retrieve information from the cache. For more information about the Azure Cache API, see Using Microsoft Azure Cache on MSDN.

这些示例使用 Azure Cache API 访问存储并从缓存中检索信息。有关 Azure 缓存 API 的更多信息,请参见在 MSDN 上使用微软 Azure 缓存。

The UpdateEntityAsync method shown below demonstrates how to invalidate an object in the cache when the value is changed by the application. This is an example of a write-through approach. The code updates the original data store and then removes the cached item from the cache by calling the Remove method, specifying the key (the code for this part of the functionality has been omitted as it will be data store dependent).

下面显示的 UpdateEntityAsync 方法演示如何在应用程序更改对象值时使缓存中的对象失效。这是一个通写(write-through)方法的示例。该代码更新原始数据存储,然后通过调用 Remove 方法并指定键,从缓存中删除缓存项(这部分功能的代码已被省略,因为它将依赖于数据存储)。

Note 备注

The order of the steps in this sequence is important. If the item is removed from the cache before the data store is updated, there is a small window of opportunity for a client application to fetch the data (because it is not found in the cache) before the item in the data store has been changed, resulting in the cache containing stale data.

这个序列中步骤的顺序很重要。如果在更新数据存储之前就从缓存中删除了该项,那么在数据存储中的项被更改之前,客户端应用程序有一个很小的机会窗口来获取数据(因为在缓存中没有找到),从而导致缓存中包含过期数据。

public async Task UpdateEntityAsync(MyEntity entity)
{
  // Update the object in the original data store
  await this.store.UpdateEntityAsync(entity).ConfigureAwait(false);

  // Get the correct key for the cached object.
  var key = this.GetAsyncCacheKey(entity.Id);

  // Then, invalidate the current cache object
  this.cache.Remove(key);
}

private string GetAsyncCacheKey(int objectId)
{
  return string.Format("StoreWithCache_GetAsync_{0}", objectId);
}

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may also be relevant when implementing this pattern:

下列模式和指南在实现此模式时也可能有用:

  • Caching Guidance 缓存指南. This guidance provides additional information on how you can cache data in a cloud solution, and the issues that you should consider when you implement a cache.

    本指南提供了有关如何在云解决方案中缓存数据的附加信息,以及在实现缓存时应考虑的问题。

  • Data Consistency Primer 数据一致性入门. Cloud applications typically use data that is dispersed across data stores. Managing and maintaining data consistency in this environment can become a critical aspect of the system, particularly in terms of the concurrency and availability issues that can arise. This primer describes the issues surrounding consistency across distributed data, and summarizes how an application can implement eventual consistency to maintain the availability of data.

    云应用程序通常使用分散在数据存储区中的数据。在这种环境中管理和维护数据一致性可能成为系统的一个关键方面,特别是在可能出现的并发性和可用性问题方面。本文描述了分布式数据的一致性问题,并总结了应用程序如何实现最终一致性以维护数据的可用性。

Circuit Breaker Pattern 断路器模式

  • Article 文章
  • 08/26/2015 2015年8月26日
  • 18 minutes to read 阅读时间 18 分钟

Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.

处理在连接到远程服务或资源时可能需要花费不同时间来纠正的错误。此模式可以提高应用程序的稳定性和弹性。

Context and Problem 背景与问题

In a distributed environment such as the cloud, where an application performs operations that access remote resources and services, it is possible for these operations to fail due to transient faults such as slow network connections, timeouts, or the resources being overcommitted or temporarily unavailable. These faults typically correct themselves after a short period of time, and a robust cloud application should be prepared to handle them by using a strategy such as that described by the Retry pattern.

在云等分布式环境中,应用程序执行访问远程资源和服务的操作,这些操作可能会因为网络连接缓慢、超时或资源超量使用或暂时不可用等暂时性故障而失败。这些错误通常会在短时间内自我纠正,一个健壮的云应用程序应该准备好使用 Retry 模式所描述的策略来处理这些错误。

However, there may also be situations where faults are due to unexpected events that are less easily anticipated, and that may take much longer to rectify. These faults can range in severity from a partial loss of connectivity to the complete failure of a service. In these situations it may be pointless for an application to continually retry performing an operation that is unlikely to succeed, and instead the application should quickly accept that the operation has failed and handle this failure accordingly.

但是,也有可能出现由于意外事件导致的错误,这些事件不太容易预料到,并且可能需要更长的时间来纠正。这些故障的严重程度可以从连接的部分丧失到服务的完全失败。在这些情况下,应用程序不断重试执行不太可能成功的操作可能是没有意义的,相反,应用程序应该迅速接受操作失败并相应地处理这种失败。

Additionally, if a service is very busy, failure in one part of the system may lead to cascading failures. For example, an operation that invokes a service could be configured to implement a timeout, and reply with a failure message if the service fails to respond within this period. However, this strategy could cause many concurrent requests to the same operation to be blocked until the timeout period expires. These blocked requests might hold critical system resources such as memory, threads, database connections, and so on. Consequently, these resources could become exhausted, causing failure of other possibly unrelated parts of the system that need to use the same resources. In these situations, it would be preferable for the operation to fail immediately, and only attempt to invoke the service if it is likely to succeed. Note that setting a shorter timeout may help to resolve this problem, but the timeout should not be so short that the operation fails most of the time, even if the request to the service would eventually succeed.

此外,如果服务非常繁忙,系统某个部分的故障可能会导致级联故障。例如,可以将调用服务的操作配置为实现超时,并在服务未能在此期间响应时使用失败消息进行响应。但是,此策略可能导致对同一操作的许多并发请求被阻塞,直到超时期限过期。这些被阻塞的请求可能包含关键的系统资源,如内存、线程、数据库连接等。因此,这些资源可能会耗尽,导致系统中需要使用相同资源的其他可能不相关的部分出现故障。在这些情况下,最好是操作立即失败,并且只有在可能成功的情况下才尝试调用服务。请注意,设置较短的超时可能有助于解决这个问题,但是超时不应该太短,以至于操作大多数时候都会失败,即使对服务的请求最终会成功。

Solution 解决方案

The Circuit Breaker pattern can prevent an application repeatedly trying to execute an operation that is likely to fail, allowing it to continue without waiting for the fault to be rectified or wasting CPU cycles while it determines that the fault is long lasting. The Circuit Breaker pattern also enables an application to detect whether the fault has been resolved. If the problem appears to have been rectified, the application can attempt to invoke the operation.

断路器模式可以防止应用程序重复尝试执行可能失败的操作,允许它继续运行,而不必等待故障被纠正或浪费 CPU 周期,同时它确定故障是长期持续的。断路器模式还使应用程序能够检测故障是否已得到解决。如果问题看起来已经得到纠正,应用程序可以尝试调用该操作。

Note 备注

The purpose of the Circuit Breaker pattern is different from that of the Retry Pattern. The Retry Pattern enables an application to retry an operation in the expectation that it will succeed. The Circuit Breaker pattern prevents an application from performing an operation that is likely to fail. An application may combine these two patterns by using the Retry pattern to invoke an operation through a circuit breaker. However, the retry logic should be sensitive to any exceptions returned by the circuit breaker and abandon retry attempts if the circuit breaker indicates that a fault is not transient.

断路器模式的目的与重试模式的目的不同。重试模式使应用程序能够在预期操作成功的情况下重试操作。断路器模式防止应用程序执行可能失败的操作。应用程序可以通过使用重试模式来通过断路器调用操作来组合这两种模式。然而,重试逻辑应该对断路器返回的任何异常敏感,如果断路器表明故障不是暂态的,则放弃重试尝试。
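
The combination described in this note, retrying through a circuit breaker but abandoning the retries when the breaker reports a non-transient fault, might look like the sketch below. The names `CircuitBreakerOpenException` and `RetryHelper.RetryAsync` are hypothetical, and treating every other exception as transient is a simplifying assumption; real code would normally inspect the exception type.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical exception a circuit breaker might throw while it is in the Open state.
public class CircuitBreakerOpenException : Exception
{
    public CircuitBreakerOpenException(string message) : base(message) { }
}

public static class RetryHelper
{
    // Retries the operation up to maxAttempts times, waiting between attempts,
    // but gives up at once if the circuit breaker reports that it is open.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> operationThroughBreaker, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operationThroughBreaker().ConfigureAwait(false);
            }
            catch (CircuitBreakerOpenException)
            {
                // The breaker indicates the fault is not transient: abandon the retries.
                throw;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Assumed transient in this sketch: wait, then try again.
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}
```

Once the final attempt fails, the exception filter no longer matches and the last exception propagates to the caller unchanged.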

A circuit breaker acts as a proxy for operations that may fail. The proxy should monitor the number of recent failures that have occurred, and then use this information to decide whether to allow the operation to proceed, or simply return an exception immediately.

断路器充当可能失败的操作的代理。代理应该监视最近发生的故障数量,然后使用此信息来决定是允许继续操作,还是仅仅立即返回异常。

The proxy can be implemented as a state machine with the following states that mimic the functionality of an electrical circuit breaker:

该代理可以作为状态机实现,具有下列状态,模拟电气断路器的功能:

  • Closed: The request from the application is routed through to the operation. The proxy maintains a count of the number of recent failures, and if the call to the operation is unsuccessful the proxy increments this count. If the number of recent failures exceeds a specified threshold within a given time period, the proxy is placed into the Open state. At this point the proxy starts a timeout timer, and when this timer expires the proxy is placed into the Half-Open state.

    已关闭: 来自应用程序的请求被路由到操作。代理维护最近失败次数的计数,如果对操作的调用不成功,代理将增加这个计数。如果在给定的时间段内,最近的故障数量超过了指定的阈值,则代理将处于 Open 状态。此时,代理启动一个超时计时器,当该计时器过期时,代理将被放入 Half-Open。

    Note 备注

    The purpose of the timeout timer is to give the system time to rectify the problem that caused the failure before allowing the application to attempt to perform the operation again.

    超时计时器的作用是在允许应用程序再次尝试执行操作之前,给系统时间来纠正导致故障的问题。

  • Open: The request from the application fails immediately and an exception is returned to the application.

    Open: 来自应用程序的请求立即失败,并向应用程序返回异常。

  • Half-Open: A limited number of requests from the application are allowed to pass through and invoke the operation. If these requests are successful, it is assumed that the fault that was previously causing the failure has been fixed and the circuit breaker switches to the Closed state (the failure counter is reset). If any request fails, the circuit breaker assumes that the fault is still present so it reverts back to the Open state and restarts the timeout timer to give the system a further period of time to recover from the failure.

    Half-Open: 允许来自应用程序的有限数量的请求通过和调用操作。如果这些请求成功,则假定以前导致故障的故障已经修复,断路器切换到关闭状态(故障计数器重置)。如果任何请求失败,断路器假定故障仍然存在,因此它将恢复到 Open 状态并重新启动超时计时器,以便给系统更长的时间从故障中恢复。

    Note 备注

    The Half-Open state is useful to prevent a recovering service from suddenly being inundated with requests. As a service recovers, it may be able to support a limited volume of requests until the recovery is complete, but while recovery is in progress a flood of work may cause the service to time out or fail again.

    半开放状态有助于防止恢复服务突然被请求淹没。随着服务的恢复,它可能能够支持有限数量的请求,直到恢复完成,但是在恢复过程中,大量的工作可能导致服务超时或再次失败。

Figure 1 illustrates the states for one possible implementation of a circuit breaker.

图1说明了断路器的一种可能实现的状态。

Figure 1 - Circuit Breaker states

图1-断路器状态

Note that, in Figure 1, the failure counter used by the Closed state is time-based. It is automatically reset at periodic intervals. This helps to prevent the circuit breaker from entering the Open state if it experiences occasional failures; the failure threshold that trips the circuit breaker into the Open state is only reached when a specified number of failures have occurred during a specified interval. The success counter used by the Half-Open state records the number of successful attempts to invoke the operation. The circuit breaker reverts to the Closed state after a specified number of consecutive operation invocations have been successful. If any invocation fails, the circuit breaker enters the Open state immediately and the success counter will be reset the next time it enters the Half-Open state.

请注意,在图1中,Close 状态使用的故障计数器是基于时间的。它会定期自动复位。这有助于防止断路器在偶尔发生故障时进入开启状态; 只有在特定时间间隔内发生特定数量的故障时,才能达到将断路器触发到开启状态的故障阈值。Half-Open 状态使用的成功计数器记录成功尝试调用操作的次数。断路器在连续调用指定数量的操作成功后恢复到关闭状态。如果任何调用失败,断路器立即进入开放状态,成功计数器将在下次进入 Half-Open 时重置。
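The states and counters described above can be sketched as a small state machine. This is a minimal illustration rather than the implementation behind Figure 1: the names `BreakerStateMachine`, `AllowRequest`, `RecordSuccess`, and `RecordFailure` are hypothetical, the breaker trips when the failure count reaches the threshold, every request is let through while half-open (a real breaker would limit them), and the class is not thread-safe.

```csharp
using System;

// Hypothetical names; thresholds and timings are illustrative assumptions.
public enum BreakerState { Closed, Open, HalfOpen }

public class BreakerStateMachine
{
    private readonly int failureThreshold;    // failures per window that trip the breaker
    private readonly TimeSpan failureWindow;  // interval after which the failure counter resets
    private readonly TimeSpan openTimeout;    // time spent Open before trial requests are allowed
    private readonly int successThreshold;    // consecutive successes needed to close again
    private readonly Func<DateTime> clock;    // injected so that timing can be controlled in tests

    private int failureCount;
    private int successCount;
    private DateTime windowStart;
    private DateTime openedAt;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public BreakerStateMachine(int failureThreshold, TimeSpan failureWindow,
        TimeSpan openTimeout, int successThreshold, Func<DateTime> clock)
    {
        this.failureThreshold = failureThreshold;
        this.failureWindow = failureWindow;
        this.openTimeout = openTimeout;
        this.successThreshold = successThreshold;
        this.clock = clock;
        windowStart = clock();
    }

    // Returns true if a request may be attempted right now.
    public bool AllowRequest()
    {
        if (State == BreakerState.Open && clock() - openedAt >= openTimeout)
        {
            // The timeout timer has expired: move to Half-Open and start counting successes.
            State = BreakerState.HalfOpen;
            successCount = 0;
        }
        return State != BreakerState.Open;
    }

    public void RecordSuccess()
    {
        if (State == BreakerState.HalfOpen && ++successCount >= successThreshold)
        {
            // Enough consecutive successes: close the circuit and reset the failure counter.
            State = BreakerState.Closed;
            failureCount = 0;
            windowStart = clock();
        }
    }

    public void RecordFailure()
    {
        DateTime now = clock();
        if (State == BreakerState.Open) return;  // nothing to track while requests are rejected
        if (State == BreakerState.HalfOpen)
        {
            Trip(now);                           // any failure while probing reopens the circuit
            return;
        }
        if (now - windowStart >= failureWindow)
        {
            // Time-based counter: start a fresh window.
            failureCount = 0;
            windowStart = now;
        }
        if (++failureCount >= failureThreshold) Trip(now);
    }

    private void Trip(DateTime now)
    {
        State = BreakerState.Open;
        openedAt = now;                          // restarts the timeout timer
    }
}
```

A caller would check `AllowRequest()` before invoking the operation and report the outcome through `RecordSuccess()` or `RecordFailure()`.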

Note 备注

How the system recovers is handled externally, possibly by restoring or restarting a failed component or repairing a network connection.

系统如何恢复由外部处理,可能是通过恢复或重新启动失败的组件或修复网络连接。

Implementing the circuit breaker pattern adds stability and resiliency to a system, offering stability while the system recovers from a failure and minimizing the impact of this failure on performance. It can help to maintain the response time of the system by quickly rejecting a request for an operation that is likely to fail, rather than waiting for the operation to time out (or never return). If the circuit breaker raises an event each time it changes state, this information can be used to monitor the health of the part of the system protected by the circuit breaker, or to alert an administrator when a circuit breaker trips to the Open state.

实现断路器模式增加了系统的稳定性和弹性,在系统从故障中恢复时提供稳定性,并将故障对性能的影响降至最低。通过快速拒绝可能失败的操作请求,而不是等待操作超时(或永远不返回) ,它可以帮助维护系统的响应时间。如果断路器每次更改状态时都会引发一个事件,则此信息可用于监视受断路器保护的系统部分的健康状况,或者当断路器进入打开状态时向管理员发出警报。

The pattern is customizable and can be adapted according to the nature of the possible failure. For example, you can apply an increasing timeout timer to a circuit breaker. You could place the circuit breaker in the Open state for a few seconds initially, and then if the failure has not been resolved increase the timeout to a few minutes, and so on. In some cases, rather than the Open state returning failure and raising an exception, it could be useful to return a default value that is meaningful to the application.

该模式是可定制的,并且可以根据可能出现的故障的性质进行调整。例如,可以对断路器应用增加的超时计时器。您可以首先将断路器置于 Open 状态几秒钟,然后如果故障尚未解决,则将超时时间增加到几分钟,以此类推。在某些情况下,与 Open 状态返回失败并引发异常不同,返回对应用程序有意义的默认值可能是有用的。
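The two customizations mentioned above, an increasing timeout and a default value in place of an exception, might look roughly like the following. The helper names (`BreakerTuning`, `OpenDuration`, `ExecuteOrDefault`) and the doubling policy are assumptions made for illustration; the pattern itself does not prescribe them.

```csharp
using System;

// Hypothetical helpers; the doubling policy and the cap are illustrative assumptions.
public static class BreakerTuning
{
    // Doubles the Open period for each consecutive trip, capped at maxDelay.
    public static TimeSpan OpenDuration(int consecutiveTrips, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (consecutiveTrips < 1) return baseDelay;

        // Cap the exponent so the multiplication stays well within range.
        int exponent = Math.Min(consecutiveTrips - 1, 20);
        double ticks = baseDelay.Ticks * Math.Pow(2, exponent);
        return ticks >= maxDelay.Ticks ? maxDelay : TimeSpan.FromTicks((long)ticks);
    }

    // When the circuit is open, returns a fallback value instead of raising an exception.
    // When it is not open, the operation runs normally and its exceptions propagate.
    public static T ExecuteOrDefault<T>(bool circuitIsOpen, Func<T> operation, T fallback)
    {
        return circuitIsOpen ? fallback : operation();
    }
}
```

With a base delay of 5 seconds and a cap of 5 minutes, the first three trips would keep the breaker open for 5, 10, and 20 seconds respectively.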

Issues and Considerations 问题及考虑

You should consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,应考虑以下几点:

  • Exception Handling. An application invoking an operation through a circuit breaker must be prepared to handle the exceptions that could be raised if the operation is unavailable. The way in which such exceptions are handled will be application specific. For example, an application could temporarily degrade its functionality, invoke an alternative operation to try to perform the same task or obtain the same data, or report the exception to the user and ask them to try again later.

    异常处理。通过断路器调用操作的应用程序必须准备好处理操作不可用时可能引发的异常。处理此类异常的方式将是特定于应用程序的。例如,应用程序可能会暂时降低其功能,调用替代操作来尝试执行相同的任务或获取相同的数据,或者向用户报告异常并要求他们稍后再试。

  • Types of Exceptions. A request may fail for a variety of reasons, some of which may indicate a more severe type of failure than others. For example, a request may fail because a remote service has crashed and may take several minutes to recover, or failure could be caused by a timeout due to the service being temporarily overloaded. A circuit breaker may be able to examine the types of exceptions that occur and adjust its strategy depending on the nature of these exceptions. For example, it may require a larger number of timeout exceptions to trip the circuit breaker to the Open state compared to the number of failures due to the service being completely unavailable.

    异常类型。请求可能由于各种原因而失败,其中一些原因可能意味着比其他原因更严重的失败类型。例如,请求可能会失败,因为远程服务已经崩溃,可能需要几分钟才能恢复,或者由于服务暂时过载导致超时而导致失败。断路器可以检查发生的异常类型,并根据这些异常的性质调整策略。例如,与由于服务完全不可用而导致的故障数量相比,可能需要更多的超时异常才能使断路器断开至开启状态。

  • Logging. A circuit breaker should log all failed requests (and possibly successful requests) to enable an administrator to monitor the health of the operation that it encapsulates.

    日志记录。断路器应该记录所有失败的请求(以及可能成功的请求) ,使管理员能够监视其封装的操作的健康状况。

  • Recoverability. You should configure the circuit breaker to match the likely recovery pattern of the operation it is protecting. For example, if the circuit breaker remains in the Open state for a long period, it could raise exceptions even if the reason for the failure has long since been resolved. Similarly, a circuit breaker could oscillate and degrade the response times of applications if it switches from the Open state to the Half-Open state too quickly.

    恢复能力。你应该配置断路器来匹配它所保护的操作的可能恢复模式。例如,如果断路器长时间处于开启状态,即使故障原因早已得到解决,也可能引发异常。同样,如果断路器过快地从 Open 状态切换到 Half-Open 状态,断路器可能会发生振荡,从而降低应用程序的响应速度。

  • Testing Failed Operations. In the Open state, rather than using a timer to determine when to switch to the Half-Open state, a circuit breaker may instead periodically ping the remote service or resource to determine whether it has become available again. This ping could take the form of an attempt to invoke an operation that had previously failed, or it could use a special operation provided by the remote service specifically for testing the health of the service, as described by the Health Endpoint Monitoring pattern.

    测试失败的操作。在开放状态下,断路器可能不会使用计时器来决定何时切换到 Half-Open,而是会周期性地向远程服务或资源发送 ping 信号,以确定它是否已恢复可用。此 ping 可以采取尝试调用以前失败的操作的形式,或者可以使用远程服务提供的专门用于测试服务健康状况的特殊操作,正如 Health Endpoint Monitor 模式所描述的那样。

  • Manual Override. In a system where the recovery time for a failing operation is extremely variable, it may be beneficial to provide a manual reset option that enables an administrator to forcibly close a circuit breaker (and reset the failure counter). Similarly, an administrator could force a circuit breaker into the Open state (and restart the timeout timer) if the operation protected by the circuit breaker is temporarily unavailable.

    手动控制。在一个故障操作的恢复时间非常可变的系统中,提供一个手动复位选项可能是有益的,该选项使管理员能够强制关闭断路器(并复位故障计数器)。类似地,如果断路器保护的操作暂时不可用,管理员可以强制断路器进入 Open 状态(并重新启动超时计时器)。

  • Concurrency. The same circuit breaker could be accessed by a large number of concurrent instances of an application. The implementation should not block concurrent requests or add excessive overhead to each call to an operation.

    并发性。应用程序的大量并发实例可以访问相同的断路器。实现不应该阻塞并发请求,也不应该为每个对操作的调用增加额外的开销。

  • Resource Differentiation. Be careful when using a single circuit breaker for one type of resource if there might be multiple underlying independent providers. For example, in a data store that comprises multiple shards, one shard may be fully accessible while another is experiencing a temporary issue. If the error responses in these scenarios are conflated, an application may attempt to access some shards even when failure is highly likely, while access to other shards may be blocked even though it is likely to succeed.

    资源分化。如果可能有多个底层独立提供程序,那么在对一种资源使用单个断路器时要小心。例如,在包含多个碎片的数据存储区中,一个碎片可能是完全可访问的,而另一个碎片出现了临时问题。如果这些场景中的错误响应合并在一起,应用程序可能会尝试访问某些碎片,即使失败的可能性很高,而对其他碎片的访问可能会被阻止,即使它可能会成功。

  • Accelerated Circuit Breaking. Sometimes a failure response can contain enough information for the circuit breaker implementation to know it should trip immediately and stay tripped for a minimum amount of time. For example, the error response from a shared resource that is overloaded could indicate that an immediate retry is not recommended and that the application should instead try again in a few minutes time.

    加速断路。有时,故障响应可以包含足够的信息,使断路器实现知道它应该立即跳闸,并保持跳闸的最短时间。例如,来自重载的共享资源的错误响应可能表明不建议立即重试,应用程序应该在几分钟内重试。

    Note 备注

    The HTTP protocol defines the “HTTP 503 Service Unavailable” response that can be returned if a requested service is not currently available on a particular web server. This response can include additional information, such as the anticipated duration of the delay.

    HTTP 协议定义了“ HTTP503服务不可用”响应,如果请求的服务当前在特定的 Web 服务器上不可用,则可以返回该响应。这个响应可以包括其他信息,例如延迟的预期持续时间。

  • Replaying Failed Requests. In the Open state, rather than simply failing quickly, a circuit breaker could also record the details of each request to a journal and arrange for these requests to be replayed when the remote resource or service becomes available.

    重播失败的请求。在 Open 状态下,断路器不仅可以快速失败,还可以将每个请求的详细信息记录到日志中,并安排在远程资源或服务可用时重播这些请求。

  • Inappropriate Timeouts on External Services. A circuit breaker may not be able to fully protect applications from operations that fail in external services that are configured with a lengthy timeout period. If the timeout is too long, a thread running a circuit breaker may be blocked for an extended period before the circuit breaker indicates that the operation has failed. In this time, many other application instances may also attempt to invoke the service through the circuit breaker and tie up a significant number of threads before they all fail.

    外部服务的不适当超时。断路器可能无法完全保护应用程序不受配置了较长超时时间的外部服务中出现故障的操作的影响。如果超时时间过长,在断路器显示操作失败之前,运行断路器的线程可能会被阻塞一段较长的时间。这时,许多其他应用程序实例也可能试图通过断路器调用服务,并在线程全部失败之前占用大量线程。
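As an illustration of the Accelerated Circuit Breaking point in the list above, the sketch below reads the standard HTTP `Retry-After` header from a 503 response and turns it into a suggested Open period. The class and method names are hypothetical, and how a breaker would consume the result is left open.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper; only the HttpResponseMessage and Retry-After types come from .NET.
public static class RetryAfterHint
{
    // Returns how long the breaker could stay open according to the response,
    // or null when the response carries no usable hint.
    public static TimeSpan? GetSuggestedOpenPeriod(HttpResponseMessage response, DateTimeOffset now)
    {
        if (response.StatusCode != HttpStatusCode.ServiceUnavailable) return null;

        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter == null) return null;

        // Retry-After is either a delay in seconds or an absolute HTTP date.
        if (retryAfter.Delta.HasValue) return retryAfter.Delta.Value;
        if (retryAfter.Date.HasValue && retryAfter.Date.Value > now)
        {
            return retryAfter.Date.Value - now;
        }
        return null;
    }
}
```

A breaker could trip immediately when this method returns a value and use that value, rather than its default timeout, before moving to the Half-Open state.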

When to Use this Pattern 何时使用此模式

Use this pattern:

在以下情况下使用此模式:

  • To prevent an application from attempting to invoke a remote service or access a shared resource if this operation is highly likely to fail. 如果此操作极有可能失败,则防止应用程序尝试调用远程服务或访问共享资源

This pattern might not be suitable:

这种模式可能并不合适:

  • For handling access to local private resources in an application, such as an in-memory data structure. In this environment, using a circuit breaker would simply add overhead to your system. 用于处理对应用程序中的本地专用资源(如内存中的数据结构)的访问。在这种环境下,使用断路器只会增加系统的开销
  • As a substitute for handling exceptions in the business logic of your applications. 作为在应用程序的业务逻辑中处理异常的替代

Example 例子

In a web application, several of the pages are populated with data retrieved from an external service. If the system implements minimal caching, most hits to each of these pages will cause a round trip to the service. Connections from the web application to the service could be configured with a timeout period (typically 60 seconds), and if the service does not respond in this time the logic in each web page will assume that the service is unavailable and throw an exception.

在 Web 应用程序中,有几个页面由从外部服务检索到的数据填充。如果系统实现了最小的缓存,那么对每个页面的大多数命中都会导致到服务的往返过程。从 Web 应用程序到服务的连接可以配置为超时时间(通常为60秒) ,如果服务在此时间内没有响应,每个 Web 页面中的逻辑将假定该服务不可用并抛出异常。

However, if the service fails and the system is very busy, users could be forced to wait for up to 60 seconds before an exception occurs. Eventually resources such as memory, connections, and threads could be exhausted, preventing other users from connecting to the system—even if they are not accessing pages that retrieve data from the service.

但是,如果服务失败并且系统非常繁忙,用户可能被迫在异常发生前等待长达60秒。最终,内存、连接和线程等资源可能会耗尽,从而阻止其他用户连接到系统ーー即使他们不访问从服务中检索数据的页面。

Scaling the system by adding further web servers and implementing load balancing may delay the point at which resources become exhausted, but it will not resolve the issue because user requests will still be unresponsive and all web servers could still eventually run out of resources.

通过增加更多的网络服务器和实现负载平衡来扩展系统可能会延迟资源耗尽的时间点,但是这不会解决问题,因为用户的请求仍然没有响应,所有的网络服务器最终仍然可能耗尽资源。

Wrapping the logic that connects to the service and retrieves the data in a circuit breaker could help to alleviate the effects of this problem and handle the service failure more elegantly. User requests will still fail, but they will fail more quickly and the resources will not be blocked.

将连接到服务并在断路器中检索数据的逻辑包装起来,可以帮助减轻这个问题的影响,并更好地处理服务故障。用户请求仍然会失败,但失败的速度会更快,资源也不会被阻塞。

The CircuitBreaker class maintains state information about a circuit breaker in an object that implements the ICircuitBreakerStateStore interface shown in the following code.

CircuitBreaker 类在实现 ICircuitBreakerStateStore 接口的对象中维护有关断路器的状态信息,如下面的代码所示。

C#

interface ICircuitBreakerStateStore
{
  CircuitBreakerStateEnum State { get; }

  Exception LastException { get; }

  DateTime LastStateChangedDateUtc { get; }

  void Trip(Exception ex);

  void Reset();

  void HalfOpen();

  bool IsClosed { get; }
}

The State property indicates the current state of the circuit breaker, and will be one of the values Open, HalfOpen, or Closed as defined by the CircuitBreakerStateEnum enumeration. The IsClosed property should be true if the circuit breaker is closed, but false if it is open or half-open. The Trip method switches the state of the circuit breaker to the open state and records the exception that caused the change in state, together with the date and time that the exception occurred. The LastException and the LastStateChangedDateUtc properties return this information. The Reset method closes the circuit breaker, and the HalfOpen method sets the circuit breaker to half-open.

State 属性指示断路器的当前状态,将是 CircuitBreakerStateEnum 枚举定义的 Open、 HalfOpen 或 Closed 值之一。如果断路器已关闭,则 IsClosed 属性应为 true,但如果断路器处于开启状态或半开启状态,则为 false。Trip 方法将断路器的状态切换到断开状态,并记录导致状态改变的异常以及异常发生的日期和时间。LastException 和 LastStateChangedDateUtc 属性返回此信息。Reset 方法关闭断路器,HalfOpen 方法将断路器设置为半开。

The InMemoryCircuitBreakerStateStore class in the example contains an implementation of the ICircuitBreakerStateStore interface. The CircuitBreaker class creates an instance of this class to hold the state of the circuit breaker.

示例中的 InMemoryCircuitBreakerStateStore 类包含 ICircuitBreakerStateStore 接口的实现。CircuitBreaker 类创建该类的一个实例来保存断路器的状态。
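The source does not list the in-memory store, so the following is only a guess at what such a class might look like. The name `InMemoryStateStoreSketch`, the enumeration members, and the use of a single lock are assumptions. The interface is repeated (as public) so that the block compiles on its own.

```csharp
using System;

// Assumed to match the enumeration described in the text.
public enum CircuitBreakerStateEnum { Closed, Open, HalfOpen }

// Repeated from the article so that this sketch is self-contained.
public interface ICircuitBreakerStateStore
{
    CircuitBreakerStateEnum State { get; }
    Exception LastException { get; }
    DateTime LastStateChangedDateUtc { get; }
    void Trip(Exception ex);
    void Reset();
    void HalfOpen();
    bool IsClosed { get; }
}

// Hypothetical implementation: keeps everything in fields guarded by one lock.
public class InMemoryStateStoreSketch : ICircuitBreakerStateStore
{
    private readonly object sync = new object();
    private CircuitBreakerStateEnum state = CircuitBreakerStateEnum.Closed;
    private Exception lastException;
    private DateTime lastStateChangedDateUtc = DateTime.UtcNow;

    public CircuitBreakerStateEnum State { get { lock (sync) { return state; } } }

    public Exception LastException { get { lock (sync) { return lastException; } } }

    public DateTime LastStateChangedDateUtc { get { lock (sync) { return lastStateChangedDateUtc; } } }

    public bool IsClosed { get { lock (sync) { return state == CircuitBreakerStateEnum.Closed; } } }

    // Opens the circuit and remembers the exception that caused the change.
    public void Trip(Exception ex)
    {
        lock (sync)
        {
            lastException = ex;
            ChangeState(CircuitBreakerStateEnum.Open);
        }
    }

    public void Reset() { lock (sync) { ChangeState(CircuitBreakerStateEnum.Closed); } }

    public void HalfOpen() { lock (sync) { ChangeState(CircuitBreakerStateEnum.HalfOpen); } }

    // Callers must hold the lock; records when the state last changed.
    private void ChangeState(CircuitBreakerStateEnum newState)
    {
        state = newState;
        lastStateChangedDateUtc = DateTime.UtcNow;
    }
}
```

A store shared across several application instances would need external storage instead of fields, which this sketch does not attempt.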

The ExecuteAction method in the CircuitBreaker class wraps an operation (in the form of an Action delegate) that could fail. When this method runs, it first checks the state of the circuit breaker. If it is closed (the local IsOpen property, which returns true if the circuit breaker is open or half-open, is false) the ExecuteAction method attempts to invoke the Action delegate. If this operation fails, an exception handler executes the TrackException method, which sets the state of the circuit breaker to open by calling the Trip method of the InMemoryCircuitBreakerStateStore object. The following code example highlights this flow.

CircuitBreaker 类中的 ExecuteAction 方法包装可能失败的操作(以 Action 委托的形式)。当这种方法运行时,它首先检查断路器的状态。如果它关闭(如果断路器处于打开状态或半打开状态,则本地 IsOpen 属性返回 true,该属性为 false) ,ExecuteAction 方法将尝试调用 Action 委托。如果此操作失败,异常处理程序将执行 TrackException 方法,该方法通过调用 InMemoryCircuitBreakerStateStore 对象的 Trip 方法来设置断路器的状态以打开。下面的代码示例突出显示此流。

C#

public class CircuitBreaker
{
  private readonly ICircuitBreakerStateStore stateStore =
    CircuitBreakerStateStoreFactory.GetCircuitBreakerStateStore();

  private readonly object halfOpenSyncObject = new object();
  ...
  public bool IsClosed { get { return stateStore.IsClosed; } }

  public bool IsOpen { get { return !IsClosed; } }

  public void ExecuteAction(Action action)
  {
    ...
    if (IsOpen)
    {
      // The circuit breaker is Open.
      ... (see code sample below for details)
    }

    // The circuit breaker is Closed, execute the action.
    try
    {
      action();
    }
    catch (Exception ex)
    {
      // If an exception still occurs here, simply
      // re-trip the breaker immediately.
      this.TrackException(ex);

      // Throw the exception so that the caller can tell
      // the type of exception that was thrown.
      throw;
    }
  }

  private void TrackException(Exception ex)
  {
    // For simplicity in this example, open the circuit breaker on the first exception.
    // In reality this would be more complex. A certain type of exception, such as one
    // that indicates a service is offline, might trip the circuit breaker immediately.
    // Alternatively it may count exceptions locally or across multiple instances and
    // use this value over time, or the exception/success ratio based on the exception
    // types, to open the circuit breaker.
    this.stateStore.Trip(ex);
  }
}

The following example shows the code (omitted from the previous example) that is executed if the circuit breaker is not closed. It first checks if the circuit breaker has been open for a period longer than the time specified by the local OpenToHalfOpenWaitTime field in the CircuitBreaker class. If this is the case, the ExecuteAction method sets the circuit breaker to half-open, then attempts to perform the operation specified by the Action delegate.

下面的示例显示了在断路器未关闭时执行的代码(前面的示例中省略了)。它首先检查断路器是否已经开启的时间长于本地 OpenToHalfOpenWaitTime 字段在 CircuitBreaker 类中指定的时间。如果是这种情况,ExecuteAction 方法将断路器设置为半开,然后尝试执行 Action 委托指定的操作。

If the operation is successful, the circuit breaker is reset to the closed state. If the operation fails, it is tripped back to the open state and the time at which the exception occurred is updated so that the circuit breaker will wait for a further period before attempting to perform the operation again.

如果操作成功,断路器将复位到关闭状态。如果操作失败,它将回到打开状态,并更新异常发生的时间,以便断路器将等待更长的时间,然后再次尝试执行操作。

If the circuit breaker has only been open for a short time, less than the OpenToHalfOpenWaitTime value, the ExecuteAction method simply throws a CircuitBreakerOpenException exception that carries the error that caused the circuit breaker to transition to the open state.

如果断路器开启的时间很短,小于 OpenToHalfOpenWaitTime 值,ExecuteAction 方法只会抛出 CircuitBreakerOpenException 异常,并返回导致断路器转换到开启状态的错误。

Additionally, to prevent the circuit breaker from attempting to perform concurrent calls to the operation while it is half-open, it uses a lock. A concurrent attempt to invoke the operation will be handled as if the circuit breaker were open, and it will fail with a CircuitBreakerOpenException exception.

另外,为了防止断路器在半开的情况下尝试对操作执行并发调用,断路器使用了一个锁。同时尝试调用该操作将被处理为如同断路器已打开,它将失败,如后面所述的异常。

C#

    ...
    if (IsOpen)
    {
      // The circuit breaker is Open. Check if the Open timeout has expired.
      // If it has, set the state to HalfOpen. Another approach may be to simply
      // check for the HalfOpen state that had been set by some other operation.
      if (stateStore.LastStateChangedDateUtc + OpenToHalfOpenWaitTime < DateTime.UtcNow)
      {
        // The Open timeout has expired. Allow one operation to execute. Note that, in
        // this example, the circuit breaker is simply set to HalfOpen after being
        // in the Open state for some period of time. An alternative would be to set
        // this using some other approach such as a timer, test method, manually, and
        // so on, and simply check the state here to determine how to handle execution
        // of the action.
        // Limit the number of threads to be executed when the breaker is HalfOpen.
        // An alternative would be to use a more complex approach to determine which
        // threads or how many are allowed to execute, or to execute a simple test
        // method instead.
        bool lockTaken = false;
        try
        {
          Monitor.TryEnter(halfOpenSyncObject, ref lockTaken);
          if (lockTaken)
          {
            // Set the circuit breaker state to HalfOpen.
            stateStore.HalfOpen();

            // Attempt the operation.
            action();

            // If this action succeeds, reset the state and allow other operations.
            // In reality, instead of immediately returning to the Closed state, a counter
            // here would record the number of successful operations and return the
            // circuit breaker to the Closed state only after a specified number succeed.
            this.stateStore.Reset();
            return;
          }
        }
        catch (Exception ex)
        {
          // If there is still an exception, trip the breaker again immediately.
          this.stateStore.Trip(ex);

          // Throw the exception so that the caller knows which exception occurred.
          throw;
        }
        finally
        {
          if (lockTaken)
          {
            Monitor.Exit(halfOpenSyncObject);
          }
        }
      }

      // The Open timeout has not yet expired. Throw a CircuitBreakerOpen exception to
      // inform the caller that the call was not actually attempted,
      // and return the most recent exception received.
      throw new CircuitBreakerOpenException(stateStore.LastException);
    }
    ...

To use a CircuitBreaker object to protect an operation, an application creates an instance of the CircuitBreaker class and invokes the ExecuteAction method, specifying the operation to be performed as the parameter. The application should be prepared to catch the CircuitBreakerOpenException exception if the operation fails because the circuit breaker is open. The following code shows an example:

为了使用 CircuitBreaker 对象来保护操作,应用程序创建 CircuitBreaker 类的实例并调用 ExecuteAction 方法,指定要执行的操作作为参数。应用程序应该准备捕捉 CircuitBreakerOpenException 异常,如果操作失败,因为断路器是打开的。下面的代码显示了一个示例:

C#

var breaker = new CircuitBreaker();

try
{
  breaker.ExecuteAction(() =>
  {
    // Operation protected by the circuit breaker.
    ...
  });
}
catch (CircuitBreakerOpenException ex)
{
  // Perform some different action when the breaker is open.
  // Last exception details are in the inner exception.
  ...
}
catch (Exception ex)
{
  ...
}
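The source uses CircuitBreakerOpenException without showing its definition. One plausible shape, consistent with the comment that the last exception details are in the inner exception, is sketched below. The type and helper names here are hypothetical (suffixed with `Sketch` to make that clear), and the message text is invented.

```csharp
using System;

// Hypothetical definition; the article references this kind of type but does not list it.
public class CircuitBreakerOpenExceptionSketch : Exception
{
    // Wraps the exception that originally tripped the breaker as the inner exception.
    public CircuitBreakerOpenExceptionSketch(Exception lastException)
        : base("The circuit breaker is open; the operation was not attempted.", lastException)
    {
    }
}

public static class BreakerCallerSketch
{
    // Runs a protected call and describes the outcome, distinguishing a rejected
    // call (breaker open) from a call that was attempted and failed.
    public static string Describe(Action protectedCall)
    {
        try
        {
            protectedCall();
            return "ok";
        }
        catch (CircuitBreakerOpenExceptionSketch ex)
        {
            // The original failure travels as the inner exception.
            string cause = ex.InnerException != null ? ex.InnerException.Message : "unknown";
            return "degraded: " + cause;
        }
        catch (Exception ex)
        {
            return "failed: " + ex.Message;
        }
    }
}
```

Catching the more specific exception first lets the caller degrade gracefully when the breaker is open while still surfacing genuine failures.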

Related Patterns and Guidance 相关模式及指引

The following patterns may also be relevant when implementing this pattern:

在实现此模式时,下列模式也可能是相关的:

  • Retry Pattern 重试模式. The Retry pattern is a useful adjunct to the Circuit Breaker pattern. It describes how an application can handle anticipated temporary failures when it attempts to connect to a service or network resource by transparently retrying an operation that has previously failed in the expectation that the cause of the failure is transient. 重试模式是断路器模式的有用辅助工具。它描述了当应用程序尝试连接到服务或网络资源时,如何通过透明地重试以前失败过的操作来处理预期的临时失败,而预期的失败原因是临时的
  • Health Endpoint Monitoring Pattern 健康端点监测模式. A circuit breaker may be able to test the health of a service by sending a request to an endpoint exposed by the service. The service should return information indicating its status. 断路器可以通过向服务公开的端点发送请求来测试服务的健康状况。服务应该返回指示其状态的信息

Compensating Transaction Pattern 补偿事务模式

  • Article文章
  • 08/26/2015 2015年8月26日
  • 8 minutes to read还有8分钟


Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the steps fail. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.

撤消由一系列步骤执行的工作,如果一个或多个步骤失败,这些步骤将共同定义最终一致的操作。遵循最终一致性模式的操作通常存在于实现复杂业务流程和工作流的云托管应用程序中。

Context and Problem 背景与问题

Applications running in the cloud frequently modify data. This data may be spread across an assortment of data sources held in a variety of geographic locations. To avoid contention and improve performance in a distributed environment such as this, an application should not attempt to provide strong transactional consistency. Rather, the application should implement eventual consistency. In this model, a typical business operation consists of a series of autonomous steps. While these steps are being performed the overall view of the system state may be inconsistent, but when the operation has completed and all of the steps have been executed the system should become consistent again.

在云中运行的应用程序经常修改数据。这些数据可以分布在不同地理位置的各种数据源中。为了避免争用和提高分布式环境中的性能,应用程序不应该尝试提供强的事务一致性。相反,应用程序应该实现最终一致性。在该模型中,典型的业务操作由一系列自治步骤组成。当执行这些步骤时,系统状态的总体视图可能不一致,但是当操作完成并且所有步骤都已执行时,系统应该再次变得一致。

Note

注意

The Data Consistency Primer provides more information about why distributed transactions do not scale well, and the principles that underpin the eventual consistency model.

数据一致性入门提供了更多关于为什么分布式事务不能很好地扩展的信息,以及支撑最终一致性模型的原则。

A significant challenge in the eventual consistency model is how to handle a step that has failed irrecoverably. In this case it may be necessary to undo all of the work completed by the previous steps in the operation. However, the data cannot simply be rolled back because other concurrent instances of the application may have since changed it. Even in cases where the data has not been changed by a concurrent instance, undoing a step might not simply be a matter of restoring the original state. It may be necessary to apply various business-specific rules (see the travel website described in the Example section).

最终一致性模型面临的一个重大挑战是,如何处理一个不可挽回地失败的步骤。在这种情况下,可能需要撤消操作中前面步骤完成的所有工作。但是,不能简单地回滚数据,因为应用程序的其他并发实例可能已经更改了数据。即使在数据没有被并发实例更改的情况下,撤消某个步骤可能也不仅仅是恢复原始状态这么简单。可能有必要应用各种特定于业务的规则(参见示例部分中描述的旅游网站)。

If an operation that implements eventual consistency spans several heterogeneous data stores, undoing the steps in such an operation will require visiting each data store in turn. The work performed in every data store must be undone reliably to prevent the system from remaining inconsistent.

如果一个实现最终一致性的操作跨越多个异构数据存储区,撤销该操作中的步骤将需要依次访问每个数据存储区。必须可靠地撤销在每个数据存储中执行的工作,以防止系统保持不一致。

Not all data affected by an operation that implements eventual consistency might be held in a database. In a Service Oriented Architecture (SOA) environment an operation may invoke an action in a service, and cause a change in the state held by that service. To undo the operation, this state change must also be undone. This may involve invoking the service again and performing another action that reverses the effects of the first.

并非所有受实现最终一致性操作影响的数据都可能保存在数据库中。在面向服务的体系结构(SOA)环境中,操作可能调用服务中的某个操作,并导致该服务持有的状态发生变化。若要撤消操作,还必须撤消此状态更改。这可能涉及到再次调用服务并执行另一个操作来逆转第一个操作的效果。

Solution 解决方案

Implement a compensating transaction. The steps in a compensating transaction must undo the effects of the steps in the original operation. A compensating transaction might not be able to simply replace the current state with the state the system was in at the start of the operation because this approach could overwrite changes made by other concurrent instances of an application. Rather, it must be an intelligent process that takes into account any work done by concurrent instances. This process will usually be application-specific, driven by the nature of the work performed by the original operation.

实现补偿事务。补偿事务中的步骤必须撤消原始操作中步骤的效果。补偿事务可能不能简单地将当前状态替换为操作开始时系统所处的状态,因为这种方法可能会覆盖应用程序的其他并发实例所做的更改。相反,它必须是一个考虑到并发实例所做的任何工作的智能流程。这个过程通常是特定于应用程序的,由原始操作执行的工作的性质驱动。

A common approach to implementing an eventually consistent operation that requires compensation is to use a workflow. As the original operation proceeds, the system records information about each step and how the work performed by that step can be undone. If the operation fails at any point, the workflow rewinds back through the steps it has completed and performs the work that reverses each step. Note that a compensating transaction might not have to undo the work in the exact mirror-opposite order of the original operation, and it may be possible to perform some of the undo steps in parallel.

实现需要补偿的最终一致操作的一种常见方法是使用工作流。随着原始操作的进行,系统会记录关于每个步骤的信息,以及如何撤消该步骤执行的工作。如果操作在任何时候失败,工作流将回退到它已经完成的步骤,并执行逆转每个步骤的工作。请注意,补偿事务可能不必按与原始操作完全相反的镜像顺序撤消工作,并且可以并行执行一些撤消步骤。
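The workflow approach described above can be sketched as a list of steps, each paired with the work that reverses it. This is a simplified illustration with hypothetical names (`CompensableStep`, `CompensatingWorkflow`): it undoes completed steps most-recent-first, although the text notes that other orders and parallel undo are possible, and it does not handle an undo action that itself fails.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one step of the operation plus the work that reverses it.
public class CompensableStep
{
    public string Name { get; }
    public Action Do { get; }
    public Action Undo { get; }

    public CompensableStep(string name, Action doWork, Action undo)
    {
        Name = name;
        Do = doWork;
        Undo = undo;
    }
}

public static class CompensatingWorkflow
{
    // Runs the steps in order. If one fails, undoes the steps that already
    // completed (most recent first) and returns false.
    public static bool Run(IEnumerable<CompensableStep> steps)
    {
        var completed = new Stack<CompensableStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Do();
                completed.Push(step);  // record how to reverse this step
            }
            catch (Exception)
            {
                while (completed.Count > 0)
                {
                    completed.Pop().Undo();
                }
                return false;
            }
        }
        return true;
    }
}
```

In a real system the record of completed steps would be persisted, so that compensation can still run if the process executing the workflow fails.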

Note

注意

This approach is similar to the Sagas strategy. A description of this strategy is available online in Clemens Vasters’ blog.

这种方法类似于 Sagas 的策略,这种策略的描述可以在 Clemens Vasters 的博客上找到。

A compensating transaction is itself an eventually consistent operation and it could also fail. The system should be able to resume the compensating transaction at the point of failure and continue. It may be necessary to repeat a step that has failed, so the steps in a compensating transaction should be defined as idempotent commands. For more information about idempotency, see Idempotency Patterns on Jonathan Oliver’s blog.

补偿事务本身就是最终一致的操作,它也可能失败。系统应该能够在故障点恢复补偿事务并继续。可能需要重复失败的步骤,因此补偿事务中的步骤应该定义为幂等命令。关于幂等性的更多信息,请参阅 Jonathan Oliver 博客上的幂等性模式。
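One way to make a compensating transaction resumable is to record which undo steps have already been applied and skip them on a later pass. The sketch below assumes hypothetical names (`CompensationLog`, `CompensationRunner`) and keeps the record in memory, where a real system would persist it. It does not replace idempotent undo commands: if a process stops after an undo succeeds but before it is recorded, the step will run again.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of the undo steps that have already been applied.
public class CompensationLog
{
    private readonly HashSet<string> applied = new HashSet<string>();

    // Runs the undo action at most once per step ID. Returns true if it ran.
    public bool ApplyOnce(string stepId, Action undo)
    {
        if (applied.Contains(stepId)) return false;
        undo();               // if this throws, the step stays unrecorded and can be retried
        applied.Add(stepId);
        return true;
    }
}

public static class CompensationRunner
{
    // Attempts every undo step; returns the IDs that failed so that a later
    // pass can resume from the point of failure.
    public static List<string> RunPass(CompensationLog log,
        IEnumerable<KeyValuePair<string, Action>> undoSteps)
    {
        var failed = new List<string>();
        foreach (var step in undoSteps)
        {
            try
            {
                log.ApplyOnce(step.Key, step.Value);
            }
            catch (Exception)
            {
                failed.Add(step.Key);
            }
        }
        return failed;
    }
}
```

Calling `RunPass` again with the same log repeats only the steps that have not yet succeeded.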

In some cases it may not be possible to recover from a step that has failed except through manual intervention. In these situations the system should raise an alert and provide as much information as possible about the reason for the failure.

在某些情况下,除非通过人工干预,否则可能无法从失败的步骤中恢复过来。在这些情况下,系统应该发出警报,并提供尽可能多的关于故障原因的信息。

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • It might not be easy to determine when a step in an operation that implements eventual consistency has failed. A step might not fail immediately, but instead it could block. It may be necessary to implement some form of time-out mechanism. 确定实现最终一致性的操作步骤何时失败可能并不容易。步骤可能不会立即失败,但可能会阻塞。可能需要实现某种形式的超时机制
  • Compensation logic is not easily generalized. A compensating transaction is application-specific; it relies on the application having sufficient information to be able to undo the effects of each step in a failed operation. 补偿逻辑不容易推广。补偿事务是特定于应用程序的; 它依赖于应用程序具有足够的信息,以便能够撤消失败操作中每个步骤的影响
  • You should define the steps in a compensating transaction as idempotent commands. This enables the steps to be repeated if the compensating transaction itself fails. 应该将补偿事务中的步骤定义为幂等命令。这使得在补偿事务本身失败时可以重复这些步骤
  • The infrastructure that handles the steps in the original operation, and the compensating transaction, must be resilient. It must not lose the information required to compensate for a failing step, and it must be able to reliably monitor the progress of the compensation logic. 处理原始操作中的步骤和补偿事务的基础结构必须具有弹性。它不能丢失补偿失败步骤所需的信息,它必须能够可靠地监视补偿逻辑的进展
  • A compensating transaction does not necessarily return the data in the system to the state it was in at the start of the original operation. Instead, it compensates for the work performed by the steps that completed successfully before the operation failed. 补偿事务不一定将系统中的数据返回到原始操作开始时的状态。相反,它补偿了在操作失败之前成功完成的步骤所执行的工作
  • The order of the steps in the compensating transaction does not necessarily have to be the mirror opposite of the steps in the original operation. For example, one data store may be more sensitive to inconsistencies than another, and so the steps in the compensating transaction that undo the changes to this store should occur first. 补偿事务中步骤的顺序不一定要与原始操作中的步骤相反。例如,一个数据存储区可能比另一个数据存储区对不一致更敏感,因此补偿事务中撤消对该存储区的更改的步骤应该首先发生
  • Placing a short-term timeout-based lock on each resource that is required to complete an operation, and obtaining these resources in advance, can help increase the likelihood that the overall activity will succeed. The work should be performed only after all the resources have been acquired. All actions must be finalized before the locks expire. 对完成一项操作所需的每个资源设置基于短期超时的锁定,并提前获得这些资源,有助于增加整个活动成功的可能性。只有在获得所有资源之后才能执行工作。所有操作必须在锁过期之前完成
  • Consider using retry logic that is more forgiving than usual to minimize failures that trigger a compensating transaction. If a step in an operation that implements eventual consistency fails, try handling the failure as a transient exception and repeat the step. Only abort the operation and initiate a compensating transaction if a step fails repeatedly or irrecoverably. 考虑使用比通常更宽容的重试逻辑,以尽量减少触发补偿事务的失败。如果实现最终一致性的操作中的某个步骤失败,尝试将失败作为暂时异常处理,然后重复该步骤。只有在步骤重复或不可恢复地失败时才中止操作并启动补偿事务
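The last point in the list above, retrying a step before giving up and compensating, might be sketched as follows. `ForgivingRetry.TryStep` is a hypothetical helper. It treats every exception as transient, which is an assumption made for brevity; a real implementation would abort at once on failures known to be irrecoverable.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the attempt count and delay are chosen by the caller.
public static class ForgivingRetry
{
    // Tries the step up to maxAttempts times, pausing between attempts.
    // Returns true on success; false means the caller should start compensating.
    public static bool TryStep(Action step, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                step();
                return true;
            }
            catch (Exception)
            {
                // Treat the failure as transient unless this was the last attempt.
                if (attempt < maxAttempts) Thread.Sleep(delay);
            }
        }
        return false;
    }
}
```

Only when `TryStep` returns false would the operation be aborted and the compensating transaction started.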

Note

注意

Many of the challenges and issues of implementing a compensating transaction are the same as those concerned with implementing eventual consistency. See the section Considerations for Implementing Eventual Consistency in the Data Consistency Primer for more information.

实施补偿交易的许多挑战和问题与实施最终一致性的挑战和问题相同。有关更多信息,请参见数据一致性入门中实现最终一致性的注意事项一节。

When to Use this Pattern 何时使用此模式

Use this pattern only for operations that must be undone if they fail. If possible, design solutions to avoid the complexity of requiring compensating transactions (for more information, see the Data Consistency Primer).

此模式仅用于在操作失败时必须撤消的操作。如果可能,设计解决方案以避免需要补偿事务的复杂性(有关更多信息,请参见数据一致性入门)。

Example 例子

A travel website enables customers to book itineraries. A single itinerary may comprise a series of flights and hotels. A customer traveling from Seattle to London and then on to Paris could perform the following steps when creating an itinerary:

旅游网站使客户能够预订行程。一个单一的行程可能包括一系列的航班和酒店。从西雅图到伦敦再到巴黎的客户在制定行程时可以执行以下步骤:

  1. Book a seat on flight F1 from Seattle to London. 订一张从西雅图飞往伦敦的 F1航班的机票
  2. Book a seat on flight F2 from London to Paris. 订一张从伦敦飞往巴黎的 F2航班的机票
  3. Book a seat on flight F3 from Paris to Seattle. 订一张从巴黎飞往西雅图的 F3航班的机票
  4. Reserve a room at hotel H1 in London. 在伦敦 H1酒店预订一个房间
  5. Reserve a room at hotel H2 in Paris. 在巴黎 H2酒店预订一个房间

These steps constitute an eventually consistent operation, although each step is essentially a separate atomic action in its own right. Therefore, as well as performing these steps, the system must also record the counter operations necessary to undo each step in case the customer decides to cancel the itinerary. The steps necessary to perform the counter operations can then run as a compensating transaction if necessary.

这些步骤构成了最终一致的操作,尽管每个步骤本质上都是独立的原子操作。因此,在执行这些步骤的同时,系统还必须记录撤消每个步骤所需的反向操作,以防客户决定取消行程。然后,执行反向操作所需的步骤可以在必要时作为补偿事务运行。
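One way to picture the recording of counter operations is the following C# sketch. It is an illustration only: the `CompensationLog` class is hypothetical, it keeps the counter operations in memory (a real system would have to store them durably), and it replays them in reverse order, which is just one possible choice.

```csharp
using System;
using System.Collections.Generic;

public sealed class CompensationLog
{
  // Counter operations for the steps that have completed, most recent first.
  // Assumption: held in memory; a resilient system needs durable storage.
  private readonly Stack<Action> counterOperations = new Stack<Action>();

  // Performs a step and, only if it succeeds, records how to undo it.
  public void Run(Action step, Action counterOperation)
  {
    step();
    this.counterOperations.Push(counterOperation);
  }

  // Runs the recorded counter operations as the compensating transaction.
  // Returns the number of counter operations that were run.
  public int Compensate()
  {
    int count = 0;
    while (this.counterOperations.Count > 0)
    {
      this.counterOperations.Pop()();
      count++;
    }

    return count;
  }
}
```

In the itinerary scenario, each booking would be passed to `Run` together with its cancellation logic, and `Compensate` would be called if the customer cancels or a later step fails irrecoverably.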

Notice that the steps in the compensating transaction might not be the exact opposite of the original steps, and the logic in each step in the compensating transaction must take into account any business-specific rules. For example, “unbooking” a seat on a flight might not entitle the customer to a complete refund of any money paid.

注意,补偿事务中的步骤可能与原始步骤不完全相反,补偿事务中每个步骤中的逻辑必须考虑到任何特定于业务的规则。例如,“取消预订”航班上的一个座位可能不会使客户有权获得任何已付款项的全额退款。
Figure 1 - Generating a compensating transaction to undo a long-running transaction to book a travel itinerary

图1-生成一个补偿事务来撤消一个长时间运行的事务来预订旅行路线

Note

注意

It may be possible for the steps in the compensating transaction to be performed in parallel, depending on how you have designed the compensating logic for each step.

补偿事务中的步骤可以并行执行,这取决于您如何为每个步骤设计补偿逻辑。

In many business solutions, failure of a single step does not always necessitate rolling the system back by using a compensating transaction. For example, if—after having booked flights F1, F2, and F3 in the travel website scenario—the customer is unable to reserve a room at hotel H1, it is preferable to offer the customer a room at a different hotel in the same city rather than cancelling the flights. The customer may still elect to cancel (in which case the compensating transaction runs and undoes the bookings made on flights F1, F2, and F3), but this decision should be made by the customer rather than by the system.

在许多业务解决方案中,单个步骤的失败并不总是需要通过使用补偿事务来回滚系统。例如,如果在旅游网站场景中预订了 F1、 F2和 F3航班后,客户无法在 H1酒店预订房间,最好是在同一城市的另一家酒店为客户提供一个房间,而不是取消航班。客户仍然可以选择取消(在这种情况下,补偿事务运行并取消在 F1、 F2和 F3航班上的预订) ,但是这个决定应该由客户而不是系统做出。

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may also be relevant when implementing this pattern:

下列模式和指南在实现此模式时也可能有用:

  • Data Consistency Primer 数据一致性入门. The Compensating Transaction pattern is frequently used to undo operations that implement the eventual consistency model. This primer provides more information on the benefits and tradeoffs of eventual consistency. 补偿事务模式经常用于撤销实现最终一致性模型的操作。这本入门书提供了更多关于最终一致性利弊的信息
  • Scheduler-Agent-Supervisor Pattern 调度器-代理-管理器模式. This pattern describes how to implement resilient systems that perform business operations that utilize distributed services and resources. In some circumstances, it may be necessary to undo the work performed by an operation by using a compensating transaction. 此模式描述如何实现执行利用分布式服务和资源的业务操作的弹性系统。在某些情况下,可能需要通过使用补偿事务撤消操作执行的工作
  • Retry Pattern 重试模式. Compensating transactions can be expensive to perform, and it may be possible to minimize their use by implementing an effective policy of retrying failing operations by following the Retry pattern. 补偿事务的执行成本可能很高,通过遵循 Retry 模式实现重试失败操作的有效策略,可以最大限度地减少事务的使用

Competing Consumers Pattern 消费者竞争模式

  • Article文章
  • 08/26/2015 2015年8月26日
  • 10 minutes to read 阅读时间约10分钟


Enable multiple concurrent consumers to process messages received on the same messaging channel. This pattern enables a system to process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload.

允许多个并发使用者处理在同一消息传递通道上接收的消息。此模式使系统能够并发处理多个消息,以优化吞吐量,提高可伸缩性和可用性,并平衡工作负载。

Context and Problem 背景与问题

An application running in the cloud may be expected to handle a large number of requests. Rather than process each request synchronously, a common technique is for the application to pass them through a messaging system to another service (a consumer service) that handles them asynchronously. This strategy helps to ensure that the business logic in the application is not blocked while the requests are being processed.

在云中运行的应用程序可能需要处理大量请求。通常的技术不是同步处理每个请求,而是让应用程序通过消息传递系统将请求传递给异步处理请求的另一个服务(使用者服务)。此策略有助于确保在处理请求时不阻塞应用程序中的业务逻辑。

The number of requests could vary significantly over time for many reasons. A sudden burst in user activity or aggregated requests coming from multiple tenants may cause unpredictable workload. At peak hours a system might need to process many hundreds of requests per second, while at other times the number could be very small. Additionally, the nature of the work performed to handle these requests might be highly variable. Using a single instance of the consumer service might cause that instance to become flooded with requests or the messaging system may be overloaded by an influx of messages coming from the application. To handle this fluctuating workload, the system can run multiple instances of the consumer service. However these consumers must be coordinated to ensure that each message is only delivered to a single consumer. The workload also needs to be load balanced across consumers to prevent an instance from becoming a bottleneck.

由于许多原因,请求的数量可能会随着时间的推移而发生很大的变化。用户活动的突然爆发或来自多个租户的聚合请求可能会导致不可预测的工作负载。在高峰时期,系统可能需要每秒处理数百个请求,而在其他时候,这个数字可能非常小。此外,为处理这些请求而执行的工作的性质可能是高度可变的。使用使用者服务的单个实例可能会导致该实例被请求淹没,或者来自应用程序的大量消息可能会使消息传递系统超载。为了处理这种波动的工作负载,系统可以运行使用者服务的多个实例。但是,必须对这些使用者进行协调,以确保每个消息只传递给单个使用者。工作负载还需要在使用者之间进行负载平衡,以防止实例成为瓶颈。

Solution 解决方案

Use a message queue to implement the communication channel between the application and the instances of the consumer service. The application posts requests in the form of messages to the queue, and the consumer service instances receive messages from the queue and process them. This approach enables the same pool of consumer service instances to handle messages from any instance of the application. Figure 1 illustrates this architecture.

使用消息队列实现应用程序与使用者服务实例之间的通信通道。应用程序以消息的形式将请求发送到队列,使用者服务实例从队列接收消息并处理它们。此方法使相同的使用者服务实例池能够处理来自应用程序任何实例的消息。图1说明了这个架构。


Figure 1 - Using a message queue to distribute work to instances of a service

图1-使用消息队列将工作分配给服务的实例
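The architecture in Figure 1 can be imitated inside a single process to show the core idea. The following C# sketch is an assumption-laden stand-in, not the guidance's own code: the `CompetingConsumersDemo` class is hypothetical, and an in-memory `BlockingCollection<int>` plays the role of the message queue, so it says nothing about durability or at-least-once delivery.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class CompetingConsumersDemo
{
  // Posts messageCount messages to a shared queue and lets consumerCount
  // consumers compete for them. Returns how many messages each consumer handled.
  public static int[] Run(int messageCount, int consumerCount)
  {
    var handled = new int[consumerCount];

    using (var queue = new BlockingCollection<int>())
    {
      // Each consumer takes messages from the same queue; the queue hands
      // each message to exactly one of them.
      Task[] consumers = Enumerable.Range(0, consumerCount)
        .Select(id => Task.Run(() =>
        {
          foreach (int message in queue.GetConsumingEnumerable())
          {
            handled[id]++; // Only consumer 'id' writes to this slot.
          }
        }))
        .ToArray();

      // The application (producer) posts requests as messages.
      for (int i = 0; i < messageCount; i++)
      {
        queue.Add(i);
      }

      queue.CompleteAdding(); // No more messages: consumers drain and exit.
      Task.WaitAll(consumers);
    }

    return handled;
  }
}
```

The per-consumer counts vary from run to run, but they always add up to the number of messages posted, because no message is handed to two consumers.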

This solution offers the following benefits:

这种解决办法有以下好处:

  • It enables an inherently load-leveled system that can handle wide variations in the volume of requests sent by application instances. The queue acts as a buffer between the application instances and the consumer service instances, which can help to minimize the impact on availability and responsiveness for both the application and the service instances (as described by the Queue-based Load Leveling pattern). Handling a message that requires some long-running processing to be performed does not prevent other messages from being handled concurrently by other instances of the consumer service. 它支持固有的负载级别系统,可以处理应用程序实例发送的请求数量的巨大变化。队列充当应用程序实例和使用者服务实例之间的缓冲区,这有助于最小化对应用程序和服务实例的可用性和响应性的影响(如基于队列的负载均衡模式所述)。处理需要执行某些长时间运行的处理的消息不会阻止使用者服务的其他实例并发处理其他消息
  • It improves reliability. If a producer communicates directly with a consumer instead of using this pattern, but does not monitor the consumer, there is a high probability that messages could be lost or fail to be processed if the consumer fails. In this pattern messages are not sent to a specific service instance, a failed service instance will not block a producer, and messages can be processed by any working service instance. 它提高了可靠性。如果生产者直接与使用者通信,而不使用此模式,但不监视使用者,那么如果使用者失败,消息很可能丢失或无法处理。在此模式中,消息不会发送到特定的服务实例,失败的服务实例不会阻塞生成器,并且消息可由任何工作服务实例处理
  • It does not require complex coordination between the consumers, or between the producer and the consumer instances. The message queue ensures that each message is delivered at least once. 它不需要使用者之间或生产者与使用者实例之间的复杂协调。消息队列确保每条消息至少传递一次
  • It is scalable. The system can dynamically increase or decrease the number of instances of the consumer service as the volume of messages fluctuates. 它是可扩展的。随着消息量的波动,系统可以动态地增加或减少消费者服务的数量
  • It can improve resiliency if the message queue provides transactional read operations. If a consumer service instance reads and processes the message as part of a transactional operation, and if this consumer service instance subsequently fails, this pattern can ensure that the message will be returned to the queue to be picked up and handled by another instance of the consumer service. 如果消息队列提供事务性读操作,则可以提高弹性。如果使用者服务实例作为事务操作的一部分读取和处理消息,并且随后该使用者服务实例失败,则此模式可以确保将消息返回到队列,由使用者服务的另一个实例拾取和处理

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • Message Ordering. The order in which consumer service instances receive messages is not guaranteed, and does not necessarily reflect the order in which the messages were created. Design the system to ensure that message processing is idempotent because this will help to eliminate any dependency on the order in which messages are handled. For more information about idempotency, see Idempotency Patterns on Jonathon Oliver’s blog.

    消息顺序。不保证使用者服务实例接收消息的顺序,该顺序也不一定反映消息的创建顺序。设计系统以确保消息处理是幂等的,因为这将有助于消除对消息处理顺序的任何依赖。关于幂等性的更多信息,请参阅 Jonathon Oliver 博客上的幂等性模式。

    Note

    注意

    Microsoft Azure Service Bus Queues can implement guaranteed first-in-first-out ordering of messages by using message sessions. For more information, see Messaging Patterns Using Sessions on MSDN.

    MicrosoftAzure 服务总线队列可以通过使用消息会话实现有保证的消息先进先出排序。有关更多信息,请参见在 MSDN 上使用会话的消息传递模式。

  • Designing Services for Resiliency. If the system is designed to detect and restart failed service instances, it may be necessary to implement the processing performed by the service instances as idempotent operations to minimize the effects of a single message being retrieved and processed more than once.

    为弹性设计服务。如果系统设计用于检测和重新启动失败的服务实例,则可能需要将服务实例执行的处理作为幂等操作来实现,以尽量减少检索和处理多次的单个消息的影响。

  • Detecting Poison Messages. A malformed message, or a task that requires access to resources that are not available, may cause a service instance to fail. The system should prevent such messages being returned to the queue, and instead capture and store the details of these messages elsewhere so that they can be analyzed if necessary.

    检测有毒信息。格式不正确的消息或需要访问不可用资源的任务可能导致服务实例失败。系统应该防止这样的消息被返回到队列中,而应该在其他地方捕获和存储这些消息的细节,以便在必要时可以对它们进行分析。

  • Handling Results. The service instance handling a message is fully decoupled from the application logic that generates the message, and they may not be able to communicate directly. If the service instance generates results that must be passed back to the application logic, this information must be stored in a location that is accessible to both and the system must provide some indication of when processing has completed to prevent the application logic from retrieving incomplete data.

    处理结果。处理消息的服务实例与生成消息的应用程序逻辑完全解耦,它们可能无法直接通信。如果服务实例生成的结果必须传递回应用程序逻辑,那么这些信息必须存储在双方都可以访问的位置,并且系统必须提供处理完成时的某种指示,以防止应用程序逻辑检索到不完整的数据。

    Note

    注意

    If you are using Azure, a worker process may be able to pass results back to the application logic by using a dedicated message reply queue. The application logic must be able to correlate these results with the original message. This scenario is described in more detail in the Asynchronous Messaging Primer.

    如果您使用 Azure,工作进程可以通过使用专用的消息应答队列将结果传递回应用程序逻辑。应用程序逻辑必须能够将这些结果与原始消息关联起来。这个场景在异步消息入门中有更详细的描述。

  • Scaling the Messaging System. In a large-scale solution, a single message queue could be overwhelmed by the number of messages and become a bottleneck in the system. In this situation, consider partitioning the messaging system to direct messages from specific producers to a particular queue, or use load balancing to distribute messages across multiple message queues.

    缩放消息系统。在大规模解决方案中,单个消息队列可能会被消息数量淹没,成为系统中的瓶颈。在这种情况下,可以考虑对消息传递系统进行分区,以将特定生产者的消息定向到特定队列,或者使用负载平衡将消息分布到多个消息队列。

  • Ensuring Reliability of the Messaging System. A reliable messaging system is needed to guarantee that, once the application enqueues a message, it will not be lost. This is essential for ensuring that all messages are delivered at least once.

    确保消息系统的可靠性。需要一个可靠的消息传递系统来保证,一旦应用程序将消息排队,消息就不会丢失。这对于确保所有消息至少传递一次至关重要。
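Several of the points above rely on idempotent message processing. A minimal C# sketch of one common approach, tracking the IDs of messages that have already been handled, is shown below. The `IdempotentHandler` class is hypothetical, and the in-memory dictionary is an assumption made for brevity; consumers running in separate instances would need a durable store that they all share.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IdempotentHandler
{
  // IDs of messages that have already been processed. Assumption: kept in
  // memory here; a real service would need a durable, shared store.
  private readonly ConcurrentDictionary<string, bool> processed =
    new ConcurrentDictionary<string, bool>();

  private readonly Action<string> process;

  public IdempotentHandler(Action<string> process)
  {
    this.process = process;
  }

  // Returns true if the message was processed, or false if it was a
  // duplicate delivery that was ignored.
  public bool Handle(string messageId, string body)
  {
    // TryAdd succeeds for only one caller per message ID.
    if (!this.processed.TryAdd(messageId, true))
    {
      return false;
    }

    try
    {
      this.process(body);
      return true;
    }
    catch
    {
      // Processing failed: forget the ID so that a redelivery can retry.
      this.processed.TryRemove(messageId, out _);
      throw;
    }
  }
}
```

With a handler like this, a message that the queue delivers twice has the same effect as one delivered once, which is what at-least-once delivery requires of the consumer.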

When to Use this Pattern 何时使用此模式

Use this pattern when:

在以下情况下使用这种模式:

  • The workload for an application is divided into tasks that can run asynchronously. 应用程序的工作负载分为可以异步运行的任务
  • Tasks are independent and can run in parallel. 任务是独立的,可以并行运行
  • The volume of work is highly variable, requiring a scalable solution. 工作量是高度可变的,需要一个可伸缩的解决方案
  • The solution must provide high availability, and must be resilient if the processing for a task fails. 解决方案必须提供高可用性,并且在任务处理失败时必须具有弹性

This pattern may not be suitable when:

在下列情况下,这种模式可能不适用:

  • It is not easy to separate the application workload into discrete tasks, or there is a high degree of dependence between tasks.

    将应用程序工作负载分离为离散的任务并不容易,或者任务之间存在高度的依赖性。

  • Tasks must be performed synchronously, and the application logic must wait for a task to complete before continuing.

    任务必须同步执行,应用程序逻辑必须等待任务完成后才能继续。

  • Tasks must be performed in a specific sequence.

    任务必须按照特定的顺序执行。

    Note

    注意

    Some messaging systems support sessions that enable a producer to group messages together and ensure that they are all handled by the same consumer. This mechanism can be used with prioritized messages (if they are supported) to implement a form of message ordering that delivers messages in sequence from a producer to a single consumer.

    一些消息传递系统支持使生产者能够将消息分组在一起并确保它们都由同一个使用者处理的会话。这种机制可以与带优先级的消息(如果支持的话)一起使用,以实现一种消息排序形式,该形式按顺序将消息从生产者传递到单个消费者。

Example 例子

Azure provides storage queues and Service Bus queues that can act as a suitable mechanism for implementing this pattern. The application logic can post messages to a queue, and consumers implemented as tasks in one or more roles can retrieve messages from this queue and process them. For resiliency, a Service Bus queue enables a consumer to use PeekLock mode when it retrieves a message from the queue. This mode does not actually remove the message, but simply hides it from other consumers. The original consumer can delete the message when it has finished processing it. If the consumer should fail, the peek lock will time out and the message will become visible again, allowing another consumer to retrieve it.

Azure 提供了存储队列和服务总线队列,它们可以作为实现此模式的合适机制。应用程序逻辑可以将消息发送到队列,作为一个或多个角色中的任务实现的使用者可以从该队列中检索消息并处理它们。对于弹性,服务总线队列允许使用者在从队列检索消息时使用 PeekLock 模式。此模式实际上并不删除消息,而只是对其他使用者隐藏消息。原始使用者可以在处理完消息后删除该消息。如果使用者失败,则查看锁将超时,消息将再次可见,从而允许其他使用者检索它。
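The peek-lock behavior described above can be modeled in a few lines of C#. This sketch is not the Service Bus implementation: the `PeekLockQueue` class is hypothetical, it is not thread-safe, it uses the message's ID as its lock token, it gives no ordering guarantee, and it takes the current time from a caller-supplied function so that lock expiry is easy to observe.

```csharp
using System;
using System.Collections.Generic;

public sealed class PeekLockQueue
{
  private sealed class Entry
  {
    public string Body;
    public DateTime LockedUntil; // The message is hidden until this time.
  }

  private readonly Dictionary<Guid, Entry> entries = new Dictionary<Guid, Entry>();
  private readonly TimeSpan lockDuration;
  private readonly Func<DateTime> now;

  public PeekLockQueue(TimeSpan lockDuration, Func<DateTime> now)
  {
    this.lockDuration = lockDuration;
    this.now = now;
  }

  public void Send(string body)
  {
    this.entries[Guid.NewGuid()] =
      new Entry { Body = body, LockedUntil = DateTime.MinValue };
  }

  // Locks and returns a visible message without removing it, or returns
  // false if no message is currently visible.
  public bool TryReceive(out Guid lockToken, out string body)
  {
    foreach (var pair in this.entries)
    {
      if (pair.Value.LockedUntil <= this.now())
      {
        pair.Value.LockedUntil = this.now() + this.lockDuration;
        lockToken = pair.Key;
        body = pair.Value.Body;
        return true;
      }
    }

    lockToken = Guid.Empty;
    body = null;
    return false;
  }

  // Called by the consumer when it has finished processing the message.
  public bool Complete(Guid lockToken)
  {
    return this.entries.Remove(lockToken);
  }
}
```

A consumer that fails simply never calls `Complete`; once the lock duration has passed, the next call to `TryReceive` returns the same message to another consumer.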

Note

注意

For detailed information on using Azure Service Bus queues, see Service Bus Queues, Topics, and Subscriptions on MSDN. For information on using Azure storage queues, see How to use the Queue Storage Service on MSDN.

有关使用 Azure 服务总线队列的详细信息,请参阅 MSDN 上的服务总线队列、主题和订阅。有关使用 Azure 存储队列的信息,请参见如何在 MSDN 上使用队列存储服务。

The following code, from the QueueManager class in the CompetingConsumers solution of the examples available for download for this guidance, shows how you can create a queue and a QueueClient instance in the Start event handler in a web or worker role.

下面的代码显示了可供下载的本指南的示例解决方案中的 QueueManager 类,它显示了如何通过在 Web 或 worker 角色的 Start 事件处理程序中使用 QueueClient 实例来创建队列。

C#

private string queueName = ...;
private string connectionString = ...;
...

public async Task Start()
{
  // Check if the queue already exists.
  var manager = NamespaceManager.CreateFromConnectionString(this.connectionString);
  if (!manager.QueueExists(this.queueName))
  {
    var queueDescription = new QueueDescription(this.queueName);

    // Set the maximum delivery count for messages in the queue. A message 
    // is automatically dead-lettered after this number of deliveries. The
    // default value for dead letter count is 10.
    queueDescription.MaxDeliveryCount = 3;

    await manager.CreateQueueAsync(queueDescription);
  }
  ...

  // Create the queue client. By default the PeekLock receive mode is used.
  this.client = QueueClient.CreateFromConnectionString(
    this.connectionString, this.queueName);
}

The next code snippet shows how an application can create and send a batch of messages to the queue.

下一个代码片段显示应用程序如何创建并向队列发送一批消息。

C#

public async Task SendMessagesAsync()
{
  // Simulate sending a batch of messages to the queue.
  var messages = new List<BrokeredMessage>();

  for (int i = 0; i < 10; i++)
  {
    var message = new BrokeredMessage() { MessageId = Guid.NewGuid().ToString() };
    messages.Add(message);
  }
  await this.client.SendBatchAsync(messages);
}

The following code shows how a consumer service instance can receive messages from the queue by following an event-driven approach. The processMessageTask parameter to the ReceiveMessages method is a delegate that references the code to run when a message is received. This code is run asynchronously.

下面的代码显示了使用者服务实例如何通过遵循事件驱动方法从队列接收消息。ReceiveMessages 方法的 processMessageTask 参数是一个委托,它引用在收到消息时运行的代码。此代码异步运行。

C#

private ManualResetEvent pauseProcessingEvent;
...

public void ReceiveMessages(Func<BrokeredMessage, Task> processMessageTask)
{
  // Set up the options for the message pump.
  var options = new OnMessageOptions();

  // When AutoComplete is disabled it is necessary to manually
  // complete or abandon the messages and handle any errors.
  options.AutoComplete = false;
  options.MaxConcurrentCalls = 10;
  options.ExceptionReceived += this.OptionsOnExceptionReceived;

  // Use of the Service Bus OnMessage message pump.
  // The OnMessage method must be called once, otherwise an exception will occur.
  this.client.OnMessageAsync(
    async (msg) =>
    {
      // Will block the current thread if Stop is called.
      this.pauseProcessingEvent.WaitOne();

      // Execute processing task here.
      await processMessageTask(msg);
    },
    options);
}
...

private void OptionsOnExceptionReceived(object sender,
  ExceptionReceivedEventArgs exceptionReceivedEventArgs)
{
  ...
}

Note that autoscaling features, such as those available in Azure, can be used to start and stop role instances as the queue length fluctuates. For more information, see Autoscaling Guidance. In addition, it is not necessary to maintain a one-to-one correspondence between role instances and worker processes—a single role instance can implement multiple worker processes. For more information, see Compute Resource Consolidation pattern.

请注意,可以使用自动缩放特性(比如 Azure 中提供的特性)在队列长度波动时启动和停止角色实例。有关更多信息,请参见自动缩放指南。此外,没有必要在角色实例和辅助进程之间保持一一对应的关系:一个角色实例可以实现多个辅助进程。有关更多信息,请参见计算资源合并模式。

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may be relevant when implementing this pattern:

在实现此模式时,下列模式和指南可能是相关的:

  • Asynchronous Messaging Primer 异步消息入门. Message queues are an inherently asynchronous communications mechanism. If a consumer service needs to send a reply to an application, it may be necessary to implement some form of response messaging. The Asynchronous Messaging Primer provides information on how to implement request/reply messaging by using message queues. 消息队列是一种固有的异步通信机制。如果使用者服务需要向应用程序发送应答,则可能需要实现某种形式的响应消息传递。异步消息入门提供了有关如何使用消息队列实现请求/应答消息传递的信息
  • Autoscaling Guidance 自动缩放导航. It may be possible to start and stop instances of a consumer service as the length of the queue to which applications post messages varies. Autoscaling can help to maintain throughput during times of peak processing. 随着应用程序向其发送消息的队列长度的变化,可以启动和停止使用者服务的实例。自动伸缩有助于在处理高峰期间保持吞吐量
  • Compute Resource Consolidation Pattern 计算资源合并模式. It may be possible to consolidate multiple instances of a consumer service into a single process to reduce costs and management overhead. The Compute Resource Consolidation pattern describes the benefits and tradeoffs of following this approach. 可以将一个使用者服务的多个实例合并到一个流程中,以减少成本和管理开销。计算资源合并模式描述了遵循这种方法的好处和利弊
  • Queue-based Load Leveling Pattern 基于队列的负载均衡模式. Introducing a message queue can add resiliency to the system, enabling service instances to handle widely varying volumes of requests from application instances. The message queue effectively acts as a buffer which levels the load. The Queue-based Load Leveling pattern describes this scenario in more detail. 引入消息队列可以增加系统的弹性,使服务实例能够处理来自应用程序实例的大量不同的请求。消息队列有效地充当了负载水平的缓冲区。基于队列的负载均衡模式更详细地描述了此方案

Compute Resource Consolidation Pattern 计算资源合并模式

  • Article文章
  • 08/26/2015 2015年8月26日
  • 13 minutes to read 阅读时间约13分钟


Consolidate multiple tasks or operations into a single computational unit. This pattern can increase compute resource utilization, and reduce the costs and management overhead associated with performing compute processing in cloud-hosted applications.

将多个任务或操作合并为一个计算单元。这种模式可以提高计算资源的利用率,并减少与在云托管应用程序中执行计算处理相关的成本和管理开销。

Context and Problem 背景与问题

A cloud application frequently implements a variety of operations. In some solutions it may make sense initially to follow the design principle of separation of concerns, and divide these operations into discrete computational units that are hosted and deployed individually (for example, as separate roles in a Microsoft Azure Cloud Service, separate Azure Web Sites, or separate Virtual Machines). However, although this strategy can help to simplify the logical design of the solution, deploying a large number of computational units as part of the same application can increase runtime hosting costs and make management of the system more complex.

云应用程序经常实现各种操作。在一些解决方案中,最初遵循关注点分离的设计原则,并将这些操作划分为单独托管和部署的离散计算单元(例如,在微软 Azure 云服务中作为单独的角色,单独的 Azure 网站,或单独的虚拟机)可能是有意义的。然而,尽管这种策略可以帮助简化解决方案的逻辑设计,但是作为同一应用程序的一部分部署大量计算单元可能会增加运行时托管成本,并使系统管理更加复杂。

As an example, Figure 1 shows the simplified structure of a cloud-hosted solution that is implemented using more than one computational unit. Each computational unit runs in its own virtual environment. Each function has been implemented as a separate task (labeled Task A through Task E) running in its own computational unit.

作为示例,图1显示了使用多个计算单元实现的云托管解决方案的简化结构。每个计算单元在其自己的虚拟环境中运行。每个函数都被实现为一个单独的任务(标签为 TaskA 到 TaskE) ,在它自己的计算单元中运行。


Figure 1 - Running tasks in a cloud environment by using a set of dedicated computational units

图1-通过使用一组专用的计算单元在云环境中运行任务

Each computational unit consumes chargeable resources, even when it is idle or lightly used. Therefore, this approach may not always be the most cost-effective solution.

每个计算单元消耗可收费的资源,即使它是空闲的或轻微使用。因此,这种方法可能并不总是最具成本效益的解决方案。

In Azure, this concern applies to roles in a Cloud Service, Web Sites, and Virtual Machines. These items execute in their own virtual environment. Running a collection of separate roles, web sites, or virtual machines that are designed to perform a set of well-defined operations, but that need to communicate and cooperate as part of a single solution, may be an inefficient use of resources.

在 Azure 中,这个问题适用于云服务、 Web 站点和虚拟机中的角色。这些项目在它们自己的虚拟环境中执行。运行一组独立的角色、网站或虚拟机,这些角色、网站或虚拟机被设计用于执行一组定义良好的操作,但是需要作为单个解决方案的一部分进行通信和协作,这可能是对资源的低效利用。

Solution 解决方案

To help reduce costs, increase utilization, improve communication speed, and ease the management effort it may be possible to consolidate multiple tasks or operations into a single computational unit.

为了帮助降低成本,提高利用率,提高通信速度,并简化管理工作,将多个任务或操作合并到一个单一的计算单元是可能的。

Tasks can be grouped according to a variety of criteria based on the features provided by the environment, and the costs associated with these features. A common approach is to look for tasks that have a similar profile concerning their scalability, lifetime, and processing requirements. Grouping these items together allows them to scale as a unit. The elasticity provided by many cloud environments enables additional instances of a computational unit to be started and stopped according to the workload. For example, Azure provides autoscaling that you can apply to roles in a Cloud Service, Web Sites, and Virtual Machines. For more information, see Autoscaling Guidance.

根据环境提供的特性以及与这些特性相关的成本,可以根据各种标准对任务进行分组。一种常见的方法是查找与其可伸缩性、生命周期和处理需求相似的任务。将这些项目组合在一起可以使它们作为一个单元进行扩展。许多云环境提供的灵活性允许根据工作负载启动和停止额外的计算单元实例。例如,Azure 提供了自动伸缩,您可以将其应用到云服务、 Web 站点和虚拟机中的角色。有关更多信息,请参见自动缩放指南。

As a counter example to show how scalability can be used to determine which operations should probably not be grouped together, consider the following two tasks:

作为一个反例,说明如何使用可伸缩性来确定哪些操作可能不应该组合在一起,请考虑以下两项任务:

  • Task 1 polls for infrequent, time-insensitive messages sent to a queue. Task 1轮询发送到队列的不常见、时间不敏感的消息
  • Task 2 handles high-volume bursts of network traffic. Task 2处理高容量的网络流量爆发

The second task requires elasticity that may involve starting and stopping a large number of instances of the computational unit. Applying the same scaling to the first task would simply result in more tasks listening for infrequent messages on the same queue, and is a waste of resources.

第二个任务需要弹性,可能涉及到启动和停止计算单元的大量数量。对第一个任务应用相同的扩展只会导致更多的任务侦听同一队列上不常见的消息,这是一种资源浪费。

In many cloud environments it is possible to specify the resources available to a computational unit in terms of the number of CPU cores, memory, disk space, and so on. Generally, the more resources specified, the greater the cost. For financial efficiency, it is important to maximize the amount of work an expensive computational unit performs, and not let it become inactive for an extended period.

在许多云环境中,可以根据 CPU 核心数量、内存、磁盘空间等指定可用于计算单元的资源。一般来说,指定的资源越多,成本就越高。为了提高财务效率,最大化昂贵的计算单元执行的工作量,并且不让它长时间处于不活跃状态是非常重要的。

If there are tasks that require a great deal of CPU power in short bursts, consider consolidating these into a single computational unit that provides the necessary power. However, it is important to balance this need to keep expensive resources busy against the contention that could occur if they are over-stressed. Long-running, compute-intensive tasks should probably not share the same computational unit, for example.

如果有一些任务在短时间内需要大量的 CPU 能量,那么考虑将这些任务合并到一个单独的计算单元中,以提供必要的能量。然而,在保持昂贵资源忙碌的需求与如果资源压力过大可能发生的争用之间进行平衡是非常重要的。例如,长时间运行的计算密集型任务可能不应该共享相同的计算单元。

Issues and Considerations 问题及考虑

Consider the following points when implementing this pattern:

在实现此模式时,请考虑以下几点:

  • Scalability and Elasticity. Many cloud solutions implement scalability and elasticity at the level of the computational unit by starting and stopping instances of units. Avoid grouping tasks that have conflicting scalability requirements in the same computational unit.

    可伸缩性和弹性。许多云解决方案通过启动和停止单元的实例,在计算单元级别实现可伸缩性和弹性。避免在同一计算单元中对具有冲突的可伸缩性需求的任务进行分组。

  • Lifetime. The cloud infrastructure may periodically recycle the virtual environment that hosts a computational unit. When executing many long-running tasks inside a computational unit, it may be necessary to configure the unit to prevent it from being recycled until these tasks have finished. Alternatively, design the tasks by using a check-pointing approach that enables them to stop cleanly, and continue at the point at which they were interrupted when the computational unit is restarted.

    生命周期。云基础设施可能会周期性地回收承载计算单元的虚拟环境。当在一个计算单元内执行许多长时间运行的任务时,可能有必要对该单元进行配置,以防止它在这些任务完成之前被回收。或者,通过使用检查点方法来设计任务,使任务能够干净地停止,并在计算单元重新启动时从中断的位置继续执行。

  • Release Cadence. If the implementation or configuration of a task changes frequently, it may be necessary to stop the computational unit hosting the updated code, reconfigure and redeploy the unit, and then restart it. This process will also require that all other tasks within the same computational unit are stopped, redeployed, and restarted.

    发布节奏。如果任务的实现或配置频繁更改,则可能需要停止承载更新代码的计算单元,重新配置和重新部署该单元,然后重新启动它。这个过程还需要停止、重新部署和重新启动同一计算单元内的所有其他任务。

  • Security. Tasks in the same computational unit may share the same security context and be able to access the same resources. There must be a high degree of trust between the tasks, and confidence that one task is not going to corrupt or adversely affect another. Additionally, increasing the number of tasks running in a computational unit may increase the attack surface of the computational unit; each task is only as secure as the one with the most vulnerabilities.

    安全性。同一计算单元中的任务可能共享相同的安全上下文,并能够访问相同的资源。任务之间必须有高度的信任,并确信一项任务不会破坏另一项任务或对其产生不利影响。此外,增加在计算单元中运行的任务数量可能会增加计算单元的攻击面;每个任务的安全程度只相当于其中漏洞最多的那个任务。

  • Fault Tolerance. If one task in a computational unit fails or behaves abnormally, it can affect the other tasks running within the same unit. For example, if one task fails to start correctly it may cause the entire startup logic for the computational unit to fail, and prevent other tasks in the same unit from running.

    容错。如果一个计算单元中的一个任务失败或表现异常,它可能会影响在同一个单元中运行的其他任务。例如,如果一个任务无法正确启动,它可能导致计算单元的整个启动逻辑失败,并阻止同一单元中的其他任务运行。

  • Contention. Avoid introducing contention between tasks that compete for resources in the same computational unit. Ideally, tasks that share the same computational unit should exhibit different resource utilization characteristics. For example, two compute-intensive tasks should probably not reside in the same computational unit, and neither should two tasks that consume large amounts of memory. However, mixing a compute intensive task with a task that requires a large amount of memory may be a viable combination.

    竞争。避免在争夺同一计算单元资源的任务之间引入争用。理想情况下,共享相同计算单元的任务应该表现出不同的资源利用特征。例如,两个计算密集型任务可能不应该驻留在同一个计算单元中,两个消耗大量内存的任务也不应该驻留在同一个计算单元中。然而,将计算密集型任务与需要大量内存的任务混合可能是一个可行的组合。

    Note

    注意

    You should consider consolidating compute resources only for a system that has been in production for a period of time so that operators and developers can monitor the system and create a heat map that identifies how each task utilizes differing resources. This map can be used to determine which tasks are good candidates for sharing compute resources.

    您应该考虑只为生产了一段时间的系统合并计算资源,这样操作员和开发人员就可以监视系统并创建一个热图,该热图标识每个任务如何利用不同的资源。此映射可用于确定哪些任务适合共享计算资源。

  • Complexity. Combining multiple tasks into a single computational unit adds complexity to the code in the unit, possibly making it more difficult to test, debug, and maintain.

    复杂性。将多个任务组合到一个单独的计算单元中会增加单元中代码的复杂性,可能会使测试、调试和维护变得更加困难。

  • Stable Logical Architecture. Design and implement the code in each task so that it should not need to change, even if the physical environment in which the task runs does change.

    稳定逻辑体系结构。设计并实现每个任务中的代码,使其不需要更改,即使任务运行所在的物理环境确实发生了更改。

  • Other Strategies. Consolidating compute resources is only one way to help reduce costs associated with running multiple tasks concurrently. It requires careful planning and monitoring to ensure that it remains an effective approach. Other strategies may be more appropriate, depending on the nature of the work being performed and the location of the users on whose behalf these tasks are running. For example, functional decomposition of the workload (as described by the Compute Partitioning Guidance) may be a better option.

    其他策略。整合计算资源只是帮助降低与同时运行多个任务相关的成本的一种方法。它需要认真的规划和监测,以确保它仍然是一个有效的办法。其他策略可能更合适,具体取决于所执行工作的性质以及代表其运行这些任务的用户的位置。例如,工作负载的功能分解(正如计算分区指南所描述的)可能是一个更好的选择。
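The check-pointing approach mentioned under Lifetime above could look something like the following C# sketch. It is only an illustration: the `CheckpointedTask` class and its delegate parameters are hypothetical, and where the checkpoint is actually stored (a table, a blob, a file) is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CheckpointedTask
{
  // Processes items starting at the saved checkpoint and stops cleanly when
  // cancellation is requested. Returns the index of the next unprocessed item.
  public static int Run(
    IReadOnlyList<string> items,
    int checkpoint,
    Action<string> processItem,
    Action<int> saveCheckpoint,
    CancellationToken cancellationToken)
  {
    int next = checkpoint;
    while (next < items.Count && !cancellationToken.IsCancellationRequested)
    {
      processItem(items[next]);
      next++;
      saveCheckpoint(next); // Record progress after each completed item.
    }

    return next;
  }
}
```

When the computational unit is restarted, the task is run again with the last saved checkpoint. If the unit stops between processing an item and saving the checkpoint, that item is processed again, so `processItem` should be idempotent.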

When to Use this Pattern 何时使用此模式

Use this pattern for tasks that are not cost effective if they run in their own computational units. If a task spends much of its time idle, running this task in a dedicated unit can be expensive.

对于在各自独立的计算单元中运行时不具成本效益的任务,请使用此模式。如果一个任务大部分时间处于空闲状态,那么在专用单元中运行这个任务可能会非常昂贵。

This pattern might not be suitable for tasks that perform critical fault-tolerant operations, or tasks that process highly sensitive or private data and require their own security context. These tasks should run in their own isolated environment, in a separate computational unit.

此模式可能不适合执行关键容错操作的任务,或处理高度敏感或私有数据并需要自己的安全上下文的任务。这些任务应该在它们自己的独立环境中,在一个单独的计算单元中运行。

Example 例子

When building a cloud service on Azure, it’s possible to consolidate the processing performed by multiple tasks into a single role. Typically this is a worker role that performs background or asynchronous processing tasks.

在 Azure 上构建云服务时,可以将多个任务执行的处理合并为单个角色。通常,这是一个执行后台或异步处理任务的辅助角色。

Note

注意

In some cases it may be possible to include background or asynchronous processing tasks in the web role. This technique can help to reduce costs and simplify deployment, although it can impact the scalability and responsiveness of the public-facing interface provided by the web role. The article Combining Multiple Azure Worker Roles into an Azure Web Role contains a detailed description of implementing background or asynchronous processing tasks in a web role.

在某些情况下,可以在 Web 角色中包含后台或异步处理任务。这种技术有助于降低成本和简化部署,尽管它会影响 Web 角色提供的面向公众的界面的可伸缩性和响应性。将多个 Azure 工作者角色组合成一个 Azure Web 角色的文章包含了在 Web 角色中实现后台或异步处理任务的详细描述。

The role is responsible for starting and stopping the tasks. When the Azure fabric controller loads a role, it raises the Start event for the role. You can override the OnStart method of the WebRole or WorkerRole class to handle this event, perhaps to initialize the data and other resources on which the tasks in this method depend.

角色负责启动和停止任务。当 Azure 结构控制器加载一个角色时,它会引发该角色的 Start 事件。您可以重写 WebRole 或 WorkerRole 类的 OnStart 方法来处理此事件,也许是为了初始化此方法中的任务所依赖的数据和其他资源。

When the OnStart method completes, the role can start responding to requests. You can find more information and guidance about using the OnStart and Run methods in a role in the Application Startup Processes section in the patterns & practices guide Moving Applications to the Cloud.

OnStart 方法完成后,角色可以开始响应请求。您可以在模式和实践指南将应用程序移动到云中的应用程序启动过程部分中找到更多关于使用 OnStart 和 Run 方法的信息和指导。

Note

注意

Keep the code in the OnStart method as concise as possible. Azure does not impose any limit on the time taken for this method to complete, but the role will not be able to start responding to network requests sent to it until this method completes.

使 OnStart 方法中的代码尽可能简洁。Azure 没有对此方法完成所需的时间施加任何限制,但是在此方法完成之前,角色将无法开始响应发送给它的网络请求。

When the OnStart method has finished, the role executes the Run method. At this point, the fabric controller can start sending requests to the role.

OnStart 方法完成后,角色将执行 Run 方法。此时,结构控制器可以开始向角色发送请求。

Place the code that actually creates the tasks in the Run method. Note that the Run method effectively defines the lifetime of the role instance. When this method completes, the fabric controller will arrange for the role to be shut down.

在 Run 方法中放置实际创建任务的代码。注意,Run 方法有效地定义了角色实例的生存期。当此方法完成时,结构控制器将安排关闭角色。

When a role shuts down or is recycled, the fabric controller prevents any more incoming requests being received from the load balancer and raises the Stop event. You can capture this event by overriding the OnStop method of the role and perform any tidying up required before the role terminates.

当一个角色关闭或被回收时,结构控制器阻止从负载均衡器接收任何更多的传入请求,并引发 Stop 事件。可以通过重写角色的 OnStop 方法捕获此事件,并在角色终止之前执行所需的任何整理。

Note

注意

Any actions performed in the OnStop method must be completed within five minutes (or 30 seconds if you are using the Azure emulator on a local computer); otherwise the Azure fabric controller assumes that the role has stalled and will force it to stop.

在 OnStop 方法中执行的任何操作都必须在5分钟内完成(如果你在本地计算机上使用 Azure 模拟器,则需要在30秒内完成);否则 Azure 结构控制器会认为角色已停滞并强制其停止。

Figure 2 illustrates the lifecycle of a role, and the tasks and resources that it hosts. The tasks are started by the Run method, which then waits for the tasks to complete. The tasks themselves, which implement the business logic of the cloud service, can respond to messages posted to the role through the Azure load balancer.

图2说明了一个角色的生命周期,以及它所承载的任务和资源。任务由 Run 方法启动,然后等待任务完成。实现云服务业务逻辑的任务本身可以通过 Azure 负载平衡器响应发送到角色的消息。

Figure 2 - The lifecycle of tasks and resources in a role in an Azure cloud service

图2-Azure 云服务中角色的任务和资源的生命周期

The WorkerRole.cs file in the ComputeResourceConsolidation.Worker project shows an example of how you might implement this pattern in an Azure cloud service.

ComputeResourceConsolidation.Worker 项目中的 WorkerRole.cs 文件展示了如何在 Azure 云服务中实现此模式的示例。

Note

注意

The ComputeResourceConsolidation.Worker project is part of the ComputeResourceConsolidation solution that is available for download with this guidance.

ComputeResourceConsolidation.Worker 项目是可通过本指南下载的 ComputeResourceConsolidation 解决方案的一部分。

In the worker role, code that runs when the role is initialized creates the required cancellation token and a list of tasks to run.

在辅助角色中,初始化角色时运行的代码将创建所需的取消令牌和要运行的任务列表。

C#

public class WorkerRole : RoleEntryPoint
{
  // The cancellation token source used to cooperatively cancel running tasks.
  private readonly CancellationTokenSource cts = new CancellationTokenSource();

  // List of tasks running on the role instance.
  private readonly List<Task> tasks = new List<Task>();

  // List of worker tasks to run on this role.
  private readonly List<Func<CancellationToken, Task>> workerTasks
                          = new List<Func<CancellationToken, Task>>
    {
      MyWorkerTask1,
      MyWorkerTask2
    };

  ...
}

The MyWorkerTask1 and the MyWorkerTask2 methods are provided to illustrate how to perform different tasks within the same worker role. The following code shows MyWorkerTask1. This is a simple task that sleeps for 30 seconds and then outputs a trace message. It repeats this process indefinitely until the task is cancelled. The code in MyWorkerTask2 is very similar.

提供 MyWorkerTask1和 MyWorkerTask2方法是为了说明如何在同一个 worker 角色中执行不同的任务。下面的代码显示 MyWorkerTask1。这是一个简单的任务,休眠30秒,然后输出一条跟踪消息。它无限期地重复这个过程,直到任务被取消。MyWorkerTask2中的代码非常相似。

C#

// A sample worker role task.
private static async Task MyWorkerTask1(CancellationToken ct)
{
  // Fixed interval to wake up and check for work and/or do work.
  var interval = TimeSpan.FromSeconds(30);

  try
  {
    while (!ct.IsCancellationRequested)
    {
      // Wake up and do some background processing if not canceled.
      // TASK PROCESSING CODE HERE
      Trace.TraceInformation("Doing Worker Task 1 Work");

      // Go back to sleep for a period of time unless asked to cancel.
      // Task.Delay will throw an OperationCanceledException when canceled.
      await Task.Delay(interval, ct);
    }
  }
  catch (OperationCanceledException)
  {
    // Expect this exception to be thrown in normal circumstances or check
    // the cancellation token. If the role instances are shutting down, a
    // cancellation request will be signaled.
    Trace.TraceInformation("Stopping service, cancellation requested");

    // Re-throw the exception.
    throw;
  }
}

Note

注意

The approach shown by the sample code is a common implementation of a background process. In a real world application you can follow this same structure, except that you should place your own processing logic in the body of the loop that waits for the cancellation request.

示例代码显示的方法是后台流程的常见实现。在实际的应用程序中,您可以遵循相同的结构,但是您应该将自己的处理逻辑放在等待取消请求的循环体中。

After the worker role has initialized the resources it uses, the Run method starts the two tasks concurrently, as shown here.

在工作者角色初始化其使用的资源之后,Run 方法将同时启动这两个任务,如下所示。

C#

...
// RoleEntry Run() is called after OnStart().
// Returning from Run() will cause a role instance to recycle.
public override void Run()
{
  // Start worker tasks and add them to the task list.
  foreach (var worker in workerTasks)
    tasks.Add(worker(cts.Token));

  Trace.TraceInformation("Worker host tasks started");

  // The assumption is that all tasks should remain running and not return,
  // similar to role entry Run() behavior.
  try
  {
    Task.WaitAny(tasks.ToArray());
  }
  catch (AggregateException ex)
  {
    Trace.TraceError(ex.Message);

    // If any of the inner exceptions in the aggregate exception
    // are not cancellation exceptions then re-throw the exception.
    ex.Handle(innerEx => (innerEx is OperationCanceledException));
  }

  // If there was not a cancellation request, stop all tasks and return from Run()
  // An alternative to cancelling and returning when a task exits would be to
  // restart the task.
  if (!cts.IsCancellationRequested)
  {
    Trace.TraceInformation("Task returned without cancellation request");
    Stop(TimeSpan.FromMinutes(5));
  }
}
...

In this example, the Run method waits for tasks to be completed. If a task is canceled, the Run method assumes that the role is being shut down and waits for the remaining tasks to be canceled before finishing (it waits for a maximum of five minutes before terminating). If a task fails due to an expected exception, the Run method cancels the task.

在此示例中,Run 方法等待任务完成。如果任务被取消,Run 方法假设该角色正在被关闭,并等待其余任务在完成之前被取消(在终止之前最多等待5分钟)。如果任务由于预期的异常而失败,则 Run 方法取消该任务。

Note

注意

Note that you could implement more comprehensive monitoring and exception handling strategies in the Run method such as restarting tasks that have failed, or including code that enables the role to stop and start individual tasks.

注意,您可以在 Run 方法中实现更全面的监视和异常处理策略,例如重新启动失败的任务,或者包含允许角色停止和启动单个任务的代码。
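The restart strategy mentioned in the note above could be sketched roughly as follows. This is only an illustration and not part of the downloadable sample: the TaskSupervisor class, the RunWithRestartAsync method, and the maxRestarts limit are hypothetical names introduced here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original sample): wraps a worker task
// so that it is restarted when it fails for a reason other than cancellation.
public static class TaskSupervisor
{
    // Returns the number of restarts that were needed before the worker
    // completed normally. Throws if the restart budget is exhausted.
    public static async Task<int> RunWithRestartAsync(
        Func<CancellationToken, Task> worker,
        int maxRestarts,
        CancellationToken ct)
    {
        int restarts = 0;
        while (true)
        {
            try
            {
                await worker(ct);
                return restarts;            // The worker finished without error.
            }
            catch (OperationCanceledException)
            {
                throw;                      // Shutdown was requested; do not restart.
            }
            catch (Exception)
            {
                if (restarts >= maxRestarts)
                {
                    throw;                  // Give up and surface the failure.
                }
                restarts++;                 // Otherwise loop and start the worker again.
            }
        }
    }
}
```

Under these assumptions, the Run method could add TaskSupervisor.RunWithRestartAsync(worker, 3, cts.Token) to the task list instead of worker(cts.Token); the limit of 3 is arbitrary.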

The Stop method shown in the following code is called when the fabric controller shuts down the role instance (it is invoked from the OnStop method). The code stops each task gracefully by cancelling it. If any task takes more than five minutes to complete, the cancellation processing in the Stop method ceases waiting and the role is terminated.

当结构控制器关闭角色实例(从 OnStop 方法调用)时,将调用以下代码中显示的 Stop 方法。代码通过取消每个任务来优雅地停止它。如果任何任务需要超过5分钟才能完成,Stop 方法中的取消处理将停止等待并终止角色。

C#

// Stop running tasks and wait for tasks to complete before returning
// unless the timeout expires.
private void Stop(TimeSpan timeout)
{
  Trace.TraceInformation("Stop called. Canceling tasks.");

  // Cancel running tasks.
  cts.Cancel();

  Trace.TraceInformation("Waiting for canceled tasks to finish and return");

  // Wait for all the tasks to complete before returning. Note that the
  // emulator currently allows 30 seconds and Azure allows five
  // minutes for processing to complete.
  try
  {
    Task.WaitAll(tasks.ToArray(), timeout);
  }
  catch (AggregateException ex)
  {
    Trace.TraceError(ex.Message);

    // If any of the inner exceptions in the aggregate exception
    // are not cancellation exceptions then re-throw the exception.
    ex.Handle(innerEx => (innerEx is OperationCanceledException));
  }
}
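The Stop method above ignores the value returned by Task.WaitAll, which is false when the timeout expires before every task has finished. If you want to log or react to a timed-out shutdown, a variation along the following lines may help. It is a sketch only; ShutdownHelper and CancelAndWait are hypothetical names and are not part of the sample solution.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: cancels the tasks and reports whether they all
// finished before the timeout expired.
public static class ShutdownHelper
{
    public static bool CancelAndWait(
        CancellationTokenSource cts, Task[] tasks, TimeSpan timeout)
    {
        // Signal cooperative cancellation to every task sharing this token.
        cts.Cancel();

        try
        {
            // Returns true if all tasks completed in time, false on timeout.
            return Task.WaitAll(tasks, timeout);
        }
        catch (AggregateException ex)
        {
            // WaitAll throws only after all tasks have completed. Cancellation
            // exceptions are expected; anything else is re-thrown by Handle.
            ex.Handle(inner => inner is OperationCanceledException);
            return true;
        }
    }
}
```

A caller could write a trace message when the method returns false, so that operators can see which shutdowns were cut short by the five-minute limit.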

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may also be relevant when implementing this pattern:

下列模式和指南在实现此模式时也可能有用:

  • Autoscaling Guidance 自动缩放指南. Autoscaling can be used to start and stop instances of service hosting computational resources, depending on the anticipated demand for processing. 根据预期的处理需求,可以使用自动缩放来启动和停止承载计算资源的服务实例。
  • Compute Partitioning Guidance 计算分区指南. This guidance describes how to allocate the services and components in a cloud service in a way that helps to minimize running costs while maintaining the scalability, performance, availability, and security of the service. 本指南描述了如何分配云服务中的服务和组件,以帮助最小化运行成本,同时维护服务的可伸缩性、性能、可用性和安全性。

Command and Query Responsibility Segregation (CQRS) Pattern 命令和查询责任分离(CQRS)模式

  • Article 文章
  • 08/26/2015 2015年8月26日
  • 13 minutes to read 阅读时间约13分钟


Segregate operations that read data from operations that update data by using separate interfaces. This pattern can maximize performance, scalability, and security; support evolution of the system over time through higher flexibility; and prevent update commands from causing merge conflicts at the domain level.

将读取数据的操作与使用单独接口更新数据的操作分离。此模式可以最大限度地提高性能、可伸缩性和安全性; 通过更高的灵活性支持系统随时间的演变; 并防止更新命令导致域级别的合并冲突。

Context and Problem 背景与问题

In traditional data management systems, both commands (updates to the data) and queries (requests for data) are executed against the same set of entities in a single data repository. These entities may be a subset of the rows in one or more tables in a relational database such as SQL Server.

在传统的数据管理系统中,命令(数据更新)和查询(数据请求)都是针对单个数据存储库中的同一组实体执行的。这些实体可能是 SQL Server 等关系数据库中一个或多个表中行的子集。

Typically, in these systems, all create, read, update, and delete (CRUD) operations are applied to the same representation of the entity. For example, a data transfer object (DTO) representing a customer is retrieved from the data store by the data access layer (DAL) and displayed on the screen. A user updates some fields of the DTO (perhaps through data binding) and the DTO is then saved back in the data store by the DAL. The same DTO is used for both the read and write operations, as shown in Figure 1.

通常,在这些系统中,所有创建、读取、更新和删除(CRUD)操作都应用于实体的相同表示。例如,通过数据访问层(DAL)从数据存储中检索代表客户的数据传输对象(DTO)并显示在屏幕上。用户更新 DTO 的某些字段(可能通过数据绑定) ,然后 DAL 将 DTO 保存回数据存储区。读写操作使用相同的 DTO,如图1所示。


Figure 1 - A traditional CRUD architecture

图1-传统的 CRUD 架构

Traditional CRUD designs work well when there is only limited business logic applied to the data operations. Scaffold mechanisms provided by development tools can create data access code very quickly, which can then be customized as required.

当应用于数据操作的业务逻辑有限时,传统 CRUD 设计可以很好地工作。开发工具提供的脚手架机制可以非常快速地创建数据访问代码,然后可以根据需要进行自定义。

However, the traditional CRUD approach has some disadvantages:

然而,传统的 CRUD 方法有一些缺点:

  • It often means that there is a mismatch between the read and write representations of the data, such as additional columns or properties that must be updated correctly even though they are not required as part of an operation. 这通常意味着数据的读写表示之间存在不匹配,例如必须正确更新的附加列或属性,即使它们不是操作的一部分
  • It risks encountering data contention in a collaborative domain (where multiple actors operate in parallel on the same set of data) when records are locked in the data store, or update conflicts caused by concurrent updates when optimistic locking is used. These risks increase as the complexity and throughput of the system grows. In addition, the traditional approach can also have a negative effect on performance due to load on the data store and data access layer, and the complexity of queries required to retrieve information. 当记录被锁定在数据存储中时,它可能会遇到协作域中的数据争用(其中多个参与者在同一组数据上并行操作) ,或者当使用乐观锁定时,由并发更新引起的更新冲突。这些风险随着系统的复杂性和吞吐量的增加而增加。此外,由于数据存储和数据访问层的负载以及检索信息所需的查询的复杂性,传统方法还可能对性能产生负面影响
  • It can make managing security and permissions more cumbersome because each entity is subject to both read and write operations, which might inadvertently expose data in the wrong context. 它可能使安全性和权限的管理更加繁琐,因为每个实体都受到读和写操作的约束,这可能无意中在错误的上下文中暴露数据

Note

注意

For a deeper understanding of the limits of the CRUD approach see “CRUD, Only When You Can Afford It” on MSDN.

要更深入地理解 CRUD 方法的局限性,请参阅 MSDN 上的“ CRUD,只有当您能够负担得起的时候”。

Solution 解决方案

Command and Query Responsibility Segregation (CQRS) is a pattern that segregates the operations that read data (Queries) from the operations that update data (Commands) by using separate interfaces. This implies that the data models used for querying and updates are different. The models can then be isolated, as shown in Figure 2, although this is not an absolute requirement.

命令和查询责任分离(CQRS)是一种模式,它通过使用单独的接口将读取数据(查询)的操作与更新数据(命令)的操作分离开来。这意味着用于查询和更新的数据模型是不同的。然后可以隔离模型,如图2所示,尽管这不是绝对需求。

Figure 2 - A basic CQRS architecture

图2-一个基本的 CQRS 架构

Compared to the single model of the data (from which developers build their own conceptual models) that is inherent in CRUD-based systems, the use of separate query and update models for the data in CQRS-based systems considerably simplifies design and implementation. However, one disadvantage is that, unlike CRUD designs, CQRS code cannot automatically be generated by using scaffold mechanisms.

与基于 CRUD 的系统中固有的单一数据模型(开发人员从中建立自己的概念模型)相比,在基于 CQRS 的系统中对数据使用单独的查询和更新模型大大简化了设计和实现。然而,一个缺点是,与 CRUD 设计不同,CQRS 代码不能通过使用脚手架机制自动生成。

The query model for reading data and the update model for writing data may access the same physical store, perhaps by using SQL views or by generating projections on the fly. However, it is common to separate the data into different physical stores to maximize performance, scalability, and security, as shown in Figure 3.

用于读取数据的查询模型和用于写入数据的更新模型可以访问相同的物理存储,方法可能是使用 SQL 视图或动态生成投影。但是,通常将数据分隔到不同的物理存储中,以最大限度地提高性能、可伸缩性和安全性; 如图3所示。

Figure 3 - A CQRS architecture with separate read and write stores

图3-具有单独读写存储的 CQRS 体系结构
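For the simpler case where both models share one physical store, generating projections on the fly might look something like the following sketch. The ProductRecord, ProductListItem, and ProductProjections types are hypothetical and exist only to illustrate the idea; an in-memory sequence stands in for the data store.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical write-side record as it might be held in the shared store.
public class ProductRecord
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }
    public int CurrentStock { get; set; }
    public decimal CostPrice { get; set; }   // Internal value, never exposed to readers.
}

// Hypothetical read-side shape tailored to a product list view.
public class ProductListItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }
    public bool IsOutOfStock { get; set; }
}

public static class ProductProjections
{
    // Projects write-side records into the read-side shape on demand.
    // Only the fields that the view needs are copied across.
    public static IEnumerable<ProductListItem> ToListItems(
        IEnumerable<ProductRecord> records)
    {
        return records.Select(r => new ProductListItem
        {
            Id = r.Id,
            Name = r.Name,
            UnitPrice = r.UnitPrice,
            IsOutOfStock = r.CurrentStock <= 0
        });
    }
}
```

Because the read shape omits CostPrice, the query side cannot expose it by accident, which reflects the security benefit described earlier.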

The read store can be a read-only replica of the write store, or the read and write stores may have a different structure altogether. Using multiple read-only replicas of the read store can considerably increase query performance and application UI responsiveness, especially in distributed scenarios where read-only replicas are located close to the application instances. Some database systems, such as SQL Server, provide additional features such as failover replicas to maximize availability.

读存储可以是写存储的只读副本,或者读和写存储可能具有完全不同的结构。使用读存储的多个只读副本可以大大提高查询性能和应用程序 UI 响应能力,特别是在只读副本位于应用程序实例附近的分布式场景中。某些数据库系统(如 SQLServer)提供了诸如故障转移副本之类的附加特性,以最大限度地提高可用性。

Separation of the read and write stores also allows each to be scaled appropriately to match the load. For example, read stores typically encounter a much higher load than write stores.

读和写存储的分离还允许适当地缩放每个存储以匹配负载。例如,读存储的负载通常比写存储高得多。

When the query/read model contains denormalized information (see Materialized View Pattern), performance is maximized when reading data for each of the views in an application or when querying the data in the system.

当查询/读取模型包含非规范化信息时(参见实体化视图模式) ,在读取应用程序中每个视图的数据或查询系统中的数据时,性能最大化。

For more information about the CQRS pattern and its implementation, see the following resources:

有关 CQRS 模式及其实现的更多信息,请参见以下资源:

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • Dividing the data store into separate physical stores for read and write operations can increase the performance and security of a system, but it can add considerable complexity in terms of resiliency and eventual consistency. The read model store must be updated to reflect changes to the write model store, and it may be difficult to detect when a user has issued a request based on stale read data—meaning that the operation cannot be completed.

    将数据存储区划分为单独的物理存储区进行读写操作可以提高系统的性能和安全性,但是它会增加弹性和最终一致性方面的复杂性。必须更新读模型存储区以反映对写模型存储区的更改,而且当用户基于过期读数据发出请求时可能难以检测,这意味着无法完成操作。

    Note

    注意

    For a description of eventual consistency see the Data Consistency Primer.

    有关最终一致性的说明,请参阅数据一致性入门。

  • Consider applying CQRS to limited sections of your system where it will be most valuable, and learn from the experience.

    考虑将 CQRS 应用于系统中最有价值的有限部分,并从中学习经验。

  • A typical approach to embracing eventual consistency is to use event sourcing in conjunction with CQRS so that the write model is an append-only stream of events driven by execution of commands. These events are used to update materialized views that act as the read model. For more information see Event Sourcing and CQRS.

    接受最终一致性的一个典型方法是将事件源与 CQRS 结合使用,这样写模型就是一个由执行命令驱动的仅附加的事件流。这些事件用于更新充当读取模型的物化视图。有关更多信息,请参见事件源和 CQRS。
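One possible way to detect a request that was based on stale read data, as mentioned in the first point above, is to have each command carry the version of the data that the user saw and to reject the command if the write model has since moved on. The sketch below assumes this technique; it is not taken from the guide, and every type name in it is hypothetical. A dictionary stands in for the write store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception raised when a command was issued against stale data.
public class StaleDataException : Exception
{
    public StaleDataException(string message) : base(message) { }
}

// Hypothetical command that records the version the user saw in the read model.
public class RenameProduct
{
    public int ProductId { get; set; }
    public string NewName { get; set; }
    public int ExpectedVersion { get; set; }
}

// Hypothetical write-model entity with a version counter.
public class VersionedProduct
{
    public string Name { get; set; }
    public int Version { get; set; }
}

public class RenameProductHandler
{
    // Stand-in for the write store.
    private readonly Dictionary<int, VersionedProduct> store;

    public RenameProductHandler(Dictionary<int, VersionedProduct> store)
    {
        this.store = store;
    }

    public void Handle(RenameProduct command)
    {
        var product = store[command.ProductId];

        // Reject the command if the write model changed after the user read it.
        if (product.Version != command.ExpectedVersion)
        {
            throw new StaleDataException(
                "Product " + command.ProductId + " has changed since it was read.");
        }

        product.Name = command.NewName;
        product.Version++;
    }
}
```

The caller can respond to StaleDataException by refreshing the read model and asking the user to retry.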

When to Use this Pattern 何时使用此模式

This pattern is ideally suited to:

这种模式非常适合:

  • Collaborative domains where multiple operations are performed in parallel on the same data. CQRS allows you to define commands with a sufficient granularity to minimize merge conflicts at the domain level (or any conflicts that do arise can be merged by the command), even when updating what appears to be the same type of data. 多个操作在同一数据上并行执行的协作域。CQRS 允许您定义具有足够粒度的命令,以最小化域级别的合并冲突(或者任何出现的冲突都可以通过命令合并) ,即使在更新看似相同类型的数据时也是如此
  • Use with task-based user interfaces (where users are guided through a complex process as a series of steps), with complex domain models, and for teams already familiar with domain-driven design (DDD) techniques. The write model has a full command-processing stack with business logic, input validation, and business validation to ensure that everything is always consistent for each of the aggregates (each cluster of associated objects that are treated as a unit for the purpose of data changes) in the write model. The read model has no business logic or validation stack and just returns a DTO for use in a view model. The read model is eventually consistent with the write model. 使用基于任务的用户界面(用户通过一系列步骤来完成复杂的过程) ,复杂的领域模型,以及已经熟悉领域驱动设计(DDD)技术的团队。写模型有一个完整的命令处理堆栈,其中包含业务逻辑、输入验证和业务验证,以确保在写模型中每个聚合(为了数据更改而被视为一个单元的每个相关对象集群)的一切始终保持一致。读取模型没有业务逻辑或验证堆栈,只是返回一个用于视图模型的 DTO。读模型最终与写模型保持一致
  • Scenarios where performance of data reads must be fine-tuned separately from performance of data writes, especially when the read/write ratio is very high, and when horizontal scaling is required. For example, in many systems the number of read operations is orders of magnitude greater than the number of write operations. To accommodate this, consider scaling out the read model, but running the write model on only one or a few instances. A small number of write model instances also helps to minimize the occurrence of merge conflicts. 必须将数据读取的性能与数据写入的性能分开进行微调的场景,特别是在读/写比率非常高以及需要水平伸缩的情况下。例如,在许多系统中,读操作的数量比写操作的数量高出几个数量级。为了适应这种情况,可以考虑扩展读模型,但只在一个或几个实例上运行写模型。少量的写模型实例也有助于最小化合并冲突的发生
  • Scenarios where one team of developers can focus on the complex domain model that is part of the write model, and another less experienced team can focus on the read model and the user interfaces. 在这种情况下,一个开发团队可以关注作为编写模型一部分的复杂领域模型,而另一个经验不足的团队可以关注读模型和用户界面
  • Scenarios where the system is expected to evolve over time and may contain multiple versions of the model, or where business rules change regularly. 预计系统将随时间演变并可能包含模型的多个版本的场景,或者业务规则定期更改的场景
  • Integration with other systems, especially in combination with Event Sourcing, where the temporal failure of one subsystem should not affect the availability of the others. 与其他系统的集成,尤其是与事件源相结合,其中一个子系统的时间故障不应该影响其他子系统的可用性

This pattern might not be suitable in the following situations:

这种模式可能不适用于下列情况:

  • Where the domain or the business rules are simple. 域或业务规则简单的地方
  • Where a simple CRUD-style user interface and the related data access operations are sufficient. 一个简单的 CRUD 风格的用户界面和相关的数据访问操作就足够了
  • For implementation across the whole system. There are specific components of an overall data management scenario where CQRS can be useful, but it can add considerable and often unnecessary complexity where it is not actually required. 用于整个系统的实现。总体数据管理场景中有一些特定的组件,在这些组件中 CQRS 可能是有用的,但是在实际上并不需要它的地方,它可能会增加相当大的且往往是不必要的复杂性

Event Sourcing and CQRS 事件源和 CQRS

The CQRS pattern is often used in conjunction with the Event Sourcing pattern. CQRS-based systems use separate read and write data models, each tailored to relevant tasks and often located in physically separate stores. When used with Event Sourcing, the store of events is the write model, and this is the authoritative source of information. The read model of a CQRS-based system provides materialized views of the data, typically as highly denormalized views. These views are tailored to the interfaces and display requirements of the application, which helps to maximize both display and query performance.

CQRS 模式通常与事件源模式一起使用。基于 CQRS 的系统使用单独的读写数据模型,每个模型都适合相关任务,并且通常位于物理上独立的存储区中。当与事件源一起使用时,事件的存储是写模型,这是权威的信息源。基于 CQRS 的系统的读取模型提供数据的物化视图,通常是高度非规范化的视图。这些视图是根据应用程序的接口和显示需求量身定制的,这有助于最大限度地提高显示和查询性能。

Using the stream of events as the write store, rather than the actual data at a point in time, avoids update conflicts on a single aggregate and maximizes performance and scalability. The events can be used to asynchronously generate materialized views of the data that are used to populate the read store.

使用事件流作为写存储,而不是某个时间点的实际数据,可以避免单个聚合上的更新冲突,并最大限度地提高性能和可伸缩性。这些事件可用于异步生成用于填充读取存储区的数据的物化视图。

Because the event store is the authoritative source of information, it is possible to delete the materialized views and replay all past events to create a new representation of the current state when the system evolves, or when the read model must change. The materialized views are effectively a durable read-only cache of the data.

由于事件存储是权威的信息来源,因此可以删除物化视图并重播所有过去的事件,以便在系统发展时或在读取模型必须更改时创建当前状态的新表示。物化视图实际上是数据的持久只读缓存。
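Replaying events to rebuild a materialized view can be sketched as a simple fold over the event stream. The event types and the StockViewBuilder class below are hypothetical, and an in-memory sequence stands in for the event store.

```csharp
using System.Collections.Generic;

// Hypothetical events recorded in the append-only store.
public abstract class StockEvent
{
    public int ProductId { get; set; }
}

public class StockAdded : StockEvent
{
    public int Quantity { get; set; }
}

public class ItemsShipped : StockEvent
{
    public int Quantity { get; set; }
}

public static class StockViewBuilder
{
    // Rebuilds a "product ID -> current stock" view from scratch by
    // applying every past event in the order in which it was recorded.
    public static Dictionary<int, int> Rebuild(IEnumerable<StockEvent> events)
    {
        var view = new Dictionary<int, int>();

        foreach (var e in events)
        {
            // 'current' is 0 when the product has not been seen yet.
            view.TryGetValue(e.ProductId, out int current);

            switch (e)
            {
                case StockAdded added:
                    view[e.ProductId] = current + added.Quantity;
                    break;
                case ItemsShipped shipped:
                    view[e.ProductId] = current - shipped.Quantity;
                    break;
            }
        }

        return view;
    }
}
```

Because the view is derived entirely from the events, it can be discarded and regenerated with a different shape whenever the read model changes.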

When using CQRS combined with the Event Sourcing pattern, consider the following:

当使用 CQRS 与事件源模式相结合时,请考虑以下几点:

  • As with any system where the write and read stores are separate, systems based on this pattern are only eventually consistent. There will be some delay between the event being generated and the data store that holds the results of operations initiated by these events being updated. 与任何写存储和读存储分离的系统一样,基于此模式的系统最终只能保持一致。生成的事件与保存由这些事件发起的操作的结果的数据存储之间将有一些延迟
  • The pattern introduces additional complexity because code must be created to initiate and handle events, and assemble or update the appropriate views or objects required by queries or a read model. The inherent complexity of the CQRS pattern when used in conjunction with Event Sourcing can make a successful implementation more difficult, and requires relearning of some concepts and a different approach to designing systems. However, Event Sourcing can make it easier to model the domain, and makes it easier to rebuild views or create new ones because the intent of the changes in the data is preserved. 模式引入了额外的复杂性,因为必须创建代码来启动和处理事件,并组装或更新查询或读模型所需的适当视图或对象。当 CQRS 模式与事件源一起使用时,其固有的复杂性会使成功的实现变得更加困难,并且需要重新学习一些概念和设计系统的不同方法。但是,事件源可以使对域进行建模变得更加容易,并且使重新构建视图或创建新视图变得更加容易,因为数据中的更改意图被保留了
  • Generating materialized views for use in the read model or projections of the data by replaying and handling the events for specific entities or collections of entities may require considerable processing time and resource usage, especially if it requires summation or analysis of values over long time periods, because all the associated events may need to be examined. This may be partially resolved by implementing snapshots of the data at scheduled intervals, such as a total count of the number of times a specific action has occurred, or the current state of an entity. 通过重播和处理特定实体或实体集合的事件,生成用于读取模型或数据投影的物化视图,可能需要相当长的处理时间和资源使用,特别是如果需要对长时期的值进行汇总或分析,因为可能需要检查所有相关事件。这可以通过按照计划的间隔实现数据快照来部分解决,例如已经发生的特定操作的总数,或者实体的当前状态。
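The snapshot idea in the last point could be sketched as follows: keep a running summary together with the position of the last event it includes, and apply only the events recorded after that position. The ProductRated, RatingSnapshot, and RatingSummary names are hypothetical, and the Sequence property is an assumed way of ordering events in the stream.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical event with an assumed position in the event stream.
public class ProductRated
{
    public long Sequence { get; set; }
    public int Rating { get; set; }
}

// Hypothetical snapshot: a summary plus the last event it already includes.
public class RatingSnapshot
{
    public long LastSequence { get; set; }
    public int Count { get; set; }
    public int Total { get; set; }
}

public static class RatingSummary
{
    // Produces an updated snapshot by applying only the events that were
    // recorded after the existing snapshot was taken.
    public static RatingSnapshot Apply(
        RatingSnapshot snapshot, IEnumerable<ProductRated> events)
    {
        var result = new RatingSnapshot
        {
            LastSequence = snapshot.LastSequence,
            Count = snapshot.Count,
            Total = snapshot.Total
        };

        var newer = events
            .Where(x => x.Sequence > snapshot.LastSequence)
            .OrderBy(x => x.Sequence);

        foreach (var e in newer)
        {
            result.Count++;
            result.Total += e.Rating;
            result.LastSequence = e.Sequence;
        }

        return result;
    }
}
```

With this approach the cost of refreshing the summary depends on the number of events since the last snapshot rather than on the full length of the stream.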

Note

注意

For more information see Event Sourcing Pattern and Materialized View Pattern, and the patterns & practices guide CQRS Journey on MSDN. In particular you should read the chapter Introducing Event Sourcing for a full exploration of the pattern and how it is useful with CQRS, and the chapter A CQRS and ES Deep Dive to understand more—including how aggregate partitioning can be used with CQRS in Microsoft Azure.

有关更多信息,请参见事件源模式和物化视图模式,以及 MSDN 上的模式和实践指南 CQRS Journey。特别是,您应该阅读“Introducing Event Sourcing”一章,以全面了解该模式及其如何与 CQRS 配合使用;并阅读“A CQRS and ES Deep Dive”一章以了解更多内容,包括如何在 Microsoft Azure 中将聚合分区与 CQRS 结合使用。

Example 例子

The following code shows some extracts from an example of a CQRS implementation, which uses different definitions for the read and the write models. The model interfaces do not dictate any features of the underlying data stores, and they can evolve and be fine-tuned independently because these interfaces are separated.

下面的代码显示了 CQRS 实现示例的一些摘录,该实现使用不同的读模型和写模型定义。模型接口不规定底层数据存储的任何特性,它们可以独立发展和调优,因为这些接口是分离的。

The following code shows the read model definition.

下面的代码显示了读模型定义。

C#

// Query interface
namespace ReadModel
{
  public interface ProductsDao
  {
    ProductDisplay FindById(int productId);
    IEnumerable<ProductDisplay> FindByName(string name);
    IEnumerable<ProductInventory> FindOutOfStockProducts();
    IEnumerable<ProductDisplay> FindRelatedProducts(int productId);
  }

  public class ProductDisplay
  {
    public int ID { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public decimal UnitPrice { get; set; }
    public bool IsOutOfStock { get; set; }
    public double UserRating { get; set; }
  }

  public class ProductInventory
  {
    public int ID { get; set; }
    public string Name { get; set; }
    public int CurrentStock { get; set; }
  }
}

The system allows users to rate products. The application code does this by using the RateProduct command shown in the following code.

该系统允许用户对产品进行评分。应用程序代码通过使用下面的代码中显示的 RateProduct 命令来实现这一点。

C#

public interface Icommand
{
  Guid Id { get; }
}

public class RateProduct : Icommand
{
  public RateProduct()
  {
    this.Id = Guid.NewGuid();
  }

  public Guid Id { get; set; }
  public int ProductId { get; set; }
  public int rating { get; set; }
  public int UserId { get; set; }
}

The system uses the ProductsCommandHandler class to handle commands sent by the application. Clients typically send commands to the domain through a messaging system such as a queue. The command handler accepts these commands and invokes methods of the domain interface. The granularity of each command is designed to mitigate the chance of conflicting requests. The following code shows an outline of the ProductsCommandHandler class.

系统使用 ProductsCommandHandler 类来处理应用程序发送的命令。客户端通常通过消息传递系统(如队列)向域发送命令。命令处理程序接受这些命令并调用域接口的方法。每个命令的粒度旨在减少发生冲突请求的可能性。下面的代码显示 ProductsCommandHandler 类的大纲。

C#

public class ProductsCommandHandler :
    ICommandHandler<AddNewProduct>,
    ICommandHandler<RateProduct>,
    ICommandHandler<AddToInventory>,
    ICommandHandler<ConfirmItemsShipped>,
    ICommandHandler<UpdateStockFromInventoryRecount>
{
  private readonly IRepository<Product> repository;

  public ProductsCommandHandler (IRepository<Product> repository)
  {
    this.repository = repository;
  }

  public void Handle (AddNewProduct command)
  {
    ...
  }

  public void Handle (RateProduct command)
  {
    // Load the aggregate, apply the rating, and persist the change.
    var product = repository.Find(command.ProductId);
    if (product != null)
    {
      product.RateProduct(command.UserId, command.rating);
      repository.Save(product);
    }
  }

  public void Handle (AddToInventory command)
  {
    ...
  }

  public void Handle (ConfirmItemsShipped command)
  {
    ...
  }

  public void Handle (UpdateStockFromInventoryRecount command)
  {
    ...
  }
}
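The outline above implements ICommandHandler<T> for each command type, but the extract does not show that interface or how a command taken from a queue reaches the right handler. The following sketch makes assumptions about both: the single-method interface shape and the in-process CommandDispatcher are guesses for illustration, and RateProductSketch and RatingLog are hypothetical stand-ins for a command and its handler.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the handler interface; the original extract does not define it.
public interface ICommandHandler<TCommand>
{
    void Handle(TCommand command);
}

// Hypothetical dispatcher that routes a command to the handler
// registered for the command's runtime type.
public class CommandDispatcher
{
    private readonly Dictionary<Type, Action<object>> handlers =
        new Dictionary<Type, Action<object>>();

    public void Register<TCommand>(ICommandHandler<TCommand> handler)
    {
        handlers[typeof(TCommand)] = command => handler.Handle((TCommand)command);
    }

    public void Dispatch(object command)
    {
        if (!handlers.TryGetValue(command.GetType(), out var handle))
        {
            throw new InvalidOperationException(
                "No handler registered for " + command.GetType().Name);
        }

        handle(command);
    }
}

// Hypothetical command and handler used only to demonstrate the routing.
public class RateProductSketch
{
    public int ProductId { get; set; }
    public int Rating { get; set; }
}

public class RatingLog : ICommandHandler<RateProductSketch>
{
    public List<int> Ratings { get; } = new List<int>();

    public void Handle(RateProductSketch command)
    {
        Ratings.Add(command.Rating);
    }
}
```

In a deployed system the Dispatch call would typically sit in the code that receives messages from the queue, so the sender never references a handler directly.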

The following code shows the ProductsDomain interface from the write model.

下面的代码显示了来自写模型的 ProductsDomain 接口。

C#

public interface ProductsDomain
{
  void AddNewProduct(int id, string name, string description, decimal price);
  void RateProduct(int userId, int rating);
  void AddToInventory(int productId, int quantity);
  void ConfirmItemsShipped(int productId, int quantity);
  void UpdateStockFromInventoryRecount(int productId, int updatedQuantity);
}

Also notice how the ProductsDomain interface contains methods that have a meaning in the domain. Typically, in a CRUD environment these methods would have generic names such as Save or Update, and have a DTO as the only argument. The CQRS approach can be better tailored to suit the way that this organization carries out business and inventory management.

还要注意 ProductsDomain 接口如何包含在域中有意义的方法。通常,在 CRUD 环境中,这些方法具有通用名称,如 Save 或 Update,并且只有一个 DTO 参数。CQRS 方法可以更好地适应这个组织执行业务和库存管理的方式。

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may also be relevant when implementing this pattern:

下列模式和指南在实现此模式时也可能有用:

  • Data Consistency Primer 数据一致性入门. This guidance explains the issues that are typically encountered due to eventual consistency between the read and write data stores when using the CQRS pattern, and how these issues can be resolved. 本指南解释了在使用 CQRS 模式时,由于读写数据存储之间的最终一致性,通常会遇到的问题,以及如何解决这些问题。
  • Data Partitioning Guidance 数据分区指南. This guidance describes how the read and write data stores used in the CQRS pattern can be divided into separate partitions that can be managed and accessed separately to improve scalability, reduce contention, and optimize performance. 本指南描述了如何将 CQRS 模式中使用的读写数据存储区划分为单独的分区,这些分区可以单独管理和访问,以提高可伸缩性、减少争用和优化性能。
  • Event Sourcing Pattern 事件源模式. This pattern describes in more detail how Event Sourcing can be used with the CQRS pattern to simplify tasks in complex domains; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions. 此模式更详细地描述了如何使用事件源与 CQRS 模式来简化复杂领域中的任务; 提高性能、可伸缩性和响应性; 为事务数据提供一致性; 以及维护可能支持补偿操作的完整审计跟踪和历史记录。
  • Materialized View Pattern 实体化视图模式. The read model of a CQRS implementation may contain materialized views of the write model data, or the read model may be used to generate materialized views. CQRS 实现的读模型可以包含写模型数据的物化视图,或者可以使用读模型来生成物化视图。

Event Sourcing Pattern 事件源模式

  • Article文章
  • 08/26/2015 2015年8月26日
  • 15 minutes to read还有15分钟

Use an append-only store to record the full series of events that describe actions taken on data in a domain, rather than storing just the current state, so that the store can be used to materialize the domain objects. This pattern can simplify tasks in complex domains by avoiding the requirement to synchronize the data model and the business domain; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions.

使用仅追加存储记录描述对域中数据采取的操作的完整系列事件,而不是仅存储当前状态,这样存储就可以用于具体化域对象。通过避免同步数据模型和业务领域的需求,这种模式可以简化复杂领域中的任务; 提高性能、可伸缩性和响应性; 为事务数据提供一致性; 以及维护可能支持补偿操作的完整审计跟踪和历史记录。

Context and Problem 背景与问题

Most applications work with data, and the typical approach is for the application to maintain the current state of the data by updating it as users work with the data. For example, in the traditional create, read, update, and delete (CRUD) model a typical data process will be to read data from the store, make some modifications to it, and update the current state of the data with the new values—often by using transactions that lock the data.

大多数应用程序处理数据,典型的方法是应用程序在用户处理数据时通过更新数据来维护数据的当前状态。例如,在传统的创建、读取、更新和删除(CRUD)模型中,典型的数据处理将是从存储中读取数据,对其进行一些修改,并用新值更新数据的当前状态ーー通常使用锁定数据的事务。

The CRUD approach has some limitations:

CRUD 方法有一些局限性:

  • CRUD systems perform update operations directly against a data store, which can hinder performance and responsiveness and limit scalability because of the processing overhead involved. CRUD 系统直接对数据存储执行更新操作,由于所需的处理开销,这可能会降低性能和响应性,并限制可伸缩性
  • In a collaborative domain with many concurrent users, data update conflicts are more likely to occur because the update operations take place on a single item of data. 在有许多并发用户的协作域中,数据更新冲突更容易发生,因为更新操作发生在单个数据项上
  • Unless there is an additional auditing mechanism, which records the details of each operation in a separate log, history is lost. 除非有一个额外的审计机制,将每个操作的详细信息记录在单独的日志中,否则历史记录将丢失

Note

注意

For a deeper understanding of the limits of the CRUD approach see “CRUD, Only When You Can Afford It” on MSDN.

要更深入地理解 CRUD 方法的局限性,请参阅 MSDN 上的“ CRUD,只有当您能够负担得起的时候”。

Solution 解决方案

The Event Sourcing pattern defines an approach to handling operations on data that is driven by a sequence of events, each of which is recorded in an append-only store. Application code sends a series of events that imperatively describe each action that has occurred on the data to the event store, where they are persisted. Each event represents a set of changes to the data (such as AddedItemToOrder).

事件源模式定义了一种处理由一系列事件驱动的数据操作的方法,每个事件都记录在仅追加存储中。应用程序代码向事件存储区发送一系列事件,这些事件强制性地描述数据上发生的每个操作,并将这些操作保存在事件存储区中。每个事件表示对数据的一组更改(例如 AddedItemToOrder)。
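
The idea of recording each action as an event in an append-only store can be sketched as follows. This is a minimal in-memory illustration, not the guide's implementation: the type names OrderEvent, OrderCreated, RemovedItemFromOrder, and InMemoryEventStore are hypothetical (only AddedItemToOrder is named in the text), and a real event store would persist events durably.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; only AddedItemToOrder appears in the text.
var store = new InMemoryEventStore();
var orderId = Guid.NewGuid();

// The application describes what happened instead of overwriting state.
store.Append(new OrderCreated(orderId));
store.Append(new AddedItemToOrder(orderId, "SKU-1", 2));
store.Append(new RemovedItemFromOrder(orderId, "SKU-1", 1));

foreach (var recorded in store.ReadStream(orderId))
{
    Console.WriteLine(recorded);
}

public abstract record OrderEvent(Guid OrderId);
public record OrderCreated(Guid OrderId) : OrderEvent(OrderId);
public record AddedItemToOrder(Guid OrderId, string Sku, int Quantity) : OrderEvent(OrderId);
public record RemovedItemFromOrder(Guid OrderId, string Sku, int Quantity) : OrderEvent(OrderId);

public sealed class InMemoryEventStore
{
    private readonly List<OrderEvent> _events = new();

    // Append is the only write operation; there is no update or delete.
    public void Append(OrderEvent e) => _events.Add(e);

    // Returns the events for one order in the order they were appended.
    public IEnumerable<OrderEvent> ReadStream(Guid orderId)
    {
        foreach (var e in _events)
        {
            if (e.OrderId == orderId) yield return e;
        }
    }
}
```

Because the store exposes only Append and ReadStream, removing an item is itself recorded as a new event rather than as a change to an earlier one.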

The events are persisted in an event store that acts as the source of truth or system of record (the authoritative data source for a given data element or piece of information) about the current state of the data. The event store typically publishes these events so that consumers can be notified and can handle them if needed. Consumers could, for example, initiate tasks that apply the operations in the events to other systems, or perform any other associated action that is required to complete the operation. Notice that the application code that generates the events is decoupled from the systems that subscribe to the events.

事件保存在一个事件存储中,该事件存储充当关于数据当前状态的真相源或记录系统(给定数据元素或信息的权威数据源)。事件存储通常发布这些事件,以便可以通知使用者并在需要时处理它们。例如,使用者可以启动将事件中的操作应用于其他系统的任务,或者执行完成操作所需的任何其他相关操作。注意,生成事件的应用程序代码与订阅事件的系统是解耦的。

Typical uses of the events published by the event store are to maintain materialized views of entities as actions in the application change them, and for integration with external systems. For example, a system may maintain a materialized view of all customer orders that is used to populate parts of the UI. As the application adds new orders, adds or removes items on the order, and adds shipping information, the events that describe these changes can be handled and used to update the materialized view.

事件存储发布的事件的典型用途是在应用程序中的操作更改实体时维护实体的物化视图,并与外部系统集成。例如,系统可以维护用于填充 UI 部分的所有客户订单的物化视图。当应用程序添加新订单、添加或删除订单上的项目以及添加送货信息时,可以处理描述这些更改的事件并用于更新物化视图。
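
A materialized view that is kept up to date by handling published events might look like the sketch below. It assumes a simple in-process publish step; PublishingEventStore, OrderSummaryView, and the event types are hypothetical names, and a production system would more likely publish through a queue or topic.

```csharp
using System;
using System.Collections.Generic;

var store = new PublishingEventStore();
var view = new OrderSummaryView();
store.Subscribe(view.Handle);   // the view is just one subscriber

store.Append(new OrderPlaced("order-1", "alice"));
store.Append(new ItemAdded("order-1", "SKU-1", 2));
store.Append(new ItemRemoved("order-1", "SKU-1", 1));

Console.WriteLine(view.ItemCounts["order-1"]); // prints 1

public abstract record OrderEvent(string OrderId);
public record OrderPlaced(string OrderId, string Customer) : OrderEvent(OrderId);
public record ItemAdded(string OrderId, string Sku, int Quantity) : OrderEvent(OrderId);
public record ItemRemoved(string OrderId, string Sku, int Quantity) : OrderEvent(OrderId);

public sealed class PublishingEventStore
{
    private readonly List<OrderEvent> _events = new();
    private readonly List<Action<OrderEvent>> _subscribers = new();

    public void Subscribe(Action<OrderEvent> handler) => _subscribers.Add(handler);

    public void Append(OrderEvent e)
    {
        _events.Add(e);                                        // persist first
        foreach (var subscriber in _subscribers) subscriber(e); // then publish
    }
}

// A read-optimized view of item counts per order, updated event by event.
public sealed class OrderSummaryView
{
    public Dictionary<string, int> ItemCounts { get; } = new();

    public void Handle(OrderEvent e)
    {
        switch (e)
        {
            case OrderPlaced placed: ItemCounts[placed.OrderId] = 0; break;
            case ItemAdded added: ItemCounts[added.OrderId] += added.Quantity; break;
            case ItemRemoved removed: ItemCounts[removed.OrderId] -= removed.Quantity; break;
        }
    }
}
```

The code that appends events knows nothing about the view, which is the decoupling the text describes.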

Note

注意

See the Materialized View pattern for more information.

有关更多信息,请参见物化视图模式。

In addition, at any point in time it is possible for applications to read the history of events, and use it to materialize the current state of an entity by effectively “playing back” and consuming all the events related to that entity. This may occur on demand in order to materialize a domain object when handling a request, or through a scheduled task so that the state of the entity can be stored as a materialized view to support the presentation layer.

此外,在任何时候,应用程序都可以读取事件的历史记录,并通过有效地“回放”和使用与该实体相关的所有事件来实现实体的当前状态。这可能是为了在处理请求时实现一个域对象而按需发生的,或者是通过一个预定的任务,以便实体的状态可以作为一个实体化视图存储以支持表示层。
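
"Playing back" events to materialize the current state of one entity amounts to folding that entity's events over an initial state. The sketch below assumes hypothetical OrderState and event types and an in-memory history list; it is meant to show the replay step only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var history = new List<OrderEvent>
{
    new ItemAdded("order-1", "SKU-1", 2),
    new ItemAdded("order-2", "SKU-9", 1),
    new ItemRemoved("order-1", "SKU-1", 1),
    new ShippingInfoAdded("order-1", "1 Main St"),
};

// Replay only the events for one entity, in the order they were recorded.
OrderState state = history
    .Where(evt => evt.OrderId == "order-1")
    .Aggregate(OrderState.Empty, (current, evt) => current.Apply(evt));

Console.WriteLine(state.ItemCount); // prints 1

public abstract record OrderEvent(string OrderId);
public record ItemAdded(string OrderId, string Sku, int Quantity) : OrderEvent(OrderId);
public record ItemRemoved(string OrderId, string Sku, int Quantity) : OrderEvent(OrderId);
public record ShippingInfoAdded(string OrderId, string Address) : OrderEvent(OrderId);

public record OrderState(int ItemCount, string ShippingAddress)
{
    public static readonly OrderState Empty = new(0, "");

    // Each event produces a new state; the events themselves are never changed.
    public OrderState Apply(OrderEvent e) => e switch
    {
        ItemAdded a => this with { ItemCount = ItemCount + a.Quantity },
        ItemRemoved r => this with { ItemCount = ItemCount - r.Quantity },
        ShippingInfoAdded s => this with { ShippingAddress = s.Address },
        _ => this,
    };
}
```

The same fold could run on demand when handling a request, or in a scheduled task that stores the result as a materialized view.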

Figure 1 shows a logical overview of the pattern, including some of the options for using the event stream such as creating a materialized view, integrating events with external applications and systems, and replaying events to create projections of the current state of specific entities.

图1显示了该模式的逻辑概述,包括使用事件流的一些选项,如创建物化视图、将事件与外部应用程序和系统集成,以及重播事件以创建特定实体当前状态的投影。

Figure 1 - An overview and example of the Event Sourcing pattern

图1-事件源模式的概述和示例

The Event Sourcing pattern provides many advantages, including the following:

事件源模式提供了许多优点,包括以下内容:

  • Events are immutable and so can be stored using an append-only operation. The user interface, workflow, or process that initiated the action that produced the events can continue, and the tasks that handle the events can run in the background. This, combined with the fact that there is no contention during the execution of transactions, can vastly improve performance and scalability for applications, especially for the presentation level or user interface. 事件是不可变的,因此可以使用仅追加操作进行存储。启动产生事件的操作的用户界面、工作流或流程可以继续,处理事件的任务可以在后台运行。这与事务执行期间没有争用的事实相结合,可以极大地提高应用程序的性能和可伸缩性,特别是对于表示级别或用户界面
  • Events are simple objects that describe some action that occurred, together with any associated data required to describe the action represented by the event. Events do not directly update a data store; they are simply recorded for handling at the appropriate time. These factors can simplify implementation and management. 事件是描述发生的某些操作的简单对象,以及描述由事件表示的操作所需的任何关联数据。事件不会直接更新数据存储区; 它们只是被记录以便在适当的时候进行处理。这些因素可以简化实现和管理
  • Events typically have meaning for a domain expert, whereas the complexity of the object-relational impedance mismatch might mean that a database table may not be clearly understood by the domain expert. Tables are artificial constructs that represent the current state of the system, not the events that occurred. 事件通常对领域专家有意义,而对象关系不匹配的复杂性可能意味着领域专家可能无法清楚地理解数据库表。表是人工构造,表示系统的当前状态,而不是发生的事件
  • Event sourcing can help to prevent concurrent updates from causing conflicts because it avoids the requirement to directly update objects in the data store. However, the domain model must still be designed to protect itself from requests that might result in an inconsistent state. 事件源有助于防止并发更新引起冲突,因为它避免了直接更新数据存储中的对象的要求。但是,域模型的设计仍然必须保护自己不受可能导致不一致状态的请求的影响
  • The append-only storage of events provides an audit trail that can be used to monitor actions taken against a data store, regenerate the current state as materialized views or projections by replaying the events at any time, and assist in testing and debugging the system. In addition, the requirement to use compensating events to cancel changes provides a history of changes that were reversed, which would not be the case if the model simply stored the current state. The list of events can also be used to analyze application performance and detect user behavior trends, or to obtain other useful business information. 事件的只追加存储提供了一个审计跟踪,可用于监视针对数据存储采取的操作,通过随时重播事件将当前状态重新生成物化视图或预测,并协助测试和调试系统。此外,使用补偿事件来取消更改的要求提供了被逆转的更改的历史记录,如果模型只是存储当前状态,则不会出现这种情况。事件列表还可用于分析应用程序性能和检测用户行为趋势,或获取其他有用的业务信息
  • The decoupling of the events from any tasks that perform operations in response to each event raised by the event store provides flexibility and extensibility. For example, the tasks that handle events raised by the event store are aware only of the nature of the event and the data it contains. The way that the task is executed is decoupled from the operation that triggered the event. In addition, multiple tasks can handle each event. This may enable easy integration with other services and systems that need only listen for new events raised by the event store. However, the event sourcing events tend to be very low level, and it may be necessary to generate specific integration events instead. 事件与执行响应事件存储所引发的每个事件的操作的任何任务的分离提供了灵活性和可扩展性。例如,处理事件存储区引发的事件的任务只知道事件的性质及其包含的数据。任务的执行方式与触发事件的操作解耦。此外,多个任务可以处理每个事件。这样可以方便地与其他只需侦听事件存储引发的新事件的服务和系统集成。然而,事件源事件往往是非常低级的,可能需要生成特定的集成事件来代替

Note

注意

Event sourcing is commonly combined with the CQRS pattern by performing the data management tasks in response to the events, and by materializing views from the stored events.

事件源通常与 CQRS 模式结合在一起,通过执行响应事件的数据管理任务,以及实现来自存储事件的视图。

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • The system will only be eventually consistent when creating materialized views or generating projections of data by replaying events. There is some delay between an application adding events to the event store as the result of handling a request, the events being published, and consumers of the events handling them. During this period, new events that describe further changes to entities may have arrived at the event store.

    在创建物化视图或通过重播事件生成数据投影时,系统只能做到最终一致。从应用程序因处理请求而向事件存储添加事件,到事件被发布,再到事件的使用者处理这些事件,其间存在一定的延迟。在此期间,描述对实体进一步更改的新事件可能已经到达事件存储区。

    Note

    注意

    See the Data Consistency Primer for information about eventual consistency.

    有关最终一致性的信息,请参阅数据一致性入门。

  • The event store is the immutable source of information, and so the event data should never be updated. The only way to update an entity in order to undo a change is to add a compensating event to the event store, much as you would use a negative transaction in accounting. If the format (rather than the data) of the persisted events needs to change, perhaps during a migration, it can be difficult to combine existing events in the store with the new version. It may be necessary to iterate through all the events making changes so that they are compliant with the new format, or add new events that use the new format. Consider using a version stamp on each version of the event schema in order to maintain both the old and the new event formats.

    事件存储区是不可变的信息源,因此永远不应更新事件数据。更新实体以撤消更改的唯一方法是向事件存储添加补偿事件,就像在会计中使用负事务一样。如果持久化事件的格式(而不是数据)需要更改(可能是在迁移过程中) ,则很难将存储中的现有事件与新版本结合起来。可能需要迭代所有进行更改的事件,以便它们与新格式兼容,或者添加使用新格式的新事件。考虑在事件模式的每个版本上使用一个版本标记,以便同时维护旧的和新的事件格式。

  • Multi-threaded applications and multiple instances of applications may be storing events in the event store. The consistency of events in the event store is vital, as is the order of events that affect a specific entity (the order in which changes to an entity occur affects its current state). Adding a timestamp to every event is one option that can help to avoid issues. Another common practice is to annotate each event that results from a request with an incremental identifier. If two actions attempt to add events for the same entity at the same time, the event store can reject an event that matches an existing entity identifier and event identifier.

    多线程应用程序和多个应用程序实例可能在事件存储中存储事件。事件存储中事件的一致性是至关重要的,影响特定实体的事件顺序也是如此(对实体进行更改的顺序会影响其当前状态)。为每个事件添加时间戳是一种有助于避免问题的选项。另一种常见的做法是使用增量标识符对请求产生的每个事件进行注释。如果两个操作试图同时为同一实体添加事件,则事件存储区可以拒绝与现有实体标识符和事件标识符匹配的事件。

  • There is no standard approach, or ready-built mechanism such as SQL queries, for reading the events to obtain information. The only data that can be extracted is a stream of events, using an event identifier as the criterion. The event ID typically maps to individual entities. The current state of an entity can be determined only by replaying all of the events that relate to it against the original state of that entity.

    没有标准的方法或现成的机制(如 SQL 查询)来读取事件以获取信息。唯一可以提取的数据是使用事件标识符作为条件的事件流。事件 ID 通常映射到各个实体。实体的当前状态只能通过针对该实体的原始状态重播与其相关的所有事件来确定。

  • The length of each event stream can have consequences on managing and updating the system. If the streams are large, consider creating snapshots at specific intervals such as a specified number of events. The current state of the entity can be obtained from the snapshot and by replaying any events that occurred after that point in time.

    每个事件流的长度可能对管理和更新系统产生影响。如果流很大,请考虑以特定的间隔创建快照,例如指定数量的事件。可以从快照中获取实体的当前状态,也可以通过重播在该时间点之后发生的任何事件来获取。

    Note

    注意

    For more information about creating snapshots of data, see Snapshot on Martin Fowler’s Enterprise Application Architecture website and Master-Subordinate Snapshot Replication on MSDN.

    有关创建数据快照的详细信息,请参阅 Martin Fowler 的企业应用程序架构网站上的快照和 MSDN 上的主从快照复制。

  • Even though event sourcing minimizes the chance of conflicting updates to the data, the application must still be able to deal with inconsistencies that may arise through eventual consistency and the lack of transactions. For example, an event that indicates a reduction in stock inventory might arrive in the data store while an order for that item is being placed, resulting in a requirement to reconcile the two operations; probably by advising the customer or creating a back order.

    即使事件源最大限度地减少了数据更新冲突的可能性,应用程序仍然必须能够处理由于最终一致性和缺乏事务而可能出现的不一致。例如,一个表明库存减少的事件可能在下订单时到达数据存储区,导致需要协调这两个操作; 可能是通过建议客户或创建延迟订单。

  • Event publication may be “at least once,” and so consumers of the events must be idempotent. They must not reapply the update described in an event if the event is handled more than once. For example, if multiple instances of a consumer maintain an aggregate of a property of some entity, such as the total number of orders placed, only one must succeed in incrementing the aggregate when an “order placed” event occurs. While this is not an intrinsic characteristic of event sourcing, it is the usual implementation decision.

    事件发布可能是“至少一次”,因此事件的使用者必须是幂等的。如果事件处理多次,则不能重新应用事件中描述的更新。例如,如果使用者的多个实例维护某个实体的属性的聚合,例如已下订单的总数,那么在发生“已下订单”事件时,只有一个实例必须成功地增加该聚合。虽然这不是事件源的固有特征,但它是通常的实现决策。
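
Two of the points above, rejecting an event that reuses an existing entity identifier and event identifier, and keeping consumers idempotent under at-least-once delivery, can be sketched together. All names here (StoredEvent, VersionedEventStore, OrderCounter) are hypothetical, and the in-memory sets are an assumption: with multiple consumer instances, the record of handled events would need to live in shared storage.

```csharp
using System;
using System.Collections.Generic;

var store = new VersionedEventStore();
var counter = new OrderCounter();

// Two writers both believe the next event identifier for "conf-1" is 1.
bool first = store.TryAppend(new StoredEvent("conf-1", 1, "OrderPlaced"));
bool second = store.TryAppend(new StoredEvent("conf-1", 1, "OrderPlaced"));
Console.WriteLine($"{first} {second}"); // prints True False

// At-least-once delivery: the same event reaches the consumer twice.
var delivered = new StoredEvent("conf-1", 1, "OrderPlaced");
counter.Handle(delivered);
counter.Handle(delivered);
Console.WriteLine(counter.TotalOrders); // prints 1

public record StoredEvent(string EntityId, int Version, string Type);

public sealed class VersionedEventStore
{
    private readonly HashSet<(string, int)> _keys = new();
    private readonly List<StoredEvent> _events = new();
    private readonly object _gate = new();

    // Rejects an event whose (entity id, event id) pair already exists.
    public bool TryAppend(StoredEvent e)
    {
        lock (_gate)
        {
            if (!_keys.Add((e.EntityId, e.Version))) return false;
            _events.Add(e);
            return true;
        }
    }
}

public sealed class OrderCounter
{
    private readonly HashSet<(string, int)> _seen = new();
    public int TotalOrders { get; private set; }

    // Handling the same event twice leaves the total unchanged.
    public void Handle(StoredEvent e)
    {
        if (e.Type != "OrderPlaced") return;
        if (!_seen.Add((e.EntityId, e.Version))) return;
        TotalOrders++;
    }
}
```

The writer whose append is rejected would typically reload the stream, re-check its decision against the new events, and retry.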

When to Use this Pattern 何时使用此模式

This pattern is ideally suited to the following scenarios:

这种模式非常适合下列情况:

  • When you want to capture “intent,” “purpose,” or “reason” in the data. For example, changes to a customer entity may be captured as a series of specific event types such as Moved home, Closed account, or Deceased. 当您希望在数据中捕获“意图”、“目的”或“原因”时。例如,对客户实体的更改可以被捕获为一系列特定的事件类型,例如 Moved home(搬家)、Closed account(销户)或 Deceased(已故)
  • When it is vital to minimize or completely avoid the occurrence of conflicting updates to data. 在必须尽量减少或完全避免数据更新冲突的情况下
  • When you want to record events that occur, and be able to replay them to restore the state of a system; use them to roll back changes to a system; or simply as a history and audit log. For example, when a task involves multiple steps you may need to execute actions to revert updates and then replay some steps to bring the data back into a consistent state. 当您希望记录发生的事件并能够重播它们以恢复系统的状态时,可以使用它们回滚对系统的更改; 或者仅仅将其作为历史记录和审计日志。例如,当一个任务涉及多个步骤时,您可能需要执行操作来恢复更新,然后重播一些步骤来将数据恢复到一致的状态
  • When using events is a natural feature of the operation of the application, and requires little additional development or implementation effort. 当使用事件是应用程序操作的一个自然特性,并且几乎不需要额外的开发或实现工作
  • When you need to decouple the process of inputting or updating data from the tasks required to apply these actions. This may be to improve UI performance, or to distribute events to other listeners such as other applications or services that must take some action when the events occur. An example would be integrating a payroll system with an expenses submission website so that events raised by the event store in response to data updates made in the expenses submission website are consumed by both the website and the payroll system. 当需要将输入或更新数据的过程与应用这些操作所需的任务分离时。这可能是为了提高 UI 性能,也可能是为了将事件分发给其他侦听器,例如在事件发生时必须采取某些操作的其他应用程序或服务。一个例子是将发薪系统与费用提交网站结合起来,使网站和发薪系统都能使用活动存储根据费用提交网站上的数据更新提出的活动
  • When you want flexibility to be able to change the format of materialized models and entity data if requirements change, or—when used in conjunction with CQRS—you need to adapt a read model or the views that expose the data. 当您希望灵活性能够在需求发生变化时更改物化模型和实体数据的格式时,或者ーー当与 CQRS 一起使用时ーー需要调整读取模型或公开数据的视图
  • When the pattern is used in conjunction with CQRS, and either eventual consistency is acceptable while a read model is updated or the performance impact of rehydrating entities and data from an event stream is acceptable. 当此模式与 CQRS 结合使用,并且在更新读取模型期间可以接受最终一致性,或者可以接受从事件流重建(rehydrate)实体和数据所带来的性能影响时

This pattern might not be suitable in the following situations:

这种模式可能不适用于下列情况:

  • Small or simple domains, systems that have little or no business logic, or non-domain systems that naturally work well with traditional CRUD data management mechanisms. 小的或简单的域,很少或没有业务逻辑的系统,或者与传统 CRUD 数据管理机制自然协作良好的非域系统
  • Systems where consistency and real-time updates to the views of the data are required. 需要对数据视图进行一致性和实时更新的系统
  • Systems where audit trails, history, and capabilities to roll back and replay actions are not required. 不需要回滚和重播操作的审计跟踪、历史记录和能力的系统
  • Systems where there is only a very low occurrence of conflicting updates to the underlying data. For example, systems that predominantly add data rather than updating it. 对基础数据进行冲突性更新的发生率非常低的系统。例如,主要添加数据而不是更新数据的系统

Example 例子

A conference management system needs to track the number of completed bookings for a conference so that it can check whether there are seats still available when a potential attendee tries to make a new booking. The system could store the total number of bookings for a conference in at least two ways:

会议管理系统需要跟踪已完成的会议预订数量,以便在潜在与会者试图进行新的预订时检查是否还有座位可用。该系统至少可以通过两种方式存储会议的总预订量:

  • The system could store the information about the total number of bookings as a separate entity in a database that holds booking information. As bookings are made or cancelled, the system could increment or decrement this number as appropriate. This approach is simple in theory, but can cause scalability issues if a large number of attendees attempt to book seats during a short period of time, for example in the last day or so before the booking period closes. 该系统可以将有关预订总数的信息作为一个单独的实体存储在一个保存预订信息的数据库中。当预订完成或取消时,系统可以适当地增加或减少这个数字。这种方法在理论上很简单,但是如果大量的与会者试图在短时间内预订座位,就会导致可伸缩性问题,例如在预订期结束前的最后一天左右
  • The system could store information about bookings and cancellations as events held in an event store. It could then calculate the number of seats available by replaying these events. This approach can be more scalable due to the immutability of events. The system only needs to be able to read data from the event store, or to append data to the event store. Event information about bookings and cancellations is never modified. 系统可以将有关预订和取消的信息存储为事件存储中的事件。然后,它可以通过重播这些事件来计算可用座位的数量。由于事件的不可变性,这种方法可以更具可伸缩性。系统只需要能够从事件存储区读取数据,或者向事件存储区追加数据。有关预订和取消的事件信息永远不会被修改

Figure 2 shows how the seat reservation sub-system of the conference management system might be implemented by using event sourcing.

图2显示了如何通过使用事件源实现会议管理系统的座位预订子系统。

Figure 2 - Using event sourcing to capture information about seat reservations in a conference management system

图2-使用事件源获取会议管理系统中有关座位预订的信息

The sequence of actions for reserving two seats is as follows:

保留两个席位的行动顺序如下:

  1. The user interface issues a command to reserve seats for two attendees. The command is handled by a separate command handler (a piece of logic that is decoupled from the user interface and is responsible for handling requests posted as commands).

    用户界面发出一个命令,为两名与会者保留座位。该命令由一个单独的命令处理程序(与用户界面分离的逻辑片段,负责处理作为命令发布的请求)处理。

  2. An aggregate containing information about all reservations for the conference is constructed by querying the events that describe bookings and cancellations. This aggregate is called SeatAvailability, and is contained within a domain model that exposes methods for querying and modifying the data in the aggregate.

    通过查询描述预订和取消的事件,构建了一个包含会议所有预订信息的聚合。此聚合称为 SeatAvailability,包含在一个域模型中,该模型公开用于查询和修改聚合中的数据的方法。

    Note

    注意

    Some optimizations to consider are using snapshots (so that you don’t need to query and replay the full list of events to obtain the current state of the aggregate), and maintaining a cached copy of the aggregate in memory.

    需要考虑的一些优化包括使用快照(这样您就不需要查询和重播完整的事件列表来获得聚合的当前状态) ,以及在内存中维护聚合的缓存副本。

  3. The command handler invokes a method exposed by the domain model to make the reservations.

    命令处理程序调用域模型公开的方法来进行预订。

  4. The SeatAvailability aggregate records an event containing the number of seats that were reserved. The next time the aggregate applies events, all the reservations will be used to compute how many seats remain.

    SeatAvailability 聚合记录一个包含预留座位数的事件。下一次聚合应用事件时,所有的预订将被用来计算还剩下多少座位。

  5. The system appends the new event to the list of events in the event store.

    系统将新事件追加到事件存储区中的事件列表中。

If a user wishes to cancel a seat, the system follows a similar process except that the command handler issues a command that generates a seat cancellation event and appends it to the event store.

如果用户希望取消座位,系统会遵循类似的过程,只不过命令处理程序会发出一个命令,生成一个座位取消事件并将其附加到事件存储区。
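
The reservation and cancellation sequence can be sketched in code as follows. SeatAvailability is the aggregate named in the text; everything else (ReservationCommandHandler, SeatsReserved, SeatsCancelled, the method names, and the use of an in-memory list as the event store) is an assumption made for illustration and is not taken from the CQRS Journey code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var store = new List<SeatEvent>();   // stands in for the event store
var handler = new ReservationCommandHandler(store, totalSeats: 10);

handler.Reserve("conf-1", 2);        // step 1: a command from the UI
handler.Reserve("conf-1", 3);
handler.Cancel("conf-1", 1);

var availability = SeatAvailability.FromHistory(
    10, store.Where(evt => evt.ConferenceId == "conf-1"));
Console.WriteLine(availability.RemainingSeats); // prints 6

public abstract record SeatEvent(string ConferenceId, int Seats);
public record SeatsReserved(string ConferenceId, int Seats) : SeatEvent(ConferenceId, Seats);
public record SeatsCancelled(string ConferenceId, int Seats) : SeatEvent(ConferenceId, Seats);

public sealed class SeatAvailability
{
    public int RemainingSeats { get; private set; }

    private SeatAvailability(int totalSeats) => RemainingSeats = totalSeats;

    // Step 2: build the aggregate by replaying booking and cancellation events.
    public static SeatAvailability FromHistory(int totalSeats, IEnumerable<SeatEvent> history)
    {
        var aggregate = new SeatAvailability(totalSeats);
        foreach (var e in history) aggregate.Apply(e);
        return aggregate;
    }

    // Steps 3 and 4: validate the request and record the resulting event.
    public SeatsReserved MakeReservation(string conferenceId, int seats)
    {
        if (seats <= 0 || seats > RemainingSeats)
            throw new InvalidOperationException("Not enough seats available.");
        var e = new SeatsReserved(conferenceId, seats);
        Apply(e);
        return e;
    }

    public SeatsCancelled CancelReservation(string conferenceId, int seats)
    {
        var e = new SeatsCancelled(conferenceId, seats);
        Apply(e);
        return e;
    }

    private void Apply(SeatEvent e)
    {
        switch (e)
        {
            case SeatsReserved reserved: RemainingSeats -= reserved.Seats; break;
            case SeatsCancelled cancelled: RemainingSeats += cancelled.Seats; break;
        }
    }
}

public sealed class ReservationCommandHandler
{
    private readonly List<SeatEvent> _store;
    private readonly int _totalSeats;

    public ReservationCommandHandler(List<SeatEvent> store, int totalSeats)
    {
        _store = store;
        _totalSeats = totalSeats;
    }

    // Step 5: the new event is appended to the event store.
    public void Reserve(string conferenceId, int seats) =>
        _store.Add(Load(conferenceId).MakeReservation(conferenceId, seats));

    public void Cancel(string conferenceId, int seats) =>
        _store.Add(Load(conferenceId).CancelReservation(conferenceId, seats));

    private SeatAvailability Load(string conferenceId) =>
        SeatAvailability.FromHistory(
            _totalSeats, _store.Where(evt => evt.ConferenceId == conferenceId));
}
```

This sketch replays the full history on every command; the snapshot and caching optimizations mentioned in step 2 would replace the Load method.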

As well as providing more scope for scalability, using an event store also provides a complete history, or audit trail, of the bookings and cancellations for a conference. The events recorded in the event store are the definitive and only source of truth. There is no need to persist aggregates in any other way because the system can easily replay the events and restore the state to any point in time.

除了提供更大的可伸缩性范围外,使用事件存储还可以提供会议预订和取消的完整历史记录或审计线索。事件存储中记录的事件是确定的和唯一的真相来源。不需要以任何其他方式持久化聚合,因为系统可以轻松地重播事件并将状态恢复到任何时间点。

Note

注意

You can find more information about this example in the chapter Introducing Event Sourcing in the patterns & practices guide CQRS Journey on MSDN.

您可以在 MSDN 上的模式与实践指南 CQRS Journey 的介绍事件源一章中找到关于此示例的更多信息。

Related Patterns and Guidance 相关模式及指引

The following patterns and guidance may also be relevant when implementing this pattern:

下列模式和指南在实现此模式时也可能有用:

  • Command and Query Responsibility Segregation (CQRS) Pattern 命令和查询责任分离(CQRS)模式. The write store that provides the immutable source of information for a CQRS implementation is often based on an implementation of the Event Sourcing pattern. The Command and Query Responsibility Segregation pattern describes how to segregate the operations that read data in an application from the operations that update data by using separate interfaces. 为 CQRS 实现提供不可变信息源的写存储通常基于事件源模式的实现。命令和查询责任分离模式描述如何通过使用单独的接口,将应用程序中读取数据的操作与更新数据的操作分离。
  • Materialized View Pattern 实体化视图模式. The data store used in a system based on event sourcing is typically not well suited to efficient querying. Instead, a common approach is to generate pre-populated views of the data at regular intervals, or when the data changes. The Materialized View pattern shows how this can be achieved. 基于事件源的系统中使用的数据存储通常不太适合高效的查询。相反,一种常见的方法是定期生成数据的预填充视图,或者在数据发生变化时生成。物化视图模式显示了如何实现这一点。
  • Compensating Transaction Pattern 补偿事务模式. The existing data in an event sourcing store is not updated; instead new entries are added that transition the state of entities to the new values. To reverse a change, compensating entries are used because it is not possible to simply reverse the previous change. The Compensating Transaction pattern describes how to undo the work that was performed by a previous operation. 事件源存储中的现有数据不更新; 而是添加将实体状态转换为新值的新条目。要逆转更改,需要使用补偿条目,因为不可能简单地逆转以前的更改。补偿事务模式描述如何撤消以前操作执行的工作。
  • Data Consistency Primer 数据一致性入门. When using event sourcing with a separate read store or materialized views, the read data will not be immediately consistent; instead it will be only eventually consistent. The Data Consistency Primer summarizes the issues surrounding maintaining consistency over distributed data. 当使用带有单独读存储或物化视图的事件源时,读数据不会立即保持一致; 相反,它只会最终保持一致。数据一致性入门总结了围绕在分布式数据上维护一致性的问题。
  • Data Partitioning Guidance 数据分区指南. Data is often partitioned when using event sourcing in order to improve scalability, reduce contention, and optimize performance. The Data Partitioning Guidance describes how to divide data into discrete partitions, and the issues that can arise. 在使用事件源时,为了提高可伸缩性、减少争用和优化性能,数据通常是分区的。数据分区指南描述了如何将数据划分为离散的分区,以及可能出现的问题。

External Configuration Store Pattern 外部配置存储模式

  • Article文章
  • 08/26/2015 2015年8月26日
  • 11 minutes to read还有11分钟

Move configuration information out of the application deployment package to a centralized location. This pattern can provide opportunities for easier management and control of configuration data, and for sharing configuration data across applications and application instances.

将配置信息从应用程序部署包移到集中的位置。此模式可以提供更容易地管理和控制配置数据的机会,以及跨应用程序和应用程序实例共享配置数据的机会。

Context and Problem 背景与问题

The majority of application runtime environments include configuration information that is held in files deployed with the application, located within the application folders. In some cases it is possible to edit these files to change the behavior of the application after it has been deployed. However, in many cases, changes to the configuration require the application to be redeployed, resulting in unacceptable downtime and additional administrative overhead.

大多数应用程序运行时环境包含配置信息,这些信息保存在与应用程序一起部署的文件中,位于应用程序文件夹中。在某些情况下,可以编辑这些文件,以便在应用程序部署后更改其行为。但是,在许多情况下,对配置的更改需要重新部署应用程序,从而导致不可接受的停机时间和额外的管理开销。

Local configuration files also limit the configuration to a single application, whereas in some scenarios it would be useful to share configuration settings across multiple applications. Examples include database connection strings, UI theme information, or the URLs of queues and storage used by a related set of applications.

本地配置文件还将配置限制为单个应用程序,而在某些情况下,跨多个应用程序共享配置设置将非常有用。示例包括数据库连接字符串、 UI 主题信息或相关应用程序集使用的队列和存储的 URL。

Managing changes to local configurations across multiple running instances of the application, especially in a cloud-hosted scenario, may also be challenging. It may result in instances using different configuration settings while the update is being deployed.

跨应用程序的多个运行实例管理对本地配置的更改,特别是在云托管的场景中,也可能具有挑战性。在部署更新时,它可能导致实例使用不同的配置设置。

In addition, updates to applications and components may require changes to configuration schemas. Many configuration systems do not support different versions of configuration information.

此外,对应用程序和组件的更新可能需要更改配置模式。许多配置系统不支持不同版本的配置信息。

Solution 解决方案

Store the configuration information in external storage, and provide an interface that can be used to quickly and efficiently read and update configuration settings. The type of external store depends on the hosting and runtime environment of the application. In a cloud-hosted scenario it is typically a cloud-based storage service, but could be a hosted database or other system.

将配置信息存储在外部存储中,并提供一个可用于快速有效地读取和更新配置设置的接口。外部存储的类型取决于应用程序的托管和运行时环境。在云托管场景中,它通常是基于云的存储服务,但也可以是托管数据库或其他系统。

The backing store chosen for configuration information should be fronted by a suitable interface that provides consistent and easy to use access in a controlled way that enables reuse. Ideally, it should expose the information in a correctly typed and structured format. The implementation may also need to authorize users’ access in order to protect configuration data, and be flexible enough to allow multiple versions of the configuration (such as development, staging, or production, and multiple release versions of each one) to be stored.

选择用于配置信息的后备存储应该由一个合适的接口前置,该接口以一种可控的方式提供一致的、易于使用的访问,从而支持重用。理想情况下,它应该以正确类型和结构化的格式公开信息。实现可能还需要授权用户的访问,以保护配置数据,并且要足够灵活,允许存储配置的多个版本(如开发、登台或生产,以及每个版本的多个发布版本)。
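
One possible shape for such an interface is sketched below: settings are keyed by environment, release version, and name, and are returned as typed values. IConfigurationStore and InMemoryConfigurationStore are hypothetical names, the dictionary stands in for whatever backing store is chosen, and authorization is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

var store = new InMemoryConfigurationStore();
store.Set("staging", "1.0", "MaxRetries", "3");
store.Set("production", "1.0", "MaxRetries", "5");

IConfigurationStore config = store;
int retries = config.Get<int>("production", "1.0", "MaxRetries");
Console.WriteLine(retries); // prints 5

// Hypothetical interface; the guide does not prescribe these signatures.
public interface IConfigurationStore
{
    T Get<T>(string environment, string version, string key);
    void Set(string environment, string version, string key, string value);
}

public sealed class InMemoryConfigurationStore : IConfigurationStore
{
    // The dictionary stands in for a cloud storage service or database.
    private readonly Dictionary<(string, string, string), string> _settings = new();

    public void Set(string environment, string version, string key, string value) =>
        _settings[(environment, version, key)] = value;

    // Converts the stored string to the requested type for the caller.
    public T Get<T>(string environment, string version, string key)
    {
        string raw = _settings[(environment, version, key)];
        return (T)Convert.ChangeType(raw, typeof(T), CultureInfo.InvariantCulture);
    }
}
```

Callers depend only on the interface, so the backing store can be swapped without changing application code.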

Note

注意

Many built-in configuration systems read the data when the application starts up, and cache the data in memory to provide fast access and to minimize the impact on application performance. Depending on the type of backing store used, and the latency of this store, it might be advantageous to implement a caching mechanism within the external configuration store. For more information about implementing caching, see the Caching Guidance.

许多内置的配置系统在应用程序启动时读取数据,并在内存中缓存数据,以提供快速访问并尽量减少对应用程序性能的影响。根据所使用的后台存储的类型和此存储的延迟,在外部配置存储中实现缓存机制可能是有利的。有关实现缓存的详细信息,请参阅缓存指南。
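
A caching layer in front of a slow backing store could be as simple as the following sketch. CachingConfigurationReader is a hypothetical name, the backing store is represented by a delegate, and the class is not thread-safe as written; these are assumptions made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

int backingReads = 0;
var cached = new CachingConfigurationReader(
    key => { backingReads++; return "value-of-" + key; },  // simulated slow read
    TimeSpan.FromMinutes(5));

cached.Get("QueueUrl");
cached.Get("QueueUrl");
Console.WriteLine(backingReads); // prints 1

public sealed class CachingConfigurationReader
{
    private readonly Func<string, string> _readFromBackingStore;
    private readonly TimeSpan _timeToLive;
    private readonly Dictionary<string, (string Value, DateTimeOffset ExpiresAt)> _cache = new();

    public CachingConfigurationReader(Func<string, string> readFromBackingStore, TimeSpan timeToLive)
    {
        _readFromBackingStore = readFromBackingStore;
        _timeToLive = timeToLive;
    }

    public string Get(string key)
    {
        var now = DateTimeOffset.UtcNow;

        // Serve from the cache while the entry has not expired.
        if (_cache.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
            return entry.Value;

        // Otherwise read through to the backing store and refresh the entry.
        string value = _readFromBackingStore(key);
        _cache[key] = (value, now + _timeToLive);
        return value;
    }
}
```

The time-to-live bounds how stale a setting can become, which is the trade-off against the latency of the backing store.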

Figure 1 shows an overview of this pattern.

图1显示了此模式的概述。

Figure 1 - An overview of the External Configuration Store pattern with optional local cache

图1-带有可选本地缓存的外部配置存储模式的概述

Issues and Considerations 问题及考虑

Consider the following points when deciding how to implement this pattern:

在决定如何实现此模式时,请考虑以下几点:

  • Choose a backing store that offers acceptable performance, high availability, and robustness, and that can be backed up as part of the application maintenance and administration process. In a cloud-hosted application, using a cloud storage mechanism is usually a good choice to meet these requirements. 选择一个能够提供可接受的性能、高可用性和健壮性,并且可以作为应用程序维护和管理过程的一部分进行备份的后备存储。在云托管的应用程序中,使用云存储机制通常是满足这些需求的不错选择
  • Design the schema of the backing store to allow flexibility in the types of information it can hold. Ensure that it provides for all configuration requirements such as typed data, collections of settings, multiple versions of settings, and any other features that the applications using it may require. The schema should be easy to extend as requirements change in order to support additional settings. 设计后台存储的模式,以允许其可以保存的信息类型具有灵活性。确保它提供了所有配置要求,例如类型化数据、设置集合、设置的多个版本以及使用它的应用程序可能需要的任何其他特性。模式应该很容易随着需求的变化而扩展,以支持其他设置
  • Consider the physical capabilities of the backing store, how it relates to the way that configuration information is stored, and the effects on performance. For example, storing an XML document containing configuration information will require either the configuration interface or the application to parse the document in order to read individual settings, and will make updating a setting more complicated, though caching the settings can help to offset slower read performance. 考虑后台存储的物理能力、它与配置信息存储方式的关系以及对性能的影响。例如,存储包含配置信息的 XML 文档将需要配置接口或应用程序解析文档以读取单个设置,并且将使更新设置变得更复杂,尽管缓存设置可以帮助抵消较慢的读取性能
  • Consider how the configuration interface will permit control of the scope and inheritance of configuration settings. For example, it may be a requirement to scope configuration settings at the organization, application, and machine levels; to support delegation of control over access to different scopes; and to prevent or allow individual applications to override settings.
  • Ensure that the configuration interface can expose the configuration data in the required formats, such as typed values, collections, key/value pairs, or property bags. However, balance the capabilities of the API against its complexity so that it is useful yet as easy to use as possible.
  • Consider how the configuration store interface will behave when settings contain errors, or do not exist in the backing store. It may be appropriate to return default settings and log errors. Also consider aspects such as the case sensitivity of configuration setting keys or names, the storage and handling of binary data, and the ways that null or empty values are handled.
  • Consider how you will protect the configuration data to allow access only to the appropriate users and applications. This is likely to be a feature of the configuration store interface, but it is also necessary to ensure that the data in the backing store cannot be accessed directly without the appropriate permission. Ensure strict separation between the permissions required to read and to write configuration data. Also consider whether you need to encrypt some or all of the configuration settings, and how this will be implemented within the configuration store interface.
  • Keep in mind that centrally stored configurations, which change application behavior at runtime, are critically important and should be deployed, updated, and managed using the same mechanisms as application code. For example, changes that can affect more than one application must be carried out using a full test and staged deployment approach to ensure that the change is appropriate for all applications that use this configuration. If an administrator simply edits a setting to update one application, it could adversely impact other applications that use the same setting.
  • If an application caches configuration information, the application may need to be alerted if the configuration changes. It may be possible to implement an expiration policy over cached configuration data so that this information is automatically refreshed periodically and any changes are picked up and acted on. The Runtime Reconfiguration pattern described elsewhere in this guide may be relevant to your scenario.
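
The expiration policy mentioned in the last point could be sketched roughly as follows. This is only an illustration and is not part of the sample solution: the ExpiringSettingsCache class, its loader delegate, and the injectable clock are hypothetical names introduced here, and the whole cache is reloaded at once on the assumption that the settings set is small.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: caches all settings and reloads them from the
// backing store once they are older than a fixed time-to-live.
public sealed class ExpiringSettingsCache
{
    private readonly Func<Dictionary<string, string>> loader; // reads all settings from the store
    private readonly Func<DateTime> clock;                     // injectable so expiry can be tested
    private readonly TimeSpan timeToLive;
    private readonly object syncObject = new object();
    private Dictionary<string, string> cache;
    private DateTime loadedAt;

    public ExpiringSettingsCache(
        Func<Dictionary<string, string>> loader,
        TimeSpan timeToLive,
        Func<DateTime> clock = null)
    {
        this.loader = loader ?? throw new ArgumentNullException(nameof(loader));
        this.timeToLive = timeToLive;
        this.clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns the cached value, reloading the whole cache first if it has
    // never been loaded or has expired. Returns null for an unknown key.
    public string Get(string key)
    {
        lock (this.syncObject)
        {
            var now = this.clock();
            if (this.cache == null || now - this.loadedAt >= this.timeToLive)
            {
                this.cache = this.loader();
                this.loadedAt = now;
            }

            string value;
            return this.cache.TryGetValue(key, out value) ? value : null;
        }
    }
}
```

With a time-to-live of five minutes, a change made in the backing store becomes visible to callers of Get at most five minutes later, without any explicit notification.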

When to Use this Pattern

This pattern is ideally suited for:

  • Configuration settings that are shared between multiple applications and application instances, or where a standard configuration must be enforced across multiple applications and application instances.
  • Situations where the standard configuration system does not support all of the required configuration settings, such as storing images or complex data types.
  • A complementary store for some of the settings for applications, perhaps allowing applications to override some or all of the centrally stored settings.
  • A mechanism for simplifying administration of multiple applications, and optionally for monitoring use of configuration settings by logging some or all types of access to the configuration store.

Example

In a Microsoft Azure hosted application, a typical choice for storing configuration information externally is to use Azure storage. Azure storage is resilient, offers high performance, and is replicated three times with automatic failover to offer high availability. Azure tables provide a key/value store with the capability to use a flexible schema for the values. Azure blob storage provides a hierarchical container-based store that can hold any type of data in individually named blobs.

The following example shows how a configuration store can be implemented over Azure blob storage to store and expose configuration information. The BlobSettingsStore class abstracts blob storage for holding configuration information, and implements the ISettingsStore interface shown in the following code.

Note

This code is provided in the ExternalConfigurationStore.Cloud project in the ExternalConfigurationStore solution. This solution is available for download with this guidance.

C#

using System.Collections.Generic;

public interface ISettingsStore
{
  // Changes whenever any setting is modified.
  string Version { get; }
  Dictionary<string, string> FindAll();
  void Update(string key, string value);
}

This interface defines methods for retrieving and updating configuration settings held in the configuration store, and includes a version number that can be used to detect whether any configuration settings have been modified recently. When a configuration setting is updated, the version number changes. The BlobSettingsStore class uses the ETag property of the blob to implement versioning. The ETag property of a blob is updated automatically each time the blob is written.
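
To make the versioning contract concrete without Azure storage, the sketch below implements the same interface in memory. It is not the BlobSettingsStore from the sample: InMemorySettingsStore is a hypothetical stand-in, and its write counter only plays the role that the blob's ETag plays in the real store (an opaque token that differs after every write).

```csharp
using System;
using System.Collections.Generic;

// The interface from the sample, repeated so that the sketch compiles on its own.
public interface ISettingsStore
{
    string Version { get; }
    Dictionary<string, string> FindAll();
    void Update(string key, string value);
}

// Hypothetical in-memory stand-in for BlobSettingsStore.
public sealed class InMemorySettingsStore : ISettingsStore
{
    private readonly object syncObject = new object();
    private readonly Dictionary<string, string> settings = new Dictionary<string, string>();
    private long writeCount;

    // Stands in for the blob's ETag: the token changes on every write
    // and is left untouched by reads.
    public string Version
    {
        get
        {
            lock (this.syncObject)
            {
                return "v" + this.writeCount;
            }
        }
    }

    public Dictionary<string, string> FindAll()
    {
        lock (this.syncObject)
        {
            // Return a copy so that callers cannot modify the store's state.
            return new Dictionary<string, string>(this.settings);
        }
    }

    public void Update(string key, string value)
    {
        lock (this.syncObject)
        {
            this.settings[key] = value;
            this.writeCount++;
        }
    }
}
```

A caller that remembers the last Version it saw only needs to call FindAll again when the current Version differs from the remembered one.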

Note

By design, this simple solution exposes all configuration settings as string values rather than typed values.
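
Because the store returns strings, an application that needs typed values has to convert them itself. One possible approach, shown here only as a sketch, is a pair of extension methods that fall back to a default when a key is missing or the text cannot be parsed. SettingsConversionExtensions, GetInt32, and GetBoolean are hypothetical names and are not part of the sample solution.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helpers that layer typed access over string settings.
public static class SettingsConversionExtensions
{
    // Returns the setting parsed as an integer, or defaultValue when the
    // key is missing or the stored text is not a valid integer.
    public static int GetInt32(
        this IDictionary<string, string> settings, string key, int defaultValue)
    {
        string raw;
        int parsed;
        if (settings.TryGetValue(key, out raw) &&
            int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out parsed))
        {
            return parsed;
        }

        return defaultValue;
    }

    // Returns the setting parsed as a Boolean, or defaultValue when the
    // key is missing or the stored text is not "true" or "false".
    public static bool GetBoolean(
        this IDictionary<string, string> settings, string key, bool defaultValue)
    {
        string raw;
        bool parsed;
        if (settings.TryGetValue(key, out raw) && bool.TryParse(raw, out parsed))
        {
            return parsed;
        }

        return defaultValue;
    }
}
```

Parsing with the invariant culture keeps the result independent of the regional settings of the machine that reads the configuration.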

The ExternalConfigurationManager class provides a wrapper around a BlobSettingsStore object. An application can use this class to store and retrieve configuration information. This class uses the Microsoft Reactive Extensions library to expose any changes made to the configuration through an implementation of the IObservable interface. If a setting is modified by calling the SetAppSetting method, the Changed event is raised and all subscribers to this event will be notified.

Note that all settings are also cached in a Dictionary object inside the ExternalConfigurationManager class for fast access. The SetAppSetting method updates this cache, and the GetAppSetting method that an application can use to retrieve a configuration setting reads the data from the cache (if the setting is not found in the cache, it is retrieved from the BlobSettingsStore object instead).

On a cache miss, the GetAppSetting method invokes the CheckForConfigurationChanges method to detect whether the configuration information in blob storage has changed by examining the version number and comparing it with the current version number held by the ExternalConfigurationManager object. If one or more changes have occurred, the Changed event is raised and the configuration settings cached in the Dictionary object are refreshed. This is an application of the Cache-Aside pattern.

The following code sample shows how the Changed event, the SetAppSetting method, the GetAppSetting method, and the CheckForConfigurationChanges method are implemented:

C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public class ExternalConfigurationManager : IDisposable
{
  // An abstraction of the configuration store.
  private readonly ISettingsStore settings;
  private readonly ISubject<KeyValuePair<string, string>> changed;
  ...
  private Dictionary<string, string> settingsCache;
  private string currentVersion;
  ...
  public ExternalConfigurationManager(ISettingsStore settings, ...)
  {
    this.settings = settings;
    ...
  }
  ...
  public IObservable<KeyValuePair<string, string>> Changed
  {
    get { return this.changed.AsObservable(); }
  }
  ...
  public void SetAppSetting(string key, string value)
  {
    ...
    // Update the setting in the store.
    this.settings.Update(key, value);

    // Publish the event through the subject (the Changed property is read-only).
    this.changed.OnNext(
         new KeyValuePair<string, string>(key, value));

    // Refresh the settings cache.
    this.CheckForConfigurationChanges();
  }

  public string GetAppSetting(string key)
  {
    ...
    // Try to get the value from the settings cache.
    // If there is a miss, get the setting from the settings store.
    string value;
    if (this.settingsCache.TryGetValue(key, out value))
    {
      return value;
    }

    // Check for changes and refresh the cache.
    this.CheckForConfigurationChanges();

    return this.settingsCache[key];
  }
  ...
  private void CheckForConfigurationChanges()
  {
    try
    {
      // Assume that updates are infrequent. Lock to avoid
      // race conditions when refreshing the cache.
      lock (this.settingsSyncObject)
      {
        var latestVersion = this.settings.Version;

        // If the versions differ, the configuration has changed.
        if (this.currentVersion != latestVersion)
        {
          // Get the latest settings from the settings store and publish the changes.
          var latestSettings = this.settings.FindAll();
          latestSettings.Except(this.settingsCache).ToList().ForEach(
                                kv => this.changed.OnNext(kv));

          // Update the current version.
          this.currentVersion = latestVersion;

          // Refresh settings cache.
          this.settingsCache = latestSettings;
        }
      }
    }
    catch (Exception ex)
    {
      this.changed.OnError(ex);
    }
  }
}
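
The change-detection step in CheckForConfigurationChanges relies on the LINQ Except operator comparing KeyValuePair values by both key and value. The sketch below isolates that step so its behavior is easier to see. SettingsDiff and FindChanges are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that isolates the comparison used when refreshing the cache.
public static class SettingsDiff
{
    // Returns the entries of 'latest' that are new or whose value differs
    // from the entry with the same key in 'cached'. KeyValuePair is a struct,
    // so Except compares both the key and the value of each entry.
    public static List<KeyValuePair<string, string>> FindChanges(
        Dictionary<string, string> cached,
        Dictionary<string, string> latest)
    {
        return latest.Except(cached).ToList();
    }
}
```

One consequence of this approach is that a setting removed from the store produces no notification, because Except only reports entries that are present in the latest settings.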

Note

The ExternalConfigurationManager class also provides a property named Environment. The purpose of this property is to support varying configurations for an application running in different environments, such as staging and production.

An ExternalConfigurationManager object can also query the BlobSettingsStore object periodically for any changes (by using a timer). The StartMonitor and StopMonitor methods illustrated in the code sample below start and stop the timer. The OnTimerElapsed method runs when the timer expires and invokes the CheckForConfigurationChanges method to detect any changes and raise the Changed event, as described earlier.

C#

using System.Diagnostics;
using System.Timers;

public class ExternalConfigurationManager : IDisposable
{
  ...
  private readonly ISubject<KeyValuePair<string, string>> changed;
  private readonly Timer timer;
  private ISettingsStore settings;
  ...
  public ExternalConfigurationManager(ISettingsStore settings,
                                      TimeSpan interval, ...)
  {
    ...
    // Set up the timer. AutoReset is false so that the timer fires once
    // and is restarted explicitly after each check completes.
    this.timer = new Timer(interval.TotalMilliseconds)
    {
      AutoReset = false
    };
    this.timer.Elapsed += this.OnTimerElapsed;
    this.changed = new Subject<KeyValuePair<string, string>>();
    ...
  }
  ...
  public void StartMonitor()
  {
    if (this.timer.Enabled)
    {
      return;
    }

    lock (this.timerSyncObject)
    {
      if (this.timer.Enabled)
      {
        return;
      }

      this.keepMonitoring = true;

      // Load the local settings cache.
      this.CheckForConfigurationChanges();

      this.timer.Start();
    }
  }

  public void StopMonitor()
  {
    lock (this.timerSyncObject)
    {
      this.keepMonitoring = false;
      this.timer.Stop();
    }
  }

  private void OnTimerElapsed(object sender, EventArgs e)
  {
    Trace.TraceInformation(
          "Configuration Manager: checking for configuration changes.");

    try
    {
      this.CheckForConfigurationChanges();
    }
    finally
    {
      ...
      // Restart the timer after each interval.
      this.timer.Start();
      ...
    }
  }
  ...
}

The ExternalConfigurationManager class is instantiated as a singleton instance by the ExternalConfiguration class shown below.

C#

public static class ExternalConfiguration
{
  private static readonly Lazy<ExternalConfigurationManager> configuredInstance
                            = new Lazy<ExternalConfigurationManager>(
    () =>
    {
      var environment = CloudConfigurationManager.GetSetting("environment");
      return new ExternalConfigurationManager(environment);
    });

  public static ExternalConfigurationManager Instance
  {
    get { return configuredInstance.Value; }
  }
}

The following code is taken from the WorkerRole class in the ExternalConfigurationStore.Cloud project. It shows how the application uses the ExternalConfiguration class to read and update a setting.

C#

public override void Run()
{
  // Start monitoring for configuration changes.
  ExternalConfiguration.Instance.StartMonitor();

  // Get a setting.
  var setting = ExternalConfiguration.Instance.GetAppSetting("setting1");
  Trace.TraceInformation("Worker Role: Get setting1, value: " + setting);

  Thread.Sleep(TimeSpan.FromSeconds(10));

  // Update a setting.
  Trace.TraceInformation("Worker Role: Updating configuration");
  ExternalConfiguration.Instance.SetAppSetting("setting1", "new value");

  this.completeEvent.WaitOne();
}

The following code, also from the WorkerRole class, shows how the application subscribes to configuration events.

C#

public override bool OnStart()
{
  ...
  // Subscribe to the event.
  ExternalConfiguration.Instance.Changed.Subscribe(
     m => Trace.TraceInformation("Configuration has changed. Key:{0} Value:{1}",
          m.Key, m.Value),
     ex => Trace.TraceError("Error detected: " + ex.Message));
  ...
}
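
The subscription above depends on the Reactive Extensions library. The underlying publish/subscribe contract can also be illustrated using only the IObservable and IObserver interfaces from the base class library. The sketch below does that; it is a simplified substitute for the Rx Subject used by the sample, not the sample's implementation, and SettingChangeNotifier, Publish, and RecordingObserver are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the Rx Subject used by the sample.
public sealed class SettingChangeNotifier : IObservable<KeyValuePair<string, string>>
{
    private readonly object syncObject = new object();
    private readonly List<IObserver<KeyValuePair<string, string>>> observers =
        new List<IObserver<KeyValuePair<string, string>>>();

    public IDisposable Subscribe(IObserver<KeyValuePair<string, string>> observer)
    {
        if (observer == null) throw new ArgumentNullException(nameof(observer));
        lock (this.syncObject) { this.observers.Add(observer); }
        return new Subscription(this, observer);
    }

    // Notifies every current subscriber that a setting has changed.
    public void Publish(string key, string value)
    {
        IObserver<KeyValuePair<string, string>>[] snapshot;
        lock (this.syncObject) { snapshot = this.observers.ToArray(); }
        foreach (var observer in snapshot)
        {
            observer.OnNext(new KeyValuePair<string, string>(key, value));
        }
    }

    private sealed class Subscription : IDisposable
    {
        private readonly SettingChangeNotifier owner;
        private readonly IObserver<KeyValuePair<string, string>> observer;

        public Subscription(
            SettingChangeNotifier owner,
            IObserver<KeyValuePair<string, string>> observer)
        {
            this.owner = owner;
            this.observer = observer;
        }

        // Disposing the subscription stops further notifications.
        public void Dispose()
        {
            lock (this.owner.syncObject) { this.owner.observers.Remove(this.observer); }
        }
    }
}

// Hypothetical observer that records every change it receives.
public sealed class RecordingObserver : IObserver<KeyValuePair<string, string>>
{
    public List<KeyValuePair<string, string>> Received { get; } =
        new List<KeyValuePair<string, string>>();

    public void OnNext(KeyValuePair<string, string> value) { this.Received.Add(value); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

Disposing the object returned by Subscribe removes the observer, which is also how a subscriber to the Changed property would normally stop receiving notifications.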

Related Patterns and Guidance

The following pattern may also be relevant when implementing this pattern:

  • Runtime Reconfiguration Pattern. In addition to storing configuration externally, it is useful to be able to update configuration settings and have the changes applied without restarting the application. The Runtime Reconfiguration pattern describes how to design an application so that it can be reconfigured without requiring redeployment or restarting.