1.前言
上一篇我们讨论了云计算设计模式之领导选拔模式,讨论了分布式环境中的多任务处理系统如何协调任务执行顺序和共享资源调度.这一篇我们来讨论下通过数据预处理来提升数据查询性能的模式,这种方式与之前我们提到的索引表模式的目的是一致的,都是优化数据查询性能.
2.概念
在SQL Server 数据库中,数据读取存在逻辑读和物理读的概念.逻辑读是从缓存中进行读取,而物理读是从硬盘上读取.通常展示在界面上的数据来自来自于数据库中多张表连接查询的结果.如果连接表很多,并且数据量很大,那么每次查询性能势必非常低下,那么如何解决这个问题呢?
官方的做法是建立一个将要展示在界面上的数据模型,通过Task轮询的方式把需要展示的数据生成存入该模型中,页面上查询的时候,直接从这个数据表中读取数据,而不必再从数据库中做多表联合查询,这样就能够提升性能.如下图所示:
如上图所示,通过Task提前计算并存储查询需要的数据,存储数据是只读模式,没有其他入口可以修改数据.
在使用这个模式的时候有以下几个问题需要考虑:
1)需要考虑何时及如何更新View中的数据。
2)在使用了事件溯源模式(Event sourcing Pattern)的系统中,存储的是针对数据进行操作的时间,这时候就需要一个View来表示数据的当前状态了,使用这种模式最恰当了.
3)需要考虑View中的数据一致性的影响,并且需要考虑何时更新View,特别是通过消息队列来更新View.
4)考虑View应该存储在哪里。View可以与原始数据存储在不同的地方。
5)如果Materialized View仅仅只是为了提升查询性能,表现当前数据的状态,那么View最好存储在缓存中或者其他瞬时存在的位置,如果丢失则不可重建。
6)View中的数据项尽可能全,在查询的时候能够兼顾到尽量多的查询情形.
7)如果查询机制允许,尽量为View创建索引表,具体参见索引表模式。
关于何时使用及何时不应该使用这种模式,官方说法如下:
This pattern is ideally suited for:
- Creating materialized views over data that is difficult to query directly, or where queries must be very complex in order to extract data that is stored in a normalized, semi-structured, or unstructured way.
- Creating temporary views that can dramatically improve query performance, or can act directly as source views or data transfer objects (DTOs) for the UI, for reporting, or for display.
- Supporting occasionally connected or disconnected scenarios where connection to the data store is not always available. The view may be cached locally in this case.
- Simplifying queries and exposing data for experimentation in a way that does not require knowledge of the source data format. For example, by joining different tables in one or more databases, or one or more domains in NoSQL stores, and then formatting the data to suit its eventual use.
- Providing access to specific subsets of the source data that, for security or privacy reasons, should not be generally accessible, open to modification, or fully exposed to users.
- Bridging the disjoint when using different data stores based on their individual capabilities. For example, by using a cloud store that is efficient for writing as the reference data store, and a relational database that offers good query and read performance to hold the materialized views.
This pattern might not be suitable in the following situations:
- The source data is simple and easy to query.
- The source data changes very quickly, or can be accessed without using a view. The processing overhead of creating views may be avoidable in these cases.
- Consistency is a high priority. The views may not always be fully consistent with the original data.
下面的图很清晰地描述了这种模式:
4.相关阅读
The following patterns and guidance may also be relevant when implementing this pattern:
- Data Consistency Primer. It is necessary to maintain the summary information held in a materialized view so that it reflects the underlying data values. As the data values change, it may not be feasible to update the summary data in real time, and instead an eventually consistent approach must be adopted. The Data Consistency primer summarizes the issues surrounding maintaining consistency over distributed data, and describes the benefits and tradeoffs of different consistency models.
- Command and Query Responsibility Segregation (CQRS) Pattern. You may be able to use this pattern to update the information in a materialized view by responding to events that occur when the underlying data values change.
- Event Sourcing Pattern. You can use this pattern in conjunction with the CQRS pattern to maintain the information in a materialized view. When the data values on which a materialized view is based are modified, the system can raise events that describe these modifications and save them in an event store.
- Index Table Pattern. The data in a materialized view is typically organized by a primary key, but queries may need to retrieve information from this view by examining data in other fields. You can use the Index Table pattern to create secondary indexes over data sets for data stores that do not support native secondary indexes.