6. Issues and considerations
Consider the following points when deciding how to implement the Sharding pattern:
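1) Sharding complements other forms of partitioning, such as vertical partitioning and functional partitioning. For example, a single shard can hold data that has already been partitioned vertically, and a functional partition can be implemented as multiple shards. If this sounds abstract, see the Data Partitioning Guidance for background on vertical and functional partitioning.
2) Keep the shards balanced so that each one handles a similar volume of I/O. Because data is continually inserted and deleted, the shards must be rebalanced periodically to keep the load evenly distributed and to reduce the chance that some shards become performance bottlenecks (hotspots).
3) Use stable data for the shard key. The key used by the shard mapping function should almost never change. Otherwise, changes trigger large-scale data movement between shards, which adds significant work, hurts performance, and may introduce errors.
4) Make sure shard keys are unique. Avoid auto-generated values (such as auto-incrementing fields) as the shard key: they may not be coordinated across shards, so items in different shards can end up with the same key and the correct shard can no longer be located.
5) It is unrealistic to design a shard mapping strategy that satisfies every possible query. It is enough for the strategy to support the most common queries.
6) Design the sharding scheme so that a query touches a single shard wherever possible. Querying one shard is far more efficient than running a combined query across several shards.
7) When a query has to span several shards, run it as parallel tasks, one per shard, and combine (union) the individual results into the final result. This adds some complexity, but the performance gain is usually significant.
8) For most applications, many small shards are more efficient and scale better than a few large ones.
9) Make sure the resources each shard depends on can support rapid scaling. Keeping shards loosely coupled to those resources makes scaling out easier.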
10) Consider replicating reference data to all shards. If an operation that retrieves data from a shard also references static or slow-moving data as part of the same query, add this data to the shard. The application can then fetch all of the data for the query easily, without having to make an additional round trip to a separate data store. For example, a small lookup table that most queries join against could be copied into every shard.
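11) Consider how to keep data consistent across different shards.
12) Consider the complexity of configuring and managing a large number of shards.
13) Shards can be divided by geographic location, so that a shard can be placed close to the users who access its data.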
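Points 2, 3 and 6 above can be illustrated with a small sketch. It is not part of the original pattern text: the names ShardRouter and GetShardIndex and the "tenant-N" keys are made up for illustration. It assumes a hash-based strategy in which a stable key is hashed to choose a shard, and then counts how many keys land on each shard to check how even the spread is.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ShardRouter
{
    // Maps a stable key (for example, a tenant ID) to a shard index.
    // string.GetHashCode() is avoided because it is not guaranteed to
    // return the same value across processes or runtime versions.
    public static int GetShardIndex(string shardKey, int shardCount)
    {
        if (shardKey == null) throw new ArgumentNullException(nameof(shardKey));
        if (shardCount <= 0) throw new ArgumentOutOfRangeException(nameof(shardCount));

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(shardKey));
            // Combine the first four bytes explicitly so the result does not
            // depend on the endianness of the machine.
            uint value = ((uint)hash[0] << 24) | ((uint)hash[1] << 16)
                       | ((uint)hash[2] << 8) | hash[3];
            return (int)(value % (uint)shardCount);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const int shardCount = 4;
        var keys = Enumerable.Range(1, 1000).Select(i => "tenant-" + i);

        // Count how many keys land on each shard to check the spread.
        var counts = new int[shardCount];
        foreach (var key in keys)
        {
            counts[ShardRouter.GetShardIndex(key, shardCount)]++;
        }

        for (int i = 0; i < shardCount; i++)
        {
            Console.WriteLine("Shard {0}: {1} keys", i, counts[i]);
        }
    }
}
```

Because the same key always maps to the same index, a query for one tenant only needs to touch one shard. One caveat of this simple modulo scheme is that changing the shard count remaps most keys, so it suits a fixed number of shards better than one that grows often.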
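7. When to use this pattern
1) The data store needs more performance than a single server can provide.
2) Performance needs to be improved by reducing contention in the data store.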
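8. Example
The following example uses several databases to simulate several shards. The GetShards method returns a collection of ShardInformation objects, each of which holds details such as the database connection string.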
    private IEnumerable<ShardInformation> GetShards()
    {
        // This retrieves the connection information from a shard store
        // (commonly a root database).
        return new[]
        {
            new ShardInformation
            {
                Id = 1,
                ConnectionString = ...
            },
            new ShardInformation
            {
                Id = 2,
                ConnectionString = ...
            }
        };
    }
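The sample does not show the ShardInformation type itself. A minimal sketch, assuming it only needs the two properties that the sample uses, might look like this. The real type may well carry more fields.

```csharp
// Hypothetical shape of ShardInformation, inferred from how the sample
// uses it. Only Id and ConnectionString are assumed here.
public class ShardInformation
{
    // Identifier used in trace output to tell the shards apart.
    public int Id { get; set; }

    // Connection string of the database that backs this shard.
    public string ConnectionString { get; set; }
}
```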
The following code shows how the application uses the ShardInformation collection to query the shards in parallel. The query results are stored in a ConcurrentBag object.
    // Requires: using System.Collections.Concurrent; using System.Data.SqlClient;
    //           using System.Diagnostics; using System.Threading.Tasks;

    // Retrieve the shards as a collection of ShardInformation instances.
    var shards = GetShards();
    var results = new ConcurrentBag<string>();

    // Execute the query against each shard in the shard list.
    // This list would typically be retrieved from configuration
    // or from a root/master shard store.
    Parallel.ForEach(shards, shard =>
    {
        // NOTE: Transient fault handling is not included,
        // but should be incorporated when used in a real world application.
        using (var con = new SqlConnection(shard.ConnectionString))
        {
            con.Open();
            using (var cmd = new SqlCommand("SELECT ... FROM ...", con))
            {
                Trace.TraceInformation("Executing command against shard: {0}", shard.Id);

                // Dispose the reader as soon as the rows have been consumed.
                using (var reader = cmd.ExecuteReader())
                {
                    // Read the results into a thread-safe data structure.
                    while (reader.Read())
                    {
                        results.Add(reader.GetString(0));
                    }
                }
            }
        }
    });

    Trace.TraceInformation("Fanout query complete - Record Count: {0}",
        results.Count);
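The sample notes that transient fault handling is left out. One possible way to add it is a small retry helper with exponential backoff, sketched below. The Retry class, its Execute method, and the isTransient callback are hypothetical names introduced here, not part of the original sample or of any library. Which exceptions count as transient is left to the caller.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation and retries it when isTransient returns true for
    // the thrown exception, up to maxAttempts attempts in total.
    // The delay between attempts doubles each time.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts = 3, int initialDelayMs = 200)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        int delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait before the next attempt. On the last attempt the filter
                // is false, so the exception propagates to the caller.
                Thread.Sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

In the fan-out query, the body of the Parallel.ForEach lambda could be wrapped in a call to Retry.Execute, with a callback that decides, for example by inspecting a SqlException, whether the failure is worth retrying. Rows should then be collected into a local list and added to the shared ConcurrentBag only after an attempt succeeds, so that a retry does not add the same rows twice.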
9. Related reading
The following patterns and guidance may also be relevant when implementing this pattern:
- Data Consistency Primer. It may be necessary to maintain consistency for data distributed across different shards. The Data Consistency Primer summarizes the issues surrounding maintaining consistency over distributed data, and describes the benefits and tradeoffs of different consistency models.
- Data Partitioning Guidance. Sharding a data store can introduce a range of additional issues. The Data Partitioning Guidance describes these issues in relation to partitioning data stores in the cloud to improve scalability, reduce contention, and optimize performance.
- Index Table Pattern. Sometimes it is not possible to completely support queries just through the design of the shard key. The Index Table pattern enables an application to quickly retrieve data from a large data store by specifying a key other than the shard key.
- Materialized View Pattern. To maintain the performance of some query operations, it may be beneficial to create materialized views that aggregate and summarize data, especially if this summary data is based on information that is distributed across shards. The Materialized View pattern describes how to generate and populate these views.
- The article Shard Lessons on the Adding Simplicity blog.
- The page Database Sharding on the CodeFutures web site.
- The article Scalability Strategies Primer: Database Sharding on Max Indelicato's blog.
- The article Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes on Dare Obasanjo's blog.