How to implement MySQL Sharding

本文主要告诉你如何选择需要做sharding的表和如何选择sharding key。

Database Sharding has proven itself a very successful strategy for scaling relational databases. Almost every large web-site/SaaS solution uses sharding when writing to its relational database. The reason is pretty simple – relational database technology is showing its age and just can’t meet today’s requirements: a massive number of operations/second, a lot of open connections (since there are many application servers talking to the database), huge amounts of data, and a very high write ratio (anything over 10% is high when it comes to relational databases).

Many sites and blogs posts explain what sharding is, for example here and here. But how do you shard your application? Actually, the flow is quite simple, and consists of just four steps:

  1. Analyze your schema to find the best sharding configuration.
  2. Start multiple database instances.
  3. Split the data between the databases according to the sharding configuration.
  4. Update your application code to support your sharding configuration.

(Alas – it’s incredibly difficult to implement sharding.)

Required Data

To build a sharding configuration, you’ll need the following data:

  1. Table list and size
    Big tables are great candidates for sharding since usually many SQL commands are executed on them (otherwise they wouldn’t be so big).
  2. List of foreign keys
    Foreign keys can help you understand dependencies between tables. Tables that depend on one another must be divided in a smart enough way, otherwise, your data integrity will be lost.
  3. SQL Query log (preferably one that was gathered after running a complete coverage test of the application).
    Some SQL operations are incredibly difficult to implement in a sharded environment (we’ll talk about this in our third post). You need the SQL log to understand whether you can implement sharding on some tables. The SQL log can also give a good indication of which tables are accessed heavily and which are not.

If you need instructions on how to collect this information, you can read our documentation here.

Of course, this section contains an assumption: that the database is well defined and already contains data. This is the easiest course of action. If you have a new database, and you are not sure yet how many rows each table will contain or what the SQL query log will show, you’ll just have to make some kind of educated guess.

Table Policies

Before you begin to implement sharding, it’s important to understand that not every table in the schema will be sharded. Since sharding limits your SQL capabilities (no join between sharded tables, uniqueness, auto-increment columns, etc.), you will enforce limits on your application that will be very difficult to overcome in code.

Usually, some tables will just be replicated across all the shards. As a matter of fact, most tables will be replicated (in ScaleBase, for the sake of discussion, we call these tables Global tables), and only some tables will be sharded. You can read more about table types here.

Deciding which Tables to Shard

The algorithm for choosing which tables to shard is not a very complex one:

  1. Look at the biggest tables in your schema.
    There is no exact number of tables to look at. Most schemas have several big tables, and the rest are very small. Table sizes are not divided evenly.
  2. Look at the SQL log – are there joins between those tables?
    1. If there are, take the smaller table, and make it a global table.
    2. If not, those tables will serve as your shard environment.
    3. Look at the SQL log
      1. If the tables are not accessed frequently, make them global tables.
      2. Look at the most accessed tables (especially those with many write commands) and mark them as sharded. Go to step 2 to make sure they can be sharded.

Choose the sharding key

Once you have decided which tables should be sharded (all the rest should be global tables), the choice of sharding keys is rather straightforward, as most will use the table primary key as the shard key. Of course, if multiple tables are sharded, and there is a foreign key relationship between these tables, then the foreign key will serve as the shard key for some tables.

Many people attempt to shard based on customer_id or a resource id, but I have seen how this usually fails in production environments. It is very hard to know in advance which customers belong together in the same database, and since customers can suddenly increase their traffic, this might create an unbalanced situation in which some shards are very busy while others are relaxed (see the details of last year’s FourSquare outage for some possible results of unbalanced sharding).

As with database partitioning, there are multiple algorithms available for sharding: hash , list, or range. 

Range partitioning 
Selects a partition by determining if the partitioning key is inside a certain range. An example could be a partition for all  rows where the  column  zipcode has a value between  70000 and  79999.
List partitioning 
A partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example all rows where the column  Country is either  IcelandNorway, SwedenFinland or  Denmark could build a partition for the  Nordic countries.
Hash partitioning 
The value of a  hash function determines membership in a partition. Assuming there are four partitions, the hash function could return a value from 0 to 3.

Usually you’ll use list and range for multi-tenancy – saving customer information across different databases and maybe even different data centers. I’ll touch on that subject in a future post. But hash will probably give you the best results when it comes to sharding, as statistically it ensures that data is evenly distributed across all shards.


Ref:  http://www.scalebase.com/how-to-implement-mysql-sharding/

http://www.scalebase.com/how-to-implement-mysql-sharding-%E2%80%93-part-2/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值