How to implement MySQL Sharding

最新推荐文章于 2021-06-17 11:55:22 发布

macyang

最新推荐文章于 2021-06-17 11:55:22 发布

阅读量2.1k

点赞数

分类专栏： mysql 文章标签： sharding mysql database application table sql

本文链接：https://blog.csdn.net/macyang/article/details/7184415

版权

mysql 专栏收录该内容

11 篇文章 0 订阅

订阅专栏

本文主要告诉你如何选择需要做sharding的表和如何选择sharding key。

Database Sharding has proven itself a very successful strategy for scaling relational databases. Almost every large web-site/SaaS solution uses sharding when writing to its relational database. The reason is pretty simple – relational database technology is showing its age and just can’t meet today’s requirements: a massive number of operations/second, a lot of open connections (since there are many application servers talking to the database), huge amounts of data, and a very high write ratio (anything over 10% is high when it comes to relational databases).

Many sites and blogs posts explain what sharding is, for example here and here. But how do you shard your application? Actually, the flow is quite simple, and consists of just four steps:

Analyze your schema to find the best sharding configuration.
Start multiple database instances.
Split the data between the databases according to the sharding configuration.
Update your application code to support your sharding configuration.

(Alas – it’s incredibly difficult to implement sharding.)

Required Data

To build a sharding configuration, you’ll need the following data:

Table list and size
Big tables are great candidates for sharding since usually many SQL commands are executed on them (otherwise they wouldn’t be so big).
List of foreign keys
Foreign keys can help you understand dependencies between tables. Tables that depend on one another must be divided in a smart enough way, otherwise, your data integrity will be lost.
SQL Query log (preferably one that was gathered after running a complete coverage test of the application).
Some SQL operations are incredibly difficult to implement in a sharded environment (we’ll talk about this in our third post). You need the SQL log to understand whether you can implement sharding on some tables. The SQL log can also give a good indication of which tables are accessed heavily and which are not.

If you need instructions on how to collect this information, you can read our documentation here.

Of course, this section contains an assumption: that the database is well defined and already contains data. This is the easiest course of action. If you have a new database, and you are not sure yet how many rows each table will contain or what the SQL query log will show, you’ll just have to make some kind of educated guess.

Table Policies

Before you begin to implement sharding, it’s important to understand that not every table in the schema will be sharded. Since sharding limits your SQL capabilities (no join between sharded tables, uniqueness, auto-increment columns, etc.), you will enforce limits on your application that will be very difficult to overcome in code.

Usually, some tables will just be replicated across all the shards. As a matter of fact, most tables will be replicated (in ScaleBase, for the sake of discussion, we call these tables Global tables), and only some tables will be sharded. You can read more about table types here.

Deciding which Tables to Shard

The algorithm for choosing which tables to shard is not a very complex one:

Look at the biggest tables in your schema.
There is no exact number of tables to look at. Most schemas have several big tables, and the rest are very small. Table sizes are not divided evenly.
Look at the SQL log – are there joins between those tables?
1. If there are, take the smaller table, and make it a global table.
2. If not, those tables will serve as your shard environment.
3. Look at the SQL log
  1. If the tables are not accessed frequently, make them global tables.
  2. Look at the most accessed tables (especially those with many write commands) and mark them as sharded. Go to step 2 to make sure they can be sharded.

Choose the sharding key

Once you have decided which tables should be sharded (all the rest should be global tables), the choice of sharding keys is rather straightforward, as most will use the table primary key as the shard key. Of course, if multiple tables are sharded, and there is a foreign key relationship between these tables, then the foreign key will serve as the shard key for some tables.

Many people attempt to shard based on customer_id or a resource id, but I have seen how this usually fails in production environments. It is very hard to know in advance which customers belong together in the same database, and since customers can suddenly increase their traffic, this might create an unbalanced situation in which some shards are very busy while others are relaxed (see the details of last year’s FourSquare outage for some possible results of unbalanced sharding).

As with database partitioning, there are multiple algorithms available for sharding: hash , list, or range.

Range partitioning

Selects a partition by determining if the partitioning key is inside a certain range. An example could be a partition for all rows where the column zipcode has a value between 70000 and 79999.

List partitioning

A partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example all rows where the column Country is either Iceland, Norway, Sweden, Finland or Denmark could build a partition for the Nordic countries.

Hash partitioning

The value of a hash function determines membership in a partition. Assuming there are four partitions, the hash function could return a value from 0 to 3.

Usually you’ll use list and range for multi-tenancy – saving customer information across different databases and maybe even different data centers. I’ll touch on that subject in a future post. But hash will probably give you the best results when it comes to sharding, as statistically it ensures that data is evenly distributed across all shards.

Ref: http://www.scalebase.com/how-to-implement-mysql-sharding/

http://www.scalebase.com/how-to-implement-mysql-sharding-%E2%80%93-part-2/