Under the hood: MySQL Pool Scanner (MPS)

Posted by Shlomo Priymak on October 22, 2013 at 9:58 AM

Facebook has one of the largest MySQL database clusters in the world. This cluster comprises many thousands of servers across multiple data centers on two continents.

Operating a cluster of this size with a small team is achieved by automating nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself. One of the core components of this automation is a system we call MPS, short for “MySQL Pool Scanner.”

MPS is a sophisticated state machine written mostly in Python. It replaces a DBA for many routine tasks and enables us to perform maintenance operations in bulk with little or no human intervention.

A closer look at a single database node
Every one of the thousands of database servers we have can house a certain number of MySQL instances. An instance is a separate MySQL process, listening on a separate port with its own data set. For simplicity, we'll assume exactly two instances per server in our diagrams and examples.  

The entire data set is split into thousands of shards, and every instance holds a group of such shards, each in its own database schema. A Facebook user's profile is assigned to a shard when the profile is created and every shard holds data relating to thousands of users.
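To make this concrete, here is a minimal sketch in Python (the language most of MPS is written in). The host name, ports, shard names, shard count, and the modulo assignment are all invented for illustration; the real mapping logic is more involved.

    # Hypothetical sketch of the layout described above: one server, two
    # MySQL instances (separate processes on separate ports), each holding
    # a group of shards, one database schema per shard. Names are invented.
    server = {
        "db1234.example": {
            3306: ["shard_00123", "shard_04567"],   # instance 1
            3307: ["shard_11121", "shard_31415"],   # instance 2
        }
    }

    def shard_for_user(user_id: int, num_shards: int = 16384) -> str:
        # A profile is assigned to a shard when it is created; a simple
        # modulo stands in for the real assignment logic.
        return f"shard_{user_id % num_shards:05d}"

    print(shard_for_user(314159265))   # -> shard_12449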

It’s more easily explained by a diagram of a single database server:

[Diagram: a single database server hosting two MySQL instances, each listening on its own port and holding its own set of shards]

Every instance has a few copies of its data on other instances that are hosted on different servers, and usually in a different data center. This is done to achieve two main goals:

1.    High Availability – If a server goes down, we have the data available elsewhere, ready to be served.
2.    Performance – Different geographical regions have their own replicas so that reads are served locally.


The way we achieve this is through simple MySQL master/slave replication. Every instance is part of a replica set. A replica set has a master and multiple slaves. All writes to a replica set must occur on the master. The slaves subscribe to a stream of these write events, and the events are replayed on them as soon as they arrive. Since the master and the slaves have nearly identical data, a read can occur on any one of the instances in the replica set.
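In code, the read/write contract of a replica set might look something like the sketch below. The class and host names are illustrative, not MPS's actual implementation, and real read routing prefers a replica in the local region.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class ReplicaSet:
        # One master, multiple slaves; each entry is a "host:port" instance.
        master: str
        slaves: list = field(default_factory=list)

        def route_write(self) -> str:
            # All writes to a replica set must occur on the master.
            return self.master

        def route_read(self) -> str:
            # Master and slaves hold nearly identical data, so a read can
            # go to any instance; real routing prefers a local replica.
            return random.choice([self.master] + self.slaves)

    rs = ReplicaSet(master="db1001.east:3306",
                    slaves=["db2002.west:3306", "db3003.europe:3306"])
    print(rs.route_write())   # always the master
    print(rs.route_read())    # any instance in the set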

Here is a diagram of a simple replica set, where each server hosts only one instance, and the other instance is empty (we call these spares):

[Diagram: a simple replica set; each server hosts one instance in the set, with its second instance left as a spare]

A server is essentially a container for instances, so in reality things can get much more complicated.
For example, a single server hosting a master instance may also be hosting a slave instance for a different master, like so:

[Diagram: a server hosting the master instance of one replica set alongside a slave instance of another]

There are two important "building block" operations MPS relies on:

1.    Creating a copy / replacing a server

The first building block is an operation that creates a copy of an instance on a different host. We use a modified version of Xtrabackup to perform most copy operations. A replacement is the same operation, except that the original instance is removed once the copy completes successfully.

First, the system allocates a new spare instance for the operation. We choose one of the slaves or the master and copy its data to the freshly allocated spare instance. This diagram shows a replacement operation, where an instance is removed when the copy is complete:

[Diagram: a replacement operation; data is copied to a freshly allocated spare instance, and the old instance is removed once the copy completes]

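Sketched in Python, with invented helpers and data structures (the post only tells us that a modified Xtrabackup does the heavy lifting), the replacement flow looks roughly like this:

    def replace_instance(replica_set: dict, spare_pool: list, victim: str) -> None:
        # 1. Allocate a fresh spare instance as the target of the copy.
        target = spare_pool.pop()

        # 2. Pick a copy source: one of the slaves, or the master itself.
        slaves = replica_set["slaves"]
        source = slaves[0] if slaves else replica_set["master"]

        # 3. Copy the data set (the real system uses a modified Xtrabackup).
        print(f"copying {source} -> {target}")

        # 4. Attach the new instance as a slave, then drop the old one.
        #    Skipping the removal turns a replacement into a plain copy.
        slaves.append(target)
        slaves.remove(victim)

    rs = {"master": "db1001:3306", "slaves": ["db2002:3306", "db3003:3306"]}
    replace_instance(rs, spare_pool=["db9999:3307"], victim="db2002:3306")
    print(rs)   # the victim is gone; the fresh copy took its place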
2.    Promoting a master instance

The second building block is the action of promoting a different instance to be the master in a replica set.


During a promotion, we select a target for the promotion, stop writes to the replica set, change the slaves to replicate from the new master, and resume writes. In the diagram, we show a deletion operation in which the old master is discarded after the promotion is completed successfully. For simplicity, the replica set below consists of only three instances:

[Diagram: a promotion in a three-instance replica set; the old master is deleted after the promotion completes]

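Here is a heavily simplified sketch of that promotion flow. SET GLOBAL read_only and CHANGE MASTER TO are real MySQL statements, but the surrounding structure is illustrative and omits the many safety checks a real promotion needs.

    def run_sql(instance: str, statement: str) -> None:
        # Stand-in for actually executing a statement on a MySQL instance.
        print(f"{instance}: {statement}")

    def promote(replica_set: dict, target: str) -> str:
        old_master = replica_set["master"]

        # 1. Stop writes to the replica set.
        run_sql(old_master, "SET GLOBAL read_only = ON")

        # 2. Repoint the remaining slaves at the new master.
        for slave in replica_set["slaves"]:
            if slave != target:
                run_sql(slave, f"CHANGE MASTER TO MASTER_HOST = '{target}'")

        # 3. Install the new master and resume writes.
        replica_set["slaves"].remove(target)
        replica_set["master"] = target
        run_sql(target, "SET GLOBAL read_only = OFF")

        # The caller decides what happens to the old master; in the deletion
        # case above, it is simply discarded.
        return old_master

    rs = {"master": "db1001:3306", "slaves": ["db2002:3306", "db3003:3306"]}
    promote(rs, target="db2002:3306")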
These two operations (which are usually complex procedures for most companies running MySQL) have been completely automated to a point where MPS can run them hundreds or thousands of times a day in a fast and safe manner, without any human intervention.

Host management and states
Now that we've got the basics out of the way, we can dive into more abstract concepts that utilize these building blocks.

MPS works with a repository that holds the current state and metadata for all our database hosts, along with current and past MPS copy operations. This repository is managed by the database servers themselves, so it scales with the database cluster and MPS doesn't need a complex application-server installation. MPS itself is in fact stateless, running on its own pool of hosts and relying on the repository for state management. States are processed separately and in parallel.
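The post doesn't describe the repository's schema, but conceptually each instance record carries something like the following (all field names are invented):

    # One record per instance; all field names are invented.
    instance_record = {
        "host": "db1234.example",
        "port": 3306,
        "replica_set": "rs-00421",
        "state": "production",                 # see the state list below
        "problems": ["low-space"],             # "problems" are covered below
        "disk_used_pct": 91,
        "last_checkin": "2013-10-22T09:58:00Z",
    }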

When a server "wakes up" in a data center (for example, when a fresh rack has been connected and provisioned), it starts running a local agent every few minutes. This agent performs the following steps:

  1. Collect data about itself. (Where am I? What hardware do I have? What software versions am I running?)
  2. Triage the host for problems. (Did I wake up in an active cluster? Are my disks OK? Are my flash cards healthy?)
  3. Make sure the server is registered and contains up-to-date metadata in the central repository.
  4. On the first run, if there is no current record of this server, place its instances in an initial "reimage" state. This is where new servers start their lives in MPS.

In this way, every few minutes, every healthy server "checks in" with the central repository and reports how it's doing, keeping things like data use and system health in sync.
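A sketch of that agent loop, with invented helpers and an assumed repository interface (the real agent is internal tooling the post doesn't show):

    import socket

    def collect_facts() -> dict:
        # Where am I? What hardware do I have? What software am I running?
        return {"host": socket.gethostname(), "disks_ok": True}

    def triage(facts: dict) -> list:
        # Did I wake up in an active cluster? Are my disks and flash cards OK?
        return [] if facts["disks_ok"] else ["bad-disk"]

    def agent_run_once(repository) -> None:
        facts = collect_facts()
        problems = triage(facts)
        if not repository.has_record(facts["host"]):
            # First run, no record yet: new servers start life in "reimage".
            repository.register(facts["host"], state="reimage")
        # Check in, keeping data use and system health current.
        repository.check_in(facts, problems)

    # The agent runs something like:
    #     while True:
    #         agent_run_once(repository)
    #         time.sleep(300)   # every few minutes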


The smallest unit MPS manages at the moment is an instance. Each instance can be in various states. The important states are as follows:

  • Production: Instance is serving production traffic.
  • Spare: Instance is ready to be copied to or allocated to some other task.
  • Spare allocated: Instance has been chosen as the target for a copy, and a copy is in progress.
  • Spare deallocated: Temporary state. The instance has been removed from production and is pending triage and cleanup. No instance stays here for more than a few minutes.
  • Drained: The instance is not being used, and is being reserved for testing, data center maintenance, etc. An operator intervention is required to take a host out of this state.
  • Reimage: Servers with all instances in this state are being reimaged or are in the process of being repaired. Servers in this state are handed off and managed by a co-system called Windex, which was discussed in a previous post.
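Since MPS is a state machine written mostly in Python, these states map naturally onto something like an enum. This encoding is purely illustrative:

    from enum import Enum

    class InstanceState(Enum):
        PRODUCTION = "production"                # serving production traffic
        SPARE = "spare"                          # ready to be copied to
        SPARE_ALLOCATED = "spare allocated"      # chosen as a copy target
        SPARE_DEALLOCATED = "spare deallocated"  # pending triage and cleanup
        DRAINED = "drained"                      # reserved; operator must release
        REIMAGE = "reimage"                      # handed off to Windex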

An instance may move between states due to MPS executing an action or an operator intervention. This state diagram shows the main states and the actions that may cause an instance to move between those states.

[State diagram: the main instance states and the actions that move instances between them]

The diagram above describes only a small subset of possible paths an instance may take in MPS. The state changes described here are the ones that result from simple copy and maintenance operations. There are many other reasons for instances to change states, and hardcoding all the options and checks would create software that is difficult and complex to maintain. Meet "problems," another fundamental concept in MPS.

A "problem" is a property attached to an instance; if every instance on a host has the same problem, we consider it attached to the server itself. Another way to think of problems is as tags. MPS consults a decision matrix that tells it what to do with an instance that has a specific problem. It is basically a map between tuples: (state, problem) → (action, state).

It is easier to understand with some examples:

  • (production, low-space) → (replace, spare deallocated): Replace a production instance that is running low on space, moving it to a different server.
  • (spare deallocated, old-kernel) → (move, reimage): If an instance happens to pass through this state, it holds no production data, so why not reimage the server?
  • (production, master-in-fallback-location) → (promote, production): Promote this master instance to its correct location, and leave it in the production state.
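The decision matrix is easy to picture as a literal mapping. Here is a sketch built from the three examples above; the real table is, of course, much larger:

    # (state, problem) -> (action, resulting state), from the examples above.
    DECISION_MATRIX = {
        ("production", "low-space"): ("replace", "spare deallocated"),
        ("spare deallocated", "old-kernel"): ("move", "reimage"),
        ("production", "master-in-fallback-location"): ("promote", "production"),
    }

    def decide(state: str, problem: str):
        # No matching rule means no action in this sketch.
        return DECISION_MATRIX.get((state, problem))

    print(decide("production", "low-space"))   # ('replace', 'spare deallocated')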

The various states and "problems" in MPS allow us to create a flexible and maintainable infrastructure to manage a server's life cycle.

Examples of common failure resolution and maintenance operations
In a large data center, there are tens or hundreds of server failures a day. Here are a few examples of common day-to-day failures that MPS takes care of without human intervention:

  • Broken slave instances are detected and disabled until they are replaced in the background.
  • Broken master instances are demoted so that healthy replicas take the place of their fallen brethren and get replaced in the background.
  • Instances on servers that might run out of space due to growth are moved to underutilized servers.

With thousands of servers, site-wide maintenance tasks like upgrading to a new kernel, changing partition sizes, or upgrading firmware on controllers become very complex. The same goes for localized operations such as moving some racks or allocating test servers for our engineering teams. Here are some common maintenance operations an operator can ask MPS to perform with a single command:

  • Drain any number of database racks for maintenance and take them out of production. Most such operations complete in less than 24 hours.
  • Re-image thousands of machines (to perform kernel upgrades, for example) at a specified concurrency. MPS will replace each machine and then send it to Windex.
  • Allocate any number of spares to be used for a new project or testing. Want 200 servers to run tests? No problem.
  • Create a copy of the entire Facebook data set at a new data center at a specified concurrency (building out our new Lulea data center, for example!).
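To give a flavor of what "a single command" might look like, here is an invented command surface; the post does not show MPS's actual operator interface.

    class MPSClient:
        # Invented operator interface, for illustration only.

        def drain(self, racks: list) -> None:
            print(f"draining {racks}; most such operations finish within 24 hours")

        def reimage(self, hosts: list, concurrency: int) -> None:
            print(f"replacing {len(hosts)} hosts, {concurrency} at a time, "
                  f"then handing them to Windex")

        def allocate_spares(self, count: int, project: str) -> None:
            print(f"allocating {count} spares for {project}")

    mps = MPSClient()
    mps.drain(racks=["rack-17", "rack-18"])
    mps.allocate_spares(count=200, project="engineering tests")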

Automating away these mundane tasks with MPS allows us to scale the number of servers we manage, and frees up the MySQL Operations team to work on more exciting challenges.


Shlomo Priymak is a MySQL database engineer on the MySQL operations team. 
