Implementing High Availability in MySQL

MySQL provides built-in data replication for maintaining identical copies of its data on one or more servers, which makes for a simple High Availability mechanism. The Open Source community, in turn, offers several projects implementing failover techniques, one of them being Heartbeat.

 

This article will show you how to implement a clustered, highly available and inexpensive solution based on GNU/Linux and combining MySQL as the database engine and Heartbeat as the failover mechanism. The configuration will consist of a 2-node active/passive cluster.

I assume you have MySQL up and running on both nodes and that you are working with MySQL 4.0.13 or above. If not, please refer to the MySQL manual and download a recent copy from the MySQL website.

How replication works in MySQL

Replication in MySQL is very simple: one machine acts as the master server and one or more machines act as backup servers (the replica servers). The master keeps all changes made to its databases in binary log files, so the backup server(s) can read these files and apply the changes to their own copy of the data.

In more detail, the binary log records all the changes (UPDATE, DELETE, INSERT...) made to the master's databases since replication was first configured and started. The master also creates and maintains an index file to keep track of the binary logs created. Upon connecting, the slave server(s) obtain new updates from the binary log and apply them to their copy of the data.
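
The replication coordinates can be inspected at any time from the mysql client. A quick check on the master (assuming the binary log has been enabled, as shown in the setup steps below):

```sql
-- Current binary log file and write position on the master
SHOW MASTER STATUS;

-- All binary log files the master still keeps, with their sizes
SHOW BINARY LOGS;
```

The File and Position reported by SHOW MASTER STATUS are exactly the coordinates a slave records while it reads the master's binary log.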

Note: As MySQL suggests, visit their website often to check the latest changes and improvements to the database replication implementation.

How Heartbeat works

Heartbeat is a piece of software that provides High Availability features such as monitoring the availability of the machines in the cluster, transferring the virtual IPs (more on this later) in case of failures and starting and stopping services.

The Heartbeat software running on the slave server periodically checks the health of the master server by listening for its heartbeats, sent via a null modem cable and/or a crossover Ethernet cable. Note that in the best-case scenario the slave's main task is nothing but monitoring the health of its master. If the master crashes, the slave will stop receiving its heartbeats and will then take over the virtual IPs and the services offered by the master.

The overall picture

The following figure shows our cluster layout.

As previously stated, our configuration consists of a 2-node active/passive cluster: dbserv1, the master server, and dbserv2, the slave. Both machines are linked via serial COM port /dev/ttyS0 (null modem cable) and a crossover Ethernet cable (eth0), through which they send their heartbeats to each other.

The 192.168.1.103 address on eth1:0 is the floating IP address, the virtual IP. This is the service IP the master listens on, and the one that will be transferred to the slave in case of a failure in the master. Requests from the application servers are made through this virtual IP.

Both servers have another IP address that can be used to administer the machines: 192.168.1.101 and 192.168.1.102. Bear in mind that the virtual IP (192.168.1.103) is set up by Heartbeat, meaning that if Heartbeat is not up and running on the active server, there will be no access to the virtual service.

Setting up replication

  1. Create a replication user on the master:
         mysql -u root -p

    At MySQL prompt type:

         GRANT REPLICATION SLAVE ON *.* TO replica@"%" IDENTIFIED BY 'replica_passwd';
    
  2. Stop MySQL on both the master server and the slave server. Take a snapshot of your databases from the master.
         /etc/init.d/mysql stop
         tar cvzf mysqldb.tgz /path/to/your/databases
    

    In my configuration, that would be:

         /etc/init.d/mysql stop
         tar cvzf mysqldb.tgz /var/mysql-data/*
    
  3. Copy the data to the slave
         scp /path/to/mysqldb.tgz admin@dbserv2:/path/to/your/databases
    

    If you are using InnoDB tables, copy your tablespace file(s) and associated log files to the slave. In my case, the tablespace is called ibdata and the log files are the ib_* files. So:

         scp /var/mysql-data/ibdata admin@dbserv2:/var/mysql-data
         scp /var/log/mysql/ib_* admin@dbserv2:/var/log/mysql
    
  4. Activate the binary log and assign a unique ID to the master:
         vi /etc/my.cnf
    

    Then add/change the following

         [mysqld]
         .....
         # Enable binary logs. Path to bin log is optional
         log-bin=/var/log/mysql/dbserv1
         # If the binary log exceeds 10M, rotate the logs
         max_binlog_size=10M
         # Set master server ID
         server-id=1
         .....
    

    Now you can start mysqld on the master. Watch the logs to see if there are problems.

         /etc/init.d/mysql start
    
  5. Log in on the slave.
         vi /etc/my.cnf
    

    Then add/change the following:

         server-id=2
         # This is eth0. Take a look at figure 1
         master-host=192.168.100.1
         master-user=replica
         master-password=replica_passwd
         # Port that master server is listening to
         master-port=3306
         # Number of seconds before retrying to connect to master. Defaults to 60 secs
         #master-connect-retry
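
    As an alternative to the master-* entries in my.cnf, the same connection settings can be given at runtime from the mysql prompt on the slave. A sketch, using the values from this example:

```sql
-- On the slave, instead of the master-* lines in my.cnf
CHANGE MASTER TO
    MASTER_HOST='192.168.100.1',
    MASTER_USER='replica',
    MASTER_PASSWORD='replica_passwd',
    MASTER_PORT=3306;
START SLAVE;
```

    Settings entered this way are written to master.info and take precedence over my.cnf on subsequent restarts.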
    
  6. Uncompress the databases
         cd /path/to/your/databases
         tar xvzf mysqldb.tgz
    
         chown -R mysql.mysql /path/to/your/databases
    

    Make sure your tablespace file(s) and associated files are in place (/path/to/your/databases in our example).

  7. Start mysqld on the slave. Watch the logs to see if there are problems.
         /etc/init.d/mysql start
    
  8. Check if replication is working. For example, log in on the master, create a database and see if it is replicated on the slave:
         mysql -u root -p
    
         create database replica_test;
         show databases;
    
         +----------------+
         | Database       |
         +----------------+
         | replica_test   |
         | mysql          |
         | test           |
         | tmp            |
         +----------------+
    

    Log in on the slave server and make sure the database replica_test is created:

         mysql -u root -p
         show databases;
    
         +----------------+
         | Database       |
         +----------------+
         | replica_test   |
         | mysql          |
         | test           |
         | tmp            |
         +----------------+
    

    If you have problems, please refer to the MySQL manual.
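
Beyond the ad-hoc test above, the state of the two replication threads on the slave can be checked directly; both *_Running columns should read Yes on a healthy slave:

```sql
-- Run on the slave
SHOW SLAVE STATUS\G
-- Among the many columns, the interesting ones here are:
--   Slave_IO_Running:  Yes
--   Slave_SQL_Running: Yes
```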

Installing and setting up Heartbeat

Download a recent copy of Heartbeat from the Linux-HA project website and then build as usual:

     ./configure
     make
     make install

or:

     rpm -Uhv heartbeat-1.0.4-1.i386.rpm

if you downloaded the RPM package.

Configuring heartbeat

There are three files involved in the configuration of heartbeat:

  • ha.cf: the main configuration file; it describes the machines involved and how they behave.
  • haresources: specifies the virtual IP (VIP) and the services handled by Heartbeat.
  • authkeys: specifies the authentication keys for the nodes.

Sample /etc/ha.d/ha.cf

     # Time between heartbeats in seconds
     keepalive 1
     # Node is pronounced dead after 15 seconds
     deadtime 15
     # Prevents the master node from re-acquiring cluster resources after a failover
     nice_failback on
     # Device for serial heartbeat
     serial  /dev/ttyS0
     # Speed at which to run the serial line (bps)
     baud    19200
     # Port for udp (default)
     udpport 694
     # Use a udp heartbeat over the eth0 interface
     udp     eth0

     debugfile /var/log/ha/ha.debug
     logfile /var/log/ha/ha.log

     # First node of the cluster (must match uname -n)
     node    dbserv1
     # Second node of the cluster (must match uname -n)
     node    dbserv2

Sample /etc/ha.d/haresources

     dbserv1 IPaddr::192.168.1.103/24/eth1

This tells Heartbeat to set up 192.168.1.103 as the virtual IP (VIP). See figure above.

Sample /etc/ha.d/authkeys

     auth 1
     1 crc
     2 sha1 HI!
     3 md5 Hello!

This file determines the authentication keys and must be mode 600. As I assume our network is relatively secure, I configure crc as the authentication method; md5 and sha1 are also available.
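
For a less trusted network, you could switch to sha1 with a random secret. A sketch that generates such an authkeys file (written to the current directory here; install it as /etc/ha.d/authkeys on both nodes):

```shell
# Generate a random 128-bit hex secret for the sha1 method
KEY=$(dd if=/dev/urandom bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')

# Write the authkeys file (key number 2, method sha1)
cat > authkeys <<EOF
auth 2
2 sha1 $KEY
EOF

# Heartbeat refuses to start unless authkeys is mode 600
chmod 600 authkeys
```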

Now start Heartbeat on dbserv1 and then on dbserv2, and watch the logs. Then stop Heartbeat on the first node and see what happens on the second. Start Heartbeat again on the first node, stop it on the second, and check the logs. If all is okay, you have a 2-node cluster up and running.

What we have

At this point we have a 2-node cluster with a certain degree of availability and fault tolerance. Although this may be a valid solution for non-critical environments, for truly critical environments this configuration should be improved.

Advantages

  • The cluster is fault tolerant
  • The cluster is relatively secure
  • There is no single point of failure (comments?)
  • Automatic failover mechanism
  • Proven and solid Open Source software for production environments (in my experience)
  • Simple and easy to install and configure
  • Easy to administer
  • Inexpensive

Disadvantages

Our cluster presents at least one serious problem in critical environments (i.e. 99.99% availability). As you know, when the master node fails, the standby node takes over the service and the virtual IP address. When the old master comes back online, it will act as the standby node (remember nice_failback on in /etc/ha.d/ha.cf?). Since our configuration has not implemented a two-way replication mechanism, the node now acting as master is not generating binary logs, and the node now acting as standby is not configured as a replication slave. There are ways around this disadvantage, but that is your homework ;-). Let me know your progress.
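
As a head start on that homework: the usual direction is two-way (master-master) replication, where each node both writes a binary log and replicates from the other. A minimal my.cnf sketch for dbserv2 (dbserv1 would get the mirror-image settings, plus a replication user created on dbserv2):

```ini
[mysqld]
server-id=2
# dbserv2 now also writes its own binary log...
log-bin=/var/log/mysql/dbserv2
# ...and keeps replicating from dbserv1, as configured earlier
master-host=192.168.100.1
master-user=replica
master-password=replica_passwd
```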

As usual, comments are very welcome.

References

Original source: http://www.karkomaonline.com/article.php/2004012416185184/print

 

-------------------

Some useful commands:

1. Show the MySQL process list. On the master:

 

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 2
   User: root
   Host: localhost:32931
     db: NULL
Command: Binlog Dump
   Time: 94
  State: Has sent all binlog to slave; waiting for binlog to
         be updated
   Info: NULL

 And on the slave:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 10
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 11
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Has read all relay log; waiting for the slave I/O
         thread to update it
   Info: NULL

2. Show replication status:

mysql> SHOW MASTER STATUS;
mysql> SHOW SLAVE STATUS;

3. Status codes (the following state descriptions are from the MySQL manual):

6.3.1. Replication Master Thread States

The following list shows the most common states you may see in the State column for the master's Binlog Dump thread. If you don't see any Binlog Dump threads on a master server, this means that replication is not running — that is, that no slaves are currently connected.

Sending binlog event to slave

Binary logs consist of events, where an event is usually an update plus some other information. The thread has read an event from the binary log and is now sending it to the slave.

Finished reading one binlog; switching to next binlog

The thread has finished reading a binary log file and is opening the next one to send to the slave.

Has sent all binlog to slave; waiting for binlog to be updated

The thread has read all outstanding updates from the binary logs and sent them to the slave. The thread is now idle, waiting for new events to appear in the binary log resulting from new updates occurring on the master.

Waiting to finalize termination

A very brief state that occurs as the thread is stopping.

6.3.2. Replication Slave I/O Thread States

The following list shows the most common states you see in the State column for a slave server I/O thread. Beginning with MySQL 4.1.1, this state also appears in the Slave_IO_State column displayed by the SHOW SLAVE STATUS statement. This means that you can get a good view of what is happening by using only SHOW SLAVE STATUS.

Connecting to master

The thread is attempting to connect to the master.

Checking master version

A state that occurs very briefly, immediately after the connection to the master is established.

Registering slave on master

A state that occurs very briefly immediately after the connection to the master is established.

Requesting binlog dump

A state that occurs very briefly, immediately after the connection to the master is established. The thread sends to the master a request for the contents of its binary logs, starting from the requested binary log filename and position.

Waiting to reconnect after a failed binlog dump request

If the binary log dump request failed (due to disconnection), the thread goes into this state while it sleeps, then tries to reconnect periodically. The interval between retries can be specified using the --master-connect-retry option.

Reconnecting after a failed binlog dump request

The thread is trying to reconnect to the master.

Waiting for master to send event

The thread has connected to the master and is waiting for binary log events to arrive. This can last for a long time if the master is idle. If the wait lasts for slave_read_timeout seconds, a timeout occurs. At that point, the thread considers the connection to be broken and makes an attempt to reconnect.

Queueing master event to the relay log

The thread has read an event and is copying it to the relay log so that the SQL thread can process it.

Waiting to reconnect after a failed master event read

An error occurred while reading (due to disconnection). The thread is sleeping for master-connect-retry seconds before attempting to reconnect.

Reconnecting after a failed master event read

The thread is trying to reconnect to the master. When connection is established again, the state becomes Waiting for master to send event.

Waiting for the slave SQL thread to free enough relay log space

You are using a non-zero relay_log_space_limit value, and the relay logs have grown until their combined size exceeds this value. The I/O thread is waiting until the SQL thread frees enough space by processing relay log contents so that it can delete some relay log files.

Waiting for slave mutex on exit

A state that occurs briefly as the thread is stopping.

6.3.3. Replication Slave SQL Thread States

The following list shows the most common states you may see in the State column for a slave server SQL thread:

Reading event from the relay log

The thread has read an event from the relay log so that the event can be processed.

Has read all relay log; waiting for the slave I/O thread to update it

The thread has processed all events in the relay log files, and is now waiting for the I/O thread to write new events to the relay log.

Waiting for slave mutex on exit

A very brief state that occurs as the thread is stopping.

The State column for the I/O thread may also show the text of a statement. This indicates that the thread has read an event from the relay log, extracted the statement from it, and is executing it.

4. The replication user account is stored in my.ini (my.cnf) and in the mysql/data/master.info file.

Delete the master.info file if the replication account was changed in my.ini; otherwise the slave keeps using the settings cached in master.info.

 

 

 

 

 
