Setting up MySQL GTID Replication with MHA on Linux
Environment preparation
Prepare four machines: cluster_manager: 83.47, node1: 83.48, node2: 83.45, node3: 83.50
MySQL is already installed on the three node machines
Change the hostnames
vi /etc/sysconfig/network and set HOSTNAME to node1, node2, node3, or cluster_manager on the corresponding machine
vi /etc/hosts
192.168.83.47 cluster_manager
192.168.83.48 cluster_node1
192.168.83.45 cluster_node2
192.168.83.50 cluster_node3
192.168.83.49 cluster_vip
Run reboot for the changes to take effect
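The host entries above can also be appended in one step. A sketch; HOSTS_FILE is a stand-in so the snippet is self-contained, on the real machines the target is /etc/hosts:

```shell
# Sketch only: HOSTS_FILE stands in for /etc/hosts on the real machines.
HOSTS_FILE=${HOSTS_FILE:-/tmp/hosts.demo}
cat >> "$HOSTS_FILE" <<'EOF'
192.168.83.47 cluster_manager
192.168.83.48 cluster_node1
192.168.83.45 cluster_node2
192.168.83.50 cluster_node3
192.168.83.49 cluster_vip
EOF
```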
Deploying MHA
Install MHA Node on all nodes (both the data nodes and the manager)
- Download the rpm package:
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-node-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-nod
- Install the dependency:
yum install perl-DBD-MySQL -y
- Install the mha-node package:
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Install MHA Manager on the cluster_manager node
- Download the rpm package:
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-manager-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-manager-0.56-0.el6.noarch.rpm
- Install the dependencies:
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
- Install the mha-manager package:
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
Setting up SSH key-based login
MHA Manager uses SSH internally to connect to each MySQL server, and the MHA Node on the most up-to-date slave also uses SSH (scp) to send relay log files to the other slaves, so passwordless SSH key login must be configured between all the servers.
Run on both the manager and all node machines:
ssh-keygen -t rsa (no passphrase needed)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@ip (send id_rsa.pub to every other node)
Configuring MySQL GTID replication
- Edit the node MySQL configuration file as follows:
[mysqld]
#GTID:
gtid_mode=on
enforce_gtid_consistency=on
server_id=3
#binlog
log-bin=slave-binlog
log-slave-updates=1
binlog_format=row #strongly recommended; other formats may cause data inconsistency
log_bin_trust_function_creators = 1
#relay log
skip_slave_start=1
Note: server_id must be different on every node
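Duplicate server_id values silently break replication, so the uniqueness rule above is worth checking by script. A sketch under hypothetical paths; in practice you would collect each node's /etc/my.cnf (e.g. via scp) and run the same check:

```shell
# Sketch: verify server_id is unique across node config files.
# The /tmp paths are stand-ins for each node's /etc/my.cnf.
mkdir -p /tmp/mycnf.demo
for id in 1 2 3; do
  printf '[mysqld]\nserver_id=%d\n' "$id" > "/tmp/mycnf.demo/node$id.cnf"
done
total=$(grep -h '^server_id=' /tmp/mycnf.demo/*.cnf | wc -l)
distinct=$(grep -h '^server_id=' /tmp/mycnf.demo/*.cnf | sort -u | wc -l)
# All values are distinct when the two counts match.
echo "total=$total distinct=$distinct"
```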
- Restart the mysql service
service mysqld restart
- Create the user for replication in MySQL on every node
grant replication slave on *.* to 'repl'@'%' identified by 'Dingjia123!' with grant option;
grant all privileges on *.* to 'root'@'%' identified by 'Dingjia123!' with grant option;
flush privileges;
- On the two slave nodes, run the change master statement, then start the slave.
change master to master_host='192.168.83.45',master_user='repl',master_password='Dingjia123!',master_port=3306,master_auto_position=1;
start slave;
show slave status\G
- Replication setup is complete
Configuring MHA
Create MHA's working directory and the related configuration file
mkdir -p /usr/local/mha
mkdir -p /etc/mha
cat /etc/mha/mha.conf
[server default]
# mysql user and password
user=root
password=Dingjia123!
# ssh login user
ssh_user=root
# manager directory on the manager
manager_workdir=/usr/local/mha
manager_log=/usr/local/mha/manager.log
#master_ip_failover_script= /usr/local/bin/master_ip_failover
#working directory on mysql servers
remote_workdir=/usr/local/mha
repl_user=repl
repl_password=Dingjia123!
ping_interval=3
[server1]
hostname=192.168.83.48
port=3306
candidate_master=1
check_repl_delay=0
[server2]
hostname=192.168.83.45
port=3306
#candidate_master=1
#check_repl_delay=0
[server3]
hostname=192.168.83.50
port=3306
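The node list in mha.conf can also be reused by small helper scripts, for example to loop an ssh check over every node. A sketch that pulls out the hostname lines; it writes a trimmed demo copy of the config so it is self-contained (on the manager the real file is /etc/mha/mha.conf):

```shell
# Sketch: extract the node IPs from an MHA-style config file.
# CONF is a demo path standing in for /etc/mha/mha.conf.
CONF=${CONF:-/tmp/mha.conf.demo}
cat > "$CONF" <<'EOF'
[server1]
hostname=192.168.83.48
port=3306
[server2]
hostname=192.168.83.45
port=3306
[server3]
hostname=192.168.83.50
port=3306
EOF
# Split key=value lines and keep only the hostname values.
hosts=$(awk -F= '$1=="hostname"{print $2}' "$CONF")
echo "$hosts"
```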
Test SSH login between nodes
masterha_check_ssh --conf=/etc/mha/mha.conf
- SSH is configured correctly only if the output shows All SSH connection tests passed successfully.; otherwise fix the configuration according to the error messages.
Check the replication configuration
- Use masterha_check_repl to check whether the current MySQL cluster topology meets MHA's requirements.
masterha_check_repl --conf=/etc/mha/mha.conf
- The last line should read MySQL Replication Health is OK.; if it is MySQL Replication Health is NOT OK!, fix the configuration according to the reported errors.
Start MHA Manager monitoring
nohup masterha_manager --conf=/etc/mha/mha.conf &
Check the MHA Manager status
- Check the Manager's status with the masterha_check_status script:
masterha_check_status --conf=/etc/mha/mha.conf
Stop MHA Manager monitoring
- Stopping is simple: use the masterha_stop command.
masterha_stop --conf=/etc/mha/mha.conf
Configuring the VIP
- The VIP can be managed in two ways: with keepalived handling the floating of the virtual IP, or with a script that brings the virtual IP up and down directly (no keepalived, heartbeat, or similar software needed).
Managing the virtual IP with keepalived
Configure keepalived as follows:
- Download and install the software on the two masters (strictly speaking one is the master and the other is the candidate master, which remains a slave until a switchover):
tar xf keepalived-1.2.12.tar.gz
cd keepalived-1.2.12
./configure --prefix=/usr/local/keepalived
configure may report errors; install the required packages and rerun it
yum install gcc openssl openssl-devel
make && make install
cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
mkdir /etc/keepalived
cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
- Edit the keepalived configuration file on the master (192.168.83.45)
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.83.42
}
}
- router_id MySQL-HA names this keepalived group; the virtual IP 192.168.83.42 is bound to the host's eth0 interface, the state is set to BACKUP, keepalived runs in non-preemptive mode (nopreempt), and priority 100 sets the priority to 100. The configuration on the candidate master below is essentially the same.
- Configure the candidate master (192.168.83.48)
cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.83.42
}
}
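Since the two keepalived.conf files differ at most in priority, generating the vrrp_instance section from one template reduces copy/paste mistakes. A sketch; PRIORITY and OUT are stand-ins (on a real host OUT would be /etc/keepalived/keepalived.conf):

```shell
# Sketch: render the vrrp_instance section from a template.
# OUT stands in for /etc/keepalived/keepalived.conf.
PRIORITY=${PRIORITY:-100}
OUT=${OUT:-/tmp/keepalived.conf.demo}
cat > "$OUT" <<EOF
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority $PRIORITY
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.83.42
    }
}
EOF
grep 'priority' "$OUT"
```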
- Start the keepalived service on the master and watch the log
/etc/init.d/keepalived start
tail -f /var/log/messages
The log shows that the virtual IP 192.168.83.42 has been bound to interface eth0.
- Check the binding
ip addr | grep eth0
- Start the keepalived service on the other server, the candidate master.
Note:
Both servers above run keepalived in BACKUP state. keepalived has two modes, master->backup and backup->backup, and they differ greatly. In master->backup mode, once the master fails the virtual IP floats to the slave automatically, but when the repaired master starts keepalived again it takes the virtual IP back, even if non-preemptive mode (nopreempt) is set. In backup->backup mode, the virtual IP floats to the slave when the master fails, but after the original master recovers and its keepalived service starts, it does not grab the VIP back from the new master, even if its priority is higher than the slave's. To reduce the number of VIP migrations, the repaired master is usually kept as the new standby.
Integrating keepalived with MHA (MHA stops keepalived when the MySQL process dies):
To bring keepalived under MHA's control, we only need to modify master_ip_failover, the script triggered during a switchover, adding the keepalived handling for when the master fails.
Edit the script /usr/local/bin/master_ip_failover on the MHA Manager; the modified script is:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.83.42';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enables the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
- Add the following to the cluster configuration file
master_ip_failover_script= /usr/local/bin/master_ip_failover
- Then check the cluster status again
masterha_check_repl --conf=/etc/mha/mha.conf
What was added to /usr/local/bin/master_ip_failover means: when the master database fails, MHA triggers a switchover, and MHA Manager stops the keepalived service on the old master, which makes the virtual IP float to the candidate slave, completing the switch. You can of course also hook a script into keepalived itself that monitors whether mysql is running and kills the keepalived process when it is not.
Automatic failover (MHA Manager must be started first, otherwise no automatic switchover can happen)
Kill the mysql process on the master to simulate a master failure and trigger the automatic failover, then check the MHA switchover log.
Seeing Master failover to 192.168.83.48 completed successfully. at the end means the candidate master has taken over.
The MHA switchover process
- Configuration check phase: the entire cluster configuration file is validated
- Handling of the crashed master: this phase includes removing the virtual IP and shutting down the host (I have not implemented the shutdown part here yet; it needs more research)
- Copying the relay log difference between the dead master and the most up-to-date slave, and saving it to a directory on the MHA Manager
- Identifying the slave with the most recent updates
- Applying the binary log events saved from the master
- Promoting one slave to be the new master
- Pointing the other slaves at the new master for replication
Finally, start MHA Manager monitoring again and check which node in the cluster is now the master
Repairing the crashed master
Usually after an automatic switchover the original master is abandoned. Once the original master host is repaired, and provided its data is intact, you may want to turn it into a slave of the new master; the MHA log written at switchover time helps with this. The command to extract the relevant log line is:
grep -i "All other slaves should start" /usr/local/mha/manager.log
With that information, run the corresponding change master to statement on the repaired master to make it a slave again.
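The extraction can be taken one step further, pulling the ready-to-run statement out of the log line. A sketch; the log line below is a mock-up of the shape MHA writes (MHA masks the password as xxx), so substitute the real /usr/local/mha/manager.log:

```shell
# Sketch: pull the CHANGE MASTER statement out of the failover log.
# LOG is a demo file standing in for /usr/local/mha/manager.log;
# the line below mimics the shape MHA logs (password masked by MHA).
LOG=${LOG:-/tmp/manager.log.demo}
cat > "$LOG" <<'EOF'
[info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.83.48', MASTER_PORT=3306, MASTER_LOG_FILE='master-binlog.000002', MASTER_LOG_POS=2787098, MASTER_USER='repl', MASTER_PASSWORD='xxx';
EOF
# Keep only the text after "Statement should be: ".
stmt=$(grep -i 'All other slaves should start' "$LOG" | sed 's/.*Statement should be: //')
echo "$stmt"
```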
- Repair the old master
reset master;
stop slave;
change master to master_host='192.168.83.48',master_user='root',master_password='password!',master_port=3306,master_log_file='master-binlog.000002',master_log_pos=2787098;
or
change master to master_host='192.168.83.48',master_user='root',master_password='password!',master_port=3306,master_auto_position=1;
start slave;
- Check the master and slave status
On the master:
mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
| mysql_bin.000001 | 299256 | | | 1b0bf24f-42b3-11e8-8fbd-080027b9b16f:1,
3862faeb-42b3-11e8-964d-0800277b6835:1-374 |
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
On the slave:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.83.48
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql_bin.000001
Read_Master_Log_Pos: 299256
Relay_Log_File: cluster_node2-relay-bin.000002
Relay_Log_Pos: 1131
Relay_Master_Log_File: mysql_bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 299256
Relay_Log_Space: 1346
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 1b0bf24f-42b3-11e8-8fbd-080027b9b16f
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b0bf24f-42b3-11e8-8fbd-080027b9b16f:1
Executed_Gtid_Set: 1b0bf24f-42b3-11e8-8fbd-080027b9b16f:1
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.01 sec)