一,基本知识点
二 规划图:
三 集群描述:
机器规划和描述:
1. 从架构上来说MHA管理节点实时监控集群主从复制的一致性状态,当1.55主发送故障。从1.66会自动提升为主,vip切换到1.66上面.是业务不断线,达到了高可用。
2. 从数据线一致性来说:处于崩溃的master会以二进制日志事件保存,采用差异的中继日志方式,解析最新的slave上的中继日志relny log的日志头,从其他的从服务器确定出差异的位置,组成完整的二进制日志。其他的slave重新连接到新的master进行复制。
3. 从转移速度:MHA能做到在0~30秒之内自动完成数据库的故障切换操作
角色
ip地址
主机名
server_id
类型
Monitor host
172.16.1.38
yunwei
-
监控复制组
Master
172.16.1.55
info-slave
10
写入
Candicate master
172.16.1.66
info-master
20
读
Slave
172.16.1.116
apache2-filing
102
读
vip
172.16.1.45
-
-
-
atlas
172.16.1.118
Redis2
-
中间件
四.修改主从结构
采用GTID的主从复制,GTID用来代替传统复制方法,不再使用MASTER_LOG_FILE+MASTER_LOG_POS开启复制。而是使用MASTER_AUTO_POSTION=1的方式开始复制.MySQL-5.6.5开始支持的,MySQL-5.6.10后开始完善.
在传统的slave端,binlog是不用开启的,但是在GTID中slave端的binlog是必须开启的,目的是记录执行过的GTID(强制).
GTID = source_id:transaction_id.
更简单的实现failover,不用以前那样在需要找log_file和log_pos.
更简单的搭建主从复制.
比传统的复制更加安全.
GTID是连续的没有空洞的,保证数据的一致性,零丢失.
五.修改1.55主库的上my.cnf 配置开启GTID
添加如下参数:
1.gtid_mode=ON ### 开启GTID模块
enforce_gtid_consistency
log-slave-updates
binlog_format= row
skip-name-resolve
2.查看
3.重启数据库
4. 修改1.66 1.116 从库上修改的 my.cnf 配置
1.添加如下参数:
gtid_mode=on
enforce_gtid_consistency
log-slave-updates
skip-name-resolve
binlog_format=mixed
六.主上授权
grant replication slave on *.* to repl@'172.16.1.%' identified by '123456';
七.从连接主库
1.change master to master_host='172.16.1.117',master_user='repl',master_password='123456',master_auto_position=1;
2.start slave
3.mysql -uroot -p密码 -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'
4.配置MHA
1. 172.16.1.38 部署MHA
2.所有节点安装 yum -y install perl-DBD-MySQL
所有节点安装rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
管理节点38 安装
yum -y install perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
在117主库上给mha 配置管理账号
grant all privileges on *.* to mha@'172.16.1.%'identified by 'mha';
5.所有的机器上执行命令软连接
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
创建配置文件目录
mkdir -p /etc/mha
创建日志目录
mkdir -p /var/log/mha/app1
编辑mha配置文件 vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/mha1/manager
manager_workdir=/var/log/mha/mha1
master_binlog_dir=/data/mysqldata
master_ip_failover_script=/usr/local/bin/master_ip_failover
#report_script=/usr/local/bin/send_report
password=mha
ping_interval=2
repl_password=123456
repl_user=slave
ssh_user=root
user=mha
[server1]
hostname=172.16.1.55
port=3306
[server2]
candidate_master=1 ###指定切换的主库
check_repl_delay=0
hostname=172.16.1.66
port=3306
[server3]
hostname=172.16.1.116
port=3306
配置ssh信任点四台机器执行所有的节点
*****************分发密钥***********************
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.55
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.66
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.116
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.118
检查SHH、主从复制:
[root@mysql-db03 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf#ssh检查命令
Wed Apr 4 14:58:48 2018 - [debug] Connecting via SSH from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.66(172.16.1.66:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.66] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:48 2018 - [debug] ok.
Wed Apr 4 14:58:48 2018 - [debug] Connecting via SSH from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.116(172.16.1.116:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.116] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:49 2018 - [debug] ok.
Wed Apr 4 14:58:49 2018 - [debug]
Wed Apr 4 14:58:49 2018 - [debug] Connecting via SSH from root@172.16.1.66(172.16.1.66:22) to root@172.16.1.55(172.16.1.55:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.55] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:49 2018 - [debug] ok.
Wed Apr 4 14:58:49 2018 - [debug] Connecting via SSH from root@172.16.1.66(172.16.1.66:22) to root@172.16.1.116(172.16.1.116:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.116] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:49 2018 - [debug] ok.
Wed Apr 4 14:58:50 2018 - [debug]
Wed Apr 4 14:58:49 2018 - [debug] Connecting via SSH from root@172.16.1.116(172.16.1.116:22) to root@172.16.1.55(172.16.1.55:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.55] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:49 2018 - [debug] ok.
Wed Apr 4 14:58:49 2018 - [debug] Connecting via SSH from root@172.16.1.116(172.16.1.116:22) to root@172.16.1.66(172.16.1.66:22)..
reverse mapping checking getaddrinfo for bogon [172.16.1.66] failed - POSSIBLE BREAK-IN ATTEMPT!
Wed Apr 4 14:58:50 2018 - [debug] ok.
Wed Apr 4 14:58:50 2018 - [info] All SSH connection tests passed successfully.
[root@mysql-db03 ~]#masterha_check_repl --conf=/etc/mha/app1.cnf #主从复制检测
Wed Apr 4 15:00:48 2018 - [info] Checking replication health on 172.16.1.66..
Wed Apr 4 15:00:48 2018 - [info] ok.
Wed Apr 4 15:00:48 2018 - [info] Checking replication health on 172.16.1.116..
Wed Apr 4 15:00:48 2018 - [info] ok.
Wed Apr 4 15:00:48 2018 - [info] Checking master_ip_failover_script status:
Wed Apr 4 15:00:48 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.1.55 --orig_master_ip=172.16.1.55 --orig_master_port=3306
Checking the Status of the script..OK
Wed Apr 4 15:00:48 2018 - [info] OK.
Wed Apr 4 15:00:48 2018 - [warning] shutdown_script is not defined.
Wed Apr 4 15:00:48 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK. --->检测MHA 集群监控值
启动MHA
nohup masterha_manager --conf=/etc/mha/mha1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mha1/manager.log 2>&1 &
六.配置VIP漂移
=======================================
=============配置VIP漂移===============
=======================================
1、从源码包中将需要的脚本拷贝出来
[root@mha-03 ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@mha-03 ~]# cd mha4mysql-manager-0.56
[root@mha-03 mha4mysql-manager-0.56]# cd samples/scripts/
[root@mha-03 scripts]# ll
总用量 32
-rwxr-xr-x. 1 4984 users 3648 4月 1 2014 master_ip_failover #这就是管理虚拟IP的脚本
-rwxr-xr-x. 1 4984 users 9870 4月 1 2014 master_ip_online_change
-rwxr-xr-x. 1 4984 users 11867 4月 1 2014 power_manager
-rwxr-xr-x. 1 4984 users 1360 4月 1 2014 send_report
2、将管理脚本拷贝到执行路径下
[root@mha-03 scripts]# cp master_ip_failover /usr/local/bin/ #复制到/usr/local/bin目录
[root@mha-03 scripts]# ll /usr/local/bin/master_ip_failover
-rwxr-xr-x. 1 root root 3648 10月 23 11:02 /usr/local/bin/master_ip_failover
3、在 /etc/mha/mha1.cnf 配置文件中添加脚本
master_ip_failover_script=/usr/local/bin/master_ip_failover #添加脚本位置
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port
);
my $vip = '172.16.1.45/24';
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; --->###根据网卡的设置改具体的参数
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master:$orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip(); ###############
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script..OK \n";
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
先停止masterha_stop --conf=/etc/mha/app1.cnf
检查脚本状态
Tue Mar 27 17:22:30 2018 - [info] Checking master_ip_failover_script status:
Tue Mar 27 17:22:30 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.1.117 --orig_master_ip=172.16.1.117 --orig_master_port=3307
配置vip
/sbin/ifconfig eth0:0 172.16.1.45/24
七.配置binlog-server远程实时备份1.55主库上binlog日志
=========配置binlog-server=================
===========================================
添加binlog-server
创建存放binlog日志的目录
[root@mha-03 ~]# mkdir /data/mysql/binlog/ -p
拉取主库的binlog日志:---->一定要在当前目录下执行这条命令
[root@mha-03 binlog]# mysqlbinlog -R --host=172.16.1.55 --user=mha --password=123123 --raw --stop-never mysql-bin.000001 &
拉取主库binlog日志的进程运行:
[root@mha-03 binlog]# ps -ef | grep mysqlbinlog | grep -v grep
root 9314 3444 0 15
mysqlbinlog -R --host=172.16.1.55 --user=mha --password=x x --raw --stop-never mysql-bin.000014:25 pts/1 00:00:00
添加的模块:
[binlog1] #添加binlog模块
no_master=1 #不允许切换为主
hostname=mha-03.kun.cc #binlog日志服务器地址
master_binlog_dir=/data/mysql/binlog/ #binlog存放位置优先级比全局的高
mysqlbinlog -R --host=172.16.1.117 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
测试masterha_check_repl --conf=/etc/mha/app1.cnf 检测当前主从一致性状态
nohup masterha_manager --conf=/etc/mha/app1.cnf -remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mha1/manager.log 2>&1 & ##后台运行mha
编写mha监控脚本
#!/bin/bash
function check_pid {
pid=`ps -ef | grep /usr/bin/masterha_manager | grep -v grep | awk '{print $2}'`
echo $pid
if [ $pid -gt 0 ];then ##不存在
echo "MHA PID is runing" >> /var/log/monitor.log
else
echo "MHA监控PID" | mail -s "MHA故障,进程已经宕机" 172771552@qq.com
masterha_manager --conf=/etc/mha/app1.cnf -remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mha1/manager.log 2>&1 & ##启动进程
fi
}
function main {
for i in {1..10}
do
check_pid
done
break
}
main
八.部署Atlas读写分离中间件
1.在1.55主库上授权
create user 'szx'@'172.16.1.118' identified by 'Pass@word';
grant all on *.* to 'szx'@'172.16.1.118'identified by 'Pass@word';
2.cd /usr/local/mysql-proxy/bin/
3../encrypt Pass@word 创建Atlas管理账号 --> atlas 授权程序连接入口密码Pass@word
4./usr/local/mysql-proxy/conf
修改atlas 配置文件
5.[mysql-proxy]
6#管理接口的用户名
7.admin-username = user
#管理接口的密码
admin-password = pwd
#Atlas后端连接的MySQL主库的IP和端口,可设置多项,用逗号分隔
proxy-backend-addresses = 172.16.1.45:3306 ---> 配置虚拟vip 达到主从切换 vip指定主库
#Atlas后端连接的MySQL从库的IP和端口,@后面的数字代表权重,用来作负载均衡,若省略则默认为1,可设置多项,用逗号分隔
proxy-read-only-backend-addresses = 172.16.1.166:3306@1,172.16.1.116:3306
#用户名与其对应的加密过的MySQL密码,密码使用PREFIX/bin目录下的加密程序encrypt加密,下行的user1和user2为示例,将其替换为你的MySQL的用户名和加密密码!
pwds = root:/iZxz+0GRoA=,szx:xWQZYJHSQrrmTh762ogNww== ---> 这里添加授权密码
daemon = true
#设置Atlas的运行方式,设为true时Atlas会启动两个进程,一个为monitor,一个为worker,monitor在worker意外退出后会自动将其重启,设为false时只有worker,没有monitor,一般开发调试时设为false,线上运行时设为true,true后面不能有空格。
keepalive = true
#工作线程数,对Atlas的性能有很大影响,可根据情况适当设置
event-threads = 32
#日志级别,分为message、warning、critical、error、debug五个级别
log-level = debug
#日志存放的路径
log-path = /usr/local/mysql-proxy/log
#SQL日志的开关,可设置为OFF、ON、REALTIME,OFF代表不记录SQL日志,ON代表记
录SQL日志,REALTIME代表记录SQL日志且实时写入磁盘,默认为OFF
sql-log = ON
#慢日志输出设置。当设置了该参数时,则日志只输出执行时间超过sql-log-slow(单位:ms)的日志记录。不设置该参数则输出全部日志。
#sql-log-slow = 10
#实例名称,用于同一台机器上多个Atlas实例间的区分
#instance = test
#Atlas监听的工作接口IP和端口
proxy-address = 0.0.0.0:3306
#Atlas监听的管理接口IP和端口
admin-address = 0.0.0.0:1234
#tables = person.mt.id.3
charset = utf8
#client-ips = 127.0.0.1, 192.168.1
#lvs-ips = 192.168.1.1
启动atlas /usr/local/mysql-proxy/bin/mysql-proxyd test start
进入atlas管理界面
mysql -uuser -ppwd -P1234 -h172.16.1.118
查看后端
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+
| 1 | 172.16.1.45:3306 | up | rw |
| 2 | 172.16.1.66:3306 | up | ro |
| 3 | 172.16.1.116:3306 | up | ro |
+-------------+-------------------+-------+------+
Rw 代表读写 ro 代表只读
授权程序连接sql线程
Atlas上面./encrypt 设置允许连接的密码
配置文件添加允许连接的用户密码
pwds = root:/iZxz+0GRoA=,szx:xWQZYJHSQrrmTh762ogNww==
测试连接atlas
mysql -uszx -h172.16.1.118 -pPass@word 程序连接授权的用户和密码
修改程序连接数据库的文件
sed -i 's/172.16.1.55/172.16.1.118/g' DbConfig.php
看到atlas的sql日志
[04/04/2018 15:32:14] C:172.16.1.68:46490 S:172.16.1.66:3306 OK 0.970 "SHOW FULL COLUMNS FROM `MESSAGE_REMIND`"
[04/04/2018 15:32:12] C:172.16.1.64:57374 S:172.16.1.66:3306 OK 12.151 "SELECT * FROM `LOG_USER_LOGIN` `t` WHERE SessionID='9nvn3721s78280qq9iepdmonb2' and UserId='2633574085583' LIMIT 1"
[04/04/2018 15:31:10] C:172.16.1.70:34640 S:172.16.1.66:3306 OK 0.724 "SHOW FULL COLUMNS FROM `MESSAGE_REMIND`"
[04/04/2018 15:31:10] C:172.16.1.70:34640 S:172.16.1.116:3306 OK 0.194 "SHOW CREATE TABLE `MESSAGE_REMIND`"
[04/04/2018 15:31:10] C:172.16.1.70:34640 S:172.16.1.45:3306 OK 1.406 "UPDATE `LOG_USER_LOGIN` SET `Id`='41980', `UserId`='2627214699994', `LoginName`='15011436643',
九.问题
如果发现如下错误:
1.Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
解决方法如下,添加软连接(所有节点)
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
2. 查看日志
tail -n20 /var/log/mha/mha1/manager
3.查看当前集群状态,显示主库
masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:20386) is running(0:PING_OK), master:192.168.1.55
3. 当主库切换了,恢复从库使用授权重新恢复主从一致性
change master to master_host='master_ip',master_user='repl',master_password='123456',master_auto_position=1;
4. 程序连接端口可以暂时使用55库,作为程序连接
mysql -h172.16.1.55 -uszx -pPass@word
5. Atlas 上开启了防火墙,安全加固 --> 只允许web主机连接
ACCEPT all -- 172.16.1.68/24 0.0.0.0/0
ACCEPT all -- 172.16.1.64/24 0.0.0.0/0
ACCEPT all -- 172.16.1.92/24 0.0.0.0/0