mysql mha+keepalived


1、 配置hosts文件
Vi /etc/hosts
10.10.6.86   dg
10.10.6.87   dg2
10.10.6.90   dgt

2、 三台机器配置互信关系
ssh-keygen -t dsa
ssh-keygen -t rsa

3、搭建主从从

 mysql架构:
              dg2从库(备主)6.87
             /
 dg主库 6.86
(vip6.81)    \
              dgt从库6.90
     

MHA 架构

 6.90(manage节点)   6.86(node 节点)    6.87(node节点)

keepalive 部署在mysql 主库和备主上,当主库down机后,会自动漂移到备主库上,缩短停机时间

主库损坏后 
dg2 主库----dgt从库    


MHA也可以扩展到如下的多节点集群:
优点:
1、可以进行故障的自动检测和转移
2、可扩展性较好,可以根据需要扩展MySQL的节点数量和结构
3、相比于双节点的MySQL复制,三节点/多节点的MySQL发生不可用的概率更低


缺点:
1、至少需要三节点,相对于双节点需要更多的资源
2、逻辑较为复杂,发生故障后排查问题,定位问题更加困难
3、数据一致性仍然靠原生半同步复制保证,仍然存在数据不一致的风险
4、可能因为网络分区发生脑裂现象。



4修改参数配置文件/etc/my.cnf
主库参数文件
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
#skip-grant-tables
log_bin = mysql-bin
port = 3306
character-set-server = utf8
lower_case_table_names = 1
log_error = /var/lib/mysql/mysql-error.log
datadir=/var/lib/mysql  
socket=/var/lib/mysql/mysql.sock  
key_buffer_size = 100M
join_buffer_size = 128M
sort_buffer_size = 24M
read_rnd_buffer_size = 24M
innodb_buffer_pool_size = 48M
innodb_log_file_size = 24M
innodb_log_files_in_group = 4
innodb_log_buffer_size = 8M
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
log-bin-trust-function-creators = 1
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
binlog-ignore-db = mysql,information_schema
sync_binlog = 1
skip-external-locking
skip-host-cache
skip-name-resolve
collation-server = utf8_general_ci
long_query_time = 1
slow_query_log_file = /var/lib/mysql/mysql-slow.log
max_connections = 500
user=mysql  
replicate-ignore-db = mysql    
server-id           = 1  
log_bin             = /var/lib/mysql/mysql-bin.log
log_bin_index       = /var/lib/mysql/mysql-bin.log.index
relay_log           = /var/lib/mysql/mysql-bin.relay
relay_log_index     = /var/lib/mysql/mysql-bin.relay.index
expire_logs_days    = 10
max_binlog_size     = 100M


从库参数文件在此基础上需要修改的是
server-id=2 
read_only=1 
relay_log_purge=0
binlog_ignore_db= information_schema,mysql  //主库从库必须一致
relay_log_purge=0 

****************************************
插曲:关于relay_log_purge
MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,
采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,
这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。
定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。
为了避免复制延时,需要暂时为中继日志创建硬链接,
因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)


手工清理relay log
purge_relay_logs --user=root --password=123456 --port=3306 -disable_relay_log_purge --workdir=/data/ 


--workdir                         //指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge         //默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。


purge_relay_logs的功能
  a、为relay日志创建硬链接(最小化批量删除大文件导致的性能问题)
  b、SET GLOBAL relay_log_purge=1; FLUSH LOGS; SET GLOBAL relay_log_purge=0;
  c、删除relay log(rm –f  /path/to/archive_dir/*)
  
定制清理relay log cronjob
pureg_relay_logs脚本在不阻塞SQL线程的情况下自动清理relay log。对于不断产生的relay log直接将该脚本部署到
crontab以实现按天或按小时定期清理。  
pureg_relay_logs脚本在不阻塞SQL线程的情况下自动清理relay log。对于不断产生的relay log直接将该脚本部署到crontab以实现按天或按小时定期清理。
$ crontab -l  
# purge relay logs at 5am  
0 5 * * * /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1   
********************************************


5、锁定主库,做数据导出
flush tables with read lock;
show master status;
+------------------+----------+--------------+--------------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB         | Executed_Gtid_Set |
+------------------+----------+--------------+--------------------------+-------------------+
| mysql-bin.000001 |      154 |              | mysql,information_schema |                   |
+------------------+----------+--------------+--------------------------+-------------------+
1 row in set (0.03 sec)


[root@dg mha]#  mysqldump -uroot -p --all-databases --master-data  > /tmp/db_all.sql
 UNLOCK TABLES; 

6、传输数据到备库

[root@dg mha]# scp /tmp/db_all.sql  dg2:/tmp/                                                                                                                                                                                  100%  757KB 756.7KB/s   00:00    
[root@dg mha]# scp /tmp/db_all.sql  dgt:/tmp/

7、建立传输用户
主库:
GRANT REPLICATION SLAVE ON *.* TO 'rep'@'10.10.6.%' IDENTIFIED BY 'bc.123456'; 

备主:
GRANT REPLICATION SLAVE ON *.* TO 'rep'@'10.10.6.%' IDENTIFIED BY 'bc.123456'; 


8、修改检查点

从库:
mysql -uroot -p < /tmp/db_all.sql
CHANGE MASTER TO master_host='10.10.6.86', master_port=3306, master_user='rep',master_password='bc.123456', master_log_file='mysql-bin.000001', master_log_pos=154; 


Startup slave;
status slave;

show slave status\G;


9配置MHA


软件下载可以从http://download.csdn.net/detail/wll_1017/9793651

mha  软件安装
[root@dgt mha]# yum localinstall  mha4mysql-manager-0.55-0.el6.noarch.rpm
[root@dg mha]# yum localinstall  mha4mysql-node-0.54-0.el6.noarch.rpm
[root@dg2 mha]# yum localinstall  mha4mysql-node-0.54-0.el6.noarch.rpm


成功安装后,会在/usr/bin目录下生成如下一系列命令工具:

[root@dgt bin]# ls -lrt masterha*
-rwxr-xr-x 1 root root 1739 Dec 13  2012 masterha_stop
-rwxr-xr-x 1 root root 3879 Dec 13  2012 masterha_secondary_check
-rwxr-xr-x 1 root root 2373 Dec 13  2012 masterha_master_switch
-rwxr-xr-x 1 root root 2165 Dec 13  2012 masterha_master_monitor
-rwxr-xr-x 1 root root 2517 Dec 13  2012 masterha_manager
-rwxr-xr-x 1 root root 3201 Dec 13  2012 masterha_conf_host
-rwxr-xr-x 1 root root 1865 Dec 13  2012 masterha_check_status
-rwxr-xr-x 1 root root 1779 Dec 13  2012 masterha_check_ssh
-rwxr-xr-x 1 root root 1995 Dec 13  2012 masterha_check_repl


 Manager工具:
/usr/bin/masterha_check_repl ――检查MySQL Replication是否正常;
/usr/bin/masterha_conf_host ――添加或删除配置的Server信息;
/usr/bin/masterha_master_switch ――用于手动Master切换;
/usr/bin/masterha_check_ssh ――检查各个Node之间SSH登录是否正常;
/usr/bin/masterha_manager ――开启MHA
/usr/bin/masterha_secondary_check ――检查多路由配置;
/usr/bin/masterha_check_status ――检查MHA是否开启并正常运行;
/usr/bin/masterha_master_monitor ――手动开启监控,启动MHA时会自动启动监控
/usr/bin/masterha_stop ――关闭MHA

Node工具:
/usr/bin/save_binary_logs ――保存和复制master的二进制日志;
/usr/bin/apply_diff_relay_logs ――识别差异的中继日志事件并应用于其它Slave;
/usr/bin/filter_mysqlbinlog ――去除不必要的Rollback事件(MHA已不再使用该工具);
/usr/bin/purge_relay_logs ――清除中继日志(不会阻塞SQL线程);
备注:Node端工具通常由MHA Manager的脚本触发调用,无需DBA操作。

配置keepalive 

具体详见

http://blog.csdn.net/wll_1017/article/details/65635218

mha 参数文件设置

[server default]
user=root                      #mysql用戶名
password=bc.123456   #mysql密码
ssh_user=root                #ssh免密钥登录的帐号名
repl_user=rep                       #mysql复制帐号
repl_password=bc.123456   #mysql复制密码
ping_interval=1                       #ping间隔,用来检测master是否正常,默认是3秒,尝试三次没有回应的时候自动进行failover
manager_workdir=/mha/app1     #数据目录,主要该目录的权限,需要有创建的权限
manager_log=/mha/app1/manager.log     #日志文件
remote_workdir=/mha/app1/app1     #另外2台机子在运行时候需要创建的目录,注意ssh-keygen帐号的权限问题
master_ip_failover_script=/mha/app1/master_ip_failover
[server1]
hostname=10.10.6.86
candidate_master=1           //主库
master_binlog_dir=/var/lib/mysql #binlog目录 
report_script=/usr/local/bin/send_report   
master_ip_failover_script=/mha/app1/master_ip_failover.sh
master_ip_online_change_script=/mha/app1/master_ip_online_change
[server2]
hostname=10.10.6.87
candidate_master=1   //备主
[server3]
hostname=10.10.6.90
no_master=1     //不会成为主库


其他:
设置no_master=1使主机不能成为新master
candidate_master=1 #master机宕掉后,优先启用这台作为新master
master_binlog_dir=/data/ndb #mysql数据库目录
ping_interval=1 #ping间隔,用来检测master是否正常
检查SSH公钥免密码登录(管理节点) 
masterha_check_ssh --conf=/etc/app1.cnf  
出现“All SSH connection tests passed successfully.”即为成功 



配置masterha_default.cnf文件
这是全局配置文件,缺省为/etc/masterha_default.cnf,适用于一个Manager管理多套MySQL Replication的情况,在[server_default]下定义一些多套复
制环境通用的Global Scope类型的参数。本例只有一套MySQL Replication,所以也可不用配置该文件,而是在对应的应用配置文件(appl.conf)下的[server_default]
中定义相关参数。
执行MHA相关命令时,会在/etc目录下搜索该配置文件,若找不到,虽然不会有什么错误,但会给出一个警告,如“[warning] Global configuration file 
/etc/masterha_default.cnf not found.”。
为此可以在/etc目录下创建一个名为masterha_default.cnf的空文件,本例不打算这么做,而是在其中配置一些通用的[server_default]类参数,如下:


# vi /etc/masterha_default.cnf
[server default]
user= root
password=Az.123456 --mysql密码
ssh_user=root
repl_user=rep
repl_password=Az.123456
ping_interval=3
ping_type =SELECT

开启mha服务

(1)检查MySQL复制(管理节点)
/usr/bin/masterha_check_repl --conf=/mha/scripts/app1.cnf  
出现“MySQL Replication Health is OK. ”即为成功 

(2)查看manager状态: 
 #/usr/bin/masterha_check_status --conf=/mha/scripts/app1.cnf  
app1 is stopped(2:NOT_RUNNING).

(3) 查看帮助
#  /usr/bin/masterha_manager --help
Usage:
    masterha_manager --global_conf=/etc/masterha_default.cnf
    --conf=/usr/local/masterha/conf/app1.cnf

(4) 开启mha服务    
/usr/bin/masterha_manager --conf=/mha/scripts/app1.cnf   &
为了使MHA持续运行在服务器端,可通过如下命令使其不挂起运行在后台:
nohup /usr/bin/masterha_manager --conf=/mha/scripts/app1.cnf   &
MHA开启之后,会在其工作目录下生成如下两个文件: appl.master_status.health manager.log

nohup /usr/bin/masterha_manager --conf=/mha/scripts/app1.cnf  --remove_dead_master_conf & 


(5)对于一个正在运行的MHA,可通过masterha_stop命令关闭,如下:
[root@dgt scripts]# /usr/bin/masterha_stop --conf=/mha/scripts/app1.cnf
  
(6)手工切换主和备主  
masterha_master_switch 工具进行主从切换,在手动切换的同时需要保证没有启用MHA自动切换功能。
或者是主动关闭主库/etc/init.d/mysqld stop
# masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf


 masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=10.10.6.87 --new_master_port=3306  --orig_master_is_new_slave --running_updates_limit=10000
 
--orig_master_is_new_slave 切换时加上此参数是将原 master 变为 slave 节点,如果不加此参数,原来的 master 将不启动
--running_updates_limit=10000,故障切换时,候选master 如果有延迟的话, mha 切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),
但是切换的时间长短是由recover 时relay 日志的大小决定 



执行上面的命令:

错误:

[root@dgt ~]# /usr/bin/masterha_check_repl --conf=/mha/scripts/app1.cnf  
Sun Mar 19 10:10:47 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Mar 19 10:10:47 2017 - [info] Reading application default configurations from /mha/scripts/app1.cnf..
Sun Mar 19 10:10:47 2017 - [info] Reading server configurations from /mha/scripts/app1.cnf..
Sun Mar 19 10:10:47 2017 - [info] MHA::MasterMonitor version 0.55.
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln255] Got MySQL error when connecting 10.10.6.87(10.10.6.87:3306) :1045:Access denied for user 'root'@'10.10.6.90' (using password: YES), but this is not mysql crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln255] Got MySQL error when connecting 10.10.6.90(10.10.6.90:3306) :1045:Access denied for user 'root'@'10.10.6.90' (using password: YES), but this is not mysql crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln255] Got MySQL error when connecting 10.10.6.86(10.10.6.86:3306) :1045:Access denied for user 'root'@'10.10.6.90' (using password: YES), but this is not mysql crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 300
Sun Mar 19 10:10:47 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Sun Mar 19 10:10:47 2017 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
    
解决问题:
MySQL的用户权限的配置:
用户:root(所有节点)。
grant all privileges on *.* to root@'10.10.6.%' identified by 'bc.123456';     
    
  
检查MySQL复制(管理节点
[root@dgt ~]# /usr/bin/masterha_check_repl --conf=/mha/scripts/app1.cnf  
Sun Mar 19 11:04:11 2017 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sun Mar 19 11:04:11 2017 - [info] Reading application default configurations from /mha/scripts/app1.cnf..
Sun Mar 19 11:04:11 2017 - [info] Reading server configurations from /mha/scripts/app1.cnf..
Sun Mar 19 11:04:11 2017 - [info] MHA::MasterMonitor version 0.55.
Sun Mar 19 11:04:12 2017 - [info] Dead Servers:
Sun Mar 19 11:04:12 2017 - [info] Alive Servers:
Sun Mar 19 11:04:12 2017 - [info]   10.10.6.86(10.10.6.86:3306)
Sun Mar 19 11:04:12 2017 - [info]   10.10.6.87(10.10.6.87:3306)
Sun Mar 19 11:04:12 2017 - [info]   10.10.6.90(10.10.6.90:3306)
Sun Mar 19 11:04:12 2017 - [info] Alive Slaves:
Sun Mar 19 11:04:12 2017 - [info]   10.10.6.87(10.10.6.87:3306)  Version=5.7.17-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Mar 19 11:04:12 2017 - [info]     Replicating from 10.10.6.86(10.10.6.86:3306)
Sun Mar 19 11:04:12 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Mar 19 11:04:12 2017 - [info]   10.10.6.90(10.10.6.90:3306)  Version=5.7.17-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Mar 19 11:04:12 2017 - [info]     Replicating from 10.10.6.86(10.10.6.86:3306)
Sun Mar 19 11:04:12 2017 - [info]     Not candidate for the new Master (no_master is set)
Sun Mar 19 11:04:12 2017 - [info] Current Alive Master: 10.10.6.86(10.10.6.86:3306)
Sun Mar 19 11:04:12 2017 - [info] Checking slave configurations..
Sun Mar 19 11:04:12 2017 - [info] Checking replication filtering settings..
Sun Mar 19 11:04:12 2017 - [info]  binlog_do_db= , binlog_ignore_db= information_schema,mysql
Sun Mar 19 11:04:12 2017 - [info]  Replication filtering check ok.
Sun Mar 19 11:04:12 2017 - [info] Starting SSH connection tests..
Sun Mar 19 11:04:13 2017 - [info] All SSH connection tests passed successfully.
Sun Mar 19 11:04:13 2017 - [info] Checking MHA Node version..
Sun Mar 19 11:04:13 2017 - [info]  Version check ok.
Sun Mar 19 11:04:13 2017 - [info] Checking SSH publickey authentication settings on the current master..
Sun Mar 19 11:04:13 2017 - [info] HealthCheck: SSH to 10.10.6.86 is reachable.
Sun Mar 19 11:04:14 2017 - [info] Master MHA Node version is 0.54.
Sun Mar 19 11:04:14 2017 - [info] Checking recovery script configurations on the current master..
Sun Mar 19 11:04:14 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mha/app1/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000001 
Sun Mar 19 11:04:14 2017 - [info]   Connecting to root@10.10.6.86(10.10.6.86).. 
  Creating /mha/app1/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000001
Sun Mar 19 11:04:14 2017 - [info] Master setting check done.
Sun Mar 19 11:04:14 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Mar 19 11:04:14 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.10.6.87 --slave_ip=10.10.6.87 --slave_port=3306 --workdir=/mha/app1/app1 --target_version=5.7.17-enterprise-commercial-advanced-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Sun Mar 19 11:04:14 2017 - [info]   Connecting to root@10.10.6.87(10.10.6.87:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-bin.000014
    Temporary relay log file is /var/lib/mysql/mysql-bin.000014
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Mar 19 11:04:14 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.10.6.90 --slave_ip=10.10.6.90 --slave_port=3306 --workdir=/mha/app1/app1 --target_version=5.7.17-enterprise-commercial-advanced-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Sun Mar 19 11:04:14 2017 - [info]   Connecting to root@10.10.6.90(10.10.6.90:22).. 
Creating directory /mha/app1/app1.. done.
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-bin.000012
    Temporary relay log file is /var/lib/mysql/mysql-bin.000012
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Mar 19 11:04:15 2017 - [info] Slaves settings check done.
Sun Mar 19 11:04:15 2017 - [info] 
10.10.6.86 (current master)
 +--10.10.6.87
 +--10.10.6.90


Sun Mar 19 11:04:15 2017 - [info] Checking replication health on 10.10.6.87..
Sun Mar 19 11:04:15 2017 - [info]  ok.
Sun Mar 19 11:04:15 2017 - [info] Checking replication health on 10.10.6.90..
Sun Mar 19 11:04:15 2017 - [info]  ok.
Sun Mar 19 11:04:15 2017 - [warning] master_ip_failover_script is not defined.
Sun Mar 19 11:04:15 2017 - [warning] shutdown_script is not defined.
Sun Mar 19 11:04:15 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

开启服务

/usr/bin/masterha_manager --conf=/mha/scripts/app1.cnf   &

查看manager状态: 
[root@dgt ~]# /usr/bin/masterha_check_status --conf=/mha/scripts/app1.cnf 
app1 is stopped(2:NOT_RUNNING).

可以看到mha管理节点的进程

[root@dgt scripts]# ps -ef |grep mha
root      46618  45964  1 19:51 pts/1    00:00:00 perl /usr/bin/masterha_manager --conf=/mha/scripts/app1.cnf
root      46728  45964  0 19:52 pts/1    00:00:00 grep mha

杀死数据库进程前,主库可以发现vip 在10.10.6.86

[root@dg app1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:aa:7e:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.10.6.86/24 brd 10.10.6.255 scope global eth0
    inet 10.10.6.81/24 scope global secondary eth0

在主库杀死数据库进程

[root@dg mha]# ps -ef |grep mysql
root      23651      1  0 18:37 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     24299  23651  0 18:37 pts/0    00:00:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/mysql-error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
root      25369   2369  0 19:57 pts/0    00:00:00 grep mysql
[root@dg mha]# kill -9 23651
[root@dg mha]# kill -9 24299

查看vip 是否还在主库,发现已经不在主库端,飘到从库
[root@dg mha]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:aa:7e:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.10.6.86/24 brd 10.10.6.255 scope global eth0                 

 

从库发现vip 在10.10.6.87

[root@dg2 app1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:88:33:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.10.6.87/24 brd 10.10.6.255 scope global eth0
    inet 10.10.6.81/24 scope global secondary eth0


在mha的管理节点上10.10.6.90上可以看到 mha管理进程以及退出

[root@dgt scripts]#   Creating /mha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000015
Fri Mar 24 19:57:19 2017 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Mar 24 19:57:19 2017 - [info] Reading application default configurations from /mha/scripts/app1.cnf..
Fri Mar 24 19:57:19 2017 - [info] Reading server configurations from /mha/scripts/app1.cnf..


mha管理进程已经退出

[root@dgt scripts]# ps -ef |grep ma
root      48713  45964  0 20:25 pts/1    00:00:00 grep ma


再重新开启MHA程序,这里需要 注意:
一旦发生切换管理进程(Manager)将会退出,无法进行再次测试,需将故障数据库解决掉之后,重新change加入到MHA环境中来或
则在切换的时候就加上--remove_dead_master_conf参数,并且要保证app1.failover.complete不存在或则加上--ignore_last_failover
参数忽略,才能再次开启管理进程。

mysql已经成功的切换到备份机上,这时我还注意到一个问题 就是这个切换过程不会立即切换,需要花费几秒时间,
也就是说数据在这个空档是不能写入的,这对于要求数据的查询和写入实时性要求较高的企业带来了困难,如何解决这个问题,
主要有两个思路:
1 通过脚本实现failover(故障转移)
2 通过keepalived实现虚拟IP 虚拟IP的地址随着master的改变而漂移


其中MHA用到的脚本

[root@dgt scripts]# cat  master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
  
use Getopt::Long;
  
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
  
my $vip = '10.10.6.81';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";
  
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
  
exit &main();
  
sub main {
  
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
  
    if ( $command eq "stop" || $command eq "stopssh" ) {
  
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
  
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
  
# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
  
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


[root@dgt scripts]# cat master_ip_online_change
#!/usr/bin/env perl  
use strict;  
use warnings FATAL =>'all';  
  
use Getopt::Long;  
  
my $vip = '10.1.5.21/24';  # Virtual IP  
my $key = "1";  
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";  
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";  
my $exit_code = 0;  
  
my (  
  $command,              $orig_master_is_new_slave, $orig_master_host,  
  $orig_master_ip,       $orig_master_port,         $orig_master_user,  
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  
  $new_master_ip,        $new_master_port,          $new_master_user,  
  $new_master_password,  $new_master_ssh_user,  
);  
GetOptions(  
  'command=s'                => \$command,  
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  
  'orig_master_host=s'       => \$orig_master_host,  
  'orig_master_ip=s'         => \$orig_master_ip,  
  'orig_master_port=i'       => \$orig_master_port,  
  'orig_master_user=s'       => \$orig_master_user,  
  'orig_master_password=s'   => \$orig_master_password,  
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  
  'new_master_host=s'        => \$new_master_host,  
  'new_master_ip=s'          => \$new_master_ip,  
  'new_master_port=i'        => \$new_master_port,  
  'new_master_user=s'        => \$new_master_user,  
  'new_master_password=s'    => \$new_master_password,  
  'new_master_ssh_user=s'    => \$new_master_ssh_user,  
);  
  
  
exit &main();  
  
sub main {  
  
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  
  
if ( $command eq "stop" || $command eq "stopssh" ) {  
  
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  
        # If you manage master ip address at global catalog database,  
        # invalidate orig_master_ip here.  
        my $exit_code = 1;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";  
            print "***************************************************************\n\n\n\n";  
&stop_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn "Got Error: $@\n";  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "start" ) {  
  
        # all arguments are passed.  
        # If you manage master ip address at global catalog database,  
        # activate new_master_ip here.  
        # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Enabling the VIP - $vip on new master: $new_master_host \n";  
            print "***************************************************************\n\n\n\n";  
&start_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn $@;  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "status" ) {  
        print "Checking the Status of the script.. OK \n";  
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;  
        exit 0;  
}  
else {  
&usage();  
        exit 1;  
}  
}  
  
# A simple system call that enable the VIP on the new master  
sub start_vip() {  
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;  
}  
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {  
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;  
}  
  
sub usage {  
print  
"Usage: master_ip_failover ?command=start|stop|stopssh|status -orig_master_host=host -orig_master_ip=ip -orig_master_port=po  
rt -new_master_host=host ?new_master_ip=ip ?new_master_port=port\n";  


[root@dgt scripts]# cat report_script
#!/usr/bin/perl
use strict;   
use warnings FATAL => 'all';    
use Mail::Sender;    
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded   
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );    
my $smtp='smtp.163.com';    
my $mail_from='xxxx';    
my $mail_user='xxxxx';    
my $mail_pass='xxxxx';    
my $mail_to=['xxxx','xxxx'];    
GetOptions(    
  'orig_master_host=s' => \$dead_master_host,    
  'new_master_host=s'  => \$new_master_host,    
  'new_slave_hosts=s'  => \$new_slave_hosts,    
  'subject=s'          => \$subject,    
  'body=s'             => \$body,    
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {   
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;    
    open my $DEBUG, "> /tmp/monitormail.log"    
        or die "Can't open the debug      file:$!\n";    
    my $sender = new Mail::Sender {    
        ctype       => 'text/plain; charset=utf-8',    
        encoding    => 'utf-8',    
        smtp        => $smtp,    
        from        => $mail_from,    
        auth        => 'LOGIN',    
        TLS_allowed => '0',    
        authid      => $user,    
        authpwd     => $passwd,    
        to          => $mail_to,    
        subject     => $subject,    
        debug       => $DEBUG    
    };
    $sender->MailMsg(   
        {   msg   => $msg,    
            debug => $DEBUG    
        }    
    ) or print $Mail::Sender::Error;    
    return 1;    
}
# Do whatever you want here
exit 0;



  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值