mysql 互为主备 宕机 数据丢失_mysql group replication 主节点宕机恢复

一、mysql group replication 生来就要面对两个问题:

一、主节点宕机如何恢复。

二、多数节点离线的情况下、余下节点如何继续承载业务。

在这里我们只讨论第一个问题、也就是说当主结点宕机之后、我们怎么把它从新加入到高可用集群中去。这个问题又可以细分成

两种情况:

1、温和打击:主结点的数据还在、宕机期间集群中的其它结点的binlog日志也都还在

这种情况下重新启动mysql group replication 就可修复问题。

2、毁灭打击:主结点的数据都没有了

这种情况下要从其余结点备份恢复宕机结点、然后再重启mysql group replication 就可修复问题。

详细的修复步骤请看后面的例子

二、环境介绍:

环境简介

主机名         ip地址        mgr角色

mtls17        10.186.19.17      primary

mtls18        10.186.19.18      seconde

mtls19        10.186.19.19      seconde

集群状态:

mysql> select * fromreplication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |

| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |

| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

3 rows in set (0.00sec)

mysql> show global status like 'group_replication_primary_member';+----------------------------------+--------------------------------------+

| Variable_name | Value |

+----------------------------------+--------------------------------------+

| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |

+----------------------------------+--------------------------------------+

1 row in set (0.00 sec)

说明:

由上面的信息可以看出mtls17上的mysql为集群当前的primary结点、并且集群的各结点的状态正常。

三、情况下的故障模拟 + 解决:

1、模拟mtls17结点宕机

ps -ef | grepmysql

mysql24125 1 0 00:04 ? 00:00:14 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf

root26125 22481 0 00:36 pts/0 00:00:00 grep --color=auto mysql

[root@mtls17 data]#kill -9 24125[root@mtls17 data]#ps -ef | grepmysql

root26128 22481 0 00:37 pts/0 00:00:00 grep --color=auto mysql

2、查看余下两个结点的情况

mysql> melect * fromreplication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |

| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

2 rows in set (0.00sec)

mysql> show global status like 'group_replication_primary_member';+----------------------------------+--------------------------------------+

| Variable_name | Value |

+----------------------------------+--------------------------------------+

| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |

+----------------------------------+--------------------------------------+

1 row in set (0.00 sec)

由上面可以看出在mtls17结点上的mysql被kill掉之后、余下的两个结点组成了新的集群、并且mtls18上的mysql

成为了primary

3、解决primary宕机恢复的问题

systemctl start mysql

[root@mtls17 data]# mysql-uroot -pmtls0352

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connectionid is 3Server version:5.7.20-log MySQL Community Server (GPL)

Copyright (c)2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type'help;' or '\h' for help. Type '\c' to clearthe current input statement.

mysql>start group_replication;

Query OK,0 rows affected (4.03sec)

mysql>

4、检查问题是否正常解决

select * fromreplication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |

| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |

| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

3 rows in set (0.00sec)

mysql> show global status like 'group_replication_primary_member';+----------------------------------+--------------------------------------+

| Variable_name | Value |

+----------------------------------+--------------------------------------+

| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |

+----------------------------------+--------------------------------------+

1 row in set (0.00 sec)

总论:之前的主结点在宕机之后、通过重启服务、重启mysql-group-replication成功的解决了问题。

四、模拟primary结点上的数据已经丢失的情况下、如果恢复结点:

1、退出服务、删除数据

[root@mtsl18 ~]# ps -ef | grepmysql

mysql10843 1 0 00:04 ? 00:00:19 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf

root13290 9197 0 00:50 pts/0 00:00:00 grep --color=auto mysql

[root@mtsl18~]# kill -9 10843[root@mtsl18~]# rm -rf /database/mysql/data/3306[root@mtsl18~]# ps -ef | grepmysql

root13339 9197 0 00:50 pts/0 00:00:00 grep --color=auto mysql

这个实验是接着情况一做下去的、所以primary在mtls18上、所以我们在mtls18上做退出服务、删除数据的动作

2、查看集群的状态:

mysql> select * fromreplication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |

| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

2 rows in set (0.00sec)

mysql> show global status like 'group_replication_primary_member';+----------------------------------+--------------------------------------+

| Variable_name | Value |

+----------------------------------+--------------------------------------+

| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |

+----------------------------------+--------------------------------------+

1 row in set (0.01 sec)

说明:当mtls18宕机后primary就从mtls18切到了mtls17上去了

3、通过meb备份mtls19用于还原宕机的mtls18

mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \

--host=localhost --user=root --password=mtls0352 \

--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \

backup-to-image

MySQL Enterprise Backup version4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01]

Copyright (c)2003, 2017, Oracle and/or its affiliates. All Rights Reserved.171201 01:01:36 MAIN INFO: A thread created with Id '140141436434240'

171201 01:01:36MAIN INFO: Starting with following command line ...

mysqlbackup--defaults-file=/etc/my.cnf --with-timestamp --host=localhost--user=root --password=xxxxxxxx --backup-dir=/tmp/

--backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging

backup-to-image171201 01:01:36MAIN INFO:171201 01:01:36 MAIN INFO: MySQL server version is '5.7.20-log'.......

........171201 01:01:40MAIN INFO: Full Image Backup operation completed successfully.171201 01:01:40MAIN INFO: Backup image created successfully.171201 01:01:40 MAIN INFO: Image Path = /tmp/2017-12-01T12:30:00.mbi171201 01:01:40 MAIN INFO: MySQL binlog position: filename mysql-bin.000002, position 1082

-------------------------------------------------------------Parameters Summary-------------------------------------------------------------Start LSN :2609664End LSN :2610075

-------------------------------------------------------------mysqlbackup completed OK!

4、传输备份到mtls18

scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/

5、还原备份

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \

> --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \

> copy-back-and-apply-log

MySQL Enterprise Backup version4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01]

Copyright (c)2003, 2017, Oracle and/or its affiliates. All Rights Reserved.171201 01:09:59 MAIN INFO: A thread created with Id '140530650736448'

171201 01:09:59MAIN INFO: Starting with following command line ...

mysqlbackup--defaults-file=/etc/my.cnf--backup-image=/tmp/2017-12-01T12:30:00.mbi --backup-dir=/tmp/

--datadir=/database/mysql/data/3306/ copy-back-and-apply-log171201 01:09:59MAIN INFO:

IMPORTANT: Please check that mysqlbackup run completes successfully.

.....

.....171201 01:10:08 PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1'and the new created log files are at'/database/mysql/data/3306/'

171201 01:10:08 MAIN INFO: MySQL server version is '5.7.20-log'

171201 01:10:08 MAIN INFO: Restoring ...5.7.20-log version171201 01:10:08 MAIN INFO: Apply-log operation completed successfully.171201 01:10:08MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!

6、重启mtls18上的mysql

[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/3306[root@mtsl18 tmp]# systemctl start mysql

[root@mtsl18 tmp]#ps -ef | grepmysql

mysql14205 1 24 01:11 ? 00:00:01 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf

root14237 9197 0 01:11 pts/0 00:00:00 grep --color=auto mysql

7、重启mysql group replication

mysql -uroot -pmtls0352

mysql:[Warning] Using a password onthe command line interface can be insecure.

Welcometo the MySQL monitor. Commands end with ; or\g.

Your MySQL connection idis 4Server version:5.7.20-logMySQL Community Server (GPL)

Copyright (c)2000, 2017, Oracle and/or its affiliates. Allrights reserved.

Oracleis a registered trademark of Oracle Corporation and/orits

affiliates. Other names may be trademarksoftheir respective

owners.

Type'help;' or '\h' for help. Type '\c' to clear the currentinput statement.

mysql>reset master;

Query OK,0 rows affected (0.10sec)

mysql>reset slave;

Query OK,0 rows affected (0.00sec)

mysql> set sql_log_bin=0;

Query OK,0 rows affected (0.00sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;

Query OK,0 rows affected (0.10sec)

mysql> set sql_log_bin=1;

Query OK,0 rows affected (0.00sec)

mysql> change master to

-> master_user='mgr_usr',

-> master_password='mgr10352'

-> for channel 'group_replication_recovery';

Query OK,0 rows affected, 2 warnings (0.21sec)

mysql>start group_replication;

Query OK,0 rows affected (3.46 sec)

8、检查集群的状态是否正常

mysql> select * fromreplication_group_members;+---------------------------+--------------------------------------+-------------+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |

| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |

| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18 | 3306 | ONLINE |

+---------------------------+--------------------------------------+-------------+-------------+--------------+

3 rows in set (0.01sec)

mysql> show global status like 'group_replication_primary_member';+----------------------------------+--------------------------------------+

| Variable_name | Value |

+----------------------------------+--------------------------------------+

| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |

+----------------------------------+--------------------------------------+

1 row in set (0.01 sec)

五、总结:

对于两种primary宕故障的修复总结:

1、数据没有丢、binlog日志也没有丢 那直接重启mysql group replication 就行、它会自动修复问题。

2、数据丢失的情况、先备份还原-->重启mysql group replication 就行。

对于mysql group replication 维护操作复杂性的总结:

总的来说mysql group replication 对dba还是比较友好的、几个小小的操作就能恢复故障的集群。

六、我写的关于mysql group replication 的相关文章

2、mysql group replication 在mysql-5.7.20版本下的可用性报告:http://www.cnblogs.com/JiangLe/p/7809229.html

4、mysql group replication 多数据结点丢失情况下的恢复

5、我写的全自动化安装mysql-group-replication 开源工具 https://github.com/Neeky/mysqltools

----

已标记关键词 清除标记
表情包
插入表情
评论将由博主筛选后显示,对所有人可见 | 还能输入1000个字符
相关推荐
©️2020 CSDN 皮肤主题: 1024 设计师:白松林 返回首页