Environment preparation
Three Linux servers: CentOS Linux release 7.7.1908 (Core)
10.20.178.72
10.20.178.253
10.20.178.115
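Offline installation package: Galera offline one-click installation package
Offline installation
Perform the following steps on every Linux server.
- Environment configuration
- Disable the firewall, or allow network traffic between the three servers
- Disable SELinux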
vi /etc/selinux/config
SELINUX=disabled
Reboot the system: reboot
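The firewall and SELinux steps above can also be scripted. The following is a minimal sketch, not part of the install package: the function names are hypothetical, firewalld is assumed to be the active firewall (the CentOS 7 default), and the port list assumes Galera's default ports (3306 for clients, 4567 for replication, 4568 for IST, 4444 for SST).

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; ports are Galera defaults.

# Rewrite the SELINUX= line of a config file so SELinux stays off after reboot.
# SELINUXTYPE= and commented lines are left untouched.
set_selinux_disabled() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Print (do not run) the firewall-cmd calls that open the Galera ports.
print_firewall_rules() {
    for port in 3306/tcp 4567/tcp 4567/udp 4568/tcp 4444/tcp; do
        echo "firewall-cmd --permanent --add-port=$port"
    done
    echo "firewall-cmd --reload"
}

# Usage (as root, on each node):
#   set_selinux_disabled /etc/selinux/config
#   print_firewall_rules | sh
```

Printing the firewall commands instead of running them keeps the sketch safe to review first; pipe the output to sh only once the list matches your environment.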
- Configure host names
vim /etc/hosts
10.20.178.72 galera0
10.20.178.253 galera1
10.20.178.115 galera2
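Because the same three entries must be added on every server, it may help to append them idempotently instead of editing the file by hand. A minimal sketch, assuming a hypothetical helper named add_host_entry:

```shell
#!/bin/sh
# Sketch only: add_host_entry is a hypothetical helper.

# Append "IP NAME" to a hosts file unless NAME is already mapped there.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        return 0   # name already present, leave the file untouched
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (as root, on each node):
#   add_host_entry /etc/hosts 10.20.178.72  galera0
#   add_host_entry /etc/hosts 10.20.178.253 galera1
#   add_host_entry /etc/hosts 10.20.178.115 galera2
```

Running it twice leaves the file unchanged, so the same script can be reused on all three nodes.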
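- Run sh install-galera.sh
- Start the service, change the root password, and create the synchronization account
- Start the service and find the initial root password in the log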
[root@10-20-178-72 ~]# systemctl start mysqld
[root@10-20-178-72 ~]# grep password /var/log/mysqld.log
2020-11-12T11:52:41.875366Z 1 [Note] A temporary password is generated for root@localhost: ,0p*Cc6XxnFW
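To avoid copying the temporary password by hand, it can be cut out of the log line shown above. A minimal sketch, assuming the log message keeps the wording shown in the sample; extract_temp_password is a hypothetical name:

```shell
#!/bin/sh
# Sketch only: extract_temp_password is a hypothetical helper.

# Print the most recent temporary root password recorded in a mysqld log.
# Everything up to and including "root@localhost: " is stripped from the line.
extract_temp_password() {
    sed -n 's/^.*A temporary password is generated for root@localhost: //p' "$1" | tail -n 1
}

# Usage:
#   extract_temp_password /var/log/mysqld.log
```

Taking the last match matters if the data directory was initialized more than once, since each initialization logs a new password.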
- Set the root password as needed
mysqladmin -u root -p',0p*Cc6XxnFW' password 'your-password'
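*PS: If changing the password fails with "Your password does not satisfy the current policy requirements", the password is not strong enough.* Lower the password policy level in the database, then change the password again: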
mysql> SHOW VARIABLES LIKE 'validate_password%';
+--------------------------------------+--------+
| Variable_name | Value |
+--------------------------------------+--------+
| validate_password_check_user_name | OFF |
| validate_password_dictionary_file | |
| validate_password_length | 8 |
| validate_password_mixed_case_count | 1 |
| validate_password_number_count | 1 |
| validate_password_policy | MEDIUM |
| validate_password_special_char_count | 1 |
+--------------------------------------+--------+
7 rows in set (0.01 sec)
mysql> set global validate_password_policy=LOW;
Query OK, 0 rows affected (0.00 sec)
- Create the synchronization account
create user galera identified by 'galeraPassword';
grant all on *.* to galera with grant option;
- Edit the my.cnf configuration
vim /etc/my.cnf
galera0:
server-id=0
binlog_format=row
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name='mysql_galera_cluster'
wsrep_cluster_address='gcomm://10.20.178.72,10.20.178.253,10.20.178.115'
wsrep_node_name='galera0'
wsrep_node_address='10.20.178.72'
wsrep_sst_auth=galera:'galeraPassword'
wsrep_sst_method=rsync
galera1:
server-id=1
binlog_format=row
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name='mysql_galera_cluster'
wsrep_cluster_address='gcomm://10.20.178.72,10.20.178.253,10.20.178.115'
wsrep_node_name='galera1'
wsrep_node_address='10.20.178.253'
wsrep_sst_auth=galera:'galeraPassword'
wsrep_sst_method=rsync
galera2:
server-id=2
binlog_format=row
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name='mysql_galera_cluster'
wsrep_cluster_address='gcomm://10.20.178.72,10.20.178.253,10.20.178.115'
wsrep_node_name='galera2'
wsrep_node_address='10.20.178.115'
wsrep_sst_auth=galera:'galeraPassword'
wsrep_sst_method=rsync
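The three per-node blocks differ only in server-id, wsrep_node_name and wsrep_node_address, so they can be generated from one template to avoid copy-and-paste slips. A minimal sketch: render_wsrep_cnf is a hypothetical helper, the address key is assumed to be wsrep_cluster_address, and the provider path is assumed to be the galera-3 one used for galera1 and galera2.

```shell
#!/bin/sh
# Sketch only: render_wsrep_cnf is a hypothetical helper; the provider path
# and the key name wsrep_cluster_address are assumptions stated above.

# Print the Galera settings for one node: name, IP address, server id.
render_wsrep_cnf() {
    node_name="$1"; node_ip="$2"; server_id="$3"
    cat <<EOF
server-id=$server_id
binlog_format=row
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name='mysql_galera_cluster'
wsrep_cluster_address='gcomm://10.20.178.72,10.20.178.253,10.20.178.115'
wsrep_node_name='$node_name'
wsrep_node_address='$node_ip'
wsrep_sst_auth=galera:'galeraPassword'
wsrep_sst_method=rsync
EOF
}

# Usage: review the output, then merge it into the [mysqld] section of /etc/my.cnf
#   render_wsrep_cnf galera1 10.20.178.253 1
```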
- Start the cluster
- Stop all MySQL services
systemctl stop mysqld
- Choose one server and start it as the bootstrap node
mysqld --user=mysql --wsrep-new-cluster --wsrep-cluster-address='gcomm://' &
- Start the MySQL service on the other nodes, one at a time
systemctl start mysqld
- Confirm that startup succeeded:
ss -auntpl | grep -E '3306|4567'
tcp LISTEN 0 128 *:4567 *:* users:(("mysqld",pid=13324,fd=12))
tcp LISTEN 0 80 [::]:3306 [::]:* users:(("mysqld",pid=13324,fd=42))
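The same check can be scripted so that a startup script fails fast when a port is missing. A minimal sketch that parses ss output from standard input; has_listener is a hypothetical helper, and it assumes the local address sits three columns after the LISTEN state, as in the sample above:

```shell
#!/bin/sh
# Sketch only: has_listener is a hypothetical helper.

# Read ss output on stdin; succeed if any LISTEN socket is bound to the port.
has_listener() {
    awk -v port="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "LISTEN" && (i + 3) <= NF) {
                    # the port is the part after the last ":" (works for *:4567 and [::]:3306)
                    n = split($(i + 3), parts, ":")
                    if (parts[n] == port) found = 1
                }
            }
        }
        END { exit !found }
    '
}

# Usage:
#   ss -ntl | has_listener 3306 && ss -ntl | has_listener 4567 && echo "mysqld and Galera are listening"
```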
- Restart the bootstrap node as a normal service
systemctl restart mysqld
- Log in to MySQL on any node and check the cluster status
mysql> show status like 'wsrep_incoming%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_incoming_addresses | 10.20.178.253:3306,10.20.178.115:3306,10.20.178.72:3306 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.01 sec)
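Beyond the address list, three other Galera status variables are commonly checked: wsrep_cluster_size, wsrep_cluster_status and wsrep_local_state_comment. These are not mentioned in the steps above, so treat the following as an optional sketch; cluster_is_healthy is a hypothetical helper that reads tab-separated "name value" rows, such as those printed by mysql -N -B.

```shell
#!/bin/sh
# Sketch only: cluster_is_healthy is a hypothetical helper.

# Read "name<TAB>value" status rows on stdin. Succeed when the cluster has the
# expected number of nodes, is the Primary component, and this node is Synced.
cluster_is_healthy() {
    awk -F '\t' -v want="$1" '
        $1 == "wsrep_cluster_size"        { size = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(size == want && status == "Primary" && state == "Synced") }
    '
}

# Usage (prompts for the root password):
#   mysql -u root -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | cluster_is_healthy 3
```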
Testing: omitted
Troubleshooting
- WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync
Cause: SELinux is probably still enabled, which prevents the state transfer over rsync.
Fix:
Permanently: vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled, then reboot the system.
Temporarily: setenforce 0
- Cluster crash
Cause: possibly a prolonged network outage, or most of the nodes in the cluster crashed.
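Fix: once the network has recovered, stop every MySQL service that is still running, then bootstrap the cluster from the node with the most complete data (the last node to leave the cluster). The next item explains how to tell which node left last.
- edit the grastate.dat file manually and set safe_to_bootstrap to 1
Cause: the node being started is not the last node that left the cluster.
Fix: check the grastate.dat file on each node and find the one where safe_to_bootstrap is 1. If you are sure that a particular node's data can serve as the reference copy for the other nodes to sync from, you can also set safe_to_bootstrap to 1 on that node by hand. Start that node with:
mysqld --user=mysql --wsrep-new-cluster --wsrep-cluster-address='gcomm://' &
Then start the other nodes. Once all nodes are up, restart the bootstrap node through the service manager.
- WSREP has not yet prepared node for application use
Cause: possibly a split brain, or too few nodes are running, for example only one node of a 3-node cluster has been started.
Fix: start the other nodes. For a split brain, follow the cluster recovery procedure above.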
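For the cluster recovery steps above, the grastate.dat check can be scripted and run on each node before deciding where to bootstrap. A minimal sketch: both function names are hypothetical, and the file is assumed to live in the default data directory, /var/lib/mysql. After an unclean crash seqno may read -1 on every node; in that case mysqld --wsrep-recover can be used to log the recovered position, which this sketch does not attempt.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical; the path in the usage note
# assumes the default data directory.

# Print the value of one key (seqno, safe_to_bootstrap, uuid) from grastate.dat.
grastate_field() {
    awk -F ':' -v key="$2" '$1 == key { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Succeed only when the node recorded itself as the last one to leave the cluster.
is_safe_to_bootstrap() {
    [ "$(grastate_field "$1" safe_to_bootstrap)" = "1" ]
}

# Usage (on each node, with mysqld stopped):
#   grastate_field /var/lib/mysql/grastate.dat seqno
#   is_safe_to_bootstrap /var/lib/mysql/grastate.dat && echo "bootstrap from this node"
```

When no node reports safe_to_bootstrap as 1, comparing the seqno values and choosing the highest is the usual way to pick the node with the most complete data.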