ORCH可视化高可用集群

1. ORCH介绍

Orchestrator(orch):go编写的MySQL高可用性和复制拓扑管理工具,支持复制拓扑结构的调整,自动故障转移和手动主从切换等。后端数据库用MySQL或SQLite存储元数据,并提供Web界面展示MySQL复制的拓扑关系及状态,通过Web可更改MySQL实例的复制关系和部分配置信息,同时也提供命令行和api接口,方便运维管理。相对比MHA来看最重要的是解决了管理节点的单点问题,其通过raft协议保证本身的高可用。GitHub的一部分管理也在用该工具进行管理。

功能

① 自动发现MySQL的复制拓扑,并且在web上展示。

② 重构复制关系,可以在web进行拖图来进行复制关系变更。

③ 检测主异常,并可以自动或手动恢复,通过Hooks进行自定义脚本。

④ 支持命令行和web界面管理复制。

2. ORCH部署规划

节点规划表(3306是ORCH后端数据库,8026是mysql主从架构)

IP地址主机名安装软件数据库端口
172.31.0.101Wl01orchestrator、mysql3306、8026
172.31.0.102Wl02orchestrator、mysql3306、8026
172.31.0.103Wl03orchestrator、mysql3306、8026

各软件版本

软件名版本下载地址
MySQL8.0.26https://downloads.mysql.com/archives/community/
Orchestratorversion: 3.2.6https://github.com/openark/orchestrator

mysql数据库目录规划

MySQL目录作用路径
basedir/usr/loca/mysql-8026
datadir/mysql-8026/8026/data/
errorlog/mysql-8026/8026/log/error.log
binlogdir/mysql-8026/8026/binlog/
relaylogdir/mysql-8026/8026/relaylog/
Tmpdir/mysql-8026/8026/tmp
pid/mysql-8026/8026/run/mysql-8026.pid
socket/mysql-8026/8026/run/mysql-8026.sock

orchestrator数据库目录规划

orchestrator目录作用路径
basedir/usr/loca/mysql-8026
datadir/mysql-8026/3306/data/
errorlog/mysql-8026/3306/log/error.log
binlogdir/mysql-8026/3306/binlog/
relaylogdir/mysql-8026/3306/relaylog/
Tmpdir/mysql-8026/3306/tmp
pid/mysql-8026/3306/run/mysql-8026.pid
socket/mysql-8026/3306/run/mysql-8026.sock

3. 环境准备

3.1 环境准备(所有节点)

#安装依赖软件
[root@wl01 ~]# yum install -y gcc gcc-c++ ncurses-devel.x86_64 libaio bison gcc-c++.x86_64 perl perl-devel libssl-dev autoconf openssl-devel openssl numactl wget *libncurses.so.5*

#配置环境变量
#将准备好的mysql-8.0.26二进制压缩包上传到/opt目录
[root@wl01 ~]# cd /opt/
[root@wl01 opt]# tar xf mysql-8.0.26-linux-glibc2.12-x86_64.tar.xz
[root@wl01 opt]# ln -s /opt/mysql-8.0.26-linux-glibc2.12-x86_64 /usr/local/mysql-8026
[root@wl01 opt]# ll /usr/local/mysql-8026
lrwxrwxrwx 1 root root 40 Jan 14 16:59 /usr/local/mysql-8026 -> /opt/mysql-8.0.26-linux-glibc2.12-x86_64
[root@wl01 opt]# echo "export PATH=/usr/local/mysql-8026/bin:$PATH">> /etc/profile
[root@wl01 opt]# source /etc/profile
[root@wl01 opt]# mysql -V
mysql  Ver 8.0.26 for Linux on x86_64 (MySQL Community Server - GPL)

#配置域名解析
[root@wl01 opt]# vim /etc/hosts
172.31.0.101 wl01
172.31.0.102 wl02
172.31.0.103 wl03

#配置互信
rm -rf /root/.ssh #清理旧的ssh密钥
ssh-keygen #生成新密钥,一路回车
ssh-copy-id root@wl01 #输入root@wl01的密码
ssh-copy-id root@wl02	#输入root@wl02的密码
ssh-copy-id root@wl03	#输入root@wl03的密码

#各节点互信验证
ssh root@wl01 date
ssh root@wl02 date
ssh root@wl03 date

#禁用防火墙
systemctl stop firewalld.service
systemctl disable firewalld.service

#创建用户及目录
useradd mysql -M -s /sbin/nologin
mkdir -p /mysql-8026/8026/{binlog,relaylog,data,log,run,tmp}
mkdir -p /mysql-8026/3306/{binlog,relaylog,data,log,run,tmp}
chown -R mysql.mysql /mysql-8026

3.2 创建所有实例

`注意不同的节点修改对应server_id,report_host,report_port`
#编辑配置文件(以orch库为例)
#配置mysql库时,将配置文件中的端口号批量替换掉 sed -i 's/3306/8026/g' /mysql-8026/8026/my.cnf
vim /mysql-8026/3306/my.cnf 
[mysql]
no-auto-rehash
max_allowed_packet=128M
prompt="\u@\h \R:\m:\s[\d]> "
default_character_set=utf8mb4
socket=/mysql-8026/3306/run/mysql.sock

[mysqldump]
quick
max_allowed_packet=128M
socket=/mysql-8026/3306/run/mysql.sock
[mysqladmin]
socket=/mysql-8026/3306/run/mysql.sock
[mysqld]
user=mysql
port=3306
report_host='172.31.0.101'
report_host=3306
server-id=1013306   # ip末尾+端口号
default-time_zone='+8:00'
log_timestamps=SYSTEM
datadir=/mysql-8026/3306/data
basedir=/usr/local/mysql-8026
tmpdir=/mysql-8026/3306/tmp
socket=/mysql-8026/3306/run/mysql.sock
pid-file=/mysql-8026/3306/run/mysql.pid
character-set-server=utf8mb4

##redolog
innodb_log_file_size=2G
innodb_log_buffer_size=16M
innodb_log_files_in_group=2
innodb_log_group_home_dir=/mysql-8026/3306/data
##undolog
innodb_undo_directory=/mysql-8026/3306/data
innodb_max_undo_log_size=2G
innodb_undo_log_truncate=on
#innodb_undo_tablespaces=4 #8.0.14 已删除,可以使用 SQL 创建额外的撤消表空间

##binlog
binlog_format=row
log-bin=/mysql-8026/3306/binlog/mysql-bin
max_binlog_size=1G
binlog_cache_size=1M
sync_binlog=1

##relaylog
relay-log=/mysql-8026/3306/relaylog/mysql-relay
relay-log-purge=on
relay-log-recovery=on
##general log
#general_log=on
#general_log_file=/mysql-8026/3306/log/general.log
##error log
log-error=/mysql-8026/3306/log/error.log

##slow log
long_query_time=1
slow-query-log=on
slow-query-log-file=/mysql-8026/3306/log/slow.log

##connection
skip-external-locking
skip-name-resolve
max_connections=4000
max_user_connections=2500
max_connect_errors=10000
wait_timeout=7200
interactive_timeout=7200
connect_timeout=20
max_allowed_packet=512M

##gtid
gtid_mode=on
enforce_gtid_consistency=1
#log_slave_updates=1 
log_replica_updates=1 #8.0

##parallel replication mysql>5.7.22
# master
loose-binlog_transaction_dependency_tracking=WRITESET
#loose-transaction_write_set_extraction=XXHASH64 #8.0之前
binlog_transaction_dependency_history_size=25000 #默认
# slave
#slave-parallel-type=LOGICAL_CLOCK
replica_parallel_type=LOGICAL_CLOCK #8.0
#slave-parallel-workers=4 #8.0以前
replica_parallel_workers=4 #8.0
#master_info_repository=TABLE #8.0以前
#relay_log_info_repository=TABLE #8.0以前

##memory size
key_buffer_size=2M
table_open_cache=2048
table_definition_cache=4096
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=2M
join_buffer_size=2M
myisam_sort_buffer_size=2M
tmp_table_size=64M
max_heap_table_size=64M
##lock and transaction
transaction_isolation=READ-COMMITTED
innodb_lock_wait_timeout=30
lock_wait_timeout=3600
##InnoDB
innodb_data_home_dir=/mysql-8026/3306/data
innodb_data_file_path=ibdata1:1G:autoextend
innodb_buffer_pool_size=1G
innodb_buffer_pool_instances=2
innodb_flush_log_at_trx_commit=1
innodb_max_dirty_pages_pct=75
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_purge_threads=2

#初始化启动
mysqld --defaults-file=/mysql-8026/3306/my.cnf --initialize-insecure
mysqld_safe --defaults-file=/mysql-8026/3306/my.cnf &
mysql -S /mysql-8026/3306/run/mysql.sock

mysqld --defaults-file=/mysql-8026/8026/my.cnf --initialize-insecure
mysqld_safe --defaults-file=/mysql-8026/8026/my.cnf &
mysql -S /mysql-8026/8026/run/mysql.sock

3.3 配置所有实例

`(1)获取软件(所有节点)
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-3.2.6-1.x86_64.rpm
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-cli-3.2.6-1.x86_64.rpm

`(2)安装软件(所有节点)
ll /opt/orch*
-rw-r--r-- 1 root root 10970627 May 11 13:37 /opt/orchestrator-3.2.6-1.x86_64.rpm
-rw-r--r-- 1 root root 10543813 May 11 13:39 /opt/orchestrator-cli-3.2.6-1.x86_64.rpm
yum localinstall -y orchestrator-*
#如果遇到报错(Error: Package: Requires: ****  jq >= 1.5 **),请按如下方式解决
wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/oniguruma-5.9.5-3.el7.art.x86_64.rpm
yum install -y oniguruma-5.9.5-3.el7.art.x86_64.rpm 
wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/jq-1.5-1.el7.art.x86_64.rpm
yum install -y jq-1.5-1.el7.art.x86_64.rpm

`(3)配置orch数据库及用户(所有3306实例)
mysql -S /mysql-8026/3306/run/mysql.sock
CREATE DATABASE IF NOT EXISTS orchdb; 
CREATE USER 'orchuser'@'127.0.0.1' IDENTIFIED BY '123456'; 
GRANT ALL ON orchdb.* TO 'orchuser'@'127.0.0.1';

`(4)被管理节点配置主从关系(所有8026实例)
# 主库创建复制专用用户
mysql -S /mysql-8026/8026/run/mysql.sock
create user 'repl'@'172.31.0.%' IDENTIFIED WITH mysql_native_password BY '123456';
GRANT replication slave on *.* TO 'repl'@'172.31.0.%'; 
flush privileges;
# 从库 
#这里需要注意的是,orch检测主库宕机依赖从库的IO线程(本身连不上主库后,还会通过从库再去检测主库是否异常),所以默认change搭建的主从感知主库宕机的等待时间过长,需要需要稍微改下:
mysql -S /mysql-8026/8026/run/mysql.sock
reset master;
change master to 
master_host='172.31.0.102',
master_port=8026,
master_user='repl',
master_password='123456',
master_auto_position=1,
MASTER_HEARTBEAT_PERIOD=2,
MASTER_CONNECT_RETRY=1,
MASTER_RETRY_COUNT=86400; 
start slave; 
set global slave_net_timeout=8; 
set global read_only=1; 
set global super_read_only=1; 
#说明: 
slave_net_timeout(全局变量):MySQL5.7.7之后,默认改成60秒。该参数定义了从库从主库获取数据等待的秒数,超过这个时间从库会主动退出读取,中断连接,并尝试重连。
master_heartbeat_period:复制心跳的周期。默认是slave_net_timeout的一半。Master在没有数据的时候,每master_heartbeat_period秒发送一个心跳包,这样 Slave 就能知道 Master 是不是还正常。
slave_net_timeout:是设置在多久没收到数据后认为网络超时,之后 Slave 的 IO 线程会重新连接 Master 。结合这两个设置就可以避免由于网络问题导致的复制延误。master_heartbeat_period 单位是秒,可以是个带上小数,如 10.5,最高精度为 1 毫秒。

`(5)被管理MySQL数据库的用户权限(主库172.31.0.101 8026节点)
CREATE USER 'orchctl'@'%' IDENTIFIED WITH mysql_native_password BY '123456'; 
GRANT SUPER, PROCESS, REPLICATION SLAVE,REPLICATION CLIENT, RELOAD ON *.* TO 'orchctl'@'%'; 
GRANT SELECT ON mysql.slave_master_info TO 'orchctl'@'%'; 


4. 准备配置文件和脚本

4.1 修改ORCH配置文件

cp /usr/local/orchestrator/orchestrator-sample.conf.json /etc/orchestrator.conf.json
vim /etc/orchestrator.conf.json
{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchctl",
  "MySQLTopologyPassword": "123456",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "BackendDB": "mysql",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchdb",
  "MySQLOrchestratorUser": "orchuser",
  "MySQLOrchestratorPassword": "123456",
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "DiscoveryIgnoreReplicaHostnameFilters": [
    "a_host_i_want_to_ignore[.]example[.]com",
    ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
    "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
  ],
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "ReplicationLagQuery": "",
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 5,
  "RecoveryPeriodBlockSeconds": 30,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "*"
  ],
  "OnFailureDetectionProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}; failureClusterAlias:{failureClusterAlias}' >> /tmp/recovery.log",
    "/usr/local/orchestrator/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Unsuccessful Failover ' >> /tmp/recovery.log"],
  "PostMasterFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo '`date +'%Y-%m-%d %T'` Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "PreventCrossRegionMasterFailover": false,
  "MasterFailoverDetachReplicaMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeReplicaRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,
  "ConsulAddress": "",
  "ConsulAclToken": "",

  "RaftEnabled":true,
  "RaftDataDir":"/usr/local/orchestrator",
  "RaftBind":"172.31.0.101",
  "DefaultRaftPort":10008,
  "RaftNodes":[
    "172.31.0.101",
    "172.31.0.102",
    "172.31.0.103"
  ]
}

4.2 修改orch_hook.sh

vi /usr/local/orchestrator/orch_hook.sh
#!/bin/bash

isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchctl"

logfile="/usr/local/orchestrator/orch_hook.log"

# list of clusternames
#clusternames=(rep blea lajos)

# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")

if [[ $isitdead == "DeadMaster" ]]; then

	array=( eth0 "172.31.0.188" root "172.31.0.101")
	interface=${array[0]}
	IP=${array[1]}
	user=${array[2]}

	if [ ! -z ${IP} ] ; then

		echo $(date)
		echo "Revocering from: $isitdead"
		echo "New master is: $newmaster"
		echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
		/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
		#mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
	else

		echo "Cluster does not exist!" | tee $logfile

	fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

	array=( eth0 "172.31.0.188" root "172.31.0.101")
	interface=${array[0]}
	IP=${array[3]}
	user=${array[2]}
	slavehost=`echo $5 | cut -d":" -f1`

	echo $(date)
	echo "Revocering from: $isitdead"
	echo "New intermediate master is: $slavehost"
	echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
	/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

        array=( eth0 "172.31.0.188" root "172.31.0.101")
        interface=${array[0]}
        IP=${array[3]}
        user=${array[2]}
	slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
	showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
	newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`

	echo $(date)
	echo "Revocering from: $isitdead"
	echo "New intermediate master is: $newintermediatemaster"
	echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
	/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster

fi

4.3 修改vip脚本

vi /usr/local/orchestrator/orch_vip.sh
#!/bin/bash

emailaddress="1103290832@qq.com"
sendmail=1

function usage {
  cat << EOF
 usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]
 
 OPTIONS:
    -h        Show this message
    -o string Old master hostname or IP address 
    -d int    If master is dead should be 1 otherweise it is 0
    -s string SSH options
    -n string New master hostname or IP address
    -i string Interface exmple eth0:1
    -I string Virtual IP
    -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
  case $flag in
    o)
      orig_master="$OPTARG";
      ;;
    d)
      isitdead="${OPTARG}";
      ;;
    s)
      ssh_options="${OPTARG}";
      ;;
    n)
      new_master="$OPTARG";
      ;;
    i)
      interface="$OPTARG";
      ;;
    I)
      vip="$OPTARG";
      ;;
    u)
      ssh_user="$OPTARG";
      ;;
    h)
      usage;
      exit 0;
      ;;
    *)
      usage;
      exit 1;
      ;;
  esac
done


if [ $OPTIND -eq 1 ]; then 
    echo "No options were passed"; 
    usage;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
    rc=0

    # ensure the vip is removed
    $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
    "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    return $rc
}

vip_start() {
    rc=0

    # ensure the vip is added
    # this command should exit with failure if we are unable to add the vip
    # if the vip already exists always exit 0 (whether or not we added it)
    $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
     "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    $cmd_local_arp_fix
    return $rc
}

vip_status() {
    $arping -c 1 -I ${interface} ${vip%/*}   
    if ping -c 1 -W 1 "$vip"; then
        return 0
    else
        return 1
    fi
}

if [[ $isitdead == 0 ]]; then
    echo "Online failover"
    if vip_stop; then 
        if vip_start; then
            echo "$vip is moved to $new_master."
            if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            echo "Can't add $vip on $new_master!" 
            if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    else
        echo $rc
        echo "Can't remove the $vip from orig_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
        exit 1
    fi


elif [[ $isitdead == 1 ]]; then
    echo "Master is dead, failover"
    # make sure the vip is not available 
    if vip_status; then 
        if vip_stop; then
            if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    fi

    if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi

    else
          echo "Can't add $vip on $new_master!" 
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
          exit 1
    fi
else
    echo "Wrong argument, the master is dead or live?"

fi
#将修改好的配置文件和脚本发送到wl02和dwl03
scp /etc/orchestrator.conf.json wl02:/etc/
修改配置文件该处为db02的ip地址
"RaftBind": "172.31.0.102",

scp /etc/orchestrator.conf.json wl03:/etc/
修改配置文件该处为db03的ip地址
"RaftBind": "172.31.0.103",

scp /usr/local/orchestrator/orch_hook.sh wl02:/usr/local/orchestrator/
scp /usr/local/orchestrator/orch_hook.sh wl03:/usr/local/orchestrator/

scp /usr/local/orchestrator/orch_vip.sh wl02:/usr/local/orchestrator/
scp /usr/local/orchestrator/orch_vip.sh wl03:/usr/local/orchestrator/

chmod 777 /usr/local/orchestrator/orch_hook.sh
chmod 777 /usr/local/orchestrator/orch_vip.sh

4.4 master节点创建vip

仅在master节点上创建VIP
# 添加VIP ip addr add 172.31.0.188 dev eth0
# 删除VIP ip addr del 172.31.0.188 dev eth0

5. 启动与操作

5.1 启动ORCH

#所有节点都启动ORCH
cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http &

5.2 命令行控制操作

#列出所有集群
/usr/local/orchestrator/resources/bin/orchestrator-client -c clusters

#打印指定集群的拓扑关系
/usr/local/orchestrator/resources/bin/orchestrator-client -c topology -i wl01:8026
wl01:8026 (wl01)   [0s,ok,8.0.26,rw,ROW,>>,GTID]
+ wl02:8026 (wl02) [0s,ok,8.0.26,ro,ROW,>>,GTID]
+ wl03:8026 (wl03) [0s,ok,8.0.26,ro,ROW,>>,GTID]

#查看使用哪个API
#因为配置了Raft,有多个Orchestrator,所以需要ORCHESTRATOR_API的环境变量,orchestrator-client会自动选择leader
export ORCHESTRATOR_API="wl01:3000/api wl02:3000/api wl03:3000/api"
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl02:3000/api

#忘记指定实例
/usr/local/orchestrator/resources/bin/orchestrator-client -c forget -i wl01:8026

#忘记指定集群
/usr/local/orchestrator/resources/bin/orchestrator-client -c forget-cluster -i wl01:8026

#打印指定实例的主库
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-master -i wl01:8026

#打印指定实例的从库
/usr/local/orchestrator/resources/bin/orchestrator-client -c which-replicas -i wl01:8026

6. 图形界面管理操作

登录web管理界面

集群中任意一个节点的ip:3000

(1)点击Clushter中的Discover,输入所有的被管理mysql数据库ip与端口号
在这里插入图片描述

(2)点击Clushter中的Dashboard,查看被发现的集群
在这里插入图片描述

(3)点击集群名,查看集群拓扑图
在这里插入图片描述

(4)点击(3)中红框图标可查看节点具体信息
在这里插入图片描述

7. 验证故障转移

#主库关机,观察现象
[root@wl01 orchestrator]# mysql -S /mysql-8026/8026/run/mysql.sock
mysql> shutdown;

#查看当前leader节点,在leader节点查看故障漂移日志
[root@wl01 orchestrator]# /usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl03:3000/api
[root@wl03 orchestrator]# tail -f orch_hook.log
/usr/local/orchestrator/orch_vip.sh -d 1 -n wl02 -i eth0 -I 172.31.0.188 -u root -o wl01  #vip由wl01漂移至wl02


#vip成功漂到wl02节点
[root@wl01 orchestrator]# ip a | grep "172.31.0.188"
[root@wl02 orchestrator]# ip a | grep "172.31.0.188"
inet 172.31.0.188/32 scope global eth0

web界面查看拓扑关系,wl01节点已经脱离集群,wl02与wl03重新构建了主从关系
在这里插入图片描述

8. 验证raft的高可用性

#(1)修复mysql数据库集群的高可用
#(2)查看当前raft的leader节点
[root@wl02 orchestrator]# /usr/local/orchestrator/resources/bin/orchestrator-client -c which-api
wl01:3000/api

在这里插入图片描述

#(3)关闭wl01节点的orch后端数据库,orch服务也会跟着停掉
[root@wl01 orchestrator]# mysql -S /mysql-8026/3306/run/mysql.sock
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[1]   Done                    mysqld_safe --defaults-file=/mysql-8026/3306/my.cnf
[5]-  Exit 1                  cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http
[root@wl01 orchestrator]# ps -ef | grep orch
root     23134 14495  0 10:28 pts/2    00:00:00 grep --color=auto orch


#(4)查看其它节点的orch日志,可以看到,wl02被选为新的leder
[root@wl03 orchestrator]# tail -f /usr/local/orchestrator/nohup.out
2022-01-18 10:27:38 DEBUG raft leader is 172.31.0.101:10008; state: Follower
2022/01/18 10:27:39 [WARN] raft: Rejecting vote request from 172.31.0.102:10008 since we have a leader: 172.31.0.101:10008
2022/01/18 10:27:39 [DEBUG] raft: Node 172.31.0.103:10008 updated peer set (2): [172.31.0.102:10008 172.31.0.101:10008 172.31.0.103:10008]
2022-01-18 10:27:39 DEBUG orchestrator/raft: applying command 6871: leader-uri
2022/01/18 10:27:39 [DEBUG] raft-net: 172.31.0.103:10008 accepted connection from: 172.31.0.102:38934
2022-01-18 10:27:43 DEBUG raft leader is 172.31.0.102:10008; state: Follower
2022-01-18 10:27:46 DEBUG orchestrator/raft: applying command 6872: request-health-report
2022-01-18 10:27:48 DEBUG raft leader is 172.31.0.102:10008; state: Follower
2022-01-18 10:27:53 INFO auditType:forget-clustr-aliases instance::0 cluster: message:Forgotten aliases: 0
2022-01-18 10:27:53 INFO auditType:review-unseen-instances instance::0 cluster: message:Operations: 0
2022-01-18 10:27:53 INFO auditType:forget-unseen instance::0 cluster: message:Forgotten instances: 0
2022-01-18 10:27:53 INFO auditType:resolve-unknown-masters instance::0 cluster: message:Num resolved hostnames: 0
2022-01-18 10:27:53 INFO auditType:inject-unseen-masters instance::0 cluster: message:Operations: 0
2022-01-18 10:27:53 INFO auditType:forget-unseen-differently-resolved instance::0 cluster: message:Forgotten instances: 0
2022-01-18 10:27:53 DEBUG raft leader is 172.31.0.102:10008; state: Follower

#(5)关闭wl02的mysql数据库服务,再次验证故障漂移
[root@wl02 orchestrator]# mysql -S /mysql-8026/8026/run/mysql.sock
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[4]   Done                    mysqld_safe --defaults-file=/mysql-8026/8026/my.cnf

#(6)查看现象,vip漂移到wl03
[root@wl02 orchestrator]# ip a | grep "172.31.0.188"
[root@wl03 orchestrator]# ip a | grep "172.31.0.188"
inet 172.31.0.188/32 scope global eth0

#(7)通过wl01的ip地址登录的web界面失去连接

在这里插入图片描述

#(8)使用wl02的ip重新登录web界面
可以看到wl02节点已经脱离集群,wl03与wl01重新构建了主从关系

在这里插入图片描述

[外链图片转存中...(img-trbwAWL0-1642477908249)]

  • 3
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值