1. After the HighGo HAC cluster was restarted, every node showed the Replica role and no primary (Leader) node could be brought up
[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Replica | running | | unknown |
| hghacb | 192.168.80.112:5866 | Replica | running | | unknown |
| hghacc | 192.168.80.113:5866 | Replica | running | | unknown |
+--------+---------------------+---------+---------+----+-----------+
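The leaderless state shown above can also be detected from a script instead of by eye. The following is a minimal sketch, assuming the table layout printed in this document (Role in the third column); `count_leaders` is a hypothetical helper name, not part of hghactl.

```shell
# count_leaders: read the text of an `hghactl ... list` table on stdin and
# print how many member rows have Role equal to "Leader".
# Assumes the column order shown in this document: Member, Host, Role, ...
count_leaders() {
    awk -F'|' 'NF >= 7 && $2 !~ /Member/ {
        role = $4
        gsub(/^[ \t]+|[ \t]+$/, "", role)   # trim padding around the cell
        if (role == "Leader") n++
    } END { print n + 0 }'
}

# Example use on a node (commented out so the snippet has no side effects):
# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list | count_leaders
```

A result of 0 corresponds to the situation in this step: all members are Replica and nobody holds the Leader role.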
2. After stopping the HAC service on all nodes and restarting only node 1 (the former primary), its Role was still shown as Replica
[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Replica | running | | unknown |
| hghacb | 192.168.80.112:5866 | Replica | stopped | | unknown |
| hghacc | 192.168.80.113:5866 | Replica | stopped | | unknown |
+--------+---------------------+---------+---------+----+-----------+
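3. Stop the HAC service on all nodes, manually delete the standby.signal file on node 1, then start the HAC service on node 1; the node still comes up as Replica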
[root@db ~]# systemctl stop hghac-vip
[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Replica | stopped | | unknown |
| hghacb | 192.168.80.112:5866 | Replica | stopped | | unknown |
| hghacc | 192.168.80.113:5866 | Replica | stopped | | unknown |
+--------+---------------------+---------+---------+----+-----------+
[root@db ~]# cd /db/hgdbdata/data/
[root@db data]# ls
audit_param.conf global pg_commit_ts pg_ident.conf pg_notify pg_stat pg_twophase postgresql.auto.conf postgresql.conf.backup
backup_label.old hgaudit pg_dynshmem pg_ident.conf.backup pg_replslot pg_stat_tmp PG_VERSION postgresql.base.conf postmaster.opts
base hgdb.lic pg_hba.conf pg_logical pg_serial pg_subtrans pg_wal postgresql.base.conf.backup secure_param.conf
current_logfiles patroni.dynamic.json pg_hba.conf.backup pg_multixact pg_snapshots pg_tblspc pg_xact postgresql.conf standby.signal
[root@db data]# rm -rf standby.signal
[root@db data]# systemctl start hghac-vip
[root@db data]# systemctl status hghac-vip
● hghac-vip.service - hghac
Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-03-18 10:22:55 CST; 4s ago
Main PID: 21221 (hghac)
Tasks: 14
CGroup: /system.slice/hghac-vip.service
├─21221 /opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml
├─21227 /opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml
├─21241 /opt/HighGo4.5.7-see/bin/postgres -D /db/hgdbdata/data --config-file=/db/hgdbdata/data/postgresql.conf --listen_addresses=0.0.0.0 --port=5866 --cluster_name=ha --wal_l...
├─21243 postgres: ha: logger
├─21244 postgres: ha: auditwriter
├─21245 postgres: ha: startup recovering 00000019000000000000002D
├─21247 postgres: ha: checkpointer
├─21248 postgres: ha: background writer
├─21249 postgres: ha: stats collector
└─21250 postgres: ha: audit archiver or cleanup
Mar 18 10:22:55 db systemd[1]: Started hghac.
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [1-1] 6233ed01.52f9 0 LOG: Password detection module is disabled
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [2-1] 6233ed01.52f9 0 LOG: starting HighGo Security Enterprise Edition Database System 4.5.7 on CentOS...d on 20210804
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [3-1] 6233ed01.52f9 0 LOG: listening on IPv4 address "0.0.0.0", port 5866
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [4-1] 6233ed01.52f9 0 LOG: listening on Unix socket "/tmp/.s.PGSQL.5866"
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [5-1] 6233ed01.52f9 0 LOG: redirecting log output to logging collector process
Mar 18 10:22:57 db hghac[21221]: 2022-03-18 10:22:57 CST [21241]: [6-1] 6233ed01.52f9 0 HINT: Future log output will appear in directory "../hgdb_log".
Mar 18 10:22:57 db hghac[21221]: localhost:5866 - accepting connections
Mar 18 10:22:57 db hghac[21221]: localhost:5866 - accepting connections
Mar 18 10:22:58 db hghac[21221]: localhost:5866 - accepting connections
Hint: Some lines were ellipsized, use -l to show in full.
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Replica | running | | unknown |
+--------+---------------------+---------+---------+----+-----------+
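4. Stop the HAC service on all nodes, manually delete the standby.signal file on node 1, then start the database on node 1 directly with pg_ctl. The database starts normally and no standby.signal file is generated, which indicates that the database itself runs in normal read-write mode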
[root@db data]# systemctl stop hghac-vip
[root@db data]# pwd
/db/hgdbdata/data
[root@db data]# ls
audit_param.conf global pg_commit_ts pg_ident.conf pg_notify pg_stat pg_twophase postgresql.auto.conf postgresql.conf.backup
backup_label.old hgaudit pg_dynshmem pg_ident.conf.backup pg_replslot pg_stat_tmp PG_VERSION postgresql.base.conf postmaster.opts
base hgdb.lic pg_hba.conf pg_logical pg_serial pg_subtrans pg_wal postgresql.base.conf.backup secure_param.conf
current_logfiles patroni.dynamic.json pg_hba.conf.backup pg_multixact pg_snapshots pg_tblspc pg_xact postgresql.conf standby.signal
[root@db data]# rm -rf standby.signal
[root@db data]# pg_ctl start -D /db/hgdbdata/data/
waiting for server to start....2022-03-18 10:24:22 CST [21576]: [1-1] 6233ed56.5448 0 LOG: Password detection module is disabled
2022-03-18 10:24:22 CST [21576]: [2-1] 6233ed56.5448 0 LOG: starting HighGo Security Enterprise Edition Database System 4.5.7 on CentOS7 x86_64,build on 20210804
2022-03-18 10:24:22 CST [21576]: [3-1] 6233ed56.5448 0 LOG: listening on IPv4 address "0.0.0.0", port 5866
2022-03-18 10:24:22 CST [21576]: [4-1] 6233ed56.5448 0 LOG: listening on Unix socket "/tmp/.s.PGSQL.5866"
2022-03-18 10:24:22 CST [21576]: [5-1] 6233ed56.5448 0 LOG: redirecting log output to logging collector process
2022-03-18 10:24:22 CST [21576]: [6-1] 6233ed56.5448 0 HINT: Future log output will appear in directory "../hgdb_log".
done
server started
[root@db data]# ls
audit_param.conf global pg_commit_ts pg_ident.conf pg_notify pg_stat pg_twophase postgresql.auto.conf postgresql.conf.backup
backup_label.old hgaudit pg_dynshmem pg_ident.conf.backup pg_replslot pg_stat_tmp PG_VERSION postgresql.base.conf postmaster.opts
base hgdb.lic pg_hba.conf pg_logical pg_serial pg_subtrans pg_wal postgresql.base.conf.backup postmaster.pid
current_logfiles patroni.dynamic.json pg_hba.conf.backup pg_multixact pg_snapshots pg_tblspc pg_xact postgresql.conf secure_param.conf
[root@db data]# pg_ctl stop -D /db/hgdbdata/data/
waiting for server to shut down.... done
server stopped
[root@db data]#
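The file check done by eye with `ls` in this step can be scripted. This is a small sketch, assuming that the presence of standby.signal in the data directory is what marks the instance as a standby at startup (as in PostgreSQL 12 and later); `node_mode` is a hypothetical helper name.

```shell
# node_mode: print "standby" if the given data directory contains
# standby.signal, otherwise print "read-write".
# This only inspects the file; it does not query a running server.
node_mode() {
    if [ -e "$1/standby.signal" ]; then
        echo standby
    else
        echo read-write
    fi
}

# Example use with the data directory from this document (commented out):
# node_mode /db/hgdbdata/data
```

On a running instance, the query `SELECT pg_is_in_recovery();` gives the same answer from the server's point of view.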
5. Start the HAC service on node 1 and check the database log file, which reports the following errors
[root@db data]# systemctl start hghac-vip
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Replica | running | | unknown |
+--------+---------------------+---------+---------+----+-----------+
[root@db data]#
[root@db hgdb_log]# pwd
/db/hgdbdata/hgdb_log
[root@db hgdb_log]# tail -f hgdb-5.csv
2022-03-18 10:28:10.388 CST,"sysdba","highgo",22463,"127.0.0.1:14914",6233ee3a.57bf,2,"authentication",2022-03-18 10:28:10 CST,2/12,0,FATAL,28P01,"password authentication failed for user ""sysdba""","Password does not match for user ""sysdba"".
Connection matched pg_hba.conf line 4: ""host all all 0.0.0.0/0 sm3""",,,,,,,,""
2022-03-18 10:28:10.393 CST,,,22464,"127.0.0.1:14916",6233ee3a.57c0,1,"",2022-03-18 10:28:10 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=14916",,,,,,,,,""
2022-03-18 10:28:10.398 CST,"sysdba","highgo",22464,"127.0.0.1:14916",6233ee3a.57c0,2,"authentication",2022-03-18 10:28:10 CST,2/13,0,FATAL,28P01,"password authentication failed for user ""sysdba""","Password does not match for user ""sysdba"".
Connection matched pg_hba.conf line 4: ""host all all 0.0.0.0/0 sm3""",,,,,,,,""
2022-03-18 10:28:10.419 CST,,,22467,"127.0.0.1:14920",6233ee3a.57c3,1,"",2022-03-18 10:28:10 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=14920",,,,,,,,,""
2022-03-18 10:28:10.459 CST,,,22468,"127.0.0.1:14924",6233ee3a.57c4,1,"",2022-03-18 10:28:10 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=14924",,,,,,,,,""
2022-03-18 10:28:10.483 CST,"sysdba","highgo",22468,"127.0.0.1:14924",6233ee3a.57c4,2,"authentication",2022-03-18 10:28:10 CST,2/15,0,FATAL,28P01,"password authentication failed for user ""sysdba""","Password does not match for user ""sysdba"".
Connection matched pg_hba.conf line 4: ""host all all 0.0.0.0/0 sm3""",,,,,,,,""
2022-03-18 10:28:10.490 CST,,,22469,"127.0.0.1:14926",6233ee3a.57c5,1,"",2022-03-18 10:28:10 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=14926",,,,,,,,,""
2022-03-18 10:28:10.495 CST,"sysdba","highgo",22469,"127.0.0.1:14926",6233ee3a.57c5,2,"authentication",2022-03-18 10:28:10 CST,2/16,0,FATAL,28P01,"password authentication failed for user ""sysdba""","Password does not match for user ""sysdba"".
Connection matched pg_hba.conf line 4: ""host all all 0.0.0.0/0 sm3""",,,,,,,,""
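The repeated FATAL entries above all carry SQLSTATE 28P01 (password authentication failed). A quick way to see how many there are is to count them; this is a minimal sketch, and `count_auth_failures` is a hypothetical helper name.

```shell
# count_auth_failures: print the number of CSV log records in the given file
# whose severity is FATAL and whose SQLSTATE is 28P01.
# `|| true` keeps the exit status 0 when grep finds no match (it still prints 0).
count_auth_failures() {
    grep -c 'FATAL,28P01,' "$1" || true
}

# Example use with the log file from this document (commented out):
# count_auth_failures /db/hgdbdata/hgdb_log/hgdb-5.csv
```

A count that keeps growing while the HAC service is running suggests that some client is retrying with a wrong password, which is what the log excerpt shows for user sysdba connecting from 127.0.0.1.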
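Investigation showed that the passwords in hghac.yaml did not match the actual ones (the passwords of the relevant users had been changed earlier for a special requirement).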
[root@db hghac]# pwd
/opt/HighGo/tools/hghac
[root@db hghac]# vi hghac.yaml
authentication:
  replication:
    password: High@123
    username: sysdba
  rewind:
    password: High@123
    username: sysdba
  sysdba:
    password: High@123
  syssso:
    password: High@123
  syssao:
    password: High@123
Change the passwords in hghac.yaml to the correct ones:
authentication:
  replication:
    password: High@789
    username: sysdba
  rewind:
    password: High@789
    username: sysdba
  sysdba:
    password: High@789
  syssso:
    password: High@789
  syssao:
    password: High@789
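Because the same change has to be made on every node, the edit can be scripted instead of done in vi. The following is a sketch only, assuming each password sits on its own `password: <value>` line as in the file above; `update_hac_password` is a hypothetical helper name, and the function keeps a `.bak` copy before rewriting the file.

```shell
# update_hac_password CFG OLD NEW
# Replace every line whose content (ignoring leading indentation) is exactly
# "password: OLD" with "password: NEW", preserving the indentation.
# The passwords are passed through the environment so awk treats them as
# literal strings rather than patterns.
update_hac_password() {
    cfg=$1; old=$2; new=$3
    cp -p "$cfg" "$cfg.bak" || return 1          # keep a backup first
    OLD_PW=$old NEW_PW=$new awk '
        {
            body = $0
            sub(/^[ \t]+/, "", body)             # strip leading indentation
            if (body == ("password: " ENVIRON["OLD_PW"])) {
                indent = substr($0, 1, length($0) - length(body))
                print indent "password: " ENVIRON["NEW_PW"]
            } else {
                print
            }
        }' "$cfg.bak" > "$cfg"
}

# Example use with the values from this document (commented out):
# update_hac_password /opt/HighGo/tools/hghac/hghac.yaml 'High@123' 'High@789'
```

Lines with trailing spaces or quoted values would not match this exact comparison and would be left unchanged, so it is worth reviewing the file after running it.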
Restart the HAC service; the database Role is now shown as Leader:
[root@db data]# systemctl restart hghac-vip
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) ----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+--------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Leader | running | 25 | |
+--------+---------------------+--------+---------+----+-----------+
[root@db data]#
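On the other two standby nodes, also change hghac.yaml to the correct passwords and start the HAC service on each of them; note that the timeline (TL) has advanced: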
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) -----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------------------+---------+---------+----+-----------+
| hghaca | 192.168.80.111:5866 | Leader | running | 26 | |
| hghacb | 192.168.80.112:5866 | Replica | running | 26 | 0 |
| hghacc | 192.168.80.113:5866 | Replica | running | 26 | 0 |
+--------+---------------------+---------+---------+----+-----------+
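A simple way to confirm that all members agree on the timeline is to list the distinct values of the TL column. This is a minimal sketch, assuming the column order shown in the table above (TL in the fifth column); `distinct_timelines` is a hypothetical helper name.

```shell
# distinct_timelines: read the text of an `hghactl ... list` table on stdin
# and print each distinct TL value once; members with an empty TL cell are
# reported as "none".
distinct_timelines() {
    awk -F'|' 'NF >= 7 && $2 !~ /Member/ {
        tl = $6
        gsub(/[ \t]/, "", tl)                 # drop cell padding
        print (tl == "" ? "none" : tl)
    }' | sort -u
}

# Example use on a node (commented out so the snippet has no side effects):
# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list | distinct_timelines
```

A single numeric line of output means every member reports the same timeline; "none" or more than one line means at least one member has not caught up or is not reporting.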
6. At this point the cluster is back to normal.