对 gprecoverseg 命令进行了 一次实际操作测试,模拟有一个主实例 down的 情况下如何进行恢复操作
1、查询当前数据库节点状态信息
dp_bidb=# select * from gp_segment_configuration order by 8,2;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | mdw | mdw | |
2 | 0 | p | p | s | u | 40000 | sdw1 | sdw1 | 41000 |
3 | 1 | p | p | s | u | 40001 | sdw1 | sdw1 | 41001 |
4 | 2 | p | p | s | u | 40002 | sdw1 | sdw1 | 41002 |
5 | 3 | p | p | s | u | 40003 | sdw1 | sdw1 | 41003 |
30 | 12 | m | m | s | u | 50000 | sdw1 | sdw1 | 51000 |
31 | 13 | m | m | s | u | 50001 | sdw1 | sdw1 | 51001 |
32 | 14 | m | m | s | u | 50002 | sdw1 | sdw1 | 51002 |
33 | 15 | m | m | s | u | 50003 | sdw1 | sdw1 | 51003 |
18 | 0 | m | m | s | u | 50000 | sdw2 | sdw2 | 51000 |
19 | 1 | m | m | s | u | 50001 | sdw2 | sdw2 | 51001 |
20 | 2 | m | m | s | u | 50002 | sdw2 | sdw2 | 51002 |
21 | 3 | m | m | s | u | 50003 | sdw2 | sdw2 | 51003 |
6 | 4 | p | p | s | u | 40000 | sdw2 | sdw2 | 41000 |
7 | 5 | p | p | s | u | 40001 | sdw2 | sdw2 | 41001 |
8 | 6 | p | p | s | u | 40002 | sdw2 | sdw2 | 41002 |
9 | 7 | p | p | s | u | 40003 | sdw2 | sdw2 | 41003 |
22 | 4 | m | m | s | u | 50000 | sdw3 | sdw3 | 51000 |
23 | 5 | m | m | s | u | 50001 | sdw3 | sdw3 | 51001 |
24 | 6 | m | m | s | u | 50002 | sdw3 | sdw3 | 51002 |
25 | 7 | m | m | s | u | 50003 | sdw3 | sdw3 | 51003 |
10 | 8 | p | p | s | u | 40000 | sdw3 | sdw3 | 41000 |
11 | 9 | p | p | s | u | 40001 | sdw3 | sdw3 | 41001 |
12 | 10 | p | p | s | u | 40002 | sdw3 | sdw3 | 41002 |
13 | 11 | p | p | s | u | 40003 | sdw3 | sdw3 | 41003 |
26 | 8 | m | m | s | u | 50000 | sdw4 | sdw4 | 51000 |
27 | 9 | m | m | s | u | 50001 | sdw4 | sdw4 | 51001 |
28 | 10 | m | m | s | u | 50002 | sdw4 | sdw4 | 51002 |
29 | 11 | m | m | s | u | 50003 | sdw4 | sdw4 | 51003 |
14 | 12 | p | p | s | u | 40000 | sdw4 | sdw4 | 41000 |
15 | 13 | p | p | s | u | 40001 | sdw4 | sdw4 | 41001 |
16 | 14 | p | p | s | u | 40002 | sdw4 | sdw4 | 41002 |
17 | 15 | p | p | s | u | 40003 | sdw4 | sdw4 | 41003 |
34 | -1 | m | m | s | u | 5432 | smdw | smdw | |
2、登录节点1杀掉一个主实例
[gpadmin@mdw ~]$ ssh sdw1
[gpadmin@sdw1 ~]$ ps -ef | grep 40000
gpadmin 9466 1 0 09:47 ? 00:00:00 /usr/local/greenplum-db-4.3.7.3/bin/postgres -D /data1/primary/gpseg0 -p 40000 -b 2 -z 16 --silent-mode=true -i -M quiescent -C 0
gpadmin 9471 9466 0 09:47 ? 00:00:00 postgres: port 40000, logger process
gpadmin 10007 9466 0 09:51 ? 00:00:00 postgres: port 40000, stats collector process
gpadmin 10008 9466 0 09:51 ? 00:00:00 postgres: port 40000, writer process
gpadmin 10009 9466 0 09:51 ? 00:00:00 postgres: port 40000, checkpoint process
gpadmin 10010 9466 0 09:51 ? 00:00:00 postgres: port 40000, sweeper process
gpadmin 10011 9466 0 09:51 ? 00:00:00 postgres: port 40000, stats sender process
gpadmin 10301 9466 0 09:51 ? 00:00:00 postgres: port 40000, primary process
gpadmin 10302 10301 0 09:51 ? 00:00:00 postgres: port 40000, primary receiver ack process
gpadmin 10303 10301 0 09:51 ? 00:00:01 postgres: port 40000, primary sender process
gpadmin 10304 10301 0 09:51 ? 00:00:00 postgres: port 40000, primary consumer ack process
gpadmin 10305 10301 0 09:51 ? 00:00:00 postgres: port 40000, primary recovery process
gpadmin 10306 10301 0 09:51 ? 00:00:00 postgres: port 40000, primary verification process
gpadmin 10569 8524 0 09:59 pts/0 00:00:00 grep 40000
[gpadmin@sdw1 ~]$ kill -9 9466
查看当前节点状态信息
dp_bidb=# select * from gp_segment_configuration order by 8,2;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | mdw | mdw | |
2 | 0 | m | p | s | d | 40000 | sdw1 | sdw1 | 41000 |
3 | 1 | p | p | s | u | 40001 | sdw1 | sdw1 | 41001 |
4 | 2 | p | p | s | u | 40002 | sdw1 | sdw1 | 41002 |
5 | 3 | p | p | s | u | 40003 | sdw1 | sdw1 | 41003 |
30 | 12 | m | m | s | u | 50000 | sdw1 | sdw1 | 51000 |
31 | 13 | m | m | s | u | 50001 | sdw1 | sdw1 | 51001 |
32 | 14 | m | m | s | u | 50002 | sdw1 | sdw1 | 51002 |
33 | 15 | m | m | s | u | 50003 | sdw1 | sdw1 | 51003 |
18 | 0 | p | m | c | u | 50000 | sdw2 | sdw2 | 51000 |
19 | 1 | m | m | s | u | 50001 | sdw2 | sdw2 | 51001 |
20 | 2 | m | m | s | u | 50002 | sdw2 | sdw2 | 51002 |
21 | 3 | m | m | s | u | 50003 | sdw2 | sdw2 | 51003 |
6 | 4 | p | p | s | u | 40000 | sdw2 | sdw2 | 41000 |
7 | 5 | p | p | s | u | 40001 | sdw2 | sdw2 | 41001 |
8 | 6 | p | p | s | u | 40002 | sdw2 | sdw2 | 41002 |
9 | 7 | p | p | s | u | 40003 | sdw2 | sdw2 | 41003 |
22 | 4 | m | m | s | u | 50000 | sdw3 | sdw3 | 51000 |
23 | 5 | m | m | s | u | 50001 | sdw3 | sdw3 | 51001 |
24 | 6 | m | m | s | u | 50002 | sdw3 | sdw3 | 51002 |
25 | 7 | m | m | s | u | 50003 | sdw3 | sdw3 | 51003 |
10 | 8 | p | p | s | u | 40000 | sdw3 | sdw3 | 41000 |
11 | 9 | p | p | s | u | 40001 | sdw3 | sdw3 | 41001 |
12 | 10 | p | p | s | u | 40002 | sdw3 | sdw3 | 41002 |
13 | 11 | p | p | s | u | 40003 | sdw3 | sdw3 | 41003 |
26 | 8 | m | m | s | u | 50000 | sdw4 | sdw4 | 51000 |
27 | 9 | m | m | s | u | 50001 | sdw4 | sdw4 | 51001 |
28 | 10 | m | m | s | u | 50002 | sdw4 | sdw4 | 51002 |
29 | 11 | m | m | s | u | 50003 | sdw4 | sdw4 | 51003 |
14 | 12 | p | p | s | u | 40000 | sdw4 | sdw4 | 41000 |
15 | 13 | p | p | s | u | 40001 | sdw4 | sdw4 | 41001 |
16 | 14 | p | p | s | u | 40002 | sdw4 | sdw4 | 41002 |
17 | 15 | p | p | s | u | 40003 | sdw4 | sdw4 | 41003 |
34 | -1 | m | m | s | u | 5432 | smdw | smdw | |
查看日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:30:39.385916+08 | 18 | FTS: content 0 fault marking status UP mode: change-tracking role p
2016-08-16 02:30:39.385775+08 | 2 | FTS: content 0 fault marking status DOWN role m
主备实例发生了角色切换,并把当前主实例置为change-tracking
3、节点1上杀掉一个备实例
[gpadmin@sdw1 ~]$ ps -ef | grep 50001
gpadmin 9467 1 0 09:47 ? 00:00:00 /usr/local/greenplum-db-4.3.7.3/bin/postgres -D /data1/mirror/gpseg13 -p 50001 -b 31 -z 16 --silent-mode=true -i -M quiescent -C 13
gpadmin 9472 9467 0 09:47 ? 00:00:00 postgres: port 50001, logger process
gpadmin 9479 9467 0 09:47 ? 00:00:00 postgres: port 50001, mirror process
gpadmin 9482 9479 0 09:47 ? 00:00:04 postgres: port 50001, mirror receiver process
gpadmin 9484 9479 0 09:47 ? 00:00:03 postgres: port 50001, mirror consumer process
gpadmin 9486 9479 0 09:47 ? 00:00:00 postgres: port 50001, mirror consumer writer process
gpadmin 9488 9479 0 09:47 ? 00:00:00 postgres: port 50001, mirror consumer append only process
gpadmin 9490 9479 0 09:47 ? 00:00:00 postgres: port 50001, mirror sender ack process
gpadmin 9492 9479 0 09:47 ? 00:00:00 postgres: port 50001, mirror verification process
gpadmin 10848 8524 0 10:07 pts/0 00:00:00 grep 50001
[gpadmin@sdw1 ~]$ kill -9 9467
查看当前节点状态信息
dp_bidb=# select * from gp_segment_configuration order by 8,2;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | mdw | mdw | |
2 | 0 | m | p | s | d | 40000 | sdw1 | sdw1 | 41000 |
3 | 1 | p | p | s | u | 40001 | sdw1 | sdw1 | 41001 |
4 | 2 | p | p | s | u | 40002 | sdw1 | sdw1 | 41002 |
5 | 3 | p | p | s | u | 40003 | sdw1 | sdw1 | 41003 |
30 | 12 | m | m | s | u | 50000 | sdw1 | sdw1 | 51000 |
31 | 13 | m | m | s | d | 50001 | sdw1 | sdw1 | 51001 |
32 | 14 | m | m | s | u | 50002 | sdw1 | sdw1 | 51002 |
33 | 15 | m | m | s | u | 50003 | sdw1 | sdw1 | 51003 |
18 | 0 | p | m | c | u | 50000 | sdw2 | sdw2 | 51000 |
19 | 1 | m | m | s | u | 50001 | sdw2 | sdw2 | 51001 |
20 | 2 | m | m | s | u | 50002 | sdw2 | sdw2 | 51002 |
21 | 3 | m | m | s | u | 50003 | sdw2 | sdw2 | 51003 |
6 | 4 | p | p | s | u | 40000 | sdw2 | sdw2 | 41000 |
7 | 5 | p | p | s | u | 40001 | sdw2 | sdw2 | 41001 |
8 | 6 | p | p | s | u | 40002 | sdw2 | sdw2 | 41002 |
9 | 7 | p | p | s | u | 40003 | sdw2 | sdw2 | 41003 |
22 | 4 | m | m | s | u | 50000 | sdw3 | sdw3 | 51000 |
23 | 5 | m | m | s | u | 50001 | sdw3 | sdw3 | 51001 |
24 | 6 | m | m | s | u | 50002 | sdw3 | sdw3 | 51002 |
25 | 7 | m | m | s | u | 50003 | sdw3 | sdw3 | 51003 |
10 | 8 | p | p | s | u | 40000 | sdw3 | sdw3 | 41000 |
11 | 9 | p | p | s | u | 40001 | sdw3 | sdw3 | 41001 |
12 | 10 | p | p | s | u | 40002 | sdw3 | sdw3 | 41002 |
13 | 11 | p | p | s | u | 40003 | sdw3 | sdw3 | 41003 |
26 | 8 | m | m | s | u | 50000 | sdw4 | sdw4 | 51000 |
27 | 9 | m | m | s | u | 50001 | sdw4 | sdw4 | 51001 |
28 | 10 | m | m | s | u | 50002 | sdw4 | sdw4 | 51002 |
29 | 11 | m | m | s | u | 50003 | sdw4 | sdw4 | 51003 |
14 | 12 | p | p | s | u | 40000 | sdw4 | sdw4 | 41000 |
15 | 13 | p | p | c | u | 40001 | sdw4 | sdw4 | 41001 |
16 | 14 | p | p | s | u | 40002 | sdw4 | sdw4 | 41002 |
17 | 15 | p | p | s | u | 40003 | sdw4 | sdw4 | 41003 |
34 | -1 | m | m | s | u | 5432 | smdw | smdw | |
查看日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:37:36.02561+08 | 31 | FTS: content 13 fault marking status DOWN role m
2016-08-16 02:37:36.025441+08 | 15 | FTS: content 13 fault marking status UP mode: change-tracking role p
备实例down将主实例置为change-tracking模式,不发生角色切换
4、执行实例恢复
[gpadmin@mdw ~]$ gprecoverseg
查看切换日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:45:13.047233+08 | 31 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:45:13.047233+08 | 18 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:45:13.047233+08 | 2 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:45:13.047233+08 | 15 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:45:09.963293+08 | 2 | gprecoverseg: segment config for resync: segment mode and status
2016-08-16 02:45:09.963293+08 | 31 | gprecoverseg: segment config for resync: segment mode and status
先将down的实例启动,然后进行主备重新同步
5、查看同步状态
[gpadmin@mdw ~]$ gpstate -e
同步完成查看日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:45:36.756482+08 | 31 | FTS: changed segment to insync from resync.
2016-08-16 02:45:36.756482+08 | 15 | FTS: changed segment to insync from resync.
2016-08-16 02:45:36.751624+08 | 2 | FTS: changed segment to insync from resync.
2016-08-16 02:45:36.751624+08 | 18 | FTS: changed segment to insync from resync.
6、rebalance实例角色
[gpadmin@mdw ~]$ gprecoverseg -r
查看日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc了
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:58:52.731579+08 | 18 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:58:52.731579+08 | 2 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2016-08-16 02:58:49.649369+08 | 18 | gprecoverseg: segment config for resync: segment mode and status
2016-08-16 02:58:41.918471+08 | 2 | FTS: content 0 fault marking status UP mode: change-tracking role p
2016-08-16 02:58:41.917958+08 | 18 | FTS: content 0 fault marking status DOWN role m
先将当前主实例18切为备实例角色并关闭,启动当前备实例2为主实例置为change-tracking状态,启动关闭的备实例,重新同步
查看同步状态
[gpadmin@mdw ~]$ gpstate -e
同步完成查看日志信息
dp_bidb=# select * from gp_configuration_history order by 1 desc limit 10;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2016-08-16 02:59:42.080391+08 | 18 | FTS: changed segment to insync from resync.
2016-08-16 02:59:42.080391+08 | 2 | FTS: changed segment to insync from resync.