greenplum集群说明
greenplum整个集群是由多台服务器组合而成,任何一台服务都有可能发生软件或硬件故障,我们一起来模拟一下任何一个节点或服务器故障后,greenplumn的容错及恢复方法
master 服务器:l-test5
standby 服务器:l-test6
segment 服务器:(primary + mirror):l-test[7-12]
greenplum集群说明:1 master + 1 standby + 24 primary segments + 24 mirror segments
正常集群状态
在master查看数据库当前的状态
[gpadmin@l-test5 ~]$ gpstate
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args:
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
..
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-Greenplum instance status summary
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Master instance = Active
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Master standby = l-test6
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Standby master state = Standby host passive
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total segment instance count from metadata = 48
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Primary Segment Status
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total primary segments = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total primary segment valid (at master) = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid files found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of /tmp lock files found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number postmaster processes found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Mirror Segment Status
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total mirror segments = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total mirror segment valid (at master) = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid files found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number of /tmp lock files found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number postmaster processes found = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
master 服务器故障
当master节点故障后,我们需要激活standby节点作为新的master节点(如果服务器允许,把vip也切换到standby服务器)
在激活standby节点的可以直接指定新的standby节点,也可以等原master服务器恢复后,指定原master节点为standby节点
这里我就不关服务器了,直接模拟master节点故障的状态
关闭master节点
[gpadmin@l-test5 ~]$ gpstop -a -m
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Starting gpstop with args: -a -m
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Gathering information and validating the environment...
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-There are 0 connections to the database
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Master host=l-test5
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Master segment instance directory=/export/gp_data/master/gpseg-1
20180211:15:16:25:022773 gpstop:l-test5:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20180211:15:16:25:022773 gpstop:l-test5:gpadmin-[INFO]:-Terminating processes for segment /export/gp_data/master/gpseg-1
[gpadmin@l-test5 ~]$ psql
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.65432"?
激活standby节点
[gpadmin@l-test6 ~]$ gpactivatestandby -a -d /export/gp_data/master/gpseg-1/
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby data directory = /export/gp_data/master/gpseg-1
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby port = 65432
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby running = yes
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Force standby activation = no
.
.
.
切换服务器vip
原master节点服务器卸载vip:
[root@l-test5 ~]#
[root@l-test5 ~]# ip a d 10.0.0.1/32 brd + dev bond0
-----------
原standby节点服务器挂载vip:
[root@l-test6 ~]#
[root@l-test6 ~]# ip a a 10.0.0.1/32 brd + dev bond0 && arping -q -c 3 -U -I bond0 10.0.0.1
指定新的standby节点
我们指定原master节点为新的standby节点服务器
需要先删除原master的数据文件,然后重新执行初始化standby节点即可
[gpadmin@l-test6 ~]$ gpinitstandby -a -s l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Filespace directory already exists on host l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Failed to create standby
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Error initializing standby master: master data directory exists
----------注意,这步是在原master节点操作-------
[gpadmin@l-test5 ~]$ cd /export/gp_data/master/
[gpadmin@l-test5 /export/gp_data/master]$ ll
total 4
drwx------ 17 gpadmin gpadmin 4096 Feb 11 15:17 gpseg-1
[gpadmin@l-test5 /export/gp_data/master]$ rm -rf gpseg-1/
[gpadmin@l-test5 /export/gp_data/master]$ ll
total 0
[gpadmin@l-test5 /export/gp_data/master]$
----------
[gpadmin@l-test6 ~]$ gpinitstandby -a -s l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master hostname = l-test6
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master data directory = /export/gp_data/master/gpseg-1
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master port = 65432
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master hostname = l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master port = 65432
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum update system catalog = On
.
.
standby 服务器故障
当standby节点服务器恢复后,需要将standby节点删除,然后重新初始化一下standby服务器即可
在master服务器上查看整个greenplum集群状态,我们发现[WARNING]:-Standby master state
[gpadmin@l-test5 ~]$ gpstate
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args:
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-Greenplum instance status summary
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:- Master instance = Active
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:- Master standby = l-test6
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[WARNING]:-Standby master state = Standby host DOWN
.
.
[gpadmin@l-test5 ~]$ gpstate -f
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -f
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Standby master details
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-----------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:- Standby address = l-test6
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:- Standby data directory = /export/gp_data/master/gpseg-1
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:- Standby port = 65432
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[WARNING]:-Standby PID = 10480 Have lock file /tmp/.s.PGSQL.65432.lock but no process running on port 65432 <<<<<<<<
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[WARNING]:-Standby status = Status could not be determined <<<<<<<<
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--pg_stat_replication
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-No entries found.
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
1、删除故障的standby节点
[gpadmin@l-test5 ~]$ gpinitstandby -r -a
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Warm master standby removal parameters
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master hostname = l-test5
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master data directory = /export/gp_data/master/gpseg-1
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master port = 65432
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master hostname = l-test6
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master port = 65432
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Removing standby master from catalog...
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Database catalog updated successfully.
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Removing filespace directories on standby master...
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Successfully removed standby master
2、重新初始化standby节点
[gpadmin@l-test5 ~]$ gpinitstandby -s l-test6 -a
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test6
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master hostname = l-test5
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master data directory = /export/gp_data/master/gpseg-1
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master port = 65432
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master hostname = l-test6
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master port = 65432
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum update system catalog = On
.
.
3、检查standby 的配置信息
[gpadmin@l-test5 ~]$ gpstate -f
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -f
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Standby master details
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-----------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:- Standby address = l-test6
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:- Standby data directory = /export/gp_data/master/gpseg-1
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:- Standby port = 65432
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:- Standby PID = 19057
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:- Standby status = Standby host passive
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--pg_stat_replication
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--WAL Sender State: streaming
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Sync state: sync
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Sent Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Flush Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Replay Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
segment 节点故障
当一个primary segment节点故障,那么它所对应的mirror segment节点会接替primary的状态,继续保证整个集群的数据完整性
当一个mirror segment节点出现故障,它不会影响整个集群的可用性,但是需要尽快修复,保证所有的primary segment都有备份
如果primary segment 和 它所对应的mirror segment 节点都出现故障,那么greenplum认为集群数据不完整,整个集群将不再提供服务,直到primary segment 或 mirror segment恢复
修复segment节点,需要重启greenplum集群
primary segment节点和mirror segment节点的故障修复方式是一样的,这里以mirror节点故障为例
关闭一个节点后集群状态
[gpadmin@l-test7 ~]$ pg_ctl -D /export/gp_data/mirror/data4/gpseg23 stop -m fast
waiting for server to shut down.... done
server stopped
--------master节点执行-------
[gpadmin@l-test5 ~]$ gpstate
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args:
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
.
.
.
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Mirror Segment Status
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total mirror segments = 24
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total mirror segment valid (at master) = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total mirror segment failures (at master) = 1 <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of postmaster.pid files missing = 1 <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid files found = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of postmaster.pid PIDs missing = 1 <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of /tmp lock files missing = 1 <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number of /tmp lock files found = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number postmaster processes missing = 1 <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number postmaster processes found = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 24
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@l-test5 ~]$ gpstate -m
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -m
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--Current GPDB mirror list and status
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--Type = Spread
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
.
.
.
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:- l-test12 /export/gp_data/mirror/data3/gpseg22 50002 Passive Synchronized
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[WARNING]:-l-test7 /export/gp_data/mirror/data4/gpseg23 50003 Failed <<<<<<<<
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed
[gpadmin@l-test5 ~]$ gpstate -e
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -e
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
..
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-Segment Mirroring Status Report
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-Primaries in Change Tracking
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:- Current Primary Port Change tracking size Mirror Port
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:- l-test9 40003 128 bytes l-test7 50003
关闭整个greenplum集群
[gpadmin@l-test5 ~]$ gpstop -a -M fast
.
.
.
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:- Segments stopped successfully = 48
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:- Segments with errors during stop = 0
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[WARNING]:-Segments that are currently marked down in configuration = 1 <<<<<<<<
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:- (stop was still attempted on these segments)
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
.
.
.
以restricted方式启动数据库
[gpadmin@l-test5 ~]$ gpstart -a -R
.
.
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Process results...
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:- Successful segment starts = 47
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:- Failed segment starts = 0
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Successfully started 47 of 47 segment instances
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Starting Master instance l-test5 directory /export/gp_data/master/gpseg-1 in RESTRICTED mode
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Command pg_ctl reports Master l-test5 instance active
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Starting standby master
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Checking if standby master is running on host: l-test6 in directory: /export/gp_data/master/gpseg-1
20180211:16:25:24:008571 gpstart:l-test5:gpadmin-[INFO]:-Database successfully started
开始修复故障节点
[gpadmin@l-test5 ~]$ gprecoverseg -a
.
.
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Greenplum instance recovery parameters
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Recovery type = Standard
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Recovery 1 of 1
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Synchronization mode = Incremental
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Failed instance host = l-test7
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Failed instance address = l-test7
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Failed instance directory = /export/gp_data/mirror/data4/gpseg23
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Failed instance port = 50003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Failed instance replication port = 51003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Source instance host = l-test9
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Source instance address = l-test9
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Source instance directory = /export/gp_data/primary/data4/gpseg23
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Source instance port = 40003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Source instance replication port = 41003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:- Recovery Target = in-place
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-1 segment(s) to recover
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
.
.
检查集群修复状态
一直等到Data Status 这个属性全部都是Synchronized即可进行下一步操作
[gpadmin@l-test5 ~]$ gpstate -m
.
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:- l-test11 /export/gp_data/mirror/data1/gpseg0 50000 Passive Synchronized
.
.
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:- l-test7 /export/gp_data/mirror/data4/gpseg23 50003 Passive Resynchronizing
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@l-test5 ~]$
[gpadmin@l-test5 ~]$ gpstate -m
.
.
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:- l-test11 /export/gp_data/mirror/data1/gpseg0 50000 Passive Synchronized
.
.
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:- l-test7 /export/gp_data/mirror/data4/gpseg23 50003 Passive Synchronized
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
重启greenplum集群
[gpadmin@l-test5 ~]$ gpstop -a -r
.
.
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:- Segments stopped successfully = 48
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:- Segments with errors during stop = 0
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Successfully shutdown 48 of 48 segment instances
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover gpmmon process
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-No leftover gpmmon process found
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover shared memory
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Restarting System...