Greenplum database cluster failure: "FATAL","XX000","Number of freeTIDs 788079, do not match maximum free order number ..."

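    Background: a Greenplum production cluster was overloaded by jobs running at night, which caused segment instances to fail. With cluster resources saturated, it was impossible to log in to the database and run a recovery. After some of the job processes were cleaned up, the database was stopped, but the restart failed: 29 of the 64 instances did not come up.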

    Symptom: the restart fails with: [ERROR]:-gpstart error: Do not have enough valid segments to start the array.

The startup output was as follows:

23:13:31:19:012428 gpstart:mas:gpadmin-[INFO]:-Starting gpstart with args: -a
23:13:31:19:012428 gpstart:mas:gpadmin-[INFO]:-Gathering information and validating the environment...
23:13:31:19:012428 gpstart:mas:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.8.0 build 1'
23:13:31:19:012428 gpstart:mas:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
23:13:31:19:012428 gpstart:mas:gpadmin-[INFO]:-Starting Master instance in admin mode
23:13:31:20:012428 gpstart:mas:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
23:13:31:20:012428 gpstart:mas:gpadmin-[INFO]:-Obtaining Segment details from master...
23:13:31:21:012428 gpstart:mas:gpadmin-[INFO]:-Setting new master era
23:13:31:21:012428 gpstart:mas:gpadmin-[INFO]:-Master Started...
23:13:31:21:012428 gpstart:mas:gpadmin-[INFO]:-Shutting down master
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg02 directory /data1/pg_system/primary/gpseg4 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg02 directory /data1/pg_system/primary/gpseg5 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg02 directory /data2/pg_system/primary/gpseg6 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg02 directory /data2/pg_system/primary/gpseg7 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg03 directory /data1/pg_system/primary/gpseg8 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg03 directory /data1/pg_system/primary/gpseg9 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg03 directory /data2/pg_system/primary/gpseg10 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg03 directory /data2/pg_system/primary/gpseg11 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg04 directory /data1/pg_system/primary/gpseg12 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg04 directory /data1/pg_system/primary/gpseg13 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg04 directory /data2/pg_system/primary/gpseg14 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg04 directory /data2/pg_system/primary/gpseg15 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg05 directory /data1/pg_system/primary/gpseg16 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg05 directory /data1/pg_system/primary/gpseg17 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg05 directory /data2/pg_system/primary/gpseg18 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg05 directory /data2/pg_system/primary/gpseg19 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg06 directory /data1/pg_system/primary/gpseg20 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg06 directory /data1/pg_system/primary/gpseg21 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg06 directory /data2/pg_system/primary/gpseg22 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg06 directory /data2/pg_system/primary/gpseg23 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg07 directory /data1/pg_system/primary/gpseg24 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg07 directory /data1/pg_system/primary/gpseg25 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg07 directory /data2/pg_system/primary/gpseg26 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg07 directory /data2/pg_system/primary/gpseg27 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg01 directory /data1/pg_system/mirror/gpseg28 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg01 directory /data1/pg_system/mirror/gpseg29 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg01 directory /data2/pg_system/mirror/gpseg30 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on seg01 directory /data2/pg_system/mirror/gpseg31 <<<<<
23:13:31:24:012428 gpstart:mas:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
................................................................................................................................................................................. 
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-Process results...
23:13:34:21:012428 gpstart:mas:gpadmin-[ERROR]:-No segment started for content: 8.
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-dumping success segments: ['seg02:/data2/pg_system/mirror/gpseg3:content=3:dbid=37:mode=s:status=u', 'seg02:/data2/pg_system/mirror/gpseg2:content=2:dbid=36:mode=s:status=u', 'seg02:/data1/pg_system/mirror/gpseg1:content=1:dbid=35:mode=s:status=u', 'seg02:/data1/pg_system/mirror/gpseg0:content=0:dbid=34:mode=s:status=u', 'seg06:/data1/pg_system/mirror/gpseg17:content=17:dbid=51:mode=c:status=u', 'seg06:/data1/pg_system/mirror/gpseg16:content=16:dbid=50:mode=c:status=u', 'seg06:/data2/pg_system/mirror/gpseg18:content=18:dbid=52:mode=c:status=u', 'seg06:/data2/pg_system/mirror/gpseg19:content=19:dbid=53:mode=c:status=u', 'seg03:/data1/pg_system/mirror/gpseg5:content=5:dbid=39:mode=c:status=u', 'seg03:/data1/pg_system/mirror/gpseg4:content=4:dbid=38:mode=c:status=u', 'seg03:/data2/pg_system/mirror/gpseg7:content=7:dbid=41:mode=c:status=u', 'seg03:/data2/pg_system/mirror/gpseg6:content=6:dbid=40:mode=c:status=u', 'seg07:/data2/pg_system/mirror/gpseg23:content=23:dbid=57:mode=c:status=u', 'seg07:/data2/pg_system/mirror/gpseg22:content=22:dbid=56:mode=c:status=u', 'seg07:/data1/pg_system/mirror/gpseg20:content=20:dbid=54:mode=c:status=u', 'seg07:/data1/pg_system/mirror/gpseg21:content=21:dbid=55:mode=c:status=u', 'seg05:/data2/pg_system/mirror/gpseg14:content=14:dbid=48:mode=c:status=u', 'seg05:/data2/pg_system/mirror/gpseg15:content=15:dbid=49:mode=c:status=u', 'seg05:/data1/pg_system/mirror/gpseg13:content=13:dbid=47:mode=c:status=u', 'seg05:/data1/pg_system/mirror/gpseg12:content=12:dbid=46:mode=c:status=u', 'seg04:/data2/pg_system/mirror/gpseg11:content=11:dbid=45:mode=c:status=u', 'seg04:/data2/pg_system/mirror/gpseg10:content=10:dbid=44:mode=c:status=u', 'seg04:/data1/pg_system/mirror/gpseg9:content=9:dbid=43:mode=c:status=u', 'seg08:/data2/pg_system/mirror/gpseg27:content=27:dbid=61:mode=c:status=u', 'seg08:/data2/pg_system/mirror/gpseg26:content=26:dbid=60:mode=c:status=u', 
'seg08:/data1/pg_system/mirror/gpseg24:content=24:dbid=58:mode=c:status=u', 'seg08:/data1/pg_system/mirror/gpseg25:content=25:dbid=59:mode=c:status=u', 'seg08:/data2/pg_system/primary/gpseg30:content=30:dbid=32:mode=c:status=u', 'seg08:/data2/pg_system/primary/gpseg31:content=31:dbid=33:mode=c:status=u', 'seg08:/data1/pg_system/primary/gpseg28:content=28:dbid=30:mode=c:status=u', 'seg08:/data1/pg_system/primary/gpseg29:content=29:dbid=31:mode=c:status=u', 'seg01:/data1/pg_system/primary/gpseg1:content=1:dbid=3:mode=s:status=u', 'seg01:/data1/pg_system/primary/gpseg0:content=0:dbid=2:mode=s:status=u', 'seg01:/data2/pg_system/primary/gpseg2:content=2:dbid=4:mode=s:status=u', 'seg01:/data2/pg_system/primary/gpseg3:content=3:dbid=5:mode=s:status=u']
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-DBID:42  FAILED  host:'seg04' datadir:'/data1/pg_system/mirror/gpseg8' with reason:'Segment postmaster has exited; check segment logfile'
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-   Successful segment starts                                            = 35
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-Failed segment starts                                                = 1    <<<<<<<<
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 28   <<<<<<<<
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-Successfully started 35 of 36 segment instances, skipped 28 other segments <<<<<<<<
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-Segment instance startup failures reported
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-Failed start 1 of 36 segment instances <<<<<<<<
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_23.log
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-----------------------------------------------------
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-****************************************************************************
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-There are 28 segment(s) marked down in the database
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
23:13:34:21:012428 gpstart:mas:gpadmin-[WARNING]:-****************************************************************************
23:13:34:21:012428 gpstart:mas:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
23:13:34:25:012428 gpstart:mas:gpadmin-[ERROR]:-gpstart error: Do not have enough valid segments to start the array.
[gpadmin@mas ~]$ 
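
The gpstart output above is long, and the two facts that matter (how many segments were skipped because they are marked down, and which segment actually failed to start) are easy to miss. A minimal sketch for pulling them out of a saved gpstart log follows. The function name `summarize_gpstart_log` is hypothetical, and the matching relies only on the message texts visible in the log above.

```shell
# Summarize a gpstart log: count segments skipped as "marked down" and
# list the segments that failed to start.
summarize_gpstart_log() {
    local logfile="$1"
    local skipped failed

    # grep -c prints the number of matching lines (0 when there are none).
    skipped=$(grep -c 'Skipping startup of segment marked down' "$logfile")
    failed=$(grep -c -E 'FAILED +host:' "$logfile")

    echo "skipped (marked down): ${skipped}"
    echo "failed to start: ${failed}"

    # Print the host and data directory of each failed segment.
    grep -E 'FAILED +host:' "$logfile" |
        sed -E "s/.*host:'([^']*)' datadir:'([^']*)'.*/\1 \2/"
}
```

Run against the log named in the output above, for example `summarize_gpstart_log /home/gpadmin/gpAdminLogs/gpstart_23.log`, this would report the 28 skipped segments and the one failed segment on seg04, which together make up the 29 failed instances.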

Log in to node seg04 and check the log of the failed instance:

cd /data1/pg_system/mirror/gpseg8

The key error messages in the log are:

 13:31:26.124568 CST,,,p32570,th497243936,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 19D2/CF69C378",,,,,,,0,,"xlog.c",6988,
 13:31:26.776825 CST,,,p32570,th497243936,,,,0,,,seg-1,,,,,"FATAL","XX000","Number of freeTIDs 788079, do not match maximum free order number 788197, for 'gp_persistent_relation_node' (cdbpersistentstore.c:626)",,,,,,,0,,"cdbpersistentstore.c",626,"Stack trace:
1    0xb03bda postgres <symbol not found> (elog.c:502)
2    0xb05be8 postgres elog_finish (elog.c:1446)
3    0xcb2e14 postgres <symbol not found> (cdbpersistentstore.c:626)
4    0xcb3e17 postgres PersistentStore_InitScanUnderLock (cdbpersistentstore.c:659)
5    0xc9a19c postgres PersistentFileSysObj_StartupInitScan (cdbpersistentfilesysobj.c:673)
6    0x5643f5 postgres StartupXLOG (xlog.c:7171)
7    0x566616 postgres StartupProcessMain (xlog.c:10970)
8    0x5f6715 postgres AuxiliaryProcessMain (bootstrap.c:463)
9    0x8ede24 postgres <symbol not found> (postmaster.c:7589)
10   0x8ee04d postgres StartMasterOrPrimaryPostmasterProcesses (postmaster.c:1576)
11   0x90079f postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:2087)
12   0x8f8ea2 postgres <symbol not found> (postmaster.c:2485)
13   0x8fa840 postgres PostmasterMain (postmaster.c:7589)
14   0x7fc8bf postgres main (main.c:206)
15   0x3e68e1ecdd libc.so.6 __libc_start_main (??:0)
16   0x4c4869 postgres <symbol not found> (??:0)
"
 13:31:26.789524 CST,,,p32539,th497243936,,,,0,,,seg-1,,,,,"LOG","00000","startup process (PID 32570) exited with exit code 1",,,,,,,0,,"postmaster.c",5854,
 13:31:26.789566 CST,,,p32539,th497243936,,,,0,,,seg-1,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",4706,
 13:31:26.983173 CST,,,p32562,th497243936,"127.0.0.1","56915", 13:31:25 CST,0,,,seg-1,,,,,"WARNING","01000","PrimaryMirrorTransitionRequest (2) Result: Transition to primary/mirror mode PrimarySegment, data state InChangeTracking resulted in Error",,,,,,,0,,"primary_mirror_mode.c",1319,
[gpadmin@seg04 pg_log]$ 
[gpadmin@seg04 pg_log]$ 
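
Finding the FATAL record shown above by eye in a busy segment log is tedious. A small sketch that filters it out is given here, assuming the segment logs are the CSV files under the instance's `pg_log` directory (the directory appears in the prompt above; the `.csv` extension is an assumption). The function name `show_fatal_records` is hypothetical.

```shell
# Print FATAL and PANIC records from a segment's CSV logs,
# prefixed with the file name and line number.
show_fatal_records() {
    local logdir="$1"
    # Severity and SQLSTATE are adjacent quoted CSV fields, e.g. "FATAL","XX000"
    grep -n -H -E '"(FATAL|PANIC)","[0-9A-Z]{5}"' "$logdir"/*.csv
}
```

For the instance discussed here, the call would be `show_fatal_records /data1/pg_system/mirror/gpseg8/pg_log`.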

Solution:

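On that node (seg04), edit the postgresql.conf parameter file of the failed instance and add the parameter gp_persistent_skip_free_list=true, then start the database cluster again. The cluster started successfully. After that, run gprecoverseg to recover the failed instances.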
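
The fix above can be sketched as a small script. This is an illustration under stated assumptions, not the exact procedure the author ran: the helper name `set_conf_param` is hypothetical, and the path `/data1/pg_system/mirror/gpseg8/postgresql.conf` is inferred from the data directory of the failed instance in the gpstart output. The helper keeps a copy of the original file and does nothing if the parameter is already set.

```shell
# Append "name=value" to a postgresql.conf unless an active (uncommented)
# setting for that name is already present. The original file is kept
# as <file>.bak before it is changed.
set_conf_param() {
    local conf="$1" name="$2" value="$3"

    if grep -q -E "^[[:space:]]*${name}[[:space:]]*=" "$conf"; then
        echo "${name} already set in ${conf}"
        return 0
    fi

    cp -p "$conf" "${conf}.bak"
    # Make sure the file ends with a newline so the new entry gets its own line.
    [ -n "$(tail -c1 "$conf")" ] && echo >> "$conf"
    printf '%s=%s\n' "$name" "$value" >> "$conf"
    echo "added ${name}=${value} to ${conf}"
}

# On seg04, for the instance that failed to start (path inferred from the log):
#   set_conf_param /data1/pg_system/mirror/gpseg8/postgresql.conf \
#       gp_persistent_skip_free_list true
# Then, on the master host, as in the text above:
#   gpstart -a
#   gprecoverseg
```

The source does not say whether the parameter was removed again after recovery; the `.bak` copy makes it straightforward to compare or restore the original settings.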

Reference material on Greenplum database failures is scarce; I hope this helps other readers!

 
