Greenplum cluster: recovery fails after a segment host crash (FATAL at xact.c:1780, persistent serial number 28603383, TID (20,45))

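Background: In a production Greenplum cluster (Greenplum 4.3.8), 4 mirror instances on segment host seg12 and 4 mirror instances on segment host seg13 (whose primaries were on seg12) went down. Shortly after the mirrors failed, host seg12 itself crashed. Because the primaries of the 4 already-failed mirrors on seg13 lived on seg12, 4 segments now had both their primary and their mirror down. With no complete copy of the data left, the cluster could no longer serve requests, and SQL statements failed with errors. At this point 8 mirrors and 4 primaries in the cluster were down.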
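The failure condition described above boils down to a simple check: a segment (content ID) cannot be served or rebuilt when neither its primary nor its mirror is up. A minimal Python sketch, assuming instance records shaped like rows of gp_segment_configuration (`content`, `role` 'p'/'m', `status` 'u'/'d'); the function name `segments_without_live_copy` and the example layout are hypothetical, chosen only to reproduce this incident's counts (8 mirrors and 4 primaries down).

```python
from collections import defaultdict


def segments_without_live_copy(instances):
    """Return sorted content IDs whose primary and mirror are both down.

    `instances` is an iterable of dicts with keys "content", "role" ('p'/'m')
    and "status" ('u'/'d'), shaped like rows of gp_segment_configuration.
    The master (content -1) is skipped.
    """
    alive = defaultdict(bool)
    for inst in instances:
        if inst["content"] < 0:
            continue
        # A content ID is still serviceable if at least one instance is up.
        alive[inst["content"]] |= inst["status"] == "u"
    return sorted(content for content, up in alive.items() if not up)


# Hypothetical layout matching the incident's counts: content 0-3 have their
# primaries on seg12 and mirrors on seg13; content 4-7 have mirrors on seg12
# (their primaries are placed on seg13 here purely for illustration).
cluster = (
    [{"content": c, "role": "p", "status": "d", "hostname": "seg12"} for c in range(4)]
    + [{"content": c, "role": "m", "status": "d", "hostname": "seg13"} for c in range(4)]
    + [{"content": c, "role": "p", "status": "u", "hostname": "seg13"} for c in range(4, 8)]
    + [{"content": c, "role": "m", "status": "d", "hostname": "seg12"} for c in range(4, 8)]
)
print(segments_without_live_copy(cluster))  # [0, 1, 2, 3]
```

A non-empty result means there is no surviving instance to rebuild those segments from, which is consistent with gprecoverseg being unable to help and gpstart refusing to start the array.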

Process: We urgently asked the data center to bring the crashed host back. No hardware fault was found, and the host booted normally.

1. Running gprecoverseg could not recover the cluster, because 4 segments had both their primary and their mirror down.

2. We stopped the cluster with gpstop -M fast in order to restart it. The cluster stopped, but starting it again failed with:

ERROR:-gpstart error: Do not have enough valid segments to start the array

3. We checked the log of an instance that failed to start:

2021-09-28 18:23:13.605756 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",6721,

2021-09-28 18:23:13.625746 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","redo starts at 2379/10E603E8",,,,,,,0,,"xlog.c",6853,

2021-09-28 18:23:13.949767 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","unexpected pageaddr 2378/C7418000 in log file 9081, segment 4, offset 54624256",,,,,,,0,,"xlog.c",4499,

2021-09-28 18:23:13.949837 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","redo done at 2379/13417DF8",,,,,,,0,,"xlog.c",6942,

2021-09-28 18:23:13.949862 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 2379/13417FD8",,,,,,,0,,"xlog.c",6988,

2021-09-28 18:23:14.690278 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","Finished startup pass 1.  Proceeding to startup crash recovery passes 2 and 3.",,,,,,,0,,"xlog.c",7208,

2021-09-28 18:23:15.288902 CST,,,p21913,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","Recovery Aborting Transaction 557533704",,,,,"Record abort transaction record for crashed transaction 557533704 with 3 'Create Pending' file-system objects (first file-system object Relation File: 'base/17146/322338197 (segment file #0)', persistent serial number 28603380, TID (1448,73))",,0,,"cdbpersistentrecovery.c",1689,

2021-09-28 18:23:15.308408 CST,,,p21913,th-1181391072,,,,0,,,seg-1,,,x557533704,,"LOG","00000","Recovery Aborting Transaction 572079088",,,,,"Record abort transaction record for crashed transaction 572079088 with 6 'Create Pending' file-system objects (first file-system object Relation File: 'base/17146/331149861 (segment file #0)', persistent serial number 29477266, TID (1507,82))",,0,,"cdbpersistentrecovery.c",1689,

2021-09-28 18:23:15.336321 CST,,,p21913,th-1181391072,,,,0,,,seg-1,,,x572079088,,"LOG","00000","Recovery Aborting Transaction 557533712",,,,,"Record abort transaction record for crashed transaction 557533712 with 3 'Create Pending' file-system objects (first file-system object Relation File: 'base/17146/322338273 (segment file #0)', persistent serial number 28603383, TID (20,45))",,0,,"cdbpersistentrecovery.c",1689,

2021-09-28 18:23:15.399482 CST,,,p21913,th-1181391072,,,,0,,,seg-1,,,x572079088,,"FATAL","XX000","Crash recovery abort invalid for transaction 557533712 current status 'Committed' (0x1) and new status 'Aborted' (0x2) (xact.c:1780)",,,,,"Record abort transaction record for crashed transaction 557533712 with 3 'Create Pending' file-system objects (first file-system object Relation File: 'base/17146/322338273 (segment file #0)', persistent serial number 28603383, TID (20,45))",,0,,"xact.c",1780,"Stack trace:

1    0xb03bda postgres <symbol not found> (elog.c:502)

2    0xb05be8 postgres elog_finish (elog.c:1446)

3    0x540b45 postgres RecordCrashTransactionAbortRecord (xact.c:1780)

4    0xcae95c postgres PersistentRecovery_CrashAbort (cdbpersistentrecovery.c:1697)

5    0x560fad postgres StartupXLOG_Pass2 (xlog.c:7361)

6    0x566815 postgres StartupProcessMain (xlog.c:11076)

7    0x5f6729 postgres AuxiliaryProcessMain (bootstrap.c:468)

8    0x8ede24 postgres <symbol not found> (postmaster.c:7589)

9    0x8f5f3d postgres <symbol not found> (postmaster.c:4749)

10   0x8f8ec8 postgres <symbol not found> (postmaster.c:2437)

11   0x8fa840 postgres PostmasterMain (postmaster.c:7589)

12   0x7fc8bf postgres main (main.c:206)

13   0x300641ecdd libc.so.6 __libc_start_main (??:0)

14   0x4c4869 postgres <symbol not found> (??:0)

2021-09-28 18:23:15.407813 CST,,,p21876,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","startup pass 2 process (PID 21913) exited with exit code 1",,,,,,,0,,"postmaster.c",5854,

2021-09-28 18:23:15.407856 CST,,,p21876,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",4793,

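The log shows what went wrong during crash recovery pass 2: the startup process tried to abort transaction 557533712, which still had 'Create Pending' file-system objects, but that transaction was already recorded as 'Committed', so recovery stopped with FATAL and the postmaster aborted startup. Our reading is that the host crashed while the database was committing transactions, the commit was interrupted and the related pg_xlog records were lost, so the startup process could not find the log records it needed for recovery and startup failed.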
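The crash-recovery records above are ordinary CSV log lines, so they can be scanned programmatically when a segment has many of them. A minimal Python sketch, assuming the 30-column CSV layout of Greenplum 4.3 segment logs (severity at 0-based column 16, message at 18, source file and line at 27 and 28; verify against your own logs). It lists the transactions crash recovery tried to abort and any FATAL status conflict like the one for transaction 557533712. The function name `scan_startup_log` is hypothetical.

```python
import csv
import io
import re

# Assumed 0-based column positions in the 30-column Greenplum 4.3 CSV log
# (logtime, loguser, ..., logseverity, logstate, logmessage, ..., logcontext,
# ..., logfile, logline, logstack). Verify against your own segment logs.
SEVERITY, MESSAGE, FILE, LINE = 16, 18, 27, 28

XID_RE = re.compile(r"transaction (\d+)", re.IGNORECASE)
STATUS_RE = re.compile(r"current status '(\w+)'.*new status '(\w+)'")


def scan_startup_log(text):
    """Return (xids crash recovery tried to abort, FATAL abort conflicts)."""
    aborting, conflicts = [], []
    # newline="" lets csv.reader join quoted fields that span several lines,
    # such as the multi-line "Stack trace:" column.
    for row in csv.reader(io.StringIO(text, newline="")):
        if len(row) <= LINE:
            continue  # blank or truncated record
        message = row[MESSAGE]
        xid = XID_RE.search(message)
        if xid is None:
            continue
        if message.startswith("Recovery Aborting Transaction"):
            aborting.append(int(xid.group(1)))
        elif row[SEVERITY] == "FATAL":
            status = STATUS_RE.search(message)
            conflicts.append({
                "xid": int(xid.group(1)),
                "current": status.group(1) if status else None,
                "new": status.group(2) if status else None,
                "where": f"{row[FILE]}:{row[LINE]}",
            })
    return aborting, conflicts
```

On the complete log file, the records in step 3 should yield 557533704, 572079088 and 557533712 as transactions being aborted, plus one 'Committed' versus 'Aborted' conflict for 557533712 at xact.c:1780.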

4. We tried resetting the transaction log with the pg_resetxlog tool, but startup still failed.

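5. After several more attempts, we added the parameter gp_crash_recovery_abort_suppress_fatal=true to postgresql.conf to suppress this error, and the cluster finally started. Once the cluster was up, we recovered the failed segments.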
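Editing postgresql.conf by hand on several segment data directories is error-prone, so a small helper that sets or removes one parameter idempotently can be useful. A minimal Python sketch, assuming a plain `name = value` line is acceptable in the file; the helper names `with_guc` and `set_guc` and the data-directory path in the commented usage line are hypothetical. Only uncommented assignments are touched, and passing no value removes the setting again.

```python
import re
from pathlib import Path


def with_guc(conf_text, name, value=None):
    """Return conf_text with exactly one active `name = value` line.

    The first uncommented assignment of `name` is replaced in place and any
    duplicates are dropped; commented-out lines are left alone. If value is
    None, all active assignments of `name` are removed instead.
    """
    pattern = re.compile(rf"^\s*{re.escape(name)}\s*=")
    out, written = [], value is None
    for line in conf_text.splitlines():
        if pattern.match(line):
            if not written:
                out.append(f"{name} = {value}")
                written = True
            continue
        out.append(line)
    if not written:
        out.append(f"{name} = {value}")
    return "\n".join(out) + "\n"


def set_guc(conf_path, name, value=None):
    """Apply with_guc() to a postgresql.conf file in place."""
    path = Path(conf_path)
    path.write_text(with_guc(path.read_text(), name, value))


# Example (hypothetical data directory):
# set_guc("/data/primary/gpseg0/postgresql.conf",
#         "gp_crash_recovery_abort_suppress_fatal", "true")
```

Keeping the text transformation separate from the file I/O makes the edit easy to preview before writing it to each affected instance.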
