Oracle GoldenGate Troubleshooting: Host Reboot Causes the OGG Pump Process to Fail Transferring Trail Files

1 Fault Description

A memory fault triggered an XCF alarm, which brought down the database host.

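2 Fault Recovery

  2.1 REPORT Log Analysis

After the database host was brought back up, the database started normally and all OGG processes came up as well, but after a while the pump process abended.

The output of view report pump1 was as follows:

On the source side, the same warning kept repeating: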
2023-02-17 15:00:21  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:31  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:41  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:51  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

After a while the process abended.

The target MGR report showed the following:

Similar messages kept repeating.

2023-02-17 15:01:01  INFO    OGG-00963  Command received from EXTRACT on host [192.168.243.29]:51344 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).

2023-02-17 15:01:01  INFO    OGG-00963  Command received from EXTRACT on host [192.168.243.29]:51345 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).

At this point the messages suggested a problem with connections to port 7809, so I checked:

netstat -a|grep 7809 

The port was listening normally, yet the connection still could not be established, which was puzzling.
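
Filtering on 7809 only shows the Manager port, while the collectors that appear later in this write-up listen on other ports such as 7840 and 7841. A broader view of everything in the LISTEN state can therefore help. The following is a minimal sketch and was not part of the original troubleshooting: the function name `listening_ports` is hypothetical, and it assumes that `netstat -an` prints `LISTEN` as the last field and the local address as the first field ending in `.port` or `:port`, which varies by platform.

```shell
# Hypothetical helper: print the local port of every LISTEN socket found in
# `netstat -an` output read from stdin, one port per line, sorted numerically.
# Assumption: the socket state is the last field, and the local address is
# the first field that ends in ".<port>" or ":<port>".
listening_ports() {
    awk '$NF == "LISTEN" {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /[.:][0-9]+$/) {
                n = split($i, parts, /[.:]/)   # port is the last piece
                print parts[n]
                break
            }
        }
    }' | sort -n | uniq
}

# Usage (commented out so that sourcing this snippet has no side effects):
# netstat -an | listening_ports
```

Comparing this list with the ports the target is expected to use may show whether something else is already occupying one of them.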

  2.2 GGSERR.LOG Analysis

The target's error log (ggserr.log) contained the following:

Error log entries on the target while the fault was occurring:

2023-02-17 14:46:51  INFO    OGG-01677  Oracle GoldenGate Collector for Oracle:  Waiting for connection (started dynamically).
2023-02-17 14:46:51  ERROR   OGG-00303  Oracle GoldenGate Collector for Oracle:  TCP/IP bind error 125 (Address already in use).
2023-02-17 14:46:51  ERROR   OGG-01668  Oracle GoldenGate Collector for Oracle:  PROCESS ABENDING.

We did not notice the OGG-00303 error at first. The problem was resolved indirectly by restarting all OGG processes.
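
Because the decisive ERROR entries were buried among INFO lines, a quick per-code count of errors in ggserr.log can make them harder to miss. This is a minimal sketch under stated assumptions: the function name `count_errors` is hypothetical, and the fields are assumed to be whitespace-separated in the order date, time, severity, message code, as in the entries above.

```shell
# Hypothetical helper: count ERROR entries per OGG message code in
# ggserr.log-style text read from stdin.
# Assumption: whitespace-separated fields are date, time, severity, code.
count_errors() {
    awk '$3 == "ERROR" { count[$4]++ }
         END { for (code in count) print code, count[code] }' | sort
}

# Usage (commented out so that sourcing this snippet has no side effects):
# count_errors < ggserr.log
```

Each output line is a message code followed by the number of times it appears with severity ERROR.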

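  2.3 Summary of Normal Logs

Source pump

In the normal case, the warnings repeat for a while and then a connection is established on a different port: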

2023-02-17 15:00:21  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:31  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:41  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:00:51  WARNING OGG-01223  TCP/IP error 79 (Connection refused), endpoint: 192.168.248.92:7809.

2023-02-17 15:01:06  INFO    OGG-01226  Socket buffer size set to 27985 (flush size 27985).

2023-02-17 15:01:06  INFO    OGG-01230  Recovered from TCP error, host 192.168.248.92, port 7840.

2023-02-17 15:01:09  INFO    OGG-01056  Recovery initialization completed for target file ./dirdat/mo041446, at RBA 145191066, CSN 13091982261579.

2023-02-17 15:01:09  INFO    OGG-01478  Output file ./dirdat/mo is using format RELEASE 11.2.
 


Error log (ggserr.log) entries on the target during normal operation:

2023-02-17 15:01:01  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from EXTRACT on host [192.168.243.29]:51344 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).
2023-02-17 15:01:01  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from EXTRACT on host [192.168.243.29]:51345 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).
2023-02-17 15:01:01  INFO    OGG-01677  Oracle GoldenGate Collector for Oracle:  Waiting for connection (started dynamically).
2023-02-17 15:01:01  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from SERVER on host [127.0.0.1]:33045 (REPORT 14418 7840).
2023-02-17 15:01:01  INFO    OGG-00974  Oracle GoldenGate Manager for Oracle, mgr.prm:  Manager started collector process (Port 7840).
2023-02-17 15:01:01  INFO    OGG-01228  Oracle GoldenGate Collector for Oracle:  Timeout in 300 seconds.
2023-02-17 15:01:01  INFO    OGG-01677  Oracle GoldenGate Collector for Oracle:  Waiting for connection (started dynamically).
2023-02-17 15:01:01  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from SERVER on host [127.0.0.1]:33046 (REPORT 14419 7841).

2023-02-17 15:01:01  INFO    OGG-00974  Oracle GoldenGate Manager for Oracle, mgr.prm:  Manager started collector process (Port 7841).
2023-02-17 15:01:01  INFO    OGG-01228  Oracle GoldenGate Collector for Oracle:  Timeout in 300 seconds.
2023-02-17 15:01:04  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (eoms): start er *.

MGR report log during normal operation:


2023-02-17 15:01:01  INFO    OGG-00963  Command received from EXTRACT on host [192.168.243.29]:51344 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).

2023-02-17 15:01:01  INFO    OGG-00963  Command received from EXTRACT on host [192.168.243.29]:51345 (START SERVER CPU -1 PRI -1  TIMEOUT 300 PARAMS ).

2023-02-17 15:01:01  INFO    OGG-00963  Command received from SERVER on host [127.0.0.1]:33045 (REPORT 14418 7840).

2023-02-17 15:01:01  INFO    OGG-00974  Manager started collector process (Port 7840).

2023-02-17 15:01:01  INFO    OGG-00963  Command received from SERVER on host [127.0.0.1]:33046 (REPORT 14419 7841).

2023-02-17 15:01:01  INFO    OGG-00974  Manager started collector process (Port 7841).
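
The visible difference from the failing MGR report is that the START SERVER requests are now followed by OGG-00974 entries naming the collector ports. A minimal sketch for pulling those ports out of a log is shown here. The function name `collector_ports` is hypothetical, and the pattern assumes the message wording matches the entries above exactly.

```shell
# Hypothetical helper: print the port of every collector that the Manager
# reports starting, from MGR report or ggserr.log text read from stdin.
# Assumption: the message reads "Manager started collector process (Port N)".
collector_ports() {
    sed -n 's/.*Manager started collector process (Port \([0-9][0-9]*\)).*/\1/p'
}

# Usage (commented out so that sourcing this snippet has no side effects):
# collector_ports < ggserr.log
```

If the log contains START SERVER requests but this prints nothing, no collector start was reported, which is a cue to look for ERROR entries in ggserr.log.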

3 Summary

When analyzing the report logs, also check ggserr.log. Looking only at the reports gave a one-sided view of the problem.
