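Shortly after 8 this morning I noticed that CPU usage on the primary database server was stuck at around 90%, at an hour when the database should be sitting almost completely idle.
topas showed that virtually all of the CPU was being consumed by ggsci processes.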
Topas Monitor for host: bjsczjdbzsj01 EVENTS/QUEUES FILE/TTY
Tue Mar 19 09:00:01 2013 Interval: 2 Cswitch 56687 Readch 197.5M
Syscall 120.7K Writech 4112.5K
CPU User% Kern% Wait% Idle% Physc Entc Reads 32220 Rawin 0
ALL 82.2 5.6 1.2 11.0 11.63 116.3 Writes 13182 Ttyout 600
Forks 23 Igets 0
Network KBPS I-Pack O-Pack KB-In KB-Out Execs 23 Namei 13892
Total 741.4 786.6 444.0 563.1 178.4 Runqueue 33.5 Dirblk 0
Waitqueue 1.0
Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY
Total 100.0 48.4K 10.1K 47.4K 938.6 PAGING Real,MB 81920
Faults 30321 % Comp 94
FileSystem KBPS TPS KB-Read KB-Writ Steals 15501 % Noncomp 5
Total 187.4K 17.9K 187.3K 121.5 PgspIn 0 % Client 5
PgspOut 0
Name PID CPU% PgSp Owner PageIn 9507 PAGING SPACE
ggsci 46006566 8.6 15.7 oracle PageOut 47 Size,MB 16384
ocssd.bi 8388624 4.3 133.5 grid Sios 9550 % Used 24
oracle 11403492 4.3 42.5 oracle % Free 76
ggsci 15925646 4.3 15.7 oracle NFS (calls/sec)
ggsci 31392128 4.3 15.7 oracle SerV2 0 WPAR Activ 0
ggsci 6095204 4.3 15.7 oracle CliV2 0 WPAR Total 0
ggsci 23134254 4.3 15.7 oracle SerV3 0 Press: "h"-help
ggsci 22151308 4.3 15.7 oracle CliV3 0 "q"-quit
ggsci 21692764 4.3 15.7 oracle
ggsci 42467358 4.3 15.7 oracle
ggsci 21234052 4.3 15.7 oracle
ggsci 24052400 4.3 15.7 oracle
ggsci 35913764 4.3 15.7 oracle
ggsci 590450 4.3 15.7 oracle
ggsci 23199846 4.3 15.7 oracle
ggsci 12845086 4.3 15.7 oracle
ggsci 38469784 4.3 15.7 oracle
ggsci 24314114 4.3 15.7 oracle
ggsci 33620042 4.3 15.7 oracle
ggsci 33685546 4.3 15.7 oracle
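Two GoldenGate instances, ggs and ggsyy, are installed on the primary database server; ggs uses port 7809 and ggsyy uses port 7810. A while ago, after Oracle's own engineers deployed OEM 12c, an attempt to install the GoldenGate plug-in failed and caused this same performance problem. At the time it was resolved by stopping the OEM agent and disabling the plug-in's process parameters, so why has it come back?
ps -ef showed that a large number of ./ggsci commands belonged to the ggsyy instance on port 7810, and only one belonged to port 7809 (that one is the session I had opened myself for monitoring).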
bjsczjdbzsj01:/home/oracle/ggs$ps -ef | grep ggsci | grep PORT
oracle 12845086 1 56 08:20:56 - 10:26 ./ggsci PORT 8000-8300 -m 7810
oracle 22151308 1 57 07:52:21 - 18:11 ./ggsci PORT 8000-8300 -m 7810
oracle 23134254 1 55 07:16:37 - 27:09 ./ggsci PORT 8000-8300 -m 7810
oracle 23199846 1 55 08:53:08 - 2:07 ./ggsci PORT 8000-8300 -m 7810
oracle 29098078 1 58 07:48:49 - 18:50 ./ggsci PORT 8000-8300 -m 7810
oracle 31457498 1 62 07:20:14 - 26:49 ./ggsci PORT 8000-8300 -m 7810
oracle 33620042 1 57 07:09:28 - 28:59 ./ggsci PORT 8000-8300 -m 7810
oracle 33685546 1 56 08:24:33 - 9:29 ./ggsci PORT 8000-8300 -m 7810
oracle 35913764 1 57 08:35:14 - 7:04 ./ggsci PORT 8000-8300 -m 7810
oracle 38469784 1 58 07:55:58 - 17:11 ./ggsci PORT 8000-8300 -m 7810
oracle 42467358 1 58 08:06:38 - 14:56 ./ggsci PORT 8000-8300 -m 7810
oracle 55967816 1 57 07:13:05 - 28:06 ./ggsci PORT 8000-8300 -m 7810
oracle 5767526 1 53 07:34:31 - 22:55 ./ggsci PORT 8000-8300 -m 7810
oracle 6095204 1 52 07:45:12 - 23:36 ./ggsci PORT 8000-8300 -m 7810
oracle 6357336 1 58 07:27:23 - 26:55 ./ggsci PORT 8000-8300 -m 7810
oracle 15925646 1 74 08:38:51 - 8:34 ./ggsci PORT 8000-8300 -m 7810
oracle 19399038 1 71 08:49:31 - 4:21 ./ggsci PORT 8000-8300 -m 7810
oracle 21234052 1 51 08:17:24 - 11:21 ./ggsci PORT 8000-8300 -m 7810
oracle 21692764 1 52 07:38:03 - 24:31 ./ggsci PORT 8000-8300 -m 7810
oracle 24314114 1 53 07:59:29 - 17:10 ./ggsci PORT 8000-8300 -m 7810
oracle 25952762 1 54 08:10:15 - 16:58 ./ggsci PORT 8000-8300 -m 7810
oracle 27984288 1 51 08:42:22 - 5:45 ./ggsci PORT 8000-8300 -m 7810
oracle 31392128 1 72 08:45:59 - 5:43 ./ggsci PORT 8000-8300 -m 7810
oracle 33161706 1 54 08:13:47 - 16:08 ./ggsci PORT 8000-8300 -m 7810
oracle 37683540 1 53 08:03:06 - 19:47 ./ggsci PORT 8000-8300 -m 7810
oracle 45678926 1 56 08:31:42 - 7:54 ./ggsci PORT 8000-8300 -m 7810
oracle 46006566 1 51 07:41:40 - 20:54 ./ggsci PORT 8000-8300 -m 7810
oracle 590450 28901990 57 08:56:40 - 1:10 ./ggsci PORT 8000-8300 -m 7810
oracle 10814004 1 58 07:30:54 - 23:59 ./ggsci PORT 8000-8300 -m 7810
oracle 11797206 1 67 07:23:46 - 26:36 ./ggsci PORT 8000-8300 -m 7810
oracle 16908944 28901990 58 09:00:17 - 0:11 ./ggsci PORT 8000-8300 -m 7810
oracle 24052400 1 60 08:28:05 - 8:36 ./ggsci PORT 8000-8300 -m 7810
oracle 36962936 30670938 0 Mar 16 - 1:59 ./ggsci PORT 7815-8000 -m 7809
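The per-port tally that was read off the listing above by eye can also be computed. The following is a minimal sketch, not something from the original session: the function name count_ggsci_by_port is hypothetical, and it assumes every ggsci command line carries "-m <manager port>" as in the output above.

```shell
# Count ggsci processes per Manager port from `ps -ef` output read on stdin.
# Assumption: each ggsci command line contains "-m <port>", as in the listing above.
count_ggsci_by_port() {
  awk '
    /ggsci/ {
      # Find the "-m" flag and count the port that follows it.
      for (i = 1; i < NF; i++)
        if ($i == "-m") counts[$(i + 1)]++
    }
    END { for (p in counts) print p, counts[p] }
  ' | sort -n
}

# Usage (the [g] keeps grep from matching its own process):
#   ps -ef | grep '[g]gsci' | count_ggsci_by_port
```

Each output line is a Manager port followed by the number of ggsci processes attached to it, so a runaway instance stands out immediately.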
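Based on this, the problem had to be in the ggsyy instance. Checking that instance's error log showed that the file had ballooned to about 19 GB, exactly as it had the previous time...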
bjsczjdbzsj01:/home/oracle/ggsyy$ls -l ggs*log
-rw-r--r-- 1 oracle oinstall 19048559981 Mar 19 09:00 ggserr.log
tail -f showed that a GUI client on host emserver1.em.com was sending START GGSCI commands to the ggsyy instance in an endless loop:
bjsczjdbzsj01:/home/oracle/ggsyy$tail -f ggserr.log
2013-03-19 08:55:04 INFO OGG-01053 Oracle GoldenGate Capture for Oracle, pzjts_ts.prm: Recovery completed for target file ./dirdat/yj000137, at RBA 1469.
2013-03-19 08:55:04 INFO OGG-01057 Oracle GoldenGate Capture for Oracle, pzjts_ts.prm: Recovery completed for all targets.
2013-03-19 08:56:40 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GUI on host emserver1.em.com:52978 (START GGSCI ).
2013-03-19 08:56:40 INFO OGG-00976 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager started 'ggsci' process on port 0.
2013-03-19 08:56:41 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host loopback:39867 (REPORT 590450 8005).
2013-03-19 08:59:10 ERROR OGG-01224 Oracle GoldenGate Command Interpreter for Oracle: Bad file number.
2013-03-19 08:59:10 ERROR OGG-01668 Oracle GoldenGate Command Interpreter for Oracle: PROCESS ABENDING.
2013-03-19 09:00:17 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GUI on host emserver1.em.com:58712 (START GGSCI ).
2013-03-19 09:00:17 INFO OGG-00976 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager started 'ggsci' process on port 0.
2013-03-19 09:00:18 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host loopback:10031 (REPORT 16908944 8018).
2013-03-19 09:03:49 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GUI on host emserver1.em.com:57373 (START GGSCI ).
2013-03-19 09:03:49 INFO OGG-00976 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager started 'ggsci' process on port 0.
2013-03-19 09:03:50 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host loopback:14929 (REPORT 8192366 8020).
2013-03-19 09:07:26 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GUI on host emserver1.em.com:38624 (START GGSCI ).
2013-03-19 09:07:26 INFO OGG-00976 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager started 'ggsci' process on port 0.
2013-03-19 09:07:27 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host loopback:25090 (REPORT 24838184 8039).
tail -200 showed that the log was also full of the following output, the same messages that filled up the disk last time:
bjsczjdbzsj01:/home/oracle/ggsyy$tail -200 ggserr.log
2013-03-19 08:33:26 WARNING OGG-01930 Oracle GoldenGate Capture for Oracle, pzj_cx9.prm: Datastore error in 'dirbdb': BDB0060 PANIC: fatal region error detected; run recovery.
2013-03-19 08:33:26 WARNING OGG-01930 Oracle GoldenGate Capture for Oracle, pzj_cx9.prm: Datastore error in 'dirbdb': BDB0060 PANIC: fatal region error detected; run recovery.
2013-03-19 08:33:26 WARNING OGG-01930 Oracle GoldenGate Capture for Oracle, pzj_cx9.prm: Datastore error in 'dirbdb': BDB0060 PANIC: fatal region error detected; run recovery.
2013-03-19 08:35:26 WARNING OGG-01930 Oracle GoldenGate Capture for Oracle, pcqstqz1.prm: Datastore error in 'dirbdb': BDB0060 PANIC: fatal region error detected; run recovery.
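To see which messages dominate a log of this size without scrolling through it, the lines can be grouped by severity and message code. This is a rough sketch under the assumption that every line follows the layout shown above (date, time, severity, OGG-nnnnn, text); the function name summarize_ogg_codes is hypothetical.

```shell
# Summarize GoldenGate log lines on stdin by severity and message code.
# Assumption: fields are date, time, severity, OGG-nnnnn, then free text.
summarize_ogg_codes() {
  awk '
    $4 ~ /^OGG-[0-9]+$/ { n[$3 " " $4]++ }   # key is e.g. "WARNING OGG-01930"
    END { for (k in n) print n[k], k }
  ' | sort -rn                               # most frequent first
}

# Usage:
#   tail -n 200 ggserr.log | summarize_ogg_codes
```

Piping only a tail of the file into the function avoids reading all 19 GB just to find the noisiest message.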
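emserver1.em.com is the very machine on which the earlier OEM 12c installation failed. OEM 12c was later installed on a new machine, and GoldenGate Director was deployed on emserver1.em.com's existing WebLogic installation to monitor the GoldenGate processes, so I suspected that the ggsci commands above were being issued by Director's monitoring.
On the Director monitoring page the ggsyy instance was marked with a red X, and expanding it showed no processes; in other words, it had not been configured successfully, even though it had only just been set up. Testing connectivity to the ggsyy instance from the GoldenGate Director admin tool ended in a connection timeout.
My preliminary conclusion was that Director was flooding the server with ggsci commands and exhausting the CPU because the ggsyy instance had not been configured successfully. The oversized log file may also have been hurting the database server's performance.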
Back up the log (keeping only its last 50,000 lines) and truncate it:
bjsczjdbzsj01:/home/oracle/ggsyy$tail -50000 ggserr.log > ggserr.log_bak_20130319
bjsczjdbzsj01:/home/oracle/ggsyy$cat /dev/null > ggserr.log
After the log was truncated, nothing more was written to it:
bjsczjdbzsj01:/home/oracle/ggsyy$tail -f ggserr.log
I restarted the mgr process to see whether logging would resume:
GGSCI (prod.oracle.com) 1> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)? y
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
GGSCI (prod.oracle.com) 2> start mgr
Manager started.
2013-03-19 15:48:40 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): stop mgr.
2013-03-19 15:48:41 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host prod.oracle.com (STOP).
2013-03-19 15:48:41 WARNING OGG-00938 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager is stopping at user request.
2013-03-19 15:48:47 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): start mgr.
2013-03-19 15:48:47 INFO OGG-00983 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager started (port 7809).
Logging resumed normally, which shows that truncating the log was safe.
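The backup-and-truncate step used here can be wrapped in a small helper so it is done the same way next time. This is only a sketch: the function name archive_and_truncate, the default of 50,000 lines, and the date-stamped backup name are assumptions modeled on the commands above. It truncates the file in place rather than deleting it, so the path and inode stay the same.

```shell
# Keep the last N lines of a log in a date-stamped backup, then truncate the log.
# Arguments: $1 = log file, $2 = number of lines to keep (default 50000).
archive_and_truncate() {
  log=$1
  keep=${2:-50000}
  backup="${log}_bak_$(date +%Y%m%d)"
  # Abort before truncating if the backup could not be written.
  tail -n "$keep" "$log" > "$backup" || return 1
  : > "$log"        # truncate in place; the file itself is not removed
  echo "$backup"    # print the backup path for the caller
}

# Usage:
#   archive_and_truncate ggserr.log 50000
```

Everything older than the kept lines is discarded, just as in the original commands, so a larger line count or a compressed full copy may be preferable when the history matters.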
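With the log truncated, I reconfigured the connection to the ggsyy instance in GoldenGate Director, and this time the connection test succeeded.
On the Director monitoring page the ggsyy instance folder turned green and the status of its processes was displayed correctly.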
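Checking topas again showed that CPU usage on the host had dropped sharply.
When reposting, please credit the author and link to the original article: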
http://blog.csdn.net/xiangsir/article/details/8693767