系统环境
oracle 11.2.0.4+ogg12.2.0.1.0
延迟现象
GGSCI (cluster-10-176-50-29 as ogg@yjgk2) 169> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REP_PS1 14:27:09 00:00:57
REPLICAT RUNNING REP_PS2 00:00:02 00:00:10
REPLICAT RUNNING REP_PS3 00:00:03 00:00:10
REPLICAT RUNNING REP_PS4 00:00:03 00:00:03
REPLICAT RUNNING REP_PS5 00:00:05 00:00:01
REPLICAT RUNNING REP_PS6 00:00:05 00:00:01
REPLICAT RUNNING REP_PS7 00:00:00 36:55:12
通过send rep_ps7,status查看事务提交的记录非常慢
问题分析与解决
问题分析
1、查看对哪些张业务表执行什么操作慢
如果查看呢?通过以下方式来排查
1、查看复制进程对应的操作系统进程
[oracle@cluster-10-176-50-29 ~]$ ps -ef |grep rep_ps7
oracle 100611 204642 0 Jan11 ? 00:09:19 /home/oracle/ogg_12/replicat PARAMFILE /home/oracle/ogg_12/dirprm/rep_ps7.prm REPORTFILE /home/oracle/ogg_12/dirrpt/REP_PS7.rpt PROCESSID REP_PS7 USESUBDIRS
oracle 193720 58421 0 10:12 pts/5 00:00:00 grep --color=auto rep_ps7
[oracle@cluster-10-176-50-29 ~]$ ps -ef |grep 100611
oracle 100611 204642 0 Jan11 ? 00:09:19 /home/oracle/ogg_12/replicat PARAMFILE /home/oracle/ogg_12/dirprm/rep_ps7.prm REPORTFILE /home/oracle/ogg_12/dirrpt/REP_PS7.rpt PROCESSID REP_PS7 USESUBDIRS
oracle 100629 100611 76 Jan11 ? 1-12:47:22 oracleyjgk2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 198394 58421 0 10:15 pts/5 00:00:00 grep --color=auto 100611
2、根据操作进程找到对应的会话
select a.spid process_id,b.SID,b.SERIAL#,b.PROGRAM,b.SQL_ADDRESS from v$process a,v$session b where a.ADDR=b.PADDR and a.SPID='100629'
3、根据会话找到执行的相应SQL_ID
select m.SQL_ID,m.EVENT,m.BLOCKING_SESSION_STATUS,m.CURRENT_OBJ#,count(*) from v$active_session_history m where m.SESSION_ID='1495'
group by m.SQL_ID,m.EVENT,m.BLOCKING_SESSION_STATUS,m.CURRENT_OBJ#
order by 5,4
4、根据SQL_ID查看相应的执行计划,分析原因
select * from table(dbms_xplan.display_cursor('5dvxcryg12mzf'))SQL_ID 5dvxcryg12mzf, child number 0
-------------------------------------
DELETE FROM "YJ"."T_YJ_FX_QXYB" WHERE "OBJ_ID" = :b0 AND "QBSJ"
= :b1 AND "YBSJ" = :b2 AND "ETL_DATE" = :b3 AND "LAT" = :b4 AND "LON" =
:b5 AND "RAIN" = :b6 AND "PRES" = :b7 AND "REH" = :b8 AND "TMP" = :b9
AND "WD" = :b10 AND "WS" = :b11 AND "QXXH" = :b12 AND "WDM" = :b13 AND
ROWNUM = 1
Plan hash value: 892274622
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | | | 22062 (100)| |
| 1 | DELETE | T_YJ_FX_QXYB | | | | |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | TABLE ACCESS FULL| T_YJ_FX_QXYB | 1 | 120 | 22062 (1)| 00:04:25 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM=1)
3 - filter(("OBJ_ID"=:B0 AND "WD"=:B10 AND "REH"=:B8 AND "TMP"=:B9 AND
"PRES"=:B7 AND "QXXH"=:B12 AND "WS"=:B11 AND "LON"=:B5 AND "LAT"=:B4 AND
"WDM"=:B13 AND "RAIN"=:B6 AND "ETL_DATE"=:B3 AND "YBSJ"=:B2 AND "QBSJ"=:B1))
解决方式
因删除记录操作走的全表扫描,在该表上创建相应的索引
create unique index idx_obj_id_tyqq on T_YJGK_FX_QXYB(obj_id)
创建索引时,可能会提示资源忙,需要先停止一下复制进程,创建索引完成后再启动复制进程。
再次查看同步情况
GGSCI (cluster-10-176-50-29 as ogg@yjgk2) 178> info rep_ps7
REPLICAT REP_PS7 Last Started 2021-01-13 10:25 Status RUNNING
Checkpoint Lag 00:00:00 (updated 36:59:03 ago)
Process ID 214682
Log Read Checkpoint File dirdat/dp000002558
**2021-01-11 21:28:45.496671 RBA 471926878**
GGSCI (cluster-10-176-50-29 as ogg@yjgk2) 179> !
info rep_ps7
REPLICAT REP_PS7 Last Started 2021-01-13 10:25 Status RUNNING
Checkpoint Lag 36:45:24 (updated 00:00:00 ago)
Process ID 214682
Log Read Checkpoint File dirdat/dp000002561
**2021-01-11 21:43:25.501406 RBA 462572573**
同步日志已经向前应用了
查看执行计划
select * from table(dbms_xplan.display_cursor('74xsv8m6ms4b8'))
SQL_ID 74xsv8m6ms4b8, child number 0
-------------------------------------
DELETE FROM "YJGK"."T_YJGK_QXFX_QXYB" WHERE "OBJ_ID" = :b0
Plan hash value: 52556501
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | | | 2 (100)| |
| 1 | DELETE | T_YJGK_QXFX_QXYB | | | | |
|* 2 | INDEX UNIQUE SCAN| IDX_OBJ_ID_TYQQ | 1 | 33 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJ_ID"=:B0)
删除操作现在走索引,而不是全表扫描了。
总结
1、ogg同步延迟的原因有很多,需要根据问题具体分析
2、ogg同步环境中,业务表都要求有主键(业务表设计不合理)
3、缺少实时监控,虽然是备库,但延迟了几十个小时才发现。