某社保Streams故障终极解决…

某社保使用Streams对两套数据库进向双向复制,于9月21号和26分别发生重大故障导致正常业务停机。
27号接到电话匆匆忙忙准备行程,并且收集故障信息。
合作伙伴在故障发生时收集的信息如下:
查询v$streams_capture:
SELECT CAPTURE_NAME, STATE FROM V$STREAMS_CAPTURE;
CAPTURE_NAME STATE
-------------- ------------------------
STREAM_CAPTURE PAUSED FOR FLOW CONTROL
查询V$BUFFERED_SUBSCRIBERS得到:
V$BUFFERED_SUBSCRIBERS may show TOO MANY UNBROWSED MESSAGES
我到达现场后,由于故障不在,但应该可以从ALERT.log里面发现什么东西,于是收集了全部三个实例的alert.log日志再加上合作伙伴收集到的信息进行分析。

alert1.log里面可以找到:

Tue Sep 27 10:17:41 2011

Propagation Schedule for (STRMADMIN1.CAP_GISIZJ_QUEUE, "STRMADMIN1"."APP_GISIZJ_QUEUE"@LDCX) encountered following error:

ORA-25307: Enqueue rate too high, flow control enable

另外一个alert日志里面可以看到从2010年这个潜在的故障就已经存在了,但没有被引起关注,因为那时业务量也不是很大,因此stream的压力不大,所以问题没有被暴露出来。

Fri Jun 18 10:39:24 2010

Propagation Schedule for (STRMADMIN1.UP_CAP_QUE_GIJD, "STRMADMIN1"."UP_APP_QUE_GIJD"@LDSC) encountered following error:

ORA-25307: Enqueue rate too high, flow control enabled ß 吻合第三种情况

Fri Jun 18 10:42:59 2010

<wbr></wbr>

stream的故障表现通常有三种形式:
1. 查询v$streams_capture:
SELECT CAPTURE_NAME, STATE FROM V$STREAMS_CAPTURE;
CAPTURE_NAME STATE
-------------- ------------------------
STREAM_CAPTURE PAUSED FOR FLOW CONTROL
2. 查询V$BUFFERED_SUBSCRIBERS得到:
V$BUFFERED_SUBSCRIBERS may show TOO MANY UNBROWSED MESSAGES
3.就是上面那个Error Message: ORA-25307: Enqueue rate too high, flow control enabled

造成streams故障的原因有如下几种.

1) INIT.ORA parameter problem

Inadequate Memory esp., Streams pool size is small or not set

<wbr><wbr><wbr></wbr></wbr></wbr> 2) If AQ_TM_PROCESSES parameter is set too high in 9i and 10g

For 10g/11g unset AQ_TM_PROCESSES and QMON is automatically configured (auto-tune).

<wbr><wbr><wbr></wbr></wbr></wbr> 3) Capture parameter setting have not been done properly .

4) Publisher has enqueued a lot more messages than have been browsed.

(>1000 unbrowsed messages)

<wbr><wbr><wbr></wbr></wbr></wbr> 5) High latency

High Latency can be due to long-running transactions,many dependent transactions, or slow Capture,Propagation, or Apply processes.In some situations, propagation may become disabled (if the number of failures is (16). Also check Propagation latency from dba_queue_schedules;

6) Check Apply latency

7) Long running Transactions

Check that there is no occurrence for long running transaction entries with the alert log

file, it will be something like that C002: long running txn detected, xid:

0x0008.010.00084261

<wbr></wbr>8) Some known bugs

<wbr><wbr><wbr></wbr></wbr></wbr> bug 7206332 bug 6054573 bug 5093060 bug 7719668

<wbr><wbr><wbr></wbr></wbr></wbr> bug 7206332(lack of evidence no patch at this moment)
bug 6054573( base bug 5093060 fixed at 10.2.0.4)

bug 5093060(NA for 10.2.0.4)

bug 7719668 (11.1.0.7)

经过综合分析,我认为客户这套streams是由于bug导致.

当前环境补丁信息如下,仅仅升级了10.2.0.4,并没有在10.2.0.4上打stream相关的补丁。

使用Stream的话Oracle推荐要打上如下补丁方可安全使用Stream

$ ./opatch lsinventory

Invoking OPatch 10.2.0.4.3

<wbr>Oracle Interim Patch Installer version 10.2.0.4.3</wbr>

Copyright (c) 2007, Oracle Corporation.<wbr></wbr> All rights reserved.

<wbr>Oracle Home<span style="mso-spacerun:yes"><wbr><wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr></wbr></span> : /oracle/app/10.2</wbr>

Central Inventory : /oracle/oraInventory

<wbr><wbr></wbr></wbr> from<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr> : /etc/oraInst.loc

OPatch version<wbr><wbr><wbr></wbr></wbr></wbr> : 10.2.0.4.3

OUI version<wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr> <wbr></wbr>: 10.2.0.4.0

OUI location<wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr> : /oracle/app/10.2/oui

Log file location : /oracle/app/10.2/cfgtoollogs/opatch/opatch2011-09-28_16-40-0

7PM.log

<wbr>Lsinventory Output file location : /oracle/app/10.2/cfgtoollogs/opatch/lsinv/lsi</wbr>

nventory2011-09-28_16-40-07PM.txt

<wbr>--------------------------------------------------------------------------------</wbr>

Installed Top-level Products (2):

<wbr>Oracle Database 10g<span style="mso-spacerun:yes"><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></span> 10.2.0.1.0</wbr>

Oracle Database 10g Release 2 Patch Set 3<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr> <wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>10.2.0.4.0

There are 2 products installed in this Oracle Home.

<wbr>There are no Interim patches installed in this Oracle Home.</wbr>

<wbr> --------------------------------------------------------------------------------</wbr>

<wbr>OPatch succeeded.</wbr>

上面结果可以得知,此环境没有打任何和streams相关的补丁。

客户环境在故障发生时还有另外一个现象,就是latch free,从下面的awr可以看到latch free非常高,在latch breakdown中可以查到是channel operations parent latch, 此latch非常少见,通常不会对数据库造成什么影响,但此次出现了,经过分析此为bug7810939,也是由于使用stream的bug。

[转载]某社保Streams故障终极解决方法

软件开发商报Capture进程写trace文件造成文件巨大,并且由于文件被锁定无法清理,因此造成文件系统被消耗很快,不得不停机清理,具体表现为某trace文件里面不断的写如下信息:

LET: next scn before krvxrolf.

scn: 0x0000.00000000

LET:<wbr></wbr> next scn after krvxrolf.

scn: 0xffff.ffffffff

LET: next scn before krvxrolf.

此又为streams的另一个 bug 6994160 : LET: next scn before krvxrolf. This patch already merged into patch 8680173

最后得出结论打上如下Stream的补丁:8680137, 7033630, 7219752, 6838714, 6827260, 6640411, 5933656另外加上latch补丁和写trace文件的补丁。

补丁应用很成功,启动数据库和流后,又发现另外一个我曾经给Shanghaionstar解决的一个问题:7257038:USING DBMS_REDEFINITION ON STREAMS TABLE CAUSES 'MISSING STREAMS MVDD'。

为了不影响数据库,我们使用rolling mode应用这个补丁,没想到的事情出来了,这个以前从没碰到过:

在RAC中一个节点应用这个补丁后,问题解决,但另一个节点在数据库启动后却没有解决,最后我们将数据库全部停下来,回退此补丁后再重新打一次,问题得到解决,由此说明oracle所谓的rolling mode patch并不是十分的可靠,可靠性可能在99.99%,而我在这次环境中就碰到的这个0.01%,因为我给这么多客户打过的补丁中以前是比没碰到过这种情况的。

综合一下:如果要使用streams,最好打上oracle推荐的补丁再进行配置,否则会有这样和那样的问题,并且这些补丁解决了一些streams的性能问题。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值