Another 10g Bug

Today I got a call from a customer who said their database had suddenly gone down and asked me to take a look.


The alert log contained the following errors:

Mon Jul 16 17:59:46 2007

Errors in file /adosprod/dump/bdump/adosprod_mman_381670.trc:

ORA-00600: internal error code, arguments: [kmgs_pre_process_request_6], [6], [453], [64], [3], [0x700000208C55290], [], []

Mon Jul 16 17:59:47 2007

Errors in file /adosprod/dump/bdump/adosprod_mman_381670.trc:

ORA-00600: internal error code, arguments: [kmgs_pre_process_request_6], [6], [453], [64], [3], [0x700000208C55290], [], []

Mon Jul 16 17:59:47 2007

MMAN: terminating instance due to error 822

Mon Jul 16 17:59:47 2007

Errors in file /adosprod/dump/bdump/adosprod_lns1_746478.trc:

ORA-00822: MMAN process terminated with error

Instance terminated by MMAN, pid = 381670
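
When looking up an ORA-00600 on Metalink, the first bracketed argument is usually the search key. As a minimal sketch (the function name `ora600_arguments` is made up for illustration, and the pattern only assumes the line format shown in the excerpt above), the arguments can be pulled out of an alert log line like this:

```python
import re

# Matches the ORA-00600 line format seen in the alert log excerpt above.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")

def ora600_arguments(line):
    """Return the bracketed arguments of an ORA-00600 line, or None if no match."""
    match = ORA600_RE.search(line)
    if match is None:
        return None
    # Each argument is enclosed in square brackets; empty ones are kept as ''.
    return re.findall(r"\[([^\]]*)\]", match.group(1))

sample = (
    "ORA-00600: internal error code, arguments: "
    "[kmgs_pre_process_request_6], [6], [453], [64], [3], "
    "[0x700000208C55290], [], []"
)
args = ora600_arguments(sample)
print(args[0])  # the value to search for
```

Here the first argument is `kmgs_pre_process_request_6`, which is the string used for the Metalink lookup in this case.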

Clearly, the MMAN process hit the ORA-00600 error and then terminated the instance. A search on Metalink showed this to be Bug 4433838:

The error occurs when the parameter SGA_TARGET is set to an exact multiple of 4Gb.

The workaround is:

Ensure the value set for the parameter SGA_TARGET is not an exact multiple of 4Gb.

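The customer's SGA_TARGET was 8 GB, which is an exact multiple of 4 GB, so it is very likely that this outage was caused by the bug. I told the customer to change the parameter and to keep monitoring.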
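
The condition in the bug note is simple arithmetic, so it is easy to check a candidate value before setting it. The following is only a sketch: `parse_size` and `is_exact_multiple_of_4g` are hypothetical helper names, the K/M/G suffix handling is a simplification, and `8000M` is just an illustrative value, not a recommendation. The instance may also round the setting to its granule size, so it is worth checking the effective value after the change.

```python
FOUR_GIB = 4 * 1024**3

# Size suffixes handled by this sketch.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text):
    """Convert a size such as '8G' or '8589934592' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def is_exact_multiple_of_4g(sga_target):
    """True if the value is a positive exact multiple of 4 GiB."""
    size = parse_size(sga_target)
    return size > 0 and size % FOUR_GIB == 0

print(is_exact_multiple_of_4g("8G"))     # True: the customer's setting
print(is_exact_multiple_of_4g("8000M"))  # False
```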

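I have already run into quite a few 10g bugs. At the end of May, a customer in Wenzhou had ASM instances that could not mount a disk group, and Metalink confirmed that this was a bug as well. Here is what happened: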

After upgrading the database (RAC) from 10.2.0.1 to 10.2.0.3, the customer tried to start the two ASM instances and got the same error on both:
SQL> startup
ASM instance started

Total System Global Area 130023424 bytes
Fixed Size 2043664 bytes
Variable Size 102813936 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"ORADATA"


SQL> select * from v$asm_disk;

no rows selected

SQL> select * from v$asm_diskgroup;

no rows selected
1. The instance parameters were as follows:
*.background_dump_dest='/oracle/product/10.2.0/admin/+ASM/bdump'
*.cluster_database=true
*.core_dump_dest='/oracle/product/10.2.0/admin/+ASM/cdump'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='SHARED'
*.user_dump_dest='/oracle/product/10.2.0/admin/+ASM/udump'
+ASM1.instance_number=1
+ASM2.instance_number=2
*.asm_diskgroups='ORADATA'
*.asm_diskstring='/dev/vgora/*'
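Nothing in the parameters looked wrong.
2. Judging from the error at ASM startup, the cause was clearly that the disks belonging to the disk group could not be found.
3. Looking further at volume group vgora showed that, from the operating system's point of view, both the volume group and the PV it contains were in a normal state: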
WZORA2:/#vgdisplay -v vgora
--- Volume groups ---
VG Name /dev/vgora
VG Write Access read/write
VG Status available, shared, server
Max LV 255
Cur LV 4
Open LV 4
Max PV 16
Cur PV 1
Act PV 1
Max PE per PV 65535
VGDA 2
PE Size (Mbytes) 16
Total PE 26242
Alloc PE 26220
Free PE 22
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- ---

WZORA2 Server
WZORA1 Client

--- Logical volumes ---
LV Name /dev/vgora/lvocr
LV Status available/syncd
LV Size (Mbytes) 304
Current LE 19
Allocated PE 19
Used PV 1

LV Name /dev/vgora/lvvote
LV Status available/syncd
LV Size (Mbytes) 48
Current LE 3
Allocated PE 3
Used PV 1

LV Name /dev/vgora/lvdata
LV Status available/syncd
LV Size (Mbytes) 419008
Current LE 26188
Allocated PE 26188
Used PV 1

LV Name /dev/vgora/asm
LV Status available/syncd
LV Size (Mbytes) 160
Current LE 10
Allocated PE 10
Used PV 1


--- Physical volumes ---
PV Name /dev/dsk/c4t0d2
PV Name /dev/dsk/c8t0d2 Alternate Link
PV Status available
Total PE 26242
Free PE 22
Autoswitch On

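4. The question now was why a disk that looked healthy at the OS level could not be discovered when the ASM instance started.
5. I ran the following commands to rule out permissions as the cause.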
$ cd $ORACLE_HOME/bin
$ ls -ltr oracle
-rwsr-s--x 1 oracle oinstall 284370040 May 26 10:25 oracle
$ cd /dev/dsk
$ ls -ltr
total 0
brw-r----- 1 bin sys 31 0x000000 Mar 21 02:55 c0t0d0
brw-r----- 1 bin sys 31 0x021000 Mar 21 02:55 c2t1d0
brw-r----- 1 bin sys 31 0x021002 Mar 21 02:55 c2t1d0s2
brw-r----- 1 bin sys 31 0x021003 Mar 21 02:55 c2t1d0s3
brw-r----- 1 bin sys 31 0x030000 Mar 21 02:55 c3t0d0
brw-r----- 1 bin sys 31 0x021001 Mar 21 03:09 c2t1d0s1
brw-r----- 1 bin sys 31 0x060000 Mar 29 17:06 c6t0d0
brw-r----- 1 bin sys 31 0x040200 Mar 29 17:06 c4t0d2
brw-r----- 1 bin sys 31 0x040100 Mar 29 17:06 c4t0d1
brw-r----- 1 bin sys 31 0x070000 Mar 29 17:08 c7t0d0
brw-r----- 1 bin sys 31 0x080100 Mar 29 17:08 c8t0d1
brw-r----- 1 bin sys 31 0x080200 Mar 29 17:09 c8t0d2
brw-r----- 1 bin sys 31 0x030001 Mar 29 17:18 c3t0d0s1
brw-r----- 1 bin sys 31 0x030002 Mar 29 17:18 c3t0d0s2
brw-r----- 1 bin sys 31 0x030003 Mar 29 17:18 c3t0d0s3
$ cd /dev/rdsk
$ ls -ltr
total 0
crw-r--r-- 1 bin sys 188 0x000000 Mar 21 02:55 c0t0d0
crw-r--r-- 1 bin sys 188 0x021003 Mar 21 02:55 c2t1d0s3
crw-r--r-- 1 bin sys 188 0x060000 Mar 29 17:06 c6t0d0
crw-r--r-- 1 bin sys 188 0x040100 Mar 29 17:06 c4t0d1
crw-r--r-- 1 bin sys 188 0x040200 Mar 29 17:06 c4t0d2
crw-r--r-- 1 bin sys 188 0x070000 Mar 29 17:08 c7t0d0
crw-r--r-- 1 bin sys 188 0x030000 Mar 29 17:17 c3t0d0
crw-r--r-- 1 bin sys 188 0x021001 Mar 29 17:20 c2t1d0s1
crw-r--r-- 1 bin sys 188 0x030001 Mar 29 17:20 c3t0d0s1
crw-r--r-- 1 bin sys 188 0x030003 Mar 29 17:21 c3t0d0s3
crw-r--r-- 1 bin sys 188 0x080100 Mar 29 17:41 c8t0d1
crw-r--r-- 1 bin sys 188 0x080200 Mar 29 17:41 c8t0d2
crw-r--r-- 1 bin sys 188 0x021002 Mar 29 17:48 c2t1d0s2
crw-r--r-- 1 bin sys 188 0x030002 Mar 29 17:48 c3t0d0s2
crw-r--r-- 1 bin sys 188 0x021000 Apr 24 10:54 c2t1d0
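The PV behind volume group vgora is reachable through two device paths, /dev/dsk/c4t0d2 and /dev/dsk/c8t0d2 (the latter is its alternate link). I ran the following commands to change the owner of these device files to oracle:oinstall: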
WZORA2:/# cd /dev/dsk
WZORA2:/dev/dsk#chown oracle:oinstall /dev/dsk/c4t0d2
WZORA2:/dev/dsk#chown oracle:oinstall /dev/dsk/c8t0d2
WZORA2:/dev/dsk#cd /dev/rdsk
WZORA2:/dev/rdsk#chown oracle:oinstall /dev/rdsk/c4t0d2
WZORA2:/dev/rdsk#chown oracle:oinstall /dev/rdsk/c8t0d2
Restarting the ASM instance still failed.
SQL> startup
ASM instance started

Total System Global Area 130023424 bytes
Fixed Size 2043664 bytes
Variable Size 102813936 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"ORADATA"
So permissions were not the cause either.
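
Since ASM discovers disks through asm_diskstring ('/dev/vgora/*' here), the device files worth inspecting are the ones that pattern matches, not only the entries under /dev/dsk and /dev/rdsk. A small sketch for listing owner and mode of whatever a discovery pattern matches (the helper name `describe_devices` is hypothetical, and numeric uid/gid are printed rather than names to keep it dependency-free):

```python
import glob
import os
import stat

def describe_devices(pattern):
    """Return (path, uid, gid, octal mode, kind) for every path matching pattern."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        info = os.stat(path)
        if stat.S_ISCHR(info.st_mode):
            kind = "char"
        elif stat.S_ISBLK(info.st_mode):
            kind = "block"
        else:
            kind = "other"
        mode = oct(stat.S_IMODE(info.st_mode))
        rows.append((path, info.st_uid, info.st_gid, mode, kind))
    return rows

# The pattern is the asm_diskstring from the parameter listing above.
for row in describe_devices("/dev/vgora/*"):
    print(row)
```

On a host without /dev/vgora the loop simply prints nothing.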

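6. I used the kfed tool to run a read test against the disk behind the disk group, and the read worked normally. kfed first had to be built: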
$ cd $ORACLE_HOME/rdbms/lib
$ cp ins_rdbms.mk ins_rdbms.mk_prekfed

ikfod: $(KFOD)
-mv -f $(ORACLE_HOME)/bin/kfod $(ORACLE_HOME)/bin/kfod0
-mv $(ORACLE_HOME)/rdbms/lib/kfod $(ORACLE_HOME)/bin/kfod
-chmod 751 $(ORACLE_HOME)/bin/kfod
In ins_rdbms.mk, the section above is changed to the following (adding an ikfed target):
ikfod: $(KFOD)
-mv -f $(ORACLE_HOME)/bin/kfod $(ORACLE_HOME)/bin/kfod0
-mv $(ORACLE_HOME)/rdbms/lib/kfod $(ORACLE_HOME)/bin/kfod
-chmod 751 $(ORACLE_HOME)/bin/kfod

ikfed: $(KFED)
-mv -f $(ORACLE_HOME)/bin/kfed $(ORACLE_HOME)/bin/kfed0
-mv $(ORACLE_HOME)/rdbms/lib/kfed $(ORACLE_HOME)/bin/kfed
-chmod 751 $(ORACLE_HOME)/bin/kfed

$ make -f ins_rdbms.mk ikfed
$ kfed read /dev/vgora/rlvdata
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 1585696588 ; 0x00c: 0x5e83cf4c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: ORADATA_0000 ; 0x028: length=12
kfdhdb.grpname: ORADATA ; 0x048: length=7
kfdhdb.fgname: ORADATA_0000 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32887249 ; 0x0a8: HOUR=0x11 DAYS=0xe MNTH=0x4 YEAR=0x7d7
kfdhdb.crestmp.lo: 3430825984 ; 0x0ac: USEC=0x0 MSEC=0x390 SECS=0x7 MINS=0x33
kfdhdb.mntstmp.hi: 32888649 ; 0x0b0: HOUR=0x9 DAYS=0x1a MNTH=0x5 YEAR=0x7d7
kfdhdb.mntstmp.lo: 3740819456 ; 0x0b4: USEC=0x0 MSEC=0x218 SECS=0x2f MINS=0x37
kfdhdb.secsize: 1024 ; 0x0b8: 0x0400
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 419008 ; 0x0c4: 0x000664c0
kfdhdb.pmcnt: 5 ; 0x0c8: 0x00000005
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32887249 ; 0x0e4: HOUR=0x11 DAYS=0xe MNTH=0x4 YEAR=0x7d7
kfdhdb.grpstmp.lo: 3430473728 ; 0x0e8: USEC=0x0 MSEC=0x238 SECS=0x7 MINS=0x33
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[54]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[55]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[56]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[57]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000
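
The header dump can be cross-checked against the earlier OS output. The sketch below decodes the packed timestamp words and computes the disk size. The bit layout is inferred from the decoded fields kfed prints next to each raw value in this dump, not taken from Oracle documentation, and treating kfdhdb.dsksize as a count of allocation units is likewise an assumption; both function names are made up for illustration.

```python
def decode_kfed_timestamp(hi, lo):
    """Unpack the hi/lo timestamp words using the layout inferred from the dump."""
    return {
        "year": hi >> 14,
        "month": (hi >> 10) & 0xF,
        "day": (hi >> 5) & 0x1F,
        "hour": hi & 0x1F,
        "minute": lo >> 26,
        "second": (lo >> 20) & 0x3F,
        "msec": (lo >> 10) & 0x3FF,
        "usec": lo & 0x3FF,
    }

def disk_size_mb(dsksize, ausize):
    """Disk size in MB, assuming dsksize is expressed in allocation units."""
    return dsksize * ausize // (1024 * 1024)

created = decode_kfed_timestamp(32887249, 3430825984)   # kfdhdb.crestmp
mounted = decode_kfed_timestamp(32888649, 3740819456)   # kfdhdb.mntstmp
print(created)
print(mounted)
print(disk_size_mb(419008, 1048576))  # kfdhdb.dsksize, kfdhdb.ausize
```

Under these assumptions the header is internally consistent with the rest of the evidence: the computed size of 419008 MB equals the LV Size reported for /dev/vgora/lvdata, and the last mount stamp decodes to 26 May 2007, which fits the end-of-May timeline. That supports the conclusion that the disk header itself was intact.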
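7. I ran the command /usr/sbin/lvmchk /dev/vgora/rlvdata, which returned nothing. After renaming the file /usr/sbin/lvmchk and restarting the ASM instance, the instance started. Based on this, the Metalink engineer concluded that the cause was bug 6051728 and offered two solutions: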
(i) Remove the patch (which installed the lvmchk) (or)
(ii) Rename /usr/sbin/lvmchk to some other name
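8. We went with the second option, and RAC started without trouble.
9. Finally, I completed the remaining database upgrade steps and started the application server. Logging in to the application showed everything working normally.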

Source: "ITPUB Blog", link: http://blog.itpub.net/85922/viewspace-926773/. If you repost this article, please credit the source; otherwise legal action may be pursued.
