PST的位置
每个磁盘的AU 1是为PST表预留的,但是并不是每一个磁盘都有PST表的信息。
PST副本数量
在external冗余的磁盘组中只有一份PST表。
在normal冗余的磁盘组中,至少有两份PST表。如果磁盘组中有三个或更多的failgroups,PST表将会有3份。
在high冗余的磁盘组中,至少有三份PST表的信息。如果有四个failgroups,将会有4份PST表,如果有五个或者更多failgroups,将会有五份PST表。
以下我们可以创建不同冗余度的磁盘组,观察asm alert日志,来看创建磁盘组后的事情:
external
SQL> create diskgroup archdg external redundancy disk ‘/dev/asm_arch01’
NOTE: Assigning number (2,0) to disk (/dev/asm_arch01)
NOTE: initializing header on grp 2 disk ARCHDG_0000
NOTE: initiating PST update: grp = 2
Thu Mar 19 14:28:24 2020
GMON updating group 2 at 5 for pid 21, osid 3585
NOTE: group ARCHDG: initial PST location: disk 0000 (PST copy 0) -----<<<<copy 0确定只有一份,而且再disk0 上面,有前面日志看出,0号盘就是(/dev/asm_arch01),所以PST在/dev/asm_arch01上
NOTE: PST update grp = 2 completed successfully
。。。。。。。。
normal
SQL> create diskgroup archdg normal redundancy disk ‘/dev/asm_arch01’,’/dev/asm_arch02’,’/dev/asm_arch03’
NOTE: Assigning number (2,0) to disk (/dev/asm_arch01) ------<<<<<(2,0)=v$asm_disk.GROUP_NUMBER为2,v$asm_disk.DISK_NUMBER为0
NOTE: Assigning number (2,1) to disk (/dev/asm_arch02)
NOTE: Assigning number (2,2) to disk (/dev/asm_arch03)
NOTE: initializing header on grp 2 disk ARCHDG_0000
NOTE: initializing header on grp 2 disk ARCHDG_0001
NOTE: initializing header on grp 2 disk ARCHDG_0002
Mon Mar 23 14:39:28 2020
GMON updating for reconfiguration, group 2 at 5 for pid 22, osid 4059
NOTE: group 2 PST updated.
NOTE: initiating PST update: grp = 2
GMON updating group 2 at 6 for pid 22, osid 4059
NOTE: group ARCHDG: initial PST location: disk 0000 (PST copy 0)***—<<<copy 0副本1***
NOTE: group ARCHDG: initial PST location: disk 0001 (PST copy 1)***—<<<copy 1副本2***
NOTE: group ARCHDG: initial PST location: disk 0002 (PST copy 2)***—<<<copy 2副本3***
NOTE: PST update grp = 2 completed successfully
。。。。。
这次我们看到创建了3份PST表,分别位于disk 0000,disk 0001,disk 0002,即/dev/asm_arch01,/dev/asm_arch02,/dev/asm_arch03。
high
待验证。
PST包含的内容
可以通过以下两种方式查看pst的具体内容:
1)gmon trace 每次尝试mount磁盘组时,GMON trace文件就会记录PST信息。如果已经mount上,不会再刷新pst信息
2)kfed read读取AU1固定的位置。
- gmon trace
先通过第一种方法查看,先dismount testdg。然后再mount的时候,输出gmon日志,观察内容
这里说下进程gmon的作用:
GMON(asm disk group monitor):磁盘组监控进程。这个进程负责维护磁盘组中各个磁盘状态的一致性。当磁盘组中的磁盘成员发生该笔那时(例如:添加、删除磁盘,或者磁盘出现故障),该进程负责离线(offline)或者上线(online)磁盘。可以说。这是ASM实例中最重要的后台进程之一。相关元数据的操作都是该进程来完成的
gmon监控PST的信息只有在磁盘mount的时候,会记录。当其他元数据发生变化的时候,gmon也会输出日志)。
tail -200f +ASM_gmon_7107.trc
*** 2020-06-19 14:53:39.484
Mode of disk 0 (TESTDG_0000) in grp 2 changing from 0x9 to 0x7f
Mode of disk 1 (TESTDG_0001) in grp 2 changing from 0x9 to 0x7f
POST res = 12
=============== PST ====================
grpNum: 2
state: 1
callCnt: 17
(lockvalue) valid=1 ver=0.0 ndisks=1 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR --------------------
next: 5
last: 5
pst count: 1
pst locations: 0
incarn: 1
dta size: 3
version: 1
ASM version: 186646528 = 11.2.0.0.0
contenttype: 1
partnering pattern: [ ]
--------------- LOC MAP ----------------
0: dirty 0 cur_loc: 0 stable_loc: 0
1: dirty 0 cur_loc: 0 stable_loc: 0
--------------- DTA --------------------
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2443622938 parts:
1: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2443622938 parts:
--------------- HBEAT ------------------ 《《《《磁盘心跳
kfdpHbeat_dump: state=3, inst=1, ts=33102446.3598159872,
rnd=3909470396.2766819163.4244887513.759222249.
kfk io-queue: 0x7ffff628eb98
kfdpHbeatCB_dump: at 0x7ffff628eb88 with ts=06/19/2020 14:53:39 iop=0x7ffff628eb98, grp=2, disk=0/3914523479, isWrite=1 Hbeat #46 state=2 iostate=4
--------------- KFGRP ------------------ 《《《fail group信息
kfgrp: TESTDG number: 2/3261207486 type: 1 compat: 11.2.0.0.0 dbcompat:1.0.0.0.1
timestamp: 431172646 state: 4 flags: 10 gpnlist: a0bb7ee8 a0bb7ee8
KFGPN at a0bb7e28 in dependent chain
kfdsk:0xa0ba7e60
disk: TESTDG_0000 num: 0/7084547265881367383 grp: 2/7084547265228051390 compat: 11.2.0.0.0 dbcompat:10.1.0.0.0
fg: TESTDG_0000 path: /dev/asm_arch01
mnt: O hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1001
slot 65535 ddeslot 65535 numslots 65535 dtype 0 enc 0 part 0 flags 0
kfts: 2020/06/19 11:40:26.454000
kfts: 2020/06/19 11:47:01.479000
pcnt: 0 ()
kfkid: 0xa0bd5a98, kfknm: , status: IDENTIFIED
fob: (KSFD)a2002c18, magic: bebe ausize: 4194304
kfdds: dn=0 inc=3914523479 dsk=0xa0ba7e60 usrp=(nil)
kfkds 0x7ffff60395d8, kfkid 0xa0bd5a98, magic abbe, libnum 0, bpau 8192, fob 0xa20035b0
kfdsk:0xa0ba7048
disk: TESTDG_0001 num: 1/7084547265881367382 grp: 2/7084547265228051390 compat: 11.2.0.0.0 dbcompat:10.1.0.0.0
fg: TESTDG_0001 path: /dev/asm_arch02
mnt: O hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1001
slot 65535 ddeslot 65535 numslots 65535 dtype 0 enc 0 part 0 flags 0
kfts: 2020/06/19 11:40:26.454000
kfts: 2020/06/19 11:47:01.479000
pcnt: 0 ()
kfkid: 0xa0bd4ae0, kfknm: , status: IDENTIFIED
fob: (KSFD)a2002758, magic: bebe ausize: 4194304
kfdds: dn=1 inc=3914523478 dsk=0xa0ba7048 usrp=(nil)
kfkds 0x7ffff6039530, kfkid 0xa0bd4ae0, magic abbe, libnum 0, bpau 8192, fob 0xa20036e0
gmon的trace信息以====或者-------分割,= PST ===标记的部分显示了磁盘组号(grpNum),类型(grpTyp),和状态。— HDR —标记的部分显示了PST的副本数(pst count)和所在的磁盘号。— DTA —显示了PST所在磁盘的状态。PST(Partnership and Status Table)包含一个磁盘组里所有磁盘的相关信息 – 磁盘号,磁盘状态,partner磁盘号,心跳信息和failgroup信息(11g之后)。
- kfed read
每个ASM磁盘的1号AU为PST表预留,但是只有部分磁盘有PST数据。由于PST是重要的ASM元数据,只要失败组个数足够,在normal冗余中将会有三副本,在high冗余中将会有五副本。
直接读取AU1全部block即可。
磁盘心跳位置
根据前面创建的external磁盘组,磁盘组状态如下:
Disk Group Name Fail Group Path File Name Status Status Status TYPE File Size (MB) Used Size (MB) Pct. Used
---------------- --------------- --------------------------------- ------------------------- ------------ ------- ------- --------- -------------- -------------- ---------
TESTDG TESTDG_0000 /dev/asm_arch01 TESTDG_0000 MEMBER CACHED ONLINE REGULAR 3,072 40 1.30
*************** -------------- --------------
TOTAL 3,072 40
TESTDG_0001 /dev/asm_arch02 TESTDG_0001 MEMBER CACHED ONLINE REGULAR 2,048 24 1.17
*************** -------------- --------------
TOTAL 2,048 24
TESTDG_0002 /dev/asm_arch03 TESTDG_0002 MEMBER CACHED ONLINE REGULAR 2,048 32 1.56
*************** -------------- --------------
TOTAL 2,048 32
**************** -------------- --------------
TOTAL 7,168 96
asm的disk心跳验证,oracle正是通过这个来判断某个disk是否属于某个磁盘组,以防止其他磁盘组将该disk挂载,尤其实在HA环境中,在主机一边挂载磁盘组的时候,一定要冻结另一个备机,就可以用这个字段来验证,这个磁盘组是否有主机在挂载。
查看每个磁盘组里面的磁盘心跳状态:
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 19 ; 0x002: KFBTYP_HBEAT -----<<<磁盘心跳标志位
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 2047 ; 0x004: blk=2047
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 4118038751 ; 0x00c: 0xf57448df
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdpHbeatB.instance: 1 ; 0x000: 0x00000001
kfdpHbeatB.ts.hi: 33102442 ; 0x004: HOUR=0xa DAYS=0x13 MNTH=0x6 YEAR=0x7e4
kfdpHbeatB.ts.lo: 3597499392 ; 0x008: USEC=0x0 MSEC=0x35f SECS=0x26 MINS=0x35
kfdpHbeatB.rnd[0]: 3372924585 ; 0x00c: 0xc90abea9
kfdpHbeatB.rnd[1]: 704815548 ; 0x010: 0x2a02a1bc
kfdpHbeatB.rnd[2]: 3651789415 ; 0x014: 0xd9a9e267
kfdpHbeatB.rnd[3]: 2589087288 ; 0x018: 0x9a525638
[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=1023 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147484671 ; 0x004: blk=1023 (indirect)
kfbh.block.obj: 2147483649 ; 0x008: disk=1
kfbh.check: 17662463 ; 0x00c: 0x010d81ff
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=1023 aus=4194304|more
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
可以看的出来,并不是每个asm磁盘上面都有心跳盘,只有每个PST的最后一个block是磁盘心跳
磁盘心跳也属于PST元数据的一部分,位置和PST的位置保持一致。
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=0 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 17 ; 0x002: KFBTYP_PST_META ---这表示元数据block类型
kfbh.datfmt: 2 ; 0x003: 0x02 ---<<<<表示元数据block格式
kfbh.block.blk: 1024 ; 0x004: blk=1024
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 2952973672 ; 0x00c: 0xb002cd68
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:33102443 ; 0x000: HOUR=0xb DAYS=0x13 MNTH=0x6 YEAR=0x7e4 ---<<<<最后一次pst元数据的更新时间,这是高位
kfdpHdrPairBv1.first.super.time.lo:969494528 ; 0x004: USEC=0x0 MSEC=0x254 SECS=0x1c MINS=0xe ---<<<<最后一次pst元数据的更新时间,这是地位
kfdpHdrPairBv1.first.super.last: 2 ; 0x008: 0x00000002 ---<<<<最新的disk版本号
kfdpHdrPairBv1.first.super.next: 2 ; 0x00c: 0x00000002 ---<<<<下一个disk版本号
kfdpHdrPairBv1.first.super.copyCnt: 1 ; 0x010: 0x01 ---<<<<表示PST元数据的备份信息,如果是1表示磁盘组是external冗余。如果是3那么则是normal。
kfdpHdrPairBv1.first.super.version: 1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare: 0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn: 1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]: 0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]: 0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]: 0 ; 0x01c: 0x0000
kfdpHdrPairBv1.first.super.copy[3]: 0 ; 0x01e: 0x0000
kfdpHdrPairBv1.first.super.copy[4]: 0 ; 0x020: 0x0000
kfdpHdrPairBv1.first.super.dtaSz: 3 ; 0x022: 0x0003 ---<<<<表示pst元数据中,diskgroup所包含的disk个数
kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000
kfdpHdrPairBv1.first.newCopy[0]: 0 ; 0x028: 0x0000
kfdpHdrPairBv1.first.newCopy[1]: 0 ; 0x02a: 0x0000
kfdpHdrPairBv1.first.newCopy[2]: 0 ; 0x02c: 0x0000
kfdpHdrPairBv1.first.newCopy[3]: 0 ; 0x02e: 0x0000
kfdpHdrPairBv1.first.newCopy[4]: 0 ; 0x030: 0x0000
kfdpHdrPairBv1.first.newCopyCnt: 0 ; 0x032: 0x00
kfdpHdrPairBv1.first.contType: 1 ; 0x033: 0x01
kfdpHdrPairBv1.first.spare0: 0 ; 0x034: 0x00000000
kfdpHdrPairBv1.first.ppat[0]: 0 ; 0x038: 0x0000
kfdpHdrPairBv1.first.ppat[1]: 0 ; 0x03a: 0x0000
kfdpHdrPairBv1.first.ppat[2]: 0 ; 0x03c: 0x0000
kfdpHdrPairBv1.first.ppat[3]: 0 ; 0x03e: 0x0000
kfdpHdrPairBv1.first.ppatsz: 0 ; 0x040: 0x00
kfdpHdrPairBv1.first.spare1: 0 ; 0x041: 0x00
kfdpHdrPairBv1.first.spare2: 0 ; 0x042: 0x0000
kfdpHdrPairBv1.first.spares[0]: 0 ; 0x044: 0x00000000
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 1025 ; 0x004: blk=1025
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 2952974697 ; 0x00c: 0xb002d169
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:33102443 ; 0x000: HOUR=0xb DAYS=0x13 MNTH=0x6 YEAR=0x7e4
kfdpHdrPairBv1.first.super.time.lo:969493504 ; 0x004: USEC=0x0 MSEC=0x253 SECS=0x1c MINS=0xe
kfdpHdrPairBv1.first.super.last: 1 ; 0x008: 0x00000001
kfdpHdrPairBv1.first.super.next: 2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt: 1 ; 0x010: 0x01
kfdpHdrPairBv1.first.super.version: 1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare: 0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn: 1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]: 0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]: 0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]: 0 ; 0x01c: 0x0000
kfdpHdrPairBv1.first.super.copy[3]: 0 ; 0x01e: 0x0000
kfdpHdrPairBv1.first.super.copy[4]: 0 ; 0x020: 0x0000
kfdpHdrPairBv1.first.super.dtaSz: 3 ; 0x022: 0x0003
kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000
kfdpHdrPairBv1.first.newCopy[0]: 0 ; 0x028: 0x0000
kfdpHdrPairBv1.first.newCopy[1]: 0 ; 0x02a: 0x0000
kfdpHdrPairBv1.first.newCopy[2]: 0 ; 0x02c: 0x0000
kfdpHdrPairBv1.first.newCopy[3]: 0 ; 0x02e: 0x0000
kfdpHdrPairBv1.first.newCopy[4]: 0 ; 0x030: 0x0000
kfdpHdrPairBv1.first.newCopyCnt: 0 ; 0x032: 0x00
kfdpHdrPairBv1.first.contType: 1 ; 0x033: 0x01
kfdpHdrPairBv1.first.spare0: 0 ; 0x034: 0x00000000
kfdpHdrPairBv1.first.ppat[0]: 0 ; 0x038: 0x0000
kfdpHdrPairBv1.first.ppat[1]: 0 ; 0x03a: 0x0000
kfdpHdrPairBv1.first.ppat[2]: 0 ; 0x03c: 0x0000
kfdpHdrPairBv1.first.ppat[3]: 0 ; 0x03e: 0x0000
kfdpHdrPairBv1.first.ppatsz: 0 ; 0x040: 0x00
kfdpHdrPairBv1.first.spare1: 0 ; 0x041: 0x00
kfdpHdrPairBv1.first.spare2: 0 ; 0x042: 0x0000
kfdpHdrPairBv1.first.spares[0]: 0 ; 0x044: 0x00000000
[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=0 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: blk=0 (indirect)
kfbh.block.obj: 2147483649 ; 0x008: disk=1
kfbh.check: 17662464 ; 0x00c: 0x010d8200
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=1 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483649 ; 0x004: blk=1 (indirect)
kfbh.block.obj: 2147483649 ; 0x008: disk=1
kfbh.check: 17662465 ; 0x00c: 0x010d8201
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=0 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: blk=0 (indirect)
kfbh.block.obj: 2147483650 ; 0x008: disk=2
kfbh.check: 17662467 ; 0x00c: 0x010d8203
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=1 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483649 ; 0x004: blk=1 (indirect)
kfbh.block.obj: 2147483650 ; 0x008: disk=2
kfbh.check: 17662466 ; 0x00c: 0x010d8202
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error messa 《《《《无效信息
进一步验证,PST只在asm磁盘0上面,1和2上面都没有
PST位置发生变化
PST在下列情况下会再分配:
• PST所在磁盘不可用(ASM实例启动时)
• 磁盘offline
• PST的读写发生IO错误
• 磁盘被正常的删除掉(手工或后台)
这些情况下,如果相同失败组中有其他可用的磁盘,PST将会被重分配到其他磁盘;否则会重分配到其他没有PST副本的失败组。
基于以上testdg磁盘组,模拟删除磁盘,验证PST是否发生变化:
- PST位置不发生变化
SQL> alter diskgroup testdg drop disk TESTDG_0002;
Diskgroup altered.
asm alert日志:
Fri Jun 19 12:00:58 2020
SQL> alter diskgroup testdg drop disk TESTDG_0002
NOTE: requesting all-instance membership refresh for group=2
Fri Jun 19 12:01:01 2020
NOTE: membership refresh pending for group 2/0x43a213ba (TESTDG)
Fri Jun 19 12:01:04 2020
GMON querying group 2 at 9 for pid 13, osid 7105
SUCCESS: refreshed membership for 2/0x43a213ba (TESTDG)
SUCCESS: alter diskgroup testdg drop disk TESTDG_0002 ------<<<<<drop已经成功,前台会话返回删除成功信息。
NOTE: starting rebalance of group 2/0x43a213ba (TESTDG) at power 1
Starting background process ARB0 ------<<<<<启动arbn进程,该进程主要负责在磁盘组中进行数据extent的重平衡,可以有arb0-arb9和arba。实际真正干活的
Fri Jun 19 12:01:04 2020
ARB0 started with pid=20, OS id=7348
NOTE: assigning ARB0 to group 2/0x43a213ba (TESTDG) with 1 parallel I/O
cellip.ora not found.
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x43a213ba (TESTDG) ------<<<<<rebalance完成
NOTE: Attempting voting file refresh on diskgroup TESTDG
Fri Jun 19 12:01:07 2020
NOTE: requesting all-instance membership refresh for group=2
GMON updating for reconfiguration, group 2 at 10 for pid 20, osid 7384
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk TESTDG_0002 emptied
NOTE: erasing header on grp 2 disk TESTDG_0002
NOTE: process _x000_+asm (7384) initiating offline of disk 2.3914523473 (TESTDG_0002) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 2/0xe952e351, mask = 0x6a, op = clear ------<<<<<开始更新PST,group2disk2进行clear
GMON updating disk modes for group 2 at 11 for pid 20, osid 7384
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp = 2, dsk = 2/0xe952e351, mask = 0x7e, op = clear
GMON updating disk modes for group 2 at 12 for pid 20, osid 7384
NOTE: cache closing disk 2 of grp 2: TESTDG_0002
NOTE: PST update grp = 2 completed successfully ------<<<<<PST更新完成
GMON updating for reconfiguration, group 2 at 13 for pid 20, osid 7384
NOTE: cache closing disk 2 of grp 2: (not open) TESTDG_0002
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x43a213ba (TESTDG)
GMON querying group 2 at 14 for pid 13, osid 7105
GMON querying group 2 at 15 for pid 13, osid 7105
Fri Jun 19 12:01:13 2020
NOTE: Disk TESTDG_0002 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x43a213ba (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
这里补充下RBAL和ARBn进程的作用:
RBAL(ASM rebalance master):rebalance主进程。这个进程负责协调磁盘组的rebalance操作。而具体的平衡磁盘组extent的操作是由ARBn进程完成的。在数据库实例中也存在RBAL进程,但是它的功能和ASM实例中的RBAL是不同的。数据库中的RBAL进程负责为数据库管理ASM磁盘。
通过他们的trc也可以看出点内容:
abr的trc
[root@11gasm trace]# tail -200f +ASM_arb0_7348.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_arb0_7348.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option
ORACLE_HOME = /u01/app/grid/11.2.0/grid
System name: Linux
Node name: 11gasm
Release: 2.6.32-642.el6.x86_64
Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016
Machine: x86_64
VM name: VMWare Version: 6
Instance name: +ASM
Redo thread mounted by this instance: 0 <none>
Oracle process number: 20
Unix process pid: 7348, image: oracle@11gasm (ARB0)
*** 2020-06-19 12:01:04.766
*** SESSION ID:(10.1) 2020-06-19 12:01:04.766
*** CLIENT ID:() 2020-06-19 12:01:04.766
*** SERVICE NAME:() 2020-06-19 12:01:04.766
*** MODULE NAME:() 2020-06-19 12:01:04.766
*** ACTION NAME:() 2020-06-19 12:01:04.766
ARB0 relocating file +TESTDG.2.1 (1 entries)
ARB0 relocating file +TESTDG.3.1 (5 entries)
ARB0 relocating file +TESTDG.5.1 (1 entries)
ARB0 relocating file +TESTDG.5.1 (1 entries)
ARB0 relocating file +TESTDG.3.1 (1 entries) -------<<<<<relocate文件
上面abrn的trc文件出现的relocate过程,属于rebalance3个过程中的第二个阶段,而且通过trc可以看到,rebalance是以文件(relocating file)为基本操作单位的,跟过arb进程的跟踪文件可以观察到。第一阶段-rebalance plan,ASM会计算出重平衡的计划。计划取决于很多因素,例如磁盘组大小、磁盘组中的文件个数、磁盘的partnership是否需要调整等等。这个过程时间不会太长,一般不会超过几分钟。
第二阶段-extent relocating,是真正干活的阶段,这个阶段,ASM的区会在磁盘组中的磁盘间移动,这个过程会花费大部分的时间。这个过程中,ASM会记录区的移动数量,以及实际的I/O性能,从而估算该过程需要花费的时间(GV
A
S
M
O
P
E
R
A
T
I
O
N
.
E
S
T
M
I
N
U
T
E
S
记
录
估
算
出
的
时
间
)
。
不
过
要
注
意
的
是
,
这
只
是
估
算
的
时
间
,
真
正
的
花
费
时
间
还
取
决
于
整
体
负
载
(
特
别
是
磁
盘
相
关
的
负
载
)
。
如
果
重
平
衡
是
由
于
磁
盘
组
中
的
一
个
或
者
多
个
磁
盘
损
坏
造
成
的
,
那
么
这
个
阶
段
还
会
对
不
满
足
冗
余
度
要
求
的
数
据
做
镜
像
。
第
三
阶
段
是
磁
盘
的
c
o
m
p
a
c
t
i
n
g
阶
段
(
A
S
M
11.1.0.7
版
本
及
以
上
支
持
)
,
这
个
过
程
是
将
磁
盘
上
存
的
数
据
尽
可
能
的
移
动
到
磁
盘
的
外
圈
磁
道
上
去
(
机
械
盘
的
外
圈
速
度
更
快
)
,
以
提
供
更
高
的
性
能
。
需
要
注
意
的
是
,
在
这
个
阶
段
,
G
V
ASM_OPERATION.EST_MINUTES记录估算出的时间)。不过要注意的是,这只是估算的时间,真正的花费时间还取决于整体负载(特别是磁盘相关的负载)。如果重平衡是由于磁盘组中的一个或者多个磁盘损坏造成的,那么这个阶段还会对不满足冗余度要求的数据做镜像。 第三阶段是磁盘的compacting阶段(ASM 11.1.0.7版本及以上支持),这个过程是将磁盘上存的数据尽可能的移动到磁盘的外圈磁道上去(机械盘的外圈速度更快),以提供更高的性能。需要注意的是,在这个阶段,GV
ASMOPERATION.ESTMINUTES记录估算出的时间)。不过要注意的是,这只是估算的时间,真正的花费时间还取决于整体负载(特别是磁盘相关的负载)。如果重平衡是由于磁盘组中的一个或者多个磁盘损坏造成的,那么这个阶段还会对不满足冗余度要求的数据做镜像。第三阶段是磁盘的compacting阶段(ASM11.1.0.7版本及以上支持),这个过程是将磁盘上存的数据尽可能的移动到磁盘的外圈磁道上去(机械盘的外圈速度更快),以提供更高的性能。需要注意的是,在这个阶段,GVASM_OPERATION.EST_MINUTES会一直显示为0,这个显示为0的“特性”将来可能会被优化。这个阶段所花费的时间,取决于磁盘组中磁盘的数量、重平衡的原因等等,一般情况下,这个阶段花费的时间会比第二阶段短很多。
rbal的trc感觉就是一些调度信息:
[root@11gasm trace]# tail -200f +ASM_rbal_2373.trc
Stale disk (2.1) is being freed
Stale disk (2.2) is being freed
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm*
*** 2020-06-19 11:40:26.422
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm*
*** 2020-06-19 11:40:27.796
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm*
*** 2020-06-19 11:40:33.811
kfgbRegister: registering group 2/0xCD685734 (TESTDG)
kfgbBind: binding kfgpn for group 2/0xCD685734 (TESTDG)
kfdp_query(TESTDG): 41
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfiRefresh()+147<-kfi_set_cb()+2149<-kfeInvokeLockCallback()+2733<-kfgbAttrLckRegister()+309<-kfgbDriver()+2264<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----
- PST位置发生变化
SQL> alter diskgroup testdg drop disk TESTDG_0000 ;
Diskgroup altered.
asm alert日志:
Starting background process ARB0
Fri Jun 19 20:07:41 2020
ARB0 started with pid=22, OS id=11147
NOTE: assigning ARB0 to group 2/0x811213c7 (TESTDG) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 1 relocating from 0:2 to 1:9 for diskgroup 2 (TESTDG)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x811213c7 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
Fri Jun 19 20:07:44 2020
NOTE: requesting all-instance membership refresh for group=2
GMON updating for reconfiguration, group 2 at 37 for pid 21, osid 11086
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk TESTDG_0000 emptied
NOTE: erasing header on grp 2 disk TESTDG_0000
NOTE: process x000+asm (11086) initiating offline of disk 0.3914523490 (TESTDG_0000) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 0/0xe952e362, mask = 0x6a, op = clear
GMON updating disk modes for group 2 at 38 for pid 21, osid 11086
NOTE: group TESTDG: updated PST location: disk 0001 (PST copy 0) ----《《《PST位置有disk0变为了disk 1
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp = 2, dsk = 0/0xe952e362, mask = 0x7e, op = clear
GMON updating disk modes for group 2 at 39 for pid 21, osid 11086
NOTE: cache closing disk 0 of grp 2: TESTDG_0000
NOTE: PST update grp = 2 completed successfully
GMON updating for reconfiguration, group 2 at 40 for pid 21, osid 11086
NOTE: cache closing disk 0 of grp 2: (not open) TESTDG_0000
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x811213c7 (TESTDG)
GMON querying group 2 at 41 for pid 13, osid 7105
GMON querying group 2 at 42 for pid 13, osid 7105
Fri Jun 19 20:07:50 2020
NOTE: Disk TESTDG_0000 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x811213c7 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
gmon trace
*** 2020-06-19 20:07:47.762
GMON updating for reconfiguration, group 2 at 37 for pid 21, osid 11086
PRE
=============== PST ====================
grpNum: 2
state: 1
callCnt: 37
(lockvalue) valid=1 ver=0.0 ndisks=1 flags=0x3 from inst=0 (I am 1) last=0
。。。。。。。
NOTE: GMON selects PST disk TESTDG_0001 in failgroup TESTDG_0001 ----《《《gmon也显示PST表更新到了disk1上面。
。。。。。。。
strace跟踪gmon进程观察PST的变化
如前面所述,gmon进程和PST有密切的关系,可以通过gmon的trc观察PST的内容。下面我们可以使用操作系统命令strace,来具体观察gmon进程干了什么?
strace开启:
[root@11gasm trace]# ps -ef|grep gmon
grid 7107 1 0 11:46 ? 00:00:02 asm_gmon_+ASM
root 8319 5955 0 14:38 pts/3 00:00:00 grep gmon
[root@11gasm trace]# strace -fr -o /tmp/7107.log -p 7107
more /tmp/7107.log
。。。。。。。。
7107 0.000023 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000046 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200s\300.$\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000128 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200i=(W\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000096 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000027 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137345
7107 0.000033 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107 3.001138 getrusage(RUSAGE_SELF, {ru_utime={0, 22996}, ru_stime={0, 242963}, ...}) = 0
7107 0.000078 getrusage(RUSAGE_SELF, {ru_utime={0, 22996}, ru_stime={0, 242963}, ...}) = 0
7107 0.000083 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000077 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000179 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000062 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000060 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000097 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000068 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000046 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000050 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000090 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000149 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200..\220>\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000527 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000130 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200|\242W\275\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000299 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107 0.000183 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = 0
7107 2.921696 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107 0.000023 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107 0.000024 write(23, "\n*** 2020-06-19 12:01:10.804\n", 29) = 29
7107 0.000030 lseek(23, 0, SEEK_CUR) = 7149
7107 0.000013 write(23, "GMON updating for reconfiguratio"..., 70) = 70
7107 0.000016 write(24, "J?ikSN~U61\n", 11) = 11
7107 0.000017 write(23, "\n", 1) = 1
7107 0.000023 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107 0.000026 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107 0.000043 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107 0.000020 open("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107 0.000019 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107 0.000015 write(25, "<msg time='2020-06-19T12:01:10.8"..., 268) = 268
7107 0.000018 close(25) = 0
7107 0.000019 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342798, ...}) = 0
7107 0.000017 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107 0.000049 lstat("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", {st_mode=S_IFREG|0640, st_size=75068, ...}) = 0
7107 0.000023 open("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107 0.000017 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107 0.000013 write(25, "GMON updating for reconfiguratio"..., 70) = 70
7107 0.000043 write(25, "\n", 1) = 1
7107 0.000017 close(25) = 0
7107 0.000015 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107 0.000135 lseek(23, 0, SEEK_CUR) = 7220
7107 0.000017 write(23, "PRE", 3) = 3
7107 0.000015 write(24, "3?w9~13\n", 8) = 8
7107 0.000015 write(23, "\n", 1) = 1
7107 0.000014 write(23, "=============== PST ============"..., 41) = 41
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000014 write(23, "grpNum: 2 ", 13) = 13
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "state: 1 ", 13) = 13
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000014 write(23, "callCnt: 10 ", 14) = 14
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000016 write(23, "(lockvalue) valid=1 ver=0.0 ndis"..., 74) = 74
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "--------------- HDR ------------"..., 41) = 41
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "next: \t 2 ", 10) = 10
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "last: \t 2 ", 10) = 10
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000014 write(23, "pst count: \t 1 ", 15) = 15
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "pst locations: \t 0 ", 19) = 19
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000013 write(23, "incarn: \t 1 ", 14) = 14
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, "dta size: \t 3 ", 14) = 14
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "version: \t 1 ", 13) = 13
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, "ASM version: \t 186646528 = 11.2."..., 37) = 37
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "contenttype: \t 1", 16) = 16
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "partnering pattern: \t [ ]", 25) = 25
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, "--------------- LOC MAP --------"..., 41) = 41
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000014 write(23, "0: dirty 0 \t cur_loc: 0 \t stable"..., 39) = 39
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "1: dirty 0 \t cur_loc: 1 \t stable"..., 39) = 39
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "--------------- DTA ------------"..., 41) = 41
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000017 write(23, "0: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107 0.000017 write(23, "\n", 1) = 1
7107 0.000016 write(23, "1: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000036 write(23, "2: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107 0.000017 write(23, "\n", 1) = 1
7107 0.000014 write(23, "--------------- HBEAT ----------"..., 41) = 41
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 lseek(23, 0, SEEK_CUR) = 8013
7107 0.000011 write(23, "kfdpHbeat_dump: state=1, inst=1,"..., 55) = 55
7107 0.000013 write(24, "7?jB~MCt\n", 9) = 9
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000014 write(23, "\trnd=141522539.1586440573.197865"..., 48) = 48
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000014 write(23, "kfk io-queue: (nil)", 22) = 22
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000045 write(23, "kfdpHbeatCB_dump: at 0x7ffff628e"..., 147) = 147
7107 0.000024 write(23, "\n", 1) = 1
7107 0.000015 write(23, "--------------- KFGRP ----------"..., 41) = 41
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, "kfgrp: TESTDG number: 2/11346953"..., 81) = 81
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000014 write(23, "timestamp: 430052410 state: 5 fl"..., 68) = 68
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, "KFGPN at a0bb8410 in dependent c"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, "KFGPN at a0bb7e28 in dependent c"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, "KFGPN at a0bb8020 in dependent c"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000016 write(23, " kfdsk:0xa0ba7e60", 18) = 18
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, " disk: TESTDG_0000 num: 0/70845"..., 112) = 112
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " fg: TESTDG_0000 path: /dev/asm"..., 39) = 39
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000041 write(23, " slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107 0.000017 write(23, "\n", 1) = 1
7107 0.000015 write(23, " kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107 0.000014 write(23, "\n", 1) = 1
7107 0.000013 write(23, " kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, " pcnt: 0 ()", 12) = 12
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000015 write(23, " kfkid: 0xa0bd5a98, kfknm: , "..., 50) = 50
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " fob: (KSFD)a2002c18, magic"..., 54) = 54
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000015 write(23, " kfdds: dn=0 inc=3914523474 d"..., 56) = 56
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfkds 0x7ffff60395d8, kfki"..., 93) = 93
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfdsk:0xa0ba7048", 18) = 18
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, " disk: TESTDG_0001 num: 1/70845"..., 112) = 112
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " fg: TESTDG_0001 path: /dev/asm"..., 39) = 39
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, " pcnt: 0 ()", 12) = 12
7107 0.000017 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfkid: 0xa0bd4ae0, kfknm: , "..., 50) = 50
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000035 write(23, " fob: (KSFD)a2002758, magic"..., 54) = 54
7107 0.000018 write(23, "\n", 1) = 1
7107 0.000015 write(23, " kfdds: dn=1 inc=3914523472 d"..., 56) = 56
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfkds 0x7ffff6039530, kfki"..., 93) = 93
7107 0.000014 lseek(23, 0, SEEK_CUR) = 9892
7107 0.000011 write(23, "\n", 1) = 1
7107 0.000013 write(24, "7?IG~WS1\n", 9) = 9
7107 0.000015 write(23, " kfdsk:0xa0ba7ac8", 18) = 18
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, " disk: TESTDG_0002 num: 2/70845"..., 112) = 112
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " fg: TESTDG_0002 path: /dev/asm"..., 39) = 39
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000014 write(23, " mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000015 write(23, " slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000014 write(23, " pcnt: 0 ()", 12) = 12
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfkid: 0xa0bd5698, kfknm: , "..., 50) = 50
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000014 write(23, " fob: (KSFD)a2002ae8, magic"..., 54) = 54
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000014 write(23, " kfdds: dn=2 inc=3914523473 d"..., 56) = 56
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000013 write(23, " kfkds 0x7ffff6039488, kfki"..., 93) = 93
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000019 lseek(23, 0, SEEK_CUR) = 10543
7107 0.000012 write(23, "POST res = 1 ", 13) = 13
7107 0.000012 write(24, "7?n5~AAD\n", 9) = 9
7107 0.000014 write(23, "\n", 1) = 1
。。。。。
7107 0.000075 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200i\201l(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107 0.000273 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\34\33\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
7107 0.000245 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200m\231f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107 0.000219 lseek(23, 0, SEEK_CUR) = 15222
7107 0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107 0.000015 write(24, "3?ID~1Y\n", 8) = 8
7107 0.000012 write(23, "\n", 1) = 1
7107 0.000047 lseek(23, 0, SEEK_CUR) = 15257
7107 0.000011 write(23, "POST res = 1 ", 13) = 13
7107 0.000011 write(24, "3?R1~1D\n", 8) = 8
7107 0.000011 write(23, "\n", 1) = 1
。。。。。。。
7107 0.000029 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200j\231f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107 0.000244 pwrite(258, "\1\202\22\2\3\4\0\0\0\0\0\200\t\33\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4206592) = 4096
7107 0.000260 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200o\211f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107 0.000223 lseek(23, 0, SEEK_CUR) = 17680
7107 0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107 0.000015 write(24, "3?JC~1Y\n", 8) = 8
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000062 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107 0.000021 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107 0.000031 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107 0.000014 open("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107 0.000020 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107 0.000010 write(25, "<msg time='2020-06-19T12:01:10.8"..., 246) = 246
7107 0.000013 close(25) = 0
7107 0.000013 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345386, ...}) = 0
7107 0.000012 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137941
7107 0.000013 lstat("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", {st_mode=S_IFREG|0640, st_size=75708, ...}) = 0
7107 0.000014 open("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107 0.000012 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107 0.000010 write(25, "NOTE: cache closing disk 2 of gr"..., 48) = 48
7107 0.000029 write(25, "\n", 1) = 1
7107 0.000013 close(25) = 0
7107 0.001428 lseek(23, 0, SEEK_CUR) = 17715
7107 0.000017 write(23, "POST res = 1 ", 13) = 13
7107 0.000014 write(24, "3?RR~1D\n", 8) = 8
7107 0.000013 write(23, "\n", 1) = 1
。。。。。。。
7107 0.000037 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200n\211f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107 0.000288 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\23\241\21\202\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
7107 0.000227 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200m\255f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107 0.000228 lseek(23, 0, SEEK_CUR) = 22376
7107 0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107 0.000016 write(24, "3?bC~1Y\n", 8) = 8
7107 0.000013 write(23, "\n", 1) = 1
7107 0.000047 lseek(23, 0, SEEK_CUR) = 22411
7107 0.000011 write(23, "POST res = 1 ", 13) = 13
7107 0.000011 write(24, "3?W1~1D\n", 8) = 8
7107 0.000011 write(23, "\n", 1) = 1
。。。。。。
7107 0.000010 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000012 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138245
7107 0.000024 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}, {0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 2
7107 0.000052 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\325cn]\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000152 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\327\333\214?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000109 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000031 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138246
7107 0.000050 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107 3.001232 getrusage(RUSAGE_SELF, {ru_utime={0, 31995}, ru_stime={0, 246962}, ...}) = 0
7107 0.000053 getrusage(RUSAGE_SELF, {ru_utime={0, 31995}, ru_stime={0, 246962}, ...}) = 0
7107 0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000078 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000038 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000047 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000075 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000052 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000035 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000038 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000072 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000186 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\205a,F\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000454 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\5H\24a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000220 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000153 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107 0.000052 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107 3.001088 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107 0.000019 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107 0.000022 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {600, 0}) = 1
7107 0.000037 close(260) = 0
7107 0.000227 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000023 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000030 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000012 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000015 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000026 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000015 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000011 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000013 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000049 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\235\2017\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000150 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\261\357jv\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000123 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000057 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107 0.000018 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107 3.000584 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107 0.000096 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107 0.000297 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000084 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000090 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000045 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000055 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000088 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000060 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000041 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000045 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000082 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000139 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\226\344\22(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000648 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200c\276\205\367\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000261 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107 0.000183 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107 0.000063 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107 3.000924 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 248962}, ...}) = 0
7107 0.000041 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 248962}, ...}) = 0
7107 0.000051 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000049 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000045 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000038 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000060 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000040 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000026 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107 0.000029 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000054 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000273 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\246j\371\333\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000257 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\n\345\2120\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 0.000228 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107 0.000123 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107 0.000039 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0} <detached ...>
trace信息很多,但是我们只关注更新磁盘的,即就是与write有关的,我们可以看到,其实操作的就是下面几步:
0.000016 write(23, "\n*** 2020-06-19 12:01:04.657\n", 29) = 29
0.000053 write(23, "GMON querying group 2 at 9 for p"..., 48) = 48
0.000013 write(24, "3?8qUGo~Um\n", 11) = 11
0.000013 write(23, "\n", 1) = 1
0.000032 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200i\315\2\260\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
0.000311 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\232\22\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
0.000247 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200j\r\3\325\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
0.000264 pwrite(258, "\1\202\22\2\3\4\0\0\0\0\0\200\217\22\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4206592) = 4096
0.000287 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\262\1\354&\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
0.000132 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\252\0173\237\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
查看进程持有的文件句柄:
[root@11gasm ~]# cd /proc/7107/fd
[root@11gasm fd]# ll
total 0
lr-x------ 1 grid oinstall 64 Jun 19 11:55 0 -> /dev/null
l-wx------ 1 grid oinstall 64 Jun 19 11:55 1 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 10 -> /dev/zero
lrwx------ 1 grid oinstall 64 Jun 19 11:55 11 -> /u01/app/grid/11.2.0/grid/dbs/hc_+ASM.dat
lrwx------ 1 grid oinstall 64 Jun 19 11:55 12 -> socket:[103614]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 13 -> socket:[103615]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 14 -> socket:[103616]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 15 -> socket:[103617]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 16 -> socket:[103618]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 17 -> socket:[103621]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 18 -> socket:[103622]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 19 -> socket:[103669]
l-wx------ 1 grid oinstall 64 Jun 19 11:55 2 -> /dev/null
lrwx------ 1 grid oinstall 64 Jun 19 11:55 20 -> socket:[103670]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 21 -> socket:[103671]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 22 -> socket:[103672]
l-wx------ 1 grid oinstall 64 Jun 19 11:55 23 -> /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_7107.trc
l-wx------ 1 grid oinstall 64 Jun 19 11:55 24 -> /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_7107.trm
lrwx------ 1 grid oinstall 64 Jun 19 11:55 256 -> /dev/sdb
lrwx------ 1 grid oinstall 64 Jun 19 11:55 257 -> /dev/sdc
lrwx------ 1 grid oinstall 64 Jun 19 11:55 258 -> /dev/sdd
lrwx------ 1 grid oinstall 64 Jun 19 11:55 259 -> /dev/sde
lrwx------ 1 grid oinstall 64 Jun 19 11:55 3 -> /u01/app/grid/11.2.0/grid/log/11gasm/agent/ohasd/oraagent_grid/oraagent_gridOUT.log
l-wx------ 1 grid oinstall 64 Jun 19 11:55 4 -> /u01/app/grid/11.2.0/grid/log/11gasm/agent/ohasd/oraagent_grid/oraagent_grid.log
lr-x------ 1 grid oinstall 64 Jun 19 11:55 5 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 6 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 7 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 8 -> /u01/app/grid/11.2.0/grid/rdbms/mesg/oraus.msb
lr-x------ 1 grid oinstall 64 Jun 19 11:55 9 -> /proc/7107/fd
[root@11gasm fd]# ls -ltr /dev/asm_*
lrwxrwxrwx 1 root root 3 Jun 19 10:00 /dev/asm_data01 -> sdb
lrwxrwxrwx 1 root root 3 Jun 19 10:00 /dev/asm_data02 -> sdc
lrwxrwxrwx 1 root root 3 Jun 19 11:23 /dev/asm_arch03 -> sdf
lrwxrwxrwx 1 root root 3 Jun 19 11:33 /dev/asm_arch01 -> sdd
lrwxrwxrwx 1 root root 3 Jun 19 11:34 /dev/asm_arch02 -> sde
可以看出,write函数操作的为file 23,24为asm alert日志,也就是asm写日志。
pwrite函数操作的偏移量分别为4198400,4202496,4194304,4206592,对应的为file 258,也就是/dev/asm_arch01,PST所在的磁盘。元数据块大小为4K,上面的偏移量对应的具体块号为:
SQL> select 4198400/4096 from dual;
4198400/4096
------------
1025
SQL> select 4202496/4096 from dual;
4202496/4096
------------
1026
SQL> select 4194304/4096 from dual;
4194304/4096
------------
1024
SQL> select 4206592/4096 from dual;
4206592/4096
------------
1027
查看对应的块信息:
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=0 blkn=1024|grep type
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=0 blkn=1025|grep type
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=0 blkn=1026|grep type
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=0 blkn=1027|grep type
kfbh.type: 18 ; 0x002: KFBTYP_PST_DTA
由于我们的AU带下为4M,所以一个AU里面存储1024个元数据块,所以从1024开始,已经到第二个AU了,即AU1,所以也等价于下面:
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=0 aus=4194304|grep type
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1 aus=4194304|grep type
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=2 aus=4194304|grep type
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=3 aus=4194304|grep type
kfbh.type: 18 ; 0x002: KFBTYP_PST_DTA
所以磁盘组中有磁盘的变化时,gmon进程会更新PST的数据,2个meta,1个data,1个none。
上面还有一个函数:
io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\252\0173\237\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, nbytes:4096, offset:8384512}}) = 1
调用的pwrite函数,偏移量为8384512。
SQL> select 8384512/4096 from dual;
8384512/4096
------------
2047
对于4M的AU,第二个AU的块号为1024-2047,即2047为第二个AU的最后一个块。磁盘组当有磁盘变化时,gmon也会去更新。如上,AU2的最后一个块为心跳信息。
具体操作系统函数io_submit还有待研究,
下面看下正常磁盘组多久更新次心跳信息,继续strace跟踪,只是参数不同了:
[grid@11gasm ~]$ strace -f -tt -o /tmp/gmon.log -p 7107
tail -200f /tmp/gmon.log
7107 15:15:57.295873 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\321\4\375\322\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 15:15:57.296020 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\27?\323W\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
。。。。。。。
7107 15:16:00.297507 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}, {0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 2
7107 15:16:00.297555 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\3\314\263*\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 15:16:00.297678 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200_+\377\205\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
。。。。。
7107 15:16:03.298822 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\2004\35@'\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107 15:16:03.298951 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\357\273i\350\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
可以看出,每三秒就会更新一次,所以在正常情况,gmon每3秒会更新一次心跳信息。具体的标志位为:kfbh.check。每3秒会改变一次。以此可以判断,该磁盘组是否被挂载。
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 1856946567 ; 0x00c: 0x6eaec187
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 1580953418 ; 0x00c: 0x5e3b6f4a
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check: 4149201827 ; 0x00c: 0xf74fcba3
总结
- 每个磁盘的AU 1是为PST表预留的。在external冗余的磁盘组中只有一份PST表。
在normal冗余的磁盘组中,至少有两份PST表。如果磁盘组中有三个或更多的failgroups,PST表将会有3份。
在high冗余的磁盘组中,至少有三份PST表的信息。如果有四个failgroups,将会有4份PST表,如果有五个或者更多failgroups,将会有五份PST表。 - 不是每一个磁盘都有PST表的信息。具体在哪个磁盘上面可以通过kfed确定,或者asm alert日志在创建磁盘组的时候会有明确说明。
- 磁盘心跳信息为PST的一部分,跟PST在同一块磁盘上面,具体位置在AU1的最后一个块上面。
- PST(Partnership and Status Table)包含一个磁盘组里所有磁盘的相关信息 – 磁盘号,磁盘状态,partner磁盘号,心跳信息和failgroup信息(11g之后)。
- PST在一些情况下会发生变化,这种变化是可见的,有时磁盘不会发生变化,只是gmon更新了元数据信息;有时磁盘位置会发生变化
- 磁盘组的rebalance会有三个阶段,每个阶段的耗时也不一样。主要的进程就是RBAL和ARBn,ARBn是实际的relocate进程,而RBAL只是调度进程。从trc可以看出
- PST具体元数据可以通过kfed观察,每个结构体的含义可以参考上面。
- 如果PST损坏,是否可以修复,还待研究。
- 正常情况下,gmon每3秒会更新一次磁盘组的心跳信息。