物理元数据PST

PST的位置

每个磁盘的AU 1是为PST表预留的,但是并不是每一个磁盘都有PST表的信息。

PST副本数量

在external冗余的磁盘组中只有一份PST表。
在normal冗余的磁盘组中,至少有两份PST表。如果磁盘组中有三个或更多的failgroups,PST表将会有3份。
在high冗余的磁盘组中,至少有三份PST表的信息。如果有四个failgroups,将会有4份PST表,如果有五个或者更多failgroups,将会有五份PST表。

以下我们可以创建不同冗余度的磁盘组,观察asm alert日志,来看创建磁盘组后的事情:

external

SQL> create diskgroup archdg external redundancy disk ‘/dev/asm_arch01’
NOTE: Assigning number (2,0) to disk (/dev/asm_arch01)
NOTE: initializing header on grp 2 disk ARCHDG_0000
NOTE: initiating PST update: grp = 2
Thu Mar 19 14:28:24 2020
GMON updating group 2 at 5 for pid 21, osid 3585
NOTE: group ARCHDG: initial PST location: disk 0000 (PST copy 0) -----<<<<copy 0确定只有一份,而且再disk0 上面,有前面日志看出,0号盘就是(/dev/asm_arch01),所以PST在/dev/asm_arch01上
NOTE: PST update grp = 2 completed successfully
。。。。。。。。

normal

SQL> create diskgroup archdg normal redundancy disk ‘/dev/asm_arch01’,’/dev/asm_arch02’,’/dev/asm_arch03’
NOTE: Assigning number (2,0) to disk (/dev/asm_arch01) ------<<<<<(2,0)=v$asm_disk.GROUP_NUMBER为2,v$asm_disk.DISK_NUMBER为0
NOTE: Assigning number (2,1) to disk (/dev/asm_arch02)
NOTE: Assigning number (2,2) to disk (/dev/asm_arch03)
NOTE: initializing header on grp 2 disk ARCHDG_0000
NOTE: initializing header on grp 2 disk ARCHDG_0001
NOTE: initializing header on grp 2 disk ARCHDG_0002
Mon Mar 23 14:39:28 2020
GMON updating for reconfiguration, group 2 at 5 for pid 22, osid 4059
NOTE: group 2 PST updated.
NOTE: initiating PST update: grp = 2
GMON updating group 2 at 6 for pid 22, osid 4059
NOTE: group ARCHDG: initial PST location: disk 0000 (PST copy 0)***—<<<copy 0副本1***
NOTE: group ARCHDG: initial PST location: disk 0001 (PST copy 1)***—<<<copy 1副本2***
NOTE: group ARCHDG: initial PST location: disk 0002 (PST copy 2)***—<<<copy 2副本3***
NOTE: PST update grp = 2 completed successfully
。。。。。
这次我们看到创建了3份PST表,分别位于disk 0000,disk 0001,disk 0002,即/dev/asm_arch01,/dev/asm_arch02,/dev/asm_arch03。

high

待验证。

PST包含的内容

可以通过以下两种方式查看pst的具体内容:
1)gmon trace 每次尝试mount磁盘组时,GMON trace文件就会记录PST信息。如果已经mount上,不会再刷新pst信息
2)kfed read读取AU1固定的位置。

  1. gmon trace

先通过第一种方法查看,先dismount testdg。然后再mount的时候,输出gmon日志,观察内容
这里说下进程gmon的作用:
GMON(asm disk group monitor):磁盘组监控进程。这个进程负责维护磁盘组中各个磁盘状态的一致性。当磁盘组中的磁盘成员发生该笔那时(例如:添加、删除磁盘,或者磁盘出现故障),该进程负责离线(offline)或者上线(online)磁盘。可以说。这是ASM实例中最重要的后台进程之一。相关元数据的操作都是该进程来完成的

gmon监控PST的信息只有在磁盘mount的时候,会记录。当其他元数据发生变化的时候,gmon也会输出日志)。
tail -200f +ASM_gmon_7107.trc

*** 2020-06-19 14:53:39.484
Mode of disk 0 (TESTDG_0000) in grp 2 changing from 0x9 to 0x7f
Mode of disk 1 (TESTDG_0001) in grp 2 changing from 0x9 to 0x7f
POST res = 12 
=============== PST ==================== 
grpNum:    2 
state:     1 
callCnt:   17 
(lockvalue) valid=1 ver=0.0 ndisks=1 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR -------------------- 
next:    5 
last:    5 
pst count:       1 
pst locations:   0 
incarn:          1 
dta size:        3 
version:         1 
ASM version:     186646528 = 11.2.0.0.0
contenttype:     1
partnering pattern:      [ ]
--------------- LOC MAP ---------------- 
0: dirty 0       cur_loc: 0      stable_loc: 0
1: dirty 0       cur_loc: 0      stable_loc: 0
--------------- DTA -------------------- 
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2443622938 parts: 
1: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2443622938 parts: 
--------------- HBEAT ------------------                   《《《《磁盘心跳
kfdpHbeat_dump: state=3, inst=1, ts=33102446.3598159872, 
        rnd=3909470396.2766819163.4244887513.759222249.
kfk io-queue:    0x7ffff628eb98
kfdpHbeatCB_dump: at 0x7ffff628eb88 with ts=06/19/2020 14:53:39 iop=0x7ffff628eb98, grp=2, disk=0/3914523479, isWrite=1 Hbeat #46 state=2 iostate=4
--------------- KFGRP ------------------                 《《《fail group信息
kfgrp: TESTDG number: 2/3261207486 type: 1 compat: 11.2.0.0.0 dbcompat:1.0.0.0.1
timestamp: 431172646 state: 4 flags: 10 gpnlist: a0bb7ee8 a0bb7ee8
KFGPN at a0bb7e28 in dependent chain
  kfdsk:0xa0ba7e60
  disk: TESTDG_0000 num: 0/7084547265881367383 grp: 2/7084547265228051390 compat: 11.2.0.0.0 dbcompat:10.1.0.0.0
  fg: TESTDG_0000 path: /dev/asm_arch01
  mnt: O hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1001
  slot 65535 ddeslot 65535 numslots 65535 dtype 0 enc 0 part 0 flags 0
    kfts: 2020/06/19 11:40:26.454000
    kfts: 2020/06/19 11:47:01.479000
  pcnt: 0 ()
    kfkid: 0xa0bd5a98, kfknm: , status: IDENTIFIED
      fob: (KSFD)a2002c18, magic: bebe ausize: 4194304
    kfdds: dn=0 inc=3914523479 dsk=0xa0ba7e60 usrp=(nil)
      kfkds 0x7ffff60395d8, kfkid 0xa0bd5a98, magic abbe, libnum 0, bpau 8192, fob 0xa20035b0
  kfdsk:0xa0ba7048
  disk: TESTDG_0001 num: 1/7084547265881367382 grp: 2/7084547265228051390 compat: 11.2.0.0.0 dbcompat:10.1.0.0.0
  fg: TESTDG_0001 path: /dev/asm_arch02
  mnt: O hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1001
  slot 65535 ddeslot 65535 numslots 65535 dtype 0 enc 0 part 0 flags 0
    kfts: 2020/06/19 11:40:26.454000
    kfts: 2020/06/19 11:47:01.479000
  pcnt: 0 ()
    kfkid: 0xa0bd4ae0, kfknm: , status: IDENTIFIED
      fob: (KSFD)a2002758, magic: bebe ausize: 4194304
    kfdds: dn=1 inc=3914523478 dsk=0xa0ba7048 usrp=(nil)
      kfkds 0x7ffff6039530, kfkid 0xa0bd4ae0, magic abbe, libnum 0, bpau 8192, fob 0xa20036e0

gmon的trace信息以====或者-------分割,= PST ===标记的部分显示了磁盘组号(grpNum),类型(grpTyp),和状态。— HDR —标记的部分显示了PST的副本数(pst count)和所在的磁盘号。— DTA —显示了PST所在磁盘的状态。PST(Partnership and Status Table)包含一个磁盘组里所有磁盘的相关信息 – 磁盘号,磁盘状态,partner磁盘号,心跳信息和failgroup信息(11g之后)。

  1. kfed read
    每个ASM磁盘的1号AU为PST表预留,但是只有部分磁盘有PST数据。由于PST是重要的ASM元数据,只要失败组个数足够,在normal冗余中将会有三副本,在high冗余中将会有五副本。
    直接读取AU1全部block即可。

磁盘心跳位置

根据前面创建的external磁盘组,磁盘组状态如下:

Disk Group Name  Fail Group      Path                              File Name                 Status       Status  Status  TYPE      File Size (MB) Used Size (MB) Pct. Used
---------------- --------------- --------------------------------- ------------------------- ------------ ------- ------- --------- -------------- -------------- ---------
TESTDG           TESTDG_0000     /dev/asm_arch01                   TESTDG_0000               MEMBER       CACHED  ONLINE  REGULAR            3,072             40      1.30
                 ***************                                                                                                    -------------- --------------
                 TOTAL                                                                                                                       3,072             40

                 TESTDG_0001     /dev/asm_arch02                   TESTDG_0001               MEMBER       CACHED  ONLINE  REGULAR            2,048             24      1.17
                 ***************                                                                                                    -------------- --------------
                 TOTAL                                                                                                                       2,048             24

                 TESTDG_0002     /dev/asm_arch03                   TESTDG_0002               MEMBER       CACHED  ONLINE  REGULAR            2,048             32      1.56
                 ***************                                                                                                    -------------- --------------
                 TOTAL                                                                                                                       2,048             32

****************                                                                                                                    -------------- --------------
TOTAL                                                                                                                                        7,168             96

asm的disk心跳验证,oracle正是通过这个来判断某个disk是否属于某个磁盘组,以防止其他磁盘组将该disk挂载,尤其实在HA环境中,在主机一边挂载磁盘组的时候,一定要冻结另一个备机,就可以用这个字段来验证,这个磁盘组是否有主机在挂载。
查看每个磁盘组里面的磁盘心跳状态:
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 19 ; 0x002: KFBTYP_HBEAT -----<<<磁盘心跳标志位
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 2047 ; 0x004: blk=2047
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 4118038751 ; 0x00c: 0xf57448df
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdpHbeatB.instance: 1 ; 0x000: 0x00000001
kfdpHbeatB.ts.hi: 33102442 ; 0x004: HOUR=0xa DAYS=0x13 MNTH=0x6 YEAR=0x7e4
kfdpHbeatB.ts.lo: 3597499392 ; 0x008: USEC=0x0 MSEC=0x35f SECS=0x26 MINS=0x35
kfdpHbeatB.rnd[0]: 3372924585 ; 0x00c: 0xc90abea9
kfdpHbeatB.rnd[1]: 704815548 ; 0x010: 0x2a02a1bc
kfdpHbeatB.rnd[2]: 3651789415 ; 0x014: 0xd9a9e267
kfdpHbeatB.rnd[3]: 2589087288 ; 0x018: 0x9a525638
[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=1023 aus=4194304|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147484671 ; 0x004: blk=1023 (indirect)
kfbh.block.obj: 2147483649 ; 0x008: disk=1
kfbh.check: 17662463 ; 0x00c: 0x010d81ff
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=1023 aus=4194304|more
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
可以看的出来,并不是每个asm磁盘上面都有心跳盘,只有每个PST的最后一个block是磁盘心跳
磁盘心跳也属于PST元数据的一部分,位置和PST的位置保持一致。

[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=0 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META    ---这表示元数据block类型
kfbh.datfmt:                          2 ; 0x003: 0x02               ---<<<<表示元数据block格式
kfbh.block.blk:                    1024 ; 0x004: blk=1024           
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2952973672 ; 0x00c: 0xb002cd68
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:33102443 ; 0x000: HOUR=0xb DAYS=0x13 MNTH=0x6 YEAR=0x7e4      ---<<<<最后一次pst元数据的更新时间,这是高位
kfdpHdrPairBv1.first.super.time.lo:969494528 ; 0x004: USEC=0x0 MSEC=0x254 SECS=0x1c MINS=0xe     ---<<<<最后一次pst元数据的更新时间,这是地位
kfdpHdrPairBv1.first.super.last:      2 ; 0x008: 0x00000002          ---<<<<最新的disk版本号
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002          ---<<<<下一个disk版本号
kfdpHdrPairBv1.first.super.copyCnt:   1 ; 0x010: 0x01                ---<<<<表示PST元数据的备份信息,如果是1表示磁盘组是external冗余。如果是3那么则是normal。
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000
kfdpHdrPairBv1.first.super.copy[3]:   0 ; 0x01e: 0x0000
kfdpHdrPairBv1.first.super.copy[4]:   0 ; 0x020: 0x0000
kfdpHdrPairBv1.first.super.dtaSz:     3 ; 0x022: 0x0003              ---<<<<表示pst元数据中,diskgroup所包含的disk个数
kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000
kfdpHdrPairBv1.first.newCopy[0]:      0 ; 0x028: 0x0000
kfdpHdrPairBv1.first.newCopy[1]:      0 ; 0x02a: 0x0000
kfdpHdrPairBv1.first.newCopy[2]:      0 ; 0x02c: 0x0000
kfdpHdrPairBv1.first.newCopy[3]:      0 ; 0x02e: 0x0000
kfdpHdrPairBv1.first.newCopy[4]:      0 ; 0x030: 0x0000
kfdpHdrPairBv1.first.newCopyCnt:      0 ; 0x032: 0x00
kfdpHdrPairBv1.first.contType:        1 ; 0x033: 0x01
kfdpHdrPairBv1.first.spare0:          0 ; 0x034: 0x00000000
kfdpHdrPairBv1.first.ppat[0]:         0 ; 0x038: 0x0000
kfdpHdrPairBv1.first.ppat[1]:         0 ; 0x03a: 0x0000
kfdpHdrPairBv1.first.ppat[2]:         0 ; 0x03c: 0x0000
kfdpHdrPairBv1.first.ppat[3]:         0 ; 0x03e: 0x0000
kfdpHdrPairBv1.first.ppatsz:          0 ; 0x040: 0x00
kfdpHdrPairBv1.first.spare1:          0 ; 0x041: 0x00
kfdpHdrPairBv1.first.spare2:          0 ; 0x042: 0x0000
kfdpHdrPairBv1.first.spares[0]:       0 ; 0x044: 0x00000000

[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                    1025 ; 0x004: blk=1025
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2952974697 ; 0x00c: 0xb002d169
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:33102443 ; 0x000: HOUR=0xb DAYS=0x13 MNTH=0x6 YEAR=0x7e4
kfdpHdrPairBv1.first.super.time.lo:969493504 ; 0x004: USEC=0x0 MSEC=0x253 SECS=0x1c MINS=0xe
kfdpHdrPairBv1.first.super.last:      1 ; 0x008: 0x00000001
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt:   1 ; 0x010: 0x01
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000
kfdpHdrPairBv1.first.super.copy[3]:   0 ; 0x01e: 0x0000
kfdpHdrPairBv1.first.super.copy[4]:   0 ; 0x020: 0x0000
kfdpHdrPairBv1.first.super.dtaSz:     3 ; 0x022: 0x0003
kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000
kfdpHdrPairBv1.first.newCopy[0]:      0 ; 0x028: 0x0000
kfdpHdrPairBv1.first.newCopy[1]:      0 ; 0x02a: 0x0000
kfdpHdrPairBv1.first.newCopy[2]:      0 ; 0x02c: 0x0000
kfdpHdrPairBv1.first.newCopy[3]:      0 ; 0x02e: 0x0000
kfdpHdrPairBv1.first.newCopy[4]:      0 ; 0x030: 0x0000
kfdpHdrPairBv1.first.newCopyCnt:      0 ; 0x032: 0x00
kfdpHdrPairBv1.first.contType:        1 ; 0x033: 0x01
kfdpHdrPairBv1.first.spare0:          0 ; 0x034: 0x00000000
kfdpHdrPairBv1.first.ppat[0]:         0 ; 0x038: 0x0000
kfdpHdrPairBv1.first.ppat[1]:         0 ; 0x03a: 0x0000
kfdpHdrPairBv1.first.ppat[2]:         0 ; 0x03c: 0x0000
kfdpHdrPairBv1.first.ppat[3]:         0 ; 0x03e: 0x0000
kfdpHdrPairBv1.first.ppatsz:          0 ; 0x040: 0x00
kfdpHdrPairBv1.first.spare1:          0 ; 0x041: 0x00
kfdpHdrPairBv1.first.spare2:          0 ; 0x042: 0x0000
kfdpHdrPairBv1.first.spares[0]:       0 ; 0x044: 0x00000000

[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=0 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:              2147483648 ; 0x004: blk=0 (indirect)
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                    17662464 ; 0x00c: 0x010d8200
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch02 aun=1 blkn=1 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:              2147483649 ; 0x004: blk=1 (indirect)
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                    17662465 ; 0x00c: 0x010d8201
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=0 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:              2147483648 ; 0x004: blk=0 (indirect)
kfbh.block.obj:              2147483650 ; 0x008: disk=2
kfbh.check:                    17662467 ; 0x00c: 0x010d8203
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message
[grid@11gasm ~]$ kfed read /dev/asm_arch03 aun=1 blkn=1 aus=4194304|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:              2147483649 ; 0x004: blk=1 (indirect)
kfbh.block.obj:              2147483650 ; 0x008: disk=2
kfbh.check:                    17662466 ; 0x00c: 0x010d8202
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error messa    《《《《无效信息

进一步验证,PST只在asm磁盘0上面,1和2上面都没有

PST位置发生变化

PST在下列情况下会再分配:
• PST所在磁盘不可用(ASM实例启动时)
• 磁盘offline
• PST的读写发生IO错误
• 磁盘被正常的删除掉(手工或后台)
这些情况下,如果相同失败组中有其他可用的磁盘,PST将会被重分配到其他磁盘;否则会重分配到其他没有PST副本的失败组。

基于以上testdg磁盘组,模拟删除磁盘,验证PST是否发生变化:

  1. PST位置不发生变化
SQL> alter diskgroup testdg drop disk TESTDG_0002;

Diskgroup altered.

asm alert日志:

Fri Jun 19 12:00:58 2020
SQL> alter diskgroup testdg drop disk TESTDG_0002 
NOTE: requesting all-instance membership refresh for group=2
Fri Jun 19 12:01:01 2020
NOTE: membership refresh pending for group 2/0x43a213ba (TESTDG)
Fri Jun 19 12:01:04 2020
GMON querying group 2 at 9 for pid 13, osid 7105
SUCCESS: refreshed membership for 2/0x43a213ba (TESTDG)
SUCCESS: alter diskgroup testdg drop disk TESTDG_0002                    ------<<<<<drop已经成功,前台会话返回删除成功信息。
NOTE: starting rebalance of group 2/0x43a213ba (TESTDG) at power 1
Starting background process ARB0               													 ------<<<<<启动arbn进程,该进程主要负责在磁盘组中进行数据extent的重平衡,可以有arb0-arb9和arba。实际真正干活的
Fri Jun 19 12:01:04 2020
ARB0 started with pid=20, OS id=7348 
NOTE: assigning ARB0 to group 2/0x43a213ba (TESTDG) with 1 parallel I/O
cellip.ora not found.
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x43a213ba (TESTDG)              ------<<<<<rebalance完成      
NOTE: Attempting voting file refresh on diskgroup TESTDG
Fri Jun 19 12:01:07 2020
NOTE: requesting all-instance membership refresh for group=2
GMON updating for reconfiguration, group 2 at 10 for pid 20, osid 7384
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk TESTDG_0002 emptied
NOTE: erasing header on grp 2 disk TESTDG_0002
NOTE: process _x000_+asm (7384) initiating offline of disk 2.3914523473 (TESTDG_0002) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 2/0xe952e351, mask = 0x6a, op = clear   ------<<<<<开始更新PST,group2disk2进行clear
GMON updating disk modes for group 2 at 11 for pid 20, osid 7384
NOTE: PST update grp = 2 completed successfully 
NOTE: initiating PST update: grp = 2, dsk = 2/0xe952e351, mask = 0x7e, op = clear
GMON updating disk modes for group 2 at 12 for pid 20, osid 7384
NOTE: cache closing disk 2 of grp 2: TESTDG_0002
NOTE: PST update grp = 2 completed successfully 												------<<<<<PST更新完成
GMON updating for reconfiguration, group 2 at 13 for pid 20, osid 7384
NOTE: cache closing disk 2 of grp 2: (not open) TESTDG_0002
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x43a213ba (TESTDG)
GMON querying group 2 at 14 for pid 13, osid 7105
GMON querying group 2 at 15 for pid 13, osid 7105
Fri Jun 19 12:01:13 2020
NOTE: Disk TESTDG_0002 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x43a213ba (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG

这里补充下RBAL和ARBn进程的作用:
RBAL(ASM rebalance master):rebalance主进程。这个进程负责协调磁盘组的rebalance操作。而具体的平衡磁盘组extent的操作是由ARBn进程完成的。在数据库实例中也存在RBAL进程,但是它的功能和ASM实例中的RBAL是不同的。数据库中的RBAL进程负责为数据库管理ASM磁盘。
通过他们的trc也可以看出点内容:
abr的trc

[root@11gasm trace]# tail -200f +ASM_arb0_7348.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_arb0_7348.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option
ORACLE_HOME = /u01/app/grid/11.2.0/grid
System name:    Linux
Node name:      11gasm
Release:        2.6.32-642.el6.x86_64
Version:        #1 SMP Wed Apr 13 00:51:26 EDT 2016
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: +ASM
Redo thread mounted by this instance: 0 <none>
Oracle process number: 20
Unix process pid: 7348, image: oracle@11gasm (ARB0)


*** 2020-06-19 12:01:04.766
*** SESSION ID:(10.1) 2020-06-19 12:01:04.766
*** CLIENT ID:() 2020-06-19 12:01:04.766
*** SERVICE NAME:() 2020-06-19 12:01:04.766
*** MODULE NAME:() 2020-06-19 12:01:04.766
*** ACTION NAME:() 2020-06-19 12:01:04.766
 
ARB0 relocating file +TESTDG.2.1 (1 entries)
ARB0 relocating file +TESTDG.3.1 (5 entries)
ARB0 relocating file +TESTDG.5.1 (1 entries)
ARB0 relocating file +TESTDG.5.1 (1 entries)
ARB0 relocating file +TESTDG.3.1 (1 entries)      -------<<<<<relocate文件

上面abrn的trc文件出现的relocate过程,属于rebalance3个过程中的第二个阶段,而且通过trc可以看到,rebalance是以文件(relocating file)为基本操作单位的,跟过arb进程的跟踪文件可以观察到。第一阶段-rebalance plan,ASM会计算出重平衡的计划。计划取决于很多因素,例如磁盘组大小、磁盘组中的文件个数、磁盘的partnership是否需要调整等等。这个过程时间不会太长,一般不会超过几分钟。
第二阶段-extent relocating,是真正干活的阶段,这个阶段,ASM的区会在磁盘组中的磁盘间移动,这个过程会花费大部分的时间。这个过程中,ASM会记录区的移动数量,以及实际的I/O性能,从而估算该过程需要花费的时间(GV A S M O P E R A T I O N . E S T M I N U T E S 记 录 估 算 出 的 时 间 ) 。 不 过 要 注 意 的 是 , 这 只 是 估 算 的 时 间 , 真 正 的 花 费 时 间 还 取 决 于 整 体 负 载 ( 特 别 是 磁 盘 相 关 的 负 载 ) 。 如 果 重 平 衡 是 由 于 磁 盘 组 中 的 一 个 或 者 多 个 磁 盘 损 坏 造 成 的 , 那 么 这 个 阶 段 还 会 对 不 满 足 冗 余 度 要 求 的 数 据 做 镜 像 。 第 三 阶 段 是 磁 盘 的 c o m p a c t i n g 阶 段 ( A S M 11.1.0.7 版 本 及 以 上 支 持 ) , 这 个 过 程 是 将 磁 盘 上 存 的 数 据 尽 可 能 的 移 动 到 磁 盘 的 外 圈 磁 道 上 去 ( 机 械 盘 的 外 圈 速 度 更 快 ) , 以 提 供 更 高 的 性 能 。 需 要 注 意 的 是 , 在 这 个 阶 段 , G V ASM_OPERATION.EST_MINUTES记录估算出的时间)。不过要注意的是,这只是估算的时间,真正的花费时间还取决于整体负载(特别是磁盘相关的负载)。如果重平衡是由于磁盘组中的一个或者多个磁盘损坏造成的,那么这个阶段还会对不满足冗余度要求的数据做镜像。 第三阶段是磁盘的compacting阶段(ASM 11.1.0.7版本及以上支持),这个过程是将磁盘上存的数据尽可能的移动到磁盘的外圈磁道上去(机械盘的外圈速度更快),以提供更高的性能。需要注意的是,在这个阶段,GV ASMOPERATION.ESTMINUTES)compactingASM11.1.0.7GVASM_OPERATION.EST_MINUTES会一直显示为0,这个显示为0的“特性”将来可能会被优化。这个阶段所花费的时间,取决于磁盘组中磁盘的数量、重平衡的原因等等,一般情况下,这个阶段花费的时间会比第二阶段短很多。
rbal的trc感觉就是一些调度信息:

[root@11gasm trace]# tail -200f +ASM_rbal_2373.trc
Stale disk (2.1) is being freed
Stale disk (2.2) is being freed
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm* 

*** 2020-06-19 11:40:26.422
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm* 

*** 2020-06-19 11:40:27.796
KGGPNP_SIHA: attribute value for 'ASM_DISKSTRING' is '/dev/asm*'
kggpnpSIHAGetItem 2 = /dev/asm* 

*** 2020-06-19 11:40:33.811
kfgbRegister: registering group 2/0xCD685734 (TESTDG)
kfgbBind: binding kfgpn for group 2/0xCD685734 (TESTDG)
kfdp_query(TESTDG): 41 
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfiRefresh()+147<-kfi_set_cb()+2149<-kfeInvokeLockCallback()+2733<-kfgbAttrLckRegister()+309<-kfgbDriver()+2264<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----
  1. PST位置发生变化
SQL>  alter diskgroup testdg drop disk TESTDG_0000 ;

Diskgroup altered.

asm alert日志:
Starting background process ARB0
Fri Jun 19 20:07:41 2020
ARB0 started with pid=22, OS id=11147
NOTE: assigning ARB0 to group 2/0x811213c7 (TESTDG) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 1 relocating from 0:2 to 1:9 for diskgroup 2 (TESTDG)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x811213c7 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
Fri Jun 19 20:07:44 2020
NOTE: requesting all-instance membership refresh for group=2
GMON updating for reconfiguration, group 2 at 37 for pid 21, osid 11086
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk TESTDG_0000 emptied
NOTE: erasing header on grp 2 disk TESTDG_0000
NOTE: process x000+asm (11086) initiating offline of disk 0.3914523490 (TESTDG_0000) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 0/0xe952e362, mask = 0x6a, op = clear
GMON updating disk modes for group 2 at 38 for pid 21, osid 11086
NOTE: group TESTDG: updated PST location: disk 0001 (PST copy 0) ----《《《PST位置有disk0变为了disk 1
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp = 2, dsk = 0/0xe952e362, mask = 0x7e, op = clear
GMON updating disk modes for group 2 at 39 for pid 21, osid 11086
NOTE: cache closing disk 0 of grp 2: TESTDG_0000
NOTE: PST update grp = 2 completed successfully
GMON updating for reconfiguration, group 2 at 40 for pid 21, osid 11086
NOTE: cache closing disk 0 of grp 2: (not open) TESTDG_0000
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x811213c7 (TESTDG)
GMON querying group 2 at 41 for pid 13, osid 7105
GMON querying group 2 at 42 for pid 13, osid 7105
Fri Jun 19 20:07:50 2020
NOTE: Disk TESTDG_0000 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x811213c7 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG

gmon trace

*** 2020-06-19 20:07:47.762
GMON updating for reconfiguration, group 2 at 37 for pid 21, osid 11086
PRE
=============== PST ==================== 
grpNum:    2 
state:     1 
callCnt:   37 
(lockvalue) valid=1 ver=0.0 ndisks=1 flags=0x3 from inst=0 (I am 1) last=0
。。。。。。。
NOTE: GMON selects PST disk TESTDG_0001 in failgroup TESTDG_0001    ----《《《gmon也显示PST表更新到了disk1上面。
。。。。。。。

strace跟踪gmon进程观察PST的变化

如前面所述,gmon进程和PST有密切的关系,可以通过gmon的trc观察PST的内容。下面我们可以使用操作系统命令strace,来具体观察gmon进程干了什么?
strace开启:
[root@11gasm trace]# ps -ef|grep gmon
grid 7107 1 0 11:46 ? 00:00:02 asm_gmon_+ASM
root 8319 5955 0 14:38 pts/3 00:00:00 grep gmon
[root@11gasm trace]# strace -fr -o /tmp/7107.log -p 7107
more /tmp/7107.log

。。。。。。。。
7107       0.000023 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000046 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200s\300.$\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000128 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200i=(W\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000096 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000027 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137345
7107       0.000033 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107       3.001138 getrusage(RUSAGE_SELF, {ru_utime={0, 22996}, ru_stime={0, 242963}, ...}) = 0
7107       0.000078 getrusage(RUSAGE_SELF, {ru_utime={0, 22996}, ru_stime={0, 242963}, ...}) = 0
7107       0.000083 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000077 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000179 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000062 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000060 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000097 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000068 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000046 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000050 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000090 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000149 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200..\220>\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000527 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000130 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200|\242W\275\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000299 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137645
7107       0.000183 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = 0
7107       2.921696 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107       0.000023 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107       0.000024 write(23, "\n*** 2020-06-19 12:01:10.804\n", 29) = 29
7107       0.000030 lseek(23, 0, SEEK_CUR) = 7149
7107       0.000013 write(23, "GMON updating for reconfiguratio"..., 70) = 70
7107       0.000016 write(24, "J?ikSN~U61\n", 11) = 11
7107       0.000017 write(23, "\n", 1)  = 1
7107       0.000023 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107       0.000026 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107       0.000043 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342530, ...}) = 0
7107       0.000020 open("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107       0.000019 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107       0.000015 write(25, "<msg time='2020-06-19T12:01:10.8"..., 268) = 268
7107       0.000018 close(25)           = 0
7107       0.000019 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=342798, ...}) = 0
7107       0.000017 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107       0.000049 lstat("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", {st_mode=S_IFREG|0640, st_size=75068, ...}) = 0
7107       0.000023 open("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107       0.000017 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107       0.000013 write(25, "GMON updating for reconfiguratio"..., 70) = 70
7107       0.000043 write(25, "\n", 1)  = 1
7107       0.000017 close(25)           = 0
7107       0.000015 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137937
7107       0.000135 lseek(23, 0, SEEK_CUR) = 7220
7107       0.000017 write(23, "PRE", 3) = 3
7107       0.000015 write(24, "3?w9~13\n", 8) = 8
7107       0.000015 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "=============== PST ============"..., 41) = 41
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "grpNum:    2 ", 13) = 13
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "state:     1 ", 13) = 13
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "callCnt:   10 ", 14) = 14
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000016 write(23, "(lockvalue) valid=1 ver=0.0 ndis"..., 74) = 74
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "--------------- HDR ------------"..., 41) = 41
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "next: \t 2 ", 10) = 10
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "last: \t 2 ", 10) = 10
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "pst count: \t 1 ", 15) = 15
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "pst locations: \t 0 ", 19) = 19
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "incarn:   \t 1 ", 14) = 14
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "dta size: \t 3 ", 14) = 14
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "version: \t 1 ", 13) = 13
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "ASM version: \t 186646528 = 11.2."..., 37) = 37
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "contenttype: \t 1", 16) = 16
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "partnering pattern: \t [ ]", 25) = 25
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "--------------- LOC MAP --------"..., 41) = 41
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "0: dirty 0 \t cur_loc: 0 \t stable"..., 39) = 39
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "1: dirty 0 \t cur_loc: 1 \t stable"..., 39) = 39
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "--------------- DTA ------------"..., 41) = 41
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000017 write(23, "0: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107       0.000017 write(23, "\n", 1)  = 1
7107       0.000016 write(23, "1: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000036 write(23, "2: sts v v(rw) p(rw) a(x) d(x) f"..., 65) = 65
7107       0.000017 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "--------------- HBEAT ----------"..., 41) = 41
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 lseek(23, 0, SEEK_CUR) = 8013
7107       0.000011 write(23, "kfdpHbeat_dump: state=1, inst=1,"..., 55) = 55
7107       0.000013 write(24, "7?jB~MCt\n", 9) = 9
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "\trnd=141522539.1586440573.197865"..., 48) = 48
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "kfk io-queue:    (nil)", 22) = 22
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000045 write(23, "kfdpHbeatCB_dump: at 0x7ffff628e"..., 147) = 147
7107       0.000024 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "--------------- KFGRP ----------"..., 41) = 41
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "kfgrp: TESTDG number: 2/11346953"..., 81) = 81
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "timestamp: 430052410 state: 5 fl"..., 68) = 68
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "KFGPN at a0bb8410 in dependent c"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "KFGPN at a0bb7e28 in dependent c"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "KFGPN at a0bb8020 in dependent c"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000016 write(23, "  kfdsk:0xa0ba7e60", 18) = 18
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "  disk: TESTDG_0000 num: 0/70845"..., 112) = 112
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  fg: TESTDG_0000 path: /dev/asm"..., 39) = 39
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000041 write(23, "  slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107       0.000017 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "    kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107       0.000014 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "    kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "  pcnt: 0 ()", 12) = 12
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "    kfkid: 0xa0bd5a98, kfknm: , "..., 50) = 50
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "      fob: (KSFD)a2002c18, magic"..., 54) = 54
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "    kfdds: dn=0 inc=3914523474 d"..., 56) = 56
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "      kfkds 0x7ffff60395d8, kfki"..., 93) = 93
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  kfdsk:0xa0ba7048", 18) = 18
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "  disk: TESTDG_0001 num: 1/70845"..., 112) = 112
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  fg: TESTDG_0001 path: /dev/asm"..., 39) = 39
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "  pcnt: 0 ()", 12) = 12
7107       0.000017 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfkid: 0xa0bd4ae0, kfknm: , "..., 50) = 50
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000035 write(23, "      fob: (KSFD)a2002758, magic"..., 54) = 54
7107       0.000018 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "    kfdds: dn=1 inc=3914523472 d"..., 56) = 56
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "      kfkds 0x7ffff6039530, kfki"..., 93) = 93
7107       0.000014 lseek(23, 0, SEEK_CUR) = 9892
7107       0.000011 write(23, "\n", 1)  = 1
7107       0.000013 write(24, "7?IG~WS1\n", 9) = 9
7107       0.000015 write(23, "  kfdsk:0xa0ba7ac8", 18) = 18
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "  disk: TESTDG_0002 num: 2/70845"..., 112) = 112
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  fg: TESTDG_0002 path: /dev/asm"..., 39) = 39
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  mnt: C hdr: M mode: v v(rw) p("..., 62) = 62
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000015 write(23, "  slot 65535 ddeslot 65535 numsl"..., 70) = 70
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfts: 2020/06/19 11:40:26.45"..., 36) = 36
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfts: 2020/06/19 11:47:01.47"..., 36) = 36
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "  pcnt: 0 ()", 12) = 12
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfkid: 0xa0bd5698, kfknm: , "..., 50) = 50
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "      fob: (KSFD)a2002ae8, magic"..., 54) = 54
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000014 write(23, "    kfdds: dn=2 inc=3914523473 d"..., 56) = 56
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000013 write(23, "      kfkds 0x7ffff6039488, kfki"..., 93) = 93
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000019 lseek(23, 0, SEEK_CUR) = 10543
7107       0.000012 write(23, "POST res = 1 ", 13) = 13
7107       0.000012 write(24, "7?n5~AAD\n", 9) = 9
7107       0.000014 write(23, "\n", 1)  = 1
。。。。。
7107       0.000075 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200i\201l(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107       0.000273 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\34\33\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
7107       0.000245 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200m\231f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107       0.000219 lseek(23, 0, SEEK_CUR) = 15222
7107       0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107       0.000015 write(24, "3?ID~1Y\n", 8) = 8
7107       0.000012 write(23, "\n", 1)  = 1
7107       0.000047 lseek(23, 0, SEEK_CUR) = 15257
7107       0.000011 write(23, "POST res = 1 ", 13) = 13
7107       0.000011 write(24, "3?R1~1D\n", 8) = 8
7107       0.000011 write(23, "\n", 1)  = 1
。。。。。。。
7107       0.000029 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200j\231f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107       0.000244 pwrite(258, "\1\202\22\2\3\4\0\0\0\0\0\200\t\33\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4206592) = 4096
7107       0.000260 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200o\211f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107       0.000223 lseek(23, 0, SEEK_CUR) = 17680
7107       0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107       0.000015 write(24, "3?JC~1Y\n", 8) = 8
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000062 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107       0.000021 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107       0.000031 lstat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345140, ...}) = 0
7107       0.000014 open("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107       0.000020 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107       0.000010 write(25, "<msg time='2020-06-19T12:01:10.8"..., 246) = 246
7107       0.000013 close(25)           = 0
7107       0.000013 stat("/u01/app/grid/diag/asm/+asm/+ASM/alert/log.xml", {st_mode=S_IFREG|0640, st_size=345386, ...}) = 0
7107       0.000012 times({tms_utime=2, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430137941
7107       0.000013 lstat("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", {st_mode=S_IFREG|0640, st_size=75708, ...}) = 0
7107       0.000014 open("/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 25
7107       0.000012 fcntl(25, F_SETFD, FD_CLOEXEC) = 0
7107       0.000010 write(25, "NOTE: cache closing disk 2 of gr"..., 48) = 48
7107       0.000029 write(25, "\n", 1)  = 1
7107       0.000013 close(25)           = 0
7107       0.001428 lseek(23, 0, SEEK_CUR) = 17715
7107       0.000017 write(23, "POST res = 1 ", 13) = 13
7107       0.000014 write(24, "3?RR~1D\n", 8) = 8
7107       0.000013 write(23, "\n", 1)  = 1
。。。。。。。
7107       0.000037 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200n\211f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
7107       0.000288 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\23\241\21\202\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
7107       0.000227 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200m\255f\215\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
7107       0.000228 lseek(23, 0, SEEK_CUR) = 22376
7107       0.000012 write(23, "InvalLck (group 2) downgraded to"..., 34) = 34
7107       0.000016 write(24, "3?bC~1Y\n", 8) = 8
7107       0.000013 write(23, "\n", 1)  = 1
7107       0.000047 lseek(23, 0, SEEK_CUR) = 22411
7107       0.000011 write(23, "POST res = 1 ", 13) = 13
7107       0.000011 write(24, "3?W1~1D\n", 8) = 8
7107       0.000011 write(23, "\n", 1)  = 1
。。。。。。
7107       0.000010 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000012 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138245
7107       0.000024 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}, {0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 2
7107       0.000052 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\325cn]\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000152 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\327\333\214?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000109 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000031 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138246
7107       0.000050 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107       3.001232 getrusage(RUSAGE_SELF, {ru_utime={0, 31995}, ru_stime={0, 246962}, ...}) = 0
7107       0.000053 getrusage(RUSAGE_SELF, {ru_utime={0, 31995}, ru_stime={0, 246962}, ...}) = 0
7107       0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000078 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000038 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000047 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000075 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000052 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000035 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000038 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000072 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000186 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\205a,F\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000454 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\5H\24a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000220 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000153 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138546
7107       0.000052 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107       3.001088 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107       0.000019 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107       0.000022 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {600, 0}) = 1
7107       0.000037 close(260)          = 0
7107       0.000227 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000023 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000030 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000012 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000015 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000026 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000015 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000011 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000013 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000049 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\235\2017\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000150 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\261\357jv\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000123 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000057 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430138846
7107       0.000018 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107       3.000584 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107       0.000096 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 246962}, ...}) = 0
7107       0.000297 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000084 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000090 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000045 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000055 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000088 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000060 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000041 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000045 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000082 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000139 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\226\344\22(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000648 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200c\276\205\367\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000261 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 1
7107       0.000183 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139146
7107       0.000063 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
7107       3.000924 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 248962}, ...}) = 0
7107       0.000041 getrusage(RUSAGE_SELF, {ru_utime={0, 32994}, ru_stime={0, 248962}, ...}) = 0
7107       0.000051 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000049 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000062 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000045 poll([{fd=12, events=POLLIN}, {fd=20, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000038 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000060 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000040 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000026 poll([{fd=12, events=POLLIN}, {fd=22, events=POLLIN}], 2, 0) = 0 (Timeout)
7107       0.000029 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000054 io_getevents(140737349398528, 1, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000273 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\246j\371\333\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000257 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\n\345\2120\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107       0.000228 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}}, {0, 0}) = 1
7107       0.000123 times({tms_utime=3, tms_stime=24, tms_cutime=0, tms_cstime=0}) = 430139446
7107       0.000039 semtimedop(327682, {{18, -1, 0}}, 1, {3, 0} <detached ...>

trace信息很多,但是我们只关注更新磁盘的,即就是与write有关的,我们可以看到,其实操作的就是下面几步:

0.000016 write(23, "\n*** 2020-06-19 12:01:04.657\n", 29) = 29
0.000053 write(23, "GMON querying group 2 at 9 for p"..., 48) = 48
0.000013 write(24, "3?8qUGo~Um\n", 11) = 11
0.000013 write(23, "\n", 1)  = 1
0.000032 pwrite(258, "\1\202\21\2\1\4\0\0\0\0\0\200i\315\2\260\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4198400) = 4096
0.000311 pwrite(258, "\1\202\22\2\2\4\0\0\0\0\0\200\232\22\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4202496) = 4096
0.000247 pwrite(258, "\1\202\21\2\0\4\0\0\0\0\0\200j\r\3\325\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4194304) = 4096
0.000264 pwrite(258, "\1\202\22\2\3\4\0\0\0\0\0\200\217\22\264\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4206592) = 4096
0.000287 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\262\1\354&\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
0.000132 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\252\0173\237\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1

查看进程持有的文件句柄:

[root@11gasm ~]# cd /proc/7107/fd
[root@11gasm fd]# ll
total 0
lr-x------ 1 grid oinstall 64 Jun 19 11:55 0 -> /dev/null
l-wx------ 1 grid oinstall 64 Jun 19 11:55 1 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 10 -> /dev/zero
lrwx------ 1 grid oinstall 64 Jun 19 11:55 11 -> /u01/app/grid/11.2.0/grid/dbs/hc_+ASM.dat
lrwx------ 1 grid oinstall 64 Jun 19 11:55 12 -> socket:[103614]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 13 -> socket:[103615]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 14 -> socket:[103616]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 15 -> socket:[103617]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 16 -> socket:[103618]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 17 -> socket:[103621]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 18 -> socket:[103622]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 19 -> socket:[103669]
l-wx------ 1 grid oinstall 64 Jun 19 11:55 2 -> /dev/null
lrwx------ 1 grid oinstall 64 Jun 19 11:55 20 -> socket:[103670]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 21 -> socket:[103671]
lrwx------ 1 grid oinstall 64 Jun 19 11:55 22 -> socket:[103672]
l-wx------ 1 grid oinstall 64 Jun 19 11:55 23 -> /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_7107.trc
l-wx------ 1 grid oinstall 64 Jun 19 11:55 24 -> /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_7107.trm
lrwx------ 1 grid oinstall 64 Jun 19 11:55 256 -> /dev/sdb
lrwx------ 1 grid oinstall 64 Jun 19 11:55 257 -> /dev/sdc
lrwx------ 1 grid oinstall 64 Jun 19 11:55 258 -> /dev/sdd
lrwx------ 1 grid oinstall 64 Jun 19 11:55 259 -> /dev/sde
lrwx------ 1 grid oinstall 64 Jun 19 11:55 3 -> /u01/app/grid/11.2.0/grid/log/11gasm/agent/ohasd/oraagent_grid/oraagent_gridOUT.log
l-wx------ 1 grid oinstall 64 Jun 19 11:55 4 -> /u01/app/grid/11.2.0/grid/log/11gasm/agent/ohasd/oraagent_grid/oraagent_grid.log
lr-x------ 1 grid oinstall 64 Jun 19 11:55 5 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 6 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 7 -> /dev/null
lr-x------ 1 grid oinstall 64 Jun 19 11:55 8 -> /u01/app/grid/11.2.0/grid/rdbms/mesg/oraus.msb
lr-x------ 1 grid oinstall 64 Jun 19 11:55 9 -> /proc/7107/fd
[root@11gasm fd]# ls -ltr /dev/asm_*
lrwxrwxrwx 1 root root 3 Jun 19 10:00 /dev/asm_data01 -> sdb
lrwxrwxrwx 1 root root 3 Jun 19 10:00 /dev/asm_data02 -> sdc
lrwxrwxrwx 1 root root 3 Jun 19 11:23 /dev/asm_arch03 -> sdf
lrwxrwxrwx 1 root root 3 Jun 19 11:33 /dev/asm_arch01 -> sdd
lrwxrwxrwx 1 root root 3 Jun 19 11:34 /dev/asm_arch02 -> sde

可以看出,write函数操作的为file 23,24为asm alert日志,也就是asm写日志。
pwrite函数操作的偏移量分别为4198400,4202496,4194304,4206592,对应的为file 258,也就是/dev/asm_arch01,PST所在的磁盘。元数据块大小为4K,上面的偏移量对应的具体块号为:

SQL> select 4198400/4096 from dual;

4198400/4096
------------
        1025

SQL> select 4202496/4096 from dual;

4202496/4096
------------
        1026

SQL> select 4194304/4096 from dual;

4194304/4096
------------
        1024

SQL> select 4206592/4096 from dual;

4206592/4096
------------
        1027

查看对应的块信息:

[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=0 blkn=1024|grep type
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=0 blkn=1025|grep type
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=0 blkn=1026|grep type
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=0 blkn=1027|grep type
kfbh.type:                           18 ; 0x002: KFBTYP_PST_DTA

由于我们的AU带下为4M,所以一个AU里面存储1024个元数据块,所以从1024开始,已经到第二个AU了,即AU1,所以也等价于下面:

[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=1 blkn=0 aus=4194304|grep type
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=1 blkn=1 aus=4194304|grep type
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=1 blkn=2 aus=4194304|grep type
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
[grid@11gasm ~]$  kfed read /dev/asm_arch01 aun=1 blkn=3 aus=4194304|grep type
kfbh.type:                           18 ; 0x002: KFBTYP_PST_DTA

所以磁盘组中有磁盘的变化时,gmon进程会更新PST的数据,2个meta,1个data,1个none。
上面还有一个函数:
io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\252\0173\237\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, nbytes:4096, offset:8384512}}) = 1
调用的pwrite函数,偏移量为8384512。

SQL> select 8384512/4096 from dual;

8384512/4096
------------
        2047

对于4M的AU,第二个AU的块号为1024-2047,即2047为第二个AU的最后一个块。磁盘组当有磁盘变化时,gmon也会去更新。如上,AU2的最后一个块为心跳信息。
具体操作系统函数io_submit还有待研究,
下面看下正常磁盘组多久更新次心跳信息,继续strace跟踪,只是参数不同了:
[grid@11gasm ~]$ strace -f -tt -o /tmp/gmon.log -p 7107
tail -200f /tmp/gmon.log

7107  15:15:57.295873 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\321\4\375\322\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107  15:15:57.296020 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\27?\323W\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
。。。。。。。
7107  15:16:00.297507 io_getevents(140737349398528, 2, 128, {{0x7ffff5d34210, 0x7ffff5d34210, 4096, 0}, {0x7ffff5d34480, 0x7ffff5d34480, 4096, 0}}, {0, 0}) = 2
7107  15:16:00.297555 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\200\3\314\263*\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107  15:16:00.297678 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200_+\377\205\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
。。。。。
7107  15:16:03.298822 io_submit(140737349398528, 1, {{data:0x7ffff5d34210, pwrite, filedes:256, str:"\1\202\23\2\377\7\0\0\0\0\0\2004\35@'\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1
7107  15:16:03.298951 io_submit(140737349398528, 1, {{data:0x7ffff5d34480, pwrite, filedes:258, str:"\1\202\23\2\377\7\0\0\0\0\0\200\357\273i\350\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., nbytes:4096, offset:8384512}}) = 1

可以看出,每三秒就会更新一次,所以在正常情况,gmon每3秒会更新一次心跳信息。具体的标志位为:kfbh.check。每3秒会改变一次。以此可以判断,该磁盘组是否被挂载。

[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                  1856946567 ; 0x00c: 0x6eaec187
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                  1580953418 ; 0x00c: 0x5e3b6f4a
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                   930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                   930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                   930039782 ; 0x00c: 0x376f47e6
[grid@11gasm ~]$ kfed read /dev/asm_arch01 aun=1 blkn=1023 aus=4194304|grep check
kfbh.check:                  4149201827 ; 0x00c: 0xf74fcba3

总结

  1. 每个磁盘的AU 1是为PST表预留的。在external冗余的磁盘组中只有一份PST表。
    在normal冗余的磁盘组中,至少有两份PST表。如果磁盘组中有三个或更多的failgroups,PST表将会有3份。
    在high冗余的磁盘组中,至少有三份PST表的信息。如果有四个failgroups,将会有4份PST表,如果有五个或者更多failgroups,将会有五份PST表。
  2. 不是每一个磁盘都有PST表的信息。具体在哪个磁盘上面可以通过kfed确定,或者asm alert日志在创建磁盘组的时候会有明确说明。
  3. 磁盘心跳信息为PST的一部分,跟PST在同一块磁盘上面,具体位置在AU1的最后一个块上面。
  4. PST(Partnership and Status Table)包含一个磁盘组里所有磁盘的相关信息 – 磁盘号,磁盘状态,partner磁盘号,心跳信息和failgroup信息(11g之后)。
  5. PST在一些情况下会发生变化,这种变化是可见的,有时磁盘不会发生变化,只是gmon更新了元数据信息;有时磁盘位置会发生变化
  6. 磁盘组的rebalance会有三个阶段,每个阶段的耗时也不一样。主要的进程就是RBAL和ARBn,ARBn是实际的relocate进程,而RBAL只是调度进程。从trc可以看出
  7. PST具体元数据可以通过kfed观察,每个结构体的含义可以参考上面。
  8. 如果PST损坏,是否可以修复,还待研究。
  9. 正常情况下,gmon每3秒会更新一次磁盘组的心跳信息。
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值