mysql+tmp满+crash_分布式存储crash导致ORACLE坏块原因分析

oracle运行在分布式存储中,我们在实验室模拟所有存储节点故障(整个存储断电),然后加电,数据库启动没有坏块,但是有客户在自己的环境做测试,发现数据库启动有大量坏块(几次测试问题依旧)。

客户异常环境报错汇总

模拟存储故障后,数据库报错信息,主要体现犹豫存储模拟故障,数据库读写io失败ORA-15081,ORA-27072,ORA-27061

Thread 2 cannot allocate new log, sequence 488

Checkpoint not complete

Current log# 3 seq# 487 mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559

Current log# 3 seq# 487 mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559

Thu Mar 08 16:17:08 2018

WARNING: Read Failed. group:2 disk:0 AU:6925 offset:49152 size:16384

WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of

file 313 in group [2.1286527068] from disk DATADG_0000 allocation unit 6925 reason error;

if possible, will try another mirror side

Thu Mar 08 16:17:08 2018

WARNING: Read Failed. group:2 disk:0 AU:6921 offset:16384 size:16384

WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0

of file 313 in group [2.1286527068] from disk DATADG_0000 allocation unit 6921 reason error;

if possible, will try another mirror side

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lmon_21091.trc:

ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 56729696

Additional information: -1

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lgwr_21111.trc:

ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 56696864

Additional information: -1

Thu Mar 08 16:17:08 2018

WARNING: Write Failed. group:1 disk:0 AU:56970 offset:65536 size:16384

WARNING: Write Failed. group:2 disk:0 AU:6921 offset:65536 size:16384

…………

WARNING: failed to write mirror side 1 of virtual extent 147 logical extent

0 of file 335 in group 2 on disk 0 allocation unit 11789

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:

ORA-15080: synchronous I/O operation to a disk failed

ORA-27061: waiting for async I/Os failed

Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 8192

WARNING: failed to write mirror side 1 of virtual extent 147 logical extent

0 of file 335 in group 2 on disk 0 allocation unit 11789

KCF: read, write or open error, block=0xddec online=1

file=18 '+DATADG/testdb/datafile/soe.333.970157833'

error=15081 txt: ''

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:

Thu Mar 08 16:17:09 2018

System state dump requested by (instance=2, osid=21113 (CKPT)), summary=[abnormal instance termination].

System State dumped to trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_diag_21081_20180308161709.trc

Thu Mar 08 16:17:09 2018

ORA-1092 : opitsk aborting process

Thu Mar 08 16:17:12 2018

ORA-1092 : opitsk aborting process

Instance terminated by CKPT, pid = 21113

asm日志报错信息,主要也是体现在io的读写错误上Linux-x86_64 Error: 5: Input/output error

Thu Mar 08 16:17:20 2018

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_9636.trc:

ORA-17503: ksfdopn:2 Failed to open file +DATADG/testdb/spfiletestdb.ora

ORA-15001: diskgroup "DATADG" does not exist or is not mounted

Thu Mar 08 16:17:37 2018

WARNING: Write Failed. group:3 disk:0 AU:1 offset:4190208 size:4096

Thu Mar 08 16:17:37 2018

WARNING: Write Failed. group:3 disk:0 AU:22 offset:143360 size:4096

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:

ORA-15080: synchronous I/O operation to a disk failed

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 180504

Additional information: -1

WARNING: Hbeat write to PST disk 0.3916310704 in group 3 failed. [4]

WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0

of file 255 in group 3 on disk 0 allocation unit 22

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 180504

Additional information: -1

Thu Mar 08 16:17:37 2018

NOTE: process _b001_+asm2 (9546) initiating offline of

disk 0.3916310704 (OCRDG_0000) with mask 0x7e in group 3

NOTE: initiating PST update: grp = 3, dsk = 0/0xe96e28b0, mask = 0x6a, op = clear

GMON updating disk modes for group 3 at 15 for pid 27, osid 9546

ERROR: no read quorum in group: required 1, found 0 disks

Thu Mar 08 16:17:37 2018

NOTE: cache dismounting (not clean) group 3/0x4CAED85D (OCRDG)

WARNING: Offline for disk OCRDG_0000 in mode 0x7f failed.

NOTE: messaging CKPT to quiesce pins Unix process pid: 9505, image: oracle@dbtest02 (B000)

Thu Mar 08 16:17:37 2018

NOTE: halting all I/Os to diskgroup 3 (OCRDG)

Thu Mar 08 16:17:37 2018

NOTE: LGWR doing non-clean dismount of group 3 (OCRDG)

NOTE: LGWR sync ABA=11.6 last written ABA 11.6

WARNING: Write Failed. group:3 disk:0 AU:22 offset:102400 size:4096

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:

ORA-15080: synchronous I/O operation to a disk failed

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 180424

Additional information: -1

WARNING: failed to write mirror side 1 of virtual extent 0 logical extent

0 of file 255 in group 3 on disk 0 allocation unit 22

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 180424

Additional information: -1

WARNING: Write Failed. group:3 disk:0 AU:22 offset:147456 size:4096

Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:

ORA-15080: synchronous I/O operation to a disk failed

ORA-27072: File I/O error

…………

Thu Mar 08 16:17:37 2018

ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 512)

)

ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 512)

)

ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 512)

)

存储恢复之后,asm mount磁盘组没有报任何错误

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */

NOTE: Diskgroups listed in ASM_DISKGROUPS are

ARCHDG

DATADG

NOTE: Diskgroup used for Voting files is:

OCRDG

Diskgroup with spfile:OCRDG

Diskgroup used for OCR is:OCRDG

NOTE: cache registered group ARCHDG number=1 incarn=0xb81889ff

NOTE: cache began mount (first) of group ARCHDG number=1 incarn=0xb81889ff

NOTE: cache registered group DATADG number=2 incarn=0xb8288a00

NOTE: cache began mount (first) of group DATADG number=2 incarn=0xb8288a00

NOTE: cache registered group OCRDG number=3 incarn=0xb8288a01

NOTE: cache began mount (first) of group OCRDG number=3 incarn=0xb8288a01

NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so

NOTE: Assigning number (1,0) to disk (/dev/oracleasm/disks/ARCHDISK01)

NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATADISK01)

NOTE: Assigning number (3,0) to disk (/dev/oracleasm/disks/OCRDISK01)

NOTE: GMON heartbeating for grp 1

GMON querying group 1 at 5 for pid 23, osid 3303

NOTE: cache opening disk 0 of grp 1: ARCHDG_0000 path:/dev/oracleasm/disks/ARCHDISK01

NOTE: F1X0 found on disk 0 au 2 fcn 0.0

NOTE: cache mounting (first) external redundancy group 1/0xB81889FF (ARCHDG)

* allocate domain 1, invalid = TRUE

NOTE: attached to recovery domain 1

NOTE: starting recovery of thread=1 ckpt=13.6287 group=1 (ARCHDG)

NOTE: starting recovery of thread=2 ckpt=12.7985 group=1 (ARCHDG)

NOTE: advancing ckpt for group 1 (ARCHDG) thread=1 ckpt=13.6287

NOTE: advancing ckpt for group 1 (ARCHDG) thread=2 ckpt=12.7985

NOTE: cache recovered group 1 to fcn 0.122934

NOTE: redo buffer size is 256 blocks (1053184 bytes)

NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (ARCHDG)

Process LGWR (pid 3287) is running at high priority QoS for Exadata I/O

NOTE: LGWR found thread 1 closed at ABA 13.6286

NOTE: LGWR mounted thread 1 for diskgroup 1 (ARCHDG)

NOTE: LGWR opening thread 1 at fcn 0.122934 ABA 14.6287

NOTE: cache mounting group 1/0xB81889FF (ARCHDG) succeeded

NOTE: cache ending mount (success) of group ARCHDG number=1 incarn=0xb81889ff

NOTE: GMON heartbeating for grp 2

GMON querying group 2 at 7 for pid 23, osid 3303

NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/oracleasm/disks/DATADISK01

NOTE: F1X0 found on disk 0 au 2 fcn 0.0

NOTE: cache mounting (first) external redundancy group 2/0xB8288A00 (DATADG)

* allocate domain 2, invalid = TRUE

NOTE: attached to recovery domain 2

NOTE: starting recovery of thread=1 ckpt=12.2323 group=2 (DATADG)

NOTE: starting recovery of thread=2 ckpt=11.2681 group=2 (DATADG)

NOTE: advancing ckpt for group 2 (DATADG) thread=1 ckpt=12.2326

NOTE: advancing ckpt for group 2 (DATADG) thread=2 ckpt=11.2687

NOTE: cache recovered group 2 to fcn 0.21395

NOTE: redo buffer size is 512 blocks (2101760 bytes)

NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATADG)

NOTE: LGWR found thread 1 closed at ABA 12.2325

NOTE: LGWR mounted thread 1 for diskgroup 2 (DATADG)

NOTE: LGWR opening thread 1 at fcn 0.21395 ABA 13.2326

NOTE: cache mounting group 2/0xB8288A00 (DATADG) succeeded

NOTE: cache ending mount (success) of group DATADG number=2 incarn=0xb8288a00

NOTE: GMON heartbeating for grp 3

GMON querying group 3 at 9 for pid 23, osid 3303

NOTE: cache opening disk 0 of grp 3: OCRDG_0000 path:/dev/oracleasm/disks/OCRDISK01

NOTE: F1X0 found on disk 0 au 2 fcn 0.0

NOTE: cache mounting (first) external redundancy group 3/0xB8288A01 (OCRDG)

* allocate domain 3, invalid = TRUE

Thu Mar 08 17:00:24 2018

NOTE: attached to recovery domain 3

NOTE: starting recovery of thread=1 ckpt=13.55 group=3 (OCRDG)

NOTE: starting recovery of thread=2 ckpt=11.7 group=3 (OCRDG)

NOTE: advancing ckpt for group 3 (OCRDG) thread=1 ckpt=13.55

NOTE: advancing ckpt for group 3 (OCRDG) thread=2 ckpt=11.7

NOTE: cache recovered group 3 to fcn 0.851

NOTE: redo buffer size is 512 blocks (2101760 bytes)

Thu Mar 08 17:00:24 2018

NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCRDG)

NOTE: LGWR found thread 1 closed at ABA 13.54

NOTE: LGWR mounted thread 1 for diskgroup 3 (OCRDG)

NOTE: LGWR opening thread 1 at fcn 0.851 ABA 14.55

NOTE: cache mounting group 3/0xB8288A01 (OCRDG) succeeded

NOTE: cache ending mount (success) of group OCRDG number=3 incarn=0xb8288a01

Thu Mar 08 17:00:24 2018

NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1

SUCCESS: diskgroup ARCHDG was mounted

NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2

SUCCESS: diskgroup DATADG was mounted

NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3

SUCCESS: diskgroup OCRDG was mounted

SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */

数据库虽然启动成功了,但是大量的坏块生产ORA-01578,ORA-01110

ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */

This instance was first to open

Beginning crash recovery of 2 threads

parallel recovery started with 7 processes

Started redo scan

Completed redo scan

read 107566 KB redo, 10569 data blocks need recovery

Started redo application at

Thread 1: logseq 767, block 101851

Thread 2: logseq 486, block 91861

Recovery of Online Redo Log: Thread 1 Group 1 Seq 767 Reading mem 0

Mem# 0: +DATADG/testdb/onlinelog/group_1.314.970157463

Mem# 1: +ARCHDG/testdb/onlinelog/group_1.614.970157463

Recovery of Online Redo Log: Thread 2 Group 4 Seq 486 Reading mem 0

Mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561

Mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561

Recovery of Online Redo Log: Thread 1 Group 2 Seq 768 Reading mem 0

Mem# 0: +DATADG/testdb/onlinelog/group_2.315.970157463

Mem# 1: +ARCHDG/testdb/onlinelog/group_2.615.970157463

Recovery of Online Redo Log: Thread 2 Group 3 Seq 487 Reading mem 0

Mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559

Mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559

Thu Mar 08 17:01:11 2018

Hex dump of (file 12, block 126469) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p001_4090.trc

Corrupt block relative dba: 0x0301ee05 (file 12, block 126469)

Fractured block found during crash/instance recovery

Data in bad block:

type: 32 format: 2 rdba: 0x0301ee05

last change scn: 0x0000.00446cbb seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0xeac22001

check value in block header: 0xbd0c

computed block checksum: 0x8679

Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301ee05 (file 12, block 126469)

Reread (file 12, block 126469) found same corrupt data (no logical check)

Completed redo application of 84.56MB

Completed crash recovery at

Thread 1: logseq 768, block 102395, scn 4586960

Thread 2: logseq 487, block 101664, scn 4587050

10569 data blocks read, 10569 data blocks written, 107566 redo k-bytes read

ARCH: STARTING ARCH PROCESSES

Thu Mar 08 17:01:12 2018

ARC0 started with pid=43, OS id=4151

Thu Mar 08 17:01:13 2018

ARC0: Archival started

ARCH: STARTING ARCH PROCESSES COMPLETE

ARC0: STARTING ARCH PROCESSES

Thu Mar 08 17:01:13 2018

ARC1 started with pid=44, OS id=4153

Thu Mar 08 17:01:13 2018

ARC2 started with pid=45, OS id=4155

ARC1: Archival started

ARC2: Archival started

Thu Mar 08 17:01:13 2018

ARC3 started with pid=46, OS id=4157

ARC1: Becoming the 'no FAL' ARCH

ARC1: Becoming the 'no SRL' ARCH

ARC2: Becoming the heartbeat ARCH

Thread 1 advanced to log sequence 769 (thread recovery)

Picked broadcast on commit scheme to generate SCNs

Thu Mar 08 17:01:13 2018

Thread 2 advanced to log sequence 488 (thread open)

Thread 2 opened at log sequence 488

Current log# 4 seq# 488 mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561

Current log# 4 seq# 488 mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561

Successful open of redo thread 2

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Thu Mar 08 17:01:13 2018

SMON: enabling cache recovery

Archived Log entry 1249 added for thread 2 sequence 487 ID 0xa41f5c11 dest 1:

[4046] Successfully onlined Undo Tablespace 2.

Undo initialization finished serial:0 start:4294332240 end:4294332460 diff:220 (2 seconds)

Verifying file header compatibility for 11g tablespace encryption..

Verifying 11g file header compatibility for tablespace encryption completed

SMON: enabling tx recovery

Database Characterset is AL32UTF8

No Resource Manager plan active

Starting background process GTX0

Thu Mar 08 17:01:14 2018

GTX0 started with pid=55, OS id=4181

Starting background process RCBG

Thu Mar 08 17:01:14 2018

RCBG started with pid=57, OS id=4185

replication_dependency_tracking turned off (no async multimaster replication found)

Starting background process QMNC

Thu Mar 08 17:01:14 2018

QMNC started with pid=61, OS id=4195

Thu Mar 08 17:01:15 2018

Hex dump of (file 9, block 43523) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc

Corrupt block relative dba: 0x0240aa03 (file 9, block 43523)

Fractured block found during buffer read

Data in bad block:

type: 32 format: 2 rdba: 0x0240aa03

last change scn: 0x0000.0044ccc4 seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0xd94f2001

check value in block header: 0xb09c

computed block checksum: 0x158b

Reading datafile '+DATADG/testdb/datafile/soe.324.970157821' for corruption at rdba: 0x0240aa03 (file 9, block 43523)

Reread (file 9, block 43523) found same corrupt data (no logical check)

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc (incident=216418):

ORA-01578: ORACLE data block corrupted (file # 9, block # 43523)

ORA-01110: data file 9: '+DATADG/testdb/datafile/soe.324.970157821'

Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216418/testdb2_p009_4175_i216418.trc

ARC3: Archival started

ARC0: STARTING ARCH PROCESSES COMPLETE

Thu Mar 08 17:01:15 2018

Hex dump of (file 14, block 71173) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc

Corrupt block relative dba: 0x03811605 (file 14, block 71173)

Fractured block found during buffer read

Data in bad block:

type: 32 format: 2 rdba: 0x03811605

last change scn: 0x0000.0043916d seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x93662001

check value in block header: 0xad98

computed block checksum: 0x20b

Reading datafile '+DATADG/testdb/datafile/soe.329.970157831' for corruption at rdba: 0x03811605 (file 14, block 71173)

Reread (file 14, block 71173) found same corrupt data (no logical check)

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc (incident=216306):

ORA-01578: ORACLE data block corrupted (file # 14, block # 71173)

ORA-01110: data file 14: '+DATADG/testdb/datafile/soe.329.970157831'

Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216306/testdb2_p003_4094_i216306.trc

Hex dump of (file 12, block 112385) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc

Corrupt block relative dba: 0x0301b701 (file 12, block 112385)

Fractured block found during buffer read

Data in bad block:

type: 6 format: 2 rdba: 0x0301b701

last change scn: 0x0000.003cb146 seq: 0x1 flg: 0x06

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x1d030601

check value in block header: 0x1e62

computed block checksum: 0x7a47

Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301b701 (file 12, block 112385)

Reread (file 12, block 112385) found same corrupt data (no logical check)

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc (incident=216170):

ORA-01578: ORACLE data block corrupted (file # 12, block # 112385)

ORA-01110: data file 12: '+DATADG/testdb/datafile/soe.327.970157825'

Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216170/testdb2_smon_4005_i216170.trc

Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc (incident=216307):

ORA-01578: ORACLE data block corrupted (file # , block # )

Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216307/testdb2_p003_4094_i216307.trc

Completed: ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */

rman 检查坏块信息,发现有大量坏块,而且主要坏块类型是FRACTURED(折断)

SQL> select * from v$database_block_corruption;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO

---------- ---------- ---------- ------------------ ---------

12 126469 1 0 FRACTURED

12 112385 1 0 FRACTURED

1471173 1 0 FRACTURED

943523 1 0 FRACTURED

1544801 1 0 FRACTURED

1544855 1 0 FRACTURED

1544862 1 0 FRACTURED

1156252 1 0 FRACTURED

12 116481 1 0 CHECKSUM

12 116535 1 0 FRACTURED

1546903 1 0 FRACTURED

12 108289 1 0 FRACTURED

12 108343 1 0 FRACTURED

895872 1 0 FRACTURED

899712 1 0 FRACTURED

8 102783 1 0 FRACTURED

8 104974 1 0 FRACTURED

8 105856 1 0 FRACTURED

8 105998 1 0 FRACTURED

8 109120 1 0 FRACTURED

8 110016 1 0 FRACTURED

8 110092 1 0 FRACTURED

8 116226 1 0 FRACTURED

8 116231 1 0 FRACTURED

8 119295 1 0 FRACTURED

8 130559 1 0 FRACTURED

12 113163 1 0 FRACTURED

12 118283 1 0 FRACTURED

12 119553 1 0 FRACTURED

12 119608 1 0 FRACTURED

12 120576 1 0 FRACTURED

12 120632 1 0 FRACTURED

12 120639 1 0 FRACTURED

12 123407 1 0 FRACTURED

938411 1 0 FRACTURED

967647 1 0 FRACTURED

9 109068 1 0 FRACTURED

9 109070 1 0 FRACTURED

1199839 1 0 FRACTURED

11 104960 1 0 FRACTURED

1363214 1 0 FRACTURED

1365097 1 0 FRACTURED

1371183 1 0 FRACTURED

1377319 1 0 FRACTURED

13 103104 1 0 FRACTURED

13 107072 1 0 FRACTURED

13 110082 1 0 FRACTURED

13 111106 1 0 FRACTURED

13 114752 1 0 FRACTURED

1472196 1 0 FRACTURED

1475275 1 0 FRACTURED

776293 1 0 FRACTURED

776296 1 0 FRACTURED

776415 1 0 FRACTURED

776864 1 0 FRACTURED

1549976 1 0 FRACTURED

1581413 1 0 FRACTURED

261512 1 0 FRACTURED

1935338 1 0 FRACTURED

1942687 1 0 FRACTURED

60 rows selected.

对比实验室和客户环境配置

实验室环境

[oracle@i-q2ghx82t ~]$ more /etc/issue

CentOS release 6.8 (Final)

Kernel \r on an \m

SQL> select path from v$asm_disk;

PATH

----------------------------------------

/dev/asm-disk011

/dev/asm-disk001

/dev/asm-disk014

/dev/asm-disk002

/dev/asm-disk003

/dev/asm-disk015

/dev/asm-disk012

/dev/asm-disk013

8 rows selected.

[root@i-q2ghx82t ~]# fdisk -l|grep vd

Disk /dev/vda: 107.4 GB, 107374182400 bytes

/dev/vda1 * 1 13055 104856576 83 Linux

Disk /dev/vdb: 34.4 GB, 34359738368 bytes

Disk /dev/vdc: 107.4 GB, 107374182400 bytes

Disk /dev/vdd: 268.4 GB, 268435456000 bytes

/dev/vdd1 1 32635 262140606 83 Linux

Disk /dev/vde: 1073.7 GB, 1073741824000 bytes

Disk /dev/vdf: 53.7 GB, 53687091200 bytes

/dev/vdf1 1 104025 52428568+ 83 Linux

Disk /dev/vdg: 107.4 GB, 107374182400 bytes

Disk /dev/vdh: 1073.7 GB, 1073741824000 bytes

Disk /dev/vdi: 107.4 GB, 107374182400 bytes

Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes

Disk /dev/vdk: 1073.7 GB, 1073741824000 bytes

Disk /dev/vdl: 1073.7 GB, 1073741824000 bytes

[oracle@i-q2ghx82t ~]$ ls -l /dev/asm-*

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk001 -> vdg

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk002 -> vdc

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk003 -> vdi

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk011 -> vde

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk012 -> vdh

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk013 -> vdl

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk014 -> vdj

lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk015 -> vdk

[root@i-q2ghx82t ~]# fdisk -l /dev/vdj

Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes

16 heads, 63 sectors/track, 2080507 cylinders

Units = cylinders of 1008 * 512 = 516096 bytes

Sector size (logical/physical): 512 bytes / 4096 bytes

I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk identifier: 0x00000000

客户环境

[oracle@dbtest02 ~]$ more /etc/issue

Red Hat Enterprise Linux Server release 6.6 (Santiago)

Kernel \r on an \m

SQL> select path from v$asm_disk;

PATH

----------------------------------------

/dev/oracleasm/disks/ARCHDISK01

/dev/oracleasm/disks/OCRDISK01

/dev/oracleasm/disks/DATADISK01

[root@dbtest02 ~]# oracleasm listdisks

ARCHDISK01

DATADISK01

OCRDISK01

[root@dbtest02 ~]# oracleasm querydisk -p DATADISK01

Disk "DATADISK01" is a valid ASM disk

/dev/vdc1: LABEL="DATADISK01" TYPE="oracleasm"

[root@dbtest02 ~]# oracleasm querydisk -p ARCHDISK01

Disk "ARCHDISK01" is a valid ASM disk

/dev/vde1: LABEL="ARCHDISK01" TYPE="oracleasm"

[root@dbtest02 ~]# oracleasm querydisk -p OCRDISK01

Disk "OCRDISK01" is a valid ASM disk

/dev/vdd1: LABEL="OCRDISK01" TYPE="oracleasm"

[root@dbtest02 ~]# fdisk -l|grep vd

Disk /dev/vda: 53.7 GB, 53687091200 bytes

/dev/vda1 1 6528 52427776 83 Linux

Disk /dev/vdb: 17.2 GB, 17179869184 bytes

Disk /dev/vdc: 1073.7 GB, 1073741824000 bytes

/dev/vdc1 1 2080507 1048575496+ 83 Linux

Disk /dev/vdd: 107.4 GB, 107374182400 bytes

/dev/vdd1 1 208050 104857168+ 83 Linux

Disk /dev/vde: 536.9 GB, 536870912000 bytes

/dev/vde1 1 1040253 524287480+ 83 Linux

Disk /dev/vdf: 859.0 GB, 858993459200 bytes

/dev/vdf1 1 1664406 838860592+ 83 Linux

[root@dbtest02 ~]# fdisk -l /dev/vdf

Disk /dev/vdf: 859.0 GB, 858993459200 bytes

16 heads, 63 sectors/track, 1664406 cylinders

Units = cylinders of 1008 * 512 = 516096 bytes

Sector size (logical/physical): 512 bytes / 4096 bytes

I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk identifier: 0x1bdbf439

Device Boot Start End Blocks Id System

/dev/vdf1 1 1664406 838860592+ 83 Linux

Partition 1 does not start on physical sector boundary.

这里比较明显:1)实验室环境使用的是udev,而客户环境使用的是asmlib.2)客户环境使用了分区,而实验室环境直接使用裸盘.

尝试在实验室环境使用asmlib

[root@i-q2ghx82t tmp]# ls -l *oracleasm*

-rw-r--r-- 1 root root 36288 Mar 10 13:09 kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm

-rw-r--r-- 1 root root 17168 Mar 10 13:09 oracleasmlib-2.0.12-1.el6.x86_64.rpm

-rw-r--r-- 1 root root 74984 Mar 10 13:09 oracleasm-support-2.1.8-1.el6.x86_64.rpm

[root@i-q2ghx82t tmp]# rpm -ivh kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm

warning: kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY

Preparing... ########################################### [100%]

1:kmod-oracleasm ########################################### [100%]

[root@i-q2ghx82t tmp]# rpm -ivh oracleasm*.rpm

warning: oracleasmlib-2.0.12-1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY

Preparing... ########################################### [100%]

1:oracleasm-support ########################################### [ 50%]

2:oracleasmlib ########################################### [100%]

[root@i-q2ghx82t tmp]#

[root@i-q2ghx82t tmp]# rpm -aq|grep oracleasm

kmod-oracleasm-2.0.8-13.el6_8.x86_64

oracleasmlib-2.0.12-1.el6.x86_64

oracleasm-support-2.1.8-1.el6.x86_64

[root@i-q2ghx82t dev]# oracleasm createdisk ASMVDL /dev/vdl

Device "/dev/vdl" is not a partition

测试比较明显,对于没有分区的分布式存储中的盘,无法直接被asmlib创建asm disk.如果要使用asmlib,需要对磁盘进行分区.

存储特征

通过咨询分布式存储开发,确认几点:

0. 整个存储是基于网络技术实现分布式存储内部数据同步

1. 整个分布式存储使用ssd卡,默认三副本

2. 三副本中有两个副本写入成功才会反馈给应用(数据库),io成功

3. 整个分布式存储,没有引入任何的cache.

4. 存储的最小io单元是4k,由于ssd卡决定的

5. 分布式存储划分出来的lun是自动实现快对齐的

故障原因

由于存储最小单元是4k,但是在os层面物理扇区为4k,逻辑扇区为512,客户为了使用asmlib,对磁盘进行了默认分区,而没有考虑块对齐.会使得在存储整体crash的时候,很多block由于没有块对齐,很可能被进行了拆分(本来一个4k的io,存储上对对应的一个原子io完成,但是现在这个os层面的4k被拆分成了多个io的可能性,使得os层面的4k的io无法有了原则性),进而使得在数据库主机,存储突发异常的时候,有可能发生坏块.

处理建议

1. 在linux 6开始,对于asmlib,建议使用udev

2. 如果坚持asmlib,分区的时候考虑块对齐,不然会出现较大的性能下降,而且还会引起坏块的风险

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值