A Case Study: Database Outage Caused by Labeling Disks with ASMLIB on Oracle RAC

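While collecting information about the HIS database environment at a hospital, I found that this Oracle RAC database was not actually using the multipath devices provided by multipath. Out of responsibility to the customer and our partners, this post describes what was found and offers some suggestions for resolving it.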

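System environment:

Operating system: Linux

Database: Oracle 19c

The shared disks were created with ASMLIB. The relevant information is shown below (using DATA01 as the example):

Output of the multipath -ll command: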

DATA01 (3600507680c80833288000000000000ca) dm-3 IBM   ,2145      
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:0 sdb       8:16 active ready running
|- 16:0:0:0 sdae       65:224 active ready running
|- 7:0:1:0 sdi       8:128 active ready running
|- 16:0:2:0 sdas       66:192 active ready running
|- 7:0:2:0 sdp       8:240 active ready running
|- 16:0:3:0 sdar       66:176 active ready running
|- 7:0:3:0 sdw       65:96 active ready running
`- 16:0:1:0 sdad       65:208 active ready running
OCR03 (3600507680c80833288000000000000e3) dm-9 IBM   ,2145      
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:6 sdh       8:112 active ready running
|- 16:0:0:6 sdak       66:64 active ready running
|- 7:0:1:6 sdo       8:224 active ready running
|- 16:0:2:6 sday       67:32 active ready running
|- 7:0:2:6 sdv       65:80 active ready running
|- 16:0:3:6 sdbe       67:128 active ready running
|- 7:0:3:6 sdac       65:192 active ready running
`- 16:0:1:6 sdaq       66:160 active ready running
OCR02 (3600507680c80833288000000000000e2) dm-8 IBM   ,2145      
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:5 sdg       8:96 active ready running
|- 16:0:0:5 sdaj       66:48 active ready running
|- 7:0:1:5 sdn       8:208 active ready running
|- 16:0:2:5 sdax       67:16 active ready running
|- 7:0:2:5 sdu       65:64 active ready running
|- 16:0:3:5 sdbd       67:112 active ready running
|- 7:0:3:5 sdab       65:176 active ready running
`- 16:0:1:5 sdap       66:144 active ready running
OCR01 (3600507680c80833288000000000000e1) dm-7 IBM   ,2145      
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:4 sdf       8:80 active ready running
|- 16:0:0:4 sdai       66:32 active ready running
|- 7:0:1:4 sdm       8:192 active ready running
|- 16:0:2:4 sdaw       67:0 active ready running
|- 7:0:2:4 sdt       65:48 active ready running
|- 16:0:3:4 sdbc       67:96 active ready running
|- 7:0:3:4 sdaa       65:160 active ready running
`- 16:0:1:4 sdao       66:128 active ready running
DATA04 (3600507680c80833288000000000000cd) dm-6 IBM   ,2145      
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:3 sde       8:64 active ready running
|- 16:0:0:3 sdah       66:16 active ready running
|- 7:0:1:3 sdl       8:176 active ready running
|- 16:0:2:3 sdav       66:240 active ready running
|- 7:0:2:3 sds       65:32 active ready running
|- 16:0:3:3 sdbb       67:80 active ready running
|- 7:0:3:3 sdz       65:144 active ready running
`- 16:0:1:3 sdan       66:112 active ready running
DATA03 (3600507680c80833288000000000000cc) dm-5 IBM   ,2145      
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:2 sdd       8:48 active ready running
|- 16:0:0:2 sdag       66:0 active ready running
|- 7:0:1:2 sdk       8:160 active ready running
|- 16:0:2:2 sdau       66:224 active ready running
|- 7:0:2:2 sdr       65:16 active ready running
|- 16:0:3:2 sdba       67:64 active ready running
|- 7:0:3:2 sdy       65:128 active ready running
`- 16:0:1:2 sdam       66:96 active ready running
DATA02 (3600507680c80833288000000000000cb) dm-4 IBM   ,2145      
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:1 sdc       8:32 active ready running
|- 16:0:0:1 sdaf       65:240 active ready running
|- 7:0:1:1 sdj         8:144 active ready running
|- 16:0:2:1 sdat       66:208 active ready running
|- 7:0:2:1 sdq       65:0 active ready running
|- 16:0:3:1 sdaz       67:48 active ready running
|- 7:0:3:1 sdx         65:112 active ready running
`- 16:0:1:1 sdal       66:80 active ready running


Contents of the multipath.conf configuration file:

defaults {
polling_interval 10
path_selector   "round-robin 0"
path_grouping_policy multibus
uid_attribute   ID_SERIAL
prio     alua
path_checker   readsector0
rr_min_io   100
max_fds     8192
rr_weight   priorities
failback   immediate
no_path_retry   fail
user_friendly_names yes
}
multipaths {
multipath {      
  wwid     3600507680c80833288000000000000ca
  alias     DATA01
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000cb
  alias     DATA02
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000cc
  alias     DATA03
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000cd
  alias     DATA04
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000e1
  alias     OCR01
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000e2
  alias     OCR02
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
multipath {
  wwid     3600507680c80833288000000000000e3
  alias     OCR03
  path_grouping_policy multibus
  path_selector   "round-robin 0"
  failback   manual
  rr_weight   priorities
  no_path_retry   5
}
}
blacklist {
wwid 36f4ee08029ce6e002a007008d5711e3b
}

ASM information:

col path for a30
col name for a10
set line 300
SQL> select mode_status,name,state,path from v$asm_disk;
MODE_STATUS     NAME   STATE       PATH
--------------------- ---------- ------------------------ ---------------------------
ONLINE       OCR_0000 NORMAL     /dev/oracleasm/disks/OCR01
ONLINE       OCR_0001 NORMAL     /dev/oracleasm/disks/OCR02
ONLINE       OCR_0002 NORMAL     /dev/oracleasm/disks/OCR03
ONLINE       DATA_0003 NORMAL     /dev/oracleasm/disks/DATA04
ONLINE       DATA_0002 NORMAL     /dev/oracleasm/disks/DATA03
ONLINE       DATA_0001 NORMAL     /dev/oracleasm/disks/DATA02
ONLINE       DATA_0000 NORMAL     /dev/oracleasm/disks/DATA01
7 rows selected.
SQL> show parameter asm_disk
NAME         TYPE         VALUE
------------------------------------ --------------------------------- ------------
asm_diskgroups       string         DATA
asm_diskstring       string         /dev/oracleasm/disks/*

[Image]

[Image]

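As shown, all eight disks carry the label DATA01. sdb, sdi, sdp, sdw, sdae, sdar, sdas, and sdad are the individual single-path devices, and together these eight paths make up DATA01. The device currently in use is [8, 16], which is sdb.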

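Checking the major and minor device numbers to see which device under /dev/ each ASMLIB disk maps to shows that the disks under /dev/oracleasm/disks do not correspond to the /dev/dm-xx devices. This largely confirms that ASMLIB is not using the multipath devices.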
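The major/minor check described above can be sketched in Python. This is a minimal illustration, assuming the multipath -ll output has the layout shown earlier; the helper names index_paths and classify are hypothetical, and SAMPLE is a shortened copy of the DATA01 entry.

```python
import re

# Matches path lines such as "|- 7:0:0:0 sdb       8:16 active ready running".
PATH_LINE = re.compile(r"^\s*[|`]-\s+\d+:\d+:\d+:\d+\s+(\S+)\s+(\d+):(\d+)\s")
# Matches map header lines such as "DATA01 (3600...ca) dm-3 IBM   ,2145".
MAP_LINE = re.compile(r"^(\S+)\s+\((\w+)\)\s+(dm-\d+)\s")


def index_paths(multipath_output):
    """Return {(major, minor): (alias, dm_name, sd_name)} for every path line."""
    index = {}
    current = None
    for line in multipath_output.splitlines():
        header = MAP_LINE.match(line)
        if header:
            current = (header.group(1), header.group(3))
            continue
        path = PATH_LINE.match(line)
        if path and current:
            key = (int(path.group(2)), int(path.group(3)))
            index[key] = (current[0], current[1], path.group(1))
    return index


def classify(major, minor, index):
    """Describe what a block device number refers to."""
    if (major, minor) in index:
        alias, dm_name, sd_name = index[(major, minor)]
        return f"{sd_name} is a single path of {alias} ({dm_name})"
    return "not a known single path (possibly the dm device itself)"


# Shortened sample taken from the DATA01 entry above.
SAMPLE = """\
DATA01 (3600507680c80833288000000000000ca) dm-3 IBM   ,2145
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:0 sdb       8:16 active ready running
`- 16:0:1:0 sdad       65:208 active ready running
"""

if __name__ == "__main__":
    print(classify(8, 16, index_paths(SAMPLE)))
```

On a live system, the major and minor numbers of a file under /dev/oracleasm/disks could be read with os.stat(path).st_rdev together with os.major and os.minor, then passed to classify.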

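ASMLIB works by giving a disk a name, such as "DATA01", and writing that name into the disk header. At boot, /etc/rc.d/init.d/oracleasm start runs automatically and scans the disks, reading back the names written earlier. With multipathing, several device names under /dev/ refer to the same physical disk: the /dev/sdxxx devices are the individual paths, and /dev/dm-xx is the single device that merges those paths. Oracle expects ASMLIB to use the /dev/dm-xx device, but ASMLIB's scan rule is to use the first device it finds. Any device scanned later that carries the same name is ignored, and the earlier device name stays in use. As a result, if the one path in use fails, the ASM disk goes offline with it, which can lead to datafile corruption or an outage. Oracle's own documentation describes this issue: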
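The first-match rule can be illustrated with a toy model. This sketch is not ASMLIB's actual implementation; it only mimics the behavior described above, assuming that the scan-order and scan-exclude settings match on device-name prefixes. The function scan and the DEVICES list are hypothetical.

```python
def scan(devices, scan_order=(), scan_exclude=()):
    """Simulate a first-match-wins label scan.

    devices: list of (device_name, label) in default discovery order.
    scan_order: name prefixes to scan first, in the given order.
    scan_exclude: name prefixes to skip entirely.
    Returns {label: device_name}.
    """
    def excluded(name):
        return any(name.startswith(prefix) for prefix in scan_exclude)

    def rank(name):
        for i, prefix in enumerate(scan_order):
            if name.startswith(prefix):
                return i
        return len(scan_order)

    candidates = [d for d in devices if not excluded(d[0])]
    # sorted() is stable, so devices with equal rank keep discovery order.
    ordered = sorted(candidates, key=lambda d: rank(d[0]))
    chosen = {}
    for name, label in ordered:
        chosen.setdefault(label, name)  # later duplicates are ignored
    return chosen


# Two single paths and the merged multipath device, all labeled DATA01.
DEVICES = [("sdb", "DATA01"), ("sdi", "DATA01"), ("dm-3", "DATA01")]

if __name__ == "__main__":
    print(scan(DEVICES))  # {'DATA01': 'sdb'}
    print(scan(DEVICES, scan_order=("dm",), scan_exclude=("sd",)))  # {'DATA01': 'dm-3'}
```

With no ordering, the single path sdb wins because it is found first. Scanning dm devices first and excluding sd devices makes the merged device win instead.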

Metalink Note "How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks?" [ID 602952.1]

ASMLIB Installation & Configuration On MultiPath Mapper Devices (Step by Step Demo) On RAC Or Standalone Configurations. (Doc ID 1594584.1)

The recommendation is to set ORACLEASM_SCANORDER and ORACLEASM_SCANEXCLUDE in the /etc/sysconfig/oracleasm configuration file (oracleasm-_dev_oracleasm) so that ASMLIB finds the correct device files.

Configure ASMLIB to use multipath (on each node in RAC environments):

ASMLIB can find the disks through any path, but the best path to use is the multipath device:

Modify in /etc/sysconfig/oracleasm :
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

note: The Oracle ASMLib configuration file is located at /etc/sysconfig/oracleasm. It is a link to file /etc/sysconfig/oracleasm-_dev_oracleasm.

Restart ASMLIB (from each node on RAC environments):
/etc/init.d/oracleasm stop
/etc/init.d/oracleasm start
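Before restarting, it may help to confirm that both settings are present and quoted with plain ASCII quotes, since typographic quotes pasted from a web page become part of the value. The following is a minimal sketch, assuming the file uses simple KEY=value lines; parse_sysconfig and check are hypothetical helpers, not part of ASMLIB.

```python
import shlex

# Values recommended in the text above.
REQUIRED = {"ORACLEASM_SCANORDER": "dm", "ORACLEASM_SCANEXCLUDE": "sd"}


def parse_sysconfig(text):
    """Parse shell-style KEY=value lines into a dict, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            parts = shlex.split(value)  # strips ASCII quotes only
        except ValueError:
            parts = [value]  # unbalanced quotes: keep the raw text
        settings[key.strip()] = " ".join(parts)
    return settings


def check(text):
    """Return a list of problems; an empty list means both settings match."""
    settings = parse_sysconfig(text)
    problems = []
    for key, wanted in REQUIRED.items():
        actual = settings.get(key)
        if actual is None or wanted not in actual.split():
            problems.append(f"{key}={actual!r}, expected to contain {wanted!r}")
    return problems


if __name__ == "__main__":
    with open("/etc/sysconfig/oracleasm") as handle:
        for problem in check(handle.read()):
            print(problem)
```

A value written with curly quotes is reported as a problem, because the quote characters are kept as part of the value instead of being stripped.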

References:

http://www.help2ora.com/index.php/2011/08/16/how-to-setup-asm-asmlib-on-native-linux-multipath-mapper-disks/

https://www.oracle.com/linux/technologies/multipath-disks.html

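Where possible, bind the disks with udev instead. Oracle also recommends udev on RedHat/OEL 5 and later. With ASMLIB, every Linux kernel update requires a matching new ASMLIB package, which means ASMLIB takes time to maintain and may introduce unknown bugs.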
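As a rough illustration of the udev approach, the sketch below generates one rule per multipath device from the WWIDs in the multipath.conf shown earlier. The owner grid, the group asmadmin, and the asm- symlink prefix are assumptions chosen for the example, not values taken from this environment, and the rule format should be checked against the udev version in use.

```python
# WWIDs copied from the multipaths section of multipath.conf above.
MULTIPATHS = {
    "DATA01": "3600507680c80833288000000000000ca",
    "OCR01": "3600507680c80833288000000000000e1",
}


def udev_rule(alias, wwid, owner="grid", group="asmadmin"):
    """Build one udev rule line matching a multipath device by its DM_UUID.

    owner, group, and the "asm-" symlink prefix are illustrative assumptions.
    """
    return (
        f'KERNEL=="dm-*", ENV{{DM_UUID}}=="mpath-{wwid}", '
        f'SYMLINK+="asm-{alias.lower()}", '
        f'OWNER="{owner}", GROUP="{group}", MODE="0660"'
    )


if __name__ == "__main__":
    for alias, wwid in MULTIPATHS.items():
        print(udev_rule(alias, wwid))
```

Because each rule matches on the device-mapper UUID, it applies only to the merged multipath device and never to a single /dev/sdxxx path. With this approach asm_diskstring would need to point at the new symlinks instead of /dev/oracleasm/disks/*.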

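You are also welcome to follow my WeChat official account 【徐sir的IT之路】 so we can learn together!
----------------------------------------
WeChat official account: 徐sir的IT之路
CSDN: https://blog.csdn.net/xxddxhyz?type=blog
墨天轮: https://www.modb.pro/u/3605
PGFANS: https://www.pgfans.cn/user/home?userId=5568
----------------------------------------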
