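Environment:
Oracle 11gR2 11.2.0.4 RAC
CentOS 7
The virtual machines were powered off directly without stopping the cluster services first, so after boot the CRS status was abnormal: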
[grid@hxdb1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online (this one is normal; it corresponds to systemctl status ohas.service, the Oracle High Availability Services)
CRS-4535: Cannot communicate with Cluster Ready Services (this is the crsd daemon)
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[grid@hxdb1 ~]$
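Errors like these are usually caused by an unreachable private interconnect, a HAIP problem, or missing or insufficient permissions on the ASM disks.
Start by checking the shared-disk device nodes (the path depends on how they were set up):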
[root@hxdb1 ~]# ll /dev/as*
ls: cannot access /dev/as*: No such file or directory
[root@hxdb1 ~]#
Gone? After the VM restart the shared disks are still there, but the generated ASM disk devices are missing:
[root@hxdb1 ~]# ll /dev/sd*
brw-rw---- 1 root disk 8, 0 Mar 6 11:48 /dev/sda
brw-rw---- 1 root disk 8, 1 Mar 6 11:48 /dev/sda1
brw-rw---- 1 root disk 8, 2 Mar 6 11:48 /dev/sda2
brw-rw---- 1 root disk 8, 16 Mar 6 11:48 /dev/sdb
brw-rw---- 1 root disk 8, 32 Mar 6 11:48 /dev/sdc
brw-rw---- 1 root disk 8, 48 Mar 6 11:48 /dev/sdd
brw-rw---- 1 root disk 8, 64 Mar 6 11:48 /dev/sde
brw-rw---- 1 root disk 8, 80 Mar 6 11:48 /dev/sdf
[root@hxdb1 ~]# ls
Check the status of the udev service,
then regenerate /etc/udev/rules.d/99-oracle-asmdevices.rules
using the script oracleasm.sh:
[root@hxdb1 ~]# cat oracleasm.sh
for i in b c d e f;
do
echo "KERNEL==\"sd$i\", ENV{DEVTYPE}==\"disk\", SUBSYSTEM==\"block\", PROGRAM==\"/usr/lib/udev/scsi_id -g -u -d \$devnode\", RESULT==\"`/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`\", RUN+=\"/bin/sh -c 'mknod /dev/asm-disk$i b \$major \$minor; chown grid:asmadmin /dev/asm-disk$i; chmod 0660 /dev/asm-disk$i'\"">> /etc/udev/rules.d/99-oracle-asmdevices.rules
done
[root@hxdb1 ~]#
[grid@hxdb1 ~]$ cat /etc/udev/rules.d/99-oracle-asmdevices.rules
KERNEL=="sdb", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29d6e509424ba7e32caefa52f45", RUN+="/bin/sh -c 'mknod /dev/asm-diskb b $major $minor; chown grid:asmadmin /dev/asm-diskb; chmod 0660 /dev/asm-diskb'"
KERNEL=="sdc", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29b1afe5632d6323d4963430e71", RUN+="/bin/sh -c 'mknod /dev/asm-diskc b $major $minor; chown grid:asmadmin /dev/asm-diskc; chmod 0660 /dev/asm-diskc'"
KERNEL=="sdd", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29455eee523d97d20c4b52b46e0", RUN+="/bin/sh -c 'mknod /dev/asm-diskd b $major $minor; chown grid:asmadmin /dev/asm-diskd; chmod 0660 /dev/asm-diskd'"
KERNEL=="sde", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29cf578b61b90916d80909b520b", RUN+="/bin/sh -c 'mknod /dev/asm-diske b $major $minor; chown grid:asmadmin /dev/asm-diske; chmod 0660 /dev/asm-diske'"
KERNEL=="sdf", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29ecadce24226261c9e038e2e15", RUN+="/bin/sh -c 'mknod /dev/asm-diskf b $major $minor; chown grid:asmadmin /dev/asm-diskf; chmod 0660 /dev/asm-diskf'"
[grid@hxdb1 ~]$
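Because oracleasm.sh appends with >>, running it a second time leaves duplicate rules in the file. The following is a minimal sketch of a variant that rebuilds the file from scratch. The function names make_asm_rule and generate_rules are hypothetical, and the device letters, owner, and group are taken from the rules shown above; adjust them to the actual setup.

```shell
#!/bin/bash
# Sketch only: function names are illustrative, not part of any Oracle tool.

# Print one udev rule line for a whole-disk device.
# $1 = device letter (e.g. b), $2 = WWID as reported by scsi_id.
make_asm_rule() {
    local letter=$1 wwid=$2
    printf 'KERNEL=="sd%s", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="%s", RUN+="/bin/sh -c '\''mknod /dev/asm-disk%s b $major $minor; chown grid:asmadmin /dev/asm-disk%s; chmod 0660 /dev/asm-disk%s'\''"\n' \
        "$letter" "$wwid" "$letter" "$letter" "$letter"
}

# Rebuild the rules file for the given device letters.
# $1 = output file, remaining arguments = device letters.
# Rules are collected in a temporary file first, so a failing scsi_id
# call leaves the existing rules file untouched.
generate_rules() {
    local out=$1 tmp letter wwid
    shift
    tmp=$(mktemp) || return 1
    for letter in "$@"; do
        wwid=$(/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device="/dev/sd$letter")
        if [ -z "$wwid" ]; then
            echo "no WWID for /dev/sd$letter" >&2
            rm -f "$tmp"
            return 1
        fi
        make_asm_rule "$letter" "$wwid" >> "$tmp"
    done
    # Copy rather than move, so the target gets the default attributes
    # of the rules directory instead of those of the temporary file.
    cp "$tmp" "$out" && chmod 0644 "$out"
    rm -f "$tmp"
}
```

Run as root, a call such as generate_rules /etc/udev/rules.d/99-oracle-asmdevices.rules b c d e f should produce the same five lines as shown above, no matter how often it is repeated.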
Then re-read the partition tables, reload the udev rules, and re-trigger the device events:
/sbin/partprobe /dev/sdb
/sbin/partprobe /dev/sdc
/sbin/partprobe /dev/sdd
/sbin/partprobe /dev/sde
/sbin/partprobe /dev/sdf
/usr/sbin/udevadm control --reload-rules
systemctl restart systemd-udevd.service
systemctl status systemd-udevd.service
systemctl enable systemd-udevd.service
/sbin/udevadm trigger --type=devices --action=change
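Run all of these commands once; if the ASM devices still do not show up, run them a few more times, then check the ASM disks and the CRS status again: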
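The "run it again until it works" step can be wrapped in a small retry loop. This is only a sketch: retry_until, reload_udev, and asm_disks_present are hypothetical helper names, and the device letters b to f are assumed from the rules file above.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative.

# retry_until MAX DELAY CHECK ACTION [ARGS...]
# Runs ACTION, then CHECK, up to MAX times, sleeping DELAY seconds
# between attempts. Returns 0 as soon as CHECK succeeds, 1 otherwise.
retry_until() {
    local max=$1 delay=$2 check=$3 attempt=1
    shift 3
    while [ "$attempt" -le "$max" ]; do
        "$@" || true
        "$check" && return 0
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1
}

# Reload the udev rules and replay change events for all devices.
reload_udev() {
    udevadm control --reload-rules
    udevadm trigger --type=devices --action=change
    udevadm settle
}

# Succeeds only if every expected ASM device node exists as a block device.
asm_disks_present() {
    local letter
    for letter in b c d e f; do
        [ -b "/dev/asm-disk$letter" ] || return 1
    done
}
```

As root, retry_until 5 2 asm_disks_present reload_udev would try up to five times with a two-second pause and return non-zero if the devices never appear.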
[grid@hxdb1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@hxdb1 ~]$
[root@hxdb2 ~]# su - grid
Last login: Sat Mar 6 11:49:44 CST 2021 on tty1
[grid@hxdb2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@hxdb2 ~]$
As shown above, node 1 and node 2 are both back to normal.
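To repeat this check on several nodes without reading the output by eye, the four "online" message codes seen above can be matched in a script. A minimal sketch follows; crs_all_online is a hypothetical helper name, and the codes are taken from the healthy output in this post.

```shell
#!/bin/bash
# Sketch only: crs_all_online is an illustrative helper name.

# Reads the output of "crsctl check crs" on stdin and succeeds only if
# all four "online" messages (HAS, CRS, CSS, EVM) are present.
crs_all_online() {
    local output code
    output=$(cat)
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        printf '%s\n' "$output" | grep -q "^$code:" || return 1
    done
}
```

Typical use as the grid user: crsctl check crs | crs_all_online && echo "CRS stack is online".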
How should the cluster be shut down under normal circumstances?