Today I ran a stress test on a 3-node RAC test database, and one node hung.
Another node also hit I/O errors.
After the reboot, CRS on rac3 would not come up.
1 Check whether CRS is running
[root@rac3 log]# ps -ef | grep -i crs
root 28137 1 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 31789 30006 0 11:29 pts/1 00:00:00 grep -i crs
[root@rac3 log]#
[root@rac3 log]#
[root@rac3 log]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[root@rac3 log]#
[root@rac3 log]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[root@rac3 log]# ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
root 31938 30006 0 11:30 pts/1 00:00:00 egrep crsd.bin|ocssd.bin|evmd.bin|oprocd
[root@rac3 log]#
[root@rac3 log]#
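Reading the `crsctl check crs` output by eye works, but the same decision can be scripted. This is a minimal sketch. It assumes the 10.2-style three-line output shown above ("appears healthy" vs. "Cannot communicate"). The function name `crs_stack_healthy` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl check crs` output on stdin and succeed
# only if CSS, CRS and EVM all report "appears healthy".
# Assumption: 10.2-style messages as in the transcript above.
crs_stack_healthy() {
    out=$(cat)
    for svc in CSS CRS EVM; do
        # Each daemon must have its own "<svc> appears healthy" line
        printf '%s\n' "$out" | grep -q "^$svc appears healthy" || return 1
    done
    return 0
}

# Usage on a live node (as root):
#   crsctl check crs | crs_stack_healthy && echo "stack up" || echo "stack down"
```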
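2 Check that the OCR and voting disk devices are configured correctly.
Query the voting disks and the OCR, and check /etc/oracle/ocr.loc, the file that records the OCR devices (its contents are at the end of this step). Both queries succeed, yet a crsctl stop/start cycle still brings up no daemons: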
[root@rac3 log]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[root@rac3 log]# crsctl query css votedisk
0. 0 /dev/raw/raw3
1. 0 /dev/raw/raw4
2. 0 /dev/raw/raw5
located 3 votedisk(s).
[root@rac3 log]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 106564
Used space (kbytes) : 5444
Available space (kbytes) : 101120
ID : 1393012097
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw2
Device/File integrity check succeeded
Cluster registry integrity check succeeded
[root@rac3 log]# crsctl stop crs
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
[root@rac3 log]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@rac3 log]#
[root@rac3 log]# ps -ef |grep d.bin
root 607 30006 0 11:39 pts/1 00:00:00 grep d.bin
[root@rac3 log]# ps -ef |grep d.bin
root 609 30006 0 11:39 pts/1 00:00:00 grep d.bin
[root@rac3 log]# ps -ef |grep d.bin
root 618 30006 0 11:39 pts/1 00:00:00 grep d.bin
[root@rac3 log]# ps -ef |grep d.bin
root 620 30006 0 11:39 pts/1 00:00:00 grep d.bin
[root@rac3 log]# ps -ef |grep d.bin
root 896 30006 0 11:41 pts/1 00:00:00 grep d.bin
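You can replace the repeated `ps -ef | grep d.bin` calls above with a loop. This is a minimal sketch. It assumes the daemon names crsd.bin, ocssd.bin and evmd.bin seen in this post. The function name, its defaults (30 attempts, 10 seconds apart) and the `PS_CMD` override are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll the process list until crsd.bin, ocssd.bin and
# evmd.bin are all present, or give up after a fixed number of attempts.
# PS_CMD (hypothetical) lets you substitute another command for `ps -ef`.
wait_for_crs_daemons() {
    attempts=${1:-30}   # how many times to look
    interval=${2:-10}   # seconds to sleep between looks
    i=1
    while :; do
        # Capture the list once, so our own grep never appears in it
        procs=$(${PS_CMD:-ps -ef})
        missing=""
        for d in crsd.bin ocssd.bin evmd.bin; do
            printf '%s\n' "$procs" | grep -qF "$d" || missing="$missing $d"
        done
        if [ -z "$missing" ]; then
            echo "all CRS daemons running"
            return 0
        fi
        echo "attempt $i/$attempts: still missing:$missing"
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage after `crsctl start crs`:
#   wait_for_crs_daemons 30 10 || echo "CRS did not come up; check /tmp and the logs"
```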
[root@rac3 mapper]# cat /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE
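ocrcheck only proves the OCR devices are readable by whoever runs it, which is root here. It also helps to see who owns the devices listed in ocr.loc. This is a minimal sketch. It assumes the key=value layout shown above and GNU `stat` with `-c`. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print owner:group, octal mode and path for every
# OCR device named in ocr.loc (primary and mirror).
# Assumption: key=value lines as in /etc/oracle/ocr.loc above; GNU stat.
show_ocr_devices() {
    loc=${1:-/etc/oracle/ocr.loc}
    awk -F= '$1 == "ocrconfig_loc" || $1 == "ocrmirrorconfig_loc" { print $2 }' "$loc" |
    while read -r dev; do
        [ -n "$dev" ] || continue
        # %U owner, %G group, %a octal mode, %n file name
        stat -c '%U:%G %a %n' "$dev"
    done
}

# Usage (as root):  show_ocr_devices
# On rac3 before the fix this would have listed root:disk 660 for raw1 and raw2.
```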
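3 Check crsd.log. Nothing has been logged since the restart: the file was last written at 10:38, and its final entries predate the reboot, while the init scripts in step 4 were started at 11:17: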
[oracle@rac3 rac3]$ cd crsd/
[oracle@rac3 crsd]$ ls -al
total 76
drwxr-x--- 2 root oinstall 4096 May 3 19:03 .
drwxr-xr-t 8 root oinstall 4096 May 3 19:03 ..
-rw-r--r-- 1 root root 53171 May 11 10:38 crsd.log
[oracle@rac3 crsd]$ tail -n 50 crsd.log
2011-05-11 10:15:09.400: [ CRSRES][1484962144]0startRunnable: setting CLI values
2011-05-11 10:16:29.767: [ OCRUTL][1252067680]u_freem: mem passed is null
2011-05-11 10:32:25.140: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Could not join /home/oracle/oracle/product/10.2.0/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
2011-05-11 10:32:25.140: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Action Script. /home/oracle/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.rac3.vip!
(timeout=60)
2011-05-11 10:32:25.140: [ CRSAPP][1487063392]0CheckResource error for ora.rac3.vip error code = -2
2011-05-11 10:38:14.731: [ CRSEVT][1499654496]0CAAMonitorHandler :: 0:Could not join /home/oracle/oracle/product/10.2.0/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
2011-05-11 10:38:25.335: [ CRSEVT][1499654496]0CAAMonitorHandler :: 0:Action Script. /home/oracle/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.rac3.vip!
(timeout=60)
2011-05-11 10:38:25.335: [ CRSAPP][1499654496]0CheckResource error for ora.rac3.vip error code = -2
2011-05-11 10:38:33.140: [ CRSRES][1503856992]0In stateChanged, ora.rac.rac3.inst target is ONLINE
2011-05-11 10:38:33.141: [ CRSRES][1503856992]0ora.rac.rac3.inst on rac3 went OFFLINE unexpectedly
2011-05-11 10:38:33.141: [ CRSRES][1503856992]0StopResource: setting CLI values
2011-05-11 10:38:38.143: [ CRSRES][1503856992]0Attempting to stop `ora.rac.rac3.inst` on member `rac3`
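The `ls -al` mtime already gives this away. To check the same thing from the log content, pull the timestamp of the last real entry and compare it with the start time of the init.* processes in `ps -ef`. This is a minimal sketch. It assumes the crsd.log entry prefix shown above. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the timestamp of the last entry in a
# crsd.log-style file. Entries begin with "YYYY-MM-DD HH:MM:SS.mmm:";
# wrapped continuation lines (e.g. "(timeout=60)") do not, so skip them.
last_log_time() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}' "$1" |
        tail -n 1 |
        cut -c1-23   # "2011-05-11 10:38:38.143" is 23 characters
}

# Usage: last_log_time crsd.log
```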
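4 ps -ef shows the init scripts waiting in /etc/init.d/init.cssd startcheck, so look in /tmp. Each waiting check leaves its crsctl output there in a file named crsctl.<PID>, and the PIDs 28678, 28982 and 29201 match the three startcheck processes: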
[root@rac3 log]# ps -ef
root 28128 1 0 11:17 ? 00:00:00 /usr/bin/gdm-binary -nodaemon
root 28132 1 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 28135 1 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 28137 1 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 28678 28132 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 28982 28135 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 29026 28128 0 11:17 ? 00:00:00 /usr/bin/gdm-binary -nodaemon
root 29068 29026 0 11:17 ? 00:00:05 /usr/X11R6/bin/X :0 -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
root 29201 28137 0 11:17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
[root@rac3 mapper]# cd /tmp
[root@rac3 tmp]# ls -altr
-rw-r--r-- 1 oracle oinstall 148 May 11 12:34 crsctl.29201
-rw-r--r-- 1 oracle oinstall 148 May 11 12:34 crsctl.28982
-rw-r--r-- 1 oracle oinstall 148 May 11 12:34 crsctl.28678
[root@rac3 tmp]# cat crsctl.28678
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
[root@rac3 tmp]#
[root@rac3 tmp]#
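When several startcheck processes are waiting, you can dump all of their output files at once. This is a minimal sketch. It assumes the crsctl.<PID> naming seen above. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every crsctl.<PID> file in a directory
# (default /tmp) with a header naming the file and the PID suffix.
show_crsctl_traces() {
    dir=${1:-/tmp}
    for f in "$dir"/crsctl.*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '== %s (pid %s) ==\n' "$f" "${f##*.}"
        cat "$f"
    done
}

# Usage (as root): show_crsctl_traces
```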
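5 The error points to a permissions problem. The /tmp files are owned by oracle, so the check evidently runs as the oracle user. After the reboot, however, every /dev/raw device is root:disk with mode 660, which the oracle user cannot open (hence Permission denied [13]). ocrcheck and crsctl query css votedisk in step 2 succeeded only because they were run as root. Fix the ownership and permissions: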
[root@rac3 tmp]# cd /dev/raw
[root@rac3 raw]# ls -al
total 0
drwxr-xr-x 2 root root 180 May 11 11:17 .
drwxr-xr-x 11 root root 7620 May 11 11:17 ..
crw-rw---- 1 root disk 162, 1 May 11 11:17 raw1
crw-rw---- 1 root disk 162, 2 May 11 11:17 raw2
crw-rw---- 1 root disk 162, 3 May 11 11:17 raw3
crw-rw---- 1 root disk 162, 4 May 11 11:17 raw4
crw-rw---- 1 root disk 162, 5 May 11 11:17 raw5
crw-rw---- 1 root disk 162, 6 May 11 11:17 raw6
crw-rw---- 1 root disk 162, 7 May 11 11:17 raw7
[root@rac3 raw]# chown -R oracle:dba /dev/raw
[root@rac3 raw]# chmod -R 777 /dev/raw
[root@rac3 raw]#
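`chmod -R 777` gets the stack running, but it also lets every local user read and overwrite the OCR and voting disks. Below is a sketch of a tighter alternative, built on stated assumptions. Two device roles come from this post: raw1/raw2 as OCR (from ocr.loc) and raw3 to raw5 as voting disks (from `crsctl query css votedisk`). The third role, raw6/raw7 as ASM disks, is a guess. The owners and modes are also assumed values; check them against the Oracle Clusterware installation guide for your release. The raw device nodes were recreated as root:disk 660 at boot (note the 11:17 timestamps). Manual changes will therefore probably be lost at the next reboot unless they are also applied at boot time, for example from /etc/rc.d/rc.local.

```shell
#!/bin/sh
# Hypothetical helper: set per-role ownership and modes on the raw devices
# instead of `chmod -R 777 /dev/raw`. Roles and modes are ASSUMPTIONS:
#   raw1-raw2  OCR (per ocr.loc)                  root:oinstall   640
#   raw3-raw5  voting disks (per crsctl query)    oracle:oinstall 640
#   raw6-raw7  ASM disks (guess, not confirmed)   oracle:dba      660
# DRY_RUN=1 prints the commands instead of executing them.
fix_raw_perms() {
    dir=${1:-/dev/raw}
    for n in 1 2 3 4 5 6 7; do
        case $n in
            1|2)   owner=root:oinstall;   mode=640 ;;
            3|4|5) owner=oracle:oinstall; mode=640 ;;
            6|7)   owner=oracle:dba;      mode=660 ;;
        esac
        for cmd in "chown $owner" "chmod $mode"; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "$cmd $dir/raw$n"
            else
                # $cmd is left unquoted on purpose: it splits into command + argument
                $cmd "$dir/raw$n" || return 1
            fi
        done
    done
}

# Usage (as root): DRY_RUN=1 fix_raw_perms   # review the commands first
#                  fix_raw_perms             # then apply them
```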
6 Verify the new permissions and restart CRS:
[root@rac3 raw]# ls -al
total 0
drwxrwxrwx 2 oracle dba 180 May 11 11:17 .
drwxr-xr-x 11 root root 7620 May 11 11:17 ..
crwxrwxrwx 1 oracle dba 162, 1 May 11 11:17 raw1
crwxrwxrwx 1 oracle dba 162, 2 May 11 11:17 raw2
crwxrwxrwx 1 oracle dba 162, 3 May 11 11:17 raw3
crwxrwxrwx 1 oracle dba 162, 4 May 11 11:17 raw4
crwxrwxrwx 1 oracle dba 162, 5 May 11 11:17 raw5
crwxrwxrwx 1 oracle dba 162, 6 May 11 11:17 raw6
crwxrwxrwx 1 oracle dba 162, 7 May 11 11:17 raw7
[root@rac3 raw]#
[root@rac3 raw]# crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rac3 raw]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@rac3 raw]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[root@rac3 raw]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.rac.db application ONLINE ONLINE rac2
ora....c1.inst application ONLINE ONLINE rac1
ora....c2.inst application ONLINE ONLINE rac2
ora....c3.inst application ONLINE ONLINE rac3
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
ora....SM3.asm application ONLINE ONLINE rac3
ora....C3.lsnr application ONLINE ONLINE rac3
ora.rac3.gsd application ONLINE ONLINE rac3
ora.rac3.ons application ONLINE ONLINE rac3
ora.rac3.vip application ONLINE ONLINE rac3
[root@rac3 raw]# ps -ef |grep d.bin
oracle 11568 11563 0 12:38 ? 00:00:00 /home/oracle/oracle/product/10.2.0/crs/bin/evmd.bin
root 11773 10601 0 12:38 ? 00:00:02 /home/oracle/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 12310 11853 0 12:38 ? 00:00:00 /home/oracle/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
oracle 12454 11890 0 12:38 ? 00:00:01 /home/oracle/oracle/product/10.2.0/crs/bin/ocssd.bin
root 22908 30006 0 13:03 pts/1 00:00:00 grep d.bin
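With 19 resources, you can still scan the `crs_stat -t` table by eye. On a larger cluster, a filter that prints only resources whose State differs from their Target is handier. This is a minimal sketch. It assumes the five-column layout shown above. The names are the truncated forms that `crs_stat -t` prints (e.g. ora....c3.inst). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` output on stdin, skip the two
# header lines, print "name target state" for every resource whose State
# differs from its Target, and return 1 if any such resource was found.
crs_not_on_target() {
    awk 'NR > 2 && NF >= 4 && $3 != $4 { print $1, $3, $4; bad = 1 }
         END { exit bad }'
}

# Usage: crs_stat -t | crs_not_on_target && echo "all resources on target"
```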
From the ITPUB blog, link: http://blog.itpub.net/758322/viewspace-695008/. Please credit the source when republishing; otherwise legal action may be taken.