After the customer rebooted both RAC machines, rac101 failed to start properly.
[root@bogon crsd]# crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....o1.inst application 0/5 0/0 ONLINE OFFLINE
ora....o2.inst application 0/5 0/0 ONLINE ONLINE rac102
ora....uo2.srv application 0/1 0/0 ONLINE UNKNOWN rac101
ora....rver.cs application 0/1 0/1 ONLINE UNKNOWN rac101
ora.benguo.db application 0/1 0/1 ONLINE ONLINE rac101
ora....SM1.asm application 0/5 0/0 ONLINE UNKNOWN rac101
ora....01.lsnr application 0/5 0/0 ONLINE UNKNOWN rac101
ora.rac101.gsd application 0/5 0/0 ONLINE UNKNOWN rac101
ora.rac101.ons application 0/3 0/0 ONLINE UNKNOWN rac101
ora.rac101.vip application 0/0 0/0 ONLINE ONLINE rac101
ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac102
ora....02.lsnr application 0/5 0/0 ONLINE ONLINE rac102
ora.rac102.gsd application 0/5 0/0 ONLINE ONLINE rac102
ora.rac102.ons application 0/3 0/0 ONLINE ONLINE rac102
ora.rac102.vip application 0/0 0/0 ONLINE ONLINE rac102
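Most of the resources on rac101 are in the UNKNOWN state. Clusterware and the database at 10.2.0.1 are fairly buggy to begin with, and resources stuck in UNKNOWN are something you run into often.
So the first attempt was to stop CRS and then restart it: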
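A quick way to pull just the UNKNOWN resources out of a long listing is a small filter. This is only a sketch: `unknown_resources` is a hypothetical helper name, and it assumes the seven-column layout printed by `crs_stat -t -v` above (State in column 6, Host in column 7).

```shell
# Hypothetical helper: read `crs_stat -t -v` style output on stdin and print
# the name and host of every resource whose State column is UNKNOWN.
# Assumes the column order: Name Type R/RA F/FT Target State Host.
unknown_resources() {
    awk '$1 ~ /^ora\./ && $6 == "UNKNOWN" { print $1, $7 }'
}

# Intended usage on a cluster node (not run here):
#   crs_stat -t -v | unknown_resources
```

If every UNKNOWN row points at the same host, as it does here with rac101, the problem is more likely with that node than with the individual resources.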
[root@bogon crsd]# crsctl stop crs
clsz init failed while trying to stop resources.
Possible cause: CRSD is down.
Failure at scls_scr_create with code 1
Internal Error Information:
Category: 1234
Operation: scls_scr_create
Location: mkdir
Other: Unable to make user dir
Dep: 2
[root@bogon crsd]# crsctl check crs
Failure 1 contacting CSS daemon
CRS appears healthy
EVM appears healthy
The CSS daemon cannot be contacted. Node rac102 starts normally, yet on rac101 the CSS process is in fact running.
[root@bogon cssd]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.0.2 rac101
192.168.0.3 rac102
192.168.1.2 priv101
192.168.1.3 priv102
192.168.0.12 vip101
192.168.0.13 vip102
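Look at the prompt: root@bogon. It turns out that after the customer replaced the motherboard on rac101, the machine no longer used rac101 as its hostname and took its name from the domain name instead. That is the root cause. Once the hostname had changed, the heartbeat between the nodes stopped working, and rac101 could no longer locate its CSS resources.
After CRS was starting normally again, another problem appeared.
Tomcat logged a large number of: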
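A mismatch like this can be caught before touching CRS by checking whether the machine's current hostname appears in `/etc/hosts` at all. The following is a minimal sketch; `host_in_hosts_file` is a hypothetical helper, and it only inspects a hosts-format file, not DNS or the Clusterware configuration.

```shell
# Hypothetical helper: succeed (exit 0) if NAME appears as a hostname or alias
# in a hosts-format file, fail (exit 1) otherwise. Comments are ignored.
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                                   # strip comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # skip the IP column
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Intended usage on a node (not run here):
#   host_in_hosts_file "$(hostname)" || echo "hostname $(hostname) is not in /etc/hosts"
```

With the `/etc/hosts` shown above, `rac101` would pass this check and `bogon` would fail it, which is the symptom seen in the prompt.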
ORA-12519, TNS:no appropriate service handler found
[root@rac101 ~]# ps -ef|grep lsn
oracle 10641 1 0 16:04 ? 00:00:00 /db/oracle10gasm/product/10.2.0/asm/bin/tnslsnr LISTENER_RAC101 -inherit
root 16629 15404 0 16:08 pts/2 00:00:00 grep lsn
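The listener that is running was started from the ASM home, so it only listens on behalf of the ASM instance, and naturally clients cannot connect. I killed this listener manually and then started the rac101 listener by hand.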
[oracle@rac101 ~]$ lsnrctl start
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 17-AUG-2012 16:09:20
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Starting /db/oracle10grac/product/10.2.0/db/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 10.2.0.1.0 - Production
System parameter file is /db/oracle10grac/product/10.2.0/db/network/admin/listener.ora
Log messages written to /db/oracle10grac/product/10.2.0/db/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac101)(PORT=1521)))
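To see at a glance which home each running listener was started from, the `ps` output can be reduced to the home directory plus the listener name. This is a sketch under assumptions: `listener_homes` is a hypothetical helper, and it assumes the `tnslsnr` binary lives under `<home>/bin/` and is followed by the listener name, as in the `ps -ef` output above.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and, for every tnslsnr
# process, print the home directory (binary path minus /bin/tnslsnr) followed
# by the argument after the binary, which is the listener name.
listener_homes() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\/bin\/tnslsnr$/) {
                home = $i
                sub(/\/bin\/tnslsnr$/, "", home)
                print home, $(i + 1)
            }
    }'
}

# Intended usage on a node (not run here):
#   ps -ef | listener_homes
```

A home ending in `asm` for `LISTENER_RAC101`, as in this incident, indicates the listener came from the ASM installation rather than the database installation.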
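My handling of RAC and ASM problems is still mostly theoretical. The lesson here is to check the host and the network first. I need to put a lot more work into high availability going forward!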
From "ITPUB Blog", link: http://blog.itpub.net/25362835/viewspace-1059205/. If you repost this, please credit the source; otherwise legal action may be pursued.