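After the customer replaced an HBA card and changed the Fibre Channel switch port, the database was later found to be down. Here is how the problem was handled.
Customer environment: two IBM p570 servers, OS level 6100-04-01-0944 (AIX 6.1), Oracle 10.2.0.4.
Logging in remotely, I found that on node 2 the filesystem holding the Oracle software was 100% full. Some process must have been writing like crazy and filled up the LV.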
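To see what is actually filling the filesystem, something like the sketch below can help. It is only a minimal example, assuming a POSIX shell with `find`, `du` and `sort`; the function name `largest_files` and the example path are illustrative and were not part of the original troubleshooting session.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# staying on one filesystem (-xdev). Sizes are in KB as reported by du -k.
largest_files() {
    dir=$1
    count=${2:-10}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Example (path assumed from the CRS home that appears in the logs):
# largest_files /oracle/product/10.2.0/crs/log 5
```

In a case like this one, the top entries would most likely be the CRS log files themselves.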
Looking at the CRS log, almost every entry looks like this:
2014-10-02 21:54:15.523: [ OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3980]=0x0
2014-10-02 21:54:15.523: [ OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3981]=0x0
2014-10-02 21:54:15.523: [ OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3982]=0x0
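Google turns up nothing for this error, so I went to MOS. There is not much there either, but I found a few similar issues that are described as a 10.2.0.4 bug.
Checking the CRS alert log first turned up the key information:
[crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.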
2014-10-02 22:32:28.215
[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.
So something is wrong with the disk.
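When the alert log is long, it can be quicker to pull out just the devices that CRS-1006 complains about. The following is a small sketch, not something run during the incident; the function name `ocr_inaccessible_devices` is made up for illustration, and the alert log path in the example is an assumption based on the log directory shown in the messages.

```shell
# Hypothetical helper: print each distinct device reported by CRS-1006
# in a CRS alert log, prefixed by how many times it was reported.
ocr_inaccessible_devices() {
    grep 'CRS-1006' "$1" |
        sed -n 's/.*The OCR location \([^ ]*\) is inaccessible.*/\1/p' |
        sort | uniq -c
}

# Example (file name assumed):
# ocr_inaccessible_devices /oracle/product/10.2.0/crs/log/jxsmdb2/alertjxsmdb2.log
```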
On node 2, the owner, group, and permissions of the disks, including hdisk2 and hdisk6, all look normal:
crw-rw---- 1 oracle oinstall 24, 10 Oct 03 09:55 /dev/rhdisk10
crw-rw---- 1 oracle oinstall 24, 11 Oct 03 09:55 /dev/rhdisk11
crw-rw---- 1 oracle oinstall 24, 12 Oct 03 09:48 /dev/rhdisk12
crw-rw---- 1 oracle oinstall 24, 13 Oct 03 09:48 /dev/rhdisk13
crw-rw---- 1 oracle oinstall 24, 14 Oct 03 09:47 /dev/rhdisk14
crw-rw---- 1 oracle oinstall 24, 15 Oct 03 09:45 /dev/rhdisk15
crw-rw---- 1 root oinstall 24, 2 Oct 03 09:55 /dev/rhdisk2
crw-rw---- 1 oracle oinstall 24, 3 Oct 03 09:55 /dev/rhdisk3
crw-rw---- 1 oracle oinstall 24, 4 Oct 03 09:55 /dev/rhdisk4
crw-rw---- 1 oracle oinstall 24, 5 Oct 03 09:55 /dev/rhdisk5
crw-rw---- 1 root oinstall 24, 6 Oct 03 09:55 /dev/rhdisk6
crw-rw---- 1 oracle oinstall 24, 7 Oct 03 09:55 /dev/rhdisk7
crw-rw---- 1 oracle oinstall 24, 8 Oct 03 09:30 /dev/rhdisk8
crw-rw---- 1 oracle oinstall 24, 9 Oct 03 09:09 /dev/rhdisk9
Now check node 1:
crw-rw---- 1 oracle system 24, 10 Oct 03 09:55 /dev/rhdisk10
crw-rw---- 1 oracle system 24, 11 Oct 03 09:55 /dev/rhdisk11
crw-rw---- 1 root system 24, 12 Oct 03 09:48 /dev/rhdisk12
crw-rw---- 1 root system 24, 13 Oct 03 09:48 /dev/rhdisk13
crw-rw---- 1 root system 24, 14 Oct 03 09:47 /dev/rhdisk14
crw-rw---- 1 root system 24, 15 Oct 03 09:45 /dev/rhdisk15
crw-rw---- 1 root system 24, 2 Oct 03 09:55 /dev/rhdisk2
crw-rw---- 1 root system 24, 3 Oct 03 09:55 /dev/rhdisk3
crw-rw---- 1 root system 24, 4 Oct 03 09:55 /dev/rhdisk4
crw-rw---- 1 root system 24, 5 Oct 03 09:55 /dev/rhdisk5
crw-rw---- 1 root system 24, 6 Oct 03 09:55 /dev/rhdisk6
crw-rw---- 1 root system 24, 7 Oct 03 09:55 /dev/rhdisk7
crw-rw---- 1 root system 24, 8 Oct 03 09:30 /dev/rhdisk8
crw-rw---- 1 root system 24, 9 Oct 03 09:09 /dev/rhdisk9
Change the owner and group of the disks on node 1 so that they match node 2:
crw-rw---- 1 oracle oinstall 24, 10 Oct 03 09:55 /dev/rhdisk10
crw-rw---- 1 oracle oinstall 24, 11 Oct 03 09:55 /dev/rhdisk11
crw-rw---- 1 oracle oinstall 24, 12 Oct 03 09:48 /dev/rhdisk12
crw-rw---- 1 oracle oinstall 24, 13 Oct 03 09:48 /dev/rhdisk13
crw-rw---- 1 oracle oinstall 24, 14 Oct 03 09:47 /dev/rhdisk14
crw-rw---- 1 oracle oinstall 24, 15 Oct 03 09:45 /dev/rhdisk15
crw-rw---- 1 root oinstall 24, 2 Oct 03 09:55 /dev/rhdisk2
crw-rw---- 1 oracle oinstall 24, 3 Oct 03 09:55 /dev/rhdisk3
crw-rw---- 1 oracle oinstall 24, 4 Oct 03 09:55 /dev/rhdisk4
crw-rw---- 1 oracle oinstall 24, 5 Oct 03 09:55 /dev/rhdisk5
crw-rw---- 1 root oinstall 24, 6 Oct 03 09:55 /dev/rhdisk6
crw-rw---- 1 oracle oinstall 24, 7 Oct 03 09:55 /dev/rhdisk7
crw-rw---- 1 oracle oinstall 24, 8 Oct 03 09:30 /dev/rhdisk8
crw-rw---- 1 oracle oinstall 24, 9 Oct 03 09:09 /dev/rhdisk9
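The ownership change can be scripted. The sketch below only prints the commands so they can be reviewed first. It assumes, based on the listings above, that hdisk2 and hdisk6 (the OCR disks) belong to root and every other RAC disk to oracle, all in group oinstall with mode 660. The function name `plan_rac_disk_ownership` is hypothetical.

```shell
# Print (do not run) the chown/chmod commands that would bring a node's raw
# devices in line with the node 2 listing. OCR disks are owned by root,
# all other RAC disks by oracle; the group is oinstall throughout.
plan_rac_disk_ownership() {
    ocr_disks=" 2 6 "   # hdisk numbers of the OCR disks, per the listing
    for i in "$@"; do
        case "$ocr_disks" in
            *" $i "*) owner=root ;;
            *)        owner=oracle ;;
        esac
        echo "chown $owner:oinstall /dev/rhdisk$i"
        echo "chmod 660 /dev/rhdisk$i"
    done
}

# Review the output, then pipe it to sh as root if it looks right:
# plan_rac_disk_ownership 2 3 4 5 6 7 8 9 10 11 12 13 14 15
```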
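But node 2 still would not come up and kept reporting the same error.
Next, compare the attributes of hdisk2 and hdisk6 on node 1 and node 2. Node 2 first: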
PCM PCM/friend/otherapdisk Path Control Module False
PR_key_value none Persistant Reserve Key Value True
algorithm fail_over Algorithm True
autorecovery no Path/Ownership Autorecovery True
clr_q no Device CLEARS its Queue on error True
cntl_delay_time 0 Controller Delay Time True
cntl_hcheck_int 0 Controller Health Check Interval True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x0 Logical Unit Number ID False
lun_reset_spt yes LUN Reset Supported True
max_retry_delay 60 Maximum Quiesce Time True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x200400a0b811758c FC Node Name False
pvid none Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
queue_depth 10 Queue DEPTH True
reassign_to 120 REASSIGN time out value True
reserve_policy no_reserve Reserve Policy True
rw_timeout 30 READ/WRITE time out value True
scsi_id 0x10300 SCSI ID False
start_timeout 60 START unit time out value True
unique_id 3E213600A0B800011758C0000C04C4BBE8D500F1815 FAStT03IBMfcp Unique device identifier False
ww_name 0x201500a0b811758c FC World Wide Name False
Now node 1:
PCM PCM/friend/otherapdisk Path Control Module False
PR_key_value none Persistant Reserve Key Value True
algorithm fail_over Algorithm True
autorecovery no Path/Ownership Autorecovery True
clr_q no Device CLEARS its Queue on error True
cntl_delay_time 0 Controller Delay Time True
cntl_hcheck_int 0 Controller Health Check Interval True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x0 Logical Unit Number ID False
lun_reset_spt yes LUN Reset Supported True
max_retry_delay 60 Maximum Quiesce Time True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x200400a0b811758c FC Node Name False
pvid none Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
queue_depth 10 Queue DEPTH True
reassign_to 120 REASSIGN time out value True
reserve_policy single_path Reserve Policy True
rw_timeout 30 READ/WRITE time out value True
scsi_id 0x10300 SCSI ID False
start_timeout 60 START unit time out value True
unique_id 3E213600A0B800011758C0000C04C4BBE8D500F1815 FAStT03IBMfcp Unique device identifier False
ww_name 0x201500a0b811758c FC World Wide Name False
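On node 1, hdisk2 and hdisk6 (the OCR disks) have reserve_policy set to single_path, although shared disks should not be reserved by one node (node 2 shows no_reserve). It later turned out that every RAC disk on node 1 was set this way.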
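Reading through the full attribute listing for every disk is tedious, so a quick per-disk summary helps to spot the mismatch. This is a minimal sketch that assumes the listings above come from `lsattr -El hdiskN`; the helper name `reserve_policy_of` is made up for illustration.

```shell
# Read `lsattr -El hdiskN` output on stdin and print the reserve_policy value.
reserve_policy_of() {
    awk '$1 == "reserve_policy" { print $2 }'
}

# On each node (AIX only; disk numbers taken from the listings above):
# for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
#     printf 'hdisk%s %s\n' "$i" "$(lsattr -El hdisk$i | reserve_policy_of)"
# done
```

Running the loop on both nodes and comparing the two outputs shows at a glance which disks still carry a reservation.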
Fix it right away.
As root:
for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do chdev -l hdisk$i -a reserve_policy=no_reserve
done
It turned out that hdisk2 and hdisk6 could not be changed because the devices were busy:
0514-062 Cannot perform the requested function because the
specified device is busy.
Removing the disk did not work either:
# rmdev -dl hdisk6
Method error (/usr/lib/methods/ucfgdevice):
0514-062 Cannot perform the requested function because the
specified device is busy.
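Since only some disks refuse the change, it is handy to have the loop report exactly which ones failed so they can be retried once nothing holds them open. The following is a sketch under assumptions, not what was run at the time: it relies on `chdev` returning a non-zero exit status on failure, and the `CHDEV` variable and the function name `set_no_reserve` are hypothetical, added only so the logic can be tried on a machine without AIX.

```shell
# Try to set reserve_policy=no_reserve on each disk and print the disks
# that could not be changed, so they can be retried after CRS is down.
# CHDEV is overridable (assumption for illustration) to allow a dry run.
CHDEV=${CHDEV:-chdev}
set_no_reserve() {
    for i in "$@"; do
        if ! "$CHDEV" -l "hdisk$i" -a reserve_policy=no_reserve >/dev/null 2>&1; then
            echo "hdisk$i"
        fi
    done
}

# As root on node 1:
# set_no_reserve 2 3 4 5 6 7 8 9 10 11 12 13 14 15
```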
My guess was that node 2 was holding the OCR disks, which is why none of my operations were allowed.
Check the CRS processes:
oracle 196786 155908 0 09:05:18 - 0:00 /oracle/product/10.2.0/crs/bin/oclsomon.bin
root 103266 102694 1 09:05:17 - 0:47 /oracle/product/10.2.0/crs/bin/crsd.bin reboot
oracle 107362 192550 0 09:05:19 - 0:05 /oracle/product/10.2.0/crs/bin/ocssd.bin
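I stopped CRS on node 1, but the CRS processes were still there. Worth noting: ever since the HBA on node 1 was replaced a few days earlier, running crsctl stop crs manually no longer seemed to work.
Rebooting node 1 did not help either: the disks still could not be changed and CRS still could not be stopped. So, as root, I simply disabled CRS autostart and rebooted both machines.
As the root user on all nodes: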
cd /etc/
# ./init.crs disable crs
After the reboot there were no CRS processes at all. I tried changing the disk attributes on node 1 again, and this time it worked.
# ps -ef |grep crs
root 102694 1 0 08:54:41 - 0:00 /bin/sh /etc/init.crsd run
root 151958 180262 0 08:59:54 pts/0 0:00 grep crs
# chdev -l hdisk2 -a reserve_policy=no_reserve
hdisk2 changed
# chdev -l hdisk6 -a reserve_policy=no_reserve
hdisk6 changed
#
Now start CRS on node 2:
# ./crsctl start crs
Check the CRS alert log:
[crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.
2014-10-02 22:32:28.215
[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.
2014-10-02 22:32:28.476
[crsd(164818)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.
2014-10-02 22:32:28.477
[crsd(164818)]CRS-1012:The OCR service started on node jxsmdb2.
2014-10-02 22:32:28.751
[crsd(164818)]CRS-1201:CRSD started on node jxsmdb2.
[cssd(70408)]CRS-1603:CSSD on node jxsmdb2 shutdown by user.
2014-10-03 09:05:23.615
[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk4. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.
2014-10-03 09:05:23.815
[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk3. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.
2014-10-03 09:05:23.815
[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.
[cssd(107362)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb2 .
2014-10-03 09:08:44.541
[evmd(99266)]CRS-1401:EVMD started on node jxsmdb2.
2014-10-03 09:08:44.585
[crsd(103266)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.
2014-10-03 09:08:44.586
[crsd(103266)]CRS-1012:The OCR service started on node jxsmdb2.
2014-10-03 09:08:46.874
[crsd(103266)]CRS-1201:CRSD started on node jxsmdb2.
2014-10-03 09:08:47.163
[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
2014-10-03 09:08:47.183
[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
2014-10-03 09:09:43.287
[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
2014-10-03 09:09:43.297
[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
2014-10-03 09:09:45.746
[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
Check crsd.log:
2014-10-03 09:05:19.356: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:19.357: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:20.702: [ COMMCRS][261]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))
2014-10-03 09:05:20.702: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:20.702: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:22.041: [ COMMCRS][263]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))
2014-10-03 09:05:22.041: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:22.041: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:23.380: [ COMMCRS][265]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))
2014-10-03 09:05:23.380: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:23.380: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:08:44.482: [ CLSVER][1]32Active Version from OCR:10.2.0.4.0
2014-10-03 09:08:44.482: [ CLSVER][1]32Active Version and Software Version are same
2014-10-03 09:08:44.485: [ CRSMAIN][1]32Initializing OCR
2014-10-03 09:08:44.491: [ OCRRAW][1]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)
2014-10-03 09:08:44.491: [ OCRRAW][1]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)
2014-10-03 09:08:44.574: [ OCRMAS][3352]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 2
2014-10-03 09:08:44.575: [ OCRRAW][3352]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)
2014-10-03 09:08:44.575: [ OCRRAW][3352]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)
2014-10-03 09:08:44.596: [ OCRMAS][3352]th_master: Deleted ver keys from cache (master)
2014-10-03 09:08:44.596: [ CRSD][1]32ENV Logging level for Module: allcomp 0
2014-10-03 09:08:44.597: [ CRSD][1]32ENV Logging level for Module: default 0
2014-10-03 09:08:44.598: [ CRSD][1]32ENV Logging level for Module: COMMCRS 0
2014-10-03 09:08:44.598: [ CRSD][1]32ENV Logging level for Module: COMMNS 0
2014-10-03 09:08:44.599: [ CRSD][1]32ENV Logging level for Module: CRSUI 0
2014-10-03 09:08:44.600: [ CRSD][1]32ENV Logging level for Module: CRSCOMM 0
2014-10-03 09:08:44.600: [ CRSD][1]32ENV Logging level for Module: CRSRTI 0
2014-10-03 09:08:44.601: [ CRSD][1]32ENV Logging level for Module: CRSMAIN 0
2014-10-03 09:08:44.602: [ CRSD][1]32ENV Logging level for Module: CRSPLACE 0
2014-10-03 09:08:44.603: [ CRSD][1]32ENV Logging level for Module: CRSAPP 0
2014-10-03 09:08:44.603: [ CRSD][1]32ENV Logging level for Module: CRSRES 0
2014-10-03 09:08:44.604: [ CRSD][1]32ENV Logging level for Module: CRSOCR 0
2014-10-03 09:08:44.605: [ CRSD][1]32ENV Logging level for Module: CRSTIMER 0
2014-10-03 09:08:44.605: [ CRSD][1]32ENV Logging level for Module: CRSEVT 0
2014-10-03 09:08:44.606: [ CRSD][1]32ENV Logging level for Module: CRSD 0
2014-10-03 09:08:44.607: [ CRSD][1]32ENV Logging level for Module: CLUCLS 0
2014-10-03 09:08:44.607: [ CRSD][1]32ENV Logging level for Module: CLSVER 0
2014-10-03 09:08:44.608: [ CRSD][1]32ENV Logging level for Module: OCRRAW 0
2014-10-03 09:08:44.609: [ CRSD][1]32ENV Logging level for Module: OCROSD 0
2014-10-03 09:08:44.609: [ CRSD][1]32ENV Logging level for Module: CSSCLNT 0
2014-10-03 09:08:44.610: [ CRSD][1]32ENV Logging level for Module: OCRAPI 0
2014-10-03 09:08:44.611: [ CRSD][1]32ENV Logging level for Module: OCRUTL 0
2014-10-03 09:08:44.612: [ CRSD][1]32ENV Logging level for Module: OCRMSG 0
2014-10-03 09:08:44.612: [ CRSD][1]32ENV Logging level for Module: OCRCLI 0
2014-10-03 09:08:44.613: [ CRSD][1]32ENV Logging level for Module: OCRCAC 0
2014-10-03 09:08:44.614: [ CRSD][1]32ENV Logging level for Module: OCRSRV 0
2014-10-03 09:08:44.614: [ CRSD][1]32ENV Logging level for Module: OCRMAS 0
2014-10-03 09:08:44.615: [ CRSMAIN][1]32Filename is /oracle/product/10.2.0/crs/crs/init/jxsmdb2.pid
2014-10-03 09:08:44.651: [ CRSMAIN][1]32Using Authorizer location: /oracle/product/10.2.0/crs/crs/auth/
[ clsdmt][8235]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=jxsmdb2DBG_CRSD))
2014-10-03 09:08:44.667: [ CRSMAIN][1]32Initializing RTI
2014-10-03 09:08:44.719: [CRSTIMER][8749]32Timer Thread Starting.
2014-10-03 09:08:44.740: [ CRSRES][1]32Parameter SECURITY = 1, running in USER Mode
2014-10-03 09:08:44.743: [ CRSMAIN][1]32Initializing EVMMgr
2014-10-03 09:08:44.942: [ COMMCRS][9006]clsc_connect: (1139c41d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2014-10-03 09:08:46.745: [ CRSMAIN][1]32CRSD locked during state recovery, please wait.
2014-10-03 09:08:46.824: [ CRSMAIN][1]32CRSD recovered, unlocked.
2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2014-10-03 09:08:46.855: [ CRSMAIN][1]32CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2014-10-03 09:08:46.873: [ CRSMAIN][1]32E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49896))
2014-10-03 09:08:46.873: [ CRSMAIN][1]32Starting Threads
2014-10-03 09:08:46.874: [ CRSMAIN][10292]32Starting runCommandServer for (UI = 1, E2E = 0). 0
2014-10-03 09:08:46.874: [ CRSMAIN][10549]32Starting runCommandServer for (UI = 1, E2E = 0). 1
2014-10-03 09:08:46.874: [ CRSMAIN][1]32CRS Daemon Started.
2014-10-03 09:08:46.888: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.901: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.911: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.925: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.934: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.942: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.950: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.958: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.966: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.974: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.983: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.991: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:46.999: [ CRSRES][1]32 startup = 1
2014-10-03 09:08:47.173: [ CRSRES][11834]32startRunnable: setting CLI values
2014-10-03 09:08:47.188: [ CRSRES][11834]32Attempting to start `ora.jxsmdb2.vip` on member `jxsmdb2`
2014-10-03 09:08:47.189: [ CRSRES][11577]32startRunnable: setting CLI values
2014-10-03 09:08:47.199: [ CRSRES][11577]32Attempting to start `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2`
2014-10-03 09:08:49.742: [ CRSRES][11834]32Start of `ora.jxsmdb2.vip` on member `jxsmdb2` succeeded.
2014-10-03 09:08:49.775: [ CRSRES][11834]32startRunnable: setting CLI values
2014-10-03 09:08:49.783: [ CRSRES][11834]32Attempting to start `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2`
2014-10-03 09:08:53.948: [ CRSRES][11834]32Start of `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2` succeeded.
2014-10-03 09:08:54.410: [ CRSRES][12619]32CRS-1002: Resource 'ora.jxsmdb2.LISTENER_JXSMDB2.lsnr' is already running on member 'jxsmdb2'
2014-10-03 09:09:08.992: [ CRSRES][12625]32startRunnable: setting CLI values
2014-10-03 09:09:08.999: [ CRSRES][12625]32Attempting to start `ora.jxsmdb2.ons` on member `jxsmdb2`
2014-10-03 09:09:11.139: [ CRSRES][12625]32Start of `ora.jxsmdb2.ons` on member `jxsmdb2` succeeded.
2014-10-03 09:09:11.216: [ CRSRES][11577]32Start of `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2` succeeded.
2014-10-03 09:09:11.239: [ CRSRES][11577]32startRunnable: setting CLI values
2014-10-03 09:09:11.244: [ CRSRES][11577]32Attempting to start `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2`
2014-10-03 09:09:43.269: [ CRSRES][11577]32Start of `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2` succeeded.
2014-10-03 09:09:43.277: [ CRSRES][12894]32Skip online resource: ora.jxsmdb2.ons
2014-10-03 09:09:43.319: [ CRSRES][13151]32startRunnable: setting CLI values
2014-10-03 09:09:43.345: [ CRSRES][12637]32startRunnable: setting CLI values
2014-10-03 09:09:43.349: [ CRSRES][13151]32Attempting to start `ora.jxsmk.db` on member `jxsmdb2`
2014-10-03 09:09:43.358: [ CRSRES][11610]32startRunnable: setting CLI values
2014-10-03 09:09:43.365: [ CRSRES][12637]32Attempting to start `ora.jxsmdb2.gsd` on member `jxsmdb2`
2014-10-03 09:09:43.371: [ CRSRES][11610]32Attempting to start `ora.jxsmdb1.vip` on member `jxsmdb2`
2014-10-03 09:09:43.916: [ CRSRES][13151]32Start of `ora.jxsmk.db` on member `jxsmdb2` succeeded.
2014-10-03 09:09:44.378: [ CRSRES][12637]32Start of `ora.jxsmdb2.gsd` on member `jxsmdb2` succeeded.
2014-10-03 09:09:44.416: [ CRSRES][13668]32CRS-1002: Resource 'ora.jxsmk.db' is already running on member 'jxsmdb2'
2014-10-03 09:09:45.730: [ CRSRES][11610]32Start of `ora.jxsmdb1.vip` on member `jxsmdb2` succeeded.
Check ocssd.log:
jxsmdb2->cd cssd
jxsmdb2->tail -f ocssd.log
[ CSSD]2014-10-03 09:05:23.603 [1] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2014-10-03 09:05:23.692 [2829] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49895))
[ CSSD]2014-10-03 09:05:23.699 [2829] >TRACE: clssnmconnect: connecting to node(1), con(1112d8b10), flags 0x0003
[ CSSD]2014-10-03 09:05:23.700 [2829] >TRACE: clssnmDiscHelper: jxsmdb1, node(1) connection failed, con (1112d8b10), probe(0)
[ CSSD]2014-10-03 09:05:23.741 [3086] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_2))
[ CSSD]2014-10-03 09:05:23.741 [3086] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))
[ CSSD]2014-10-03 09:05:23.752 [3857] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=25)(HOST=191.191.191.101)(PORT=32823))
[ CSSD]2014-10-03 09:05:23.804 [1544] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(78639) LATS(190241056) Disk lastSeqNo(78639)
[ CSSD]2014-10-03 09:05:30.781 [4628] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2014-10-03 09:08:44.082 [4628] >WARNING: clssnmLocalJoinEvent: takeover succ
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmDoSyncUpdate: Initiating sync 1
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSendSync: syncSeqNo(1)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2014-10-03 09:08:44.082 [2829] >TRACE: clssnmHandleSync: diskTimeout set to (27000)ms
[ CSSD]2014-10-03 09:08:44.082 [2829] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[jxsmdb2] seq[1] sync[1]
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2014-10-03 09:08:44.082 [2829] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(1)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmCheckDskInfo: diskTimeout set to (200000)ms
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmEvict: Start
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2014-10-03 09:08:44.082 [4628] >TRACE: clssnmSendUpdate: syncSeqNo(1)
[ CSSD]2014-10-03 09:08:44.083 [4628] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2014-10-03 09:08:44.083 [2829] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2014-10-03 09:08:44.083 [2829] >TRACE: clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2014-10-03 09:08:44.083 [2829] >TRACE: clssnmUpdateNodeState: node 2, state (2/3) unique (1412298321/1412298321) prevConuni(0) birth (1/1) (old/new)
[ CSSD]2014-10-03 09:08:44.083 [2829] >USER: clssnmHandleUpdate: SYNC(1) from node(2) completed
[ CSSD]2014-10-03 09:08:44.083 [2829] >USER: clssnmHandleUpdate: NODE 2 (jxsmdb2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2014-10-03 09:08:44.083 [2829] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2014-10-03 09:08:44.083 [4628] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2014-10-03 09:08:44.083 [4628] >TRACE: clssnmDoSyncUpdate: Sync 1 complete!
[ CSSD]2014-10-03 09:08:44.101 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2014-10-03 09:08:44.105 [4885] >USER: NMEVENT_RECONFIG [00][00][00][04]
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[ CSSD]2014-10-03 09:08:44.105 [3857] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(2) birth(1)
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2
[ CSSD]2014-10-03 09:08:44.105 [4885] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2014-10-03 09:08:44.266 [3086] >TRACE: clssgmCommonAddMember: clsomon joined (2/0x1000000/#CSS_CLSSOMON
Check the CRS resources:
crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE OFFLINE
ora....B1.lsnr application ONLINE OFFLINE
ora....db1.gsd application ONLINE OFFLINE
ora....db1.ons application ONLINE OFFLINE
ora....db1.vip application ONLINE ONLINE jxsmdb2
ora....SM2.asm application ONLINE ONLINE jxsmdb2
ora....B2.lsnr application ONLINE ONLINE jxsmdb2
ora....db2.gsd application ONLINE ONLINE jxsmdb2
ora....db2.ons application ONLINE ONLINE jxsmdb2
ora....db2.vip application ONLINE ONLINE jxsmdb2
ora.jxsmk.db application ONLINE ONLINE jxsmdb2
ora....k1.inst application ONLINE OFFLINE
ora....k2.inst application ONLINE ONLINE jxsmdb2
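To pick out only the resources that should be running but are not, the `crs_stat -t` output can be filtered. This is a small sketch assuming the five-column layout shown above; the function name `offline_resources` is made up for illustration.

```shell
# Read `crs_stat -t` output on stdin and print the names of resources
# whose target is ONLINE but whose state is OFFLINE.
offline_resources() {
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Usage:
# crs_stat -t | offline_resources
```

For the listing above, this leaves only the five node 1 resources (ASM, listener, gsd, ons, and the jxsmk1 instance), which is expected because CRS has not been started on node 1 yet.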
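The database finally came up on node 2.
Looking back, the bug described on MOS at the start was probably triggered by the OCR being inaccessible. The patch MOS recommends is presumably meant for cases where the disks, hardware, and OS are all healthy.
Start CRS on node 1: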
->tail -f al*
[cssd(143604)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/cssd/ocssd.log.
[cssd(143604)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb1 jxsmdb2 .
2014-09-30 17:39:11.803
[crsd(139286)]CRS-1012:The OCR service started on node jxsmdb1.
2014-09-30 17:39:12.848
[evmd(151694)]CRS-1401:EVMD started on node jxsmdb1.
2014-09-30 17:39:15.807
[crsd(139286)]CRS-1201:CRSD started on node jxsmdb1.
2014-10-02 00:05:49.042
[crsd(159746)]CRS-1011:OCR cannot determine that the OCR content contains the latest updates. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/crsd/crsd.log.
Terminated
As you can see, CRS has come up.
Source: ITPUB Blog, link: http://blog.itpub.net/26175573/viewspace-1290649/. If you repost this article, please credit the source; otherwise legal action may be taken.