Troubleshooting a CRS problem on a customer's Oracle 10.2.0.4 cluster

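After the customer replaced the HBA cards and the Fibre Channel switch ports, the database was found to be down. Here is how I worked through it.

Customer environment: two IBM p570 servers, OS level 6100-04-01-0944, Oracle 10.2.0.4.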

 

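Logging in remotely, I found that the file system holding the Oracle software on node 2 was already 100% full. Sigh. Some process must have been writing like crazy and filled up the LV.

Looking at the CRS log, almost every entry was like this: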
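The post does not show how the space hog was located. A minimal sketch of one way to narrow it down, assuming a POSIX shell with `du`, `sort`, and `head`; the helper name `largest_entries` is hypothetical and not something from the original session:

```shell
#!/bin/sh
# List the N largest entries (sizes in KB) directly under a directory.
# $1 = directory to inspect (default: current directory)
# $2 = number of entries to show (default: 10)
largest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # du -sk prints one total per entry in kilobytes;
    # sort -rn puts the largest totals first.
    ( cd "$dir" && du -sk ./* 2>/dev/null | sort -rn | head -n "$count" )
}
```

Run it first against the mount point that is full, then against whichever subdirectory comes out on top, until the offending file shows up.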

 

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3980]=0x0

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3981]=0x0

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3982]=0x0

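Google turned up nothing at all for this message. So I went to MOS (My Oracle Support), which did not have much either, but I did find a few similar cases that described it as a 10.2.0.4 bug.

First I checked the CRS alert log, and it contained the key clue: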

 

[crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.215

[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

 

So something is wrong with the disk.

On node 2, I checked the owner, group, and permissions of hdisk2 and hdisk6. Everything looked normal:
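When the alert log is long, the affected device can be pulled out mechanically. This is only a sketch: the function name `ocr_inaccessible_devices` is made up for illustration, and it assumes the CRS-1006 message text has exactly the form shown above.

```shell
#!/bin/sh
# Read CRS alert log text on stdin and print each distinct OCR location
# that a CRS-1006 message reports as inaccessible.
ocr_inaccessible_devices() {
    sed -n 's/.*CRS-1006:The OCR location \([^ ]*\) is inaccessible.*/\1/p' | sort -u
}
```

Feed it the node's CRS alert log on stdin. For the excerpt above it would print `/dev/rhdisk2` once.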

 

crw-rw----    1 oracle   oinstall     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   oinstall     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 oracle   oinstall     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 oracle   oinstall     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 oracle   oinstall     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 oracle   oinstall     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     oinstall     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 oracle   oinstall     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 oracle   oinstall     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 oracle   oinstall     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     oinstall     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 oracle   oinstall     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 oracle   oinstall     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 oracle   oinstall     24,  9 Oct 03 09:09 /dev/rhdisk9

 

Then I checked node 1:

crw-rw----    1 oracle   system     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   system     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 root   system     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 root   system     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 root   system     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 root   system     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     system     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 root   system     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 root   system     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 root   system     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     system     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 root   system     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 root   system     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 root   system     24,  9 Oct 03 09:09 /dev/rhdisk9

 

I changed the owner and group of the disks on node 1 to match node 2:

crw-rw----    1 oracle   oinstall     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   oinstall     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 oracle   oinstall     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 oracle   oinstall     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 oracle   oinstall     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 oracle   oinstall     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     oinstall     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 oracle   oinstall     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 oracle   oinstall     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 oracle   oinstall     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     oinstall     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 oracle   oinstall     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 oracle   oinstall     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 oracle   oinstall     24,  9 Oct 03 09:09 /dev/rhdisk9
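Comparing two long listings by eye is error-prone. The sketch below assumes the `ls -l /dev/rhdisk*` output from each node has been saved to a file; the function name `ownership_diff` and the file names are hypothetical.

```shell
#!/bin/sh
# Print devices whose owner or group differs between two saved listings.
# $1 = file with `ls -l` output from the reference node
# $2 = file with `ls -l` output from the node being checked
# Output format: <device> <ref owner:group> -> <checked owner:group>
ownership_diff() {
    awk '
        NF < 10 { next }                          # skip blank or short lines
        NR == FNR { ref[$NF] = $3 ":" $4; next }  # first file: remember owner:group
        ($NF in ref) && ref[$NF] != $3 ":" $4 {
            print $NF, ref[$NF], "->", $3 ":" $4
        }
    ' "$1" "$2"
}
```

For example, `ownership_diff node2.txt node1.txt` lists every device on node 1 that deviates from node 2. The post does not show the commands used for the actual fix; something like `chown oracle:oinstall /dev/rhdisk3` on each listed device is an assumption.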

 

But node 2 still would not start and kept reporting the same error.

Next I compared the attributes of hdisk2 and hdisk6 on both nodes. Node 2 first:

PCM             PCM/friend/otherapdisk                                         Path Control Module              False

PR_key_value    none                                                           Persistant Reserve Key Value     True

algorithm       fail_over                                                      Algorithm                        True

autorecovery    no                                                             Path/Ownership Autorecovery      True

clr_q           no                                                             Device CLEARS its Queue on error True

cntl_delay_time 0                                                              Controller Delay Time            True

cntl_hcheck_int 0                                                              Controller Health Check Interval True

dist_err_pcnt   0                                                              Distributed Error Percentage     True

dist_tw_width   50                                                             Distributed Error Sample Time    True

hcheck_cmd      inquiry                                                        Health Check Command             True

hcheck_interval 60                                                             Health Check Interval            True

hcheck_mode     nonactive                                                      Health Check Mode                True

location                                                                       Location Label                   True

lun_id          0x0                                                            Logical Unit Number ID           False

lun_reset_spt   yes                                                            LUN Reset Supported              True

max_retry_delay 60                                                             Maximum Quiesce Time             True

max_transfer    0x40000                                                        Maximum TRANSFER Size            True

node_name       0x200400a0b811758c                                             FC Node Name                     False

pvid            none                                                           Physical volume identifier       False

q_err           yes                                                            Use QERR bit                     True

q_type          simple                                                         Queuing TYPE                     True

queue_depth     10                                                             Queue DEPTH                      True

reassign_to     120                                                            REASSIGN time out value          True

reserve_policy  no_reserve                                                     Reserve Policy                   True

rw_timeout      30                                                             READ/WRITE time out value        True

scsi_id         0x10300                                                        SCSI ID                          False

start_timeout   60                                                             START unit time out value        True

unique_id       3E213600A0B800011758C0000C04C4BBE8D500F1815      FAStT03IBMfcp Unique device identifier         False

ww_name         0x201500a0b811758c                                             FC World Wide Name               False

Now node 1:

PCM             PCM/friend/otherapdisk                                         Path Control Module              False

PR_key_value    none                                                           Persistant Reserve Key Value     True

algorithm       fail_over                                                      Algorithm                        True

autorecovery    no                                                             Path/Ownership Autorecovery      True

clr_q           no                                                             Device CLEARS its Queue on error True

cntl_delay_time 0                                                              Controller Delay Time            True

cntl_hcheck_int 0                                                              Controller Health Check Interval True

dist_err_pcnt   0                                                              Distributed Error Percentage     True

dist_tw_width   50                                                             Distributed Error Sample Time    True

hcheck_cmd      inquiry                                                        Health Check Command             True

hcheck_interval 60                                                             Health Check Interval            True

hcheck_mode     nonactive                                                      Health Check Mode                True

location                                                                       Location Label                   True

lun_id          0x0                                                            Logical Unit Number ID           False

lun_reset_spt   yes                                                            LUN Reset Supported              True

max_retry_delay 60                                                             Maximum Quiesce Time             True

max_transfer    0x40000                                                        Maximum TRANSFER Size            True

node_name       0x200400a0b811758c                                             FC Node Name                     False

pvid            none                                                           Physical volume identifier       False

q_err           yes                                                            Use QERR bit                     True

q_type          simple                                                         Queuing TYPE                     True

queue_depth     10                                                             Queue DEPTH                      True

reassign_to     120                                                            REASSIGN time out value          True

reserve_policy  single_path                                                     Reserve Policy                   True

rw_timeout      30                                                             READ/WRITE time out value        True

scsi_id         0x10300                                                        SCSI ID                          False

start_timeout   60                                                             START unit time out value        True

unique_id       3E213600A0B800011758C0000C04C4BBE8D500F1815      FAStT03IBMfcp Unique device identifier         False

ww_name         0x201500a0b811758c                                             FC World Wide Name               False

 

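On node 1, hdisk2 and hdisk6 (the OCR disks) had reserve_policy set to single_path, even though these disks are supposed to be shared. I then found that every RAC disk on node 1 was set the same way.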
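The attribute listings above look like `lsattr -El hdiskN` output, although the post does not show the command itself, so treat that as an assumption. Under it, the one attribute that matters can be picked out with a small filter; the function names `reserve_policy_of` and `is_shareable` are hypothetical.

```shell
#!/bin/sh
# Read an attribute listing (name, value, description, ...) on stdin
# and print the value of reserve_policy.
reserve_policy_of() {
    awk '$1 == "reserve_policy" { print $2 }'
}

# Succeed only if the listing on stdin reports reserve_policy = no_reserve.
is_shareable() {
    [ "$(reserve_policy_of)" = "no_reserve" ]
}
```

On AIX this would be used as `lsattr -El hdisk2 | reserve_policy_of`. With the two listings above it prints `no_reserve` for node 2 and `single_path` for node 1.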

I changed it right away.

As root:

for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
    chdev -l hdisk$i -a reserve_policy=no_reserve
done
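A more cautious variant is to print the commands first and run them only after a review. The sketch below is a dry run that executes nothing; the function name `plan_reserve_fix` and the "disk policy" input format are assumptions made for illustration.

```shell
#!/bin/sh
# Read "disk policy" pairs on stdin (for example "hdisk2 single_path") and
# print a chdev command for every disk that is not already no_reserve.
# Nothing is executed here; review the output before running it as root.
plan_reserve_fix() {
    awk '$2 != "" && $2 != "no_reserve" {
        print "chdev -l " $1 " -a reserve_policy=no_reserve"
    }'
}
```

The input pairs could be produced by looping over the disk numbers and echoing each disk name together with its current policy, for instance from `lsattr -El hdisk$i -a reserve_policy -F value`. Check that invocation against your AIX level before relying on it.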

It turned out that hdisk2 and hdisk6 could not be changed because the devices were busy:

0514-062 Cannot perform the requested function because the

                 specified device is busy.

Removing the disk did not work either:

#  rmdev -dl hdisk6

Method error (/usr/lib/methods/ucfgdevice):

        0514-062 Cannot perform the requested function because the

                 specified device is busy.

My guess was that node 2 was holding the OCR disks, so nothing I tried was allowed.

Checking the CRS processes:

oracle 196786 155908   0 09:05:18      -  0:00 /oracle/product/10.2.0/crs/bin/oclsomon.bin

root 103266 102694   1 09:05:17      -  0:47 /oracle/product/10.2.0/crs/bin/crsd.bin reboot

  oracle 107362 192550   0 09:05:19      -  0:05 /oracle/product/10.2.0/crs/bin/ocssd.bin

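I stopped CRS on node 1, but the CRS processes were still there. Some background: ever since the HBA on node 1 was replaced a few days earlier, running crsctl stop crs by hand no longer seemed to work.

Rebooting node 1 did not help either: I still could not change the disks or stop CRS. So, as root, I simply disabled CRS autostart and rebooted both machines.

As root on all nodes: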

cd /etc/

# ./init.crs disable crs

 

After the reboot there were no CRS daemons running. I tried changing the disk attributes on node 1 again, and this time it worked.

# ps -ef |grep crs

    root 102694      1   0 08:54:41      -  0:00 /bin/sh /etc/init.crsd run

root 151958 180262   0 08:59:54  pts/0  0:00 grep crs

# chdev -l hdisk2 -a reserve_policy=no_reserve

hdisk2 changed

# chdev -l hdisk6 -a reserve_policy=no_reserve

hdisk6 changed

#

Now start CRS on node 2:

# ./crsctl start crs

Check the CRS alert log:

 [crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.215

[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.476

[crsd(164818)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.477

[crsd(164818)]CRS-1012:The OCR service started on node jxsmdb2.

2014-10-02 22:32:28.751

[crsd(164818)]CRS-1201:CRSD started on node jxsmdb2.

[cssd(70408)]CRS-1603:CSSD on node jxsmdb2 shutdown by user.

2014-10-03 09:05:23.615

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk4. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

2014-10-03 09:05:23.815

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk3. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

2014-10-03 09:05:23.815

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

[cssd(107362)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb2 .

2014-10-03 09:08:44.541

[evmd(99266)]CRS-1401:EVMD started on node jxsmdb2.

2014-10-03 09:08:44.585

[crsd(103266)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-03 09:08:44.586

[crsd(103266)]CRS-1012:The OCR service started on node jxsmdb2.

2014-10-03 09:08:46.874

[crsd(103266)]CRS-1201:CRSD started on node jxsmdb2.

2014-10-03 09:08:47.163

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:08:47.183

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:43.287

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:43.297

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:45.746

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.
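An excerpt like this is easier to judge once the message codes are tallied. A rough sketch, assuming each log line carries at most one `CRS-nnnn` code; the function name `crs_code_summary` is hypothetical.

```shell
#!/bin/sh
# Read CRS alert log text on stdin and count how often each CRS-nnnn
# message code appears. Output: "<count> <code>", most frequent first.
crs_code_summary() {
    awk 'match($0, /CRS-[0-9]+/) { n[substr($0, RSTART, RLENGTH)]++ }
         END { for (c in n) print n[c], c }' | sort -k1,1nr -k2,2
}
```

Only the first code on each line is counted, and timestamp-only lines are ignored.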

Check crsd.log:

 

2014-10-03 09:05:19.356: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:19.357: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:20.702: [ COMMCRS][261]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:20.702: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:20.702: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:22.041: [ COMMCRS][263]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:22.041: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:22.041: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:23.380: [ COMMCRS][265]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:23.380: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:23.380: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:08:44.482: [  CLSVER][1]32Active Version from OCR:10.2.0.4.0

2014-10-03 09:08:44.482: [  CLSVER][1]32Active Version and Software Version are same

2014-10-03 09:08:44.485: [ CRSMAIN][1]32Initializing OCR

2014-10-03 09:08:44.491: [  OCRRAW][1]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.491: [  OCRRAW][1]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.574: [  OCRMAS][3352]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 2

2014-10-03 09:08:44.575: [  OCRRAW][3352]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.575: [  OCRRAW][3352]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.596: [  OCRMAS][3352]th_master: Deleted ver keys from cache (master)

2014-10-03 09:08:44.596: [    CRSD][1]32ENV Logging level for Module: allcomp  0

2014-10-03 09:08:44.597: [    CRSD][1]32ENV Logging level for Module: default  0

2014-10-03 09:08:44.598: [    CRSD][1]32ENV Logging level for Module: COMMCRS  0

2014-10-03 09:08:44.598: [    CRSD][1]32ENV Logging level for Module: COMMNS  0

2014-10-03 09:08:44.599: [    CRSD][1]32ENV Logging level for Module: CRSUI  0

2014-10-03 09:08:44.600: [    CRSD][1]32ENV Logging level for Module: CRSCOMM  0

2014-10-03 09:08:44.600: [    CRSD][1]32ENV Logging level for Module: CRSRTI  0

2014-10-03 09:08:44.601: [    CRSD][1]32ENV Logging level for Module: CRSMAIN  0

2014-10-03 09:08:44.602: [    CRSD][1]32ENV Logging level for Module: CRSPLACE  0

2014-10-03 09:08:44.603: [    CRSD][1]32ENV Logging level for Module: CRSAPP  0

2014-10-03 09:08:44.603: [    CRSD][1]32ENV Logging level for Module: CRSRES  0

2014-10-03 09:08:44.604: [    CRSD][1]32ENV Logging level for Module: CRSOCR  0

2014-10-03 09:08:44.605: [    CRSD][1]32ENV Logging level for Module: CRSTIMER  0

2014-10-03 09:08:44.605: [    CRSD][1]32ENV Logging level for Module: CRSEVT  0

2014-10-03 09:08:44.606: [    CRSD][1]32ENV Logging level for Module: CRSD  0

2014-10-03 09:08:44.607: [    CRSD][1]32ENV Logging level for Module: CLUCLS  0

2014-10-03 09:08:44.607: [    CRSD][1]32ENV Logging level for Module: CLSVER  0

2014-10-03 09:08:44.608: [    CRSD][1]32ENV Logging level for Module: OCRRAW  0

2014-10-03 09:08:44.609: [    CRSD][1]32ENV Logging level for Module: OCROSD  0

2014-10-03 09:08:44.609: [    CRSD][1]32ENV Logging level for Module: CSSCLNT  0

2014-10-03 09:08:44.610: [    CRSD][1]32ENV Logging level for Module: OCRAPI  0

2014-10-03 09:08:44.611: [    CRSD][1]32ENV Logging level for Module: OCRUTL  0

2014-10-03 09:08:44.612: [    CRSD][1]32ENV Logging level for Module: OCRMSG  0

2014-10-03 09:08:44.612: [    CRSD][1]32ENV Logging level for Module: OCRCLI  0

2014-10-03 09:08:44.613: [    CRSD][1]32ENV Logging level for Module: OCRCAC  0

2014-10-03 09:08:44.614: [    CRSD][1]32ENV Logging level for Module: OCRSRV  0

2014-10-03 09:08:44.614: [    CRSD][1]32ENV Logging level for Module: OCRMAS  0

2014-10-03 09:08:44.615: [ CRSMAIN][1]32Filename is /oracle/product/10.2.0/crs/crs/init/jxsmdb2.pid

2014-10-03 09:08:44.651: [ CRSMAIN][1]32Using Authorizer location: /oracle/product/10.2.0/crs/crs/auth/

[  clsdmt][8235]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=jxsmdb2DBG_CRSD))

2014-10-03 09:08:44.667: [ CRSMAIN][1]32Initializing RTI

2014-10-03 09:08:44.719: [CRSTIMER][8749]32Timer Thread Starting.

2014-10-03 09:08:44.740: [  CRSRES][1]32Parameter SECURITY = 1, running in USER Mode

2014-10-03 09:08:44.743: [ CRSMAIN][1]32Initializing EVMMgr

2014-10-03 09:08:44.942: [ COMMCRS][9006]clsc_connect: (1139c41d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

 

2014-10-03 09:08:46.745: [ CRSMAIN][1]32CRSD locked during state recovery, please wait.

2014-10-03 09:08:46.824: [ CRSMAIN][1]32CRSD recovered, unlocked.

2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2014-10-03 09:08:46.855: [ CRSMAIN][1]32CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))

2014-10-03 09:08:46.873: [ CRSMAIN][1]32E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49896))

2014-10-03 09:08:46.873: [ CRSMAIN][1]32Starting Threads

2014-10-03 09:08:46.874: [ CRSMAIN][10292]32Starting runCommandServer for (UI = 1, E2E = 0). 0

2014-10-03 09:08:46.874: [ CRSMAIN][10549]32Starting runCommandServer for (UI = 1, E2E = 0). 1

2014-10-03 09:08:46.874: [ CRSMAIN][1]32CRS Daemon Started.

2014-10-03 09:08:46.888: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.901: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.911: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.925: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.934: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.942: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.950: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.958: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.966: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.974: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.983: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.991: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.999: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:47.173: [  CRSRES][11834]32startRunnable: setting CLI values

2014-10-03 09:08:47.188: [  CRSRES][11834]32Attempting to start `ora.jxsmdb2.vip` on member `jxsmdb2`

2014-10-03 09:08:47.189: [  CRSRES][11577]32startRunnable: setting CLI values

2014-10-03 09:08:47.199: [  CRSRES][11577]32Attempting to start `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2`

2014-10-03 09:08:49.742: [  CRSRES][11834]32Start of `ora.jxsmdb2.vip` on member `jxsmdb2` succeeded.

2014-10-03 09:08:49.775: [  CRSRES][11834]32startRunnable: setting CLI values

2014-10-03 09:08:49.783: [  CRSRES][11834]32Attempting to start `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2`

2014-10-03 09:08:53.948: [  CRSRES][11834]32Start of `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2` succeeded.

2014-10-03 09:08:54.410: [  CRSRES][12619]32CRS-1002: Resource 'ora.jxsmdb2.LISTENER_JXSMDB2.lsnr' is already running on member 'jxsmdb2'

 

2014-10-03 09:09:08.992: [  CRSRES][12625]32startRunnable: setting CLI values

2014-10-03 09:09:08.999: [  CRSRES][12625]32Attempting to start `ora.jxsmdb2.ons` on member `jxsmdb2`

2014-10-03 09:09:11.139: [  CRSRES][12625]32Start of `ora.jxsmdb2.ons` on member `jxsmdb2` succeeded.

2014-10-03 09:09:11.216: [  CRSRES][11577]32Start of `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2` succeeded.

2014-10-03 09:09:11.239: [  CRSRES][11577]32startRunnable: setting CLI values

2014-10-03 09:09:11.244: [  CRSRES][11577]32Attempting to start `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2`

2014-10-03 09:09:43.269: [  CRSRES][11577]32Start of `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2` succeeded.

2014-10-03 09:09:43.277: [  CRSRES][12894]32Skip online resource: ora.jxsmdb2.ons

2014-10-03 09:09:43.319: [  CRSRES][13151]32startRunnable: setting CLI values

2014-10-03 09:09:43.345: [  CRSRES][12637]32startRunnable: setting CLI values

2014-10-03 09:09:43.349: [  CRSRES][13151]32Attempting to start `ora.jxsmk.db` on member `jxsmdb2`

2014-10-03 09:09:43.358: [  CRSRES][11610]32startRunnable: setting CLI values

2014-10-03 09:09:43.365: [  CRSRES][12637]32Attempting to start `ora.jxsmdb2.gsd` on member `jxsmdb2`

2014-10-03 09:09:43.371: [  CRSRES][11610]32Attempting to start `ora.jxsmdb1.vip` on member `jxsmdb2`

2014-10-03 09:09:43.916: [  CRSRES][13151]32Start of `ora.jxsmk.db` on member `jxsmdb2` succeeded.

2014-10-03 09:09:44.378: [  CRSRES][12637]32Start of `ora.jxsmdb2.gsd` on member `jxsmdb2` succeeded.

2014-10-03 09:09:44.416: [  CRSRES][13668]32CRS-1002: Resource 'ora.jxsmk.db' is already running on member 'jxsmdb2'

 

2014-10-03 09:09:45.730: [  CRSRES][11610]32Start of `ora.jxsmdb1.vip` on member `jxsmdb2` succeeded.

Check ocssd.log:

jxsmdb2->cd cssd

jxsmdb2->tail -f ocssd.log

[    CSSD]2014-10-03 09:05:23.603 [1] >TRACE:   clssnmFatalInit: fatal mode enabled

[    CSSD]2014-10-03 09:05:23.692 [2829] >TRACE:   clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49895))

 

[    CSSD]2014-10-03 09:05:23.699 [2829] >TRACE:   clssnmconnect: connecting to node(1), con(1112d8b10), flags 0x0003

[    CSSD]2014-10-03 09:05:23.700 [2829] >TRACE:   clssnmDiscHelper: jxsmdb1, node(1) connection failed, con (1112d8b10), probe(0)

[    CSSD]2014-10-03 09:05:23.741 [3086] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_2))

[    CSSD]2014-10-03 09:05:23.741 [3086] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

[    CSSD]2014-10-03 09:05:23.752 [3857] >TRACE:   clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=25)(HOST=191.191.191.101)(PORT=32823))

[    CSSD]2014-10-03 09:05:23.804 [1544] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(78639) LATS(190241056) Disk lastSeqNo(78639)

[    CSSD]2014-10-03 09:05:30.781 [4628] >TRACE:   clssnmRcfgMgrThread: Local Join

[    CSSD]2014-10-03 09:08:44.082 [4628] >WARNING: clssnmLocalJoinEvent: takeover succ

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: Initiating sync 1

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (27000)ms

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSendSync: syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmHandleSync: diskTimeout set to (27000)ms

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[jxsmdb2] seq[1] sync[1]

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(13)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmCheckDskInfo: diskTimeout set to (200000)ms

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmEvict: Start

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSendUpdate: syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 2, state (2/3) unique (1412298321/1412298321) prevConuni(0) birth (1/1) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >USER:    clssnmHandleUpdate: SYNC(1) from node(2) completed

[    CSSD]2014-10-03 09:08:44.083 [2829] >USER:    clssnmHandleUpdate: NODE 2 (jxsmdb2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmDoSyncUpdate: Sync 1 complete!

[    CSSD]2014-10-03 09:08:44.101 [1] >USER:    NMEVENT_SUSPEND [00][00][00][00]

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmReconfigThread:  started for reconfig (1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 1

[    CSSD]2014-10-03 09:08:44.105 [3857] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmEstablishMasterNode: MASTER for 1 is node(2) birth(1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes

 

[    CSSD]CLSS-3001: local node number 2, master node number 2

 

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)

[    CSSD]2014-10-03 09:08:44.266 [3086] >TRACE:   clssgmCommonAddMember: clsomon joined (2/0x1000000/#CSS_CLSSOMON

 

 

Check the CRS resources:

crs_stat -t

Name           Type           Target    State     Host       

------------------------------------------------------------

ora....SM1.asm application    ONLINE    OFFLINE               

ora....B1.lsnr application    ONLINE    OFFLINE              

ora....db1.gsd application    ONLINE    OFFLINE              

ora....db1.ons application    ONLINE    OFFLINE              

ora....db1.vip application    ONLINE    ONLINE    jxsmdb2    

ora....SM2.asm application    ONLINE    ONLINE    jxsmdb2    

ora....B2.lsnr application    ONLINE    ONLINE    jxsmdb2    

ora....db2.gsd application    ONLINE    ONLINE    jxsmdb2    

ora....db2.ons application    ONLINE    ONLINE    jxsmdb2    

ora....db2.vip application    ONLINE    ONLINE    jxsmdb2    

ora.jxsmk.db   application    ONLINE    ONLINE    jxsmdb2    

ora....k1.inst application    ONLINE    OFFLINE              

ora....k2.inst application    ONLINE    ONLINE    jxsmdb2 

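The database finally came up on node 2.

In hindsight, the symptom that MOS described as a bug was probably caused here by the OCR being inaccessible. The patch recommended on MOS presumably applies only when the disks, hardware, and OS are all healthy.

Start CRS on node 1: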
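With many resources it helps to list only those that should be running but are not. A small sketch that filters `crs_stat -t` output in the column layout shown above; the function name `offline_resources` is hypothetical.

```shell
#!/bin/sh
# Read `crs_stat -t` output on stdin and print the name and state of every
# resource whose Target is ONLINE but whose State is something else.
offline_resources() {
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1, $4 }'
}
```

Used as `crs_stat -t | offline_resources`. At this point in the story it would list only the node 1 resources (ASM, listener, gsd, ons, and the instance), since CRS has not been started on node 1 yet.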

 

->tail -f al*

[cssd(143604)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/cssd/ocssd.log.

[cssd(143604)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb1 jxsmdb2 .

2014-09-30 17:39:11.803

[crsd(139286)]CRS-1012:The OCR service started on node jxsmdb1.

2014-09-30 17:39:12.848

[evmd(151694)]CRS-1401:EVMD started on node jxsmdb1.

2014-09-30 17:39:15.807

[crsd(139286)]CRS-1201:CRSD started on node jxsmdb1.

2014-10-02 00:05:49.042

[crsd(159746)]CRS-1011:OCR cannot determine that the OCR content contains the latest updates. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/crsd/crsd.log.

Terminated
As you can see, CRS is up.


Source: ITPUB blog, link: http://blog.itpub.net/26175573/viewspace-1290649/. If you repost this, please credit the source; otherwise legal action may be pursued.

