Case Study: RAC Node Hang and Reboot

Environment

Database: Oracle RAC 11.2.0.4.0

Host OS: Red Hat Enterprise Linux Server release 6.6 (Santiago)

 

Cluster log:

2016-11-03 16:49:09.591: [    CSSD][4028430080]clssgmQueueGrockRequest: queued msg from node(1), for operation 4, RPC#1181, generation 47, tag<0x7f55a0047e00>
2016-11-03 16:49:13.204: [    CSSD][3820443392]clssnmSendingThread: sending status msg to all nodes
2016-11-03 16:49:13.204: [    CSSD][3820443392]clssnmSendingThread: sent 5 status msgs to all nodes
2016-11-03 16:49:13.720: [    CSSD][3822020352]clssnmPollingThread: node bjltj1dw02 (2) at 50% heartbeat fatal, removal in 14.610 seconds
2016-11-03 16:49:13.720: [    CSSD][3822020352]clssnmPollingThread: node bjltj1dw02 (2) is impending reconfig, flag 2294796, misstime 15390
2016-11-03 16:49:13.720: [    CSSD][3822020352]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2016-11-03 16:49:13.720: [    CSSD][4042692352]clssnmvDHBValidateNcopy: node 2, bjltj1dw02, has a disk HB, but no network HB, DHB has rcfg 358519447, wrtcnt, 48653087, LATS 3373852946, lastSeqNo 48652997, uniqueness 1463024655, timestamp 1478162953/3373861536
2016-11-03 16:49:13.720: [    CSSD][4037945088]clssnmvDHBValidateNcopy: node 2, bjltj1dw02, has a disk HB, but no network HB, DHB has rcfg 358519447, wrtcnt, 48653088, LATS 3373852946, lastSeqNo 48652998, uniqueness 1463024655, timestamp 1478162953/3373861596
2016-11-03 16:49:13.750: [    CSSD][4045903616]clssnmvDiskPing: Writing with status 0x3, timestamp 1478162953/3373852976
2016-11-03 16:49:14.190: [    CSSD][4041099008]clssnmvDiskPing: Writing with status 0x3, timestamp 1478162954/3373853416
2016-11-03 16:49:14.221: [    CSSD][4033189632]clssnmvDHBValidateNcopy: node 2, bjltj1dw02, has a disk HB, but no network HB, DHB has rcfg 358519447, wrtcnt, 48653089, LATS 3373853446, lastSeqNo 48652984, uniqueness 1463024655, timestamp 1478162954/3373862336
2016-11-03 16:49:14.250: [    CSSD][4036343552]clssnmvDiskPing: Writing with status 0x3, timestamp 1478162954/3373853476
2016-11-03 16:49:14.251: [    CSSD][4045903616]clssnmvDiskPing: Writing with status 0x3, timestamp 1478162954/3373853476
2016-11-03 16:49:14.691: [    CSSD][4041099008]clssnmvDiskPing: Writing with status 0x3, timestamp 1478162954/3373853916
2016-11-03 16:49:14.720: [    CSSD][4042692352]clssnmvDHBValidateNcopy: node 2, bjltj1dw02, has a disk HB, but no network HB, DHB has rcfg 358519447, wrtcnt, 48653090, LATS 3373853946, lastSeqNo 48653087, uniqueness 1463024655, timestamp 1478162954/3373862536

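The log shows that node2 was still writing its disk heartbeat but had lost its network heartbeat over the private interconnect. This fatal private-network heartbeat failure is what caused node2 to be scheduled for removal from the cluster and left it in a hung state.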
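The reasoning behind that conclusion can be checked mechanically. The following is a minimal sketch, not an Oracle tool: `summarize_cssd` is a hypothetical helper name, and it assumes the CSSD log lines have the same wording as the excerpt above. It counts the "disk HB, but no network HB" messages and adds the already-missed heartbeat time to the remaining time before removal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise CSSD heartbeat warnings read on stdin.
# Assumes the message wording matches the log excerpt in this article.
summarize_cssd() {
  awk '
    # Node still writes to the voting disk but is silent on the interconnect.
    /has a disk HB, but no network HB/ { disk_only++ }

    # "removal in N seconds": N is the field after the word "in".
    /heartbeat fatal, removal in/ {
      for (i = 1; i <= NF; i++) if ($i == "in") remaining = $(i + 1)
    }

    # "misstime M": milliseconds of network heartbeat already missed.
    /misstime/ {
      for (i = 1; i <= NF; i++) if ($i == "misstime") missed_ms = $(i + 1)
    }

    END {
      printf "disk_hb_without_network_hb=%d\n", disk_only
      printf "missed_s=%.3f remaining_s=%.3f total_s=%.3f\n",
             missed_ms / 1000, remaining, missed_ms / 1000 + remaining
    }'
}

# Usage (the log path is site-specific, so it is left as a placeholder):
#   summarize_cssd < /path/to/ocssd.log
```

Fed the excerpt above, the sketch reports four disk-only heartbeat messages and a total of 15.390 s + 14.610 s = 30.000 s. That total is consistent with a CSS misscount of 30 seconds, which is an assumption about this cluster; the actual setting can be confirmed with `crsctl get css misscount`.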

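Action Plan

1. Stop the database instance on node1.

2. Move the private network of both nodes to a dedicated switch and check private-network communication. If it succeeds, continue to the next step; otherwise investigate the server configuration and hardware.

3. Start cluster services on node2 and check that cluster services on both nodes are healthy. If they are, continue to the next step; otherwise abandon this plan.

4. Start the database instances on both nodes and check the database status. If this fails, fall back to serving from a single node.

Operational Steps

Stop the database instance on node1

$ su - oracle

$ sqlplus / as sysdba

SQL> shutdown immediate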

                             

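Replace the switch

The network engineer connected the switch, the system engineer moved the private network onto the dedicated switch, and the DBA checked private-network communication and confirmed that the private network still could not communicate.

Resolve the private-network fault

The system engineer worked on the private-network communication problem. After the fiber-optic module was replaced, the NIC link light was found to be abnormal, so the engineer asked the DBA to stop the cluster so that the NIC could be replaced.

Stop cluster services on node1

[root@bjltj1dw01 ~]$ su - root

[root@bjltj1dw01 ~]$ /u01/app/11.2.0/grid/bin/crsctl stop crs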

 

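Replace the NIC

After the host engineer replaced the NIC and started node1, the private-network communication fault was resolved.

Start cluster services

The DBA starts cluster services on node1:

[root@bjltj1dw01 ~]$ su - root

[root@bjltj1dw01 ~]$ /u01/app/11.2.0/grid/bin/crsctl start crs

After confirming that cluster services on node1 are healthy, start cluster services on node2:

[root@bjltj1dw02 ~]$ su - root

[root@bjltj1dw02 ~]$ /u01/app/11.2.0/grid/bin/crsctl start crs

Cluster services on node1 and node2 are both confirmed healthy.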
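The article does not show how the cluster services were checked. One common way is `crsctl check crs`, which prints one `CRS-nnnn` status line per stack component. The sketch below is an assumption about how that check could be scripted, not the author's procedure: `crs_stack_online` is a hypothetical helper that succeeds only when every status line it reads ends with "is online".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crsctl check crs` output on stdin and
# exit 0 only if every CRS-nnnn status line reports "is online".
crs_stack_online() {
  awk '
    /^CRS-[0-9]+:/ {
      total++
      if ($0 ~ /is online[ \t\r]*$/) online++
    }
    # No status lines at all is treated as a failure.
    END { exit !(total > 0 && online == total) }'
}

# Usage on each node, with the Grid home used in this article:
#   /u01/app/11.2.0/grid/bin/crsctl check crs | crs_stack_online \
#     && echo "cluster stack online" \
#     || echo "cluster stack NOT fully online"
```

For a per-resource view across both nodes, `crsctl stat res -t` gives more detail than this yes/no check.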

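Start the database instances on both nodes

[root@bjltj1dw01 ~]$ su - oracle

[oracle@bjltj1dw01 ~]$ srvctl start database -d bjltj1dw

Confirm that cluster services are healthy on both nodes and that the database is open.

Log in to sqlplus and check the database status

Check the instance status; the database is running normally.

[oracle@bjltj1dw01 ~]$ sqlplus / as sysdba

SQL> set pages 200 lines 200

SQL> select instance_name, status from gv$instance;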
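The same check can be scripted so that it fails loudly if any instance is not open. This is a minimal sketch under stated assumptions: `all_instances_open` is a hypothetical helper, and it expects headerless output with exactly two columns per row (instance name and status), which is what the `set` options in the usage comment are intended to produce.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "instance_name status" rows on stdin and
# exit 0 only if at least one row was seen and every status is OPEN.
all_instances_open() {
  awk '
    NF == 2 { rows++; if ($2 != "OPEN") not_open++ }
    END { exit !(rows > 0 && not_open == 0) }'
}

# Usage as the oracle user (the quoted heredoc keeps gv$instance literal):
#   sqlplus -S / as sysdba <<'EOF' | all_instances_open && echo "all instances OPEN"
#   set heading off feedback off pages 0
#   select instance_name, status from gv$instance;
#   EOF
```

In a two-node RAC the query should return two rows, one per instance. A single row means one instance is still down even if the row that is returned shows OPEN, and this helper does not catch that case on its own.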
