The new CRS node isolation mechanism (I/O fencing) in 11.2.0.2 and later

An important service provided by Oracle Clusterware is node fencing. Node fencing is a technique 
used by clustered environments to evict nonresponsive or malfunctioning hosts from the cluster. 
Allowing affected nodes to remain in the cluster increases the probability of data corruption due to unsynchronized database writes.


Traditionally, Oracle Clusterware uses a STONITH (Shoot The Other Node In The Head)
comparable fencing algorithm to ensure data integrity in cases in which cluster integrity is
endangered and split-brain scenarios need to be prevented. For Oracle Clusterware this means
that a local process enforces the removal of one or more nodes from the cluster (fencing). This
approach traditionally involved a forced "fast" reboot of the offending node. A fast reboot is a
shutdown and restart procedure that does not wait for any I/O to finish or for file systems to
synchronize on shutdown. Starting with Oracle Clusterware 11g Release 2 (11.2.0.2), this
mechanism has been changed to prevent such a reboot as much as possible by introducing
rebootless node fencing.

Now, when a decision is made to evict a node from the cluster, Oracle Clusterware will first
attempt to shut down all resources on the machine that was chosen to be the subject of an
eviction. Specifically, I/O generating processes are killed and Oracle Clusterware ensures that
those processes are completely stopped before continuing. If all resources can be stopped and
all I/O generating processes can be killed, Oracle Clusterware will shut itself down on the
respective node, but will attempt to restart after the stack has been stopped.

If, for some reason, not all resources can be stopped or I/O generating processes cannot be
stopped completely, Oracle Clusterware will still perform a reboot.
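
The decision flow described above can be sketched in a few lines of Python. This is only an illustration of the logic in the quoted text, not Oracle's implementation: the names `Resource`, `FenceAction` and `fence_local_node` are made up for this example, and each `stop` callable is assumed to return True only once the resource or process is completely stopped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class FenceAction(Enum):
    RESTART_STACK = "stop the Clusterware stack, then try to restart it"  # rebootless path
    REBOOT_NODE = "reboot the node"                                       # fallback path


@dataclass
class Resource:
    """Hypothetical stand-in for a cluster resource or an I/O generating process."""
    name: str
    stop: Callable[[], bool]  # assumed to return True once fully stopped


def fence_local_node(resources: Iterable[Resource],
                     io_processes: Iterable[Resource]) -> FenceAction:
    """Choose the fencing action for the node that is being evicted."""
    # Build full lists first so every stop is attempted, even after a failure.
    resources_stopped = all([r.stop() for r in resources])
    io_stopped = all([p.stop() for p in io_processes])

    if resources_stopped and io_stopped:
        # Nothing on this node can write any more, so a reboot is not needed.
        return FenceAction.RESTART_STACK
    # Something could not be stopped: fall back to the traditional reboot.
    return FenceAction.REBOOT_NODE


if __name__ == "__main__":
    healthy = Resource("database instance", lambda: True)
    stuck = Resource("hung writer process", lambda: False)
    print(fence_local_node([healthy], [healthy]).value)
    print(fence_local_node([healthy], [stuck]).value)
```

The point of the sketch is that the reboot is kept as the fallback: it is chosen whenever a single resource or I/O process cannot be confirmed as stopped.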


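To summarize the STONITH-comparable fencing after 11.2.0.2: Clusterware first tries to shut down the cluster stack on the node. If something goes wrong and the cluster resources cannot be stopped, the Oracle CRS node fencing mechanism escalates and reboots the node directly. The core idea of the new algorithm is to be rebootless.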

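In my opinion, rebooting a node directly when a cluster node misbehaves can lose data and break data consistency. While tracing and analyzing Oracle cssdmonitor, I found that the node reboot is carried out through /proc/sysrq-trigger.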

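SysRq, often called the Magic System Request, is defined as a set of key combinations. It is called "magic" because even when the system hangs and most services no longer respond, a key combination can still carry out a range of predefined system operations. With it you can reboot a hung server while keeping the data on disk safe, which avoids data loss and a long file system check after the restart. You can also collect runtime information such as memory usage, CPU task handling and process state, and you may even recover a server that has stopped responding without rebooting it at all.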
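
To see which SysRq functions a Linux host allows from the keyboard, you can decode the value in /proc/sys/kernel/sysrq. The sketch below uses the bit meanings from the Linux kernel's SysRq documentation; the function names `decode_sysrq` and `read_sysrq` are made up for this example. According to the same documentation, this value only restricts keyboard invocation, and a privileged write to /proc/sysrq-trigger is always allowed.

```python
from pathlib import Path
from typing import List

# Bit meanings as listed in the Linux kernel SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> List[str]:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []                         # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())  # all functions enabled
    # Any other value is treated as a bitmask of allowed function groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting; the file only exists on Linux."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        current = read_sysrq()
    except OSError:
        current = 176  # example value (128 + 32 + 16) used when not on Linux
    print(current, decode_sysrq(current))
```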

What I have not yet worked out is exactly which SysRq command Oracle CRS uses to bring the node down.

B - Reboot the system immediately

SysRq: Resetting

This operation reboots the system immediately, faster than you might expect.
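
For illustration, a SysRq command is sent by writing a single character to /proc/sysrq-trigger as root. The sketch below shows the mechanics with a safety switch: `trigger_sysrq` is a hypothetical helper that defaults to a dry run, and the live write is only exercised against a scratch file. The `b` command is used here because it is the one listed above; it is not a claim about which command cssdmonitor actually writes.

```python
import os
import tempfile

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(command: str, path: str = SYSRQ_TRIGGER, dry_run: bool = True) -> str:
    """Send one SysRq command character, or only describe it when dry_run is True."""
    if len(command) != 1:
        raise ValueError("a SysRq command is a single character")
    if dry_run:
        return f"would write {command!r} to {path}"
    # On a real host this needs root; writing 'b' reboots at once with no sync.
    with open(path, "w") as trigger:
        trigger.write(command)
    return f"wrote {command!r} to {path}"


if __name__ == "__main__":
    # Safe by default: nothing is written to the kernel interface.
    print(trigger_sysrq("b"))

    # Exercise the write path against a scratch file instead of /proc.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = os.path.join(tmp, "sysrq-trigger")
        print(trigger_sysrq("b", path=scratch, dry_run=False))
```

Because `b` resets the machine without syncing or unmounting file systems, do not call this with `dry_run=False` against the real path on a host you care about.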


