used by clustered environments to evict nonresponsive or malfunctioning hosts from the cluster.
Allowing affected nodes to remain in the cluster increases the probability of data corruption due to unsynchronized database writes.
Traditionally, Oracle Clusterware uses a STONITH-comparable (Shoot The Other Node In The Head)
fencing algorithm to ensure data integrity in cases in which cluster integrity is endangered and split-brain scenarios need to be prevented. For Oracle Clusterware this means
that a local process enforces the removal of one or more nodes from the cluster (fencing). This
approach traditionally involved a forced “fast” reboot of the offending node. A fast reboot is a
shutdown and restart procedure that does not wait for any I/O to finish or for file systems to
synchronize on shutdown. Starting with Oracle Clusterware 11g Release 2 (11.2.0.2), this
mechanism has been changed to prevent such a reboot as much as possible by introducing
rebootless node fencing.
Now, when a decision is made to evict a node from the cluster, Oracle Clusterware will first
attempt to shut down all resources on the machine that was chosen to be the subject of an
eviction. Specifically, I/O-generating processes are killed, and Oracle Clusterware ensures that
those processes are completely stopped before continuing. If all resources can be stopped and all I/O-generating processes can be killed, Oracle Clusterware
will shut itself down on the respective node, but will attempt to restart after the stack has been stopped. If, for some reason, not all resources can be stopped or I/O-generating processes cannot be stopped completely, Oracle Clusterware will still perform a reboot.
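STONITH: Oracle CRS first tries to shut down the cluster stack. If an error prevents the cluster resources from being stopped, the node fencing mechanism escalates to an immediate node reboot. The core of the new algorithm is that it is rebootless.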
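The escalation described above can be modeled as a small decision function. This is a hypothetical Python sketch, not Oracle's implementation: the `Node`, `Resource`, and `Process` types, the `stoppable`/`killable` flags, and the resource and process names are invented for illustration. It only captures the order of the decisions: stop resources, kill I/O-generating processes, then either restart the stack or fall back to a reboot.

```python
from dataclasses import dataclass, field
from enum import Enum


class FencingOutcome(Enum):
    STACK_RESTARTED = "stack shut down locally, restart attempted, no reboot"
    NODE_REBOOTED = "escalated to a node reboot"


@dataclass
class Resource:
    name: str
    stoppable: bool = True  # hypothetical: does a clean stop succeed?


@dataclass
class Process:
    name: str
    generates_io: bool
    killable: bool = True  # hypothetical: can the process be fully stopped?


@dataclass
class Node:
    name: str
    resources: list[Resource] = field(default_factory=list)
    processes: list[Process] = field(default_factory=list)


def fence_node(node: Node) -> FencingOutcome:
    """Hypothetical model of rebootless fencing with escalation to a reboot."""
    # Step 1: try to stop every cluster resource on the node being evicted.
    resources_stopped = all(r.stoppable for r in node.resources)
    # Step 2: kill the I/O-generating processes and confirm they are gone.
    io_processes = [p for p in node.processes if p.generates_io]
    io_stopped = all(p.killable for p in io_processes)
    if resources_stopped and io_stopped:
        # Step 3a: the stack shuts itself down and later tries to restart.
        return FencingOutcome.STACK_RESTARTED
    # Step 3b: something could not be stopped, so fall back to a reboot.
    return FencingOutcome.NODE_REBOOTED


if __name__ == "__main__":
    clean = Node("node2", [Resource("database")], [Process("db_writer", True)])
    stuck = Node("node2", [Resource("database")],
                 [Process("db_writer", True, killable=False)])
    print(fence_node(clean).name)  # STACK_RESTARTED
    print(fence_node(stuck).name)  # NODE_REBOOTED
```

Only I/O-generating processes gate the decision here, mirroring the text: a stuck process that does no I/O does not force a reboot in this model.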
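In my opinion, rebooting a node outright when it malfunctions risks data loss and broken data consistency. While tracing and analyzing Oracle cssdmonitor, I found that the node reboot is performed through /proc/sysrq-trigger.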
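SysRq is often called the Magic System Request key and is defined as a set of key combinations. It is "magic" because, even when the system is hung and most services no longer respond, it can still carry out a set of predefined system operations through these key combinations. With it, you can not only reboot a hung server while keeping disk data safe, avoiding data loss and a lengthy file system check after the restart, but also collect runtime information such as memory usage, CPU task handling, and process states, and possibly even revive an unresponsive server without rebooting it.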
However, I do not know exactly which parameter (SysRq command key) Oracle CRS uses to shut the machine down.
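B - Immediately reboot the system
SysRq: Resetting
This operation reboots the system immediately, without syncing or unmounting the disks, and it is faster than you might expect.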
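The trigger interface can be sketched from user space as follows, assuming a Linux host. The source does not say which key cssdmonitor writes, so this is not Oracle's code. The functions `send_sysrq` and `sysrq_reboot` and the `pause` default are hypothetical. The key letters come from the kernel's SysRq documentation: `s` syncs mounted file systems, `u` remounts them read-only, and `b` reboots immediately. Writing to /proc/sysrq-trigger requires root. According to the kernel documentation, the /proc/sys/kernel/sysrq setting restricts only keyboard invocation, not writes to the trigger file. The demo deliberately targets a temporary file.

```python
import os
import tempfile
import time

# Real Linux interface: a character written here triggers that SysRq action
# immediately. Writing to it as root on a real system WILL reboot the machine.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def send_sysrq(trigger_path: str, keys: str, pause: float = 0.0) -> None:
    """Write SysRq command characters one at a time to trigger_path."""
    for i, key in enumerate(keys):
        if i > 0 and pause > 0:
            # Pause between keys so an asynchronous 's' or 'u' can finish first.
            time.sleep(pause)
        # One write per key, equivalent to pressing Alt+SysRq+<key>.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)


def sysrq_reboot(trigger_path: str, safe: bool = True, pause: float = 1.0) -> str:
    """Send a reboot sequence and return the keys that were written.

    safe=True  -> "s" (sync), "u" (remount read-only), "b" (reboot)
    safe=False -> "b" only: immediate reboot without sync or unmount
    """
    keys = "sub" if safe else "b"
    send_sysrq(trigger_path, keys, pause)
    return keys


if __name__ == "__main__":
    # Demo against a temporary file instead of SYSRQ_TRIGGER.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    print(sysrq_reboot(demo_path, safe=True, pause=0.0))  # sub
    with open(demo_path) as f:
        print(f.read())  # b (each write replaced the previous key)
    os.remove(demo_path)
```

The `safe` flag contrasts the disk-friendly s, u, b sequence described in the SysRq paragraph with a bare `b`, which matches the fast reboot that does not wait for I/O. The trigger path is a required argument so that calling the helper never targets the real trigger file by default.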