[Production] How to Prevent ASM from Crashing Unexpectedly in a RAC Environment

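I. The Error
While installing RAC 12c recently, the following errors appeared during the ASM configuration stage.
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile 'init@.ora'
After a lot of head-scratching and searching through documentation, I finally found the cause.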

II. Root Cause Analysis
On RHEL 7.2, setting RemoveIPC=yes crashes Oracle ASM instances and Oracle database instances. The same problem affects any application that uses shared memory segments (SHM) or semaphores (SEM).

Source:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)
Applies to:
Oracle Database - Standard Edition
Oracle Database - Enterprise Edition
Linux x86-64
Linux x86

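In RHEL 7.2, the systemd-logind service introduced a new feature: once a user has fully logged out of the OS, all IPC objects belonging to that user are removed.

This feature is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file. See man logind.conf(5) for details.

In RHEL 7.2, RemoveIPC defaults to yes.

As a result, when the last session of the oracle or grid user logs out, the operating system removes that user's shared memory segments and semaphores.
Because Oracle ASM and database instances use shared memory segments, removing those segments crashes the ASM and database instances.
See Red Hat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533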
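To see whether a host is exposed, it helps to check what RemoveIPC is set to and which IPC objects the Oracle users currently own. The following is a minimal sketch, not an official procedure: the function names removeipc_setting and ipc_rows_for_user are made up for illustration, and the sketch reads a single configuration file only.

```shell
#!/usr/bin/env bash

# Print the RemoveIPC value from a logind.conf-style file.
# Prints "unset" when there is no uncommented RemoveIPC line, in which case
# the built-in default applies (yes on RHEL 7.2, per the note above).
removeipc_setting() {
    local conf="${1:-/etc/systemd/logind.conf}"
    local value
    # Keep the last uncommented assignment; later lines override earlier ones.
    value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-unset}"
}

# Filter `ipcs -m` or `ipcs -s` output (read from stdin) down to the rows
# whose owner column (third field) matches the given user.
ipc_rows_for_user() {
    awk -v user="$1" '$3 == user'
}
```

For example, `removeipc_setting` prints the setting from the default path, and `ipcs -m | ipc_rows_for_user oracle` lists the shared memory segments owned by oracle (the user name is whatever the installation actually uses).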
 
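OCCURRENCE
This problem affects every application that uses shared memory segments and semaphores, so both Oracle ASM instances and Oracle Database instances are affected.

Oracle Linux 7.2 avoids the problem by explicitly setting RemoveIPC=no in the /etc/systemd/logind.conf configuration file.
Note:
If /etc/systemd/logind.conf was modified before the OS upgrade, yum update writes the correct configuration (RemoveIPC=no) to a separate file named logind.conf.rpmnew. If the original configuration file stays in use, the failures described in this article will occur. To avoid the problem, be sure to edit logind.conf and set RemoveIPC=no after the OS upgrade.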
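The rpmnew case described above is easy to miss after an upgrade. The following is a rough sketch of a check for it; check_rpmnew is a hypothetical helper name, and it only covers this one scenario (an unmerged .rpmnew file next to a live file that does not set RemoveIPC=no).

```shell
#!/usr/bin/env bash

# Warn when a package update left <conf>.rpmnew beside a live config file
# that does not contain an uncommented RemoveIPC=no line.
# Returns 1 on a warning, 0 otherwise.
check_rpmnew() {
    local conf="${1:-/etc/systemd/logind.conf}"
    if [ -f "${conf}.rpmnew" ] && \
       ! grep -Eq '^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "WARNING: ${conf}.rpmnew exists and ${conf} does not set RemoveIPC=no"
        return 1
    fi
    echo "OK: no rpmnew conflict found"
}
```

When the warning appears, comparing the two files with `diff` before editing logind.conf shows what the update intended to change.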

Common failure scenarios caused by this parameter:

  1. Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
  2. Upgrading to 11.2 and 12c GI/CRS fails.
  3. After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.

systemd-logind can remove the IPC objects at any time, so the symptoms vary widely. Here are some examples:

  1. The most common symptom is the following set of errors in the ASM or database alert.log:
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1

  2. During GI installation or upgrade, asmca fails with the following error:
    KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
    KFOD-00105: Could not open pfile 'init@.ora'

  3. During Grid installation or upgrade, creation of the ASM password file fails:
    Creation of ASM password file failed. Following error occurred: Error in Process: /u01/app/12.1.0/grid/bin/orapwd
    Enter password for SYS:
    OPW-00009: Could not establish connection to Automatic Storage Management instance
    2019/07/05 21:38:45 CLSRSC-184: Configuration of ASM failed
    2019/07/05 21:38:46 CLSRSC-258: Failed to configure and start ASM

  4. Around the time the ASM or database instance crashes, the following message appears in /var/log/messages:
    Jul 05 21:38:43 ethanDB kernel: traps: oracle[24861] trap divide error
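The signatures listed above can be searched for in a log file to confirm that a crash matches this problem. The following is a small sketch along those lines; count_ipc_removal_errors is a hypothetical name, and the log path is left to the caller because the alert.log location depends on the installation.

```shell
#!/usr/bin/env bash

# Count the lines in a log file that match the IPC-removal signatures
# quoted above. The exit status is grep's: non-zero when nothing matches.
count_ipc_removal_errors() {
    local log="$1"
    grep -Ec 'ORA-27157|ORA-27301: OS failure message: Identifier removed|KFOD-00313|OPW-00009' "$log"
}
```

A non-zero count does not prove RemoveIPC is the cause on its own, but together with RemoveIPC=yes (or unset) on RHEL 7.2 it is a strong hint.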

III. Solution
Method 1:
Workaround:

  1. Set RemoveIPC=no in /etc/systemd/logind.conf
  2. Reboot the server or restart systemd-logind as follows:
        # systemctl daemon-reload
        # systemctl restart systemd-logind
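
Step 1 of the workaround can be scripted so that it is safe to run more than once. The following is a minimal sketch, assuming GNU sed (as shipped with RHEL) and a logind.conf that contains only the [Login] section, so an appended line lands in the right place. set_removeipc_no is a hypothetical helper name, not part of any Oracle or Red Hat tooling.

```shell
#!/usr/bin/env bash

# Set RemoveIPC=no in a logind.conf-style file, keeping a .bak copy.
# Rewrites an existing RemoveIPC line (commented out or not), or appends
# one when the file has no such line.
set_removeipc_no() {
    local conf="${1:-/etc/systemd/logind.conf}"
    cp -p "$conf" "${conf}.bak" || return 1
    if grep -Eq '^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=' "$conf"; then
        sed -i -E 's/^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$conf"
    else
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}
```

Run as root, `set_removeipc_no` edits the default path; the change takes effect only after the reboot or the systemd-logind restart in step 2.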

Method 2:
Patch:
Migrating from RHEL 7.2 to Oracle Linux 7.2 resolves the problem. If migrating to Oracle Linux 7.2 is not yet feasible, use Method 1.
 
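[Conclusion]
1. In RHEL 7.2, the systemd-logind service introduced a new feature: once a user has fully logged out of the OS, all IPC objects belonging to that user are removed.
2. This feature is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file. With the default of RemoveIPC=yes, RHEL 7.2 crashes Oracle ASM instances and Oracle database instances.
3. With RemoveIPC=yes in /etc/systemd/logind.conf, systemd-logind can remove IPC objects at any time. Therefore, on RHEL 7.2, RemoveIPC=no must be set in /etc/systemd/logind.conf.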

[References]
 https://blog.csdn.net/msdnchina/article/details/50864065
