[RAC] Cluster Resources and Database Shut Down on One RAC Node Because the System Time Was Set Incorrectly



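This afternoon a colleague phoned to say that the database on the second node of our all-in-one database machine (RAC) could not be reached, and asked me to take a look. I logged in and checked the cluster status from the first node: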

    [grid@pwjkdb01 ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.DBFS_DG.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora....ER.lsnr ora....er.type ONLINE    ONLINE    pwjkdb01
    ora....N1.lsnr ora....er.type ONLINE    OFFLINE
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.asm        ora.asm.type   ONLINE    ONLINE    pwjkdb01
    ora.cvu        ora.cvu.type   ONLINE    OFFLINE
    ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
    ora....network ora....rk.type ONLINE    ONLINE    pwjkdb01
    ora.oc4j       ora.oc4j.type  ONLINE    OFFLINE
    ora.ons        ora.ons.type   ONLINE    ONLINE    pwjkdb01
    ora.pwdata.db  ora....se.type ONLINE    ONLINE    pwjkdb01
    ora....rv1.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....rv2.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....SM1.asm application    ONLINE    ONLINE    pwjkdb01
    ora....01.lsnr application    ONLINE    ONLINE    pwjkdb01
    ora....b01.gsd application    OFFLINE   OFFLINE
    ora....b01.ons application    ONLINE    ONLINE    pwjkdb01
    ora....b01.vip ora....t1.type ONLINE    ONLINE    pwjkdb01
    ora....b02.vip ora....t1.type ONLINE    OFFLINE
    ora.scan1.vip  ora....ip.type ONLINE    OFFLINE
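I then logged in to the second node and looked for PMON processes. Neither an ASM nor a database PMON process was listed, so no instance was running there: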


    [root@pwjkdb02 ~]# ps -ef |grep pmon
    root 6679 1 0 2015 ? 00:01:42 /usr/bin/perl -w /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -server
    root 63055 62491 0 16:45 pts/1 00:00:00 grep pmon

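Because the business system relies on this database, I briefly checked the system time and uptime and then tried to start the cluster resources on the second node: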


    [root@pwjkdb02 bin]# ./crsctl start crs
    CRS-4640: Oracle High Availability Services is already active
    CRS-4000: Command Start failed, or completed with errors.
    [root@pwjkdb02 bin]# ./crsctl start cluster
    CRS-2672: Attempting to start 'ora.asm' on 'pwjkdb02'
    CRS-2676: Start of 'ora.asm' on 'pwjkdb02' succeeded
    CRS-2672: Attempting to start 'ora.crsd' on 'pwjkdb02'
    CRS-2676: Start of 'ora.crsd' on 'pwjkdb02' succeeded
    [root@pwjkdb02 bin]#

Node 2 started normally, as shown below:

    [grid@pwjkdb02 ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.DBFS_DG.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora....ER.lsnr ora....er.type ONLINE    ONLINE    pwjkdb01
    ora....N1.lsnr ora....er.type ONLINE    ONLINE    pwjkdb02
    ora....PWJK.dg ora....up.type ONLINE    ONLINE    pwjkdb01
    ora.asm        ora.asm.type   ONLINE    ONLINE    pwjkdb01
    ora.cvu        ora.cvu.type   ONLINE    ONLINE    pwjkdb02
    ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
    ora....network ora....rk.type ONLINE    ONLINE    pwjkdb01
    ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    pwjkdb02
    ora.ons        ora.ons.type   ONLINE    ONLINE    pwjkdb01
    ora.pwdata.db  ora....se.type ONLINE    ONLINE    pwjkdb01
    ora....rv1.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....rv2.svc ora....ce.type ONLINE    ONLINE    pwjkdb01
    ora....SM1.asm application    ONLINE    ONLINE    pwjkdb01
    ora....01.lsnr application    ONLINE    ONLINE    pwjkdb01
    ora....b01.gsd application    OFFLINE   OFFLINE
    ora....b01.ons application    ONLINE    ONLINE    pwjkdb01
    ora....b01.vip ora....t1.type ONLINE    ONLINE    pwjkdb01
    ora....SM2.asm application    ONLINE    ONLINE    pwjkdb02
    ora....02.lsnr application    ONLINE    ONLINE    pwjkdb02
    ora....b02.gsd application    OFFLINE   OFFLINE
    ora....b02.ons application    ONLINE    ONLINE    pwjkdb02
    ora....b02.vip ora....t1.type ONLINE    ONLINE    pwjkdb02
    ora.scan1.vip  ora....ip.type ONLINE    ONLINE    pwjkdb02

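After the startup, I reviewed some of the logs.

Clusterware alert log (alertpwjkdb02.log), which records the database and ASM resource failures: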

    tail -100f alertpwjkdb02.log

    4016-01-02 16:26:00.736
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(48051)]CRS-5011:Check of resource "pwdata" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/crsd/oraagent_oracle/oraagent_oracle.log"
    4016-01-02 16:26:01.262
    [crsd(9981)]CRS-2765:Resource 'ora.pwdata.db' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.329
    [crsd(9981)]CRS-2765:Resource 'ora.pwdata.pwdatasrv2.svc' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.329
    [crsd(9981)]CRS-2771:Maximum restart attempts reached for resource 'ora.pwdata.pwdatasrv2.svc'; will not restart.
    4016-01-02 16:26:01.510
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(8988)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/ohasd/oraagent_grid/oraagent_grid.log"
    4016-01-02 16:26:01.614
    [ohasd(6722)]CRS-2765:Resource 'ora.asm' has failed on server 'pwjkdb02'.
    4016-01-02 16:26:01.663
    [/u01/app/11.2.0.3/grid/bin/oraagent.bin(8988)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0.3/grid/log/pwjkdb02/agent/ohasd/oraagent_grid/oraagent_grid.log"
    ……
The ASM instance alert log, shown below, reports that the operating system returned an invalid current time:


    tail -600f alert_+ASM2.log |more

    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Tue Jan 12 00:39:25 2016
    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Sat Jan 02 16:26:00 4016
    Warning: VKTM detected a time drift.
    Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
    Sat Jan 02 16:26:00 4016
    Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_pmon_9913.trc:
    ORA-01513: invalid current time returned by operating system
    PMON (ospid: 9913): terminating the instance due to error 1513
    Sat Jan 02 16:26:01 4016
    System state dump requested by (instance=2, osid=9913 (PMON)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_9923.trc
    Dumping diagnostic data in directory=[cdmp_19740622152201], requested by (instance=2, osid=9913 (PMON)), summary=[abnormal instance termination].
    Sat Jan 02 16:26:01 4016
    ORA-1092 : opitsk aborting process
    Sat Jan 02 16:26:01 4016
    License high water mark = 24
    Instance terminated by PMON, pid = 9913
    USER (ospid: 49268): terminating the instance
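The jump from 2016 to 4016 in the alert log timestamps is easy to miss when paging through hundreds of lines. A minimal sketch of how such lines could be flagged automatically follows. The function name suspicious_stamps and the plausible year range 2000 to 2100 are assumptions made for this illustration and are not part of any Oracle tool; the timestamp layout is the one visible in the excerpt above.

```python
from datetime import datetime

# Layout of the timestamp lines in the ASM alert log excerpt,
# for example "Sat Jan 02 16:26:00 4016".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def suspicious_stamps(lines, first_year=2000, last_year=2100):
    """Return alert-log timestamps whose year lies outside a plausible range.

    first_year and last_year are arbitrary bounds chosen for this sketch.
    """
    flagged = []
    for line in lines:
        try:
            stamp = datetime.strptime(line.strip(), STAMP_FORMAT)
        except ValueError:
            continue  # not a timestamp line (message text, trace path, ...)
        if not first_year <= stamp.year <= last_year:
            flagged.append(stamp)
    return flagged


if __name__ == "__main__":
    sample = [
        "Tue Jan 12 00:39:25 2016",
        "Warning: VKTM detected a time drift.",
        "Sat Jan 02 16:26:00 4016",
        "ORA-01513: invalid current time returned by operating system",
    ]
    for stamp in suspicious_stamps(sample):
        print("implausible timestamp:", stamp)
```

Run against the excerpt, only the 4016 entries are reported, which points straight at the moment the clock changed.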


Next, I checked the shell command history on that node:

    vi .bash_profile

    su - oracle
    #1454306838
    sar 1 5
    #1454315175
    date
    #1454315199
    date 010216264016.00
    #64565627161
    date
    #64565627177
    date 0102162716.00
    #1451723220
    date
    #1451723223
    date
    #1451723225
    date
    #1451723227
    date
    #1451723233
    date
    #1451723250
    date
    #1451723302
    date
    #1451723310
    date
    #1451723315
    date
    #1451723323
    su - oracle
    #1451723446
    date
    #1451723534
    date 0201163016.00
    #1454315401
    date
    #1454315505
    date 0201163416.00
    #1454315642
    date
    #1454315647
    date
    #1454315648
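The lines beginning with # in this history are the epoch timestamps that bash writes in front of each command when HISTTIMEFORMAT is set. Decoding them gives the wall-clock time the node believed it had when each command ran. The sketch below is only an illustration: the helper name decode_history_stamp is made up here, and the UTC+8 offset (China Standard Time) is an assumption about the server's time zone, chosen because it makes the decoded stamps line up with the times typed into the date commands.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the node was configured for China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_history_stamp(line: str) -> datetime:
    """Convert a '#<epoch seconds>' bash history line to a UTC+8 datetime."""
    seconds = int(line.strip().lstrip("#"))
    # Plain datetime arithmetic avoids platform limits on far-future epochs.
    return (EPOCH + timedelta(seconds=seconds)).astimezone(CST)


if __name__ == "__main__":
    for stamp in ("#1454315199", "#64565627161", "#1451723220", "#1454315401"):
        decoded = decode_history_stamp(stamp)
        print(stamp, "->", decoded.strftime("%Y-%m-%d %H:%M:%S"))
```

Under that assumption, #1454315199 decodes to 2016-02-01 16:26:39, #64565627161 to 4016-01-02 16:26:01, and #1451723220 to 2016-01-02 16:27:00, so the history shows the clock moving from February 2016 to the year 4016, then to January 2016, before it was finally set back to February.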

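The history contains the command date 010216264016.00. Together with the alert logs and the other cluster logs, this confirms that changing the operating system time caused the cluster stack on RAC node 2 to shut down. On the phone, the colleague explained that they had noticed the system clock was 5 minutes slow and corrected it directly at the operating system level. Being unfamiliar with the command, they set the date to the year 4016, which caused the failures above. (Take great care when changing the operating system time, especially while a database is running, so that business applications are not affected.)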
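To see why that argument produced the year 4016, recall that the date command reads a numeric argument as MMDDhhmm[[CC]YY][.ss]: month, day, hour, minute, optional year, optional seconds. The sketch below mimics that interpretation. The function parse_date_arg and its default_year parameter are hypothetical helpers written for this illustration, and the two-digit-year pivot (69 to 99 mean 19xx, 00 to 68 mean 20xx) follows the POSIX convention, which is assumed to apply here.

```python
from datetime import datetime


def parse_date_arg(arg: str, default_year: int = 2016) -> datetime:
    """Interpret a 'date MMDDhhmm[[CC]YY][.ss]' argument as a datetime."""
    digits, _, seconds = arg.partition(".")
    if len(digits) not in (8, 10, 12) or not digits.isdigit():
        raise ValueError(f"unexpected date argument: {arg!r}")
    month, day = int(digits[0:2]), int(digits[2:4])
    hour, minute = int(digits[4:6]), int(digits[6:8])
    year_part = digits[8:]
    if len(year_part) == 4:        # CCYY: full four-digit year
        year = int(year_part)
    elif len(year_part) == 2:      # YY: assumed POSIX pivot at 69
        yy = int(year_part)
        year = (1900 if yy >= 69 else 2000) + yy
    else:                          # no year given: keep the current one
        year = default_year
    return datetime(year, month, day, hour, minute, int(seconds or 0))


if __name__ == "__main__":
    for arg in ("010216264016.00", "0102162716.00", "0201163016.00"):
        print(f"date {arg} -> {parse_date_arg(arg)}")
```

Read this way, 010216264016.00 means 4016-01-02 16:26:00 and 0102162716.00 means 2016-01-02 16:27:00, while 0201163016.00 gives 2016-02-01 16:30:00. The first two commands therefore have the month and day fields swapped, and the first one also carries a four-digit year of 4016.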

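Every operation carries risk. Before making a change, plan it in advance, prepare an operating procedure and a contingency plan, and assess the risk. Never change a live system on assumption.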
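One way to turn that advice into a concrete pre-check is to compare the intended time with the current one and refuse anything larger than an agreed step before the real command is run. The following is only a sketch: check_clock_change is a hypothetical helper, and the 10-minute limit is an arbitrary value picked for the example, not a recommendation from Oracle.

```python
from datetime import datetime, timedelta


def check_clock_change(current: datetime, proposed: datetime,
                       max_step: timedelta = timedelta(minutes=10)) -> None:
    """Raise ValueError if the proposed time is too far from the current time.

    max_step is an arbitrary threshold assumed for this sketch.
    """
    step = abs(proposed - current)
    if step > max_step:
        raise ValueError(
            f"refusing clock change of {step}; limit is {max_step}"
        )


if __name__ == "__main__":
    now = datetime(2016, 2, 1, 16, 26, 39)
    # A 5-minute correction is accepted silently.
    check_clock_change(now, datetime(2016, 2, 1, 16, 31, 39))
    # The mistyped year is rejected before it can reach the system clock.
    try:
        check_clock_change(now, datetime(4016, 1, 2, 16, 26, 0))
    except ValueError as err:
        print(err)
```

A check like this would have accepted the intended 5-minute correction and rejected the jump to 4016.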

Source: ITPUB blog, http://blog.itpub.net/29487349/viewspace-1991263/. If you repost this article, please cite the source; otherwise legal action may be taken.

