oracle数据库关掉一个node,如何追查人为关闭Oracle数据库

1. 故障介绍

这几天遇到一个状况,数据库莫名其妙的down掉。然后客户让我们找出原因。其实数据库down掉的原因有好多种,故障或者是人为。而我这个情况是人为造成的。既然是人为造成的,我们如何得知呢?

2. 故障重现

首先我们来思考一下,在RAC数据库中,直接发起一条shutdown abort的命令会记录什么?我们首先看一下alert日志。从这里我们可以看到一段话shutting down instance(abort),并且记录了用户的pid是9806。

Thu Jun 06 20:48:29 CST 2013

Shutting down instance (abort)

License high water mark = 12

Instance terminated by USER, pid = 9806

下面是crsd的记录。

2013-06-06 20:48:31.204: [ CRSRES][1497741632]0ora.racdb.racdb1.inst target set to OFFLINE before stop action

2013-06-06 20:48:31.204: [ CRSRES][1497741632]0StopResource: setting CLI values

2013-06-06 20:48:31.216: [ CRSRES][1497741632]0Attempting to stop `ora.racdb.racdb1.inst` on member `rac1`

2013-06-06 20:48:31.555: [ CRSRES][1497741632]0Stop of `ora.racdb.racdb1.inst` on member `rac1` succeeded.

如果我们用srvctl stop instance -d racdb -i racdb1 -o abort命令来停数据库的话。会出现如下结果

后台alert日志

Thu Jun 06 21:06:09 CST 2013

Shutting down instance (abort)

License high water mark = 5

Instance terminatedby USER, pid = 17446

crsd日志如下

2013-06-06 21:06:08.845: [ CRSRES][1497741632]0ora.racdb.racdb1.inst target set to OFFLINE before stop action

2013-06-06 21:06:08.845: [ CRSRES][1497741632]0StopResource: setting CLI values

2013-06-06 21:06:08.862: [ CRSRES][1497741632]0Attempting to stop `ora.racdb.racdb1.inst` on member `rac1`

2013-06-06 21:06:11.900: [ CRSRES][1497741632]0Stop of `ora.racdb.racdb1.inst` on member `rac1` succeeded.

从这儿可以看出手工shutdown abort和通过srvctl shutdown abort是一样的。那么在来看看kill -9 smon会怎么样。

Thu Jun 06 21:16:19 CST 2013

Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_pmon_21008.trc:

ORA-00474: SMON process terminated with error

Thu Jun 06 21:16:19 CST 2013

PMON: terminating instance due to error 474

Thu Jun 06 21:16:19 CST 2013

System state dump is made for local instance

System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21010.trc

Thu Jun 06 21:16:19 CST 2013

Trace dumping is performing id=[cdmp_20130606211619]

Thu Jun 06 21:16:20 CST 2013

Instance terminated by PMON, pid = 21008

crsd日志如下

2013-06-06 21:16:19.286: [ CRSRES][1497741632]0In stateChanged, ora.racdb.racdb1.inst target is ONLINE

2013-06-06 21:16:19.286: [ CRSRES][1497741632]0ora.racdb.racdb1.inst on rac1 went OFFLINE unexpectedly

2013-06-06 21:16:19.286: [ CRSRES][1497741632]0StopResource: setting CLI values

2013-06-06 21:16:20.523: [ CRSRES][1497741632]0Attempting to stop `ora.racdb.racdb1.inst` on member `rac1`

2013-06-06 21:16:24.015: [ CRSRES][1497741632]0Stop of `ora.racdb.racdb1.inst` on member `rac1` succeeded.

2013-06-06 21:16:24.016: [ CRSRES][1497741632]0ora.racdb.racdb1.inst RESTART_COUNT=0 RESTART_ATTEMPTS=5

2013-06-06 21:16:24.016: [ CRSRES][1497741632]0Restarting ora.racdb.racdb1.inst on rac1

2013-06-06 21:16:24.024: [ CRSRES][1497741632]0startRunnable: setting CLI values

2013-06-06 21:16:24.024: [ CRSRES][1497741632]0Attempting to start `ora.racdb.racdb1.inst` on member `rac1`

2013-06-06 21:16:43.749: [ CRSRES][1497741632]0Start of `ora.racdb.racdb1.inst` on member `rac1` succeeded.

2013-06-06 21:16:43.749: [ CRSRES][1497741632]0Successfully restarted ora.racdb.racdb1.inst on rac1, RESTART_COUNT=1

2013-06-06 21:16:43.792: [ CRSRES][1497741632]0ora.racdb.racdb1.inst Updated LAST_RESTART time in ocr

这里来总结一下,只要是手工执行shutdown abort或者是用srvctl stop instance来执行shutdown的话,都会在alert日志中记录shutdown by user,还会记录会话的session id。而Kill -9则不会记录,它只会告诉你PMON异常中断。而且CRS会立马将数据库重新拉起来。

3. 追查

正是因为alert日志中记录了session id,我们可以审计也就是adump下的文件来查看。在Oracle 10g和Oracle 11g这项功能默认是开启的。因此,我们到adump目录下检索一番就能够得到我们想要的信息。比如第一次shutdown abort的pid是9806。

[oracle@rac1 adump]$ more racdb1_ora_9806_1.aud

Audit file /oracle/app/oracle/admin/racdb/adump/racdb1_ora_9806_1.aud

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /oracle/app/oracle/product/10.2.0/db_1

System name: Linux

Node name: rac1.localdomain

Release: 2.6.18-164.el5

Version: #1 SMP Thu Sep 3 04:15:13 EDT 2009

Machine: x86_64

Instance name: racdb1

Redo thread mounted by this instance: 1

Oracle process number: 27

Unix process pid: 9806, image: oracle@rac1.localdomain (TNS V1-V3)

Thu Jun 6 20:47:39 2013

LENGTH : '158'

ACTION :[7] 'CONNECT'

DATABASE USER:[1] '/'

PRIVILEGE :[6] 'SYSDBA'

CLIENT USER:[6] 'oracle'

CLIENT TERMINAL:[5] 'pts/1'

STATUS:[1] '0'

DBID:[9] '782449062'

Thu Jun 6 20:48:29 2013

LENGTH : '159'

ACTION :[8] 'SHUTDOWN'

DATABASE USER:[1] '/'

PRIVILEGE :[6] 'SYSDBA'

CLIENT USER:[6] 'oracle'

CLIENT TERMINAL:[5] 'pts/1'

STATUS:[1] '0'

DBID:[9] '782449062'

这里我们通过审计文件看到了action是shutdown。是由pst/1这个终端执行的。那么怎么查看pst/1是从哪儿来的呢?我们可以用w命令来显示。

[oracle@rac1 adump]$ w

21:33:02 up 58 min, 4 users, load average: 0.27, 0.31, 0.31

USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT

root :0 - 20:37 ?xdm? 1:55 0.04s /usr/bin/gnome-session

root pts/1 192.168.56.1 20:38 0.00s 0.13s 0.00s w

root pts/2 192.168.56.1 20:47 22:56 0.12s 0.04s tail -200f crsd.log

root pts/3 192.168.56.1 21:15 17:03 0.04s 0.04s tail -200f alert_racdb1.log

这里显示了IP地址。正常情况下,都是从某个主机telnet过来的。也可以通过history命令来查看历史的操作记录。

分享到:

更多

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值