ORACLE RAC 11.2.0.4 一节点出现Suspending MMON slave action kewrmrfsa_ for 82800 seconds

今天上午,广州一Oracle数据库 11g rac集群的节点1出现告警提示:Suspending MMON slave action kewrmrfsa_ for 82800 seconds

出现的现象是,普通用户登录系统挂起:

[oracle@gzbmi01 ~]$ sqlplus test/test

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

从操作系统看,系统的cpu和内存资源都很空闲,但是oracle有比较多的远程访问连接挂起

[oracle@gzbmi01 ~]$ps -ef|grep LOCAL=NO|wc -l

137

[oracle@gzbmi01 ~]$

节点1使用sys用户能正常登录,但是不能发起任何查询,节点1发起的查询全部挂起:

[oracle@gzbmi01 ~]$ sqlplus /  as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;


但是rac的另外一个节点则是正常的,从节点2查询数据库的等待事件:

SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;

EVENT                                                                         COUNT(*)
---------------------------------------------------------------- ----------
enq: HW - contention                                                      1
SQL*Net message to client                                              1

SQL>select sid,username,sql_id,module,machine,program from v$session where event='enq: HW - contention' and sql_id is not null


       SID USERNAME                       SQL_ID        MODULE                              MACHINE                                                   PROGRAM
---------- ------------------------------ ------------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------------------------------
       430 BMICNEW                        4vs91dcv7u1p6 plsqldev.exe                        Q3RF184E                                         plsqldev.exe


SQL> set long 100000
SQL> select sql_fulltext from v$sqlarea where sql_id='4vs91dcv7u1p6';


SQL_FULLTEXT
--------------------------------------------------------------------------------
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,t
erminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, n
ew$owner,new$name,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,co
mment$text,spare1,spare2,  priv$used,clientid,sessioncpu,proxy$sid,user$guid, in
stance#,process#,xid,scn,auditid,  sqlbind,sqltext,obj$edition,dbid)  values(:1,
:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,
 :13,:14,:15,:16,:17,     :18,:19,:20,:21,:22,     :23,:24,:25,:26,:27,     :28,
:29,:30,:31,:32,     :33,:34,:35,:36)


SQL> select sid,spid, osuser, s.program from v$session s,v$process p where
s.paddr=p.addr and sid=&sid;  2
Enter value for sid: 430
old   2: s.paddr=p.addr and sid=&sid
new   2: s.paddr=p.addr and sid=430


       SID SPID                     OSUSER                         PROGRAM
---------- ------------------------ ------------------------------ ------------------------------------------------
       430 33139                    Administrator                  plsqldev.exe
SQL>

    查询到的sql是数据库自身的基表进行的Insert,于是尝试查杀该会话对应的系统进程:kill -9 33139

但是,查杀会话进程后问题依然存在。

    有人说可能跟归档路径有关系,经查看数据库的归档路径是+ARCH,在节点一发起:alter system switch logfile;

能正常执行并且能在归档的asm磁盘组arch中查到该文件:

Fri May 19 10:43:11 2017
Thread 1 advanced to log sequence 21922 (LGWR switch)
  Current log# 7 seq# 21922 mem# 0: +DATA/gzbmi/onlinelog/group_7.349.918726525

    经过与地市技术人员沟通决定,重新启动节点1上的数据库实例:

[oracle@gzbmi01 ~]$ ps -ef|grep ora_
oracle   36354     1  0 Jan13 ?        01:19:43 ora_pmon_gzbmi1
oracle   36356     1  0 Jan13 ?        00:41:35 ora_psp0_gzbmi1
oracle   36358     1  0 Jan13 ?        1-05:44:24 ora_vktm_gzbmi1
oracle   36362     1  0 Jan13 ?        00:08:05 ora_gen0_gzbmi1
oracle   36364     1  0 Jan13 ?        02:44:09 ora_diag_gzbmi1
oracle   36366     1  0 Jan13 ?        00:33:55 ora_dbrm_gzbmi1
oracle   36368     1  0 Jan13 ?        00:17:22 ora_ping_gzbmi1
oracle   36370     1  0 Jan13 ?        00:06:58 ora_acms_gzbmi1
oracle   36372     1  0 Jan13 ?        12:43:37 ora_dia0_gzbmi1
oracle   36374     1  0 Jan13 ?        16:38:02 ora_lmon_gzbmi1
oracle   36376     1  0 Jan13 ?        05:14:26 ora_lmd0_gzbmi1
oracle   36378     1  0 Jan13 ?        13:07:21 ora_lms0_gzbmi1
oracle   36382     1  0 Jan13 ?        13:04:36 ora_lms1_gzbmi1
oracle   36386     1  0 Jan13 ?        13:06:21 ora_lms2_gzbmi1
oracle   36390     1  0 Jan13 ?        00:06:41 ora_rms0_gzbmi1
oracle   36392     1  0 Jan13 ?        00:17:44 ora_lmhb_gzbmi1
oracle   36394     1  0 Jan13 ?        00:57:45 ora_mman_gzbmi1
oracle   36396     1  0 Jan13 ?        01:37:48 ora_dbw0_gzbmi1
oracle   36398     1  0 Jan13 ?        01:39:29 ora_dbw1_gzbmi1
oracle   36400     1  0 Jan13 ?        01:39:10 ora_dbw2_gzbmi1
oracle   36402     1  0 Jan13 ?        01:38:37 ora_dbw3_gzbmi1
oracle   36404     1  0 Jan13 ?        03:27:51 ora_lgwr_gzbmi1
oracle   36406     1  0 Jan13 ?        04:29:21 ora_ckpt_gzbmi1
oracle   36408     1  0 Jan13 ?        01:26:17 ora_smon_gzbmi1
oracle   36410     1  0 Jan13 ?        00:04:03 ora_reco_gzbmi1
oracle   36412     1  0 Jan13 ?        00:09:02 ora_rbal_gzbmi1
oracle   36414     1  0 Jan13 ?        00:08:44 ora_asmb_gzbmi1
oracle   36422     1  0 Jan13 ?        00:03:50 ora_d000_gzbmi1
oracle   36424     1  0 Jan13 ?        00:03:48 ora_s000_gzbmi1
oracle   36426     1  0 Jan13 ?        00:12:38 ora_mark_gzbmi1
oracle   36432     1  0 Jan13 ?        06:48:47 ora_lck0_gzbmi1
oracle   36434     1  0 Jan13 ?        00:15:49 ora_rsmn_gzbmi1
oracle   36537     1  0 Jan13 ?        00:25:46 ora_arc0_gzbmi1
oracle   36539     1  0 Jan13 ?        00:29:15 ora_arc1_gzbmi1
oracle   36541     1  0 Jan13 ?        00:07:45 ora_arc2_gzbmi1
oracle   36543     1  0 Jan13 ?        00:25:44 ora_arc3_gzbmi1
oracle   36546     1  0 Jan13 ?        00:07:06 ora_gtx0_gzbmi1
oracle   36548     1  0 Jan13 ?        00:45:36 ora_rcbg_gzbmi1
oracle   41230     1  0 10:01 ?        00:00:03 ora_p000_gzbmi1
oracle   51435 47466  0 11:06 pts/2    00:00:00 grep ora_
[oracle@gzbmi01 ~]$ ps -ef|grep asm_
grid     22587     1  0 Jan13 ?        00:56:55 asm_pmon_+ASM1
grid     22589     1  0 Jan13 ?        00:40:03 asm_psp0_+ASM1
grid     22591     1  0 Jan13 ?        1-05:47:40 asm_vktm_+ASM1
grid     22595     1  0 Jan13 ?        00:07:35 asm_gen0_+ASM1
grid     22597     1  0 Jan13 ?        02:31:03 asm_diag_+ASM1
grid     22599     1  0 Jan13 ?        00:21:24 asm_ping_+ASM1
grid     22601     1  0 Jan13 ?        19:48:03 asm_dia0_+ASM1
grid     22603     1  0 Jan13 ?        06:51:21 asm_lmon_+ASM1
grid     22605     1  0 Jan13 ?        04:58:18 asm_lmd0_+ASM1
grid     22607     1  0 Jan13 ?        08:30:03 asm_lms0_+ASM1
grid     22611     1  0 Jan13 ?        00:15:48 asm_lmhb_+ASM1
grid     22613     1  0 Jan13 ?        00:08:13 asm_mman_+ASM1
grid     22615     1  0 Jan13 ?        00:09:25 asm_dbw0_+ASM1
grid     22617     1  0 Jan13 ?        00:09:44 asm_lgwr_+ASM1
grid     22619     1  0 Jan13 ?        00:23:57 asm_ckpt_+ASM1
grid     22621     1  0 Jan13 ?        00:06:58 asm_smon_+ASM1
grid     22623     1  0 Jan13 ?        01:34:49 asm_rbal_+ASM1
grid     22625     1  0 Jan13 ?        01:08:11 asm_gmon_+ASM1
grid     22627     1  0 Jan13 ?        00:23:08 asm_mmon_+ASM1
grid     22629     1  0 Jan13 ?        03:20:59 asm_mmnl_+ASM1
grid     22631     1  0 Jan13 ?        00:28:17 asm_lck0_+ASM1
grid     22715     1  0 Jan13 ?        00:06:40 asm_asmb_+ASM1
oracle   51438 47466  0 11:06 pts/2    00:00:00 grep asm_

[grid@gzbmi01 ~]$srvctl stop instance -d gzbmi -n gzbmi01

oracle用户的告警日志提示:

Fri May 19 11:01:46 2017
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Stopping background process QMNC
Stopping background process MMNL
License high water mark = 212
Stopping Job queue slave processes, flags = 7
Job queue slave processes stopped
Fri May 19 11:02:40 2017
Stopping background process CJQ0
Fri May 19 11:06:49 2017
Active call for process 41230 user 'oracle' program 'oracle@gzbmi01 (P000)'
SHUTDOWN: waiting for active calls to complete.
Fri May 19 11:07:34 2017
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL /* db agent *//* {1:61387:21743} */
Fri May 19 11:09:59 2017
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri May 19 11:09:59 2017
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
Fri May 19 11:11:48 2017
License high water mark = 212
USER (ospid: 52209): terminating the instance
Instance terminated by USER, pid = 52209

[oracle@gzbmi01 ~]$ ps -ef|grep ora_

[oracle@gzbmi01 ~]$

重新启动节点1上的实例:

[grid@gzbmi01 ~]$srvctl start instance -d gzbmi -n gzbmi01

oracle用户告警日志提示:

Fri May 19 11:14:12 2017
Adjusting the default value of parameter parallel_max_servers
from 1280 to 470 due to the value of parameter processes (500)
Starting ORACLE instance (normal)

......

Starting background process QMNC
Fri May 19 11:14:53 2017
QMNC started with pid=46, OS id=52912 
Completed: ALTER DATABASE OPEN /* db agent *//* {1:61387:21775} */
Fri May 19 11:14:56 2017
Starting background process CJQ0
Fri May 19 11:14:56 2017
CJQ0 started with pid=51, OS id=52941

节点1上的实例重新启动后恢复正常:

[oracle@gzbmi01 ~]$ sqlplus bmiCnew/bmiCnew

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>

应用程序启动后功能测试恢复正常,具体的节点1实例hang的原因不太清楚,地市技术人员说前期做过数据抽取,

但是从数据库告警日志和集群日志没有找到资源紧张引发的告警信息。

  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值