登录测试数据库,发现普通用户登录就hang住没反应
[oracle@rac2 ~]$ sqlplus scott/tiger
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 15 19:22:25 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
。。。。
hang在这里没有反应
用sys登录没有异常
查看hang住的session 的event
SQL> select sid,username,event,p1,p2,p3 from v$session where status='ACTIVE' and username is not null;
SID USERNAME EVENT P1 P2
---------- ------------------------- ------------------------------------------------------- ---------- ----------
P3
----------
36 SYS SQL*Net message to client 1650815232 1
0
59 SCOTT write complete waits 5 176
0
76 SYS Streams AQ: waiting for messages in the queue 12884 1388097736
这个等待一般是由于IO性能引起的,或者dbwr进程工作量大导致,跟我的情况明显不同
测试库上没有跑任何作业或者其他的操作
操作系统的负载也很低,几乎没有
这个等待的P1 ,P2分别是file#,block#
SQL> select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name='write complete waits';
NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
--------------------------------------------------------------------------------
write complete waits file# block#
查看等待的对象是什么
SQL> select owner,segment_name, segment_type
2 from dba_extents
3 where file_id = &file_id
4 and &block_id between block_id and block_id + blocks - 1;
Enter value for file_id: 5
old 3: where file_id = &file_id
new 3: where file_id = 5
Enter value for block_id: 176
old 4: and &block_id between block_id and block_id + blocks - 1
new 4: and 176 between block_id and block_id + blocks - 1
OWNER SEGMENT_NAME SEGMENT_TYPE
------------------------------------------------------------ -------------------- ------------------------------------
SYS _SYSSMU14_277467141$ TYPE2 UNDO
同时发现此时很多操作无法完成
如flush buffer_cahce,switch logfile等
查看alter日志发现:
Fri Jan 10 20:07:17 2014
Suspending MMON action 'undo usage' for 82800 seconds
Fri Jan 10 22:00:01 2014
Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Fri Jan 10 22:00:01 2014
Starting background process VKRM
Fri Jan 10 22:00:01 2014
VKRM started with pid=38, OS id=13518
Fri Jan 10 22:34:33 2014
DW00 started with pid=51, OS id=14090, wid=2, job SCOTT.SYS_EXPORT_SCHEMA_01
Fri Jan 10 22:47:38 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Fri Jan 10 22:49:52 2014
DW00 started with pid=54, OS id=14344, wid=2, job SCOTT.SYS_EXPORT_SCHEMA_01
Fri Jan 10 23:15:20 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Sat Jan 11 19:29:00 2014
Suspending MMON action 'undo usage' for 82800 seconds
Sat Jan 11 23:05:39 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Sun Jan 12 00:35:17 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Sun Jan 12 18:43:51 2014
Suspending MMON action 'undo usage' for 82800 seconds
Sun Jan 12 23:11:10 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Mon Jan 13 00:35:04 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Mon Jan 13 18:00:27 2014
Suspending MMON action 'undo usage' for 82800 seconds
Mon Jan 13 23:13:24 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
发现2个值得注意的地方
1,Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
2,Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Suspending MMON action 'undo usage' for 82800 seconds
关于1,有文章说可能是Resource Manager plan引起的bug,
resouce manager的bug:
Oracle Support - March 31, 2012 10:35:20 AM GMT-04:00 [ODM Action Plan]
The hang analyze shows several chains where the blocking session is waiting on resmgr:cpu quantum.
Please disable resource manager and check it the issue still occurs.
这里跟我的情况不太一样
2,MMON进程异常,MMON进程跟awr有关,可以kill,这里是测试环境,所以我直接kill了这个进程
[oracle@rac2 trace]$ ps aux | grep mmon
grid 4414 0.0 0.6 419584 13084 ? Ss Jan10 0:00 asm_mmon_+ASM2
grid 4749 0.0 1.4 306724 30028 ? Sl Jan10 1:09 /opt/app/grid/product/11.2.0/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/app/grid/product/11.2.0/srvm/admin/logging.properties -classpath /opt/app/grid/product/11.2.0/jdk/jre//lib/rt.jar:/opt/app/grid/product/11.2.0/jlib/srvm.jar:/opt/app/grid/product/11.2.0/jlib/srvmhas.jar:/opt/app/grid/product/11.2.0/jlib/supercluster.jar:/opt/app/grid/product/11.2.0/jlib/supercluster-common.jar:/opt/app/grid/product/11.2.0/ons/lib/ons.jar oracle.supercluster.impl.cluster.EONSServerImpl
oracle 5054 0.0 4.8 992696 100980 ? Ss Jan10 0:06 ora_mmon_rac2
oracle 26959 0.0 0.0 3920 664 pts/5 R+ 19:21 0:00 grep mmon
[oracle@rac2 trace]$ kill -9 5054
[oracle@rac2 trace]$ ps aux | grep mmon
grid 4414 0.0 0.6 419584 13084 ? Ss Jan10 0:00 asm_mmon_+ASM2
grid 4749 0.0 1.4 306724 30028 ? Sl Jan10 1:09 /opt/app/grid/product/11.2.0/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/app/grid/product/11.2.0/srvm/admin/logging.properties -classpath /opt/app/grid/product/11.2.0/jdk/jre//lib/rt.jar:/opt/app/grid/product/11.2.0/jlib/srvm.jar:/opt/app/grid/product/11.2.0/jlib/srvmhas.jar:/opt/app/grid/product/11.2.0/jlib/supercluster.jar:/opt/app/grid/product/11.2.0/jlib/supercluster-common.jar:/opt/app/grid/product/11.2.0/ons/lib/ons.jar oracle.supercluster.impl.cluster.EONSServerImpl
oracle 26961 0.0 0.0 3920 664 pts/5 R+ 19:21 0:00 grep mmon
alter日志:
Wed Jan 15 19:23:45 2014
Restarting dead background process MMON
Wed Jan 15 19:23:45 2014
MMON started with pid=24, OS id=27085
Wed Jan 15 19:26:33 2014
但是kill后发现还是无法正常登录系统
检查是否有其他异常session的阻塞
SQL> select BLOCKING_INSTANCE,BLOCKING_SESSION from v$session where sid=59;
BLOCKING_INSTANCE BLOCKING_SESSION
----------------- ----------------
2 18
SQL> select sid,username,event,BLOCKING_INSTANCE,BLOCKING_SESSION from v$session where sid=18;
no rows selected
session 59 被session 18阻塞,但是没有session 18的信息
决定重启一下rac2的实例
[grid@rac2 ~]$ srvctl stop instance -d rac -n rac2
这里等待了几分钟,发现没有任何反应
查看alter日志,只有一条信息:
Wed Jan 15 19:26:33 2014
PMON failed to delete process, see PMON trace file
检查pmon tracefile信息:
*** 2014-01-15 19:27:43.564
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:27:57.503
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:11.474
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:25.360
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:39.244
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:53.137
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
PMON waiting for 1000 csecs
*** 2014-01-15 19:29:06.990
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:29:06.990
deleting process 0x52bcce2c pid=34 serial=6 priority=0
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
deletion of process 52bcce2c pid=34 seq=6 prog=FALSE unsuccessful
从pmon信息可以看出来,need redo log switch, current log full
current redo log 需要切换,但是无法完成
查看归档日志信息:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /home/oracle/archivelog
Oldest online log sequence 794
Next log sequence to archive 794
Current log sequence 796
SQL> select dest_name,status,error,target,process from v$archive_dest where dest_name=1;
DEST_NAME STATUS ERROR
------------------------------ ------------------ --------------------
TARGET PROCESS
-------------- --------------------
LOG_ARCHIVE_DEST_1 ERROR ORA-19502: write
error on file "",
block number
(block size=)
PRIMARY ARCH
SQL> ! df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 23G 16G 5.5G 75% /
/dev/sda1 99M 12M 83M 12% /boot
tmpfs 1014M 614M 400M 61% /dev/shm
发现归档路径有错误,ORA-19502
才查看磁盘,磁盘还有5个多G空间,完全没有问题,后来测试了磁盘的读写权限,也没有问题
这就比较奇怪
重新指定一个归档路径
SQL> alter system set log_archive_dest_1='location=/home/oracle/';
System altered
SQL> alter system switch logfile;
System altered.
发现可以正常切换日志了
此时普通用户也可以正常连接
[oracle@rac2 archivelog]$ sqlplus scott/tiger
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 15 19:42:18 201
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
[oracle@rac2 ~]$ sqlplus scott/tiger
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 15 19:22:25 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
。。。。
hang在这里没有反应
用sys登录没有异常
查看hang住的session 的event
SQL> select sid,username,event,p1,p2,p3 from v$session where status='ACTIVE' and username is not null;
SID USERNAME EVENT P1 P2
---------- ------------------------- ------------------------------------------------------- ---------- ----------
P3
----------
36 SYS SQL*Net message to client 1650815232 1
0
59 SCOTT write complete waits 5 176
0
76 SYS Streams AQ: waiting for messages in the queue 12884 1388097736
1
这个等待一般是由于IO性能引起的,或者dbwr进程工作量大导致,跟我的情况明显不同
测试库上没有跑任何作业或者其他的操作
操作系统的负载也很低,几乎没有
这个等待的P1 ,P2分别是file#,block#
SQL> select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name='write complete waits';
NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
--------------------------------------------------------------------------------
write complete waits file# block#
查看等待的对象是什么
SQL> select owner,segment_name, segment_type
2 from dba_extents
3 where file_id = &file_id
4 and &block_id between block_id and block_id + blocks - 1;
Enter value for file_id: 5
old 3: where file_id = &file_id
new 3: where file_id = 5
Enter value for block_id: 176
old 4: and &block_id between block_id and block_id + blocks - 1
new 4: and 176 between block_id and block_id + blocks - 1
OWNER SEGMENT_NAME SEGMENT_TYPE
------------------------------------------------------------ -------------------- ------------------------------------
SYS _SYSSMU14_277467141$ TYPE2 UNDO
同时发现此时很多操作无法完成
如flush buffer_cahce,switch logfile等
查看alter日志发现:
Fri Jan 10 20:07:17 2014
Suspending MMON action 'undo usage' for 82800 seconds
Fri Jan 10 22:00:01 2014
Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Fri Jan 10 22:00:01 2014
Starting background process VKRM
Fri Jan 10 22:00:01 2014
VKRM started with pid=38, OS id=13518
Fri Jan 10 22:34:33 2014
DW00 started with pid=51, OS id=14090, wid=2, job SCOTT.SYS_EXPORT_SCHEMA_01
Fri Jan 10 22:47:38 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Fri Jan 10 22:49:52 2014
DW00 started with pid=54, OS id=14344, wid=2, job SCOTT.SYS_EXPORT_SCHEMA_01
Fri Jan 10 23:15:20 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Sat Jan 11 19:29:00 2014
Suspending MMON action 'undo usage' for 82800 seconds
Sat Jan 11 23:05:39 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Sun Jan 12 00:35:17 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Sun Jan 12 18:43:51 2014
Suspending MMON action 'undo usage' for 82800 seconds
Sun Jan 12 23:11:10 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
Mon Jan 13 00:35:04 2014
Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Mon Jan 13 18:00:27 2014
Suspending MMON action 'undo usage' for 82800 seconds
Mon Jan 13 23:13:24 2014
Suspending MMON slave action kewrmafsa_ for 82800 seconds
发现2个值得注意的地方
1,Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
2,Suspending MMON slave action kewrmrfsa_ for 82800 seconds
Suspending MMON action 'undo usage' for 82800 seconds
关于1,有文章说可能是Resource Manager plan引起的bug,
resouce manager的bug:
Oracle Support - March 31, 2012 10:35:20 AM GMT-04:00 [ODM Action Plan]
The hang analyze shows several chains where the blocking session is waiting on resmgr:cpu quantum.
Please disable resource manager and check it the issue still occurs.
这里跟我的情况不太一样
2,MMON进程异常,MMON进程跟awr有关,可以kill,这里是测试环境,所以我直接kill了这个进程
[oracle@rac2 trace]$ ps aux | grep mmon
grid 4414 0.0 0.6 419584 13084 ? Ss Jan10 0:00 asm_mmon_+ASM2
grid 4749 0.0 1.4 306724 30028 ? Sl Jan10 1:09 /opt/app/grid/product/11.2.0/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/app/grid/product/11.2.0/srvm/admin/logging.properties -classpath /opt/app/grid/product/11.2.0/jdk/jre//lib/rt.jar:/opt/app/grid/product/11.2.0/jlib/srvm.jar:/opt/app/grid/product/11.2.0/jlib/srvmhas.jar:/opt/app/grid/product/11.2.0/jlib/supercluster.jar:/opt/app/grid/product/11.2.0/jlib/supercluster-common.jar:/opt/app/grid/product/11.2.0/ons/lib/ons.jar oracle.supercluster.impl.cluster.EONSServerImpl
oracle 5054 0.0 4.8 992696 100980 ? Ss Jan10 0:06 ora_mmon_rac2
oracle 26959 0.0 0.0 3920 664 pts/5 R+ 19:21 0:00 grep mmon
[oracle@rac2 trace]$ kill -9 5054
[oracle@rac2 trace]$ ps aux | grep mmon
grid 4414 0.0 0.6 419584 13084 ? Ss Jan10 0:00 asm_mmon_+ASM2
grid 4749 0.0 1.4 306724 30028 ? Sl Jan10 1:09 /opt/app/grid/product/11.2.0/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/app/grid/product/11.2.0/srvm/admin/logging.properties -classpath /opt/app/grid/product/11.2.0/jdk/jre//lib/rt.jar:/opt/app/grid/product/11.2.0/jlib/srvm.jar:/opt/app/grid/product/11.2.0/jlib/srvmhas.jar:/opt/app/grid/product/11.2.0/jlib/supercluster.jar:/opt/app/grid/product/11.2.0/jlib/supercluster-common.jar:/opt/app/grid/product/11.2.0/ons/lib/ons.jar oracle.supercluster.impl.cluster.EONSServerImpl
oracle 26961 0.0 0.0 3920 664 pts/5 R+ 19:21 0:00 grep mmon
alter日志:
Wed Jan 15 19:23:45 2014
Restarting dead background process MMON
Wed Jan 15 19:23:45 2014
MMON started with pid=24, OS id=27085
Wed Jan 15 19:26:33 2014
但是kill后发现还是无法正常登录系统
检查是否有其他异常session的阻塞
SQL> select BLOCKING_INSTANCE,BLOCKING_SESSION from v$session where sid=59;
BLOCKING_INSTANCE BLOCKING_SESSION
----------------- ----------------
2 18
SQL> select sid,username,event,BLOCKING_INSTANCE,BLOCKING_SESSION from v$session where sid=18;
no rows selected
session 59 被session 18阻塞,但是没有session 18的信息
决定重启一下rac2的实例
[grid@rac2 ~]$ srvctl stop instance -d rac -n rac2
这里等待了几分钟,发现没有任何反应
查看alter日志,只有一条信息:
Wed Jan 15 19:26:33 2014
PMON failed to delete process, see PMON trace file
检查pmon tracefile信息:
*** 2014-01-15 19:27:43.564
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:27:57.503
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:11.474
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:25.360
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:39.244
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:28:53.137
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
PMON waiting for 1000 csecs
*** 2014-01-15 19:29:06.990
found process 0x52bcce2c pid=34 serial=6 ospid = 10671 dead
*** 2014-01-15 19:29:06.990
deleting process 0x52bcce2c pid=34 serial=6 priority=0
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
need redo log switch, current log full
deletion of process 52bcce2c pid=34 seq=6 prog=FALSE unsuccessful
从pmon信息可以看出来,need redo log switch, current log full
current redo log 需要切换,但是无法完成
查看归档日志信息:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /home/oracle/archivelog
Oldest online log sequence 794
Next log sequence to archive 794
Current log sequence 796
SQL> select dest_name,status,error,target,process from v$archive_dest where dest_name=1;
DEST_NAME STATUS ERROR
------------------------------ ------------------ --------------------
TARGET PROCESS
-------------- --------------------
LOG_ARCHIVE_DEST_1 ERROR ORA-19502: write
error on file "",
block number
(block size=)
PRIMARY ARCH
SQL> ! df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 23G 16G 5.5G 75% /
/dev/sda1 99M 12M 83M 12% /boot
tmpfs 1014M 614M 400M 61% /dev/shm
发现归档路径有错误,ORA-19502
才查看磁盘,磁盘还有5个多G空间,完全没有问题,后来测试了磁盘的读写权限,也没有问题
这就比较奇怪
重新指定一个归档路径
SQL> alter system set log_archive_dest_1='location=/home/oracle/';
System altered
SQL> alter system switch logfile;
System altered.
发现可以正常切换日志了
此时普通用户也可以正常连接
[oracle@rac2 archivelog]$ sqlplus scott/tiger
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 15 19:42:18 201
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>