ASM空间爆满导致数据库挂起

最新推荐文章于 2021-04-12 22:48:12 发布

cuilin2183

最新推荐文章于 2021-04-12 22:48:12 发布

阅读量215

点赞数

今天用客户端链接一个测试库时出现错误ORA-00257: archiver error. Connect internal only, until freed。
数据库环境是SOLARIS 10 + 10203 RAC + ASM。

之前遇到过几次这个问题，是因为数据库出了问题，导致无法归档。登上数据库服务器，用sys连接进去查看日志的状态：
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------------
1 1 54 1073741824 2 NO CURRENT 5678852562 11-10??-07
2 1 51 1073741824 2 NO INACTIVE 5678784185 11-10??-07
3 1 53 1073741824 2 NO INACTIVE 5678835269 11-10??-07
4 1 52 1073741824 2 NO INACTIVE 5678800967 11-10??-07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10??-07
6 2 23 1073741824 2 NO CURRENT 5678835302 11-10??-07
7 2 20 1073741824 2 YES INACTIVE 5678603818 11-10??-07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10??-07

果然是有很多日志无法归档。再查看alert文件：
ASM的ALERT报警：
Sun Sep 30 02:16:22 2007
WARNING: allocation failure on disk DATA_0000 for file 364 xnum 2147483648
Sun Sep 30 03:26:59 2007
WARNING: allocation failure on disk DATA_0000 for file 370 xnum 2147483648
........

数据库其中一个节点的ALERT
Sun Sep 30 02:16:22 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc1_13668.trc:
ORA-19504: failed to create file "+DATA/archivelog/1_19_634432026.dbf"
ORA-17502: ksfdcre:4 Failed to create file +DATA/archivelog/1_19_634432026.dbf
ORA-15041: diskgroup space exhausted
Sun Sep 30 02:16:22 2007
ARC1: Error 19504 Creating archive log file to '+DATA/archivelog/1_19_634432026.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
Sun Sep 30 02:16:22 2007
ORACLE Instance prerac1 - Archival Error
........
Thu Oct 11 23:55:03 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc2_15831.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

从alert文件可以知道，不能归档的原因是因为ASM DISK的空间被用完了，但后面的错误：
ORA-00020: maximum number of processes () exceeded是怎么回事呢？
当时整个数据库都已经顿住了，查询v$process完全没有反映，只能从操作系统上查：
bash-3.00$ ps -ef|grep ora|wc -l
84

这个进程数离process参数设定值还差很远，所以基本可以排除是数据库的进程数满了，再看一下asm的进程：
bash-3.00$ ps -ef|grep asm|wc -l
39
asm的进程设定值是40，所以应该是asm的进程达到的设定值。正常情况下asm只有十几个进程，现在出现39个进程肯定是不正常的。
查了一会资料发现，原因还是因为归档引起的。因为我的归档是放在ASM上的，当oracle不能归档时，oracle会新起进程尝试进行归档，而不释放之前的归档进程，而归档进程是需要链接ASM实例的，所以导致ASM的进程数增大，最终出现ORA-00020错误。

为了解决空间不足问题，我用asmcmd删除了大部分的归档日志，用在RMAN中清除了相关的归档日志信息。
空间虽然有了，但此时问题让人存在，估计是因为归档进程并不清楚已经有空间了。此时解决这一系列问题有两种方法：
1、设置log_archive_dest_state_n，让归档暂时放在其他路径
2、重启数据库

在我解决过程中，ASM的连接数不断增加，最终我用sys连接数据库无法shutdown数据库:
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc0_13665.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

最后通过重启crs的方式把数据库重启了：
bash-3.00# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

bash-3.00# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

重启后v$log的情况如下：
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- --------------
1 1 54 1073741824 2 YES INACTIVE 5678852562 11-10? -07
2 1 55 1073741824 2 YES INACTIVE 5678932182 11-10? -07
3 1 53 1073741824 2 YES INACTIVE 5678835269 11-10? -07
4 1 56 1073741824 2 NO CURRENT 5678958358 12-10? -07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10? -07
6 2 23 1073741824 2 YES INACTIVE 5678835302 11-10? -07
7 2 24 1073741824 2 NO CURRENT 5678958359 12-10? -07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10? -07

8 rows selected.

可见，数据库已经恢复正常。

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/231499/viewspace-63861/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/231499/viewspace-63861/

cuilin2183

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
ASM空间爆满导致数据库挂起

今天用客户端链接一个测试库时出现错误ORA-00257: archiver error. Connect internal only, until freed。数据库环境是SOLARIS 10 + 10203 RAC + ASM...
复制链接

扫一扫