今天上午接到电话,用户反映一套10.2.0.4的数据库出现用户无法登录的症状,随即在家里连上×××远程支持,登上主机以后尝试测试操作系统认证登录和远程登录,发现已经能够正常登录了;为了了解起因,去查看了告警日志alert.log,发现在11:00左右出现大量的ksvcreate:process creation failed错误,具体错误日志如下:
Wed Mar 23 10:00:17 2011
Process m000 died, see its trace file
Wed Mar 23 10:00:17 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 10:01:18 2011
Process m000 died, see its trace file
Wed Mar 23 10:01:18 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 10:02:19 2011
Process m000 died, see its trace file
Wed Mar 23 10:02:19 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 10:54:41 2011
Process P007 died, see its trace file
Wed Mar 23 10:56:40 2011
Process P007 died, see its trace file
Wed Mar 23 10:57:40 2011
Process P007 died, see its trace file
Wed Mar 23 10:58:40 2011
Process P007 died, see its trace file
Wed Mar 23 10:59:18 2011
Process J000 died, see its trace file
Wed Mar 23 10:59:18 2011
kkjcre1p: unable to spawn jobq slave process
Wed Mar 23 10:59:18 2011
Errors in file /oracle/app/oracle/admin/sdh/bdump/sdh_cjq0_717010.trc:
Wed Mar 23 11:01:00 2011
Process m000 died, see its trace file
Wed Mar 23 11:01:00 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 11:02:01 2011
Process m000 died, see its trace file
Wed Mar 23 11:02:01 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 11:02:40 2011
Process P007 died, see its trace file
Wed Mar 23 11:03:02 2011
Process m000 died, see its trace file
Wed Mar 23 11:03:02 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 11:04:03 2011
Process m000 died, see its trace file
Wed Mar 23 11:04:03 2011
ksvcreate: Process(m000) creation failed
Wed Mar 23 11:04:40 2011
Process P007 died, see its trace file
ksvcreate: Process creation failed错误信息一般在Oracle实例在创建一些辅助后台进程(如mmon的子进程m00x或者并行子进程p00x等)时出现进程启动失败时出现,而造成该错误的可能性有多种,包括Oracle实例资源不足、操作系统资源不足等等。其中较为常见的是实例instance的process使用达到上限,可以通过查询v$resource_limit视图来了解实例生命周期内是否发生过process总数暴满的情况:
SQL> select * from v$resource_limit;
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
------------------------------ ------------------- --------------- -------------------- --------------------
processes 79 800 800 800
sessions 80 813 885 885
enqueue_locks 18 303 10910 10910
enqueue_resources 18 166 4112 UNLIMITED
ges_procs 0 0 0 0
ges_ress 0 0 0 UNLIMITED
ges_locks 0 0 0 UNLIMITED
ges_cache_ress 0 0 0 UNLIMITED
ges_reg_msgs 0 0 0 UNLIMITED
ges_big_msgs 0 0 0 UNLIMITED
ges_rsv_msgs 0 0 0 0
gcs_resources 0 0 0 0
gcs_shadows 0 0 0 0
dml_locks 0 153 3892 UNLIMITED
temporary_table_locks 0 3 UNLIMITED UNLIMITED
transactions 3125 3156 973 UNLIMITED
branches 0 3 973 UNLIMITED
cmtcallbk 0 2 973 UNLIMITED
sort_segment_locks 0 8 UNLIMITED UNLIMITED
max_rollback_segments 19 43 973 65535
max_shared_servers 0 0 UNLIMITED UNLIMITED
parallel_max_servers 6 122 120 3600
/* 可以看到processes的MAX_UTILIZATION最大使用数目曾到过LIMIT_VALUE限定的800,
sessions也有类似的情况 */
从以上V$resource_limit视图的输出来看,极有可能是processes总数达到上限导致了新的后台辅助进程创建失败,其实我们可以很方便地验证这一点:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> alter system set processes=20 scope=spfile;
System altered.
/* 将processes参数修改到一个较小值 */
SQL> startup force;
ORACLE instance started.
SQL> select resource_name,MAX_UTILIZATION,LIMIT_VALUE from v$resource_limit where resource_name in ('processes','sessions');
RESOURCE_NAME
转载于:https://blog.51cto.com/maclean/1277732