接收到某生产系统系统中间件报警,从中间件现象初步判断为中间件到数据库的连接耗尽或者数据库压力过大。
登录到192.168.1.105服务器后,通过topas发现CPU资源耗尽,CPU idle为0。主机组将CPU由2个动态增加到4个,稍后发现数据库异常宕掉,其后台alert日志如下
Detected change in CPU count to 8
Errors in file /software/oracle/app/oracle/product/diag/rdbms/scdb/scdb/trace/scdb_ckpt_389312.trc (incident=89):
ORA-00600: internal error code, arguments: [ksrpubwait:deadlock], [parameters to cluster db instances - broadcast channel], [1], [1], [0], [], [], [], [], [], [], []
Incident details in: /software/oracle/app/oracle/product/diag/rdbms/scdb/scdb/incident/incdir_89/scdb_ckpt_389312_i89.trc
Errors in file /software/oracle/app/oracle/product/diag/rdbms/scdb/scdb/trace/scdb_ckpt_389312.trc:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00600: internal error code, arguments: [ksrpubwait:deadlock], [parameters to cluster db instances - broadcast channel], [1], [1], [0], [], [], [], [], [], [], []
CKPT (ospid: 389312): terminating the instance due to error 2097
此错误为触发了Bug 7535429 (详见本文档最下面的NoteID 756128.1)
手动启动数据库, 连接正常,但CPU资源仍然耗尽。
查看等待事件
select event,count(*) from v$session_wait group by event order by 1;
在不同的时间段,发现会有少量的latch free事件,但持续时间很短。通过AWR报告,此语句耗CPU最多
select b.id, b.actual_name, c.short_name, d.name, b.duty,
nvl(a.total_credit_hour,0) as total_credit_hour,a.can_achieve from
user_credit_hour a ,user_info b, organ c, organ_dept d where
a.user_id=b.id and b.organ_id=c.id and c. organ_degree=0 and
b.dept_id=d.id and rownum <=10 order by a.total_credit_hour desc
查看其执行计划如下:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3762 (100)| |
| 1 | SORT ORDER BY | | 10 | 780 | 3762 (1)| 00:00:46 |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | HASH JOIN | | 217 | 16926 | 3761 (1)| 00:00:46 |
|* 4 | HASH JOIN | | 75 | 4875 | 184 (2)| 00:00:03 |
| 5 | NESTED LOOPS | | 75 | 3600 | 177 (2)| 00:00:03 |
|* 6 | TABLE ACCESS FULL| ORGAN | 1 | 19 | 3 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| USER_INFO | 75 | 2175 | 174 (2)| 00:00:03 |
| 8 | TABLE ACCESS FULL | ORGAN_DEPT | 3089 | 52513 | 7 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | USER_CREDIT_HOUR | 73403 | 931K| 3575 (1)| 00:00:43 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=10)
3 - access("A"."USER_ID"="B"."ID")
4 - access("B"."DEPT_ID"="D"."ID")
6 - filter("C"."ORGAN_DEGREE"=0)
7 - filter("B"."ORGAN_ID"="C"."ID")
通过执行计划,此语句里几张表都是在做全表扫描,最大的一张为如下:
SQL> select count(*) from DB_SC.USER_CREDIT_HOUR;
COUNT(*)
----------
73412
查找该表上的索引信息,总共有两个索引
SQL> select owner,index_name,index_type from dba_indexes where table_name='USER_CREDIT_HOUR';
OWNER INDEX_NAME INDEX_TYPE
------------------------------ ------------------------------ ---------------------------
DB_SC USER_CREDIT_HOUR_PK NORMAL
DB_SC USER_CREDIT_HOUR_MAIN_INDEX NORMAL
索引定义如下:
SQL> select dbms_metadata.get_ddl('INDEX','USER_CREDIT_HOUR_MAIN_INDEX','DB_SC') from dual;
DBMS_METADATA.GET_DDL('INDEX','USER_CREDIT_HOUR_MAIN_INDEX','DB_SC')
--------------------------------------------------------------------------------
CREATE INDEX "DB_SC"."USER_CREDIT_HOUR_MAIN_INDEX" ON "DB_SC"."USER_CREDIT_HOUR" ("YEAR", "USER_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 2097152 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TS_STUDY_IDX"
SQL> select dbms_metadata.get_ddl('INDEX','USER_CREDIT_HOUR_PK','DB_SC') from dual;
DBMS_METADATA.GET_DDL('INDEX','USER_CREDIT_HOUR_PK','DB_SC')
--------------------------------------------------------------------------------
CREATE UNIQUE INDEX "DB_SC"."USER_CREDIT_HOUR_PK" ON "DB_SC"."USER_CREDIT_HOUR" ("ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 4194304 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TS_STUDY_IDX"
在执行计划中没有使用复合索引,手动创建一个单键索引(USER_CREDIT_HOUR_USER_ID)后, 其执行计划如下:
DBMS_METADATA.GET_DDL('INDEX','USER_CREDIT_HOUR_USER_ID','DB_SC')
--------------------------------------------------------------------------------
CREATE INDEX "DB_SC"."USER_CREDIT_HOUR_USER_ID" ON "DB_SC"."USER_CREDIT_HOUR" ("USER_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 16384 NEXT 8192 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TS_STUDY_IDX"
其执行计划为走索引
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3736789896
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 10 | 780 | 483 (1)| 00:00:06 |
| 1 | SORT ORDER BY | | 10 | 780 | 483 (1)| 00:00:06 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 217 | 16926 | 482 (1)| 00:00:06 |
|* 5 | HASH JOIN | | 75 | 4875 | 184 (2)| 00:00:03 |
| 6 | NESTED LOOPS | | 75 | 3600 | 177 (2)| 00:00:03 |
|* 7 | TABLE ACCESS FULL | ORGAN | 1 | 19 | 3 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | USER_INFO | 75 | 2175 | 174 (2)| 00:00:03 |
| 9 | TABLE ACCESS FULL | ORGAN_DEPT | 3089 | 52513 | 7 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | USER_CREDIT_HOUR_USER_ID | 3 | | 1 (0)| 00:00:01 |
| 11 | TABLE ACCESS BY INDEX ROWID| USER_CREDIT_HOUR | 3 | 39 | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
2 - filter(ROWNUM<=10)
5 - access("B"."DEPT_ID"="D"."ID")
7 - filter("C"."ORGAN_DEGREE"=0)
8 - filter("B"."ORGAN_ID"="C"."ID")
10 - access("A"."USER_ID"="B"."ID")
附录:
NoteID 756128.1
ORA-600 [Ksrpubwait:Deadlock] ORA-2097 Instance crashes when CPU Count is Dynamically Changed [ID 756128.1] | |
|
|
In this Document
Symptoms
Cause
Solution
References
Platforms: 1-914CU;
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. |
Applies to:
Oracle Server - Enterprise Edition - Version: 11.1.0.7 to 11.1.0.7 - Release: 11.1 to 11.1
Information in this document applies to any platform.
ORA-600 [ksrpubwait:deadlock], [parameters to cluster db instances - broadcast channel]
ORA-2097: parameter cannot be modified because specified value is invalid
Instance crashes.
This happens everytime when CPU is dynamically changed.
STACK TRACE:
ksrpubwait ksrpublish kspbcast kspset0 kxfpCPUCountUpdate ksbcpcb
Cause
This caused by Bug 7535429
Bug 7535429 is fixed in version 11.2.
Check for the availability of one-off Patch 7535429 for your platform on MetaLink.
Possible Workarounds:
1) Explicitely setting CPU_COUNT to new value.
2) Start the same instance from pfile created from the same spfile can avoid the problem.
3) Set explicitely PARALLEL_MAX_SERVERS to the current value.
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29468144/viewspace-1079786/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/29468144/viewspace-1079786/