DB Name | DB Id | Instance | Inst num | Startup Time | Release | RAC |
ORCL | 4567890123 | ORCL1 | 1 | 07-Jan-15 19:01 | 11.1.0.6.0 | YES |
Host Name | Platform | CPUs | Cores | Sockets | Memory (GB) |
orcl1 | AIX-Based Systems (64-bit) | 16 | 8 |
| 61.50 |
| Snap Id | Snap Time | Sessions | Cursors/Session |
Begin Snap: | 41170 | 06-Mar-15 09:58:35 | 205 | 17.2 |
End Snap: | 41171 | 06-Mar-15 10:58:26 | 194 | 18.3 |
Elapsed: |
| 59.85 (mins) |
|
|
DB Time: |
| 644.49 (mins) |
|
|
查看CPU个数:
$ bindprocessor -q
The available processors are: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
在此快照间隔中, 总共约 60 分钟, cpu 就共有 60*16=960 分钟, DBtime为 644.49 分钟,则:
cpu 花费了 644.49 分钟在处理 Oralce 非空闲等待和运算上(比方逻辑读)
也就是说 cpu 有 644.49 /960*100%=67% 花费在处理 Oracle 的操作上,这还不包括后台进程
Top5 Timed Foreground Events
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
cursor: pin S wait on X | 3,579,491 | 35,037 | 10 | 90.61 | Concurrency |
library cache pin | 259 | 3,508 | 13543 | 9.07 | Concurrency |
DB CPU |
| 108 |
| 0.28 |
|
DFS lock handle | 13,879 | 6 | 0 | 0.02 | Other |
SQL*Net message from dblink | 6,283 | 3 | 0 | 0.01 | Network |
在这些等待事件(除了db cpu)中, 等待次数最高的是cursor: pin S wait on X
生成一下这个一个小时内的ash报告进一步分析
DB Name | DB Id | Instance | Inst num | Release | RAC | Host |
ORCL | 4567890123 | ORCL1 | 1 | 11.1.0.6.0 | YES | orcl1 |
CPUs | SGA Size | Buffer Cache | Shared Pool | ASH Buffer Size |
16 | 25,231M (100%) | 5,888M (23.3%) | 10,942M (43.4%) | M (%) |
| Sample Time | Data Source |
Analysis Begin Time: | 06-Mar-15 09:58:35 | DBA_HIST_ACTIVE_SESS_HISTORY |
Analysis End Time: | 06-Mar-15 10:58:35 | DBA_HIST_ACTIVE_SESS_HISTORY |
Elapsed Time: | 60.0 (mins) |
|
Sample Count: | 4,699 |
|
Average Active Sessions: | 13.05 |
|
Avg. Active Session per CPU: | 0.82 |
|
Report Target: | None specified |
|
Top Blocking Sessions
- Blocking session activity percentages are calculated with respect to waits on enqueues, latches and "buffer busy" only
- '% Activity' represents the load on the database caused by a particular blocking session
- '# Samples Active' shows the number of ASH samples in which the blocking session was found active.
- 'XIDs' shows the number of distinct transaction IDs sampled in ASH when the blocking session was found active.
Blocking Sid | % Activity | Event Caused | % Event | User | Program | # Samples Active | XIDs |
2069, 3099 | 76.21 | cursor: pin S wait on X | 76.21 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 901/360 [250%] | 0 |
由此可以看出cursor:pin S wait on X 是由 “2069,3099”这个session 导致的。
Top Sessions
- '# Samples Active' shows the number of ASH samples in which the session was found waiting for that particular event. The percentage shown in this column is calculated with respect to wall clock time and not total database activity.
- 'XIDs' shows the number of distinct transaction IDs sampled in ASH when the session was waiting for that particular event
- For sessions running Parallel Queries, this section will NOT aggregate the PQ slave activity into the session issuing the PQ. Refer to the 'Top Sessions running PQs' section for such statistics.
Sid, Serial# | % Activity | Event | % Event | User | Program | # Samples Active | XIDs |
1969,22749 | 7.64 | cursor: pin S wait on X | 7.62 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 358/360 [ 99%] | 0 |
1978, 6440 | 7.64 | cursor: pin S wait on X | 7.62 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 358/360 [ 99%] | 0 |
1989,17438 | 7.64 | cursor: pin S wait on X | 7.64 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 359/360 [100%] | 0 |
2004, 8866 | 7.64 | cursor: pin S wait on X | 7.60 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 357/360 [ 99%] | 0 |
2069, 3099 | 7.64 | library cache pin | 7.64 | USR_ABCD | dcba@orcl1 (TNS V1-V3) | 359/360 [100%] | 0 |
上面红色部分是“2069,3099”这个session 的等待事件
Complete List of SQL Text
SQL Id | SQL Text |
dn00zn4qjg469 | SELECT * FROM xx WHERE XX; |
分析到这里,就需要到ORCL1 实例上(不是到ORCL2实例上)执行select * from v$session where sid='2069' and serial#='3099'去看看这个session 是哪个主机发出的。
SQL> set line 300
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
SID MACHINE PROGRAM STATUS EVENT
---------- ------------- ------------------------- -------- ----------------------
2069 orcl1 dcba@orcl1 (TNS V1-V3) ACTIVE library cache pin
其实,cursor: pin S wait on X 也好,library cache pin 也罢,这都是shared pool 的等待事件。
可以通过下面的sql语句查看这个session正在运行的这个sql--dn00zn4qjg469 是啥样的sql语句
SQL> select sql_text from v$sql where sql_Id='dn00zn4qjg469';
此时咨询了以下应用软件的工程师,了解一下那个sql语句是干啥用的,结果工程师回应如果是查询语句,把它切掉就行
但是kill session是有原则的,应该先kill os pid,然后再去kill session
每个session都在 db server 上有一个操作系统进程
那么现在就需要通过session去找os pid
SQL> select p.spid from v$session s,v$process p wheres.paddr=p.addr and s.sid='2069';
SPID
------------------------
345054
查出来os pid是345054,然后在ps -ef | grep 345054
进一步确认 345054是不是 oracle数据库自己的进程
$ ps -ef | grep 345054
oracle 303344 406340 0 17:13:47 pts/1 0:00 grep 345054
arcsde 345054 340258 0 Feb 09 - 0:50 oracleORCL1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
LOCAL=YES的进程,是由在数据库服务器上的进程(典型的就是一些业务程序)与oracle 数据库交互后,oracle 数据库产生的进程。
再 ps -ef | grep 340258
340258 是 345054的父进程。
$ ps -ef | grep 340258
arcsde 340258 463126 0 Feb 09 - 0:01 /u01/arfgd/sdeee/bin/gsrvresri_sde WIN-L41
arcsde 345054 340258 0 Feb 09 - 0:50 oracleORCL1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 484124 406340 0 17:20:30 pts/1 0:00 grep 340258
红色处就是应用程序开发商在数据库服务器上部署的业务程序
开始kill os pid: kill -9 345054
然后再kill session: alter system kill session'2069,3099';
$ kill -9 345054
$ ps -ef | grep 345054
oracle 303264 406340 0 17:27:59 pts/1 0:00 grep 345054
$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.6.0 - Production on Fri Mar 6 17:28:07 2015
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bitProduction
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> alter system kill session '2069,3099';
alter system kill session '2069,3099'
*
ERROR at line 1:
ORA-00031: session marked for kill
kill session的时候仅仅是将会话杀掉。在有些时候,由于较大的事务或需要运行较长的SQL语句将导致需要kill的session并不能立即杀掉。对于这种情况将收到 "marked for kill"提示(如上),一旦会话当前事务或操作完成,该会话被立即杀掉。
查看一下那个进程,结果还在
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
SID MACHINE PROGRAM STATUS EVENT
---------- -------------- --------------------------- -------- ------------------------
2069 orcl1 abcd@orcl1 (TNS V1-V3) KILLED library cache pin
等待一会再查询
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
no rows selected
ok,整理了这个文档仅供参考