一条全表扫描sql引发的案例

某数据库  实例1出现监听不可用。原因:月初数据库压力大,一个高并发的查询sql(每秒达到216,表大小1.44G 数据量:3873863 没有用到索引 走全表扫描),导致整个数据库系统负载冲高,数据库一个节点的rac后台进程在系统高负载下没有轮询到网卡状态返回信息,重启网卡,同时重启监听,由于负载太高,系统尝试重启失败,部分登录失败,手工重启监听后解决; 

现象:

LSNRCTL> status
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 2: No such file or directory

15:12 dba反应个账数据库登录不上,dba登录主机查看实例1监听已经关闭

通知产品dba 查看并启动监听

问题分析:

开发dba侧:

建立临时表保存会话信息:

create table tmp_gt_0301 as  SELECT * FROM  dba_hist_active_sess_history

  where to_char(sample_time, 'yyyymmdd hh24:mi') >= '20160301 14:45'

and to_char(sample_time, 'yyyymmdd hh24:mi') <= '20160301 14:59'

 

查找问题发生的时间点:

SELECT count(*), to_char(sample_time, 'yyyymmdd hh24:mi:ss')  FROM  tmp_gt_0301 group by to_char(sample_time, 'yyyymmdd hh24:mi:ss')

查看等待事件分类:

SELECT event, count(*),to_char(sample_time, 'yyyymmdd hh24:mi:ss')

   FROM tmp_gt_0301 t   group by t.event  ,to_char(sample_time, 'yyyymmdd hh24:mi:ss') ;

  buffer busy waits 1235 20160301 14:54:13
  buffer busy waits 1078 20160301 14:54:24
  buffer busy waits 1064 20160301 14:54:03
  buffer busy waits 901 20160301 14:54:34
  gc buffer busy acquire 889 20160301 14:53:52
  gc buffer busy acquire 883 20160301 14:53:42
  buffer busy waits 854 20160301 14:53:52
  buffer busy waits 854 20160301 14:53:42
  buffer busy waits 851 20160301 14:53:10
  gc buffer busy acquire 793 20160301 14:53:10
  gc buffer busy acquire 659 20160301 14:53:31
  gc buffer busy acquire 650 20160301 14:53:21

根据高峰期等待事件查看是否有阻塞:

  select blocking_session,count(*)  from tmp_gt_0301 where event='buffer busy waits'

   and to_char(sample_time, 'yyyymmdd hh24:mi:ss') ='20160301 14:54:13'

  group by blocking_session ;
2257 103
6077 102
4753 102
4576 101
267 98

根据阻塞的源头 查找阻塞的原因:

      select blocking_session,count(*)  from tmp_gt_0301 where to_char(sample_time, 'yyyymmdd hh24:mi:ss') ='20160301 14:50:55'  group by blocking_session ;

1                          461

82           7747         115

29           8077         23

26           4829         22

40           8240         16

61           3431         16

  SELECT * FROM  tmp_gt_0301 where to_char(sample_time, 'yyyymmdd hh24:mi:ss') ='20160301 14:50:55'  and session_id='7747'

g9a9wp3mau6sy    gc cr block 2-way

 

  SELECT sql_id,event,count(*) FROM  tmp_gt_0301 where to_char(sample_time, 'yyyymmdd hh24:mi:ss') ='20160301 14:50:55'    and blocking_session='7747'  group by sql_id,event;

1            g9a9wp3maxxxx enq: SQ - contention       115

select S_T_XXX_LOG.nextval from dual

SELECT * FROM dba_sequences where sequence_name='S_T_xxxxx_LOG';------cache:500

排查了这几个阻塞的session 发现都在做insert和select操作 没有长时间的固定的阻塞源头

select S_T_XXXXXV_LOG.nextval from dua

INSERT INTO T_XXXKET_XXXXX_DETAIL

insert into  T_XXX_XXXXX_INFO

INSERT INTO T_MXXXXT_XXXXX_XXXXSS

都属于正常业务  sequencecache 500 也可以

 

  SELECT sql_id, to_char(sample_time, 'yyyymmdd hh24:mi:ss'),count(*)  FROM tmp_gt_0301  where  sql_opname='SELECT'  group by sql_id, to_char(sample_time, 'yyyymmdd hh24:mi:ss') order by 3 desc ;

1            4b88vs5099ddk 20160301 14:55:06          216

2            4b88vs5099ddk 20160301 14:55:37          215

3            4b88vs5099ddk 20160301 14:55:16          213

4            4b88vs5099ddk 20160301 14:56:49          211

5            4b88vs5099ddk 20160301 14:54:24          198

6            4b88vs5099ddk 20160301 14:54:34          197

SELECT  count(*),to_char(sample_time, 'yyyymmdd hh24:mi:ss') FROM tmp_gt_0301   where sql_id='4b88vs5099ddk'

group by to_char(sample_time, 'yyyymmdd hh24:mi:ss');

216          20160301 14:55:06

213          20160301 14:55:16

sql信息

4b88vs5099ddk:

select accept_trans_seq_no from t_log_bill_writeoff M_C_W where terminal_seq_no = :1

Plan hash value: 2759573899

------------------------------------------------------------------------------------------

| Id  | Operation         | Name                | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |                     |        |       |     2 (100)|          |

|   1 |  TABLE ACCESS FULL| T_LOG_BILL_WRITEOFF |      1 |    54 |     2   (0)| 00:00:01 |


该sql在问题发生时间段 一个小时内2:00-3:00 

实例1:执行8257次  平均每次执行4s

实例2:执行9413次  平均每次执行3s

平时该sql  单实例每小时执行不足100次

该表大小1.44G  数据量:3873863 termxxxxl_seq_no因此考虑在该列上建立索引

terXxxxx_seq_no 选择性非常好,建完索引后 单条sql执行由原来的平均4s降到平均3ms 性能提高1000倍以上。

 

主机系统侧:

实例1


集群侧:

Listener_log:

实例1:

No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))

No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.xxx.xx.xxx)(PORT=1521)))

Listener completed notification to CRS on stop   crs关闭监听

01-MAR-2016 14:54:21 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=xxxx)(USER=grid))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=186647296)) * stop * 0

Tue Mar 01 15:18:57 2016

System parameter file is /u01/xxxxxx/network/admin/listener.ora

Log messages written to /u01/xxxxxx/tnslsnr/spayrxxxx/listener/alert/log.xml

Trace information written to /u01/apxxxxx/listener/trace/ora_19333160_1.trc

Trace level is currently 0

 

Crsd_log

实例1

2016-03-01 14:54:23.191: [UiServer][11567] {1:31649:5735} Done for ctx=111ad0670

2016-03-01 14:54:23.434: [    A**][10025] {0:2:9900} Received the reply to the message: RESOURCE_STOP[ora.LISTENER.lsnr spayrdb3 1] ID 4099:586286 from the agent /u01/app/11id/in/gent_grid

2016-03-01 14:54:23.436: [    A**][10025] {0:2:9900} A** Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.LISTENER.lsnr spayrdb3 1] ID 4099:1733429

2016-03-01 14:54:23.902: [    A**][10025] {0:2:9900} Received the reply to the message: RESOURCE_STOP[ora.LISTENER.lsnr spayr 1] ID 4099:586286 from the agent /u01/app2.0/grid/bn/ont_grid

2016-03-01 14:54:23.904: [    A**][10025] {0:2:9900} A** Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.LISTENER.lsnr spayrdb3 1] ID 4099:1733429  关闭监听

实例2

2016-03-01 14:54:23.920: [   CRSPE][11310] {0:2:9900} Received reply to action [Stop] message ID: 1733429

2016-03-01 14:54:23.921: [   CRSPE][11310] {0:2:9900} RI [ora.LISTENER.lsnr spayrdb3 1] new internal state: [STABLE] old value: [STOPPING]

2016-03-01 14:54:23.921: [   CRSPE][11310] {0:2:9900} RI [ora.LISTENER.lsnr spayrdb3 1] new external state [OFFLINE] old value: [ONLINE] label = []

2016-03-01 14:54:23.924: [   CRSPE][11310] {0:2:9900} CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'spa' succeeded  crs从'spa' 发出关闭监听

 

 

 

Alert_log:

实例1

***********************************************************************

Fatal NI connect error 12537, connecting to:

 (LOCAL=NO)

Tue Mar 01 14:54:20 2016

Tue Mar 01 14:54:22 2016

***********************************************************************

  VERSION INFORMATION:

             TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

             TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

             Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 – Production

Tue Mar 01 17:02:10 2016

 

 

***********************************************************************

Fatal NI connect error 12537, connecting to:

 (LOCAL=NO)

  VERSION INFORMATION:

             TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

             TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

             Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

  Time: 01-MAR-2016 17:02:10

  Tracing not turned on.

  Tns error struct:

    ns main err code: 12537


处理问题总结:

   月初数据库系统压力本来就很大,再加一个高并发的异常sql,平时该sql很少执行,突然高并发执行 再加系统比较繁忙,导致整个数据库系统负载冲高。因此月初月末需要加强对系统的监控。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/30109892/viewspace-2050765/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/30109892/viewspace-2050765/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值