A Database Performance Optimization Case Study

Symptoms

A customer reported that after part of the workload had just been migrated off a production RAC cluster, business volume dropped, yet database performance got worse instead of better:
1. Sqluldr exports were slow: jobs that normally finished in a few minutes now took several times as long;
2. A stored procedure could no longer complete at all; the longest run took two days.

Troubleshooting Approach

1. First, confirm whether the database workload has actually decreased;
2. Check the alert log for errors to rule out software problems;
3. Use the AWR history views to locate the peak periods, collect AWR reports, and analyze the specific problem.

Detailed Analysis

1. Check the data volume

  • Daily archive count and volume
    (figure: daily archive count and data volume)
    The data volume has indeed dropped somewhat, so why did performance get worse?
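The daily archive figures shown above can be pulled with a query along these lines (a sketch; filter ranges and unit conversions would need adjusting for the actual environment):

```sql
-- Redo log archives per day over the last week: the count approximates
-- log-switch frequency, and BLOCKS * BLOCK_SIZE gives the archived volume.
select trunc(first_time) as day,
       count(*) as archives,
       round(sum(blocks * block_size) / 1024 / 1024 / 1024, 2) as gb
from   v$archived_log
where  first_time >= sysdate - 7
group by trunc(first_time)
order by 1;
```

A falling archive count and volume is a quick sanity check that write activity really did decrease after the migration.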

2. Check the alert log: no anomalies at all.

3. Locate the performance bottleneck

Both sqluldr and the stored procedure are SQL-related slowdowns, so the problem is very likely in the SQL itself. The plan: find the peak time windows, then drill into the top SQL in the AWR reports. Start by grouping dba_hist_active_sess_history by day:

select event,count(*) from dba_hist_active_sess_history  
where sample_time >=to_date('2020-10-29','yyyy-mm-dd')  
and sample_time <to_date('2020-10-30','yyyy-mm-dd') 
group by event order by 2;

10-29
log file parallel write 13655
gc buffer busy acquire 17542
log file sync 49646
direct path read 63383
db file sequential read 82283

10-30
log file parallel write 12581
gc buffer busy acquire 14042
log file sync 36801
direct path read 60477
db file sequential read 66349

10-31
log file parallel write 11024
gc buffer busy acquire 11718
log file sync 30261
direct path read 60069
db file sequential read 86120

11-1
log file parallel write 10996
gc buffer busy acquire 13171
log file sync 30050
direct path read 60298
db file sequential read 75494

11-2
log file parallel write 12060
gc buffer busy acquire 12139
log file sync 38727
direct path read 58106
db file sequential read 96356

11-3
log file parallel write 10956
reliable message 12563
db file scattered read 13680
gc buffer busy acquire 15626
log file sync 36462
direct path read 40527
db file sequential read 117872

11-4
log file parallel write 15498
gc buffer busy acquire 20064
db file scattered read 24228
direct path read 49980
log file sync 52161
db file sequential read 105079
The data above shows the performance bottleneck concentrated in the following waits.

db file scattered read waits began climbing from November 3:

select to_char(sample_time,'yyyymmdd hh24'),event,count(*) from dba_hist_active_sess_history  
where sample_time >=to_date('2020-11-3','yyyy-mm-dd')  
and sample_time <to_date('2020-11-4','yyyy-mm-dd') 
and event='db file scattered read'
group by to_char(sample_time,'yyyymmdd hh24'),event order by 1;

TO_CHAR(SAM EVENT COUNT(*)


20201103 00 db file scattered read 728
20201103 01 db file scattered read 633
20201103 02 db file scattered read 374
20201103 03 db file scattered read 311
20201103 04 db file scattered read 308
20201103 05 db file scattered read 266
20201103 06 db file scattered read 265
20201103 07 db file scattered read 311
20201103 08 db file scattered read 439
20201103 09 db file scattered read 335
20201103 10 db file scattered read 495
20201103 11 db file scattered read 463
20201103 12 db file scattered read 521
20201103 13 db file scattered read 690
20201103 14 db file scattered read 799
20201103 15 db file scattered read 831
20201103 16 db file scattered read 808
20201103 17 db file scattered read 756
20201103 18 db file scattered read 475
20201103 19 db file scattered read 457
20201103 20 db file scattered read 404
20201103 21 db file scattered read 415
20201103 22 db file scattered read 1037
20201103 23 db file scattered read 1559

For comparison, the same wait event's distribution on the 2nd:
20201102 00 db file scattered read 561
20201102 01 db file scattered read 299
20201102 02 db file scattered read 114
20201102 03 db file scattered read 80
20201102 04 db file scattered read 72
20201102 05 db file scattered read 173
20201102 06 db file scattered read 72
20201102 07 db file scattered read 104
20201102 08 db file scattered read 244
20201102 09 db file scattered read 205
20201102 10 db file scattered read 258
20201102 11 db file scattered read 419
20201102 12 db file scattered read 319
20201102 13 db file scattered read 370
20201102 14 db file scattered read 407
20201102 15 db file scattered read 334
20201102 16 db file scattered read 253
20201102 17 db file scattered read 324
20201102 18 db file scattered read 234
20201102 19 db file scattered read 296
20201102 20 db file scattered read 367
20201102 21 db file scattered read 610
20201102 22 db file scattered read 822
20201102 23 db file scattered read 514

log file sync

select to_char(sample_time,'yyyymmdd hh24'),event,count(*) from dba_hist_active_sess_history  
where sample_time >=to_date('2020-11-3','yyyy-mm-dd')  
and sample_time <to_date('2020-11-4','yyyy-mm-dd') 
and event='log file sync'
group by to_char(sample_time,'yyyymmdd hh24'),event order by 1;

TO_CHAR(SAM EVENT COUNT(*)


20201103 00 log file sync 2674
20201103 01 log file sync 757
20201103 02 log file sync 179
20201103 03 log file sync 48
20201103 04 log file sync 28
20201103 05 log file sync 17
20201103 06 log file sync 44
20201103 07 log file sync 2969
20201103 08 log file sync 5351
20201103 09 log file sync 1432
20201103 10 log file sync 811
20201103 11 log file sync 531
20201103 12 log file sync 891
20201103 13 log file sync 508
20201103 14 log file sync 1337
20201103 15 log file sync 1378
20201103 16 log file sync 2241
20201103 17 log file sync 1734
20201103 18 log file sync 2206
20201103 19 log file sync 1794
20201103 20 log file sync 1632
20201103 21 log file sync 3001
20201103 22 log file sync 3314
20201103 23 log file sync 1585

db file sequential read

select to_char(sample_time,'yyyymmdd hh24'),event,count(*) from dba_hist_active_sess_history  
where sample_time >=to_date('2020-11-3','yyyy-mm-dd')  
and sample_time <to_date('2020-11-4','yyyy-mm-dd') 
and event='db file sequential read'
group by to_char(sample_time,'yyyymmdd hh24'),event order by 1;

TO_CHAR(SAM EVENT COUNT(*)


20201103 00 db file sequential read 3754
20201103 01 db file sequential read 2336
20201103 02 db file sequential read 3206
20201103 03 db file sequential read 1733
20201103 04 db file sequential read 1415
20201103 05 db file sequential read 1162
20201103 06 db file sequential read 1806
20201103 07 db file sequential read 21367
20201103 08 db file sequential read 19812
20201103 09 db file sequential read 6534
20201103 10 db file sequential read 4534
20201103 11 db file sequential read 3171
20201103 12 db file sequential read 5117
20201103 13 db file sequential read 2886
20201103 14 db file sequential read 4349
20201103 15 db file sequential read 3229
20201103 16 db file sequential read 5156
20201103 17 db file sequential read 3225
20201103 18 db file sequential read 4782
20201103 19 db file sequential read 3345
20201103 20 db file sequential read 3055
20201103 21 db file sequential read 4385
20201103 22 db file sequential read 4009
20201103 23 db file sequential read 3504

-- November 2

TO_CHAR(SAM EVENT COUNT(*)


20201102 00 db file sequential read 2451
20201102 01 db file sequential read 1953
20201102 02 db file sequential read 2364
20201102 03 db file sequential read 1728
20201102 04 db file sequential read 1477
20201102 05 db file sequential read 1147
20201102 06 db file sequential read 1541
20201102 07 db file sequential read 2732
20201102 08 db file sequential read 2993
20201102 09 db file sequential read 3357
20201102 10 db file sequential read 3649
20201102 11 db file sequential read 4077
20201102 12 db file sequential read 3998
20201102 13 db file sequential read 3398
20201102 14 db file sequential read 3241
20201102 15 db file sequential read 3102
20201102 16 db file sequential read 6309
20201102 17 db file sequential read 5700
20201102 18 db file sequential read 6536
20201102 19 db file sequential read 8878
20201102 20 db file sequential read 6523
20201102 21 db file sequential read 5954
20201102 22 db file sequential read 7582
20201102 23 db file sequential read 5666
Based on these results, AWR reports were collected for 22:00-23:00 on both November 2 and 3, and for 07:00-08:00 on November 3. The peak-period wait events and top events all point to I/O. Analyzing the top SQL by physical reads and comparing execution plans across time windows shows that the elapsed time and physical reads of many statements changed drastically, for example:
(figure: top SQL ordered by physical reads)
These SQL statements share the same execution plan and were run by multiple sessions in the same window, generating heavy physical reads.

  • Current execution plan
    (figure: current execution plan)
  • Historical execution plan
    (figure: historical execution plan)
    The top SQL by physical reads essentially point to the same table and the same execution plan, and the plan shows the full table scan growing from 9 minutes to 36 minutes. This also matches the AWR reports, where User I/O dominates the waits.
    That makes it fairly clear: I/O is the main factor behind the performance degradation.
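The plan comparison above can be reproduced from AWR history. For a given statement (the sql_id here is a placeholder, not the actual one from this system), something like:

```sql
-- Every plan AWR has captured for one statement, with costs and notes
select * from table(dbms_xplan.display_awr('&sql_id'));

-- Per-snapshot elapsed time and physical reads for the same statement,
-- to see when and with which plan the degradation started
select snap_id,
       plan_hash_value,
       round(elapsed_time_delta / 1e6) as elapsed_s,
       disk_reads_delta
from   dba_hist_sqlstat
where  sql_id = '&sql_id'
order by snap_id;
```

A jump in disk_reads_delta without a change in plan_hash_value, as in this case, points at the data or I/O subsystem rather than a plan flip.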

log file sync is also quite high:

(figure: wait events during stored procedure execution)
The sessions wait on log file parallel write, which is a sub-step of log file sync, so the pressure is still on log file sync. This is a fairly common wait: either LGWR is writing slowly, or commits are too frequent.
Inspecting the stored procedure shows a great many insert / select / commit sequences, which matches the symptoms exactly. It all lines up.
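Whether the pressure comes from slow LGWR writes or from commit frequency can be distinguished by comparing the two events' average wait times, roughly like this (a sketch against the cumulative system-wide counters):

```sql
-- Average wait per event since instance start. If 'log file parallel write'
-- is fast (low single-digit ms) while 'log file sync' is much slower,
-- suspect commit frequency rather than the redo disks.
select event,
       total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) as avg_ms
from   v$system_event
where  event in ('log file sync', 'log file parallel write');
```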

Summary

Based on the analysis above:
1. Optimize the execution plans:
Tune the SQL statements generating heavy I/O. Using Oracle's built-in SQL Tuning Advisor, two new execution plans were obtained:

  • Plan based on a new index
    (figure: index created on the where-clause column)
    An index, at bottom, exists to do one thing: reduce I/O. A well-chosen index can deliver a tens-of-times improvement.
    SQL Tuning Advisor built a SUBSTR function-based index on the where-clause condition, cutting the execution time from 36 minutes to 6 minutes.
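A function-based index of that shape looks like the following (table and column names are purely illustrative, not from the actual system):

```sql
-- Function-based index matching a predicate of the form
--   WHERE SUBSTR(acct_no, 1, 6) = :prefix
create index idx_orders_acct_prefix
    on orders (substr(acct_no, 1, 6));

-- Refresh statistics so the optimizer will consider the new index
exec dbms_stats.gather_table_stats(user, 'ORDERS', cascade => true);
```

The index is only usable when the query's SUBSTR expression matches the indexed expression exactly.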

  • Plan based on parallel execution
    (figure: parallel execution plan)
    Note that an unsuitable degree of parallelism can cause contention. Although the parallel plan runs in only 20 seconds, unless you have solid theory and hands-on experience with Oracle's parallel execution machinery, test thoroughly before using it in production.
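For reference, a statement-level parallel hint looks like this (table name and degree 8 are illustrative only; the right degree depends on CPU count and concurrent load):

```sql
-- Statement-level parallelism. An unsuitable degree can cause CPU and,
-- on RAC, interconnect contention, so validate under realistic load first.
select /*+ parallel(t, 8) */ count(*)
from   orders t
where  substr(acct_no, 1, 6) = :prefix;
```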
2. Redo log optimization:
Improve LGWR performance: use fast disks where possible, and avoid placing redo logs on RAID 5;
Optimize the stored procedure's business logic to avoid frequent commits; commit in batches instead.
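The commit-pattern change can be sketched in PL/SQL (table names are hypothetical): instead of committing after every row, process rows in chunks and commit once per chunk:

```sql
-- Batch-commit sketch: bulk-fetch 1000 rows at a time, insert them with
-- FORALL, and issue one commit per batch rather than one per row.
declare
    cursor c is select * from staging_rows;
    type t_rows is table of staging_rows%rowtype;
    l_rows t_rows;
begin
    open c;
    loop
        fetch c bulk collect into l_rows limit 1000;
        exit when l_rows.count = 0;
        forall i in 1 .. l_rows.count
            insert into target_rows values l_rows(i);
        commit;  -- one commit per 1000 rows, cutting log file sync waits
    end loop;
    close c;
end;
/
```

Fewer commits means fewer foreground waits on log file sync, which is exactly where the stored procedure was stalling.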
