oracle 频繁io,oracle IO性能问题故障诊断案例

一业务系统在白天业务时间出现了严重了IO性能问题,下面是下午业务高峰时间(3-5)的awr报告

viewspace-1220645

从等待事件来看主要都是与IO相关

viewspace-1220645

viewspace-1220645

viewspace-1220645

从上面可以看到除了几个语句的逻辑读很高,其实物理不是很高,每秒产生的重做日志以及物理读也不高.

检查磁盘IO

rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64 07/15/14

16:18:45 device %busy avque r+w/s blks/s avwait avserv

16:18:46 c39t0d3 100.00 0.50 18 1130 0.00 132.11

c41t0d3 83.50 0.50 6 450 0.00 290.78

16:18:47 c3t0d0 0.99 0.50 2 63 0.00 7.12

c39t0d3 91.09 0.50 10 982 0.00 115.53

c41t0d3 100.00 0.50 12 586 0.00 291.67

16:18:48 c3t0d0 3.03 0.50 2 32 0.00 15.93

c39t0d3 100.00 0.50 9 1034 0.00 139.76

c41t0d3 92.93 0.50 7 388 0.00 310.07

16:18:49 c3t0d0 2.00 0.50 4 64 0.00 19.59

c39t0d3 100.00 0.50 12 1088 0.00 127.33

c41t0d3 86.00 0.50 8 416 0.00 251.32

16:18:50 c3t0d0 1.01 0.50 1 2 0.00 8.99

c39t0d3 100.00 0.50 16 954 0.00 117.10

c41t0d3 100.00 0.50 9 614 0.00 295.52

16:18:51 c3t0d0 0.99 0.50 1 8 0.00 10.60

c39t0d3 93.07 0.50 17 913 0.00 110.59

c41t0d3 100.00 0.50 9 350 0.00 326.92

16:18:52 c39t0d3 100.00 0.50 21 1168 0.00 127.22

c41t0d3 88.00 0.50 11 544 0.00 252.08

16:18:53 c3t0d0 2.02 0.50 3 48 0.00 18.51

c39t0d3 88.89 0.50 19 1164 0.00 98.25

c41t0d3 100.00 0.50 11 630 0.00 324.39

16:18:54 c3t0d0 3.00 0.50 3 20 0.00 12.39

c39t0d3 95.00 0.50 20 954 0.00 131.90

c41t0d3 81.00 0.50 9 610 0.00 289.05

16:18:55 c3t0d0 9.00 0.50 11 134 0.00 8.62

c39t0d3 100.00 0.50 19 1090 0.00 137.20

c41t0d3 100.00 0.50 11 512 0.00 327.16

Average c39t0d3 99.50 0.50 16 1048 0.00 123.38

Average c41t0d3 100.00 0.50 9 510 0.00 296.44

Average c3t0d0 2.20 0.50 3 37 0.00 12.28

rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64 07/15/14

16:20:04 device %busy avque r+w/s blks/s avwait avserv

16:20:05 c3t0d0 1.00 0.50 1 16 0.00 8.33

c39t0d3 98.00 0.50 16 928 0.00 114.86

c41t0d3 98.00 0.50 10 684 0.00 266.43

16:20:06 c3t0d0 1.98 0.50 4 81 0.00 8.57

c39t0d3 93.07 0.50 19 1251 0.00 128.81

c41t0d3 91.09 0.50 6 475 0.00 365.83

16:20:07 c3t0d0 2.00 0.50 3 48 0.00 5.87

c39t0d3 98.00 0.50 23 1216 0.00 113.66

c41t0d3 98.00 0.50 8 576 0.00 307.92

16:20:08 c3t0d0 1.00 0.50 2 32 0.00 5.36

c39t0d3 100.00 0.50 21 1132 0.00 118.47

c41t0d3 100.00 0.50 7 592 0.00 300.71

16:20:09 c3t0d0 6.00 0.58 13 194 2.22 26.05

c39t0d3 89.00 0.50 17 1152 0.00 123.54

c41t0d3 87.00 0.50 8 512 0.00 298.26

16:20:10 c3t0d0 3.00 0.50 6 96 0.00 22.78

c39t0d3 85.00 0.50 17 1136 0.00 114.79

c41t0d3 98.00 0.50 9 592 0.00 252.52

16:20:11 c3t0d0 1.00 0.50 1 2 0.00 8.04

c39t0d3 100.00 0.50 17 1216 0.00 138.04

c41t0d3 100.00 0.50 12 672 0.00 291.69

16:20:12 c3t0d0 2.00 0.50 3 34 0.00 9.24

c39t0d3 99.00 0.50 16 1024 0.00 122.11

c41t0d3 88.00 0.50 9 476 0.00 299.79

16:20:13 c39t0d3 91.00 0.50 18 1024 0.00 111.77

c41t0d3 92.00 0.50 3 384 0.00 396.25

16:20:14 c39t0d3 99.00 0.50 17 892 0.00 132.15

c41t0d3 100.00 0.50 10 608 0.00 233.54

Average c3t0d0 1.80 0.53 3 50 0.87 17.64

Average c39t0d3 96.00 0.50 18 1097 0.00 121.54

Average c41t0d3 100.00 0.50 8 557 0.00 290.35

在业务人员下班后重启的双机软件,但在启动数据库时停在了Completed redo application这一步

SQL> startup

ORACLE instance started.

Total System Global Area 1.0318E+10 bytes

Fixed Size 2073176 bytes

Variable Size 3238006184 bytes

Database Buffers 7063207936 bytes

Redo Buffers 14700544 bytes

Database mounted.

从alert.log文件中可以看到如下信息:

Tue Jul 15 22:23:29 2014

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Picked latch-free SCN scheme 3

Autotune of undo retention is turned on.

IMODE=BR

ILAT =61

LICENSE_MAX_USERS = 0

SYS auditing is disabled

ksdpec: called for event 13740 prior to event group initialization

Starting up ORACLE RDBMS Version: 10.2.0.4.0.

System parameters with non-default values:

processes = 500

sessions = 555

__shared_pool_size = 3154116608

__large_pool_size = 16777216

__java_pool_size = 33554432

__streams_pool_size = 33554432

sga_target = 10317987840

control_files = /sx_data/ORCL/control01.ctl, /sx_data/ORCL/control02.ctl, /sx_data/ORCL/control03.ctl

db_block_size = 8192

__db_cache_size = 7063207936

compatible = 10.2.0.3.0

log_archive_dest_1 = LOCATION=/sx_data/arch_ORCL/

log_archive_format = %t_%s_%r.dbf

db_file_multiblock_read_count= 16

db_recovery_file_dest = /oracle/flash_recovery_area

db_recovery_file_dest_size= 2147483648

undo_management = AUTO

undo_tablespace = UNDOTBS1

undo_retention = 39600

fast_start_parallel_rollback= FALSE

remote_login_passwordfile= EXCLUSIVE

db_domain =

dispatchers = (PROTOCOL=TCP) (SERVICE=ORCLXDB)

local_listener = ORCL

job_queue_processes = 10

background_dump_dest = /oracle/admin/ORCL/bdump

user_dump_dest = /oracle/admin/ORCL/udump

core_dump_dest = /oracle/admin/ORCL/cdump

audit_file_dest = /oracle/admin/ORCL/adump

db_name = ORCL

open_cursors = 2000

optimizer_index_cost_adj = 20

optimizer_index_caching = 90

pga_aggregate_target = 2576351232

PMON started with pid=2, OS id=13613

PSP0 started with pid=3, OS id=13615

MMAN started with pid=4, OS id=13617

DBW0 started with pid=5, OS id=13619

LGWR started with pid=6, OS id=13621

CKPT started with pid=7, OS id=13623

SMON started with pid=8, OS id=13625

RECO started with pid=9, OS id=13627

CJQ0 started with pid=10, OS id=13629

MMON started with pid=11, OS id=13631

Tue Jul 15 22:23:30 2014

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

MMNL started with pid=12, OS id=13635

Tue Jul 15 22:23:30 2014

starting up 1 shared server(s) ...

Tue Jul 15 22:23:31 2014

ALTER DATABASE MOUNT

Tue Jul 15 22:23:39 2014

Setting recovery target incarnation to 2

Tue Jul 15 22:23:42 2014

Successful mount of redo thread 1, with mount id 1380841571

Tue Jul 15 22:23:42 2014

Database mounted in Exclusive Mode

Completed: ALTER DATABASE MOUNT

Tue Jul 15 22:23:42 2014

ALTER DATABASE OPEN

Tue Jul 15 22:23:47 2014

Beginning crash recovery of 1 threads

parallel recovery started with 7 processes

Tue Jul 15 22:23:50 2014

Started redo scan

Tue Jul 15 22:23:52 2014

Completed redo scan

336597 redo blocks read, 78835 data blocks need recovery

Tue Jul 15 22:23:52 2014

Started redo application at

Thread 1: logseq 2270, block 29

Tue Jul 15 22:23:53 2014

Recovery of Online Redo Log: Thread 1 Group 4 Seq 2270 Reading mem 0

Mem# 0: /sx_data/ORCL/redo04.log

Tue Jul 15 22:23:58 2014

Completed redo application

一直停在Completed redo application这,而这时的等待事件是checkpoint complete开始以为是并行恢复慢造成的,就查询了v$transaction,v$fast_start_transactions但视图中并没有进行恢复操作的事务存在.后来咨询了老熊,老熊说检查一下IO情况看是不是存储出问题了,如是再次检查存储IO性能:

rx6600-1:[/]#sar 1 10

HP-UX rx6600-1 B.11.23 U ia64 07/15/14

22:36:41 %usr %sys %wio %idle

22:36:42 2 2 12 84

22:36:43 1 0 12 87

22:36:44 0 0 17 83

22:36:45 0 0 13 87

22:36:46 0 1 12 87

22:36:47 2 1 13 84

22:36:48 1 1 16 82

22:36:49 0 0 12 88

22:36:50 0 0 12 88

22:36:51 0 0 22 78

Average 1 0 14 85

从上面可以看到现在实际上并没有业务在跑居然还存在IO等待这是不正常的

rx6600-2:[/]#bdf

Filesystem kbytes used avail %used Mounted on

/dev/vg00/lvol3 983040 422504 556176 43% /

/dev/vg00/lvol1 1835008 135048 1686776 7% /stand

/dev/vg00/lvol8 8912896 8535352 374824 96% /var

/dev/vg00/lvol7 7962624 2762312 5159704 35% /usr

/dev/vg00/lvol4 524288 83192 437784 16% /tmp

/dev/vg00/tmplv 2064384 93512 1847942 5% /oratmp

/dev/vg00/orasoft 10256384 3144652 6668390 32% /orasoft

/dev/vg00/oracle 20480000 5480497 14062042 28% /oracle

/dev/vg00/lvol6 9076736 5206384 3840128 58% /opt

/dev/vg00/lvol5 131072 25472 104824 20% /home

/dev/cwjcvg/cwjc_datalv

414973952 134188114 263239842 34% /cwjc_data

/dev/sxvg/sx_datalv

624689152 298665485 305654147 49% /sx_data

rx6600-2:[/]#time dd if=/dev/zero of=/var/test bs=8k count=100000

下面对小机自身的磁盘进行IO测试写800M的数据只要12秒左右

msgcnt 2 vxfs: mesg 001: vx_nospace - /dev/vg00/lvol8 file system full (1 block extent)

I/O error

47185+0 records in

47184+1 records out

real 11.7

user 0.0

sys 0.8

但是对EMC存储进行IO测试写800M的数据只要30多分还没有完成

rx6600-2:[/]#time dd if=/dev/zero of=/sx_data/test bs=8k count=100000

711856+0 records in

711855+0 records out

real 30:58.4

user 0.5

sys 13.0

这明显的是存储出了问题,后面得知管理人员早上10点多发现了存储有一个磁盘损坏了,存储做的raid 5,有热备盘.而且还有上百G的数据进行存储级的同步.与出现性能问题的时间一至。

到此问题原因找到了解决起来也就简单了.幸亏问题解决了,第二天有大领导来检查要不就.......哈哈

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值