本文解析一个数据量为3T的oracle数据库的每日批处理工作的addm报告~~~
1)addmrpt_1_681_682.txt line 25
RECOMMENDATION 2: DB Configuration, 82% benefit (14715 seconds)
ACTION: Increase the size of the log files to 2048 M to hold at least 20
minutes of redo information.
redo log file过小,造成日志切换频繁,addm报告建议每个redo log file扩至2G。(现在是和 1.18一样500M,已经扩完了,即时生效)
2)addmrpt_1_682_683.txt line 21
ACTION: Investigate the SQL statement with SQL_ID "77k4cjhvc5rvz" for
possible performance improvements.
报告里提出了很多这种sql需要优化的建议,具体可查看各个addm报告搜索【performance improvements】。
3)addmrpt_1_682_683.txt line 565
RATIONALE: SQL statement with SQL_ID "77k4cjhvc5rvz" was executed 1
times and had an average elapsed time of 17403 seconds.
RATIONALE: At least one execution of the statement ran in parallel.
RATIONALE: Waiting for event "PX Deq Credit: send blkd" in wait class
"Other" accounted for 82% of the database time spent in processing
the SQL statement with SQL_ID "77k4cjhvc5rvz".
RATIONALE: Waiting for event "latch: parallel query alloc buffer" in
wait class "Other" accounted for 3% of the database time spent in
processing the SQL statement with SQL_ID "77k4cjhvc5rvz".
RATIONALE: Waiting for event "PX qref latch" in wait class "Other"
accounted for 2% of the database time spent in processing the SQL
statement with SQL_ID "77k4cjhvc5rvz".
如上所示,82%消耗在【Waiting for event “PX Deq Credit: send blkd” in wait class “Other”】-------------------这个发生了很多次,在别的时间段也出现过
这种情况就是在对表或者索引查询的时候,并行度过高造成了资源的竞争,导致互相等待时间过长。解决的方法就是降低每个并行执行的并行度,比如对象(表,索引)上预设的并行度或者查询Hint 指定的并行度。
4)addmrpt_1_682_683.txt line 916
RATIONALE: Waiting for event "SQL*Net more data from client" in wait
class "Network" accounted for 5% of the database time spent in
processing the SQL statement with SQL_ID "6hz25cqrzp8f1".
这个很显而易见哈,网络等待消耗5%的DBtime
5)addmrpt_1_682_683.txt line 1510
RECOMMENDATION 1: DB Configuration, 0.49% benefit (192 seconds)
ACTION: Increase the size of the SGA by setting the parameter
"sga_target" to 76800 M.
addm报告建议把sga扩到76800M,现在是61440M(和1.18一样),已经写完参数了,重启数据库生效。
6)addmrpt_1_683_684.txt line 961
RATIONALE: Waiting for event "db file scattered read" in wait class
"User I/O" accounted for 41% of the database time spent in processing
the SQL statement with SQL_ID "6091wkzhftz0h".
这个意思是用户进行离散读次数很多,离散读就是相当于一会读磁盘的东面一会读西面,所以很慢。。。
造成这种报错的原因是全表扫或者快速索引全扫。。。
解决方法:
1.对相应的sql语句进行修改
2.重建索引
3.在WHERE条件中加入更多的索引字段
7)addmrpt_1_686_687.txt line 1590
RATIONALE: The I/O usage statistics for the object are: 1 full object
scans, 9404941 physical reads, 5 physical writes and 0 direct reads.
显而易见,物理读很高,不知道为什么这个是全表扫的,SQL我没看有没有索引,建议看一下~~~~
8)addmrpt_1_686_687.txt line 2154
Individual SQL statements responsible for significant user I/O wait were
found.
addmrpt_1_686_687.txt line 2426
SYMPTOM: Wait class "User I/O" was consuming significant database time.
(17% impact [1696 seconds])
这个上面写的很清楚了,这个SQL对user IO等待消耗负责。。。
而且这一个小时内的报告里面,有很多【Wait class “User I/O”】,可以分析一下这个小时里执行的SQL逻辑~
9)addmrpt_1_688_689.txt line 1418
Time spent on the CPU by the instance was responsible for a substantial part
of database time.
这个在此后2个小时的报告里也出现过
显示的是实例在cpu上消耗的时间过长,执行top看了,cpu使用率和负载很高~
10)addmrpt_1_690_691.txt line 2728
ACTION: Use bigger fetch arrays while fetching results from the SELECT
statement with SQL_ID "ghx3dgfr1g0by".
这个有个参数,叫arraysize,默认是15,怎么说呢,这个参数是15,那么你扫描一次数据库就给你返回15个数据块,如果这个值很小,那么就会扫描很多次,造成物理读过高,如果过大,
就会造成,需要返回的数据块很多,相应的redo、undo那些乱七八糟的就会消耗很多,然后我改成了100,即时生效。。。
11)addmrpt_1_691_692.txt line 903、2723
RATIONALE: Waiting for event "local write wait" in wait class "User I/O"
accounted for 2% of the database time spent in processing the SQL
statement with SQL_ID "av6x7tv72ypdk".
这个的意思就是本地写等待,等待的还是User io,也就是自己等自己。。。
造成这种问题的原因有两种:
1、磁盘有问题了,而且是很严重的问题,比如磁盘控制器坏掉了,但是极少发生。。。
2、truncate、drop表的时候,oracle必须使DATA BUFFER中所有该对象的数据块失效或者刷新到磁盘,步骤为此时请求RO队列锁,找缓冲区中该对象的块,
并使其无效化或者刷新到磁盘,然后释放RO锁,如果多个进程并发地进行TRUNCATE的时候,TRUNCATE等待相关的块刷新到磁盘,就表现为等待时间local write wait.
这就得调整一下truncate的逻辑顺序啥的~