充值平台数据库优化一例

最新推荐文章于 2024-08-28 13:56:44 发布

ctuph5899

最新推荐文章于 2024-08-28 13:56:44 发布

阅读量104

点赞数

文章标签：数据库

环境：
os：CentOS release 5.4 (Final)
db：Orcle 10g 10.2.0.4.0

巡检发现从周三开始充值平台数据库的cpu使用率增长了以前的1到2倍，系统负载也是相同情况的增加，虽然服务器负载一直不大但是我觉得有必要检查下数据库是否存在一些性能问题。
该数据库网络限制无法连接EM通过web平台查看，sqlplus登录手动生成awr和addm报告。

1、生成报告

a、登录数据库
[oracle@chongzhi admin]$ sqlplus / as sysdba
b、执行awrrpt.sql脚本生产awr报告
SQL> @/u01/app/oracle/product/10.2.0/rdbms/admin/awrrpt.sql
c、执行addmrpt.sql脚本生成addm报告
SQL> @/u01/app/oracle/product/10.2.0/rdbms/admin/addmrpt.sql

ps：/app/oracle/product/10.2.0/rdbms/admin/ 目录存放着oracle自带的很多有用的sql脚本

2、查看报告

a、查看awr报告发现db file scattered read和read by other session等待时间比较严重

Top 5 Timed Events

Event                   Waits           Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time                               5,896                            94.3
db file scattered read 37,034,761     1,229      0                     19.7    User I/O
read by other session   10,619,767     407        0                      6.5    User I/O

db file scattered read是表示太多全表扫描
read by other session 是第一次遇到，如下是网上查到解释：
read by other session Definition: When information is requested from the database, Oracle will first read the data from disk into the database buffer cache. If two or more sessions request the same information, the first session will read the data into the buffer cache while other sessions wait.
In previous versions this wait was classified under the “buffer busy waits” event.
However, in Oracle 10.1 and higher this wait time is now broken out into the “read by other session” wait event. Excessive waits for this event are typically due to several processes repeatedly reading the same blocks, e.g. many sessions scanning the same index or performing full-table scans on the same table. Tuning this issue is a matter of finding and eliminating this contention.
Confio concludes with a summary that “read by other session waits” are very similar to buffer busy waits
When a session waits on the “read by other session” event, it indicates a wait for another session to read the data from disk into the Oracle buffer cache.
If this happens too often the performance of the query or the entire database can suffer. Typically this is caused by contention for “hot” blocks or objects so it is imperative to find out which data is being contended for. Once that is known this document lists several alternative methods for solving the issue.
简单点说就是存在热快（读）的争用，一定有一个或者数个数据库对象被争用严重。

查看awr报告发现物理和逻辑读取的段最多的段是表CZ_INSTANCE_MESSAGE

Segments by Logical Reads
Total Logical Reads: 847,901,296
Captured Segments account for 99.8% of Total

Owner         Tablespace Name Object Name Subobject Name Obj. Type Logical Reads %Total
ZJHC_CHONGZHI CHONGZHI_DATA    CZ_INSTANCE_MESSAGE           TABLE 428,135,472 50.49
ZJHC_CHONGZHI CHONGZHI_DATA    CZ_INSTANCE                   TABLE 397,794,832 46.92

Segments by Physical Reads
Total Physical Reads: 337,490,135
Captured Segments account for 100.0% of Total

Owner Tablespace Name Object Name Subobject Name Obj. Type Physical Reads %Total
ZJHC_CHONGZHI CHONGZHI_DATA CZ_INSTANCE_MESSAGE TABLE 333,501,208 98.82

b、查看addm报告发现id为ayhmtabsvpdgc的sql语句占用大量的数据库、cpu和I/O时间，平均执行时间1.5秒

RECOMMENDATION 3: SQL Tuning, 25% benefit (1541 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "ayhmtabsvpdgc".
         RELEVANT OBJECT: SQL statement with SQL_ID ayhmtabsvpdgc and
         PLAN_HASH 2559540153
         select * from (
         select id,error_message,error_stacktrace,ch_req_message,ch_resp_messa
         ge,co_req_message,co_resp_message,async_req_message,async_resp_messag
         e,create_date,modify_date,sts,instance_id from cz_instance_message
         WHERE
         instance_id = :1
         order by id desc
         )
         WHERE
         rownum <= :2
      RATIONALE: SQL statement with SQL_ID "ayhmtabsvpdgc" was executed 796
         times and had an average elapsed time of 1.5 seconds.

RECOMMENDATION 3: SQL Tuning, 25% benefit (1541 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "ayhmtabsvpdgc".
         RELEVANT OBJECT: SQL statement with SQL_ID ayhmtabsvpdgc and
         PLAN_HASH 2559540153
         select * from (
         select id,error_message,error_stacktrace,ch_req_message,ch_resp_messa
         ge,co_req_message,co_resp_message,async_req_message,async_resp_messag
         e,create_date,modify_date,sts,instance_id from cz_instance_message
         WHERE
         instance_id = :1
         order by id desc
         )
         WHERE
         rownum <= :2
      RATIONALE: SQL statement with SQL_ID "ayhmtabsvpdgc" was executed 796
         times and had an average elapsed time of 1.5 seconds.
      RATIONALE: Average CPU used per execution was 1.3 seconds.

RECOMMENDATION 3: SQL Tuning, 25% benefit (1541 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         "ayhmtabsvpdgc".
         RELEVANT OBJECT: SQL statement with SQL_ID ayhmtabsvpdgc and
         PLAN_HASH 2559540153
         select * from (
         select id,error_message,error_stacktrace,ch_req_message,ch_resp_messa
         ge,co_req_message,co_resp_message,async_req_message,async_resp_messag
         e,create_date,modify_date,sts,instance_id from cz_instance_message
         WHERE
         instance_id = :1
         order by id desc
         )
         WHERE
         rownum <= :2
      RATIONALE: SQL statement with SQL_ID "ayhmtabsvpdgc" was executed 796
         times and had an average elapsed time of 1.5 seconds.
      RATIONALE: Average time spent in User I/O wait events per execution was
         0.62 seconds.

该sql语句是对表cz_instance_message的一个查询操作，初步定位问题到对该表的查询，下一步检查该sql执行计划

3、检查sql的执行计划
SQL> explain plan for
         select * from (
         select id,error_message,error_stacktrace,ch_req_message,ch_resp_message,
         co_req_message,co_resp_message,async_req_message,async_resp_message,
         create_date,modify_date,sts,instance_id from zjhc_chongzhi.cz_instance_message
         WHERE
         instance_id = :1
         order by id desc
         )
         WHERE
            rownum <= :2;

Explained.

SQL> @/u01/app/oracle/product/10.2.0/rdbms/admin/utlxpls.sql
ps：utlxpls.sql可以让你以一个完美的阅读格式查看当前最近的sql语句的执行计划
或者执行一下操作亦有相同效果
SQL> set linesize 130;
SQL> select * from table(dbms_xplan.display);

Plan hash value: 2559540153

-----------------------------------------------------------------------------------------------
| Id | Operation             | Name                    | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                       |     1 | 13062 | 30494    (1)| 00:06:06 |
|* 1 | COUNT STOPKEY               |                       |       |       |    |                |
|   2 |   VIEW                        |                       |     1 | 13062 | 30494    (1)| 00:06:06 |
|* 3 |    SORT ORDER BY STOPKEY|                |     1 | 1742 | 30494   (1)| 00:06:06 |
|* 4 |     TABLE ACCESS FULL | CZ_INSTANCE_MESSAGE |     1 | 1742 | 30493   (1)| 00:06:06 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=TO_NUMBER(:2))
   3 - filter(ROWNUM<=TO_NUMBER(:2))
   4 - filter("INSTANCE_ID"=TO_NUMBER(:1))

结果表明sql对表CZ_INSTANCE_MESSAGE执行了全表扫描。
4、分析和处理：
问题sql
         select * from (
         select id,error_message,error_stacktrace,ch_req_message,ch_resp_message,
         co_req_message,co_resp_message,async_req_message,async_resp_message,
         create_date,modify_date,sts,instance_id from zjhc_chongzhi.cz_instance_message
         WHERE
         instance_id = :1
         order by id desc
         )
         WHERE
            rownum <= :2;

该sql是以CZ_INSTANCE_MESSAGE表的instance_id字段作为查询条件的，但instance_id并未做索引造成sql执行是进行了全表扫描，全表扫描增加了cpu和IO的占用，提请开发人员对instance_id做了索引。
再次检查cpu和系统负载，已经下降到以前的正常水平。
第二天检查相同业务时段的awr报告db file scattered read和read by other session已经很低了不在前五个等待事件的报告内了。
Segments by Logical Reads和Segments by Physical Reads报告内容上CZ_INSTANCE_MESSAGE表也没有了。
检查addm报告已经没有对这条sql进行指导和建议的内容了。

5、总结：
本次优化调整操作根据awr报告发现等待事件和引起等待事件的段（表）根据addm报告发现问题sql，检查该sql的执行计划发现了该sql对表执行了全表扫描，通过对查询条件字段加索引解决了全表扫描引起的性能问题。
整个解决未查询数据字典相关的视图，只是依靠oracle提供的的诊断和建议报告结合sql执行计划的查看这些基本的手段完成。
虽然只是最简单的一例因索引而引起的数据库性能问题，但重点是在于向新人（包括我在内）使用这些诊断工具来排查和解决问题的方法。

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/20575781/viewspace-710235/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/20575781/viewspace-710235/

ctuph5899

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
充值平台数据库优化一例

环境：os：CentOS release 5.4 (Final)db：Orcle 10g 10.2.0.4.0巡检发现从周三开始充值平台数据库的cpu使用率增长了以前的1到2倍，系统负载也是相同情况的增加，...
复制链接

扫一扫