客户的系统进行压力测试时发现CPU压力非常大,正常情况下CPU负载在30-40%,而峰值期间CPU基本上100%,以至于事务无法处理完成,甚至无法登陆。
经过客户的描述,这个压力测试实际上压力应该不大,仅仅设置了100个客户端的并发,而且单个客户端整个功能模块完成实际上仅需要25秒钟,但并发一多,CPU负载就非常高,以至于单个功能模块完成时间被延长至几百秒。
到客户现场,在操作系统级察看,CPU大量消耗在user上,io的CPU等待很少,确定CPU是消耗在应用层面。
[oracle@localhost ~]$ sar -u 3 50 Linux 2.6.18-92.el5 (localhost.localdomain) 2009年06月15日 _x86_64_ (8 CPU)
14时56分50秒 CPU %user %nice %system %iowait %steal %idle 14时56分53秒 all 93.29 0.00 1.08 0.00 0.00 5.62 14时56分56秒 all 90.54 0.00 1.33 0.00 0.00 8.12 14时56分59秒 all 93.00 0.00 1.08 0.00 0.00 5.93 14时57分02秒 all 92.34 0.00 0.92 0.00 0.00 6.75 14时57分05秒 all 93.25 0.00 0.87 0.00 0.00 5.88 14时57分08秒 all 92.06 0.00 1.02 0.00 0.00 6.92 14时57分11秒 all 91.95 0.00 0.87 0.00 0.00 7.18 14时57分14秒 all 92.71 0.00 0.92 0.00 0.00 6.38 14时57分17秒 all 92.19 0.00 0.87 0.00 0.00 6.94 14时57分20秒 all 94.96 0.00 0.79 0.00 0.00 4.25 14时57分23秒 all 92.91 0.00 0.83 0.00 0.00 6.26 14时57分26秒 all 92.06 0.00 0.70 0.00 0.00 7.24 |
由于客户描述的情况中并发量与CPU消耗有直接关系,因此怀疑有应用级并发的竞争,于是收集了一下Statspack报告,在命中率方面:
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: | 99.98 | Redo NoWait %: | 99.98 |
Buffer Hit %: | 100.00 | In-memory Sort %: | 100.00 |
Library Hit %: | 74.95 | Soft Parse %: | 72.85 |
Execute to Parse %: | 0.27 | Latch Hit %: | 97.18 |
Parse CPU to Parse Elapsd %: | 9.47 | % Non-Parse CPU: | 7.65 |
库高速缓存的命中率很低,硬解析很高,而硬解析与CPU消耗有直接关系。
在等待事件方面:
Top 5 Timed Events
Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
latch: library cache | 5,679,771 | 363,979 | 64 | 86.1 | Concurrency |
CPU time |
| 41,658 |
| 9.9 |
|
latch: shared pool | 1,908,510 | 6,061 | 3 | 1.4 | Concurrency |
latch free | 192,585 | 5,334 | 28 | 1.3 | Other |
latch: cache buffers chains | 19,228 | 823 | 43 | .2 | Concurrency |
很明显,主要等待都集中在与库高速缓存相关的Latch竞争上,库高速缓存是主要的性能问题,而且硬解析也主要在消耗CPU,看来性能瓶颈已经表现出来了,通过下面会话级的直接查询,也发现在大量会话中同时都存在着latch: library cache 竞争:
select event from v$session_wait where event not in ('SQL*Net message from client','rdbms ipc message')
EVENT ---------------------------------------------------------------- SQL*Net message to client latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: shared pool latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache 。。。。。。 smon timer pmon timer |
目前客户的共享池有544M,而在程序运行过程中共享池的剩余空间还有59M,同时从报告中的共享池大小建议上看,对于目前的应用来说,共享池只需要400多M就足够,因此造成库高速缓存命中率低,硬解析高,CPU大量消耗的主要原因很可能是SQL代码没有共享。
Shared Pool Size(M) | SP Size Factr | Est LC Size (M) | Est LC Mem Obj | Est LC Time Saved (s) | Est LC Time Saved Factr | Est LC Load Time (s) | Est LC Load Time Factr | Est LC Mem Obj Hits |
416 | 0.76 | 67 | 7,820 | 3,652 | 0.98 | 466,989 | 1.00 | 3,518,320 |
480 | 0.88 | 130 | 15,164 | 3,710 | 1.00 | 466,931 | 1.00 | 3,523,512 |
544 | 1.00 | 193 | 22,507 | 3,727 | 1.00 | 466,914 | 1.00 | 3,523,751 |
608 | 1.12 | 256 | 29,890 | 3,730 | 1.00 | 466,911 | 1.00 | 3,523,816 |
672 | 1.24 | 319 | 37,269 | 3,731 | 1.00 | 466,910 | 1.00 | 3,523,863 |
736 | 1.35 | 382 | 44,593 | 3,731 | 1.00 | 466,910 | 1.00 | 3,523,915 |
800 | 1.47 | 445 | 51,975 | 3,732 | 1.00 | 466,909 | 1.00 | 3,523,952 |
864 | 1.59 | 508 | 59,424 | 3,733 | 1.00 | 466,908 | 1.00 | 3,524,016 |
928 | 1.71 | 571 | 66,882 | 3,735 | 1.00 | 466,906 | 1.00 | 3,524,077 |
992 | 1.82 | 634 | 74,314 | 3,744 | 1.00 | 466,897 | 1.00 | 3,524,148 |
1,056 | 1.94 | 697 | 81,781 | 3,744 | 1.00 | 466,897 | 1.00 | 3,524,190 |
1,120 | 2.06 | 760 | 89,216 | 3,748 | 1.01 | 466,893 | 1.00 | 3,524,267 |
查询内存中的所有SQL语句,发现有两条SQL语句执行的条数占了内存所有SQL语句的80%,而这些语句的文本基本相同,仅仅是where子句中的值不同。
select sql_text from v$sql order by sql_text;
select * from olat_examinationresult where examlog_id =243035 and question_id=1961 select * from olat_examinationresult where examlog_id =243035 and question_id=1974 select * from olat_examinationresult where examlog_id =243035 and question_id=2078 select * from olat_examinationresult where examlog_id =243035 and question_id=2114 select * from olat_examinationresult where examlog_id =243035 and question_id=2121 select * from olat_examinationresult where examlog_id =243035 and question_id=2330 select * from olat_examinationresult where examlog_id =243035 and question_id=2349 select * from olat_examinationresult where examlog_id =243035 and question_id=2389 select * from olat_examinationresult where examlog_id =243035 and question_id=2390 select * from olat_examinationresult where examlog_id =243035 and question_id=2420 select * from olat_examinationresult where examlog_id =243035 and question_id=2519 select * from olat_examinationresult where examlog_id =243035 and question_id=2520 select * from olat_examinationresult where examlog_id =243035 and question_id=2809 select * from olat_examinationresult where examlog_id =243035 and question_id=2826 select * from olat_examinationresult where examlog_id =243035 and question_id=2837 select * from olat_examinationresult where examlog_id =243035 and question_id=2839 select * from olat_examinationresult where examlog_id =243035 and question_id=2847 select * from olat_examinationresult where examlog_id =243035 and question_id=2871 select * from olat_examinationresult where examlog_id =243035 and question_id=2893 select * from olat_examinationresult where examlog_id =243035 and question_id=2895 select * from olat_examinationresult where examlog_id =243035 and question_id=3168 select * from olat_examinationresult where examlog_id =243035 and question_id=3264 。。。。。。
select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242499 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242500 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242501 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242502 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242503 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242504 。。。。。。 |
问题已经很明显了,应用中存在两条循环执行的SQL语句,这两条语句都存在没有使用绑定变量的问题,因此每次使用不同的值都不得不硬解析,因此并发越多,需要硬解析的SQL语句越多,从而带来更多的CPU消耗,因此让用户将这两条语句的值换成绑定变量,CPU消耗的峰值降低到了15%以下。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22235/viewspace-606501/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/22235/viewspace-606501/