Database Performance: How to reduce "library cache: mutex X" waits for scalability in the Oracle database
As the demand for database workload throughput increases along with server CPU core counts, maintaining scalability can become a challenge even for the most optimized workloads. In fact, in some scenarios following best practices can itself result in contention as more sessions running on more CPUs compete for access to the same SQL or data. In this blog post I will look at an extremely high throughput transactional Oracle database workload and the options available for improving concurrency and throughput without modifying the application.
The workload I will use is the HammerDB Oracle OLTP workload on a 4-socket Intel Xeon server. Importantly, the server CPU utilisation is over 95% and the throughput is already very high, so we need to be aware that contention may not be "true" contention: if the CPU is already fully utilised, apparent contention may be the result of a process waiting for CPU time as much as genuine contention on a particular object. Nevertheless, we are in the business of maximising our IT investments, and it is interesting to see that even at full CPU utilisation there are options to reduce contention and improve performance still further. This HammerDB workload is a good example of an optimized workload as it shares cursors, reduces parsing and makes use of bind variables. When a workload is run, a cursor is opened for each stored procedure and, as shown here in Oratcl, the statement that calls the stored procedure is parsed once only:
set curn_sl [oraopen $lda ]
set sql_sl "BEGIN slev(:st_w_id,:st_d_id,:threshold); END;"
oraparse $curn_sl $sql_sl
return $curn_sl
When the stored procedure is called, bind variables are used:
orabind $curn_sl :st_w_id $w_id :st_d_id $stock_level_d_id :THRESHOLD $threshold
oraexec $curn_sl
Using bind variables means that the same cursor can be shared between sessions, reducing the amount of memory needed for the shared pool and keeping hard parsing activity to a minimum. Running the workload and measuring the output with an AWR report shows the following top 10 foreground events:
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB
Event Waits Time (sec) Avg(ms) time Wait Class
------------------------------ ------------ ---------- ------- ------ ----------
DB CPU 34.2K 84.5
library cache: mutex X 1,443,752 2284 2 5.6 Concurrenc
cursor: pin S 1,060,588 2004.2 2 5.0 Concurrenc
latch: In memory undo latch 1,036,240 401 0 1.0 Concurrenc
latch: enqueue hash chains 103,783 331.7 3 .8 Other
enq: TX - row lock contention 171,164 266.9 2 .7 Applicatio
db file sequential read 249,612 151.8 1 .4 User I/O
log file sync 35,981 117.3 3 .3 Commit
db file scattered read 155,862 105.7 1 .3 User I/O
buffer busy waits 270,967 90 0 .2 Concurrenc
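That the cursor sharing described above is working as intended can be confirmed directly from the shared pool. As a sketch (column availability may vary by Oracle version), comparing executions with parse calls and version counts in V$SQLAREA shows whether cursors are being reused rather than re-parsed:

-- Sketch: high executions with comparatively low parse_calls and a low
-- version_count indicates effective cursor sharing and bind variable reuse.
SELECT sql_id, executions, parse_calls, version_count,
       SUBSTR(sql_text, 1, 40) AS sql_text
FROM   v$sqlarea
WHERE  parsing_schema_name = 'TPCC'
AND    sql_text LIKE 'BEGIN%'
ORDER  BY executions DESC;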
Performance is good and CPU utilisation is maximised; however, the top wait events of library cache: mutex X and cursor: pin S are related to the good practices of cursor sharing. Each of these events covers multiple mutexes, and drilling down into the AWR report shows more detailed information on each.
Wait
Mutex Type Location Sleeps Time (ms)
--------------------- -------------------------------- ------------ ------------
Cursor Pin kksfbc [KKSCHLFSP2] 634,827 933,897
Library Cache kglpin1 4 610,798 747,221
Library Cache kglpndl1 95 599,973 753,431
Library Cache kglpnal1 90 503,140 155,556
Cursor Pin kksLockDelete [KKSCHLPIN6] 454,516 701,271
Library Cache kglhbh1 63 2 1
Here the top mutexes are related to child cursor lookup activity (kksfbc) and library cache pinning and unpinning activity (kglpin/kglpndl) - for further information there is an excellent presentation on Library Cache internals here. Mutexes are extremely lightweight and use atomic instructions implemented at the CPU level with the LOCK CMPXCHG instruction; however, a session sleeping while waiting to acquire a mutex does so utilising CPU time. In this case, therefore, the high CPU utilisation and session count running the same stored procedures means that this optimization itself shows evidence of contention.
The tried and tested workaround is to modify the application to artificially duplicate the SQL by changing the text. One way to do this is by introducing 'dummy hints' that are not stripped out by the parser but are sufficient for Oracle to recognise the statements as different, thereby reducing contention by having different sessions use different 'versions' of the same statements. Modifying the application for a particular hardware platform is not ideal, however, and in some cases it is simply not an option. Instead, dbms_shared_pool.markhot can be used. Rather than duplicating the SQL ourselves, dbms_shared_pool.markhot serves this purpose by introducing multiple copies of the same packages, procedures and cursors into the library cache for different sessions to use - all we have to do is mark the objects in question hot. We already know which packages and procedures are used by this workload, so we can mark them hot at a SQL prompt as the SYS user as follows:
exec dbms_shared_pool.markhot(schema=>'SYS',objname=>'DBMS_RANDOM',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'SYS',objname=>'DBMS_RANDOM',NAMESPACE=>2);
exec dbms_shared_pool.markhot(schema=>'SYS',objname=>'DBMS_OUTPUT',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'SYS',objname=>'DBMS_OUTPUT',NAMESPACE=>2);
exec dbms_shared_pool.markhot(schema=>'TPCC',objname=>'NEWORD',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'TPCC',objname=>'PAYMENT',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'TPCC',objname=>'DELIVERY',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'TPCC',objname=>'OSTAT',NAMESPACE=>1);
exec dbms_shared_pool.markhot(schema=>'TPCC',objname=>'SLEV',NAMESPACE=>1);
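That the objects have been marked hot can be checked back in the library cache. As a sketch (the contents of the PROPERTY column are version-dependent, so verify on your release), V$DB_OBJECT_CACHE can be queried for the HOT property:

-- Sketch: list library cache objects carrying the HOT property after
-- calling dbms_shared_pool.markhot (PROPERTY contents vary by version).
SELECT owner, name, namespace, property
FROM   v$db_object_cache
WHERE  property LIKE '%HOT%';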
Re-running the workload shows a significant impact on the library cache: mutex X event and, despite the already high CPU utilisation, throughput has also improved.
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB
Event Waits Time (sec) Avg(ms) time Wait Class
------------------------------ ------------ ---------- ------- ------ ----------
DB CPU 34.5K 88.6
cursor: pin S 1,071,740 2093.3 2 5.4 Concurrenc
enq: TX - row lock contention 373,472 548.9 1 1.4 Applicatio
latch: In memory undo latch 956,992 415.6 0 1.1 Concurrenc
latch: enqueue hash chains 103,312 355.7 3 .9 Other
db file sequential read 211,498 139.8 1 .4 User I/O
db file scattered read 168,610 123.5 1 .3 User I/O
log file sync 36,418 118.7 3 .3 Commit
buffer busy waits 257,958 95.9 0 .2 Concurrenc
library cache: mutex X 23,288 24.3 1 .1 Concurrenc
The detailed mutex information also shows the reduction in library cache mutex activity.
Wait
Mutex Type Location Sleeps Time (ms)
--------------------- -------------------------------- ------------ ------------
Cursor Pin kksfbc [KKSCHLFSP2] 644,771 973,310
Cursor Pin kksLockDelete [KKSCHLPIN6] 455,400 725,202
Library Cache kglpin1 4 10,879 11,115
Library Cache kglpndl1 95 8,330 10,836
Library Cache kglpnal1 90 6,959 1,857
Library Cache kglhbh1 63 1 0
dbms_shared_pool.markhot can also be used for cursors, and the most active ones can be seen in the view X$KGLOB. There is a SQL statement from Andrey Nikolaev that shows the hottest objects in the library cache here, so we can modify it to find the top 4 cursors for the TPCC user and mark them hot as follows:
SET ECHO OFF
SET SERVEROUTPUT ON
DECLARE
BEGIN
  -- Find the 4 most active TPCC cursors in the library cache and mark
  -- each one hot using its full 16-byte hash value (KGLNAHSV)
  FOR hash_id IN (
    SELECT KGLNAHSV FROM (
      SELECT
        CASE WHEN (kglhdadr = kglhdpar) THEN 'Parent' ELSE 'Child'||kglobt09 END,
        kglhdadr ADDRESS, SUBSTR(kglnaobj,1,20),
        parsing_schema_name, kglnahsh, KGLNAHSV, kglobtyd, kglobt23, kglobt24, kglhdexc, kglhdnsp
      FROM x$kglob, v$sql
      WHERE x$kglob.kglnahsh = v$sql.hash_value
      AND v$sql.parsing_schema_name = 'TPCC'
      AND kglhdexc > 1
      AND kglobt24 > 1
      ORDER BY kglobt24 DESC
    ) WHERE ROWNUM < 5
  ) LOOP
    DBMS_OUTPUT.PUT_LINE('hash=>'||hash_id.KGLNAHSV||',namespace=>0');
    SYS.DBMS_SHARED_POOL.MARKHOT(hash=>hash_id.KGLNAHSV, namespace=>0);
  END LOOP;
END;
/
An example output is as follows. It is important to note that the hash value required is the full 16-byte hash value KGLNAHSV from X$KGLOB: on some versions of Oracle no error is given if an incorrect hash value is used, but it has no effect, so you should ensure that you are using the correct full hash value as shown in the example:
SQL> @markhot4sql
hash=>b45763c79b8f4f0e8439f6a3190f1b8f,namespace=>0
hash=>554b1093c9cd87db8f6f55016b20198a,namespace=>0
hash=>15e9d1f93391ce5edb7be748a2ab9f80,namespace=>0
hash=>6508e1443bf2a2948165d5ad91803582,namespace=>0
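If the marking needs to be reversed, dbms_shared_pool also provides an unmarkhot procedure. As a sketch (the hash overload mirrors markhot; check the package specification on your release), the earlier examples can be undone as follows, reusing a hash value from the output above for illustration:

-- Reverse the hot marking for a procedure and, by full hash value, a cursor
exec dbms_shared_pool.unmarkhot(schema=>'TPCC',objname=>'SLEV',namespace=>1);
exec dbms_shared_pool.unmarkhot(hash=>'b45763c79b8f4f0e8439f6a3190f1b8f',namespace=>0);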
In this case cursor: pin S events have been reduced, however at the expense of increased library cache: mutex X events and lower throughput.
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB
Event Waits Time (sec) Avg(ms) time Wait Class
------------------------------ ------------ ---------- ------- ------ ----------
DB CPU 33.4K 79.7
library cache: mutex X 849,072 5296.8 6 12.6 Concurrenc
enq: TX - row lock contention 277,268 745.3 3 1.8 Applicatio
cursor: pin S 222,045 455.4 2 1.1 Concurrenc
...
The detail shows, however, that this is a different area of library cache: mutex X contention: referring to the presentation noted previously, kglhd is related to the object handle structure and therefore to the creation of new objects.
Mutex Type Location Sleeps Time (ms)
--------------------- -------------------------------- ------------ ------------
Library Cache kglhdgn1 62 1,032,273 2,763,927
Cursor Pin kksfbc [KKSCHLFSP2] 202,786 338,137
Library Cache kglhdgn2 106 186,681 496,834
Library Cache kglpin1 4 127,693 190,311
Library Cache kglpndl1 95 126,592 200,987
Library Cache kglpnal1 90 83,421 19,982
Cursor Pin kksLockDelete [KKSCHLPIN6] 56,208 100,403
As a consequence, with the system already at full CPU utilisation, further optimization has proved counterproductive. In fact, testing showed that marking only packages and procedures hot, and limiting the number of copies created with the following parameter, resulted in the optimal overall system performance.
SQL> show parameter kgl
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
_kgl_hot_object_copies integer 4
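As a sketch of how this value was set: _kgl_hot_object_copies is an underscore (hidden) parameter, so it should only be changed under guidance from Oracle Support, it must be quoted, and being static it takes effect after an instance restart:

-- Sketch: limit the number of hot object copies created by markhot.
-- Hidden parameter - set only under guidance from Oracle Support;
-- static, so a restart is required for it to take effect.
alter system set "_kgl_hot_object_copies"=4 scope=spfile;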
In summary, what we have seen is that if you are experiencing contention on library cache: mutex X or cursor: pin S events then dbms_shared_pool.markhot can be a highly effective tool. Whether looking to achieve a few extra percent on an extreme high performance workload or to achieve high scalability on systems with very high core counts, Oracle on Intel makes this possible, all without needing to modify the application.