Chapter 3. Latches
There are numerous data structures in Oracle's System Global Area (SGA) that need to be accessed concurrently by many different database processes. It is essential that only one process be able to modify any particular data structure at one time, and that no data structure be modified while it is being inspected. Oracle enforces these rules by protecting all SGA data structures with either locks or latches. (See Chapter 6 for a description of the contents of the SGA and other memory areas.)
3.1 Latches and Locks
Latches are the more restrictive mechanism, because they do not allow multiple processes to inspect the protected data structure at the same time—they provide for exclusive access only.[1] Locks allow for better concurrency, because they may be held in a shared mode when the data structure is simply being inspected.
[1] This is a simplification. The redo copy latches can be shared, but this is hardware dependent.
Another significant difference between locks and latches is request queuing. Requests for locks are queued if necessary and serviced in order, whereas latches do not support request queuing. If a request to get a latch fails because the latch is busy, the process just continues to retry until it succeeds. So latch requests are not necessarily serviced in order.
Because a latch can only be held by one process at a time, and because there is no inherent concept of queuing, the latch data structure itself is very simple—essentially just a single location in memory representing the state of the latch. And because the latch data structure is so simple, the functions to get and release a latch have very little work to do. By contrast, the data structures for locks are much more sophisticated because of their support for queuing and concurrency. So the functions to get, convert, and release locks have correspondingly more work to do.
Of course, it is necessary for Oracle to ensure that only one process at a time can modify the latch and lock data structures themselves. For latches this is easy. Because each latch is just a single location in memory, Oracle is able to use the TEST AND SET, LOAD AND CLEAR, or COMPARE AND SWAP instructions of the underlying hardware's instruction set for its latch get operations. Because these are simple machine instructions that are guaranteed to be atomic, no other locking mechanism is needed. This simplicity makes latch gets very efficient.
Oracle's lock data structures, on the other hand, have several parts, and therefore cannot be modified atomically. For this reason, Oracle actually protects operations on locks with latches. The type of latch used varies depending on the type of lock. For example, the cache buffer locks are indirectly protected by the cache buffers chains latches, and the row cache enqueue locks are protected by the row cache objects latch.
Because latches are efficient, Oracle often uses a latch, rather than a lock and latch combination, to protect data structures that are expected to be accessed only briefly and intermittently.
3.2 Parent and Child Latches
Most internal Oracle data structures that are protected by latches are protected by only one latch. However, in some cases more than one latch may be used. For example, there may be a number of library cache latches protecting different groups of objects in the library cache, and separate cache buffers chains latches are used to protect each of the database buffer cache hash chains.
Whenever a number of latches may be used to protect different parts of a structure, or different equivalent structures, these latches are called child latches. For each set of child latches of the same type there is one parent latch. In general, both the parent and child latches may be taken. In practice, however, the library cache parent latch is the only parent latch you are likely to see being taken, and even then this is a relatively rare occurrence by comparison with the activity against its child latches.
Somewhat confusingly, Oracle also refers to solitary latches that have no children as parent latches. So the V$LATCH_PARENT view contains one row for each of the solitary latches, as well as one row for each of the genuine parent latches. V$LATCH_CHILDREN has a row for each child latch. Thus, the union of these two views represents all latches.
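To see how the solitary, parent, and child latches fit together, the two views can be combined with a query along these lines (a simple sketch against the views named above, not one of the APT scripts):

-- Count the child latches (if any) behind each latch name.
-- Solitary latches appear in V$LATCH_PARENT with no children.
select p.name,
       count(c.child#) as children
from   v$latch_parent p,
       v$latch_children c
where  c.latch# (+) = p.latch#
group  by p.name
order  by children desc;

A count of zero indicates a solitary latch; a positive count indicates a genuine parent latch and the size of its set of child latches.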
The types of latches used by Oracle, and whether they are solitary latches or parent and child sets, vary with different releases of Oracle and operating system ports. The APT script latch_types.sql can be used to see what latch types are in use in your database, whether they are parent and child sets, and if so, how many child latches there are. Example 3.1 shows an extract of the output of this script.
Example 3.1. Sample Output from latch_types.sql
SQL> @latch_types
LATCH# LATCH TYPE
     0 latch wait list
     1 process allocation
     2 session allocation
     3 session switching
     4 session idle bit
...
APT Scripts and X$ Tables
A number of the APT scripts referred to in this book, like latch_types.sql, are based directly on the X$ tables, rather than the V$ views. This is often necessary because the V$ views do not contain the required information, or because querying the V$ views would impose an unsatisfactory load on the instance.
Because the X$ tables are only visible to the SYS schema, and because it would be bad practice to do anything as SYS unnecessarily, APT requires that you create a set of views that expose the X$ tables to other DBA schemata. This can be done with the create_xviews.sql script, which of course must be run as SYS. Unless these views exist, all APT scripts that are dependent on the X$ tables will fail.
Note that the X$ tables change from release to release, and so these APT scripts are often release specific. Make sure that you use the right scripts for your release of Oracle.
The V$LATCH view contains summary latch statistics grouped by latch type. V$LATCH should be your first point of reference when investigating a suspected latching problem. If the problem relates to a set of latches of the same type, you should consult V$LATCH_CHILDREN to investigate whether the distribution of activity across the child latches is even, and possibly V$LATCH_PARENT also to determine whether there has been any activity against the parent latch.
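For example, to check whether activity is evenly distributed across the child latches of a particular type, a query of the following form can be used (a sketch only; the latch name shown is illustrative):

-- Distribution of activity across the cache buffers chains child latches.
-- A heavy skew towards a few children suggests a few very hot hash chains.
select child#, gets, misses, sleeps
from   v$latch_children
where  name = 'cache buffers chains'
order  by gets desc;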
3.3 Latch Gets
When an Oracle process needs to access a data structure protected by a latch, it can request to get the latch in one of two modes—willing-to-wait mode or no-wait mode (also called immediate mode).
3.3.1 Willing-to-Wait Mode
Oracle expects latches to be held briefly and intermittently. So if a process attempts to get a latch in willing-to-wait mode and finds that the latch is not available, it will spin briefly and then try again. When a process spins, it executes a simple series of instructions a number of times, as a way of waiting before trying again. This is sometimes called an active wait because from the operating system's perspective, the process is still actively consuming CPU cycles, although it is really just waiting a while.
The amount of CPU time that a process will burn before trying to get the latch once again is very small and fixed (although it was tunable in Oracle7 using the _LATCH_SPIN_COUNT parameter). If the next attempt to get the latch fails again, the procedure will be repeated up to the number of times specified by the _SPIN_COUNT parameter. This parameter normally defaults to 2000 iterations in multi-processor environments.
3.3.1.1 Why spin?
The idea of spinning is that another process executing on another CPU may release the latch, thereby allowing the spinning process to proceed. Of course, it makes no sense to spin on a machine with just one CPU, and so Oracle does not.
The alternative to spinning is to relinquish the CPU and allow another process to use it. At first glance, this may seem like a good idea. However, for a CPU to stop executing one process and begin executing another, it must perform a context switch . That is, it must save the context of the first process, determine which process to schedule next, and then resume the context of the next process. The context of a process is essentially a set of CPU register values that describes the exact state of the process.
The implementation of context switches is highly machine dependent. In fact, it is typically written in assembly language. System vendors make every effort to minimize the size of the context data and optimize context switching by using tricks such as remapping memory addresses rather than copying data. Nevertheless, context switching remains an expensive operation because various kernel data structures have to be searched and updated. Access to these structures is protected by spinlocks, which are the equivalent of latches for the operating system. On a large and busy system, context switching normally consumes between 1% and 3% of CPU time. So if a context switch can be avoided by spinning briefly, then some CPU time can be saved, and the waiting time to obtain the latch can be minimized. For this reason, spinning briefly is normally preferable to relinquishing the CPU immediately.
3.3.1.2 Understanding the spin statistics
The latch statistics in the V$LATCH family of views record a get whenever a process acquires a latch in willing-to-wait mode. If the process fails to get the latch without spinning, a miss is recorded. If the latch is obtained after one or more spin iterations, a spin get is recorded. If the latch cannot be obtained while spinning, the process relinquishes the CPU and enters a sleep. No matter how many times the process subsequently wakes up, spins, and sleeps again, no further gets or misses will be recorded, and neither will a spin get be recorded if the latch is finally obtained while spinning. So, the number of times that a latch was obtained without spinning at all is gets - misses. I call these simple gets. The APT script latch_gets.sql shows the breakdown of gets into simple gets, spin gets, and gets that slept, called sleep gets. Example 3.2 shows some sample output.
Example 3.2. Sample Output from latch_gets.sql
SQL> @latch_gets
LATCH TYPE                SIMPLE GETS           SPIN GETS          SLEEP GETS
archive control                   228 100.00%         0   0.00%         0   0.00%
cache buffer handles            67399 100.00%         0   0.00%         0   0.00%
cache buffers chains       2948282897 100.00%     11811   0.00%     35999   0.00%
cache buffers lru chain      56863812  99.60%     44364   0.08%    182480   0.32%
dml lock allocation           2047579  99.99%        36   0.00%       199   0.01%
enqueue hash chains          14960087  99.95%      1139   0.01%      6603   0.04%
enqueues                     24759299 100.00%       165   0.00%       861   0.00%
...
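The essence of this breakdown can be reproduced with a simple query (a sketch of the calculation, not the latch_gets.sql script itself, assuming the SPIN_GETS column of your release):

-- simple gets = gets - misses        (obtained without spinning at all)
-- spin gets   = from V$LATCH         (obtained by spinning, without sleeping)
-- sleep gets  = misses - spin gets   (required one or more sleeps)
select name,
       gets - misses      as simple_gets,
       spin_gets,
       misses - spin_gets as sleep_gets
from   v$latch
where  gets > 0
order  by gets desc;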
Perhaps more interestingly, the APT script latch_spins.sql shows the effectiveness of spinning for each latch type, as illustrated in Example 3.3.
Example 3.3. Sample Output from latch_spins.sql
SQL> @latch_spins
LATCH TYPE                 SPIN GETS  SLEEP GETS  SPIN HIT RATE
cache buffers lru chain        44752      182595         19.68%
redo allocation                29218       66781         30.44%
library cache                  18997       43535         30.38%
cache buffers chains           11812       36001         24.70%
redo copy                        606       18245          3.21%
messages                        3968        8315         32.30%
enqueue hash chains             1139        6603         14.71%
system commit number            2312        5548         29.41%
undo global data                 252        1327         15.96%
session idle bit                 256        1198         17.61%
enqueues                         165         861         16.08%
transaction allocation            80         535         13.01%
list of block allocation          47         353         11.75%
shared pool                      272         295         47.97%
dml lock allocation               36         199         15.32%
global tx hash mapping            36         184         16.36%
latch wait list                   27          95         22.13%
session allocation                13          78         14.29%
row cache objects                 89          76         53.94%

ALL LATCHES                   114080      372833         23.43%
3.3.1.3 Tuning the spin count
Clearly, increasing the _SPIN_COUNT parameter has the potential to improve the effectiveness of spinning, at the cost of using more CPU time on unsuccessful spins. Alternatively, if many spins are unsuccessful, reducing the spin count will reduce the amount of CPU time spent spinning. In general, tuning the spin count is a matter of balancing the CPU time used spinning against the CPU time and elapsed time saved by avoiding context switches. A workable rule of thumb is to attempt to minimize the value of the following:
_SPIN_COUNT * sleeps / misses
which serves as an approximation of the cost of spinning. If in doubt, err in favor of a higher spin count rather than a lower one. In database instances with mild latching problems, it may be beneficial to increase the _SPIN_COUNT parameter significantly from its default value. This is particularly true if the number of active processes is of the same order of magnitude as the number of CPUs. In instances experiencing severe latch contention the optimum spin count is normally much less than the default, but more than one.
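As a rough guide, this spin cost can be computed per latch type with a query like the following (a sketch; substitute your actual _SPIN_COUNT value for the 2000 shown, since the parameter value is not visible in the V$ views):

-- spin cost = _SPIN_COUNT * sleeps / misses
select name,
       round(2000 * sleeps / misses) as spin_cost
from   v$latch
where  misses > 0
order  by spin_cost desc;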
The APT script tune_spin_count.sql can be used to try out alternate values for the _SPIN_COUNT parameter. It notes the spin statistics, then uses the ALTER SYSTEM SET "_SPIN_COUNT" command to change the spin count. After waiting for the specified period, it checks the spin statistics again and computes the effect of the new spin count over the interval. A sample dialog from this script is shown in Example 3.4. Be warned that no allowance is made for variations in load, so some variability in results is to be expected. Note also that trying a very high value for _SPIN_COUNT could upset your users!
Example 3.4. Sample Dialog from tune_spin_count.sql
SQL> @tune_spin_count
SPIN_COUNT
2000
SPIN HIT RATE SPIN COST
93.53% 6
Enter new _spin_count value to try: 4000
Enter time to wait (in seconds): 900
SPIN HIT RATE SPIN COST
96.27% 4
SQL>
Of course, tuning the spin count should be the very last thing you do in response to latch contention. You should first identify which latches are subject to contention, and then attempt to understand why. You should then make every possible effort to prevent the contention. Only when you have completely run out of ideas should you attempt to minimize the effect of the contention by tuning the spin count.
3.3.2 Sleeps
If a willing-to-wait request fails, then before the process goes to sleep, it must arrange for itself to be woken up again. As described in Chapter 2, there are two mechanisms for a process that is sleeping to be woken up again. The normal mechanism for latch sleeps is a simple timeout. A process sleeping for a latch waits on its semaphore, but before it does so, it sets an alarm that will cause it to be signaled by the operating system at the end of a specified interval. The interval specified is variable. Initially the process will sleep for just one centisecond. If after waking up, the process again fails to obtain the latch, then the length of the second and any subsequent sleeps will be doubled under what is called the exponential backoff algorithm. The maximum sleep under the exponential backoff algorithm is set by the _MAX_EXPONENTIAL_SLEEP parameter, which defaults to 2 seconds in Oracle8. However, if the process is already holding other latches, then the maximum sleep time is reduced to the value of the _MAX_SLEEP_HOLDING_LATCH parameter, which defaults to 4 centiseconds, and possibly further, in proportion with the number of other latches already being held.
Another task that the process performs before it goes to sleep is to update the session wait information visible in the V$SESSION_WAIT view to indicate that the process is waiting on a latch free wait. The wait parameters are shown in Table 3.1.
Table 3.1. Wait Parameters (latch free waits)

Parameter Description
p1 The SGA address of the latch required; corresponds to the ADDR column of the V$LATCH_PARENT and V$LATCH_CHILDREN views (but not V$LATCH itself)
p2 The type of the latch; corresponds to the LATCH# column of the V$LATCH family of views
p3 The number of times that the process has slept during this attempt to acquire the latch
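These parameters make it straightforward to see which latch a sleeping session is waiting for (a sketch using only the views described in this chapter):

-- p2 of a 'latch free' wait is the LATCH# of the latch required.
select w.sid, l.name, w.p3 as sleeps_so_far
from   v$session_wait w,
       v$latch l
where  w.event = 'latch free'
and    w.p2 = l.latch#;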
When the process wakes up again, it will update the session wait information to indicate that the wait is over, and if timed statistics are enabled, it will record the time for which it slept. The cumulative statistics for latch free waits that are visible in the V$SESSION_EVENT and V$SYSTEM_EVENT views are also updated at this time. Note that consecutive sleeps during a single attempt to acquire a latch are recorded as separate waits. However, the latching statistics visible in the V$LATCH family of views are only updated once the latch has been acquired.
If a process fails to obtain a latch in either willing-to-wait or no-wait mode, then it updates the latch miss statistics which are visible in the V$LATCH_MISSES view. This update is not protected by a latch, and so these statistics may not tally with those in V$LATCH. Each row in V$LATCH_MISSES represents a location in the Oracle server code from which a latch may be held. The NWFAIL_COUNT and SLEEP_COUNT columns record the number of no-wait get failures and sleeps, respectively, that occurred while the latch was being held from that particular location in the code. Unfortunately, considerable familiarity with the Oracle server code is required to be able to interpret the significance of these statistics. For what it's worth, the APT script latch_where.sql shows the distribution of sleeps against code locations.
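A simplified approximation of latch_where.sql can be run against V$LATCH_MISSES directly (a sketch; the script itself works against the X$ tables, and note that WHERE really is a column name here and so must be quoted):

-- Code locations responsible for the most latch sleeps.
select parent_name, "WHERE" as code_location, nwfail_count, sleep_count
from   v$latch_misses
where  sleep_count > 0
order  by sleep_count desc;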
3.3.3 Latch Wait Posting
The second mechanism whereby a process sleeping on a latch may be woken up is called latch wait posting. In this case, the next process to free the required latch will wake up the sleeping process. The waiting process must request latch wait posting before it goes to sleep. It does this by putting itself on a list of processes waiting to be posted, known as the latch wait list. When a process frees a latch, it checks the latch wait list, and if there is a process waiting for that latch, it posts the semaphore of the waiting process, which acts as a signal to the operating system to schedule the waiting process to run.
The benefit of latch wait posting is that there is a high probability of the waiting process obtaining the latch almost as soon as the latch is freed. Of course, there is also a significant cost to latch wait posting, namely maintaining the latch wait list data structure. This data structure is implemented as a set of singly linked lists through the process table in the SGA (visible as X$KSUPR.KSLLALAQ). Of course, as with any other data structure, the lists have to be protected by latches. Where latch wait posting is used extensively, the latch wait lists can become relatively long, with the result that the latch wait list latches are held longer and more frequently than otherwise. Indeed, it is not uncommon to see secondary contention on one of the latch wait list latches, when there is severe contention for some other latch for which latch wait posting is enabled.
By default, latch wait posting is enabled only for the library cache and shared pool latches. It can be disabled entirely by setting the _LATCH_WAIT_POSTING parameter to 0 (the default is 1), or it can be enabled for all latches by setting the parameter to 2. Changes to latch wait posting need to be carefully benchmarked. Disabling latch wait posting can be beneficial where contention on the library cache latch is severe, and enabling it for all latches can improve performance in cases of moderate contention for other latches. Even when enabled for all latches, latch wait posting will not always be requested for sleeps on the cache buffers chains latches.
The WAITERS_WOKEN column in the V$LATCH family of views shows the number of times that a waiter has been woken via the latch wait posting mechanism. This statistic can actually be greater than the number of misses, because it is possible for a process to be posted and yet fail to obtain the latch because some other process has taken it in the interim.
3.3.4 Latch Contention
We have already observed that Oracle expects latches to be held only briefly and intermittently. If the use of any latch is either not brief, or not intermittent, then contention for that latch is likely. An episode of latch contention begins when the latch is being held by one process and is required by two or more other processes. Until the backlog of demand is cleared, waiting processes must contend for the latch. This results in CPU time being ineffectively used, and in the extreme can have a disastrous effect on performance.
The severity of contention for a particular latch may be characterized in terms of the frequency, duration, and intensity of latch contention episodes. This can be assessed using the histogram of sleep counts contained in the SLEEP1 to SLEEP4 columns of V$LATCH. Note that no statistics are kept for sleep cycles longer than four iterations—the columns SLEEP5 to SLEEP11 are retained for compatibility with releases of Oracle prior to 7.3.
The histogram of sleep counts can also be used to determine the effectiveness (or otherwise) of attempts to reduce contention for the latch. However, the ratio of sleeps to gets serves as a better indicator of the effectiveness of latch tuning, because it accounts for simple gets as well as misses. I call this ratio, expressed as a percentage, the sleep rate. The sleep rate is calculated by the APT script latch_sleeps.sql . See Example 3.5 for sample output.
Example 3.5. Sample Output from latch_sleeps.sql
SQL> @latch_sleeps
LATCH TYPE                  IMPACT  SLEEP RATE  WAITS HOLDING  LEVEL
library cache                11224       0.03%            256      5
cache buffers chains          1295       0.00%              0      1
redo allocation                713       0.01%           9613      7
system commit number           373       0.00%             66      8
enqueue hash chains            221       0.00%              3      4
redo copy                      210      22.30%              0      6
shared pool                    166       0.01%           1434      7
cache buffers lru chain        146       0.01%            336      3
messages                       135       0.01%              0      8
session allocation             113       0.02%              0      5
row cache objects               86       0.00%              0      4
enqueues                        75       0.00%            624      5
latch wait list                 48       0.08%              1      9
session idle bit                47       0.00%              0      1
undo global data                14       0.00%              0      5
multiblock read objects         13       0.00%              8      3
sequence cache                  11       0.00%              0      8
dml lock allocation             10       0.00%              0      3
transaction allocation          10       0.00%              0      8
list of block allocation         4       0.00%              0      3
modify parameter values          2       0.03%              0      0
process allocation               1       0.02%              0      0
Note that there is an important difference between the sleep rate and the impact of a particular type of latch on overall performance. For example, in Example 3.5 the sleep rate for the redo copy latches is high (as is normal). However, because there are very few willing-to-wait gets on these latches, the impact of these sleeps is not the highest. The impact shown is based on the number of sleeps. However, not all sleeps are equal because of the exponential backoff algorithm. So the number of sleeps per sleep get is used as an indicator of the average length of sleeps against each latch, and this is multiplied by the number of sleeps to estimate the impact.
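Expressed as a query, the sleep rate and impact calculations look roughly like this (a sketch of the arithmetic just described, not the latch_sleeps.sql script itself):

-- sleep rate = 100 * sleeps / gets
-- impact     = sleeps * (sleeps / sleep gets),
--              where sleep gets = misses - spin gets
select name,
       round(100 * sleeps / gets, 2)                 as sleep_rate,
       round(sleeps * sleeps / (misses - spin_gets)) as impact
from   v$latch
where  gets > 0
and    misses - spin_gets > 0
order  by impact desc;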
3.3.5 Latch Levels
It is very common for an Oracle process to need to hold a number of latches concurrently. Therefore, there might be a possibility of latching deadlocks occurring—namely, one process holding latch A and another process holding latch B, and both processes spinning and waiting for the alternate latch. Oracle ensures that this cannot happen by ensuring that latches are always taken in a defined order, when more than one latch is required. To support this, every latch in Oracle has a level between 0 and 15, and a 2-byte bitmap is maintained for every process representing the levels of the latches that the process is currently holding. When a process attempts to get a latch in willing-to-wait mode, a check is made to ensure that it is not already holding a latch at the same level or at a higher level. In general, if this rule is broken, an ORA-600 [504] internal error is raised.[2]
[2] However, this latch level rule is sometimes relaxed to allow two library cache child latches to be held simultaneously.
Contention for a high-level latch such as the redo allocation latch (level 6) can easily exacerbate contention for lower-level latches such as the cache buffers chains latches (level 1 in Oracle 8.1). This happens because processes needing the higher-level latch have to sleep while holding a lower-level latch. So the lower-level latches are held for much longer than normal. An indication of this factor is available in the WAITS_HOLDING_LATCH column of the V$LATCH family of views. That statistic represents the number of times that a process waited while holding this latch. Those waits include, but are not limited to, waits for a higher-level latch. For example, the waits holding latch statistic for the cache buffers chains latches could include sleeps while trying to acquire the redo allocation latch. However, it could also include other waits such as log buffer space waits. If waits holding a latch appear to be a significant factor in contention for that latch, those waits should be addressed first. For that reason, it is generally wise to address latch contention issues in descending order of latch level, rather than merely in descending order of apparent impact, particularly if there are waits while holding a low-level latch.
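The latch levels themselves, together with the waits-holding statistic, are visible in V$LATCH (a sketch; LEVEL# is the level of the latch):

-- High waits_holding_latch on a low-level latch suggests that sleeps
-- for higher-level latches are stretching its hold times.
select level#, name, waits_holding_latch
from   v$latch
where  waits_holding_latch > 0
order  by level# desc, waits_holding_latch desc;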
3.3.6 No-Wait Mode
No-wait mode is used when Oracle is already holding one latch and needs to acquire another latch at the same level or at a lower level. A willing-to-wait request cannot be used in this case because of the deadlock prevention requirement. In this case, Oracle can request the latch in no-wait mode, as long as no more than one pair of latches would be held at the same level. If the no-wait request succeeds, there is no risk of deadlock and so all is well. However, if the request fails, there would be a risk of deadlock were the process to persist in its attempt to acquire the latch. Instead, the process releases all the higher-level latches that it holds, yields the CPU, and then immediately attempts to acquire them again in the correct order of level.
The redo copy latches are a slightly special case. No-wait mode is used for most gets against these latches, because Oracle can use any one of them to protect the copy into the log buffer. If the request for one copy latch fails, Oracle can perform the copy on another latch instead. Willing-to-wait mode is only used to get the last copy latch if no-wait gets against all the other copy latches have failed. This is normally a symptom of waits while holding the copy latches, such as contention for a higher-level latch, and so increasing the number of copy latches with the _LOG_SIMULTANEOUS_COPIES parameter does not normally help.
Other than the redo copy latches, there are only a few types of latches that Oracle sometimes attempts to get in no-wait mode. For all other types of latches, the IMMEDIATE_GETS and IMMEDIATE_MISSES columns in the V$LATCH family of views are always zero.
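A quick way to see which latch types are actually used in no-wait mode in your instance is a query like this (a sketch):

-- Latch types with any no-wait (immediate) mode activity.
select name, immediate_gets, immediate_misses
from   v$latch
where  immediate_gets > 0
order  by immediate_misses desc;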
From a performance point of view, immediate misses are not necessarily a problem. If the relinquished latches are reclaimed cheaply after the willing-to-wait get is satisfied, then the cost of the immediate miss is not inordinate. However, if there is a degree of contention for those other latches, then immediate misses exacerbate the problem by increasing the workload on those latches. Therefore, when tuning any latch you should attempt to eliminate immediate misses as well as sleeps. However, don't lose too much sleep over immediate misses unless you are sleeping too much on higher-level latches.
3.3.7 Latch Cleanups
It is a fact of life that Oracle processes sometimes die unexpectedly, and can die when holding a latch. It is the task of the Oracle PMON process to detect the unexpected death of user processes and perform cleanup actions. Among the cleanup actions that PMON performs first is latch cleanup. Latch cleanup is completed for all newly deceased processes, before any work is begun to roll back uncommitted transactions.
Latch cleanup is not merely a matter of freeing the latch. Latches are taken to manipulate data structures, and if a process dies holding a latch, there is every chance that the data structure protected by the latch may have been left in an inconsistent state. To support latch recovery, processes holding a latch in order to manipulate a structure write a record of their intended operation into the latch recovery area for that latch, prior to performing the operation. PMON's task is not just to free the latch, but first to recover the protected data structure. A latch is said to be in flux if latch recovery is necessary or in progress.
However, because PMON normally wakes up only every 3 seconds, Oracle has another way of initiating latch cleanup. If a process has repeatedly failed to acquire a latch, it will perform a latch activity test to check whether latch cleanup may be necessary. If there is no activity on the latch for 5 centiseconds, the process will post PMON, and PMON will check whether the process holding the latch has died and needs to be cleaned up.
When a process is performing a latch activity test, or waiting for PMON to check the process holding the latch, the V$SESSION_WAIT view shows that the process is waiting on a latch activity wait . The wait parameters are as shown in Table 3.2.
Table 3.2. Wait Parameters (latch activity waits)
Parameter Description
p1 The SGA address of the latch required.
p2 The type of the latch.
p3 0 for the latch activity test. Otherwise, the process number of the possibly deceased latch holder being checked by PMON.
If latch contention is accompanied by numerous latch activity waits, the cause of both symptoms could be an operating system scheduling problem that is preventing the latch holder from releasing the latch quickly enough.
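Sessions currently in a latch activity wait can be identified with a query such as the following (a sketch; the p3 interpretation follows Table 3.2):

-- A non-zero p3 is the process number of the possibly deceased
-- latch holder being checked by PMON.
select sid, p1 as latch_addr, p2 as latch#, p3
from   v$session_wait
where  event = 'latch activity';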
3.3.8 DLM Latches
Instance locks are used for inter-instance locking and communication between the instances of an Oracle parallel server database. A separate part of the SGA contains the structures needed for instance locks. A set of latches is used to protect these structures. In release 8.0, the latching statistics for these latches were reported separately in V$DLM_LATCH . From release 8.1, the Distributed Lock Manager (DLM) latching statistics have been merged into V$LATCH.
LMON performs latch cleanup for DLM latches in cooperation with PMON.
3.4 Advanced Latching Control
Some operating systems support a facility called multi-processing control. This enables an authorized user process to influence its CPU scheduling in a variety of ways. Where available, Oracle can use certain multi-processing control features. The following features affect the latching mechanism.
3.4.1 Preemption Control
Preemption control enables Oracle to suspend the operation of the normal operating system process preemption mechanism during performance-critical operations—in particular, when holding a latch. This means that the Oracle process can continue to run on its CPU until it explicitly enables preemption again, or until it blocks on an operating system event such as an I/O request, semaphore operation, or page fault. The process will not be pre-empted at the end of its time-slice by a higher priority process of the time-sharing priority class. This means that operations protected by latches complete as quickly as possible, and so the risk of latch contention is greatly reduced. If preemption control is available to Oracle, it is used by default unless disabled using the _NO_PREEMPT parameter.
3.4.2 CPU Yielding
CPU yielding enables Oracle processes to offer to yield the CPU during a spin. If there is another runnable process of higher priority able to use the CPU, that process is scheduled, and the yielding process is placed at the end of its run queue, but it remains runnable. Otherwise, if there are no other higher-priority processes able to use the CPU, then the process will continue to spin for its latch. The frequency with which Oracle will offer to yield the CPU while spinning is controlled by the _SPIN_YIELD_CPU_FREQ parameter, which defaults to the default value of the _SPIN_COUNT parameter. If CPU yielding is available, and if these two parameters have the same value, the effect is that the process will begin a new spin without sleeping if there is no other process available to use the CPU. Thus, CPU yielding enables Oracle processes to obtain latches as quickly as possible without consuming otherwise usable CPU time.
3.4.3 Affinity Control
Affinity control enables Oracle processes to disable and re-enable the normal operating system affinity mechanism, which attempts to weakly bind a process to the last CPU it ran on. If a process runs on the same CPU as before, many of the memory address and value pairs (cache lines) required for its execution may still be available in that CPU's cache. This can result in greatly reduced memory access by that CPU, and thus much faster execution. However, faster execution is unnecessary when all the process is doing is spinning for a latch, and faster execution matters less than earlier execution when the process has been sleeping while holding a latch that other processes may need. Where it is available, Oracle uses affinity control to optimize latching automatically. Incidentally, explicit processor binding is not recommended for Oracle processes, because bound processes cannot be migrated to idle CPUs.
Oracle can use multi-processing control features to improve the performance of large, highly active instances significantly, and the biggest impact is in the area of latching. However, under many operating systems some or all of these features are not available, or are not available to the processes of ordinary users such as "oracle." Where these features are available, the "oracle" user must be specifically authorized to use them. In some cases, such authorizations are not persistent, and so the authorization commands must be placed in the system startup scripts to ensure that Oracle will always be able to use these features. Check your operating system documentation for an mpctl() system call and related entries to determine whether your operating system supports multi-processing control features for ordinary user processes, and if so, how to enable them.
3.5 Reference
This section contains a quick reference to the parameters, statistics, waits, and APT scripts mentioned in Chapter 3.
3.5.1 Parameters
Parameter Description
_LATCH_WAIT_POSTING Latch wait posting is a mechanism whereby a process can be woken (posted) when the latch that it requires becomes available. If this parameter is set to 0, latch wait posting is disabled. If this parameter is set to 1 (the default), latch wait posting is enabled for the library cache and shared pool latches only. Any other setting results in latch wait posting being enabled for all latches.
_MAX_EXPONENTIAL_SLEEP Consecutive sleeps during a single attempt to acquire a latch become progressively longer, under an exponential backoff algorithm, up to the limit specified by this parameter. Defaults to 200 centiseconds in Oracle8.
_MAX_SLEEP_HOLDING_LATCH The maximum sleep allowed under the exponential backoff algorithm when the sleeping process is holding another latch. Defaults to 4 centiseconds.
_NO_PREEMPT If this parameter is set to TRUE (the default) Oracle will use the operating system's preemption control mechanism, if available, to minimize the risk of processes sleeping while holding a latch.
_SPIN_COUNT The number of iterations to perform before sleeping when spinning to acquire a latch. Defaults to 1 on single CPU systems, and 2000 on multiprocessor machines.
_SPIN_YIELD_CPU_FREQ Controls the frequency with which an Oracle process will offer to yield the CPU during a spin. If a higher-priority process is runnable, it will be scheduled, and the yielding process will be placed at the end of the run queue without sleeping. Defaults to the default value of _SPIN_COUNT. If _SPIN_COUNT is tuned, this parameter should normally be tuned as well.
3.5.2 Statistics
Statistic Source Description
immediate gets V$LATCH family Successful latch get requests in no-wait mode
immediate misses V$LATCH family Latch get requests in no-wait mode that failed
gets V$LATCH family Completed willing-to-wait latch acquisitions
misses V$LATCH family Gets that waited because the latch was in use
simple gets gets - misses Gets completed without waiting at all
spin gets V$LATCH family Gets that obtained the latch by spinning, but did not sleep
sleep gets misses - spin gets Gets that required one or more sleeps
spin get rate 100 * spin gets / misses A measure of the effectiveness of spinning
spin cost _SPIN_COUNT * sleeps / misses A measure of the cost of spinning
sleeps V$LATCH family Total number of times that processes slept while waiting for the latch
sleep1 V$LATCH family Gets that slept once
sleep2 V$LATCH family Gets that slept twice
sleep3 V$LATCH family Gets that slept three times
sleep4 V$LATCH family Gets that slept four times
sleep rate 100 * sleeps / gets A measure of the severity of contention for the latch
sleep impact sleeps² / sleep gets An estimate of the relative impact of latch sleeps on overall performance
waiters woken V$LATCH family The number of times that waiters were posted due to latch wait posting
waits holding latch V$LATCH family The number of times that a process waited on any event wait while holding the latch
3.5.3 Waits

Event Description
latch activity A process that has repeatedly failed to acquire a latch will perform a latch activity test to check whether latch cleanup may be necessary. This wait occurs both during the activity test and while waiting for latch cleanup if necessary.
latch free Latch free waits are just sleeps by another name.
wait for DLM latch This wait corresponds to latch free waits, but for DLM latches.
wait for influx DLM latch The DLM latch needed latch recovery.
3.5.4 APT Scripts
Script Description
create_xviews.sql Some APT scripts are based on the X$ tables. Before those scripts can be used, this script must be run as SYS to create the required views on the X$ tables.
latch_gets.sql Shows the breakdown of willing-to-wait gets into simple gets, spin gets, and sleep gets.
latch_levels.sql Like latch_types.sql, but shows the level for each latch type.
latch_sleeps.sql Shows the sleep rate and impact for latch sleeps. Used to determine the priority of latch tuning issues.
latch_spins.sql Shows the number of spin gets and sleep gets and calculates the spin hit rate for each latch and for all latches.
latch_types.sql Shows all latch types ordered by number, whether they are solitary latches or parent/child sets, and how many children there are. For elegance and performance, this script is based directly on X$KSLLT.
latch_where.sql Shows where in the Oracle server code latch gets have been failing. This script is based directly on the X$ tables in order to access a column not projected by the V$LATCH_MISSES view.
tune_spin_count.sql Used to alter the spin count and then monitor the spin statistics over a specified interval, in order to assess the effect of the change.