Given our basic spin lock, we can now evaluate how effective it is along our previously described
axes. The most important aspect of a lock is correctness: does it provide mutual exclusion? The
answer here is yes: the spin lock only allows a single thread to enter the critical section at a time.
Thus, we have a correct lock.
The next axis is fairness. How fair is a spin lock to a waiting thread? Can you guarantee that a
waiting thread will ever enter a critical section? The answer here, unfortunately, is bad news:
spin locks don't provide any fairness guarantees. Indeed, under contention, a thread may spin
forever. Spin locks are not fair and may lead to starvation.
The final axis is performance. What are the costs of using a spin lock? To analyze this more
carefully, we suggest thinking about a few different cases. In the first, imagine threads competing
for the lock on a single processor; in the second, consider the threads as spread out across many
processors.
For spin locks, in a single CPU case, performance overheads can be quite painful; imagine the case
where the thread holding the lock is preempted within a critical section. The scheduler might then
run every other thread (imagine there are N - 1 threads), each of which tries to acquire the lock.
In this case, each of those threads will spin for the duration of a time slice before giving up the CPU,
a waste of CPU cycles.
However, on multiple CPUs, spin locks work reasonably well (if the number of threads roughly
equals the number of CPUs). The thinking goes as follows: imagine Thread A on CPU 1 and Thread
B on CPU 2, both contending for a lock. If Thread A (CPU 1) grabs the lock, and then Thread B
tries to do so, B will spin (on CPU 2). However, presumably the critical section is short, and thus soon
the lock becomes available, and is acquired by Thread B. Spinning to wait for a lock held on
another processor doesn't waste many cycles in this case, and thus can be effective.
Compare-And-Swap
Another hardware primitive that some systems provide is known as the compare-and-swap
instruction (as it is called on SPARC, for example), or compare-and-exchange (as it is called on x86).
The C pseudocode for this single instruction is found in Figure 28.4.
int CompareAndSwap(int* ptr, int expected, int new) {
    int actual = *ptr;         // read the current value at ptr
    if (actual == expected) {  // update only if it holds the expected value
        *ptr = new;
    }
    return actual;             // return the value that was actually there
}
The basic idea is for compare-and-swap to test whether the value at the address specified by ptr
is equal to expected; if so, update the memory location pointed to by ptr with the new value. If
not, do nothing. In either case, return the actual value at that memory location, thus allowing the
code calling compare-and-swap to know whether it succeeded or not.
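As a small illustration (not from the text, and with a hypothetical flag variable), a caller detects
success by comparing the returned value against the value it expected to find:

#include <stdio.h>

// Figure 28.4's CompareAndSwap in plain C (not actually atomic; shown
// here only to demonstrate the return-value convention).
int CompareAndSwap(int* ptr, int expected, int new) {
    int actual = *ptr;
    if (actual == expected)
        *ptr = new;
    return actual;
}

int main(void) {
    int flag = 0;
    int actual = CompareAndSwap(&flag, 0, 1);
    if (actual == 0)
        printf("succeeded: flag changed from 0 to %d\n", flag);  // we made the change
    else
        printf("failed: flag already held %d\n", actual);        // someone got there first
    return 0;
}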
With the compare-and-swap instruction, we can build a lock in a manner quite similar to that
with test-and-set. For example, we would just replace the lock() routine above with the following:
void lock(lock_t* lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ;  // spin-wait: the flag was 1 (lock held), so try again
}
The rest of the code is the same as the test-and-set example above. This code works quite similarly;
it simply checks if the flag is 0 and if so, atomically swaps in a 1 thus acquiring the lock. Threads
that try to acquire the lock while it is held will get stuck spinning until the lock is finally released.
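For completeness, here is a minimal sketch of those remaining pieces, assuming (as in the
test-and-set example) that lock_t is a struct with a single flag field, where 0 means the lock
is available and 1 means it is held:

typedef struct __lock_t {
    int flag;        // 0: lock is available, 1: lock is held
} lock_t;

void init(lock_t* lock) {
    lock->flag = 0;  // the lock starts out available
}

void unlock(lock_t* lock) {
    lock->flag = 0;  // release: simply make the lock available again
}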
If you want to see how to really make a C-callable x86-version of compare-and-swap, this code
sequence might be useful:
char CompareAndSwap(int* ptr, int old, int new) {
    unsigned char ret;
    __asm__ __volatile__ (
        "  lock\n"              // lock prefix makes the cmpxchg atomic
        "  cmpxchgl %2, %1\n"   // if %eax (old) == *ptr, store new into *ptr
        "  sete %0\n"           // ret = 1 if the exchange happened, 0 otherwise
        : "=q" (ret), "=m" (*ptr)
        : "r" (new), "m" (*ptr), "a" (old)
        : "memory");
    return ret;
}
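If you would rather avoid inline assembly, a roughly equivalent compare-and-swap can be written
with C11 atomics. The sketch below is not from the text (the name CompareAndSwapC11 is just for
illustration); it assumes the target is declared as an atomic_int and mirrors the
return-the-actual-value semantics of Figure 28.4:

#include <stdatomic.h>

// Portable variant: returns the value actually observed at *ptr, so the
// caller can tell whether the swap took place (just like Figure 28.4).
int CompareAndSwapC11(atomic_int* ptr, int expected, int new_val) {
    // On success, *ptr becomes new_val and expected is left unchanged;
    // on failure, the actual value of *ptr is copied into expected.
    atomic_compare_exchange_strong(ptr, &expected, new_val);
    return expected;
}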
Finally, as you may have sensed, compare-and-swap is a more powerful instruction than test-and-
set. We will make some use of this power in the future when we briefly delve into wait-free
synchronization. However, if we just build a simple spin lock with it, its behavior is identical to the
spin lock we analyzed above.