Synchronization under the hood, and why Java 5 improves it

On the previous page, we saw that the synchronized keyword provides some benefits, including simple implementation for the programmer without getting bogged down in how the JVM actually implements synchronization. But we saw that this "black box" approach has some potential disadvantages. To understand when and how Java 5 improves this situation, we need to delve "under the hood" of synchronization for a minute.

Let's consider the following program, which maintains a thread-safe counter:

public class Counter {
  private int count;
  public synchronized int getCount()        { return count; }
  public synchronized void incrementCount() { count++; }
}
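
To see the counter doing its job, here is a minimal sketch of a test harness. The names SyncCounter, SyncCounterDemo and run() are invented for this illustration; the counter itself mirrors the class above. Several threads increment the same counter, and because both methods are synchronized, no increment should be lost.

```java
// Same shape as the Counter class in the text (renamed for this sketch).
class SyncCounter {
  private int count;
  public synchronized int getCount()        { return count; }
  public synchronized void incrementCount() { count++; }
}

// Hypothetical harness: not part of the original article.
class SyncCounterDemo {
  // Starts `threads` threads that each call incrementCount() `perThread`
  // times, waits for them all to finish, and returns the final count.
  static int run(int threads, int perThread) {
    final SyncCounter counter = new SyncCounter();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          counter.incrementCount();
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return counter.getCount();
  }

  public static void main(String[] args) {
    System.out.println(run(4, 10000)); // prints 40000
  }
}
```

Without the synchronized keyword, count++ would be a non-atomic read-modify-write and the final total could come out lower.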

Now, let's consider what a JVM implementation might have to do under the hood when we enter one of the synchronized methods such as incrementCount(). Recall that entering and exiting a synchronized block actually means (among other things) acquiring and releasing a lock on the object being synchronized on (in this case, the Counter instance that the method is being called on). For every Java object, the JVM must therefore hold information at least on which thread (if any) currently owns the lock, and how many times that thread has acquired it. We'll see in a moment that in some cases it may also need to hold a pointer to an operating system lock object, plus potentially other information. The JVM needs to:

  • check the variable telling it which thread owns the lock on the object;
  • if no thread owns the lock, mark the variable as being the current thread's ID, and set the lock count to one;
  • if the current thread owns the lock, increase the lock count;
  • if a different thread owns the lock, we can't proceed for now and need to somehow wait for the lock to become available. Many operating systems provide a "native" means to wait for other threads via low-level lock objects, so a possible implementation is to create/wait for one of these.

All of the above actions must occur atomically. It's no good if we read the thread owner and see that no thread owns the lock, but before we have a chance to set our thread ID and update the lock count, another thread intervenes, reads the zero thread owner, and also thinks it can take the lock. So at a very low level, we need to synchronize on this "lock housekeeping" data.

On the next page, we look at this low-level means of synchronization.

(Continued from our discussion of Java synchronization under the hood.)

To provide low-level synchronization, most modern CPUs offer an instruction (or various instructions) which read and write to the same memory location as a single, uninterruptible operation. In other words, you first read what the value of the memory location is. Then, you calculate what you want the updated value to be. This is the moment at which another process could "sneak in". Then, you invoke the atomic read-write operation, which sets the value to what you want it to be and tells you what the value was at the instant before the new one was written. If this previous value isn't what you expect it to be (i.e. not equal to the value you read a moment ago), then you know another process has "stepped in" and you need to repeat the operation.

A generic term used for the atomic read-write operation is Compare-And-Swap (CAS) or, in the case of a variant we'll assume here, Compare-And-Set (also CAS). In the Compare-And-Set variant, we tell the processor what we think the value of the given memory location should be, and the new value to write if and only if the previous value was indeed what we said we expected it to be. The result of the instruction is a boolean indicating whether or not the processor could indeed write our new value (i.e. whether our expected value was indeed the value prior to writing).

To illustrate this, let's assume that our block of "lock housekeeping" data consists of three 32-bit words in memory and looks something like this [1]:

WORD 0 : Owning thread ID
WORD 1 : Lock count
WORD 2 : Operating system lock object ID

Now, our logic for accessing the lock data can go something like this:

while (not done) {
  read previous thread ID;
  if (previous thread ID == my thread ID) {
    // We already own the lock, so just record the re-entry
    lock count++;
    done = true;
  } else if (previous thread ID == 0) {
    // No previous thread owner, so atomically set us as the owner
    done = CAS(WORD-0, 0, my thread ID);
    if (done) {
      lock count = 1;
    }
  } else {
    // Another thread already has the lock; somehow wait for it to be released
  }
}

The important line is the call to CAS. What we essentially do is: (a) read the previous thread owner ID from Word 0 and check that it is zero, meaning no other thread currently has the lock; (b) use the CAS instruction to say "write my thread ID to Word 0 if and only if the previous value was zero, and tell me if it was written"; (c) if the CAS instruction tells us that it did go ahead and write it, because the previous value was still zero, then we have the lock; (d) if not, that means another thread "snuck in" and set its thread ID, and we have to loop round again to read what the new value of thread ID is. If the previous thread ID is not zero but is not our thread ID, then somebody else already has the lock and we have to wait for it to release it.
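
The same logic can be sketched in runnable Java using the AtomicReference class from java.util.concurrent.atomic. This is purely an illustration and not how any JVM implements synchronized: the class names SpinReentrantLock and SpinLockDemo are invented here, a Thread reference stands in for the thread ID in Word 0 (with null meaning "no owner"), and "somehow wait" is reduced to calling Thread.yield() and trying again.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: a re-entrant lock that spins instead of waiting.
class SpinReentrantLock {
  // Plays the role of WORD 0: null means "no thread owns the lock".
  private final AtomicReference<Thread> owner = new AtomicReference<>();
  // Plays the role of WORD 1: only ever touched by the owning thread.
  private int lockCount;

  void lock() {
    Thread me = Thread.currentThread();
    if (owner.get() == me) {
      lockCount++;             // we already own the lock: record the re-entry
      return;
    }
    // Atomically claim ownership if and only if there is no current owner.
    while (!owner.compareAndSet(null, me)) {
      Thread.yield();          // crude stand-in for "somehow wait"
    }
    lockCount = 1;
  }

  void unlock() {
    if (owner.get() != Thread.currentThread()) {
      throw new IllegalMonitorStateException("not the lock owner");
    }
    if (--lockCount == 0) {
      owner.set(null);         // release the lock for other threads
    }
  }
}

class SpinLockDemo {
  // Increments a plain int from several threads, guarded by the lock above.
  static int run(int threads, int perThread) {
    final SpinReentrantLock lock = new SpinReentrantLock();
    final int[] shared = {0};  // guarded by lock
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          lock.lock();
          try {
            shared[0]++;
          } finally {
            lock.unlock();
          }
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return shared[0];
  }
}
```

Note that the lock count is only written after the compare-and-set has succeeded, so a thread that loses the race never touches the winner's count.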

To wait for the lock to be released, one option is to sleep for a while and then try again. Often, we can actually ask the operating system to help us with this task. On many systems, we can create an operating system monitor object and ask the OS to tell us when another thread (which in our case would be the thread that released the synchronization lock at the end of its synchronized block) sends a "notify" message to it. We'd only want to create at most one OS monitor object per Java object, even though multiple threads might be trying to wait for the lock. And to keep resources down, we'd probably only want to create the OS monitor object when it was actually needed. Another CAS operation can be used for this, so that a simple implementation of "wait for the lock" could go something like this:

while (not have_os_monitor) {
  os_monitor_object = WORD 2;
  if (os_monitor_object == NULL) {
    os_monitor_object = new OSMonitor();
    have_os_monitor = CAS(WORD-2, NULL, os_monitor_object);
    if (!have_os_monitor) {
      // If CAS fails, somebody else has snuck in and created the monitor,
      // so delete the one we've just created.
      delete os_monitor_object;
    }
  } else {
    // A monitor already exists (created by us or by another thread)
    have_os_monitor = true;
  }
}
ask OS to wait for a "notify" message to os_monitor_object;

Again, the CAS operation sits inside a loop. In the unlikely event that the CAS fails because another thread has snuck in, it returns false and we go round the loop again and pick up the reference to the OSMonitor object created by the other thread. If this happened, there'd be a slight inefficiency because our thread would create an OSMonitor which it would then immediately discard. But we live with this inefficiency because we think it's unlikely to occur, and because the important condition of keeping to a maximum of one OSMonitor object per Java object would still hold. The code to deal with leaving the synchronized block would, as well as setting the owning thread ID to zero when the lock count reached zero, have to check for the presence of an "OS monitor object" and, if one existed, ask the OS to send a "notify" message to any waiter.
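
The "create at most one, discard ours if we lose the race" pattern can be sketched in Java as follows. The names LazyMonitorHolder, getOrCreateMonitor() and distinctMonitorsSeen() are invented for this illustration, and a plain Object stands in for the OS monitor object; no real operating system facility is involved.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of lazily creating a single shared object via CAS.
class LazyMonitorHolder {
  // Plays the role of WORD 2: null until some thread first needs to wait.
  private final AtomicReference<Object> osMonitor = new AtomicReference<>();

  // Returns the one shared monitor, creating it on first use.
  Object getOrCreateMonitor() {
    while (true) {
      Object existing = osMonitor.get();
      if (existing != null) {
        return existing;                 // somebody has already created it
      }
      Object candidate = new Object();   // stand-in for "new OSMonitor()"
      if (osMonitor.compareAndSet(null, candidate)) {
        return candidate;                // we won the race
      }
      // CAS failed: another thread installed its monitor first. Our candidate
      // is simply dropped, and the next pass picks up the other thread's object.
    }
  }

  // Calls getOrCreateMonitor() from several threads on one holder and
  // returns how many distinct monitor objects they saw between them.
  static int distinctMonitorsSeen(int threads) {
    final LazyMonitorHolder holder = new LazyMonitorHolder();
    final Set<Object> seen = ConcurrentHashMap.newKeySet();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> seen.add(holder.getOrCreateMonitor()));
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return seen.size();
  }
}
```

However the threads interleave, every caller ends up with the same object, which is the "maximum of one monitor per Java object" condition described above.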

Note that in these cases, we know that going round the loop again immediately is OK because we know that the next time round we're very likely to succeed, or at least very likely not to loop again. There was probably only one other thread that snuck in, and since the value has changed, it has finished its "sneaking in". We're generally not going to sit in the loop burning CPU: if the CAS isn't successful the first time, then it either will be the second time, or else we'll have to take some other action anyway.

Contrast this with what happens if we actually do have to wait for the lock. If the JVM knew for certain that the thread that had the lock was currently running on another processor and about to release the lock, then the same "loop round and try again" strategy would probably be the most efficient. But it can't generally make this kind of assumption, or at least not without wasting time deciding [2]. So if we just spin in a loop waiting for the lock, there is a risk that we'll burn quite a lot of CPU while waiting for the other thread to release the synchronization lock.

This is why the most general case is to use the operating system facilities to wait for a "notify". That way, we don't burn CPU. But on the downside, if the operation being performed by the other thread is trivial, chances are we'll wait much longer than strictly necessary. Depending on how the operating system's thread scheduling works, it is likely to mean suspending our thread until at least the next interrupt (and in fact, even if the thread doesn't need to be suspended, the thread can still be penalised) [3]. So even though the Java code inside the competing synchronized block is simply reading or updating a single integer variable, which could involve nothing more than a couple of machine code instructions, our thread could end up waiting several milliseconds for it.

And all this is without considering that, whether we have to wait or not, we have to synchronize local copies of variables with main memory at the point of acquiring and releasing the lock. In particular, we have no way of saying to the JVM "I'm only going to change the variable count, so this is the only one you need to refresh to/from your caches, and you can still re-order access to other variables for the sake of optimisation".
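
The trade-off between spinning and suspending the thread can be sketched in Java along the following lines. This is a deliberately simplified illustration with invented names (SpinThenParkLock, SpinThenParkDemo) and arbitrary constants. It is not re-entrant, and instead of a true "notify"-based wait it simply gives up the CPU for a short fixed period with LockSupport.parkNanos() before retrying, which corresponds to the "sleep for a while and then try again" option.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a non-reentrant lock that spins briefly, then backs off.
class SpinThenParkLock {
  private static final int MAX_SPINS = 3;          // arbitrary choice
  private static final long PARK_NANOS = 50_000L;  // arbitrary back-off period

  private final AtomicBoolean locked = new AtomicBoolean(false);

  void lock() {
    int spins = 0;
    while (!locked.compareAndSet(false, true)) {
      if (spins < MAX_SPINS) {
        spins++;                            // cheap: retry straight away
      } else {
        LockSupport.parkNanos(PARK_NANOS);  // costly: give up the CPU for a while
      }
    }
  }

  void unlock() {
    locked.set(false);
  }
}

class SpinThenParkDemo {
  // Increments a plain int from several threads, guarded by the lock above.
  static int run(int threads, int perThread) {
    final SpinThenParkLock lock = new SpinThenParkLock();
    final int[] shared = {0};  // guarded by lock
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          lock.lock();
          try {
            shared[0]++;
          } finally {
            lock.unlock();
          }
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return shared[0];
  }
}
```

A thread that parks here always pays for the whole back-off period, even if the lock is released a moment later. That is the same kind of overhead, on a smaller scale, as described above for waiting via the operating system.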

On the next page, we see how Java 5 improves this situation by exposing CAS operations to the Java programmer.

1. Note that this is a purely hypothetical structure and probably different from the way any specific JVM implements the lock housekeeping data. For example, HotSpot actually combines flags for locking and flags for garbage collection into the data accessed via CAS and, rather than using CAS to swap a thread ID, actually swaps the pointer to lock data. Our general description of how synchronization works still generally holds, however.
2. I'm actually painting a slightly pessimistic picture here. In some cases, a JVM can use certain heuristics to make quick decisions between spinning round the loop and actually waiting (suspending the thread). For example, a lock implementation could decide to "spin up to 3 times then wait". Depending on the OS, it may be able to "spin if and only if the other thread is actually running". And it may be able to check if the other thread is waiting for I/O (a slow operation, in which case there's little point in spinning). Improvement to synchronization algorithms has been a key area of research over the last few years, and some progress has been made in fine-tuning these kinds of heuristics. Nonetheless, the point is that, if the JVM has to make a decision, it always wastes a little bit of time doing so, and always risks making an inappropriate one.
3. Thread scheduling generally works by running in a software interrupt. Every interrupt period (defined by the processor and typically around 10 or 15 milliseconds), the thread scheduling code looks at what processes are running and "re-jigs" them to share out the available CPUs over time. If a thread enters the "wait" state, it won't have an opportunity to be considered for running again until at least the next interrupt. So on such a system, calling wait means that in the worst case we'll wait for nearly 10 milliseconds, and in the average case around 5. Note too that when a thread enters the wait state, the scheduler generally has to make an approximation of how much actual CPU time the thread used during that interrupt period before waiting. A thread that uses a tiny amount of CPU and then waits will get "overcharged" for CPU time (on Windows, for example, a thread is "charged" one third of an interrupt period for calling the wait function).

Article written by Neil Coffey (@BitterCoffey).


Exposure of atomic instructions in Java 5

Having worked through this example, we can see that the synchronized block involves an awful lot of baggage just to increment an integer variable. And paradoxically, the underlying implementation actually uses a machine code instruction designed to atomically update an integer variable! Rather than using CAS instructions around a whole load of other lock housekeeping tasks, wouldn't it be great if we could just use a CAS instruction to update the count variable? That way we do away with having to waste memory and time on extra housekeeping variables, and with having to ensure memory access conditions on variables that are never affected by the critical block of code. In the worst case, our code just loops a couple of times rather than being context-switched out for several milliseconds to wait for a simple variable increment. Java 5 effectively allows this.

The big synchronization breakthrough in Java 5 is that it effectively exposes atomic instructions such as CAS to the Java programmer. The java.util.concurrent.atomic package contains the AtomicInteger class (and similar classes for other data types) allowing us to atomically compare-and-set an integer (and variants such as increment-and-get, get-and-set etc). Classes such as AtomicInteger are essentially wrappers around atomic machine code instructions such as CAS.
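
As a minimal sketch of the read / calculate / compare-and-set cycle expressed with AtomicInteger: the class CasLoopExample and the method addWithCas() are names invented for this illustration, while get() and compareAndSet() are the real AtomicInteger methods.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing an explicit CAS retry loop.
class CasLoopExample {
  // Atomically adds delta to value and returns the new value.
  static int addWithCas(AtomicInteger value, int delta) {
    while (true) {
      int current = value.get();          // 1. read the current value
      int updated = current + delta;      // 2. calculate the value we want
      if (value.compareAndSet(current, updated)) {
        return updated;                   // 3. written: nobody snuck in
      }
      // Another thread changed the value between steps 1 and 3: go round again.
    }
  }

  public static void main(String[] args) {
    AtomicInteger count = new AtomicInteger(5);
    System.out.println(addWithCas(count, 3));        // prints 8
    System.out.println(count.compareAndSet(5, 10));  // prints false: value is 8, not 5
  }
}
```

In practice there is rarely a need to write this loop by hand for simple arithmetic, since AtomicInteger already provides methods such as incrementAndGet() and addAndGet().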

For a simple counter class such as the one above, we can pretty much use AtomicInteger as a drop-in replacement. But what's even more interesting is that Java 5 includes a whole host of other synchronization and concurrency classes already built around this new atomic functionality.
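
A minimal sketch of what that replacement might look like is given below. The class names AtomicCounter and AtomicCounterDemo are invented for this illustration; the method names follow the Counter class from earlier.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free version of the Counter class from earlier.
class AtomicCounter {
  private final AtomicInteger count = new AtomicInteger();

  public int getCount()        { return count.get(); }
  public void incrementCount() { count.incrementAndGet(); }
}

class AtomicCounterDemo {
  // Starts `threads` threads that each call incrementCount() `perThread`
  // times, waits for them all to finish, and returns the final count.
  static int run(int threads, int perThread) {
    final AtomicCounter counter = new AtomicCounter();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          counter.incrementCount();
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return counter.getCount();
  }

  public static void main(String[] args) {
    System.out.println(run(4, 10000)); // prints 40000
  }
}
```

Neither method is synchronized, so no thread is ever suspended waiting for a lock; a thread that loses a race simply retries inside incrementAndGet().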

On the next page, we start our tour of Java 5 concurrency features by looking at the atomic classes in Java 5.

Tightening up of volatile and final

The introduction of non-blocking atomic variable access is a key part of the Java 5 concurrency improvements. Another improvement which is easy to miss is that the definitions of volatile and final have been tightened up slightly in order to allow them to be used with a couple of common programming idioms: lazy initialisation and immutable objects. As of Java 5:

  • Access to volatile variables has the same memory synchronization and ordering behaviour as synchronized blocks.
  • If an instance variable is declared final and its value set in the constructor, that set value is guaranteed to be seen by another thread as soon as it can see the object that holds that variable (provided that the reference to the object is not leaked to other threads before the constructor finishes). This means that we don't need synchronization to access an object for which all fields are final.
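
As a rough sketch of the two idioms, the code below combines an immutable class whose fields are all final with lazy initialisation via a volatile field. The class names ImmutablePoint and LazyPointHolder are invented for this illustration, and the "double-checked" form of lazy initialisation shown here is one common choice rather than the only one.

```java
// Hypothetical immutable class: all fields are final and set in the constructor,
// so instances can be shared between threads without synchronization.
final class ImmutablePoint {
  private final int x;
  private final int y;

  ImmutablePoint(int x, int y) {
    this.x = x;
    this.y = y;
  }

  int getX() { return x; }
  int getY() { return y; }
}

// Hypothetical lazy initialisation using a volatile field.
class LazyPointHolder {
  // volatile: a thread that reads a non-null reference here is guaranteed
  // to see the writes made before that reference was stored.
  private volatile ImmutablePoint instance;

  ImmutablePoint get() {
    ImmutablePoint local = instance;   // first check, without locking
    if (local == null) {
      synchronized (this) {
        local = instance;              // second check, with the lock held
        if (local == null) {
          local = new ImmutablePoint(3, 4);
          instance = local;
        }
      }
    }
    return local;
  }
}
```

Once the instance has been created, calls to get() only perform a volatile read and never take the lock.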

We'll look at what this means in more detail below, but first let's get back to the atomic classes.
