Recently, we looked at different ways to implement a thread-safe “lazily initializing” singleton in Java.
The simplest approach is with ‘synchronized’ — but there were several other suggestions, some right, and some not-so-right (double-checked locking).
One approach, however, was 25 times faster.
Lazy Singleton
Our basic requirement here is for a singleton within the application (perhaps some service) to be lazily initialized. Generally this is useful for services which are costly to set up and only sometimes needed.
We’re not concerned with the setup cost itself; that’s specific to the service. What we are interested in is the cost of accessing that singleton (i.e., getting it in a thread-safe way) once it has been set up.
A Variety of Approaches
This task, while fundamentally simple, is interesting for the number of different ways it can be done.
My first thought was that simple is good: use the feature provided by the language, and synchronize the method. Other people contributed a variety of different approaches, not all of them necessarily ideal or correct:
‘synchronized’ method
AtomicReference fast-path before a ‘synchronized’ section
AtomicReference with a spinlock
double-checked locking (not reliable in Java)
Using a simple ‘synchronized’ method is obviously simplest, and gives good performance.
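As a sketch, with illustrative names rather than the exact code from the discussion, the ‘synchronized’ variant looks like this:

```java
// Minimal sketch of the baseline: the whole accessor is synchronized,
// so every call pays the monitor cost, even long after initialization.
// Class and method names here are illustrative.
class MyService {
    private static MyService instance;

    private MyService() { }

    static synchronized MyService getInstance() {
        if (instance == null) {
            instance = new MyService();
        }
        return instance;
    }
}
```

The lock guarantees that only one thread can run the null check and the construction at a time, which is what makes this correct.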
An AtomicReference read is cheaper than acquiring a ‘synchronized’ monitor, so a fast path built on one can offer some performance benefit — at the cost of complexity. Spinlocks should almost certainly be avoided, though.
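One way to structure the AtomicReference fast path — a sketch with illustrative names, assuming the slow path falls back to a plain lock:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: a lock-free atomic read on the fast path, with a synchronized
// slow path taken only until the instance has been created.
class AtomicService {
    private static final AtomicReference<AtomicService> REF = new AtomicReference<>();

    private AtomicService() { }

    static AtomicService getInstance() {
        AtomicService existing = REF.get();     // fast path: one atomic read
        if (existing != null) {
            return existing;
        }
        synchronized (AtomicService.class) {    // slow path: first caller(s) only
            existing = REF.get();
            if (existing == null) {
                existing = new AtomicService();
                REF.set(existing);
            }
            return existing;
        }
    }
}
```

Once the instance exists, callers never touch the monitor again; the complexity cost is that correctness now depends on getting the two-phase check right.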
Double-checked locking with a plain field is unsafe in Java: another thread can observe the stored reference before the constructor’s writes are visible to it. JVM optimizations and code reordering are specified to respect ‘synchronized’ boundaries, but the unsynchronized first check in double-checked locking bypasses them. Since the Java 5 memory model the pattern can be repaired by declaring the field volatile, but it remains easy to get wrong. Not recommended.
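For completeness: since the Java 5 memory model, double-checked locking can be made safe by declaring the field volatile. A sketch with illustrative names:

```java
// Double-checked locking, repaired for Java 5+: the volatile keyword on
// the field is essential — without it this pattern is broken.
class VolatileService {
    private static volatile VolatileService instance;

    private VolatileService() { }

    static VolatileService getInstance() {
        VolatileService local = instance;           // first, unsynchronized check
        if (local == null) {
            synchronized (VolatileService.class) {
                local = instance;                   // second check, under the lock
                if (local == null) {
                    instance = local = new VolatileService();
                }
            }
        }
        return local;
    }
}
```

The volatile write/read pair establishes the happens-before edge that the unsynchronized first check otherwise lacks.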
The ‘Inner Class’ Approach
In Java, class initialization is performed on demand, the first time the class is used. Normally, this underlying behavior is of little interest. But can we use it?
The approach here is to create a ‘holder’ as an inner class, which will statically initialize the singleton.
This pattern is known as the “initialization-on-demand holder” idiom:
public class Example {
    private static class StaticHolder {
        static final MySingleton INSTANCE = new MySingleton();
    }

    public static MySingleton getSingleton() {
        return StaticHolder.INSTANCE;
    }
}
Calling getSingleton() references the inner class, triggering the JVM to load and initialize it. This is thread-safe, since the JVM performs class initialization under a lock, exactly once.
For subsequent calls, the JVM resolves our already-loaded inner class & returns the existing singleton. Thus — a cache.
And thanks to the magic of JVM optimizations, a very very efficient one.
Performance
Performance benchmarking in Java is a difficult area — requiring warmup, stable conditions, and care to avoid JIT optimizing the benchmark away in its entirety.
For this benchmark, we used 20,000 loops of warmup and measured 10 million loops. To prevent our test code from being optimized away, we used our returned singletons (by summing their hash-codes).
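A rough sketch of that measurement loop. The loop counts are from the article; everything else (names, output format) is illustrative, and a real measurement would be better served by a harness such as JMH:

```java
// Benchmark sketch: warm up, then time the accesses, summing hash codes
// so the JIT cannot prove the loop result unused and eliminate it.
// Uses a holder-idiom singleton like the one shown earlier.
class HolderService {
    private static class Holder {
        static final HolderService INSTANCE = new HolderService();
    }

    static HolderService getInstance() {
        return Holder.INSTANCE;
    }
}

class BenchmarkSketch {
    static long timedLoopNanos(int iterations) {
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sum += HolderService.getInstance().hashCode();
        }
        long elapsed = System.nanoTime() - start;
        if (sum == 0) {                        // "use" sum so the loop stays live
            System.out.println("unexpected sum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        timedLoopNanos(20_000);                      // warmup
        long nanos = timedLoopNanos(10_000_000);     // measured run
        System.out.printf("total: %.1f ms, per iteration: %.2f ns%n",
                nanos / 1e6, nanos / 10_000_000.0);
    }
}
```

Summing the hash codes is the cheapest “sink” that still forces the JIT to keep both the singleton access and the loop body alive.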
The figures:
Technical Approach        Total Time   Per Iteration
‘synchronized’ method     858 ms       85.8 ns
inner-class static init   33.4 ms      3.34 ns
This is over 25 times faster!
Thanks to the JVM, the inner-class reference, class loading, and thread safety are all JIT’d away. All that is left for the CPU to execute is essentially a memory read from the static field.
Since we could not measure the ‘read’ on its own, our benchmark includes one addition (a fast integer arithmetic operation) & a loop test.
On our 2.4 GHz test CPU, the ‘synchronized’ method — without thread contention — required about 206 cycles per access (85.8 ns × 2.4 cycles/ns). By comparison, a loop iteration plus ‘inner class’ singleton access takes just 8 CPU cycles (3.34 ns × 2.4).
This pattern is singleton-specific, and not really helpful for a map-based cache. But for singleton services — is it fast, or what! Kudos to the JVM developers & those who came up with this technique.
What do you think of this approach? Add your comment now.