When and how to use a ThreadLocal

As our readers might already have guessed, I deal with memory leaks on a daily basis. One particular kind of OutOfMemoryError has recently caught my attention: issues triggered by misused ThreadLocals have become more and more frequent. Looking at the causes of these leaks, I am starting to believe that more than half of them are caused by developers who either have no clue what they are doing or who are trying to apply a solution to problems it was never meant to solve.

Instead of grinding my teeth, I decided to open up the topic by publishing two articles, the first of which you are currently reading. In this post I explain the motivation behind ThreadLocal usage. In the second post, currently in progress, I will open up the ThreadLocal bonnet and look at the implementation.

Let us start with an imaginary scenario in which ThreadLocal usage is indeed reasonable. For this, say hello to our hypothetical developer, named Tim. Tim is developing a webapp with a lot of localized content. For example, a user from California would expect to be greeted with a date formatted using the familiar MM/dd/yy pattern, while one from Estonia would like to see a date formatted as dd.MM.yyyy. So Tim starts writing code like this:

    public String formatCurrentDate() {
        DateFormat df = new SimpleDateFormat("MM/dd/yy");
        return df.format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        DateFormat df = new SimpleDateFormat("MM/dd/yy");
        return df.format(new Date(0));
    }

After a while, Tim finds this to be boring and against good practices – the application code is polluted with such initializations. So he makes a seemingly reasonable move by extracting the DateFormat to an instance variable. After making the move, his code now looks like the following:

    private DateFormat df = new SimpleDateFormat("MM/dd/yy");

    public String formatCurrentDate() {
        return df.format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        return df.format(new Date(0));
    }

Happy with the refactoring results, Tim tosses an imaginary high five to himself, pushes the change to the repository and walks home. A few days later the users start complaining: some of them seem to get completely garbled strings instead of the former nicely formatted dates.

Investigating the issue, Tim discovers that the DateFormat implementation is not thread-safe. In the scenario above, if two threads call formatCurrentDate() and formatFirstOfJanyary1970() at the same time, the formatter's shared internal state can get mangled and the displayed result messed up. So Tim fixes the issue by limiting access to the methods, making sure only one thread at a time enters the formatting code. Now his code looks like the following:

    private DateFormat df = new SimpleDateFormat("MM/dd/yy");

    public synchronized String formatCurrentDate() {
        return df.format(new Date());
    }

    public synchronized String formatFirstOfJanyary1970() {
        return df.format(new Date(0));
    }
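Why the unsynchronized version misbehaved can be seen even on a single thread. The following is a minimal sketch, assuming the behaviour of OpenJDK's SimpleDateFormat, where format() writes the date being formatted into the formatter's internal Calendar before producing the string. The class and method names are hypothetical and only serve the illustration.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical demo class; relies on how OpenJDK's SimpleDateFormat behaves.
class SharedStateDemo {

    // Formats the date, then returns the instant left behind in the
    // formatter's internal Calendar.
    static Date calendarTimeAfterFormatting(DateFormat df, Date date) {
        df.format(date);                   // stores 'date' in df's Calendar
        return df.getCalendar().getTime(); // reads the leftover state back
    }

    public static void main(String[] args) {
        DateFormat df = new SimpleDateFormat("MM/dd/yy");
        // Each call overwrites the state left by the previous one.
        System.out.println(calendarTimeAfterFormatting(df, new Date(0)));
        System.out.println(calendarTimeAfterFormatting(df, new Date(86_400_000L)));
    }
}
```

Since every call to format() overwrites that one Calendar, two threads sharing the formatter can interleave between the write and the read, which is where the garbled strings would come from.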

After giving himself another virtual high five, Tim commits the change and goes on a long-overdue vacation, only to start receiving phone calls the next day complaining that the throughput of the application has fallen dramatically. Digging into the issue, he finds out that synchronizing the access has created an unexpected bottleneck in the application. Instead of entering the formatting sections as they pleased, threads now have to wait behind one another.

Reading further about the issue, Tim discovers a different type of variable called ThreadLocal. These variables differ from their normal counterparts in that each thread that accesses one (via ThreadLocal’s get or set method) has its own, independently initialized copy of the variable. Happy with the newly discovered concept, Tim once again rewrites the code:

    public static ThreadLocal<DateFormat> df = new ThreadLocal<DateFormat>() {
        // Called once per thread, on that thread's first get()
        @Override
        protected DateFormat initialValue() {
            return new SimpleDateFormat("MM/dd/yy");
        }
    };

    public String formatCurrentDate() {
        return df.get().format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        return df.get().format(new Date(0));
    }
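The "own, independently initialized copy" can be checked directly. Below is a minimal sketch, assuming Java 8 or later, where ThreadLocal.withInitial is available as a shorthand for overriding initialValue(). The class PerThreadFormatDemo and its helper method are hypothetical names.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;

// Hypothetical demo class showing that each thread gets its own formatter.
class PerThreadFormatDemo {

    // Java 8+ shorthand for subclassing ThreadLocal and overriding initialValue().
    static final ThreadLocal<DateFormat> DF =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("MM/dd/yy"));

    // Calls DF.get() on a fresh thread and returns the instance that thread saw.
    static DateFormat formatterSeenByNewThread() {
        DateFormat[] seen = new DateFormat[1];
        Thread worker = new Thread(() -> seen[0] = DF.get());
        worker.start();
        try {
            worker.join(); // join() makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        DateFormat mine = DF.get();
        System.out.println(mine == DF.get());                   // true: same thread, same copy
        System.out.println(mine == formatterSeenByNewThread()); // false: other thread, other copy
    }
}
```

Because no instance is ever shared between threads, neither the garbled output nor the lock from the earlier versions can occur.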

By going through this process, Tim has learned a powerful concept the hard way. Applied as in the last example, it is a good illustration of the benefits: every thread works with its own formatter, so there is neither shared state to corrupt nor a lock to wait on.

But the newly found concept is a dangerous one. Had Tim stored an instance of one of his own application classes instead of the JDK-bundled DateFormat, which is loaded by the bootstrap classloader, he would already be in the danger zone. If he simply forgets to remove the value after the task at hand is completed, a copy of that object remains attached to the Thread, which tends to belong to a thread pool. Since the lifespan of a pooled Thread surpasses that of the application, the thread keeps the object, and thus the ClassLoader responsible for loading the application, from being garbage collected. And so we have created a leak, which has a chance to surface in the good old java.lang.OutOfMemoryError: PermGen space form (reported as Metaspace on Java 8 and later).
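The retention described above is easy to observe with a pooled thread. The following is a minimal sketch with hypothetical names (ThreadLocalCleanupDemo, CONTEXT, the "request-1" value). It does not reproduce the classloader leak itself, only the fact that a value set during one task is still attached to the pooled thread when the next task runs, unless remove() is called in a finally block.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a value set on a pooled thread outlives the task unless removed.
class ThreadLocalCleanupDemo {

    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs one task that sets CONTEXT, then a second task on the same pooled
    // thread, and returns what the second task finds in CONTEXT.
    static String valueSeenByNextTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable first = () -> {
            CONTEXT.set("request-1");
            try {
                // ... work that reads CONTEXT would go here ...
            } finally {
                if (cleanUp) {
                    CONTEXT.remove(); // detach the value from the pooled thread
                }
            }
        };
        Callable<String> second = CONTEXT::get;
        try {
            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenByNextTask(false)); // request-1
        System.out.println(valueSeenByNextTask(true));  // null
    }
}
```

Wrapping the work in try/finally is a common way to make sure the cleanup happens even when the task throws.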

Another way to abuse the concept is to use a ThreadLocal as a hack for getting a global context within your application. Going down this rabbit hole is a sure way to riddle your application code with all kinds of unimaginable dependencies, coupling your whole code base into an unmaintainable mess.

This is a follow-up to last week's post, where I explained the motivation behind ThreadLocal usage. From that post we can recall that ThreadLocal is indeed a cool concept if you wish to have an independently initialized copy of a variable for each thread. Now, the curious ones might have already started asking: “How could I implement such a concept in Java?”

Or you might feel that this will not be an interesting topic. After all, all you need here is a Map, isn’t it? When dealing with a ThreadLocal<T>, it seems to make all the sense in the world to implement the solution as a HashMap<Thread, T> with Thread.currentThread() as the key. Actually, it is not that simple. So if you have five minutes, bear with me and I will guide you through a beautiful design concept.

The first obvious problem with the simple HashMap solution is thread safety. As HashMap is not built to support concurrent usage, we cannot safely use the implementation in a multi-threaded environment. Fortunately, we do not need to look far for the fix: ConcurrentHashMap<Thread, T> looks like a match made in heaven. Full concurrency of retrievals and adjustable expected concurrency for updates is exactly what we need in the first place.

Now, if you applied a solution based on ConcurrentHashMap to the ThreadLocal implementation in the JDK source, you would introduce two serious problems.

  • First and foremost, you are using Threads as keys in the Map structure. As the map is never garbage collected, you end up keeping a reference to each Thread forever, blocking it from being GCd. Unwittingly, you have built a massive memory leak into the design.
  • The second problem might take longer to surface: even with the clever segmentation under the hood reducing the chance of lock contention, ConcurrentHashMap still bears a synchronization overhead. With the synchronization requirement still in place, you still have a structure that is a potential bottleneck.
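The first of these problems can be made concrete. Here is a minimal sketch of the naive design, with the hypothetical class name NaiveThreadLocal and a Supplier standing in for initialValue(). It is explicitly not how the JDK does it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical naive implementation, NOT how the JDK does it.
class NaiveThreadLocal<T> {

    private final ConcurrentMap<Thread, T> values = new ConcurrentHashMap<>();
    private final Supplier<T> initial;

    NaiveThreadLocal(Supplier<T> initial) {
        this.initial = initial;
    }

    // Returns the calling thread's copy, creating it on first access.
    T get() {
        return values.computeIfAbsent(Thread.currentThread(), t -> initial.get());
    }

    // Number of threads still strongly referenced by the map.
    int trackedThreads() {
        return values.size();
    }

    public static void main(String[] args) throws InterruptedException {
        NaiveThreadLocal<StringBuilder> local = new NaiveThreadLocal<>(StringBuilder::new);
        Thread worker = new Thread(local::get);
        worker.start();
        worker.join();
        // The worker has finished, yet the map still pins it and its value.
        System.out.println(local.trackedThreads()); // 1
    }
}
```

Nothing ever removes the entry for a finished thread, so both the Thread object and its value stay reachable for as long as the NaiveThreadLocal itself does.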

But let us start by solving the biggest issue first. Our data structure needs to allow threads to be garbage collected if our reference is the last one pointing to the thread in question. Again, the first possible solution is staring right at us: instead of our usual references to the object, why not use WeakReferences? So the implementation would now look similar to the following:

    Collections.synchronizedMap(new WeakHashMap<Thread, T>())
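Wrapped into a class, this intermediate design might look like the sketch below. The names WeakMapThreadLocal and initial are hypothetical, and the sketch assumes that the stored values do not themselves hold a strong reference to their thread, since that would keep the weak key alive.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical intermediate design: weak keys address the leak, but the lock stays.
class WeakMapThreadLocal<T> {

    // Every get() from every thread goes through the wrapper's single lock.
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>());
    private final Supplier<T> initial;

    WeakMapThreadLocal(Supplier<T> initial) {
        this.initial = initial;
    }

    // Returns the calling thread's copy, creating it on first access.
    T get() {
        return values.computeIfAbsent(Thread.currentThread(), t -> initial.get());
    }

    public static void main(String[] args) {
        WeakMapThreadLocal<StringBuilder> local = new WeakMapThreadLocal<>(StringBuilder::new);
        System.out.println(local.get() == local.get()); // true: one copy per thread
    }
}
```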

Now we have gotten rid of the leakage issue: if nobody besides us is referring to the Thread, it can be finalized and garbage collected. But we still have not sorted out the concurrency issues. The solution to this is a real example of thinking outside the box. So far we have thought about ThreadLocal variables as Threads mapping to the variables. But what if we reverse the thinking and instead envision the solution as a mapping of ThreadLocal objects to values in each Thread? If each thread stores the mapping, and ThreadLocal is just an interface into that mapping, we can avoid the synchronization issues. Better yet, we are also escaping the problems with GC!
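The reversed idea can be sketched in a few lines before looking at any real source. Everything below is hypothetical: CarrierThread stands in for a thread type that carries its own storage, and InvertedLocal is a ThreadLocal look-alike that is merely a key into the current thread's map. For brevity the sketch uses strong keys and assumes all callers run on a CarrierThread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical thread type that carries its own per-thread storage.
class CarrierThread extends Thread {
    // Touched only by this thread itself, so a plain HashMap is enough.
    final Map<InvertedLocal<?>, Object> locals = new HashMap<>();

    CarrierThread(Runnable task) {
        super(task);
    }
}

// Hypothetical ThreadLocal look-alike: just a key into the current thread's map.
class InvertedLocal<T> {
    private final Supplier<T> initial;

    InvertedLocal(Supplier<T> initial) {
        this.initial = initial;
    }

    @SuppressWarnings("unchecked")
    T get() {
        // Assumption of this sketch: callers always run on a CarrierThread.
        CarrierThread current = (CarrierThread) Thread.currentThread();
        return (T) current.locals.computeIfAbsent(this, key -> initial.get());
    }

    public static void main(String[] args) throws InterruptedException {
        InvertedLocal<StringBuilder> local = new InvertedLocal<>(StringBuilder::new);
        StringBuilder[] seen = new StringBuilder[2];
        CarrierThread first = new CarrierThread(() -> seen[0] = local.get());
        CarrierThread second = new CarrierThread(() -> seen[1] = local.get());
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(seen[0] != seen[1]); // true: one copy per thread
    }
}
```

No lock is needed because each map is only ever read and written by the thread that owns it, and the map becomes unreachable together with its thread.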

And indeed, when we open up the source code of the ThreadLocal and Thread classes, we see that this is exactly how the solution is actually implemented in the JDK:

    public class Thread implements Runnable {
        ThreadLocal.ThreadLocalMap threadLocals = null;
        // cut for brevity
    }

    public class ThreadLocal<T> {
        static class ThreadLocalMap {
            // cut for brevity
        }

        ThreadLocalMap getMap(Thread t) {
            return t.threadLocals;
        }

        public T get() {
            Thread t = Thread.currentThread();
            ThreadLocalMap map = getMap(t);
            if (map != null) {
                ThreadLocalMap.Entry e = map.getEntry(this);
                if (e != null)
                    return (T) e.value;
            }
            return setInitialValue();
        }

        private T setInitialValue() {
            T value = initialValue();
            Thread t = Thread.currentThread();
            ThreadLocalMap map = getMap(t);
            if (map != null)
                map.set(this, value);
            else
                createMap(t, value);
            return value;
        }
        // cut for brevity
    }

So here we have it. The Thread class keeps a reference to a ThreadLocal.ThreadLocalMap instance, which is built using weak references to the keys. By building the structure in a reverse manner, we have avoided thread contention issues altogether, as our ThreadLocal can only access the value in the current thread. Also, when the Thread has finished its work, the map can be garbage collected, so we have avoided the memory leak issue as well.

I hope you felt enlightened when looking into the design, as it is indeed an elegant solution to a complex problem. I do feel that reading source code is a perfect way to learn about new concepts. And if you are a Java developer, what could be a better place to get the knowledge than reading the Joshua Bloch and Doug Lea source code integrated into the JDK?

