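1. How ThreadLocal works
ThreadLocal is used in multithreaded code to hold context that belongs to the current thread. The value can be fetched anywhere it is needed, and different threads calling get() on the same ThreadLocal obtain different objects.
[Figure: how ThreadLocal works]
How it is implemented: each thread owns a ThreadLocalMap that stores <ThreadLocal, Object> pairs in a hash table built on linear probing (HashMap, by contrast, resolves collisions by chaining). The implementation code is not covered in detail here.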
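The per-thread isolation described above can be seen in a few lines. This is a minimal sketch; the class name ThreadLocalIsolationDemo, the CONTEXT field, and the runThreads helper are made up for illustration and are not from the original article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ThreadLocalIsolationDemo {
    // One shared ThreadLocal; each thread still sees only its own value.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Starts one thread per name; each stores its own value and reads it back. */
    static Map<String, String> runThreads(String... names) {
        Map<String, String> seen = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[names.length];
        for (int i = 0; i < names.length; i++) {
            String name = names[i];
            threads[i] = new Thread(() -> {
                CONTEXT.set("context-of-" + name);
                try {
                    seen.put(name, CONTEXT.get());
                } finally {
                    CONTEXT.remove();
                }
            }, name);
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runThreads("worker-a", "worker-b"));
        // The main thread never called set(), so it sees the initial value.
        System.out.println(CONTEXT.get()); // null
    }
}
```

Each worker reads back exactly what it wrote, although all of them go through the same CONTEXT object.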
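2. ThreadLocal memory leaks
In ThreadLocalMap, Entry extends WeakReference<ThreadLocal<?>>. Entry has no key field of its own: the ThreadLocal is held in the referent field that WeakReference inherits from Reference, and that referent acts as the Entry's key.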
static class Entry extends WeakReference<ThreadLocal<?>> {
/** The value associated with this ThreadLocal. */
Object value;
Entry(ThreadLocal<?> k, Object v) {
super(k);
value = v;
}
}
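Consequently, once the external strong references to a ThreadLocal are cleared, the Entry's referent is reclaimed at the next garbage collection, and the ThreadLocalMap is left with an Entry whose key is null. That entry is still reachable through the strong reference chain Thread --> ThreadLocalMap --> Entry --> value. As long as the thread is alive, neither the Entry nor its value can be collected even though the key is already gone, which is a memory leak.
The JDK authors anticipated this and added safeguards so that ThreadLocal leaks as little as possible: calls to get(), set(), and remove() try to find some of the entries in the thread's ThreadLocalMap whose key is null, clear their value, and null out the slot so that the memory can be reclaimed at the next collection.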
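The stale-entry situation and its cleanup can be reproduced in isolation. The sketch below is not the JDK code: StaleEntryDemo and expungeIfStale are hypothetical names, and the weak key is cleared by calling clear() by hand (an assumption made so that the demo does not depend on GC timing).

```java
import java.lang.ref.WeakReference;

final class StaleEntryDemo {
    /** Same shape as ThreadLocalMap.Entry: weak key, strong value. */
    static final class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    /** Mirrors the cleanup idea: drop the value and the slot of an entry whose key is gone. */
    static boolean expungeIfStale(Entry[] table, int slot) {
        Entry e = table[slot];
        if (e != null && e.get() == null) {
            e.value = null;
            table[slot] = null;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Entry[] table = new Entry[4];
        table[1] = new Entry(new Object(), "payload");

        // Stand-in for "the collector cleared the weak key".
        table[1].clear();

        System.out.println(table[1].get());           // null: the key is gone
        System.out.println(table[1].value);           // payload: still strongly held
        System.out.println(expungeIfStale(table, 1)); // true
        System.out.println(table[1]);                 // null: slot released
    }
}
```

Until something like expungeIfStale runs, the value stays reachable through the table even though nobody can look it up any more.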
import java.lang.reflect.Array;
import java.lang.reflect.Field;
public class ThreadLocalTest {
public static void main(String[] args) throws Exception {
threadLocalMemoryLeakTest();
}
public static void threadLocalMemoryLeakTest() throws Exception {
ThreadLocal<String> threadLocal = new ThreadLocal<>();
threadLocal.set("hello Thread local");
System.out.printf("%s:%s%n", threadLocal, threadLocal.get());
System.out.println("------------------begin------------------");
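// Use reflection to reach the hash table behind ThreadLocalMap: Entry[]
// (on JDK 16+ this needs --add-opens java.base/java.lang=ALL-UNNAMED
// and --add-opens java.base/java.lang.ref=ALL-UNNAMED)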
Field field = Thread.class.getDeclaredField("threadLocals");
field.setAccessible(true);
Object threadLocalMap = field.get(Thread.currentThread());
Field entryTableField = threadLocalMap.getClass().getDeclaredField("table");
entryTableField.setAccessible(true);
Object table = entryTableField.get(threadLocalMap);
printEntryTable(table);
// Clear the external strong reference
threadLocal = null;
// Request a garbage collection
System.gc();
System.out.println("--------------------gc--------------------");
// Print the hash table
printEntryTable(table);
}
private static void printEntryTable(Object table) throws NoSuchFieldException, IllegalAccessException {
if (table.getClass().isArray()) {
int length = Array.getLength(table);
Class<?> entryClass = table.getClass().getComponentType();
Class<?> referenceClass = entryClass.getSuperclass().getSuperclass();
Field keyField = referenceClass.getDeclaredField("referent");
Field valueField = entryClass.getDeclaredField("value");
keyField.setAccessible(true);
valueField.setAccessible(true);
for (int slot = 0; slot < length; slot++) {
Object entry = Array.get(table, slot);
if (entry == null) {
continue;
}
Object key = keyField.get(entry);
Object value = valueField.get(entry);
System.out.printf("[%2d]%s:%s%n", slot, key, value);
}
}
}
}
The output is as follows:
java.lang.ThreadLocal@28d93b30:hello Thread local
------------------begin------------------
[ 3]java.lang.ThreadLocal@28d93b30:hello Thread local
[ 5]java.lang.ThreadLocal@677327b6:[Ljava.lang.Object;@14ae5a5
[ 7]java.lang.ThreadLocal@7f31245a:java.lang.ref.SoftReference@6d6f6e28
[14]java.lang.ThreadLocal@135fbaa4:java.lang.ref.SoftReference@45ee12a7
--------------------gc--------------------
[ 3]null:hello Thread local
[ 5]java.lang.ThreadLocal@677327b6:[Ljava.lang.Object;@14ae5a5
[ 7]java.lang.ThreadLocal@7f31245a:java.lang.ref.SoftReference@6d6f6e28
[14]java.lang.ThreadLocal@135fbaa4:java.lang.ref.SoftReference@45ee12a7
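Comparing object addresses shows that the ThreadLocal created in the test method sits in slot 3 of the ThreadLocalMap. Before the GC the Entry's key is java.lang.ThreadLocal@28d93b30; after the GC it is null, so the key has been collected, yet the Entry's value is still there. That is the memory leak.
Next, append the following code to the test above: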
System.out.println("-------------------set------------------");
threadLocal = new ThreadLocal<>();
threadLocal.set("hello java");
System.out.printf("%s:%s%n", threadLocal, threadLocal.get());
The output, after printing the table again, is:
-------------------set------------------
java.lang.ThreadLocal@330bedb4:hello java
[ 3]null:hello Thread local
[ 5]java.lang.ThreadLocal@677327b6:[Ljava.lang.Object;@14ae5a5
[ 7]java.lang.ThreadLocal@7f31245a:java.lang.ref.SoftReference@6d6f6e28
[10]java.lang.ThreadLocal@330bedb4:hello java
[14]java.lang.ThreadLocal@135fbaa4:java.lang.ref.SoftReference@45ee12a7
The new ThreadLocal was inserted at slot 10, and the Entry whose key was collected is still there; it has not been cleaned up. Did ThreadLocal's stale-entry cleanup fail to kick in? Let's look at the code:
private void set(ThreadLocal<?> key, Object value) {
Entry[] tab = table;
int len = tab.length;
int i = key.threadLocalHashCode & (len-1);
for (Entry e = tab[i];
e != null;
e = tab[i = nextIndex(i, len)]) {
ThreadLocal<?> k = e.get();
if (k == key) {
e.value = value;
return;
}
if (k == null) {
replaceStaleEntry(key, value, i);
return;
}
}
tab[i] = new Entry(key, value);
int sz = ++size;
// Try to clean some slots
if (!cleanSomeSlots(i, sz) && sz >= threshold)
rehash();
}
private boolean cleanSomeSlots(int i, int n) {
boolean removed = false;
Entry[] tab = table;
int len = tab.length;
do {
i = nextIndex(i, len);
Entry e = tab[i];
if (e != null && e.get() == null) {
n = len;
removed = true;
i = expungeStaleEntry(i);
}
} while ( (n >>>= 1) != 0);
return removed;
}
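Clearly, the set method of ThreadLocalMap only cleans part of the table at the end, not all of it. As the loop in cleanSomeSlots shows, the number of slots examined is on the order of log2(n), where n starts as the current entry count and is reset to the table length whenever a stale entry is found. Let's insert a few more objects and see.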
System.out.println("-------------------set------------------");
for (int i = 0; i < 2; i++) {
threadLocal = new ThreadLocal<>();
threadLocal.set("hello java_" + i);
}
printEntryTable(table);
-------------------set------------------
[ 1]java.lang.ThreadLocal@330bedb4:hello java_1
[ 5]java.lang.ThreadLocal@677327b6:[Ljava.lang.Object;@14ae5a5
[ 7]java.lang.ThreadLocal@7f31245a:java.lang.ref.SoftReference@6d6f6e28
[10]java.lang.ThreadLocal@2503dbd3:hello java_0
[14]java.lang.ThreadLocal@135fbaa4:java.lang.ref.SoftReference@45ee12a7
The stale Entry was cleaned up when the second object was inserted.
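Why the second insert and not the first? The loop in cleanSomeSlots can be replayed on its own. The sketch below copies only the stepping logic from the listing above. The class name CleanSomeSlotsProbe and the helper probedSlots are hypothetical, the table length of 16 is the JDK's initial capacity, and the entry counts 5 and 6 are inferred from the printed tables (four entries before the inserts).

```java
import java.util.ArrayList;
import java.util.List;

final class CleanSomeSlotsProbe {
    /** Same wrap-around step as ThreadLocalMap.nextIndex. */
    static int nextIndex(int i, int len) {
        return (i + 1 < len) ? i + 1 : 0;
    }

    /**
     * Lists the slots the cleanSomeSlots loop visits when it starts after
     * slot i with counter n, assuming it meets no stale entry on the way
     * (a stale hit resets n to the table length and extends the scan).
     */
    static List<Integer> probedSlots(int i, int n, int len) {
        List<Integer> visited = new ArrayList<>();
        do {
            i = nextIndex(i, len);
            visited.add(i);
        } while ((n >>>= 1) != 0);
        return visited;
    }

    public static void main(String[] args) {
        // First insert: new entry at slot 10, size becomes 5.
        System.out.println(probedSlots(10, 5, 16)); // [11, 12, 13]
        // Second insert: new entry at slot 1, size becomes 6.
        System.out.println(probedSlots(1, 6, 16));  // [2, 3, 4]
    }
}
```

Under these assumptions the scan after the insert at slot 10 never reaches slot 3, while the scan after the insert at slot 1 does, which matches the printed tables.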
3. ThreadLocal best practices
private static ThreadLocal<String> threadLocal = new ThreadLocal<>();
public void func() {
threadLocal.set("hello");
try {
// do something
} finally {
threadLocal.remove();
}
}
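1. Declare ThreadLocal fields as static. With a non-static field, every instance of the enclosing class gets its own ThreadLocal, so within a single thread different instances may hold different values; the stored value is then only visible inside that one instance.
2. When a ThreadLocal value is no longer needed, call remove() to release it and prevent memory leaks.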
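The second rule matters most with pooled threads, because a worker thread outlives the task that set the value. The following sketch illustrates this with a single-thread executor; PooledThreadLocalDemo, USER, and secondTaskSees are names invented for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledThreadLocalDemo {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    /** Runs two tasks on one pooled thread and returns what the second task sees. */
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                USER.set("alice");
                try {
                    // do something with USER.get()
                } finally {
                    if (cleanUp) {
                        USER.remove();
                    }
                }
            };
            pool.submit(first).get();

            // Same worker thread, next task: it never calls set().
            Callable<String> second = USER::get;
            return pool.submit(second).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTaskSees(false)); // alice: value leaked into the next task
        System.out.println(secondTaskSees(true));  // null: remove() cleaned up
    }
}
```

Without remove(), the second task inherits the first task's value, and the value object also stays reachable for as long as the worker thread lives.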
4. How FastThreadLocal works
Netty ships its own equivalent of ThreadLocal, named FastThreadLocal. As the name suggests, it is designed for faster access.
Its basic usage is the same as ThreadLocal:
FastThreadLocal<String> ftl = new FastThreadLocal<>();
ftl.set("hello FastThreadLocal");
System.out.println(ftl.get());
private static final int variablesToRemoveIndex = InternalThreadLocalMap.nextVariableIndex();
private final int index;
public FastThreadLocal() {
index = InternalThreadLocalMap.nextVariableIndex();
}
public static int nextVariableIndex() {
int index = nextIndex.getAndIncrement();
if (index < 0) {
nextIndex.decrementAndGet();
throw new IllegalStateException("too many thread-local indexed variables");
}
return index;
}
The FastThreadLocal constructor assigns a final index, taken from a monotonically increasing counter. Let's look at the set path to see what the index is for.
public final void set(V value) {
if (value != InternalThreadLocalMap.UNSET) {
set(InternalThreadLocalMap.get(), value);
} else {
remove();
}
}
public final void set(InternalThreadLocalMap threadLocalMap, V value) {
if (value != InternalThreadLocalMap.UNSET) {
if (threadLocalMap.setIndexedVariable(index, value)) {
addToVariablesToRemove(threadLocalMap, this);
}
} else {
remove(threadLocalMap);
}
}
public boolean setIndexedVariable(int index, Object value) {
Object[] lookup = indexedVariables;
if (index < lookup.length) {
Object oldValue = lookup[index];
lookup[index] = value;
return oldValue == UNSET;
} else {
expandIndexedVariableTableAndSet(index, value);
return true;
}
}
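set() first checks whether the value being stored is the default placeholder (UNSET). If it is, the call is treated as a removal; otherwise the value is stored. The actual store is done by InternalThreadLocalMap: it uses index as the subscript and replaces the element in the indexedVariables array; if the index is out of bounds, the array is expanded first and the value is stored afterwards.
This shows that FastThreadLocal uses an Object[] as its container, and that each FastThreadLocal is given an index at construction that serves as the subscript of its value. Replacing the linear-probing hash table with "array + index" removes hashing, probing, and slot lookup, which is where the speedup comes from.
Besides the implementation itself, two details are worth noting:
- set() obtains the InternalThreadLocalMap through InternalThreadLocalMap.get().
- After storing the value, FastThreadLocal calls addToVariablesToRemove().
Let's look at what is special about each of these two methods.
- First, InternalThreadLocalMap.get():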
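The "array + index" idea can be sketched with the standard library alone. This is a simplified illustration, not Netty's implementation: IndexedSlotsDemo, Slots, and Var are hypothetical names, the growth policy is an assumption, and the per-thread ownership of the array is left out so that only the indexing logic remains.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class IndexedSlotsDemo {
    static final Object UNSET = new Object();
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    /** Storage owned by one thread: a plain array addressed by a precomputed index. */
    static final class Slots {
        private Object[] values = newArray(4);

        private static Object[] newArray(int size) {
            Object[] a = new Object[size];
            Arrays.fill(a, UNSET);
            return a;
        }

        /** Stores the value; returns true if the slot was previously unset. */
        boolean set(int index, Object value) {
            if (index >= values.length) {
                // Assumed growth policy: double, or just enough to fit the index.
                int oldLen = values.length;
                int newLen = Math.max(oldLen * 2, index + 1);
                values = Arrays.copyOf(values, newLen);
                Arrays.fill(values, oldLen, newLen, UNSET);
            }
            Object old = values[index];
            values[index] = value;
            return old == UNSET;
        }

        Object get(int index) {
            return index < values.length ? values[index] : UNSET;
        }
    }

    /** A variable handle whose index is fixed at construction. */
    static final class Var<T> {
        final int index = NEXT_INDEX.getAndIncrement();

        void set(Slots slots, T value) {
            slots.set(index, value);
        }

        @SuppressWarnings("unchecked")
        T get(Slots slots) {
            Object v = slots.get(index);
            return v == UNSET ? null : (T) v;
        }
    }

    public static void main(String[] args) {
        Slots slots = new Slots();
        Var<String> name = new Var<>();
        name.set(slots, "hello");
        System.out.println(name.get(slots)); // hello
    }
}
```

A read is one bounds check and one array load; there is no hash computation and no probing.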
public static InternalThreadLocalMap get() {
Thread thread = Thread.currentThread();
if (thread instanceof FastThreadLocalThread) {
return fastGet((FastThreadLocalThread) thread);
} else {
return slowGet();
}
}
private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
if (threadLocalMap == null) {
thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
}
return threadLocalMap;
}
private static InternalThreadLocalMap slowGet() {
ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
InternalThreadLocalMap ret = slowThreadLocalMap.get();
if (ret == null) {
ret = new InternalThreadLocalMap();
slowThreadLocalMap.set(ret);
}
return ret;
}
The first step checks whether the current thread is a FastThreadLocalThread: if so it calls fastGet(), otherwise slowGet(). FastThreadLocalThread itself simply holds an InternalThreadLocalMap field.
public class FastThreadLocalThread extends Thread {
private InternalThreadLocalMap threadLocalMap;
}
Comparing fastGet() and slowGet(), the difference lies in who holds the InternalThreadLocalMap. A FastThreadLocalThread holds it directly, so fastGet() reads it straight from the thread object. An ordinary thread has no such field, so the map is stored in the thread's ThreadLocalMap. Keeping the InternalThreadLocalMap inside a ThreadLocalMap does not remove any of the costs of the JDK ThreadLocal; it adds a layer on top, so on this path FastThreadLocal ends up slower than ThreadLocal. This is why the newThread method of Netty's DefaultThreadFactory always returns a FastThreadLocalThread.
- After storing the value, FastThreadLocal's set() calls addToVariablesToRemove(). Here is its implementation:
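The two lookup paths can be imitated without Netty. In the sketch below, CarrierThread and LocalMap are hypothetical stand-ins for FastThreadLocalThread and InternalThreadLocalMap, and pathFor is a helper added only to report which branch a thread took.

```java
final class CarrierLookupDemo {
    /** Hypothetical stand-in for InternalThreadLocalMap. */
    static final class LocalMap {
        final Object[] slots = new Object[8];
    }

    /** Hypothetical stand-in for FastThreadLocalThread: holds the map in a field. */
    static class CarrierThread extends Thread {
        LocalMap localMap;

        CarrierThread(Runnable r) {
            super(r);
        }
    }

    // Fallback for ordinary threads: the map itself lives in a JDK ThreadLocal.
    private static final ThreadLocal<LocalMap> SLOW = new ThreadLocal<>();

    static LocalMap current() {
        Thread t = Thread.currentThread();
        if (t instanceof CarrierThread) {
            CarrierThread c = (CarrierThread) t;
            if (c.localMap == null) {
                c.localMap = new LocalMap();
            }
            return c.localMap;       // fast path: one field read
        }
        LocalMap m = SLOW.get();     // slow path: a ThreadLocalMap lookup comes first
        if (m == null) {
            m = new LocalMap();
            SLOW.set(m);
        }
        return m;
    }

    /** Reports which path a thread of the given kind takes. */
    static String pathFor(boolean carrier) {
        String[] result = new String[1];
        Runnable task = () -> {
            LocalMap m = current();
            Thread t = Thread.currentThread();
            boolean viaField = t instanceof CarrierThread && ((CarrierThread) t).localMap == m;
            result[0] = viaField ? "fast" : "slow";
        };
        Thread t = carrier ? new CarrierThread(task) : new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(pathFor(true));  // fast
        System.out.println(pathFor(false)); // slow
    }
}
```

On the slow path every access pays for a regular ThreadLocal lookup before the array is even reached, which is the extra layer the paragraph above refers to.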
private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
Set<FastThreadLocal<?>> variablesToRemove;
if (v == InternalThreadLocalMap.UNSET || v == null) {
variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
} else {
variablesToRemove = (Set<FastThreadLocal<?>>) v;
}
variablesToRemove.add(variable);
}
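The method reads the object at index variablesToRemoveIndex of threadLocalMap (variablesToRemoveIndex is a static field of FastThreadLocal whose value is 0). If nothing is stored there yet, it creates a Set backed by an IdentityHashMap and stores it at that index. The Set records every FastThreadLocal that currently has a value in this threadLocalMap.
The removeAll method iterates over the Set stored at variablesToRemoveIndex to remove every one of those variables.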
public static void removeAll() {
InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.getIfSet();
if (threadLocalMap == null) {
return;
}
try {
Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
if (v != null && v != InternalThreadLocalMap.UNSET) {
@SuppressWarnings("unchecked")
Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
FastThreadLocal<?>[] variablesToRemoveArray =
variablesToRemove.toArray(new FastThreadLocal[variablesToRemove.size()]);
for (FastThreadLocal<?> tlv: variablesToRemoveArray) {
tlv.remove(threadLocalMap);
}
}
} finally {
InternalThreadLocalMap.remove();
}
}
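The purpose of the Set kept at variablesToRemoveIndex, which tracks every FastThreadLocal in use, is not obvious at first. If it were only needed for clearing, threadLocalMap could simply wipe its array, so why maintain a separate Set<FastThreadLocal>? After some thought, the likely explanation is event notification: each variable must get the chance to run its own removal hook.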
public final void remove(InternalThreadLocalMap threadLocalMap) {
if (threadLocalMap == null) {
return;
}
Object v = threadLocalMap.removeIndexedVariable(index);
removeFromVariablesToRemove(threadLocalMap, this);
if (v != InternalThreadLocalMap.UNSET) {
try {
onRemoval((V) v);
} catch (Exception e) {
PlatformDependent.throwException(e);
}
}
}
When a FastThreadLocal is removed, its onRemoval() method is called. Netty's memory pool uses this hook to free the thread's cached data:
final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
@Override
protected void onRemoval(PoolThreadCache threadCache) {
threadCache.free();
}
// ...
}
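The removal-hook mechanism discussed in this section can be sketched as follows. This is an illustration under simplifying assumptions, not Netty's code: Store, Var, and TrackedVar are hypothetical names, and a single identity map stands in for both the value array and the Set of live variables.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class RemovalCallbackDemo {
    /** Hypothetical per-thread store: variable -> value, compared by identity. */
    static final class Store {
        final Map<Var<?>, Object> values = new IdentityHashMap<>();

        /** Removes every variable through its own remove(), so each hook fires. */
        void removeAll() {
            // Iterate over a copy because remove() mutates the map.
            for (Var<?> v : new ArrayList<>(values.keySet())) {
                v.remove(this);
            }
        }
    }

    static class Var<T> {
        void set(Store store, T value) {
            store.values.put(this, value);
        }

        @SuppressWarnings("unchecked")
        void remove(Store store) {
            if (store.values.containsKey(this)) {
                onRemoval((T) store.values.remove(this));
            }
        }

        /** Hook for subclasses, for example to release a cached resource. */
        protected void onRemoval(T value) {
        }
    }

    /** Example subclass that records what was released. */
    static final class TrackedVar extends Var<String> {
        final List<String> released = new ArrayList<>();

        @Override
        protected void onRemoval(String value) {
            released.add(value);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        TrackedVar cache = new TrackedVar();
        cache.set(store, "buffer-1");
        store.removeAll();
        System.out.println(cache.released); // [buffer-1]
    }
}
```

Simply clearing the underlying storage would drop the values without telling their owners; going through each variable is what lets a subclass release what it holds.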
5. FastThreadLocal best practices
As noted above, FastThreadLocal should be used on FastThreadLocalThread threads; only then does it take the fast path.
Thread[] threads = new Thread[4];
DefaultThreadFactory f = new DefaultThreadFactory("FastThreadLocalThread-");
for (int i = 0; i < threads.length; i++) {
threads[i] = f.newThread(() -> {
// do something
});
}
for (Thread thread : threads) {
thread.start();
}
DefaultThreadFactory.newThread() wraps the Runnable in a DefaultRunnableDecorator, which removes all FastThreadLocal values once the task has finished.
private static final class DefaultRunnableDecorator implements Runnable {
private final Runnable r;
DefaultRunnableDecorator(Runnable r) {
this.r = r;
}
@Override
public void run() {
try {
r.run();
} finally {
FastThreadLocal.removeAll();
}
}
}
6. ThreadLocal vs. FastThreadLocal performance
The following test compares the read performance of ThreadLocal and FastThreadLocal:
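import io.netty.util.concurrent.DefaultThreadFactory;
import io.netty.util.concurrent.FastThreadLocal;
public class ThreadLocalTest {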
public static void main(String[] args) throws Exception {
threadLocalTest();
fastThreadLocalTest();
}
private static int THREAD_NUM = 2;
public static void fastThreadLocalTest() throws InterruptedException {
Thread[] threads = new Thread[THREAD_NUM];
DefaultThreadFactory f = new DefaultThreadFactory("FastThreadLocal-");
for (int i = 0; i < threads.length; i++) {
threads[i] = f.newThread(() -> {
FastThreadLocal<Long> ftl = new FastThreadLocal<>();
ftl.set(1L);
long sum = 0;
long start = System.nanoTime();
for (int j = 0; j < 1000000000; j++) {
sum = ftl.get();
}
ftl.remove();
long end = System.nanoTime();
System.out.printf("[%20s] sum:%s, cost time:%s ns%n", Thread.currentThread().getName(), sum, (end - start));
});
}
for (Thread thread : threads) {
thread.start();
}
for (Thread thread : threads) {
thread.join();
}
}
public static void threadLocalTest() throws InterruptedException {
Thread[] threads = new Thread[THREAD_NUM];
for (int i = 0; i < threads.length; i++) {
threads[i] = new Thread(() -> {
ThreadLocal<Long> tl = new ThreadLocal<>();
tl.set(1L);
long sum = 0;
long start = System.nanoTime();
for (int j = 0; j < 1000000000; j++) {
sum = tl.get();
}
tl.remove();
long end = System.nanoTime();
System.out.printf("[%20s] sum:%s, cost time:%s ns%n", Thread.currentThread().getName(), sum, (end - start));
}, "ThreadLocalThread-" + (i + 1));
}
for (Thread thread : threads) {
thread.start();
}
for (Thread thread : threads) {
thread.join();
}
}
}
Output:
[FastThreadLocal--1-1] sum:1, cost time:20086153 ns
[FastThreadLocal--1-2] sum:1, cost time:33817537 ns
[ ThreadLocalThread-2] sum:1, cost time:5866945917 ns
[ ThreadLocalThread-1] sum:1, cost time:5866982702 ns
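In this run, FastThreadLocal's get is about two orders of magnitude faster than ThreadLocal's get. Note, however, that 20 ms for 10^9 iterations is roughly 0.02 ns per call, so the JIT compiler has probably optimized most of that loop away and the real gap is likely smaller.
To test write performance, change the read loop above to: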
for (int j = 0; j < 1000000000; j++) {
tl.set((long)j);
}
sum = tl.get();
[FastThreadLocal--1-1] sum:999999999, cost time:10769466909 ns
[FastThreadLocal--1-2] sum:999999999, cost time:10783395903 ns
[ ThreadLocalThread-1] sum:999999999, cost time:22287503598 ns
[ ThreadLocalThread-2] sum:999999999, cost time:22293425934 ns
In this run, FastThreadLocal's set is about twice as fast as ThreadLocal's set.
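Hand-written timing loops like the ones above are sensitive to JIT optimizations, especially when the loop result is overwritten on each iteration. The sketch below shows one common mitigation, accumulating every read into the result and warming up before timing. ConsumedLoopBenchmark and readLoop are hypothetical names and the iteration counts are arbitrary; for trustworthy numbers a dedicated harness such as JMH is the usual choice.

```java
final class ConsumedLoopBenchmark {
    /** Sums the value read on every iteration so that each get() feeds the result. */
    static long readLoop(ThreadLocal<Long> tl, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += tl.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        ThreadLocal<Long> tl = new ThreadLocal<>();
        tl.set(1L);
        try {
            // Warm-up pass so that the timed pass mostly measures compiled code.
            readLoop(tl, 10_000_000);

            long start = System.nanoTime();
            long sum = readLoop(tl, 100_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.printf("sum:%d, cost time:%d ns%n", sum, elapsed);
        } finally {
            tl.remove();
        }
    }
}
```

Even with these precautions the compiler may still simplify the loop, so the printed time should be read as a rough indication only.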
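7. Summary
- ThreadLocal inserts itself as the key and the stored object as the value into the thread's ThreadLocalMap.
- Every thread reads and writes its own ThreadLocalMap field on get/set, which keeps the data isolated per thread.
- Once all external strong references to a ThreadLocal are set to null, the next garbage collection reclaims the key of its Entry. The key becomes null while the value stays reachable, which causes a memory leak.
- get, set, and remove on a ThreadLocalMap trigger a scan of some slots that cleans up entries whose key is null.
- FastThreadLocal takes a sequential index at construction and uses it as the subscript into the array-backed InternalThreadLocalMap, which gives fast reads and writes.
- FastThreadLocal should be used on FastThreadLocalThread threads; otherwise access is not faster and may even be slower.
- In the tests above, FastThreadLocal's get was about two orders of magnitude faster than ThreadLocal's, and its set was about twice as fast.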