某天,在不知情的情况下,实施私自接入了某其他平台数据,由于数据不规范,导致生产出现阶段性大面积异常报错,有大量线程阻塞,处于BLOCKED状态,应用性能下降一倍。通过线程堆栈日志定位到:阻塞线程都阻塞在org.apache.commons.lang.exception.ExceptionUtils.getCause()处,并且在等待的是一个对象锁,查看源码(commons-lang的2.6版本)发现getCause中会先获取锁,然后通过反射的方式获取异常的详情,反射存在性能问题,在高并发下,导致大量线程阻塞。
通过如下案例进行模拟
创建一个2000个核心线程数的线程池,线程名命名为:test-exception-,在线程池中跑着会抛出类型转换异常的线程,在对异常进行捕获的时候使用ExceptionUtils.getFullStackTrace(e)获取异常的详情信息。
import org.apache.commons.lang.exception.ExceptionUtils;
import org.springframework.scheduling.concurrent.CustomizableThreadFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class TestException {
public static void main(String[] args) {
CustomizableThreadFactory customizableThreadFactory = new CustomizableThreadFactory("test-exception-");
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(2000, 2000, 0L,
TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(), customizableThreadFactory);
for(int i=0; i<20000; i++) {
Run run = new Run();
threadPoolExecutor.submit(run);
}
}
public static class Run implements Runnable {
@Override
public void run() {
try {
Map map = new HashMap();
map.put("key", "3");
Integer value = (Integer) map.get("key");
} catch (Exception e) {
System.out.println(Thread.currentThread().getName() + ExceptionUtils.getFullStackTrace(e));
}
}
}
}
启动程序,打印dump日志,通过dump日志可以清晰地看到如下日志内容,在线程test-exception-1539运行的时候,获取到对象锁0x00000000dc112b90:- locked <0x00000000dc112b90> (a java.lang.Object),此时其他线程会被阻塞,处于BLOCKED状态,等待锁0x00000000dc112b90:- waiting to lock <0x00000000de112b90> (a java.lang.Object)
"test-exception-1543" #1553 prio=5 os_prio=0 tid=0x000000005c4bf800 nid=0xe338 waiting for monitor entry [0x000000007cffe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.commons.lang.exception.ExceptionUtils.getCause(ExceptionUtile.java:282]]
- waiting to lock <0x00000000de112b90> (a java.lang.Object)
at org. apache.commons.lang.exception.ExceptionUtils.getThrowableList(ExceptionUtils.java:571)
at org.apache.commons.lang.exception.ExceptionUtils.getThrowables(ExceptionUtils.java:544)
at org.apache.commons.lang.exception.ExceptionUtils.getFullStackTrace(ExceptionUtils.java:870)
at ExceptionTest$Run.run(ExceptionTest.java:36)
at java.util.concurrent.Executora$RunnableAdapter.call(Executora.java:511)
at at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent .ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"test-exception-1540" #1550 prio=5 os_prio=0 tid=0x000000005c4c4000 nid=0xe32c waiting for monitor entry [0x000000007ccfe000]
java.lang.Thread.State: BLOCKED(on object monitor)
at org.apache.commons.lang.exception.ExceptionUtils.getCause(ExceptionUtile.java:282)
- waiting to lock <0x00000000de112b90> (a java.lang.Object)
at org.apache.commons.lang.exception.ExceptionUtils.getThrowableList(ExceptionUtils.java:571)
at org.apache.commons.lang.exception.ExceptionUtils.getThrowables(ExceptionUtils.java:544)
at org.apache.commons.lang.exception.ExceptionUtils.getFullStackTrace(ExceptionUtils.java:870)
at ExceptionTest$Run.run(ExceptionTest.java:36)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"test-exception-1539" #1549 prio=5 os prio=0 tid=0x000000005c4c2800 nid=0xe328 runnable [0x000000007cbfe000]
java.lang.Thread.State: RUNNABLE
at java.lang.Class.getMethod0(Class.java: 3017)
at java.lang.Class.getMethod(Class.java:1784)
at org.apache.commons.lang.exception.ExceptionUtils.getCauseUsingMethodName(ExceptionUtils.java:386)
at org.apache.commons.lang.exception.ExceptionUtils.getCause(ExceptionUtils.java:318)
org.apache.commons.lang.exception.ExceptionUtils.getCause(ExceptionUtila.java:282)
- locked <0x00000000dc112b90> (a java.lang.Object)
at org.apache.commons.lang.exception.ExceptionUtils.getThrowableList(ExceptionUtils.java:571)
at org.apache.commons .lang.exception.ExceptionUtils.getThrowables(ExceptionUtils.java: 544)
at org.apache.commons.lang.exception.ExceptionUtils.getFullStackTrace(ExceptionUtile.java:870)
at ExceptionTestSRun.run(ExceptionTest.java:36)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTaak.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
所以在高并发的情况下,遇到异常之后,大量线程中只有其中一个线程能得到执行,其他线程处于BLOCKED状态,而且这只是一个辅助功能,在一些对主业务没有影响的异常大量发生的情况下,会导致应用性能的下降。
在ExceptionUtils的getCause()方法内会先锁住CAUSE_METHOD_NAMES_LOCK对象,synchronized(CAUSE_METHOD_NAMES_LOCK),CAUSE_METHOD_NAMES_LOCK是一个Object对象,获取都对象锁之后,在getCauseUsingMethodName方法里通过throwable.getClass().getMethod(methodName, null)反射的方式获取方法Method,由于JIT编译无法优化等一些原因,导致反射效率比较低,造成了getCause(throwable, CAUSE_METHOD_NAMES)耗时较多,从而在高并发情况下,会导致大量的线程阻塞。
public class ExceptionUtils {
// Lock object for CAUSE_METHOD_NAMES
private static final Object CAUSE_METHOD_NAMES_LOCK = new Object();
private static String[] CAUSE_METHOD_NAMES = {
"getCause", "getNextException", "getTargetException", "getException",
"getSourceException", "getRootCause","getCausedByException","getThrowable",
"getNested", "getLinkedException", "getNestedException", "getLinkedCause",
};
public static String getFullStackTrace(Throwable throwable) {
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw, true);
Throwable[] ts = getThrowables(throwable);
for (int i = 0; i < ts.length; i++) {
ts[i].printStackTrace(pw);
if (isNestedThrowable(ts[i])) {
break;
}
}
return sw.getBuffer().toString();
}
public static Throwable[] getThrowables(Throwable throwable) {
List list = getThrowableList(throwable);
return (Throwable[]) list.toArray(new Throwable[list.size()]);
}
public static List getThrowableList(Throwable throwable) {
List list = new ArrayList();
while (throwable != null && list.contains(throwable) == false) {
list.add(throwable);
throwable = ExceptionUtils.getCause(throwable);
}
return list;
}
public static Throwable getCause(Throwable throwable) {
synchronized(CAUSE_METHOD_NAMES_LOCK) {
return getCause(throwable, CAUSE_METHOD_NAMES);
}
}
public static Throwable getCause(Throwable throwable, String[] methodNames) {
if (throwable == null) {
return null;
}
Throwable cause = getCauseUsingWellKnownTypes(throwable);
if (cause == null) {
if (methodNames == null) {
synchronized(CAUSE_METHOD_NAMES_LOCK) {
methodNames = CAUSE_METHOD_NAMES;
}
}
for (int i = 0; i < methodNames.length; i++) {
String methodName = methodNames[i];
if (methodName != null) {
cause = getCauseUsingMethodName(throwable, methodName);
if (cause != null) {
break;
}
}
}
if (cause == null) {
cause = getCauseUsingFieldName(throwable, "detail");
}
}
return cause;
}
private static Throwable getCauseUsingMethodName(Throwable throwable, String methodName) {
Method method = null;
try {
method = throwable.getClass().getMethod(methodName, null);
} catch (NoSuchMethodException ignored) {
// exception ignored
} catch (SecurityException ignored) {
// exception ignored
}
if (method != null && Throwable.class.isAssignableFrom(method.getReturnType())) {
try {
return (Throwable) method.invoke(throwable, ArrayUtils.EMPTY_OBJECT_ARRAY);
} catch (IllegalAccessException ignored) {
// exception ignored
} catch (IllegalArgumentException ignored) {
// exception ignored
} catch (InvocationTargetException ignored) {
// exception ignored
}
}
return null;
}
}
解决方法:该问题在commons-lang3中已经不存在,把依赖升级为3,并且获取异常堆栈信息改为使用org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e)即可。
maven
<!--
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
-->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.11</version>
</dependency>