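Recently our application backend kept hanging. After an alert fired I logged on to the server and had a look: CPU usage spiked from time to time, and a test request with curl showed that the service was no longer accepting requests. So I decided to take a thread dump and see what the threads were doing.
First: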
kill -3 <pid>
Tomcat then prints the thread dump to logs/catalina.out under its installation directory.
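As a side note, roughly the same information can be collected from inside the JVM through the standard java.lang.management API, which may help when sending a signal to the process is inconvenient. The following is only a minimal sketch: the class name ThreadDumpSketch and its methods are made up for illustration and are not part of Tomcat or of the setup described here. It also assumes a JVM that supports monitoring of ownable synchronizers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only.
class ThreadDumpSketch {

    // Builds a textual dump (name, state, full stack) of all live threads in this JVM.
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // IDs of threads that form a lock cycle, or an empty array if there is none.
    static long[] findDeadlocked() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("deadlocked threads: " + findDeadlocked().length);
    }
}
```

Keep in mind that findDeadlockedThreads only reports genuine lock cycles. A hang in which the lock owner is still RUNNABLE would not be flagged by it, so reading the stacks by hand remains necessary.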
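Looking through it, sure enough, the service was effectively deadlocked. The symptom was that a large number of Tomcat HTTP threads were all WAITING at the same place, as shown below: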
"http-nio-9086-exec-163" daemon prio=10 tid=0x00007f8ab00ab800 nid=0x18e7 waiting on condition [0x00007f8a48644000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d3643028> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
at com.fasterxml.jackson.databind.util.LRUMap.get(LRUMap.java:56)
at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:707)
at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)
at com.fasterxml.jackson.databind.type.TypeFactory.constructType(TypeFactory.java:367)
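The thing to note is the get method of the LRUMap class: it acquires a read lock here. So I went on to look for the thread holding the write lock, and found it: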
"http-nio-9086-exec-31" daemon prio=10 tid=0x00007f8ab0026000 nid=0x795c runnable [0x00007f8a5b471000]
java.lang.Thread.State: RUNNABLE
at java.util.LinkedHashMap.transfer(LinkedHashMap.java:253)
at java.util.HashMap.resize(HashMap.java:581)
at java.util.HashMap.addEntry(HashMap.java:879)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
at java.util.HashMap.put(HashMap.java:505)
at com.fasterxml.jackson.databind.util.LRUMap.put(LRUMap.java:68)
at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:738)
at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)
at com.fasterxml.jackson.databind.type.TypeFactory.constructType(TypeFactory.java:358)
at com.fasterxml.jackson.databind.cfg.MapperConfig.constructType(MapperConfig.java:268)
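The moment I saw a Map in the stack I had a vague feeling that something was wrong, because the HashMap family has plenty of well-known problems under multithreading. The thing to note this time is the put method of the LRUMap class.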
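The two dumps fit together if the put thread never releases the write lock: every later read lock request then parks, which is the WAITING (parking) state seen in the first dump. Below is a minimal sketch of just that locking behaviour. It assumes, as concluded above, that the thread inside put owns the write lock. The class and method names are hypothetical, and a latch stands in for the put that never finishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class for illustration only.
class ReadBlockedByWriteSketch {

    // Returns the state observed for a reader thread while another thread holds the write lock.
    static Thread.State readerStateWhileWriteLocked() throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch writeHeld = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writeHeld.countDown();
                release.await(); // stands in for a put() that does not return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "writer");

        Thread reader = new Thread(() -> {
            lock.readLock().lock(); // parks here until the writer unlocks
            lock.readLock().unlock();
        }, "reader");

        writer.start();
        writeHeld.await(); // make sure the write lock is taken first
        reader.start();

        // Poll (for at most 5 seconds) until the reader has parked.
        Thread.State state = reader.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = reader.getState();
        }

        release.countDown(); // let the writer finish so both threads can exit
        writer.join();
        reader.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader state: " + readerStateWhileWriteLocked());
    }
}
```

In the real incident there is no latch to release, so the readers stay parked until the process is restarted.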
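Reading the source (version 2.4.1), I found that LRUMap extends LinkedHashMap, so I took a close look at its put and get methods. It turns out that get modifies internal state (recordAccess). LRUMap does override get and adds locking, but the lock it takes is a read lock. A read lock is shared, so several threads can be inside get at the same time, each rewriting the internal linked list, and the interleaving of get and put calls is what corrupts the map's state.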
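To make the point about get concrete, here is a small sketch of an access-ordered LRU cache in which get takes the same exclusive lock as put. It is not Jackson's LRUMap and not the fix that went into Jackson. SafeLruSketch and keysEldestFirst are hypothetical names, and plain synchronized is used only because it is the simplest exclusive lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration; not Jackson's LRUMap.
class SafeLruSketch<K, V> {
    private final int maxEntries;
    private final LinkedHashMap<K, V> map;

    SafeLruSketch(int maxEntries) {
        this.maxEntries = maxEntries;
        // Third argument true = access order: get() moves the entry to the tail.
        this.map = new LinkedHashMap<K, V>(16, 0.8f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > SafeLruSketch.this.maxEntries;
            }
        };
    }

    // get() rewrites the linked list, so it needs the same exclusive lock as put().
    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized V put(K key, V value) {
        return map.put(key, value);
    }

    // Snapshot of the keys, least recently used first.
    synchronized List<K> keysEldestFirst() {
        return new ArrayList<K>(map.keySet());
    }

    public static void main(String[] args) {
        SafeLruSketch<String, Integer> cache = new SafeLruSketch<String, Integer>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        System.out.println(cache.keysEldestFirst()); // [a, b]
        cache.get("a");                              // a read that reorders the map
        System.out.println(cache.keysEldestFirst()); // [b, a]
        cache.put("c", 3);                           // evicts "b", the eldest entry
        System.out.println(cache.keysEldestFirst()); // [a, c]
    }
}
```

The second line of output is the key observation: a plain get changed the iteration order, so for this kind of map a read is really a write and a shared read lock does not protect it.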
Then I went through the commit history of LRUMap on Jackson's GitHub and found a related issue: link
Fix: upgrade to a newer version.