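Problem description: in one game region, nobody could log in. The front end stayed on the loading screen, and the back end reported no errors.
Collecting evidence from the running process: first, capture the state of the server process. Find the PID of this region's process in Windows Task Manager, open jvisualvm, select the application with that PID, click the "Threads" tab, and then click "Thread Dump". This shows the working state of every thread inside the application.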
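When a GUI tool such as jvisualvm is not available, similar information can be gathered from inside the JVM through the standard `java.lang.management` API. The following is only a minimal sketch: it dumps the threads of the JVM it runs in, so it assumes the code can be triggered inside the server process (for example from an admin command). The class name `ThreadDumpSketch` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original project.
class ThreadDumpSketch {
    /** Builds a plain-text dump of every live thread in the current JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only if this JVM can report them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                // The lock this thread is blocked on; the owner is null if the JVM knows none.
                sb.append("    waiting for ").append(info.getLockName())
                  .append(" owned by ").append(info.getLockOwnerName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

The output is not identical to a jvisualvm or jstack dump, but it carries the same key facts: thread state, the lock being waited for, and the stack frames.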
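Then the analysis begins.
As shown in the figure above, several login threads are stuck at run.MemcachedRun.add(MemcachedRun.java:97). They are all waiting for the lock <0x000000078004a720>, so the next step is to find out which thread holds it.
As shown in the figure above, the owner of <0x000000078004a720> is itself waiting for another lock, <0x000000078004a6d8>, to be released. However, no thread in the dump is shown holding that second lock, which suggests it is a lock used inside the JDK rather than one taken explicitly by our own code.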
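The manual search described above (who waits for a lock address, and who holds it) can be sketched as a small text scan over a saved thread dump. This is an illustrative sketch only: `LockOwnerFinder` is a hypothetical name, and it assumes the HotSpot dump markers `- locked`, `- waiting to lock`, `- parking to wait for`, and the `- <address>` entries under "Locked ownable synchronizers". The sample text in `main` is a shortened excerpt of the test dump shown later in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading a saved thread dump.
class LockOwnerFinder {
    /** Names of threads whose section marks the lock address as held. */
    static List<String> holders(String dump, String address) {
        return scan(dump, address, true);
    }

    /** Names of threads whose section marks the lock address as awaited. */
    static List<String> waiters(String dump, String address) {
        return scan(dump, address, false);
    }

    private static List<String> scan(String dump, String address, boolean wantHolders) {
        List<String> result = new ArrayList<>();
        String current = null;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A quoted name at the start of a line opens a new thread section.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                continue;
            }
            if (current == null || !line.contains("<" + address + ">")) {
                continue;
            }
            boolean holds = line.startsWith("- locked") || line.startsWith("- <");
            boolean waits = line.startsWith("- waiting to lock")
                    || line.startsWith("- parking to wait for");
            if ((wantHolders ? holds : waits) && !result.contains(current)) {
                result.add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "\"main\" prio=6 tid=0x0000000003303800 nid=0x3258 waiting on condition",
            "    - parking to wait for <0x00000007d5af4d30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)",
            "    Locked ownable synchronizers:",
            "    - None",
            "\"Thread-1\" prio=6 tid=0x000000000d9a6800 nid=0x1ed4 waiting on condition",
            "    - parking to wait for <0x00000007d5af4d30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)",
            "    Locked ownable synchronizers:",
            "    - None");
        System.out.println("waiters=" + waiters(dump, "0x00000007d5af4d30"));
        System.out.println("holders=" + holders(dump, "0x00000007d5af4d30"));
    }
}
```

An empty holder list next to a non-empty waiter list is the same pattern found by hand: threads wait for a lock that no listed thread owns.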
So we go to run.MemcachedRun.delSynObject(MemcachedRun.java:341) and look at the code on that line:
SynObject syn=Link.take();
Link is defined as: public static BlockingQueue<SynObject> Link = new LinkedBlockingQueue<SynObject>();
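One property of this line is worth keeping in mind: `BlockingQueue.take()` waits indefinitely while the queue is empty, so a thread that calls it while holding another lock keeps that lock for the whole wait. The excerpt here does not show whether this contributed to the production hang. As a hedged sketch, a bounded wait can be written with the standard `poll(timeout, unit)` method; `BoundedTakeSketch` and `takeOrNull` are hypothetical names, not code from the project.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing a bounded alternative to take().
class BoundedTakeSketch {
    /** Waits at most timeoutMillis for an element; returns null if none arrives. */
    static <T> T takeOrNull(BlockingQueue<T> queue, long timeoutMillis)
            throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Empty queue: the call gives up after the timeout instead of blocking forever.
        System.out.println(takeOrNull(queue, 100));
        queue.offer("job");
        // Non-empty queue: the element is returned immediately.
        System.out.println(takeOrNull(queue, 100));
    }
}
```

With a timeout, the caller gets a chance to release other locks, log the situation, or retry, rather than waiting without limit.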
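Because Link is a LinkedBlockingQueue, the lock in question is one that the JDK class manages internally. To verify this, I wrote the following test code: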
package utils.mytest;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import MyObj.SynObject;

public class cerun extends Thread {
    // Shared queue, defined the same way as in the production code
    public static BlockingQueue<SynObject> Link = new LinkedBlockingQueue<SynObject>();

    public static void main(String[] strs) {
        cerun cerun = new cerun();
        cerun.start();
        for (int i = 0; i < 10; i++) {
            try {
                System.out.println("Starting iteration " + i + ", queue size=" + Link.size());
                if (Link.size() > 0) {
                    // Restart the thread. Thread.stop() is deprecated and unsafe;
                    // it is called here on purpose to reproduce the hang.
                    cerun.stop();
                    Thread.sleep(1000);
                    System.out.println("Restarting thread");
                    cerun = new cerun();
                    cerun.start();
                    // Delete data
                    cerun.delSynObject(1);
                } else {
                    System.out.println("No data");
                }
                Thread.sleep(2);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public cerun() {
    }

    // Producer loop: keeps filling the queue until it holds more than 2,000,000 elements
    public void run() {
        while (true) {
            try {
                for (int i = 0; i < 10000; i++) {
                    SynObject obj = new SynObject();
                    obj.setId(1);
                    Link.offer(obj);
                }
                if (Link.size() > 2000000) {
                    System.out.println("Thread exiting on its own");
                    break;
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
    }

    // Removes every queued object with the given id
    public void delSynObject(int id) {
        for (SynObject object : Link) {
            if (object.getId() == id) {
                Link.remove(object);
            }
        }
    }
}
Running the code above produces the following output:
Starting iteration 0, queue size=0
No data
Starting iteration 1, queue size=2991
Restarting thread
After that there is no further output, which indicates that the program is deadlocked. The relevant part of the JVM thread dump is as follows:
"main" prio=6 tid=0x0000000003303800 nid=0x3258 waiting on condition [0x00000000032ff000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007d5af4d30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:223)
    at java.util.concurrent.LinkedBlockingQueue$Itr.<init>(LinkedBlockingQueue.java:792)
    at java.util.concurrent.LinkedBlockingQueue.iterator(LinkedBlockingQueue.java:778)
    at utils.mytest.cerun.delSynObject(cerun.java:55)
    at utils.mytest.cerun.main(cerun.java:24)

   Locked ownable synchronizers:
    - None

"Thread-1" prio=6 tid=0x000000000d9a6800 nid=0x1ed4 waiting on condition [0x000000000df3f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007d5af4d30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:417)
    at utils.mytest.cerun.run(cerun.java:43)

   Locked ownable synchronizers:
    - None
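Both the main thread and the newly created cerun thread (Thread-1) are waiting for the lock 0x00000007d5af4d30. So where did 0x00000007d5af4d30 go? No thread in the dump is shown as holding it.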
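One plausible reading of this dump (an inference, not something the dump states directly) is that the first cerun thread was killed by stop() while it held the queue's internal lock inside offer(), and a terminated thread no longer appears in a thread dump. If that is the cause, a cooperative shutdown avoids it, because the worker only exits between queue operations. The sketch below assumes the worker can poll its interrupt flag; `InterruptInsteadOfStop`, `startProducer`, and `stopProducer` are hypothetical names, and an `Integer` queue stands in for the `SynObject` queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical rewrite of the test using interrupt() instead of stop().
class InterruptInsteadOfStop {
    static final BlockingQueue<Integer> QUEUE = new LinkedBlockingQueue<>();

    /** Starts a producer that checks its interrupt flag between offers. */
    static Thread startProducer() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted() && QUEUE.size() < 100_000) {
                QUEUE.offer(1);
            }
        });
        t.start();
        return t;
    }

    /** Requests shutdown and waits for the producer; returns true if it exited in time. */
    static boolean stopProducer(Thread t, long timeoutMillis) throws InterruptedException {
        t.interrupt();
        t.join(timeoutMillis);
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = startProducer();
        Thread.sleep(50);
        boolean stopped = stopProducer(producer, 5000);
        // The producer returned from run() normally, so each offer() call it made
        // released the queue's internal lock before the thread ended.
        QUEUE.removeIf(v -> v == 1);
        System.out.println("stopped=" + stopped + ", remaining=" + QUEUE.size());
    }
}
```

The design choice is that the worker decides when to exit, so it never ends in the middle of a lock()/unlock() pair inside the queue, and the cleanup that follows can always acquire the lock.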