记录一次线上死锁的定位分析

线上死锁定位分析

现在 IT 界普遍高并发, 分布式环境, 难免遇到死锁, 死循环等问题, 平时开发我们一般都可以停掉服务, 然后打 trace ---> 编译 ---> 修改源代码 ---> 重新编译 ---> ..... ---> 解决问题, 或者通过集成开发环境(如: IDEA, Eclipse)提供的 Debug 功能打断点 ---> watch variable ---> 进入/跳出方法 ---> .....---> 解决问题 , 但是, 当遇到生产环境却往往不知所措, 因为生产环境是不能随便让你停止服务的, 并且生产环境是不会开放 debug 端口的(debug 需要开放调试<<-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005>>), 那么, 线上死锁如何定位? 怎么---> 解决问题 ?

实际上, Java 已经为我们提供了一些好用的定位分析工具, 当然, 这是一句废话, 你肯定知道, 你还知道哪些命令在 JDK 的安装目录下的 bin 下和 java 以及 javac 在一块躺着. 是的, 的确如此, 你可真是一个小机灵鬼哦(奈何从来没用过是不是? 今天帅帅给你一个应用场景, 哈哈…)

(昨晚睡觉前提了点代码到 jfoa 怎么也没想到导致了线上死锁…)

file

1. jps

1.1 jps 虚拟机进程状况工具

要定位分析 java 死锁, 死循环你得先知道发生问题的 LVMID(Logical Volume Manager ID), 就好比你要看 Windows 上的一个应用的详细信息, 你得先找到这个应用运行在哪个进程上一样.
或者你可以将 LVMID 理解为 PID, 通常 LVMID 和 进程 PID 是一致的, 但是并不非得一样.

因此, 要想知道 LVMID, 可以通过查询 PID, 但是其一这是不保险的, 其二, 当同时启动了多个虚拟机进程时, 无法通过名称区分, 就必须得使用 jps 了(任务管理器里面有 n 个 java.exe).

JackLi:~ dreamli$ jps
1461 
1847 GradleDaemon
1210 
1851 RunnerApplication
1852 Jps
JackLi:~ dreamli$ 

上面是一个简单的使用, 第一列的数字就是 LVMID, 后面跟的是启动的主类. 帅帅使用的 gradle 构建工具, 所以有一个 GradleDaemon, 然后jfoa 项目的启动入口在 RunnerApplication

1.2 jps 参数

jps 有以下几个参数

  • -q 只输出 LVMID
JackLi:~ dreamli$ jps -q
1875
1461
1847
1210
1851
  • -m 输出传给 main 函数的参数
JackLi:~ dreamli$ jps -m
1876 Jps -m
1461 
1847 GradleDaemon 6.2.2
1210 
1851 RunnerApplication
  • -l 输出主类的全名, 如果进程执行的是 jar, 则会输出 jar 路径
JackLi:~ dreamli$ jps -l
1461 
1877 sun.tools.jps.Jps
1847 org.gradle.launcher.daemon.bootstrap.GradleDaemon
1210 
1851 club.javafamily.runner.RunnerApplication
  • -v 输出虚拟机进程启动时 JVM 参数
JackLi:~ dreamli$ jps -v
1461  -Dosgi.requiredJavaVersion=1.8 -Dosgi.instance.area.default=@user.home/eclipse-workspace -XX:+UseG1GC -XX:+UseStringDeduplication -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -Dosgi.requiredJavaVersion=1.8 -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -javaagent:/Applications/Eclipse.app/Contents/Eclipse/lombok.jar
1878 Jps -Dapplication.home=/Users/dreamli/Software/java/zulu-jfx-252/zulu-8.jdk/Contents/Home -Xms8m
1847 GradleDaemon -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
1210  -Xms128m -Xmx2048m -XX:ReservedCodeCacheSize=240m -XX:+UseCompressedOops -Dfile.encoding=UTF-8 -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -Xverify:none -XX:ErrorFile=/Users/dreamli/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/dreamli/java_error_in_idea.hprof -javaagent:/Applications/IntelliJ IDEA.app/Contents/bin/jetbrains-agent.jar -Djb.vmOptionsFile=/Users/dreamli/Library/Preferences/IntelliJIdea2019.3/idea.vmoptions -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.executable=idea -Didea.paths.selector=IntelliJIdea2019.3
1851 RunnerApplication -Djava.rmi.server.hostname=127.0.0.1 -Dspring.profiles.active=dev -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/dreamli/Workspace/MyRepository/javafamily/jfoa/runner/build/server-temp -Duser.country=CN -Duser.language=zh -Duser.variant

2 jstack

2.1 jstack: java 堆栈跟踪工具

jstack 命令用于生成当前时刻线程快照(一般称为 threaddump 文件或者 javacore 文件), 线程快照就是当前虚拟机内每一条线程正在执行的方法堆栈集合, 生成线程快照的目的是定位线程出现长时间停顿的原因, 如: 线程死锁, 死循环, 请求外部资源的长时间等待等都是导致线程长时间停顿的常见原因.
基本命令格式如下:
jstack [options] lvmid

JackLi:~ dreamli$ jps -m
1461 
1847 GradleDaemon 6.2.2
1210 
1851 RunnerApplication
1980 Jps -m
JackLi:~ dreamli$ jstack 1851
2020-11-26 23:59:27
Full thread dump OpenJDK 64-Bit Server VM (25.252-b14 mixed mode):

"Attach Listener" #62 daemon prio=9 os_prio=31 tid=0x00007faece823800 nid=0xac03 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"MessageBroker-2" #61 prio=5 os_prio=31 tid=0x00007faed3018800 nid=0x15403 waiting on condition [0x000070000ab6c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000773124fa8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

2.2 jstack 参数

jstack 有一下几个参数:

  • -F: 当正常输出的请求不被响应时, 强制输出线程堆栈
  • -l: 除堆栈外显示关于锁的信息
  • -m: 调用本地方法的话, 可以显示 C/C++ 的堆栈

3. 死锁定位分析

昨晚代码撸到半夜, 提交了很多代码, 没想到…死锁------留下了心疼自己的泪水…

3.1 jps 定位 lvmid

第一步不用说了, 先找到 lvmid

3.2 jstack 获取线程堆栈

3.1 提点 Linux 小知识

  • 将线程堆栈写入本地文件
jstack -m 1851 >> ....您的路径../jfoa/jstack-2020-11-26.log
  • 获取远程 Linux 上的堆栈文件
scp root@javafamily.club:/root/jfoa-logs/jstack-2020-11-26.log ./jstack.txt

3.2 线程堆栈文件分析

这里贴出来一些核心的地方


"http-nio-80-exec-6" #58 daemon prio=5 os_prio=0 tid=0x00007efc15ccb800 nid=0x40 waiting on condition [0x00007efbdd490000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e051e308> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None



"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
	at club.javafamily.runner.web.portal.service.SubjectVoteService.changeVote(SubjectVoteService.java:93)
	at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:749)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$848/1976926828.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000000dd01f1a0> (a java.util.concurrent.ThreadPoolExecutor$Worker)



"https-jsse-nio-443-exec-5" #45 daemon prio=5 os_prio=0 tid=0x00007efc15a17800 nid=0x33 waiting on condition [0x00007efbde19a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
	at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	...
	   Locked ownable synchronizers:
	- <0x00000000e0520d38> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"https-jsse-nio-443-exec-10" #50 daemon prio=5 os_prio=0 tid=0x00007efc15c2d800 nid=0x38 waiting on condition [0x00007efbddc94000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at club.javafamily.runner.web.portal.service.SubjectVoteService.getSubjectVoteDto(SubjectVoteService.java:61)
	at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)

   Locked ownable synchronizers:
	- <0x00000000e051fc38> (a java.util.concurrent.ThreadPoolExecutor$Worker)

这里主要分析 Locked ownable synchronizers 以及 "jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]

  • Locked ownable synchronizers 显示了锁信息

  • "jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000] 如下

"线程名" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [等待锁状态, 0(0x0000000000000000) 代表没锁]
   java.lang.Thread.State: 线程状态

然后结合方法调用堆栈分析死锁, 死循环的原因.

4 问题重现

帅帅的问题最终定位到原因是 ReentrantReadWriteLock 在读锁中获取写锁.(在写锁中获取读锁不会死锁)

  • Service
package org.jack.thread01;

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
* @Description: 在读锁中获取写锁
* @Warning: 死锁
* @Author: Jack Li
* @Package: thread-01 - ThreadService
* @Date: Nov 26, 2020 10:57:49 PM
* @Version: 1.0.0
* @TimeComplexity: Required[O(n)] ---- Current[O(n^2)]
* @ExecuteResult: Success!
* @Status: Accepted
 */
public class ThreadService {

	public void method1() {
		
		System.out.println("Method 1 准备获取读锁");
		lock.readLock().lock();
		
		System.out.println("Method 1 得到读锁");

		try {
			Thread.sleep(2000);
			
			System.out.println("Method 1 准备获取写锁");
			
			method3();
			
			System.out.println("Method 1 获得写锁并释放");
			
			
		} 
		catch(Exception e) {
			e.printStackTrace();
		}
		finally {
			lock.readLock().unlock();
			
			System.out.println("Method 1 释放读锁");
			
		}
	}

	public void method2() {
		System.out.println("Method 2 准备获取写锁");
		lock.writeLock().lock();
		
		System.out.println("Method 2 获得写锁");

		try {
			Thread.sleep(1000);
		} 
		catch(Exception e) {
			e.printStackTrace();
		}
		finally {
			lock.writeLock().unlock();
			
			System.out.println("Method 2 释放写锁");
		}
	}
	
	public void method3() {
		lock.writeLock().lock();

		try {
			Thread.sleep(500);
		} 
		catch(Exception e) {
			e.printStackTrace();
		}
		finally {
			lock.writeLock().unlock();
		}
	}

	private final ReadWriteLock lock = new ReentrantReadWriteLock();
}
  • 程序入口
public class DealLockTest {
	public static void main(String[] args) {
		
		new Thread(() -> {
			threadService.method1();
		}) .start();
		
		new Thread(() -> {
			threadService.method2();
		}) .start();
		
	}
	
	private static ThreadService threadService = new ThreadService();
//	private static ThreadService2 threadService = new ThreadService2();
}
  • 线程堆栈
"Thread-1" #11 prio=5 os_prio=31 tid=0x00007fbda01c2000 nid=0xa903 waiting on condition [0x0000700003e8c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.jack.thread01.ThreadService.method2(ThreadService.java:50)
	at org.jack.thread01.DealLockTest.lambda$1(DealLockTest.java:11)
	at org.jack.thread01.DealLockTest$$Lambda$2/295530567.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

"Thread-0" #10 prio=5 os_prio=31 tid=0x00007fbda01be800 nid=0x5503 waiting on condition [0x0000700003d89000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.jack.thread01.ThreadService.method3(ThreadService.java:68)
	at org.jack.thread01.ThreadService.method1(ThreadService.java:31)
	at org.jack.thread01.DealLockTest.lambda$0(DealLockTest.java:7)
	at org.jack.thread01.DealLockTest$$Lambda$1/250421012.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

Thread-0 先获取了读锁, 然后在读锁还没释放时又去获取写锁, 获取写锁时又要等待所有的读锁释放.从而导致了死锁.

file

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值