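From the official Arthas documentation:
Introduction: Arthas is an online monitoring and diagnostics tool. It gives a global, real-time view of an application's load, memory, GC, and thread status, and it lets you diagnose business problems without modifying application code: inspecting method arguments, return values, and exceptions, measuring method execution time, viewing class-loading information, and more. This greatly speeds up troubleshooting in production.
Background: The local development environment usually cannot reach the production environment, so when a problem occurs in production you cannot debug it remotely from an IDE. Worse, debugging in production is unacceptable, because a debugger suspends all threads and the service stops responding.
Developers can try to reproduce a production problem in a test or staging environment. However, some problems are hard to reproduce in a different environment, and some even disappear after a restart.
If you consider adding logging to the code to track the problem down, the change has to go through testing, then staging, then production. This is inefficient, and worse, the problem may still go unsolved because, as mentioned above, it may not reproduce once the JVM restarts.
Arthas is designed to solve these problems. Developers can troubleshoot production issues online, with no JVM restart and no code changes. As an observer, Arthas never suspends running threads.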
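What can Arthas do for you?
- Which jar was this class loaded from? Why are all kinds of class-related exceptions being thrown?
- Why isn't my modified code being executed? Did I forget to commit? Am I on the wrong branch?
- A problem cannot be debugged online. Is adding logs and redeploying the only option?
- A particular user's data is processed incorrectly in production, but you cannot debug online and cannot reproduce it offline!
- Is there a global view of the system's health?
- Is there a way to monitor the JVM's real-time status?
- How can I quickly find the application's hotspots and generate a flame graph?
- How can I find instances of a class directly inside the JVM?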
Installing Arthas
1. Quick online installation
[root@centos142 arthas]# curl -O https://arthas.aliyun.com/arthas-boot.jar
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 138k 100 138k 0 0 95326 0 0:00:01 0:00:01 --:--:-- 95308
[root@centos142 arthas]# ls
arthas-boot.jar
2. Manual installation via rpm/deb
Download the rpm/deb package from the releases page: https://github.com/alibaba/arthas/releases
sudo dpkg -i arthas*.deb
# or
sudo rpm -i arthas*.rpm
Quick start
1. Start math-game
math-game is a simple program that generates a random number every second, factors it into primes, and prints the result. Source code:
package demo;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class MathGame {
    private static Random random = new Random();

    private int illegalArgumentCount = 0;

    public static void main(String[] args) throws InterruptedException {
        MathGame game = new MathGame();
        while (true) {
            game.run();
            TimeUnit.SECONDS.sleep(1);
        }
    }

    public void run() throws InterruptedException {
        try {
            int number = random.nextInt() / 10000;
            List<Integer> primeFactors = primeFactors(number);
            print(number, primeFactors);
        } catch (Exception e) {
            System.out.println(String.format("illegalArgumentCount:%3d, ", illegalArgumentCount) + e.getMessage());
        }
    }

    public static void print(int number, List<Integer> primeFactors) {
        StringBuffer sb = new StringBuffer(number + "=");
        for (int factor : primeFactors) {
            sb.append(factor).append('*');
        }
        if (sb.charAt(sb.length() - 1) == '*') {
            sb.deleteCharAt(sb.length() - 1);
        }
        System.out.println(sb);
    }

    public List<Integer> primeFactors(int number) {
        if (number < 2) {
            illegalArgumentCount++;
            throw new IllegalArgumentException("number is: " + number + ", need >= 2");
        }
        List<Integer> result = new ArrayList<Integer>();
        int i = 2;
        while (i <= number) {
            if (number % i == 0) {
                result.add(i);
                number = number / i;
                i = 2;
            } else {
                i++;
            }
        }
        return result;
    }
}
Alternatively, download and start it directly:
curl -O https://arthas.aliyun.com/math-game.jar
java -jar math-game.jar
2. Start arthas
Run the following on the command line (start it as the same user as the target process, otherwise attaching may fail):
java -jar arthas-boot.jar
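- The user running arthas must have the same permissions as the target process. For example, to run it as the admin user: sudo su admin && java -jar arthas-boot.jar or sudo -u admin -EH java -jar arthas-boot.jar.
- If arthas cannot attach to the target process, check the logs under ~/logs/arthas/.
- If the download is slow, use the Aliyun mirror: java -jar arthas-boot.jar --repo-mirror aliyun --use-http
Run java -jar arthas-boot.jar -h to print more options.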
Next, select the Java process of the target application; in my case it is the default, 1.
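3. View the dashboard
Type dashboard and press Enter to show information about the current process; press Ctrl+C to stop it. The dashboard shows:
1. Which threads the process has (NAME), which threads are consuming CPU (%CPU), and each thread's state (STATE).
2. Usage of the heap, the young generation, the old generation, and so on.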
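The per-region heap figures the dashboard shows (eden, survivor, old gen) are also exposed by the JDK's standard java.lang.management API. The sketch below is only an illustration of where such numbers can come from, not how Arthas itself is implemented; it assumes a HotSpot JVM, and the class name HeapPools is made up. Pool names depend on the collector, for example PS Eden Space under Parallel GC or G1 Eden Space under G1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    private static final long MIB = 1024L * 1024L;

    /** Returns one line per heap memory pool (eden, survivor, old gen, ...). */
    public static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and Code Cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() returns -1 when the maximum is undefined
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MIB) + " MiB";
            lines.add(pool.getName() + ": used " + (usage.getUsed() / MIB) + " MiB, max " + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```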
4. The thread command
(1) Find the thread holding a lock that blocks other threads (useful for deadlocks)
thread -b
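To try thread -b you need a process with real lock contention. The hypothetical program below (the class name and the --keep-alive argument are made up for illustration) makes two threads take two synchronized locks in opposite order, then uses the standard ThreadMXBean.findDeadlockedThreads() to confirm the JVM sees the deadlock. It deliberately uses synchronized: according to the Arthas docs at the time of writing, thread -b only finds threads blocked by synchronized, not by java.util.concurrent locks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    /** Takes `first`, waits until the other thread holds its first lock, then tries `second`. */
    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                System.out.println("unreachable: the two threads are deadlocked");
            }
        }
    }

    /** Starts two daemon threads that lock A/B in opposite order; returns how many deadlocked threads the JVM reports. */
    public static int startDeadlockAndCount() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(LOCK_A, LOCK_B, bothHoldFirst), "deadlock-1");
        Thread t2 = new Thread(() -> lockInOrder(LOCK_B, LOCK_A, bothHoldFirst), "deadlock-2");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) { // poll for up to about 5 s
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(100);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + startDeadlockAndCount());
        // Hypothetical flag: keep the process alive so arthas can attach and run `thread -b`.
        if (args.length > 0 && args[0].equals("--keep-alive")) {
            Thread.currentThread().join(); // a thread joining itself blocks forever
        }
    }
}
```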
(2) Help
[arthas@2110]$ thread -h
USAGE:
thread [--all] [-h] [-b] [--lockedMonitors] [--lockedSynchronizers] [-i <value>] [--state <value>] [-n <value>] [id]
SUMMARY:
Display thread info, thread stack
EXAMPLES:
thread
thread 51
thread -n -1
thread -n 5
thread -b
thread -i 2000
thread --state BLOCKED
WIKI:
https://arthas.aliyun.com/doc/thread
OPTIONS:
--all Display all thread results instead of the first page
-h, --help this help
-b, --include-blocking-thread Find the thread who is holding a lock that blocks the most number of threads.
--lockedMonitors Find the thread info with lockedMonitors flag, default value is false.
--lockedSynchronizers Find the thread info with lockedSynchronizers flag, default value is false.
-i, --sample-interval <value> Specify the sampling interval (in ms) when calculating cpu usage.
--state <value> Display the thead filter by the state. NEW, RUNNABLE, TIMED_WAITING, WAITING, BLOCKED, TERMINATED is optional.
-n, --top-n-threads <value> The number of thread(s) to show, ordered by cpu utilization, -1 to show all.
<id> Show thread stack
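thread -n and the -i option report per-thread CPU usage over a sampling interval. A rough sketch of that idea with the standard ThreadMXBean: read each thread's CPU time, wait for the interval, read it again, and divide the difference by the elapsed wall-clock time. This is an assumption about the general approach, not Arthas's actual code; the class name ThreadCpuSampler and the 200 ms interval are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class ThreadCpuSampler {
    /**
     * Samples every live thread's CPU time twice, intervalMillis apart, and returns
     * threadId -> CPU usage as a percentage of one core over that interval.
     */
    public static Map<Long, Double> sample(long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Double> usage = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return usage; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // nanoseconds, or -1 if the thread is gone
        }
        long start = System.nanoTime();
        Thread.sleep(intervalMillis);
        long elapsedNanos = System.nanoTime() - start;

        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            if (before[i] < 0 || after < 0) {
                continue; // thread died during the interval
            }
            usage.put(ids[i], 100.0 * (after - before[i]) / elapsedNanos);
        }
        return usage;
    }

    public static void main(String[] args) throws InterruptedException {
        sample(200).forEach((id, pct) -> System.out.printf("thread %d: %.1f%%%n", id, pct));
    }
}
```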
5. Decompile a class with jad
The argument to jad is the fully qualified class name (package.ClassName):
jad demo.MathGame
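Besides the decompiled source, jad prints the ClassLoader and the Location (the jar or directory) the class was loaded from, which answers the "which jar was this class loaded from?" question in the list at the top. The same information can be read from plain Java through the class's ProtectionDomain. A minimal sketch, where the ClassLocator name is made up for illustration:

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassLocator {
    /** Describes which class loader loaded a class and where its bytes came from. */
    public static String describe(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader(); // null means the bootstrap loader
        CodeSource source = clazz.getProtectionDomain().getCodeSource(); // null for JDK core classes
        URL location = (source == null) ? null : source.getLocation();
        return clazz.getName()
                + " loaded by " + (loader == null ? "bootstrap" : loader.toString())
                + " from " + (location == null ? "(unknown)" : location.toString());
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ClassLocator.class));
    }
}
```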
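6. redefine: modify a class online
The redefine command hot-loads a modified .class file into the running JVM, so the code is updated without stopping the program. In production it is usually used together with jad.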
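A typical jad + redefine round trip, following the workflow in the Arthas docs, looks roughly like this: decompile, edit the file, compile it inside the target JVM with mc, then load the result with redefine:
jad --source-only demo.MathGame > /tmp/MathGame.java
mc /tmp/MathGame.java -d /tmp
redefine /tmp/demo/MathGame.class
Because redefine relies on the JVM's class redefinition, only method bodies can change; adding or removing fields or methods, or changing a signature, is rejected. The hypothetical patch below shows the kind of edit that qualifies: primeFactors keeps its signature but returns an empty list for number < 2 instead of throwing. It is written as a standalone class (PrimeFactorsPatch is an illustrative name) so it compiles and runs on its own; in practice you would change only the method body inside demo.MathGame.

```java
import java.util.ArrayList;
import java.util.List;

public class PrimeFactorsPatch {
    /**
     * Same signature as demo.MathGame#primeFactors; only the body differs:
     * for number < 2 it returns an empty list instead of throwing IllegalArgumentException.
     */
    public List<Integer> primeFactors(int number) {
        List<Integer> result = new ArrayList<Integer>();
        if (number < 2) {
            return result; // patched behaviour: no exception
        }
        int i = 2;
        while (i <= number) {
            if (number % i == 0) {
                result.add(i);
                number = number / i;
                i = 2;
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PrimeFactorsPatch patch = new PrimeFactorsPatch();
        System.out.println(patch.primeFactors(12)); // [2, 2, 3]
        System.out.println(patch.primeFactors(-7)); // []
    }
}
```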
7. watch
Use the watch command to view the return value of demo.MathGame#primeFactors:
watch demo.MathGame primeFactors returnObj
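Conceptually, watch records for each call of the target method its arguments (params), its return value (returnObj), and any thrown exception (throwExp). Arthas does this by instrumenting bytecode in the running JVM; the sketch below only illustrates the same idea with a java.lang.reflect.Proxy around a hypothetical Factorizer interface. It works only for calls through an interface and is not how Arthas is implemented; all names are illustrative.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WatchSketch {
    /** Hypothetical interface standing in for the watched method. */
    public interface Factorizer {
        List<Integer> primeFactors(int number);
    }

    /** One recorded call, mirroring watch's params / returnObj / throwExp. */
    public static final class Observation {
        public final Object[] params;
        public final Object returnObj;
        public final Throwable throwExp;

        Observation(Object[] params, Object returnObj, Throwable throwExp) {
            this.params = params;
            this.returnObj = returnObj;
            this.throwExp = throwExp;
        }

        @Override
        public String toString() {
            return "params=" + Arrays.toString(params) + ", returnObj=" + returnObj + ", throwExp=" + throwExp;
        }
    }

    /** Wraps target so that every call is recorded into sink before the result is passed on. */
    @SuppressWarnings("unchecked")
    public static <T> T watch(Class<T> iface, T target, List<Observation> sink) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (proxy, method, args) -> {
                    try {
                        Object result = method.invoke(target, args);
                        sink.add(new Observation(args, result, null));
                        return result;
                    } catch (InvocationTargetException e) {
                        sink.add(new Observation(args, null, e.getCause()));
                        throw e.getCause(); // rethrow the original exception to the caller
                    }
                });
    }

    public static void main(String[] args) {
        List<Observation> log = new ArrayList<>();
        Factorizer real = n -> {
            if (n < 2) {
                throw new IllegalArgumentException("number is: " + n + ", need >= 2");
            }
            List<Integer> factors = new ArrayList<>();
            int m = n;
            for (int i = 2; m > 1; ) {
                if (m % i == 0) {
                    factors.add(i);
                    m /= i;
                } else {
                    i++;
                }
            }
            return factors;
        };
        Factorizer watched = watch(Factorizer.class, real, log);
        watched.primeFactors(12);
        try {
            watched.primeFactors(-1);
        } catch (IllegalArgumentException expected) {
            // the proxy rethrows the original exception
        }
        log.forEach(System.out::println);
    }
}
```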
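8. The jvm command
Shows the configuration and runtime information of the Java process. For example, the GARBAGE-COLLECTORS section shows which collectors handle the young and old generations. This JDK 8 process uses the default Parallel collector: PS Scavenge collects the young generation (eden and survivor spaces), and PS MarkSweep collects the old generation. The two interact because PS Scavenge promotes objects that survive enough young-generation collections into the old generation, where PS MarkSweep then handles them. As the MEMORY-MANAGERS section shows, PS MarkSweep also manages the eden and survivor pools, because a full GC collects the whole heap. ("Scavenge" refers to copying live objects out of the young generation; "MarkSweep" refers to marking live objects and then sweeping and compacting the rest.)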
[arthas@2110]$ jvm
RUNTIME
--------------------------------------------------------------------------------------------------------------------------
MACHINE-NAME 2110@centos142
JVM-START-TIME 2023-01-22 16:47:52
MANAGEMENT-SPEC-VERSION 1.2
SPEC-NAME Java Virtual Machine Specification
SPEC-VENDOR Oracle Corporation
SPEC-VERSION 1.8
VM-NAME Java HotSpot(TM) 64-Bit Server VM
VM-VENDOR Oracle Corporation
VM-VERSION 25.333-b02
INPUT-ARGUMENTS -Xms200M
-Xmx200M
-XX:+PrintGC
CLASS-PATH JVMTest-1.0-SNAPSHOT.jar
BOOT-CLASS-PATH /usr/local/jdk1.8/jre/lib/resources.jar:/usr/local/jdk1.8/jre/lib/rt.jar:/usr/local/jdk1.8/jre/lib/sunrsasign.jar:/usr/local/jdk1.8/jre/lib/jsse.jar:/usr/local/jdk1.8/jre/lib/jce.jar:/usr/local/jdk1.8/jre/lib/charsets.jar:/usr/local/jdk1.8/jre/lib/jfr.jar:/usr/local/jdk1.8/jre/classes
LIBRARY-PATH /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
--------------------------------------------------------------------------------------------------------------------------
CLASS-LOADING
--------------------------------------------------------------------------------------------------------------------------
LOADED-CLASS-COUNT 3895
TOTAL-LOADED-CLASS-COUNT 3895
UNLOADED-CLASS-COUNT 0
IS-VERBOSE false
--------------------------------------------------------------------------------------------------------------------------
COMPILATION
--------------------------------------------------------------------------------------------------------------------------
NAME HotSpot 64-Bit Tiered Compilers
TOTAL-COMPILE-TIME 1258
[time (ms)]
--------------------------------------------------------------------------------------------------------------------------
GARBAGE-COLLECTORS
--------------------------------------------------------------------------------------------------------------------------
PS Scavenge name : PS Scavenge
[count/time (ms)] collectionCount : 5
collectionTime : 223
PS MarkSweep name : PS MarkSweep
[count/time (ms)] collectionCount : 1
collectionTime : 122
--------------------------------------------------------------------------------------------------------------------------
MEMORY-MANAGERS
--------------------------------------------------------------------------------------------------------------------------
CodeCacheManager Code Cache
Metaspace Manager Metaspace
Compressed Class Space
PS Scavenge PS Eden Space
PS Survivor Space
PS MarkSweep PS Eden Space
PS Survivor Space
PS Old Gen
--------------------------------------------------------------------------------------------------------------------------
MEMORY
--------------------------------------------------------------------------------------------------------------------------
HEAP-MEMORY-USAGE init : 209715200(200.0 MiB)
[memory in bytes] used : 46541392(44.4 MiB)
committed : 201326592(192.0 MiB)
max : 201326592(192.0 MiB)
NO-HEAP-MEMORY-USAGE init : 2555904(2.4 MiB)
[memory in bytes] used : 29093304(27.7 MiB)
committed : 30064640(28.7 MiB)
max : -1(-1 B)
PENDING-FINALIZE-COUNT 0
--------------------------------------------------------------------------------------------------------------------------
OPERATING-SYSTEM
--------------------------------------------------------------------------------------------------------------------------
OS Linux
ARCH amd64
PROCESSORS-COUNT 2
LOAD-AVERAGE 0.05
VERSION 3.10.0-1160.el7.x86_64
--------------------------------------------------------------------------------------------------------------------------
THREAD
--------------------------------------------------------------------------------------------------------------------------
COUNT 64
DAEMON-COUNT 13
PEAK-COUNT 64
STARTED-COUNT 67
DEADLOCK-COUNT 0
--------------------------------------------------------------------------------------------------------------------------
FILE-DESCRIPTOR
--------------------------------------------------------------------------------------------------------------------------
MAX-FILE-DESCRIPTOR-COUNT 4096
OPEN-FILE-DESCRIPTOR-COUNT 66
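The name, collectionCount, and collectionTime values in the GARBAGE-COLLECTORS section match what the JDK's standard GarbageCollectorMXBean exposes, and its getMemoryPoolNames() gives the pool mapping shown under MEMORY-MANAGERS. A minimal sketch reading them directly (the GcInfo class name is illustrative): on a JDK 8 process with the default Parallel collector it would list PS Scavenge and PS MarkSweep, while other collectors report other names, such as G1 Young Generation and G1 Old Generation under G1.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcInfo {
    /** One line per collector: name, collection count, total time, and the memory pools it manages. */
    public static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()      // -1 if undefined
                    + " time=" + gc.getCollectionTime() + "ms" // accumulated, in milliseconds
                    + " pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```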
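9. Exit arthas
To exit only the current connection, use the quit or exit command. The arthas instance attached to the target process keeps running and its port stays open, so you can connect to it directly next time.
To shut arthas down completely, run the stop command.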
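Addendum (2024-03-27):
watch com.fan.XX '{params,returnObj,throwExp}' "params[0].fpHm=='24112000000014457076'" -n 5 -x 3
trace com.fan.XXX <method> "params[0]=='24112000000014457076'" -n 5 --skipJDKMethod false
- watch: prints the arguments, return value, and thrown exception of each call, but only for calls where the fpHm property of the first argument equals the given value; -n 5 stops after 5 matching calls, and -x 3 expands the printed objects to a depth of 3.
- trace: prints the call path inside the method along with the time spent in each call, filtered by a condition on the first argument; -n 5 stops after 5 calls, and --skipJDKMethod false also traces calls into JDK methods, which trace skips by default.
To be continued...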