Preface:
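This article summarizes some common production issues along with the corresponding troubleshooting approaches and tools. For production issues, one principle must be kept in mind: restore service as quickly as possible and eliminate the impact. Whatever stage of the emergency response we are in, the first priority is recovery, and recovery does not require pinpointing the root cause on the spot. In most cases we restore service first while preserving as much of the diagnostic state from the time of the failure as possible (memory dumps, thread dumps, GC logs, and so on), and only review the incident once service is back to normal. The sections below describe several urgent production problems that come up often.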
Summary of commonly used command-line tools
1. The top command
-H :Threads-mode operation
Instructs top to display individual threads. Without this command-line option a summation of all threads in each process is shown. Later this can be changed with the `H' interactive command.
-p :Monitor-PIDs mode as: -pN1 -pN2 ... or -pN1,N2,N3 ...
Monitor only processes with specified process IDs. This option can be given up to 20 times, or you can provide a comma delimited list with up to 20 pids. Co-mingling both approaches is permitted.
To inspect per-thread resource usage inside a single process:
top -H -p <pid>
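
The interactive command above can also be run non-interactively, which is convenient when the output has to be saved before the service is restarted. The following is only a minimal sketch: the helper names thread_snapshot and tid_to_hex are hypothetical, and the hex conversion assumes the thread dump you compare against reports native thread ids in hexadecimal (as many JVM thread dumps do, in a field such as nid=0x...). If your dump prints decimal ids, the conversion is unnecessary.

```shell
#!/bin/sh

# Hypothetical helper: print one snapshot of the threads of a process.
# $1 is the target PID. -b runs top in batch mode and -n 1 limits it
# to a single iteration, so the output can be redirected to a file.
thread_snapshot() {
    top -H -b -n 1 -p "$1"
}

# Hypothetical helper: convert a decimal thread id (the PID column
# that top shows in threads mode) to a 0x-prefixed hexadecimal string.
# Assumption: the thread dump being searched uses hex thread ids.
tid_to_hex() {
    printf '0x%x\n' "$1"
}
```

For example, `thread_snapshot 12345 > threads.txt` would keep a copy of the per-thread view for later review, and `tid_to_hex 4096` yields a value that can be searched for in a thread dump. The PID 12345 here is a placeholder, not a value from this article.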
2. pstack