Debugging commands for Java services on Linux

Latest recommended article published on 2024-07-22 12:03:47

haizhi_zhangxt

Article link: https://blog.csdn.net/haizhi_zhangxt/article/details/64918556

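As traffic grows, many services start to misbehave. The checks below can serve as a reference. They are all simple commands, but when a service is in trouble they help you locate the problem quickly.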

I. Check system load: CPU, memory, I/O, load average and so on. All of this can be viewed with top, iostat, ifstat, jstat and the other xxstat commands.

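II. Check the service process

1. CPU usage of the process:

§ Sort threads by CPU usage:

ps Hh -eo pid,tid,pcpu |sort -nk3|tail

§ Find which thread is using the CPU

In top, press Shift+H to list threads (the thread IDs shown here are decimal)

Press Shift+T to sort by CPU time

jstack <pid> > <output file>

Then look the thread up in the dump by its hexadecimal ID. jstack prints native thread IDs in hex (the nid field), so the decimal tid from top has to be converted first.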
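The decimal-to-hex lookup can be scripted. The following is a minimal sketch; the function names tid_to_nid and thread_stack are made up for illustration, and the 20-line context depth is an arbitrary choice.

```shell
# Convert a decimal thread id (as shown by top or ps) to the
# hexadecimal form that jstack prints in its nid= field.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the stack of one thread from a saved jstack dump.
# $1 = dump file, $2 = decimal thread id.
# The trailing space in the pattern keeps 0x30 from matching 0x3039.
thread_stack() {
    grep -A 20 "nid=$(tid_to_nid "$2") " "$1"
}
```

For example, after jstack 12345 > jstack_dump, calling thread_stack jstack_dump 12399 would print the stack of the thread that top listed as 12399 (a hypothetical tid).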

§ The jstack command

jstack 12345 > jstack_dump

§ Use pstree to view all threads of a process

pstree 12345

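2. Memory usage

§ The top command

§ Check the memory used by the Java process

The jmap output shows whether there is a memory "leak" and which classes' instances take up the most space. jmap can also dump the entire memory of the process.

jmap -histo 12345 > mem_dump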
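A single histogram rarely proves a leak; comparing two histograms taken some time apart is usually more telling. Below is a rough sketch of such a comparison, assuming the usual jmap -histo row layout (rank, instance count, byte count, class name). The function name histo_growth is hypothetical.

```shell
# Compare two `jmap -histo` snapshots and print the classes whose byte
# count grew, largest growth first. $1 = older file, $2 = newer file.
histo_growth() {
    awk '
        # Data rows look like "   1:   1234   567890  java.lang.String";
        # header, separator and total lines do not match this pattern.
        $1 ~ /^[0-9]+:$/ {
            if (FNR == NR) {
                old[$4] = $3                      # first file: remember bytes
            } else if (($3 + 0) > (old[$4] + 0)) {
                print $3 - old[$4], $4            # second file: report growth
            }
        }
    ' "$1" "$2" | sort -rn
}
```

Classes that keep appearing at the top of this list across several snapshots are the first candidates to inspect.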

§ Check GC activity:

jstat -gcutil <pid> <interval in ms> (see the attachment for detailed usage)

jstat -gcutil 12345 1000
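When watching jstat for a long time it helps to filter for the samples that matter. This sketch prints only the samples whose old-generation usage exceeds a threshold. It finds the column by its header name "O" rather than by position, since the column set differs between JDK versions. The function name old_gen_above is an illustrative assumption.

```shell
# Read `jstat -gcutil` output on stdin and print every sample whose
# old-generation usage (column "O", in percent) exceeds the limit in $1.
old_gen_above() {
    awk -v limit="$1" '
        NR == 1 {                                  # header line: locate "O"
            for (i = 1; i <= NF; i++) if ($i == "O") col = i
            next
        }
        col && ($col + 0) > (limit + 0) { print }
    '
}

# usage: jstat -gcutil 12345 1000 | old_gen_above 90
```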

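3. Open files (on Linux everything is a file)

§ lsof

Example: lsof -p <pid> | wc -l counts the files the process has open (the count includes one header line). This shows whether the program has resources it failed to close, such as sockets and files. The httpclient commonly used in Java services is easy to forget to close, and this command pinpoints that quickly.

lsof -p 12345 |wc -l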
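The total alone does not say what is leaking. Grouping the lsof rows by their TYPE column separates sockets from regular files. A small sketch follows, assuming the default lsof -p column layout where TYPE is the fifth field; fd_types is a made-up name.

```shell
# Read `lsof -p <pid>` output on stdin and count open descriptors by
# TYPE (fifth column), most frequent first. The header line is skipped.
fd_types() {
    awk 'NR > 1 { n[$5]++ } END { for (t in n) print n[t], t }' | sort -rn
}

# usage: lsof -p 12345 | fd_types
```

A count of IPv4, IPv6 or sock entries that keeps growing between runs points at unclosed connections rather than files.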

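4. Check the network

§ netstat (see the attachment for detailed usage; this command is extremely powerful)

Example: count the connections on port 16003:

netstat -na|grep 16003|wc -l

List the number of connections per IP on port 16003. The second awk assumes addresses are printed in the ::ffff:a.b.c.d:port form, where the IP is the fourth colon-separated field:

netstat -an|grep 16003|awk '{print $5}'|awk -F : '{print $4}'|sort|uniq -c

Output:

1 *

2 10.3.12.15

4 10.3.12.20

3 192.168.10.22

3 192.168.10.23

3 192.168.10.24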
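The per-IP pipeline depends on how netstat prints addresses. The variant below is a sketch, not the original author's command: it matches on the local port column instead of grepping the whole line, skips the LISTEN row, and accepts both plain a.b.c.d:port and ::ffff:a.b.c.d:port addresses. The name peers_by_ip is hypothetical.

```shell
# Read `netstat -an` output on stdin and count TCP connections per peer
# IP for the local port given in $1, busiest peer first.
peers_by_ip() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 != "LISTEN" && $4 ~ (":" port "$") {
            peer = $5
            sub(/:[^:]*$/, "", peer)     # drop the trailing :port
            sub(/^::ffff:/, "", peer)    # drop the IPv4-mapped prefix
            n[peer]++
        }
        END { for (ip in n) print n[ip], ip }
    ' | sort -rn
}

# usage: netstat -an | peers_by_ip 16003
```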

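§ tcpdump

# Show only TCP segments whose source or destination port is 80

tcpdump 'tcp and port 80'

# Show only TCP segments whose source or destination port is 80 and that have the SYN flag set

tcpdump 'tcp and port 80 and tcp[13:1] & 2 != 0'

# Show only TCP segments whose source port is between 7001 and 7005

tcpdump 'tcp and tcp[0:2]>7000 and tcp[0:2]<=7005'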
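The raw offsets in these filters are easier to read once decoded: tcp[13:1] is the flags byte of the TCP header and tcp[0:2] is the source port. As an illustration, the helper below (tcp_flags is a made-up name) decodes a flags byte into flag names; the comments list the named forms that pcap filters also accept.

```shell
# Named equivalents of the raw-offset filters:
#   tcpdump 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'
#   tcpdump 'tcp src portrange 7001-7005'

# Decode the TCP flags byte (header offset 13) into flag names.
# Bit values: FIN=1 SYN=2 RST=4 PSH=8 ACK=16 URG=32.
tcp_flags() {
    flags=$(( $1 ))
    names=""
    for pair in 1:FIN 2:SYN 4:RST 8:PSH 16:ACK 32:URG; do
        bit=${pair%%:*}
        if [ $(( flags & bit )) -ne 0 ]; then
            names="$names ${pair#*:}"
        fi
    done
    echo "${names# }"
}
```

So the test "& 2 != 0" matches any segment with SYN set, including the SYN+ACK reply (flags byte 0x12).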

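§ telnet

# Capture an HTTP exchange

telnet 10.58.120.118 80 > http_dump

GET /test/n_16170701962244.jpg HTTP/1.1

Host: pic.58.com

Connection: Keep-Alive

# Status monitoring

# Interactive

telnet 10.58.120.110 26003 > 26003_dump

count

# Piped

(echo -e "count";sleep 10)|telnet 10.58.120.110 26003

# Network communication through a Linux file descriptor (bash only)

exec 6<>/dev/tcp/10.58.120.110/26003 # connect to 10.58.120.110:26003 and bind the connection to fd 6

echo -e "count">&6 # send the count command

cat<&6 # read the reply (blocks until the server closes the connection)

exec 6>&- # close the output and input sides of fd 6

exec 6<&-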
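The fd steps above can be wrapped into one function so that they are usable from scripts. This is a sketch under a few assumptions: it needs bash (for /dev/tcp and read -t), the name query_monitor is invented, and instead of cat it stops reading after a few idle seconds because the monitor port may keep the connection open.

```shell
# Send one command to a host:port over bash's /dev/tcp and print the
# reply. $1 = host, $2 = port, $3 = command, $4 = idle timeout in
# seconds (default 3). The body runs in a subshell, so fd 6 and the
# variables never leak into the calling shell.
query_monitor() (
    host=$1 port=$2 cmd=$3 wait_secs=${4:-3}
    exec 6<>"/dev/tcp/$host/$port" || exit 1   # connection failed
    printf '%s\n' "$cmd" >&6
    # Read until the peer closes or stays silent for wait_secs seconds.
    while IFS= read -r -t "$wait_secs" line <&6; do
        printf '%s\n' "$line"
    done
    exec 6>&-
)

# usage: query_monitor 10.58.120.110 26003 count 10 > 26003_dump
```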

III. Analysis features provided

*******************************************************************

count[|second num|method methodName]

*show method call times in num seconds

*second : in num seconds statistics once (num default1)

*method : for statistics method

*example : count

*example : count|second 3

*example : count|second 3|method getInfo

time|grep abc[|group num|column -tkda]

*show method execute time

*grep : condition

*group : method called num times show statistics once

*column : show column a->all t->time k->key d->description

*example: time|grep getInfo

*example: time|grep getInfo|group 10|column -tk

exec|top

|netstat -na

*exec command (at present only allow:top or netstat)

*example: exec|top

control * use for control xxx-server

help * show help

quit * quit monitor

*******************************************************************

Usage, taking imc as an example: imc exposes its monitoring port on 26003. The port can be looked up in xxx/service/deploy/<service name>/xxx_config.xml

<name>xxx.server.telnet.listenPort</name>

</property>

§ 1. Connect with telnet first

telnet 10.58.120.110 26003

§ 2. View the execution of the getInfo method:

time|grep getInfo

§ 3. View the execution of all methods:

time|grep

The output looks like this:

time:3ms--key:InvokeRealService_InfoProviderWithIndex.GetInfo--description:protocolversion:1

fromIP:/10.3.12.20:49545

lookUP:InfoProviderWithIndex

methodName:GetInfo

params:

--key:Long[]

--value:[Ljava.lang.Long;@40880562

--key:String

--value:*

--key:String

--value:

--key:String

--value:
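Once this output is saved to a file, the slow calls can be pulled out of it. The sketch below assumes every record starts with a line of the form time:<N>ms--key:<name>--description:..., as in the sample above; slow_calls is a hypothetical name.

```shell
# Read saved `time|grep` output on stdin and print "<ms> <key>" for
# every call that took at least $1 milliseconds, slowest first.
slow_calls() {
    sed -n 's/^time:\([0-9][0-9]*\)ms--key:\(.*\)--description:.*$/\1 \2/p' |
        awk -v min="$1" '($1 + 0) >= (min + 0)' |
        sort -rn
}

# usage: slow_calls 100 < 26003_dump
```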

§ 4. View the service's concurrency:

count

The output looks like:

#all# 738

#all# 877

#all# 930

#all# 956

#all# 936

#all# 893

#all# 949

#all# 962

#all# 873

#all# 822

#all# 907
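A saved run of count samples can be summarized in one line. This sketch assumes each sample is a line of the form "#all# N", like the first line of the sample output; count_stats is a made-up name.

```shell
# Read saved `count` output on stdin and print the number of samples,
# their average and their maximum. Lines not starting with "#all#"
# are ignored.
count_stats() {
    awk '
        $1 == "#all#" {
            n++
            sum += $2
            if ($2 + 0 > max) max = $2 + 0
        }
        END {
            if (n) printf "samples=%d avg=%.1f max=%d\n", n, sum / n, max
        }
    '
}

# usage: count_stats < 26003_dump
```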

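§ 5. Quit

quit

§ 6. To save the results, redirect the output to a file when starting telnet

Example:

telnet 10.58.120.110 26003 > tmp

IV. Check the UDP log. The xxx container writes a UDP log entry for every method whose execution time exceeds 100 ms. The log is on 9.112 under /app/udpserver/log, one file per day.

§ 1. Find methods that took 1 s or longer (a time value of four or more digits, in ms), taking imc as an example: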

grep ',imc' out.2012-02-08|grep 'time: [0-9]\{4,\}' > imc_dump
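The digit-count trick only works for thresholds that are powers of ten. For an arbitrary threshold, the time value can be compared numerically. The sketch below assumes only what the grep above relies on: that a log line contains ",<service>" and "time: <ms>". The rest of the log format is unknown, and slow_in_log is a hypothetical name.

```shell
# Print the lines of a UDP log file that belong to one service and took
# at least a given number of milliseconds, slowest first.
# $1 = service name, $2 = minimum ms, $3 = log file.
# Each output line is prefixed with the extracted time and a tab.
slow_in_log() {
    awk -v svc=",$1" -v min="$2" '
        index($0, svc) && match($0, /time: [0-9]+/) {
            # "time: " is 6 characters; the digits follow it.
            ms = substr($0, RSTART + 6, RLENGTH - 6) + 0
            if (ms >= min + 0) print ms "\t" $0
        }
    ' "$3" | sort -rn
}

# usage: slow_in_log imc 1500 out.2012-02-08 > imc_dump
```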
