BPF Performance Tools: Linux 60-Second Analysis

ch123456hy

Article link: https://blog.csdn.net/ch123456hy/article/details/111495646

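Linux 60-Second Analysis

This checklist of tools and metrics targets the low-hanging fruit among performance problems: it lists a dozen or so common issues together with the way to check for each, so that anyone can work through it. This article is a translated excerpt of material published by Brendan Gregg and the Netflix performance engineering team.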

uptime


$ uptime
20:08:53 up 50 min,  2 users,  load average: 0.00, 0.01, 0.05

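A quick check of the load averages, which indicate how many tasks (processes) currently want to run.
The three numbers are exponentially damped moving sums over 1-minute, 5-minute, and 15-minute windows, so they give a rough picture of how load has changed over time.
Load averages are checked first during troubleshooting to confirm whether the performance problem is still present.
A high 15-minute load together with a low 1-minute load may mean that you have already missed the problem while it was happening.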
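
The comparison between the 1-minute and 15-minute figures can be sketched in a few lines of Python. This is only an illustration: the function names `parse_load_averages` and `load_trend`, and the `tolerance` value, are assumptions made for this example and are not part of uptime.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from a line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found")
    return tuple(float(value) for value in match.groups())

def load_trend(one, five, fifteen, tolerance=0.1):
    """Classify the trend by comparing the 1-minute and 15-minute averages.

    `tolerance` is a hypothetical dead band so that tiny differences
    are not reported as a trend.
    """
    if one > fifteen + tolerance:
        return "rising"
    if one < fifteen - tolerance:
        return "falling"  # the problem may already be over
    return "steady"

line = "20:08:53 up 50 min,  2 users,  load average: 0.00, 0.01, 0.05"
print(load_trend(*parse_load_averages(line)))
```

A "falling" result corresponds to the case described above, where the 15-minute load is high but the 1-minute load is low.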

dmesg|tail

$ dmesg|tail
[   14.636073] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   16.323147] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   18.243527] sda1: WRITE SAME failed. Manually zeroing.
[   20.586194] init: plymouth-upstart-bridge main process ended, respawning
[   21.207826] cgroup: systemd-logind (852) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[   21.207828] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
[  314.984616] audit_printk_skb: 171 callbacks suppressed
[  314.984619] type=1400 audit(1608549828.128:69): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=3267 comm="apparmor_parser"
[  314.984623] type=1400 audit(1608549828.128:70): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=3267 comm="apparmor_parser"
[  314.984944] type=1400 audit(1608549828.128:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=3267 comm="apparmor_parser"

Shows the last 10 system log messages, if there are any.
Look for errors that could cause performance problems.
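
As a rough aid for scanning a longer log, the lines can be filtered by keyword. The sketch below is an assumption-laden example, not a definitive rule set: the keyword list `SUSPECT_PATTERNS` and the function name `suspicious_lines` are made up for illustration, and a real investigation should still read the messages themselves.

```python
# Hypothetical keyword list; extend it for the messages that matter on your systems.
SUSPECT_PATTERNS = ("error", "fail", "out of memory", "oom-killer", "syn flooding")

def suspicious_lines(dmesg_text, patterns=SUSPECT_PATTERNS):
    """Return the log lines that contain any of the given keywords (case-insensitive)."""
    hits = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if any(pattern in lowered for pattern in patterns):
            hits.append(line)
    return hits

sample = (
    "[   16.323147] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None\n"
    "[   18.243527] sda1: WRITE SAME failed. Manually zeroing.\n"
)
for hit in suspicious_lines(sample):
    print(hit)
```

With the two sample lines taken from the output above, only the `WRITE SAME failed` message is reported.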

vmstat 1

# vmstat num: num is the interval, in seconds, between printed reports
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1163168  30152 317236    0    0   494    24  244  430  3  3 80 14  0
 0  0      0 1163040  30152 317236    0    0     0     0  321  717  1  0 99  0  0
 0  0      0 1162948  30160 317228    0    0     0   580  377  882  1  0 99  0  0

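Columns to check closely:

r: the number of processes running on a CPU or waiting for a turn; an r value greater than the number of CPUs means CPU resources are saturated
free: free memory, in KB
si and so: pages swapped in and swapped out; nonzero values mean the system is short of memory
us, sy, id, wa, and st: a breakdown of CPU time, averaged across all CPUs: user time, system (kernel) time, idle, waiting for I/O, and stolen time (in virtualized environments, time taken by other guests)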
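
The two checks on r and on si/so can be expressed as a small script. This is a minimal sketch that assumes the column layout shown in the output above; the function name `check_vmstat` and the wording of the warnings are invented for this example.

```python
def check_vmstat(vmstat_text, cpu_count):
    """Return warning strings for the samples in `vmstat 1` output."""
    warnings = []
    header = None
    sample = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the line with column names
            header = fields
            continue
        # Skip the "procs ---memory---" banner and anything else that is not a data row.
        if header is None or len(fields) != len(header) or not fields[0].isdigit():
            continue
        sample += 1
        row = dict(zip(header, (int(value) for value in fields)))
        if row["r"] > cpu_count:
            warnings.append(f"sample {sample}: r={row['r']} exceeds {cpu_count} CPUs")
        if row["si"] > 0 or row["so"] > 0:
            warnings.append(f"sample {sample}: swapping (si={row['si']}, so={row['so']})")
    return warnings
```

For example, `check_vmstat(text, 8)` would be called with the captured output and the CPU count of the machine (8 in the examples in this article).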

mpstat -P ALL 1


$ mpstat -P ALL 1
Linux 3.13.0-32-generic (ubuntu) 	12/21/2020 	_x86_64_	(8 CPU)

07:33:19 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:33:20 PM  all    0.63    0.00    0.13    0.00    0.00    0.00    0.00    0.00    0.00   99.25
07:33:20 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:33:20 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:33:20 PM    2    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   98.02
07:33:20 PM    3    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   98.02
07:33:20 PM    4    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   98.02
07:33:20 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
07:33:20 PM    6    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
07:33:20 PM    7    0.99    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.01

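Prints the time each CPU spends in each state.
%usr: user mode
%sys: kernel mode; high values can be investigated with system call tracing and kernel tracing
%iowait: waiting for disk I/O; iostat shows the storage devices in detail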
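
One way to use the per-CPU breakdown is to compute how busy each CPU is (100 minus %idle) and look for a single CPU that stands out. The sketch below assumes the column layout shown above; the names `busy_by_cpu` and `hot_cpus` and the 90% threshold are assumptions chosen for illustration.

```python
def busy_by_cpu(mpstat_text):
    """Map CPU number to busy percentage (100 - %idle), keeping the last sample seen."""
    busy = {}
    cpu_col = idle_col = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:  # header row: locate the columns
            cpu_col, idle_col = fields.index("CPU"), fields.index("%idle")
            continue
        if idle_col is None or len(fields) <= idle_col:
            continue
        if fields[cpu_col].isdigit():  # skips the "all" summary row
            busy[int(fields[cpu_col])] = round(100.0 - float(fields[idle_col]), 2)
    return busy

def hot_cpus(busy, threshold=90.0):
    """Return the CPUs whose busy percentage is at or above a hypothetical threshold."""
    return sorted(cpu for cpu, pct in busy.items() if pct >= threshold)
```

If `hot_cpus` returns a single CPU while the others are idle, that pattern is worth a closer look with the per-process tools that follow.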

pidstat 1

$ pidstat 1
Linux 3.13.0-32-generic (ubuntu) 	12/21/2020 	_x86_64_	(8 CPU)

07:39:10 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:39:11 PM  1000      2669    0.94    0.00    0.00    0.94     6  compiz
07:39:11 PM  1000      4153    0.94    0.94    0.00    1.89     1  pidstat

07:39:11 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:39:12 PM     0      1473    0.99    0.99    0.00    1.98     3  Xorg
07:39:12 PM  1000      2669    6.93    1.98    0.00    8.91     1  compiz
07:39:12 PM  1000      2816    0.99    0.00    0.00    0.99     0  gnome-terminal
07:39:12 PM  1000      4153    0.99    1.98    0.00    2.97     1  pidstat

Shows CPU usage for each process.

iostat -xz 1


$ iostat -xz 1
Linux 3.13.0-32-generic (ubuntu) 	12/21/2020 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.45    0.08    0.38    1.03    0.00   98.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.76     6.20   25.18    2.26   352.72    80.87    31.60     0.35   12.85   12.25   19.50   1.75   4.81

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.63    0.00    0.13    0.00    0.00   99.25

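Shows I/O metrics for storage devices.
Columns to check closely:

r/s, w/s, rkB/s, and wkB/s: reads and writes per second sent to the device, and kilobytes read and written per second. Use these to characterize the workload (some performance problems exist simply because the load exceeds what the device can sustain)
await: the average I/O response time, in milliseconds (an average response time higher than expected can indicate device saturation or a device-level problem)
avgqu-sz: the average length of the device request queue
%util: device utilization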
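
The device table can be read into dictionaries keyed by column name so that these columns are easy to check. This is a sketch under assumptions: the function names and the 60% utilization limit are chosen for illustration, and because column names differ between sysstat versions, the code looks columns up by name and treats a missing %util as zero.

```python
def parse_iostat_devices(iostat_text):
    """Return one dict per device row, keyed by the column names in the header."""
    rows = []
    header = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = [name.rstrip(":") for name in fields]
            continue
        if header is not None and len(fields) == len(header):
            row = {"Device": fields[0]}
            row.update(
                {name: float(value) for name, value in zip(header[1:], fields[1:])}
            )
            rows.append(row)
    return rows

def busy_devices(rows, util_limit=60.0):
    """Return device names whose %util exceeds a hypothetical limit."""
    return [row["Device"] for row in rows if row.get("%util", 0.0) > util_limit]
```

For the sda row shown above, `%util` is 4.81 and `await` is 12.85 ms, so nothing is flagged at the default limit.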

free -m


$ free -m
           total       used       free     shared    buffers     cached
Mem:        1987       1158        829          5        119        443
-/+ buffers/cache:        595       1391
Swap:         1021          0       1021

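The output shows available memory in MB; check whether available memory has dropped to zero. In this older output format, the figure to read is the free column of the -/+ buffers/cache row (1391 here), which treats buffers and page cache as reclaimable.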
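
The -/+ buffers/cache figure in the output above can be reproduced from the Mem: row as free + buffers + cached. The sketch below assumes the older six-column layout shown here (newer versions of free print a different set of columns), and the function name `available_mb` is made up for this example.

```python
def available_mb(free_text):
    """Estimate available memory from the Mem: row of older `free -m` output."""
    for line in free_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Columns: total, used, free, shared, buffers, cached
            total, used, free, shared, buffers, cached = (int(x) for x in fields[1:7])
            return free + buffers + cached
    raise ValueError("no Mem: line found")

sample = "Mem:        1987       1158        829          5        119        443"
print(available_mb(sample))  # 829 + 119 + 443 = 1391
```

The result matches the 1391 shown in the free column of the -/+ buffers/cache row.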

sar -n DEV 1


$ sar -n DEV 1
Linux 3.13.0-32-generic (ubuntu) 	12/21/2020 	_x86_64_	(8 CPU)

08:01:27 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
08:01:28 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:01:28 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:01:28 PM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

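Shows network device metrics; check interface throughput (rxkB/s and txkB/s) to see whether a limit has been reached.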

sar -n TCP,ETCP 1


$ sar -n TCP,ETCP 1
Linux 3.13.0-32-generic (ubuntu) 	12/21/2020 	_x86_64_	(8 CPU)

08:03:08 PM  active/s passive/s    iseg/s    oseg/s
08:03:09 PM      0.00      0.00      0.00      0.00

08:03:08 PM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
08:03:09 PM      0.00      0.00      0.00      0.00      0.00

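active/s: the number of locally initiated TCP connections per second (via connect())
passive/s: the number of remotely initiated TCP connections per second (via accept())
retrans/s: the number of TCP retransmits per second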
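
A retransmit count is easier to judge relative to the traffic sent, so one derived figure is retrans/s divided by oseg/s. That ratio is not printed by sar; it is an assumption introduced here for illustration, as are the function names below. The parser relies on the layout shown above, where each header row is followed directly by its value row.

```python
def parse_sar_tables(sar_text):
    """Merge each header/value row pair into one dict keyed by column name."""
    metrics = {}
    header = None
    for line in sar_text.splitlines():
        fields = line.split()
        if any(name.endswith("/s") for name in fields):
            header = fields  # a row of column names such as active/s
        elif header is not None and len(fields) == len(header):
            for name, value in zip(header, fields):
                if name.endswith("/s"):  # skips the timestamp columns
                    metrics[name] = float(value)
            header = None
    return metrics

def retransmit_ratio(metrics):
    """Return retransmits per output segment, or None when nothing was sent."""
    oseg = metrics.get("oseg/s", 0.0)
    if oseg == 0:
        return None
    return metrics.get("retrans/s", 0.0) / oseg
```

On the idle system shown above every counter is 0.00, so `retransmit_ratio` returns None instead of dividing by zero.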

top


$ top
top - 20:06:48 up 48 min,  2 users,  load average: 0.00, 0.01, 0.05
Tasks: 479 total,   1 running, 478 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.3 sy,  0.0 ni, 98.6 id,  0.6 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   2035256 total,  1188352 used,   846904 free,   122112 buffers
KiB Swap:  1046524 total,        0 used,  1046524 free.   454056 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                  
  2669 cyrench+  20   0 1475872 163472  39464 S  24.2  8.0   1:39.61 compiz                   
  1473 root      20   0  337108  43888  13436 S   6.1  2.2   0:22.28 Xorg                     
     1 root      20   0   33908   3212   1456 S   0.0  0.2   0:02.13 init                     
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.05 kthreadd                 
     3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0              
     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H             
     7 root      20   0       0      0      0 S   0.0  0.0   0:01.09 rcu_sched                
     8 root      20   0       0      0      0 S   0.0  0.0   0:00.58 rcuos/0                  
     9 root      20   0       0      0      0 S   0.0  0.0   0:00.62 rcuos/1                  
    10 root      20   0       0      0      0 S   0.0  0.0   0:00.32 rcuos/2                  
    11 root      20   0       0      0      0 S   0.0  0.0   0:00.33 rcuos/3                  
    12 root      20   0       0      0      0 S   0.0  0.0   0:00.45 rcuos/4                  
    13 root      20   0       0      0      0 S   0.0  0.0   0:00.37 rcuos/5                  
    14 root      20   0       0      0      0 S   0.0  0.0   0:00.27 rcuos/6                  
    15 root      20   0       0      0      0 S   0.0  0.0   0:00.22 rcuos/7
