2024年Linux最新2024最火的Linux性能分析工具--perf_perf 内存分析(1)

2024云技术

已于 2024-04-30 19:27:04 修改

阅读量547

点赞数 27

分类专栏：程序员文章标签： linux 运维服务器

于 2024-04-30 19:27:02 首次发布

本文链接：https://blog.csdn.net/2401_83621004/article/details/138354918

版权

程序员专栏收录该内容

126 篇文章 0 订阅

订阅专栏

-d, --delay <n>       number of seconds to delay between refreshes
-D, --dump-symtab     dump the symbol table used for profiling
-E, --entries <n>     display this many functions
-e, --event <event>   event selector. use 'perf list' to list available events
-f, --count-filter <n>
                      only display functions with more events than this
-F, --freq <freq or 'max'>
                      profile at this frequency
-g                    enables call-graph recording and display
-i, --no-inherit      child tasks do not inherit counters
-j, --branch-filter <branch filter mask>
                      branch stack filter modes
-K, --hide_kernel_symbols
                      hide kernel symbols
-k, --vmlinux <file>  vmlinux pathname
-M, --disassembler-style <disassembler style>
                      Specify disassembler style (e.g. -M intel for intel syntax)
-m, --mmap-pages <pages>
                      number of mmap data pages
-n, --show-nr-samples
                      Show a column with the number of samples
-p, --pid <pid>       profile events on existing process id
-r, --realtime <n>    collect data with this RT SCHED_FIFO priority
-s, --sort <key[,key2...]>
                      sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
-t, --tid <tid>       profile events on existing thread id
-U, --hide_user_symbols
                      hide user symbols
-u, --uid <user>      user to profile
-v, --verbose         be more verbose (show counter open errors, etc)
-w, --column-widths <width[,width...]>
                      don't try to adjust column width, use these fixed values
-z, --zero            zero history across updates
    --asm-raw         Display raw encoding of assembly instructions (default)
    --call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
                      setup and enables call-graph (stack chain/backtrace):

                            record_mode:    call graph recording mode (fp|dwarf|lbr)
                            record_size:    if record_mode is 'dwarf', max size of stack recording (<bytes>)
                                            default: 8192 (bytes)
                            print_type:     call graph printing style (graph|flat|fractal|folded|none)
                            threshold:      minimum call graph inclusion threshold (<percent>)
                            print_limit:    maximum number of call graph entry (<number>)
                            order:          call graph order (caller|callee)
                            sort_key:       call graph sort key (function|address)
                            branch:         include last branch info to call graph (branch)
                            value:          call graph value (percent|period|count)

                            Default: fp,graph,0.5,caller,function
    --children        Accumulate callchains of children and show total overhead as well
    --comms <comm[,comm...]>
                      only consider symbols in these comms
    --demangle-kernel
                      Enable kernel symbol demangling
    --dsos <dso[,dso...]>
                      only consider symbols in these dsos
    --fields <key[,keys...]>
                      output field(s): overhead, period, sample plus all of sort keys
    --force           don't complain, do it
    --group           put the counters into a counter group
    --hierarchy       Show entries in a hierarchy
    --ignore-callees <regex>
                      ignore callees of these functions in call graphs
    --ignore-vmlinux  don't load vmlinux even if found
    --max-stack <n>   Set the maximum stack depth when parsing the callchain. Default: kernel.perf_event_max_stack or 127
    --num-thread-synthesize <n>
                      number of thread to run event synthesize
    --objdump <path>  objdump binary to use for disassembly and annotations
    --overwrite       Use a backward ring buffer, default: no
    --percent-limit <percent>
                      Don't show entries under that percent
    --percentage <relative|absolute>
                      How to display percentage of filtered entries
    --proc-map-timeout <n>
                      per thread proc mmap processing timeout in ms
    --raw-trace       Show raw trace event output (do not use print fmt or plugins)
    --show-total-period
                      Show a column with the sum of periods
    --source          Interleave source code with assembly code (default)
    --stdio           Use the stdio interface
    --sym-annotate <symbol name>
                      symbol to annotate
    --symbols <symbol[,symbol...]>
                      only consider these symbols
    --tui             Use the TUI interface

[root@centos7 ~]# perf top -a
Samples: 646K of event ‘cpu-clock’, 4000 Hz, Event count (approx.): 12702138322 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
29.70% php-fpm [.] 0x00000000006250c2


Samples：采集cpu时钟事件的总样本数， 可以在命令中跟上 -e 事件 参数来指定跟踪的事件，perf list 命令，列出所有可跟踪的事件。


Event count：事件总数量


Overhead：符号引起的性能事件在总采样本中的百分比


Shared Object ：符号所在的DSO(Dynamic Shared Object)，一般是应用程序、内核、动态连接库、模块


[.]表示此符号属于用户态的ELF文件，包括可执行文件与动态连接库；[k]表述此符号属于内核或模块。


Symbol：符号名或函数名，未知时，用十六进制显示


perf top 常用的扩展参数有  
 ![在这里插入图片描述](https://img-blog.csdnimg.cn/f70a1e3f4c114fbda0be61c5c088a662.png)

[root@centos7 ~]# perf list

List of pre-defined events (to be used in -e):

alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]

msr/tsc/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 …]/modifier [Raw hardware event descriptor]
(see ‘man perf-list’ on how to encode it)

mem:[/len][:access] [Hardware breakpoint]

block:block_bio_backmerge [Tracepoint event]
block:block_bio_bounce [Tracepoint event]
block:block_bio_complete [Tracepoint event]
block:block_bio_frontmerge [Tracepoint event]


Software 是软件事件


Hardware\cache\Kernel PMU 都是硬件事件


Tracepoint是基于内核的ftrace

指定跟踪的事件 perf top -e block:block_rq_issue Samples: 11 of event ‘block:block_rq_issue’, 1 Hz, Event count (approx.): 1 lost: 0/0 drop: 1/8 Overhead

100.00%
14.29% 0,0 R 8 (4a 01 00 00 10 00 00 00 08 00) 0 + 0 [kworker/1:0]

跟踪某个进程的事件情况

[root@centos7 ~]# perf top -p 2087
Samples: 2K of event ‘cpu-clock’, 4000 Hz, Event count (approx.): 520562500 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
29.87% php-fpm [.] 0x00000000006250c2


### `►►► 具体跟踪某一个进程`

perf record -g p pid

[root@centos7 ~]# perf record -g -p 2181

收集一段较长时间后，ctrl+c 停止


执行命令之后，会在当前路径下生成一个 perf.data文件


### `►►► 分析perf.data文件`


在有perf.data文件的路径下，执行 perf report

[root@centos7 ~]# perf report
Samples: 142K of event ‘cpu-clock’, Event count (approx.): 35597750000, Thread: php-fpm
Children Self Command Shared Object Symbol

31.44% 0.00% php-fpm php-fpm [.] 0x000056149fda70c2 ◆
30.78% 30.78% php-fpm php-fpm [.] 0x00000000006250c2


看到里面的＋号的行，可以回车，逐级往下定位


### `►►► 查看某个进程在一段时间内`


### 调用CPU情况 perf stat

perf stat -p 进程id

执行一段时间后，ctrl+c停止

[root@centos7 ~]# perf stat -p 2399
^C
Performance counter stats for process id ‘2399’:

     93,774.02 msec task-clock                #    0.399 CPUs utilized
        21,295      context-switches          #    0.227 K/sec
           541      cpu-migrations            #    0.006 K/sec
             0      page-faults               #    0.000 K/sec

cycles
instructions
branches