2024年Linux最新2024最火的Linux性能分析工具--perf_perf 内存分析(1)

-d, --delay <n>       number of seconds to delay between refreshes
-D, --dump-symtab     dump the symbol table used for profiling
-E, --entries <n>     display this many functions
-e, --event <event>   event selector. use 'perf list' to list available events
-f, --count-filter <n>
                      only display functions with more events than this
-F, --freq <freq or 'max'>
                      profile at this frequency
-g                    enables call-graph recording and display
-i, --no-inherit      child tasks do not inherit counters
-j, --branch-filter <branch filter mask>
                      branch stack filter modes
-K, --hide_kernel_symbols
                      hide kernel symbols
-k, --vmlinux <file>  vmlinux pathname
-M, --disassembler-style <disassembler style>
                      Specify disassembler style (e.g. -M intel for intel syntax)
-m, --mmap-pages <pages>
                      number of mmap data pages
-n, --show-nr-samples
                      Show a column with the number of samples
-p, --pid <pid>       profile events on existing process id
-r, --realtime <n>    collect data with this RT SCHED_FIFO priority
-s, --sort <key[,key2...]>
                      sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
-t, --tid <tid>       profile events on existing thread id
-U, --hide_user_symbols
                      hide user symbols
-u, --uid <user>      user to profile
-v, --verbose         be more verbose (show counter open errors, etc)
-w, --column-widths <width[,width...]>
                      don't try to adjust column width, use these fixed values
-z, --zero            zero history across updates
    --asm-raw         Display raw encoding of assembly instructions (default)
    --call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
                      setup and enables call-graph (stack chain/backtrace):

                            record_mode:    call graph recording mode (fp|dwarf|lbr)
                            record_size:    if record_mode is 'dwarf', max size of stack recording (<bytes>)
                                            default: 8192 (bytes)
                            print_type:     call graph printing style (graph|flat|fractal|folded|none)
                            threshold:      minimum call graph inclusion threshold (<percent>)
                            print_limit:    maximum number of call graph entry (<number>)
                            order:          call graph order (caller|callee)
                            sort_key:       call graph sort key (function|address)
                            branch:         include last branch info to call graph (branch)
                            value:          call graph value (percent|period|count)

                            Default: fp,graph,0.5,caller,function
    --children        Accumulate callchains of children and show total overhead as well
    --comms <comm[,comm...]>
                      only consider symbols in these comms
    --demangle-kernel
                      Enable kernel symbol demangling
    --dsos <dso[,dso...]>
                      only consider symbols in these dsos
    --fields <key[,keys...]>
                      output field(s): overhead, period, sample plus all of sort keys
    --force           don't complain, do it
    --group           put the counters into a counter group
    --hierarchy       Show entries in a hierarchy
    --ignore-callees <regex>
                      ignore callees of these functions in call graphs
    --ignore-vmlinux  don't load vmlinux even if found
    --max-stack <n>   Set the maximum stack depth when parsing the callchain. Default: kernel.perf_event_max_stack or 127
    --num-thread-synthesize <n>
                      number of thread to run event synthesize
    --objdump <path>  objdump binary to use for disassembly and annotations
    --overwrite       Use a backward ring buffer, default: no
    --percent-limit <percent>
                      Don't show entries under that percent
    --percentage <relative|absolute>
                      How to display percentage of filtered entries
    --proc-map-timeout <n>
                      per thread proc mmap processing timeout in ms
    --raw-trace       Show raw trace event output (do not use print fmt or plugins)
    --show-total-period
                      Show a column with the sum of periods
    --source          Interleave source code with assembly code (default)
    --stdio           Use the stdio interface
    --sym-annotate <symbol name>
                      symbol to annotate
    --symbols <symbol[,symbol...]>
                      only consider these symbols
    --tui             Use the TUI interface


[root@centos7 ~]# perf top -a
Samples: 646K of event ‘cpu-clock’, 4000 Hz, Event count (approx.): 12702138322 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
29.70% php-fpm [.] 0x00000000006250c2


Samples:采集cpu时钟事件的总样本数, 可以在命令中跟上 -e 事件 参数来指定跟踪的事件,perf list 命令,列出所有可跟踪的事件。


Event count:事件总数量


Overhead:符号引起的性能事件在总采样本中的百分比


Shared Object :符号所在的DSO(Dynamic Shared Object),一般是应用程序、内核、动态连接库、模块


[.]表示此符号属于用户态的ELF文件,包括可执行文件与动态连接库;[k]表述此符号属于内核或模块。


Symbol:符号名或函数名,未知时,用十六进制显示


perf top 常用的扩展参数有  
 ![在这里插入图片描述](https://img-blog.csdnimg.cn/f70a1e3f4c114fbda0be61c5c088a662.png)



[root@centos7 ~]# perf list

List of pre-defined events (to be used in -e):

alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]

msr/tsc/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 …]/modifier [Raw hardware event descriptor]
(see ‘man perf-list’ on how to encode it)

mem:[/len][:access] [Hardware breakpoint]

block:block_bio_backmerge [Tracepoint event]
block:block_bio_bounce [Tracepoint event]
block:block_bio_complete [Tracepoint event]
block:block_bio_frontmerge [Tracepoint event]


Software 是软件事件


Hardware\cache\Kernel PMU 都是硬件事件


Tracepoint是基于内核的ftrace



指定跟踪的事件 perf top -e block:block_rq_issue Samples: 11 of event ‘block:block_rq_issue’, 1 Hz, Event count (approx.): 1 lost: 0/0 drop: 1/8 Overhead

100.00%
14.29% 0,0 R 8 (4a 01 00 00 10 00 00 00 08 00) 0 + 0 [kworker/1:0]



跟踪某个进程的事件情况

[root@centos7 ~]# perf top -p 2087
Samples: 2K of event ‘cpu-clock’, 4000 Hz, Event count (approx.): 520562500 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
29.87% php-fpm [.] 0x00000000006250c2


### `►►► 具体跟踪某一个进程`



perf record -g p pid

[root@centos7 ~]# perf record -g -p 2181

收集一段较长时间后,ctrl+c 停止


执行命令之后,会在当前路径下生成一个 perf.data文件


### `►►► 分析perf.data文件`


在有perf.data文件的路径下,执行 perf report



[root@centos7 ~]# perf report
Samples: 142K of event ‘cpu-clock’, Event count (approx.): 35597750000, Thread: php-fpm
Children Self Command Shared Object Symbol

  • 31.44% 0.00% php-fpm php-fpm [.] 0x000056149fda70c2 ◆
  • 30.78% 30.78% php-fpm php-fpm [.] 0x00000000006250c2

看到里面的+号的行,可以回车,逐级往下定位


### `►►► 查看某个进程在一段时间内`


### 调用CPU情况 perf stat



perf stat -p 进程id

执行一段时间后,ctrl+c停止

[root@centos7 ~]# perf stat -p 2399
^C
Performance counter stats for process id ‘2399’:

     93,774.02 msec task-clock                #    0.399 CPUs utilized
        21,295      context-switches          #    0.227 K/sec
           541      cpu-migrations            #    0.006 K/sec
             0      page-faults               #    0.000 K/sec

cycles
instructions
branches

最后的话

最近很多小伙伴找我要Linux学习资料,于是我翻箱倒柜,整理了一些优质资源,涵盖视频、电子书、PPT等共享给大家!

资料预览

给大家整理的视频资料:

给大家整理的电子书资料:

如果本文对你有帮助,欢迎点赞、收藏、转发给朋友,让我有持续创作的动力!

网上学习资料一大堆,但如果学到的知识不成体系,遇到问题时只是浅尝辄止,不再深入研究,那么很难做到真正的技术提升。

需要这份系统化的资料的朋友,可以点击这里获取!

一个人可以走的很快,但一群人才能走的更远!不论你是正从事IT行业的老鸟或是对IT行业感兴趣的新人,都欢迎加入我们的的圈子(技术交流、学习资源、职场吐槽、大厂内推、面试辅导),让我们一起学习成长!
发给朋友,让我有持续创作的动力!**

网上学习资料一大堆,但如果学到的知识不成体系,遇到问题时只是浅尝辄止,不再深入研究,那么很难做到真正的技术提升。

需要这份系统化的资料的朋友,可以点击这里获取!

一个人可以走的很快,但一群人才能走的更远!不论你是正从事IT行业的老鸟或是对IT行业感兴趣的新人,都欢迎加入我们的的圈子(技术交流、学习资源、职场吐槽、大厂内推、面试辅导),让我们一起学习成长!

  • 27
    点赞
  • 21
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值