perf Topics 01

dhall

Last modified 2023-11-05 18:00:59

Tags: linux

First published 2023-11-05 17:35:40

Article link: https://blog.csdn.net/xiayutian747/article/details/134232449

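Background:

A collection of perf usage examples.

Environment:

Built on the linux-ps project (linux-ps · GitCode), with a custom kernel build .bb file added to meta-ls (a self-built demo layer). perf and stress-ng are already supported.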

Listing Events

Listing all currently known events:

perf list

Listing sched tracepoints:

perf list 'sched:*'

Counting Events

CPU counter statistics for the specified command:

perf stat command

Detailed CPU counter statistics (includes extras) for the specified command:

perf stat -d command

CPU counter statistics for the specified PID, until Ctrl-C:

perf stat -p PID

CPU counter statistics for the entire system, for 5 seconds:

perf stat -a sleep 5

Various basic CPU statistics, system wide, for 10 seconds:

perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10

Various CPU level 1 data cache statistics for the specified command:

perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores command

Various CPU data TLB statistics for the specified command:

perf stat -e dTLB-loads,dTLB-load-misses,dTLB-prefetch-misses command

Various CPU last level cache statistics for the specified command:

perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches command

Using raw PMC counters, eg, counting unhalted core cycles:

perf stat -e r003c -a sleep 5
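The raw code `r003c` packs two hex bytes. On Intel x86 the usual layout is unit mask first, then event select, so `r003c` reads as umask 0x00, event 0x3c. The sketch below rebuilds that string from the two fields. `raw_event` is a hypothetical helper name, and the layout is an assumption that applies to Intel-style PMUs only; other architectures encode raw events differently, so check the processor manual before relying on it.

```shell
#!/bin/sh
# raw_event EVENT UMASK
# Hypothetical helper: prints a perf raw event string (rUUEE) built from an
# event-select byte and a unit-mask byte. Assumes the Intel x86 layout.
raw_event() {
    printf 'r%02x%02x\n' "$2" "$1"
}

# Possible use, counting unhalted core cycles (event 0x3c, umask 0x00):
#   perf stat -e "$(raw_event 0x3c 0x00)" -a sleep 5
```

Keeping the two fields separate makes it easier to compare a raw code against the event tables in the vendor documentation.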

PMCs (Performance Monitoring Counters): counting cycles and frontend stalls via raw specification:

perf stat -e cycles -e cpu/event=0x0e,umask=0x01,inv,cmask=0x01/ -a sleep 5

Count syscalls per-second system-wide:

perf stat -e raw_syscalls:sys_enter -I 1000 -a

Count system calls by type for the specified PID, until Ctrl-C:

perf stat -e 'syscalls:sys_enter_*' -p PID

Count system calls by type for the entire system, for 5 seconds:

perf stat -e 'syscalls:sys_enter_*' -a sleep 5

Count scheduler events for the specified PID, until Ctrl-C:

perf stat -e 'sched:*' -p PID

Count scheduler events for the specified PID, for 10 seconds:

perf stat -e 'sched:*' -p PID sleep 10
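Many commands in this list come in two forms: one that runs until Ctrl-C, and one that ends after a fixed time because a `sleep N` workload is appended. A minimal sketch of wrapping that choice in a function follows. `count_events` and the `PERF` override are hypothetical names introduced for illustration; setting `PERF=echo` prints the arguments instead of running perf, which is handy for checking the command line first.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to preview the command line.
PERF=${PERF:-perf}

# count_events EVENT PID [SECONDS]
# Counts EVENT for PID. With SECONDS the run stops after that long;
# without it, perf stat keeps counting until Ctrl-C.
count_events() {
    event=$1
    pid=$2
    secs=${3:-}
    if [ -n "$secs" ]; then
        "$PERF" stat -e "$event" -p "$pid" -- sleep "$secs"
    else
        "$PERF" stat -e "$event" -p "$pid"
    fi
}
```

For example, `count_events 'sched:*' 1234 10` would count scheduler events for PID 1234 for 10 seconds. Quoting the event pattern keeps the shell from expanding the `*`.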

Count ext4 events for the entire system, for 10 seconds:

perf stat -e 'ext4:*' -a sleep 10

Count block device I/O events for the entire system, for 10 seconds:

perf stat -e 'block:*' -a sleep 10

Count all vmscan events, printing a report every second:

perf stat -e 'vmscan:*' -a -I 1000

Profiling

Sample on-CPU functions for the specified command, at 99 Hertz:

perf record -F 99 command

Sample on-CPU functions for the specified PID, at 99 Hertz, until Ctrl-C:

perf record -F 99 -p PID

Sample on-CPU functions for the specified PID, at 99 Hertz, for 10 seconds:

perf record -F 99 -p PID sleep 10

Sample CPU stack traces (via frame pointers) for the specified PID, at 99 Hertz, for 10 seconds:

perf record -F 99 -p PID -g -- sleep 10

Sample CPU stack traces for the PID, using dwarf (dbg info) to unwind stacks, at 99 Hertz, for 10 seconds:

perf record -F 99 -p PID --call-graph dwarf sleep 10

Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds (< Linux 4.11):

perf record -F 99 -ag -- sleep 10

Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds (>= Linux 4.11):

perf record -F 99 -g -- sleep 10

If the previous command didn't work, try forcing perf to use the cpu-clock event:

perf record -F 99 -e cpu-clock -ag -- sleep 10

Sample CPU stack traces for a container identified by its /sys/fs/cgroup/perf_event cgroup:

perf record -F 99 -e cpu-clock --cgroup=docker/1d567f4393190204...etc... -a -- sleep 10

Sample CPU stack traces for the entire system, with dwarf stacks, at 99 Hertz, for 10 seconds:

perf record -F 99 -a --call-graph dwarf sleep 10

Sample CPU stack traces for the entire system, using last branch record for stacks, ... (>= Linux 4.?):

perf record -F 99 -a --call-graph lbr sleep 10

Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds:

perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5

Sample CPU stack traces, once every 100 last level cache misses, for 5 seconds:

perf record -e LLC-load-misses -c 100 -ag -- sleep 5

Sample on-CPU kernel instructions, for 5 seconds:

perf record -e cycles:k -a -- sleep 5

Sample on-CPU user instructions, for 5 seconds:

perf record -e cycles:u -a -- sleep 5

Sample on-CPU user instructions precisely (using PEBS), for 5 seconds:

perf record -e cycles:up -a -- sleep 5

Perform branch tracing (needs HW support), for 1 second:

perf record -b -a sleep 1

Sample CPUs at 49 Hertz, and show top addresses and symbols, live (no perf.data file):

perf top -F 49

Sample CPUs at 49 Hertz, and show top process names and segments, live:

perf top -F 49 -ns comm,dso

Static Tracing

Trace new processes, until Ctrl-C:

perf record -e sched:sched_process_exec -a

Sample (take a subset of) context-switches, until Ctrl-C:

perf record -e context-switches -a

Trace all context-switches, until Ctrl-C:

perf record -e context-switches -c 1 -a

Include raw settings used (see: man perf_event_open):

perf record -vv -e context-switches -a

Trace all context-switches via sched tracepoint, until Ctrl-C:

perf record -e sched:sched_switch -a

Sample context-switches with stack traces, until Ctrl-C:

perf record -e context-switches -ag

Sample context-switches with stack traces, for 10 seconds:

perf record -e context-switches -ag -- sleep 10

Sample CS, stack traces, and with timestamps (< Linux 3.17, -T now default):

perf record -e context-switches -ag -T

Sample CPU migrations, for 10 seconds:

perf record -e migrations -a -- sleep 10

Trace all connect()s with stack traces (outbound connections), until Ctrl-C:

perf record -e syscalls:sys_enter_connect -ag

Trace all accept()s with stack traces (inbound connections), until Ctrl-C:

perf record -e 'syscalls:sys_enter_accept*' -ag

Trace all block device (disk I/O) requests with stack traces, until Ctrl-C:

perf record -e block:block_rq_insert -ag

Sample at most 100 block device requests per second, until Ctrl-C:

perf record -F 100 -e block:block_rq_insert -a

Trace all block device issues and completions (has timestamps), until Ctrl-C:

perf record -e block:block_rq_issue -e block:block_rq_complete -a

Trace all block completions, of size at least 100 Kbytes, until Ctrl-C:

perf record -e block:block_rq_complete --filter 'nr_sector > 200'
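The threshold of 200 in this filter comes from the sector size: the block tracepoints report `nr_sector` in 512-byte units, so 100 Kbytes is 100 × 1024 / 512 = 200 sectors. The helper below does the same conversion for other sizes; `kb_to_sectors` is a hypothetical name used only for this sketch.

```shell
#!/bin/sh
# kb_to_sectors KBYTES
# Converts a size in Kbytes to a count of 512-byte sectors, the unit
# used by the nr_sector field of the block tracepoints.
kb_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

# Possible use, tracing completions larger than 64 Kbytes:
#   perf record -e block:block_rq_complete --filter "nr_sector > $(kb_to_sectors 64)"
```

Note the double quotes around the filter in the usage line, which let the shell substitute the computed sector count.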

Trace all block completions, synchronous writes only, until Ctrl-C:

perf record -e block:block_rq_complete --filter 'rwbs == "WS"'

Trace all block completions, all types of writes, until Ctrl-C:

perf record -e block:block_rq_complete --filter 'rwbs ~ "W"'

Sample minor faults (RSS growth) with stack traces, until Ctrl-C:

perf record -e minor-faults -ag

Trace all minor faults with stack traces, until Ctrl-C:

perf record -e minor-faults -c 1 -ag

Sample page faults with stack traces, until Ctrl-C:

perf record -e page-faults -ag

Trace all ext4 calls, and write to a non-ext4 location, until Ctrl-C:

perf record -e 'ext4:*' -o /tmp/perf.data -a

Trace kswapd wakeup events, until Ctrl-C:

perf record -e vmscan:mm_vmscan_wakeup_kswapd -ag

Add Node.js USDT probes (Linux 4.10+):

perf buildid-cache --add `which node`

Trace the node httpserverrequest USDT event (Linux 4.10+):

perf record -e sdt_node:httpserverrequest -a

Dynamic Tracing

Add a tracepoint for the kernel tcp_sendmsg() function entry ("--add" is optional):

perf probe --add tcp_sendmsg

Remove the tcp_sendmsg() tracepoint (or use "--del"):

perf probe -d tcp_sendmsg

Add a tracepoint for the kernel tcp_sendmsg() function return:

perf probe 'tcp_sendmsg%return'

Show available variables for the kernel tcp_sendmsg() function (needs debuginfo):

perf probe -V tcp_sendmsg

Show available variables for the kernel tcp_sendmsg() function, plus external vars (needs debuginfo):

perf probe -V tcp_sendmsg --externs

Show available line probes for tcp_sendmsg() (needs debuginfo):

perf probe -L tcp_sendmsg

Show available variables for tcp_sendmsg() at line number 81 (needs debuginfo):

perf probe -V tcp_sendmsg:81

Add a tracepoint for tcp_sendmsg(), with three entry argument registers (platform specific):

perf probe 'tcp_sendmsg %ax %dx %cx'

Add a tracepoint for tcp_sendmsg(), with an alias ("bytes") for the %cx register (platform specific):

perf probe 'tcp_sendmsg bytes=%cx'

Trace previously created probe when the bytes (alias) variable is greater than 100:

perf record -e probe:tcp_sendmsg --filter 'bytes > 100'

Add a tracepoint for tcp_sendmsg() return, and capture the return value:

perf probe 'tcp_sendmsg%return $retval'

Add a tracepoint for tcp_sendmsg(), and "size" entry argument (reliable, but needs debuginfo):

perf probe 'tcp_sendmsg size'

Add a tracepoint for tcp_sendmsg(), with size and socket state (needs debuginfo):

perf probe 'tcp_sendmsg size sk->__sk_common.skc_state'

Tell me how on Earth you would do this, but don't actually do it (needs debuginfo):

perf probe -nv 'tcp_sendmsg size sk->__sk_common.skc_state'

Trace previous probe when size is non-zero, and state is not TCP_ESTABLISHED(1) (needs debuginfo):

perf record -e probe:tcp_sendmsg --filter 'size > 0 && skc_state != 1' -a

Add a tracepoint for tcp_sendmsg() line 81 with local variable seglen (needs debuginfo):

perf probe 'tcp_sendmsg:81 seglen'

Add a tracepoint for do_sys_open() with the filename as a string (needs debuginfo):

perf probe 'do_sys_open filename:string'

Add a tracepoint for myfunc() return, and include the retval as a string:

perf probe 'myfunc%return +0($retval):string'

Add a tracepoint for the user-level malloc() function from libc:

perf probe -x /lib64/libc.so.6 malloc

Add a tracepoint for this user-level static probe (USDT, aka SDT event):

perf probe -x /usr/lib64/libpthread-2.24.so %sdt_libpthread:mutex_entry

List currently available dynamic probes:

perf probe -l
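Dynamic probes stay registered until they are deleted, so a typical session adds the probe, records from it, and removes it afterwards. The sketch below strings together commands already shown in this section. `trace_function` and the `PERF` override are hypothetical names; root privileges are assumed, and the event name assumes perf places kernel probes in its default `probe:` group.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to preview the command lines.
PERF=${PERF:-perf}

# trace_function FUNC SECONDS
# Adds an entry probe on kernel function FUNC, records it system-wide
# for SECONDS, then deletes the probe again.
trace_function() {
    func=$1
    secs=$2
    "$PERF" probe --add "$func" || return 1
    "$PERF" record -e "probe:$func" -a -- sleep "$secs"
    status=$?
    # Delete the probe whether or not the recording succeeded.
    "$PERF" probe -d "$func"
    return "$status"
}
```

Calling `trace_function tcp_sendmsg 5` would leave a perf.data file in the current directory, which can then be inspected with perf script.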

Mixed

Trace system calls by process, showing a summary refreshing every 2 seconds:

perf top -e raw_syscalls:sys_enter -ns comm

Trace sent network packets by on-CPU process, rolling output (no clear):

stdbuf -oL perf top -e net:net_dev_xmit -ns comm | strings

Sample stacks at 99 Hertz, and, context switches:

perf record -F99 -e cpu-clock -e cs -a -g

Sample stacks to 2 levels deep, and, context switch stacks to 5 levels (needs 4.8):

perf record -F99 -e cpu-clock/max-stack=2/ -e cs/max-stack=5/ -a -g

Special

Record cacheline events (Linux 4.10+):

perf c2c record -a -- sleep 10

Report cacheline events from previous recording (Linux 4.10+):

perf c2c report

Reporting

Show perf.data in an ncurses browser (TUI) if possible:

perf report

Show perf.data with a column for sample count:

perf report -n

Show perf.data as a text report, with data coalesced and percentages:

perf report --stdio

Report, with stacks in folded format: one line per stack (needs 4.4):

perf report --stdio -n -g folded

List all events from perf.data:

perf script

List all perf.data events, with data header (newer kernels; was previously default):

perf script --header

List all perf.data events, with customized fields (< Linux 4.1):

perf script -f time,event,trace

List all perf.data events, with customized fields (>= Linux 4.1):

perf script -F time,event,trace

List all perf.data events, with my recommended fields (needs record -a; newer kernels):

perf script --header -F comm,pid,tid,cpu,time,event,ip,sym,dso

List all perf.data events, with my recommended fields (needs record -a; older kernels):

perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso

Dump raw contents from perf.data as hex (for debugging):

perf script -D

Disassemble and annotate instructions with percentages (needs some debuginfo):

perf annotate --stdio
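Recording and reporting are usually run back to back. The sketch below samples a PID with stack traces, then prints a text report from the same file. `profile_pid` and the `PERF` override are hypothetical names; the `-o` and `-i` options name the data file explicitly so the run does not overwrite a perf.data in the current directory, and the default path is only an example.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to preview the command lines.
PERF=${PERF:-perf}

# profile_pid PID SECONDS [FILE]
# Samples PID at 99 Hertz with stack traces for SECONDS, writing to FILE,
# then prints a text report with a sample-count column.
profile_pid() {
    pid=$1
    secs=$2
    file=${3:-/tmp/perf.data}
    "$PERF" record -F 99 -g -p "$pid" -o "$file" -- sleep "$secs" || return 1
    "$PERF" report -i "$file" --stdio -n
}
```

Swapping the last line for `perf script -i "$file"` would list the individual samples instead of the summarized report.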
