perf Study Summary

dongzhiyan_hjp

Article link: https://blog.csdn.net/hu1610552336/article/details/102945894

$ perf stat -e cache-misses ./perf_test
Counts the cache misses that occur while the process runs.
$ perf stat ./perf_test
Shows a range of statistics for the process while it runs, for example:

    262.738415  task-clock-msecs   #   0.991 CPUs
             2  context-switches   #   0.000 M/sec
             1  CPU-migrations     #   0.000 M/sec
            81  page-faults        #   0.000 M/sec
       9478851  cycles             #  36.077 M/sec  (scaled from 98.24%)
          6771  instructions       #   0.001 IPC    (scaled from 98.99%)
     111114049  branches           # 422.908 M/sec  (scaled from 99.37%)
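
To make the derived columns in this output less mysterious, here is a minimal Python sketch that recomputes two of them from the raw counters. It assumes (this is my reading of the numbers, not something taken from the perf source) that the M/sec column is the counter divided by the task-clock time, and that IPC is instructions divided by cycles. The helper name rate_m_per_sec is made up for this example.

```python
# Values copied from the perf stat output above.
task_clock_ms = 262.738415
cycles = 9478851
instructions = 6771


def rate_m_per_sec(count, clock_ms):
    """Events per second of task-clock time, in millions (assumed formula)."""
    seconds = clock_ms / 1000.0
    return count / seconds / 1e6


# IPC = instructions per cycle.
ipc = instructions / cycles
cycles_rate = rate_m_per_sec(cycles, task_clock_ms)

print(f"IPC: {ipc:.3f}")                   # IPC: 0.001
print(f"cycles: {cycles_rate:.3f} M/sec")  # cycles: 36.077 M/sec
```

Both results match the output above, so the assumption looks reasonable for this sample. An IPC this low suggests the process spent nearly all of its cycles not retiring instructions.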

$ perf top
Shows the CPU usage of functions across the whole system, for example:

72.00 2.2% pthread_mutex_lock /lib/libpthread-2.12.so
68.00 2.1% delay_tsc [kernel.kallsyms]
55.00 1.7% aes_dec_blk [aes_i586]

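Faced with a long source file, which lines actually need further work? This is where perf record comes in: it collects statistics at the level of individual functions, and perf report displays the results.
$ perf record -e cpu-clock
Records which functions are running across the system over a period of time.
$ perf report
Lists the details of the samples that perf record collected. The output is a simple text-based interface; by navigating it you can see which processes and which functions are using the CPU, and you can disassemble a selected function to see which part of the code burns the most CPU. Very handy.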

If you switch to perf record -e cpu-clock -g, that is, add the -g option, perf report can also show the function call stacks. Nice.

$ perf stat -e raw_syscalls:sys_enter ls
Counts the system calls made while ls runs. Sample output:
Performance counter stats for 'ls':
93 raw_syscalls:sys_enter

$ perf record -e raw_syscalls:sys_enter ls
$ perf report
Samples the system call events while the command runs, then displays them with perf report.

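perf stat and perf record / perf report are used in similar ways. The difference is that perf record saves the samples collected over a period of time to a file (perf.data by default), and perf report then displays them.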

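$ perf probe schedule:12 cpu
$ perf report
The example above uses the probe command to add a dynamic probe point at line 12 of the kernel function schedule(). It works just like a tracepoint: whenever the kernel reaches the probe point, perf is notified. You can think of it as adding a new tracepoint at run time.
After that, select the probe point with the -e option of perf record (using the event name that perf probe prints when it adds the probe), and finally view the results with perf report.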

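$ perf sched record
or: $ perf sched record sleep 10, to collect 10 s of data
$ perf sched latency

First use record to collect, over a period of time, each task's run time, number of context switches, scheduling delay and so on; then perf sched latency prints the data: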

 Task                  |  Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at
 ----------------------------------------------------------------------------------------------------------
 kworker/3:3-mm_:6153  |     0.200 ms |        6 | avg:    1.177 ms | max:    6.921 ms | max at: 2752.81858s

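There are other commands for evaluating scheduler performance:
$ perf sched map
The advantage of map is that it gives an overall view: it summarizes hundreds or thousands of scheduling events and shows how the system's tasks are distributed across CPUs.
$ perf sched replay
perf sched replay is designed specifically for scheduler developers; it tries to replay the scheduling scenario recorded in the perf.data file.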

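$ perf bench sched messaging
This benchmark starts N reader/sender pairs of processes or threads that read and write concurrently over IPC (sockets or pipes). N is usually increased step by step to measure how well the scheduler scales. sched messaging is used in the same way and for the same purpose as hackbench.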

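$ perf bench sched pipe
Two processes send 1,000,000 integers to each other through pipes as fast as they can: process A sends to B while B sends to A. Because A and B depend on each other, if the scheduler is unfair and treats A better than B, the total time A and B need gets longer.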
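
To illustrate the kind of workload described here, the following is a rough Python sketch of two processes bouncing an integer back and forth over a pair of pipes. It is not the perf bench implementation; the function name pipe_ping_pong and the loop count are made up for the example, and it assumes a Unix-like system where os.fork is available.

```python
import os
import struct
import time


def pipe_ping_pong(loops):
    """Parent (A) and child (B) exchange `loops` integers over two pipes."""
    a_to_b_r, a_to_b_w = os.pipe()
    b_to_a_r, b_to_a_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child B: echo every integer it receives back to A.
        try:
            for _ in range(loops):
                data = os.read(a_to_b_r, 4)
                os.write(b_to_a_w, data)
        finally:
            os._exit(0)  # never return into the parent's code path

    # Parent A: send an integer, then block until B echoes it back.
    start = time.perf_counter()
    received = 0
    for i in range(loops):
        os.write(a_to_b_w, struct.pack("i", i))
        (value,) = struct.unpack("i", os.read(b_to_a_r, 4))
        if value == i:
            received += 1
    elapsed = time.perf_counter() - start

    os.waitpid(pid, 0)
    for fd in (a_to_b_r, a_to_b_w, b_to_a_r, b_to_a_w):
        os.close(fd)
    return received, elapsed


received, elapsed = pipe_ping_pong(1000)
print(f"{received} round trips in {elapsed:.4f} s")
```

Each round trip forces both processes to block and be woken up again, which is why the total time is sensitive to how promptly and fairly the scheduler runs the two sides.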

$ perf bench mem memcpy
This test measures how long a memcpy() call takes to copy 1 MB of data.
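
As a loose analogy for what this measures, here is a small Python sketch that times a bulk copy of a 1 MiB buffer and reports throughput. It is only an illustration of the measurement idea, not how perf bench does it: the function name time_copy and the repeat count are assumptions for the example, and the copy is performed by the Python runtime rather than by calling memcpy() directly.

```python
import time

SIZE = 1 << 20  # 1 MiB, matching the 1 MB in the text


def time_copy(size, repeats=100):
    """Return (average seconds per copy, whether the copy matched the source)."""
    src = bytearray(bytes(range(256)) * (size // 256))
    dst = bytearray(size)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # bulk byte copy of the whole buffer
    elapsed = time.perf_counter() - start
    return elapsed / repeats, dst == src


per_copy, ok = time_copy(SIZE)
if per_copy > 0:
    print(f"{SIZE / per_copy / 1e9:.2f} GB/s, copy ok: {ok}")
```

Averaging over many repeats matters because a single 1 MiB copy is short enough for timer resolution and cache warm-up to dominate the result.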

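$ perf lock record
$ perf lock report
Shows how locks are used in the system. The kernel must be built with CONFIG_LOCKDEP. The report contains the following fields:
"acquired": the number of times the lock was obtained directly, that is, obtained when no other kernel path was holding it.
"contended": the number of conflicts, that is, the number of times the lock was already held by someone else when an attempt was made to take it.
"total wait": the total time spent waiting to obtain the lock.
"max wait": the longest time spent waiting to obtain the lock.
"min wait": the shortest time spent waiting to obtain the lock.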
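
The field definitions above can be sketched in a few lines of Python. The event list is hypothetical data invented for this example, and the summarize function only follows the definitions as worded in this post (a wait of 0 counts as "acquired", anything else as "contended", with wait statistics taken over the contended cases); the real perf lock accounting may differ in detail.

```python
from collections import defaultdict

# Hypothetical events: (lock name, nanoseconds waited before getting the lock).
events = [
    ("lock_a", 0),
    ("lock_a", 1500),
    ("lock_b", 0),
    ("lock_a", 300),
    ("lock_b", 0),
]


def summarize(events):
    """Aggregate per-lock counters following the definitions in the text."""
    stats = defaultdict(lambda: {
        "acquired": 0, "contended": 0,
        "total_wait": 0, "max_wait": 0, "min_wait": None,
    })
    for name, wait_ns in events:
        s = stats[name]
        if wait_ns == 0:
            s["acquired"] += 1      # nobody else held the lock
        else:
            s["contended"] += 1     # had to wait for another holder
            s["total_wait"] += wait_ns
            s["max_wait"] = max(s["max_wait"], wait_ns)
            s["min_wait"] = (wait_ns if s["min_wait"] is None
                             else min(s["min_wait"], wait_ns))
    return dict(stats)


for name, s in sorted(summarize(events).items()):
    print(name, s)
```

A lock with a high contended count and a large total wait is the one worth looking at first.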

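perf kmem collects, over a period of time, the places in the kernel that allocate from the slab allocator, and displays them. Very handy.

$ perf kmem record --slab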

$ perf kmem --alloc -l 10 --caller stat
 ------------------------------------------------------------------------------------------
 Callsite                | Total_alloc/Per | Total_req/Per | Hit | Ping-pong | Frag
 ------------------------------------------------------------------------------------------
 nouveau_bo_new+60       |    12288/2048   |   6384/1064   |   6 |     6     | 48.047%
 vmstat_start+37         |     2048/2048   |   1088/1088   |   1 |     0     | 46.875%
 drm_vma_node_allow+2a   |      384/64     |    240/40     |   6 |     0     | 37.500%

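Hit is the number of times the call site called kmalloc during the record period. For example, a Hit of 6 for nouveau_bo_new+60 means that the function called kmalloc 6 times in total.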

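Total_alloc/Per: for vmstat_start+37 this shows 2048/2048. The first value, 2048, is the total amount of memory allocated by that function, and Per is the average per allocation. Total_req/Per is the same pair for the amount of memory actually requested.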

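Frag is the fragmentation ratio. For example, if a cache's object size is 1024 but the data structure being allocated needs only 1022 bytes, then 2 bytes are wasted as fragmentation.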
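
As a quick check of the Frag column, this small Python sketch recomputes it from the Total_alloc and Total_req totals in the table above. The formula (the share of allocated bytes that was not requested) is an assumption on my part, but it reproduces all three values in the table; the function name frag_percent is made up for the example.

```python
# (callsite, Total_alloc, Total_req) copied from the table above.
rows = [
    ("nouveau_bo_new+60", 12288, 6384),
    ("vmstat_start+37", 2048, 1088),
    ("drm_vma_node_allow+2a", 384, 240),
]


def frag_percent(total_alloc, total_req):
    """Percentage of allocated bytes that nobody asked for (assumed formula)."""
    return 100.0 - 100.0 * total_req / total_alloc


for callsite, total_alloc, total_req in rows:
    print(f"{callsite:24s} {frag_percent(total_alloc, total_req):.3f}%")
# nouveau_bo_new+60        48.047%
# vmstat_start+37          46.875%
# drm_vma_node_allow+2a    37.500%
```

For nouveau_bo_new+60, each request averages 1064 bytes but is served from a 2048-byte object, so almost half of every allocation is unused.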

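Ping-pong: one CPU allocates memory, other CPUs may access that memory object, and it may end up being freed by yet another CPU. The L1 cache is per CPU, so when CPU2 modifies the memory, the caches of the other CPUs have to be updated, which costs performance. perf kmem checks the CPU number in the kfree event; if it differs from the CPU at kmalloc time, that counts as one ping-pong. Ideally, the fewer ping-pongs the better.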
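
The counting rule described here is simple enough to sketch in Python. The event tuples below are hypothetical data invented for the example (real perf kmem reads kmalloc/kfree tracepoint records from perf.data), and count_ping_pong is a made-up name; the logic is just "remember the allocating CPU per pointer, compare it at free time".

```python
# Hypothetical trace: (event, pointer, cpu).
events = [
    ("kmalloc", 0x1000, 0),
    ("kmalloc", 0x2000, 1),
    ("kfree",   0x1000, 2),  # allocated on CPU 0, freed on CPU 2: ping-pong
    ("kfree",   0x2000, 1),  # same CPU: not a ping-pong
    ("kmalloc", 0x3000, 3),
    ("kfree",   0x3000, 0),  # allocated on CPU 3, freed on CPU 0: ping-pong
]


def count_ping_pong(events):
    """Count frees that happen on a different CPU than the allocation."""
    alloc_cpu = {}  # pointer -> CPU that allocated it
    ping_pong = 0
    for kind, ptr, cpu in events:
        if kind == "kmalloc":
            alloc_cpu[ptr] = cpu
        elif kind == "kfree":
            owner = alloc_cpu.pop(ptr, None)
            # Frees of pointers allocated before tracing started are ignored.
            if owner is not None and owner != cpu:
                ping_pong += 1
    return ping_pong


print(count_ping_pong(events))  # 2
```

In the table above, nouveau_bo_new+60 has a Ping-pong count equal to its Hit count, meaning every one of its 6 allocations was freed on a different CPU.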
