Overview of Linux Performance Analysis Tools

Link to this article: https://blog.csdn.net/irving512/article/details/117077847

0. Preface

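Goal: performance analysis (profiling) covers a great deal, but for now I only care about run time.
Detailed requirement: I need the average run time of each function. In the end I found no tool that fits, so it looks like I still have to record timings by hand.
References
- PPT: "C/C++调试、跟踪及性能分析工具综述" (a survey of C/C++ debugging, tracing, and profiling tools): introduces some commonly used profiling tools.
- How can I profile C++ code running on Linux?: many very good answers. I suggest reading all the highly upvoted ones; one answer in particular (linked in the original post) is well worth reading.
- 一般Linux性能调优都用什么工具？ ("Which tools are generally used for Linux performance tuning?")
- Linux性能分析工具与图形化方法 ("Linux performance analysis tools and visualization methods")
Tool summary (my experience is limited, so this is quite subjective)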

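Name	Function	Pros	Cons
gprof	Reports the call count and run time of each function	No installation needed, ships with the GNU toolchain; easy to use	Not suitable for multithreaded programs or function pointers; visualization tools are mediocre; the tool itself does not seem to be designed primarily for performance analysis
valgrind	Besides measuring run cost, it includes many other tools (memory-leak detection and so on)	Feature-rich; mature visualization tools	Very slow (running 10x slower than the native program is normal); with so many features, the learning curve is steep
gperftools	Finds hotspots in a program (it does not report the time of a single call to a single function, only each function's share of the total run time)	Easy to use; little impact on overall program speed	Single-purpose (the other tools all seem able to do more); visualization tools are mediocre
perf	Similar to gperftools, but it targets the whole Linux system, kernel included; powerful	A kernel tool that can analyze whole-machine performance as well as a single program; good-looking visualization; feature-complete	Many function names show up as Unknown, and the names that are shown can be hard to make sense of
VTune	Intel's CPU performance analysis tool	Everyone says it is good; I have not used it, so I cannot say why	Paid software (this alone outweighs everything else)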
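
The preface concludes that per-function average run time has to be recorded by hand. Below is a minimal sketch of what that could look like in C++, assuming a RAII timer can be added to the code being measured. The names TimingStats, TimingRegistry, ScopedTimer, and sum_to are hypothetical and do not come from any of the tools in the table.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Accumulated timing for one named function (hypothetical helper type).
struct TimingStats {
    std::uint64_t calls = 0;
    double total_ms = 0.0;
    // Mean time per call in milliseconds; 0 if nothing was recorded.
    double average_ms() const { return calls == 0 ? 0.0 : total_ms / calls; }
};

// Process-wide table of timings keyed by a user-chosen name.
// The mutex keeps record() safe when several threads report at once.
class TimingRegistry {
public:
    static TimingRegistry& instance() {
        static TimingRegistry registry;
        return registry;
    }
    void record(const std::string& name, double elapsed_ms) {
        std::lock_guard<std::mutex> lock(mutex_);
        TimingStats& stats = stats_[name];
        stats.calls += 1;
        stats.total_ms += elapsed_ms;
    }
    TimingStats get(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = stats_.find(name);
        return it == stats_.end() ? TimingStats{} : it->second;
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, TimingStats> stats_;
};

// RAII timer: measures from construction to destruction and
// reports the elapsed wall-clock time to the registry.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = end - start_;
        TimingRegistry::instance().record(name_, elapsed.count());
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an instrumented function (hypothetical workload).
int sum_to(int n) {
    ScopedTimer timer("sum_to");
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}
```

Putting `ScopedTimer timer("name");` at the top of a function records one sample per call, and `TimingRegistry::instance().get("name").average_ms()` then returns the mean. The timer adds its own overhead, so very short functions will look slower than they are.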

1. gprof

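References:
- Stack Overflow: Alternatives to gprof
- gprof manual
Installation: none needed.
Workflow
- Add the -pg option when compiling and linking
  - When building CUDA code, nvcc needs the option as well
  - A third-party library that was not compiled with this option will not be profiled.
- Build and run the program; when it exits normally it writes a file named gmon.out to the current directory
- Generate the text report: gprof -b /path/to/execute gmon.out > report.txt
- Generate a call-graph image
  - First install gprof2dot, either with pip3 install gprof2dot or through apt (the dot command comes from Graphviz)
  - gprof /path/to/execute | gprof2dot -n0 -e0 | dot -Tpng -o output.png
Meaning of the text report: see

《性能分析工具GPROF以及GPERFTOOLS的使用》 ("Using the profiling tools GPROF and GPERFTOOLS")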

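Limitations (there are actually quite a few):
- Cannot be used with multithreaded programs or function pointers
- Cannot profile libraries that were not built with -pg (for example third-party libraries pulled into the project)
- Cannot be used for live monitoring (everything is analyzed after the program has finished)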

3. valgrind

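Basic concepts
- valgrind is essentially a virtual machine that hosts a number of tools
- Profiling is just one of those tools, namely callgrind.
- It can be used for live monitoring
References
- The official documentation is the best resource
- I looked through many online articles; the quality is generally very low, and most appear to be copied from the same one or two posts
Installation: apt is enough: sudo apt install kcachegrind valgrind
Usage:
- Nothing special is needed at compile time (building with -g gives readable function names and line information)
- Prefix the command when running the program, e.g. valgrind --tool=callgrind ./execute; the output file is named like callgrind.out.<pid>
- Visualize with kcachegrind: kcachegrind callgrind.out.<pid>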
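Tips
- The start and end of profiling can be chosen manually.
  - Start the program with instrumentation switched off: valgrind --tool=callgrind --dump-instr=yes -v --instr-atstart=no ./binary > tmp
  - Turn profiling on: callgrind_control -i on
  - Turn profiling off again: callgrind_control -i off
  - Write the buffered profile data to a local file: callgrind_control -d
  - Terminate the run: callgrind_control -k (this kills the program, so dump with -d first)
- For multithreaded programs, add the --separate-threads option: valgrind --tool=callgrind --separate-threads=yes ./execute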
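Visualization tools
- I did not find much useful material here either; the official documentation is the best bet
- Common terms in kcachegrind:
  - Incl.: inclusive cost, i.e. the cost of the function together with everything it calls, shown as a share of the whole program
  - Self: the cost of the function's own code; unlike Incl., it excludes the functions it calls.
  - Called: number of times the function was called.
  - Note that callgrind counts executed instructions by default, so these figures are a proxy for time rather than measured time.
- gprof2dot can also render the result: gprof2dot -f callgrind -n10 -s callgrind.out.31113 | dot -Tpng -o valgrind.png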
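
The difference between Incl. and Self can be sketched with a small call tree. This is only an illustration of the definitions above, not how callgrind or kcachegrind compute their numbers internally. The type CallNode, the functions inclusive_cost and make_example_tree, and all cost values are hypothetical.

```cpp
#include <string>
#include <vector>

// One node of a call tree (hypothetical type, for illustration only).
struct CallNode {
    std::string name;
    double self_cost;                // cost of this function's own code ("Self")
    std::vector<CallNode> children;  // functions called from this one
};

// Inclusive cost ("Incl.") = own cost plus the inclusive cost of every callee.
double inclusive_cost(const CallNode& node) {
    double total = node.self_cost;
    for (const CallNode& child : node.children) {
        total += inclusive_cost(child);
    }
    return total;
}

// Builds a small example: main calls parse and compute; compute calls multiply.
// The costs are made-up numbers in arbitrary units.
CallNode make_example_tree() {
    CallNode multiply{"multiply", 50.0, {}};
    CallNode compute{"compute", 20.0, {multiply}};
    CallNode parse{"parse", 25.0, {}};
    return CallNode{"main", 5.0, {parse, compute}};
}
```

In this made-up tree, compute has a Self cost of 20 but an Incl. cost of 70, because multiply's 50 is charged to it as well. That is why a function with a high Incl. and a low Self value is usually just a caller of something expensive.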
Limitations
- The program runs much slower; a 10x slowdown is nothing unusual

4. gperftools

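References:
- Official documentation
- 性能测试工具CPU profiler(gperftools)的使用心得 ("Notes on using the gperftools CPU profiler")
Installation: build from source, or run sudo apt install google-perftools (linking with -lprofiler typically also needs the libgoogle-perftools-dev package)
- Building from source: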

# libunwind (required by gperftools)
wget https://github.com/libunwind/libunwind/releases/download/v1.3.1/libunwind-1.3.1.tar.gz
tar zxvf libunwind-1.3.1.tar.gz
cd libunwind-1.3.1
./autogen.sh   # regenerates configure; may be skipped if the archive already ships a configure script
./configure
make
sudo make install
cd ..

# gperftools (Google Performance Tools)
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7/gperftools-2.7.zip
unzip gperftools-2.7.zip
cd gperftools-2.7   # the original post used gperftools-gperftools-2.7; use whatever directory the archive unpacks to
./autogen.sh   # same remark as above
./configure
make
sudo make install
cd ..

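Usage
- Add -lprofiler when linking.
  - Because of a linker detail (a library that is linked but never referenced from the source gets dropped), the recommended flags are -Wl,--no-as-needed,-lprofiler,--as-needed
- Before running, set the environment variable CPUPROFILE=/path/to/profile.file to choose where the profile is saved, then run the program, e.g. CPUPROFILE=/tmp/profile ./myprogram
- Read the result with pprof (the apt package may install it under the name google-pprof):
  - Graphical output: pprof ./test_capture test_capture.prof --pdf > prof.pdf
  - Text output: pprof ./test_capture test_capture.prof --text > prof.txt
Another way to use it is to modify the source code: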

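#include <gperftools/profiler.h>
// ... other includes
int main(int argc, const char* argv[])
{
	ProfilerStart("test_capture.prof");  // start sampling; the profile is written to this file
	// ... code to profile
	ProfilerStop();                      // stop sampling and flush the profile
	return 0;
}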

Meaning of each column in the text output

1. Number of profiling samples in this function
2. Percentage of profiling samples in this function
3. Percentage of profiling samples in the functions printed so far
4. Number of profiling samples in this function and its callees
5. Percentage of profiling samples in this function and its callees
6. Function name
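
To post-process the text report (for example, to list only functions above some threshold), each row can be split into the six columns above. The following is a minimal sketch that assumes rows look like `14   2.1%  17.2%       58   8.7% std::_Rb_tree::find`; the struct PprofRow and the function parse_pprof_row are hypothetical names, and the exact layout may differ between pprof versions.

```cpp
#include <sstream>
#include <string>

// One row of `pprof --text` output, following the six columns listed above.
// (Hypothetical type; the field names are chosen for this example.)
struct PprofRow {
    long flat_samples = 0;            // 1. samples in this function
    double flat_percent = 0.0;        // 2. percentage of samples in this function
    double running_percent = 0.0;     // 3. percentage in the functions printed so far
    long cumulative_samples = 0;      // 4. samples in this function and its callees
    double cumulative_percent = 0.0;  // 5. percentage in this function and its callees
    std::string function;             // 6. function name
};

// Parses one row. Returns false if the line does not match the assumed
// six-column layout (for example a header line such as "Total: ...").
bool parse_pprof_row(const std::string& line, PprofRow& row) {
    std::istringstream in(line);
    char pct1 = 0, pct2 = 0, pct3 = 0;
    row.function.clear();
    if (!(in >> row.flat_samples >> row.flat_percent >> pct1
             >> row.running_percent >> pct2
             >> row.cumulative_samples >> row.cumulative_percent >> pct3)) {
        return false;
    }
    if (pct1 != '%' || pct2 != '%' || pct3 != '%') {
        return false;
    }
    // The rest of the line is the function name; it may contain spaces
    // (templates, operator overloads), so read it with getline.
    std::getline(in >> std::ws, row.function);
    return !row.function.empty();
}
```

Reading the name with getline rather than `>>` matters for C++ symbols, since demangled names such as `operator new(unsigned long)` contain spaces.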

5. perf

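References
- There is plenty of good material, but most of it digs into how perf is implemented, whereas I only care about using it
- wiki
- 在Linux下做性能分析3：perf ("Performance analysis on Linux, part 3: perf")
- 性能分析工具Linux perf的介绍与使用 ("Introduction to and usage of the Linux perf profiling tool")
Installation: sudo apt install linux-tools; depending on the kernel version, additional packages may be needed, so just follow the prompts.
Usage:
- Nothing special is needed at compile time.
- While the program runs, collect data with perf record, e.g. perf record -g -o perf_with_stack.data ./main
  - Results are saved to perf.data by default; -o selects another file.
  - A running process can be profiled too, e.g. perf record -F 99 -p 13204 -g -- sleep 30
    - -F 99 samples 99 times per second, -p 13204 is the PID of the process to profile, -g records call stacks, and sleep 30 keeps the recording going for 30 seconds
- Visualize the result with FlameGraph: sudo perf script -i perf_with_stack.data | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
- Get a text report with perf report, e.g. perf report -i perf.data
perf record options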

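-e  record the specified PMU event
    --filter  event filter
-a  record events on all CPUs
-p  record events for the process with the given pid
-o  name of the file the recorded data is saved to
-g  enable call-graph (call stack) recording
-C  record events on the specified CPUs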

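Limitations: function names are not always resolved well, and some show up as unknown. It is a system-wide tool, after all. This often happens when a binary lacks symbols or frame pointers, so building with -g and -fno-omit-frame-pointer may help.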
