CPU Profiler User Guide

Mei

Original source: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html

Using the CPU profiler takes three steps: linking the library into the application, running the code, and analyzing the output.

1. link the library into the application

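To make the CPU profiler available at run time, add -lprofiler when linking the program.

Alternatively, the library can be loaded at run time with LD_PRELOAD, e.g. % env LD_PRELOAD="/usr/lib/libprofiler.so" <binary> (this approach is not recommended).

Linking the library does not turn CPU profiling on; it only inserts the profiler code into the binary. For that reason we always add -lprofiler at link time.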

2. running the code

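There are several ways to turn profiling on.

Option 1: set the CPUPROFILE environment variable to the file the profile should be written to. For example, to profile the binary /usr/local/bin/my_binary_compiled_with_libprofiler_so:

e.g. % env CPUPROFILE=/tmp/mybin.prof /usr/local/bin/my_binary_compiled_with_libprofiler_so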

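Option 2: bracket the code to be profiled with calls to ProfilerStart() and ProfilerStop(). Both functions are declared in <google/profiler.h>.

For further details on using the profiler, see the comments in the profiler header file.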

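Several other environment variables give finer control over the CPU profiler, for example:

CPUPROFILE_FREQUENCY=x: the sampling frequency, in samples per second.

CPUPROFILE_REALTIME=1: unset by default. When set, the profiler uses ITIMER_REAL instead of ITIMER_PROF to gather samples, but ITIMER_REAL is less accurate than ITIMER_PROF.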

3. analyzing the output

pprof is the script used to analyze a profile. It requires perl5 to be installed. Graphical output additionally requires dot, and the --gv output mode requires gv.

Here are some ways to invoke pprof:

% pprof /bin/ls ls.prof
                       Enters "interactive" mode
% pprof --text /bin/ls ls.prof
                       Outputs one line per procedure
% pprof --gv /bin/ls ls.prof
                       Displays annotated call-graph via 'gv'
% pprof --gv --focus=Mutex /bin/ls ls.prof
                       Restricts to code paths including a .*Mutex.* entry
% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
                       Code paths including Mutex but not string
% pprof --list=getdir /bin/ls ls.prof
                       (Per-line) annotated source listing for getdir()
% pprof --disasm=getdir /bin/ls ls.prof
                       (Per-PC) annotated disassembly for getdir()
% pprof --text localhost:1234
                       Outputs one line per procedure for localhost:1234
% pprof --callgrind /bin/ls ls.prof
                       Outputs the call information in callgrind format

Analyzing callgrind output:

Use the kcachegrind tool to analyze the .callgrind output.

e.g. % pprof --callgrind /bin/ls ls.prof > ls.callgrind

% kcachegrind ls.callgrind

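Node information:

In the graphical output modes, pprof draws a call graph in which every function is annotated with its execution time.

Each node represents a procedure, and its label has the following format: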

Class Name
Method Name
local (percentage)
of cumulative (percentage)

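Here local is the number of samples taken while the procedure's own code was executing, and cumulative is that count plus the samples taken in the procedures it calls. The size of each node is scaled according to its share of the program's total execution, which makes bottlenecks easy to spot at a glance.

Directed edges show caller-to-callee relationships. E.g. vsnprintf has 18 samples in total, of which 3 come from calls to _IO_old_unit and 6 are in vsnprintf itself.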

Header metadata shown in the output:

/tmp/profiler2_unittest
      Total samples: 202
      Focusing on: 202
      Dropped nodes with <= 1 abs(samples)
      Dropped edges with <= 0 samples

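This header gives the program name and the total number of samples collected. If --focus is in use, it also gives the number of samples shown in the focused display. The last two lines state the thresholds at or below which nodes and edges were dropped from the graph.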

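Focus and ignore:

You can ask pprof to display only part of the program by supplying a regular expression. If any procedure on a call stack matches the expression, that call path is shown; all other paths are dropped.

e.g. to focus on vsnprintf: % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof

Similarly, the --ignore option selects the call paths to leave out of the output.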

By default pprof runs in interactive mode. The commands and options it accepts can be listed with help, e.g. pprof --help

Output type:

`--text`	Produces a textual listing. (Note: If you have an X display, and `dot` and `gv` installed, you will probably be happier with the `--gv` output.)
`--gv`	Generates annotated call-graph, converts to postscript, and displays via gv (requires `dot` and `gv` be installed).
`--dot`	Generates the annotated call-graph in dot format and emits to stdout (requires `dot` be installed).
`--ps`	Generates the annotated call-graph in Postscript format and emits to stdout (requires `dot` be installed).
`--pdf`	Generates the annotated call-graph in PDF format and emits to stdout (requires `dot` and `ps2pdf` be installed).
`--gif`	Generates the annotated call-graph in GIF format and emits to stdout (requires `dot` be installed).
`--list=<regexp>`	Outputs source-code listing of routines whose name matches <regexp>. Each line in the listing is annotated with flat and cumulative sample counts. In the presence of inlined calls, the samples associated with inlined code tend to get assigned to a line that follows the location of the inlined call. A more precise accounting can be obtained by disassembling the routine using the --disasm flag.
`--disasm=<regexp>`	Generates disassembly of routines that match <regexp>, annotated with flat and cumulative sample counts and emits to stdout.

Reporting granularity:

`--addresses`	Produce one node per program address.
`--lines`	Produce one node per source line.
`--functions`	Produce one node per function (this is the default).
`--files`	Produce one node per source file.

Controlling the call graph display:

`--nodecount=<n>`	This option controls the number of displayed nodes. The nodes are first sorted by decreasing cumulative count, and then only the top N nodes are kept. The default value is 80.
`--nodefraction=<f>`	This option provides another mechanism for discarding nodes from the display. If the cumulative count for a node is less than this option's value multiplied by the total count for the profile, the node is dropped. The default value is 0.005; i.e. nodes that account for less than half a percent of the total time are dropped. A node is dropped if either this condition is satisfied, or the --nodecount condition is satisfied.
`--edgefraction=<f>`	This option controls the number of displayed edges. First of all, an edge is dropped if either its source or destination node is dropped. Otherwise, the edge is dropped if the sample count along the edge is less than this option's value multiplied by the total count for the profile. The default value is 0.001; i.e., edges that account for less than 0.1% of the total time are dropped.
`--focus=<re>`	This option controls what region of the graph is displayed based on the regular expression supplied with the option. For any path in the callgraph, we check all nodes in the path against the supplied regular expression. If none of the nodes match, the path is dropped from the output.
`--ignore=<re>`	This option controls what region of the graph is displayed based on the regular expression supplied with the option. For any path in the callgraph, we check all nodes in the path against the supplied regular expression. If any of the nodes match, the path is dropped from the output.