A First Look at Performance Analysis with gprof


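1 Introduction

gprof is the GNU profiler. It helps locate performance bottlenecks in programs on Linux: it records how many times each function is called and how much processor time each function consumes, and it can print a "call graph" that shows the calling relationships between functions. This information is very useful when improving an application's performance.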

Official documentation:

http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
http://sourceware.org/binutils/docs/gprof/index.html

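2   How it works

Profiling is enabled by passing the -pg option when compiling and when linking the program (both steps need it). When a program is built with "-pg", gcc does three things:

At the program's entry point (before main), it inserts a call to monstartup, which initializes profiling: it allocates the memory that holds the profile data and installs a clock signal handler.

At the entry of every function, it inserts a call to _mcount, which collects call information: the number of calls and the caller/callee relationships. The time figures themselves come from the periodic samples taken by the clock signal handler.

At program exit (registered through atexit()), it arranges a call to _mcleanup(), which writes the profile data to gmon.out.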
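The three hooks can be imitated in a few lines of Python to make the idea concrete. This is only a toy model, not gprof's implementation: the `profiled` decorator stands in for the injected _mcount call, `arc_counts` for the in-memory call table, and `dump_profile` for _mcleanup. All of these names are invented for the sketch, and it prints text rather than writing a binary gmon.out.

```python
import atexit
import sys
from collections import Counter

# Toy stand-in for the call-arc table: (caller, callee) -> number of calls.
arc_counts = Counter()


def profiled(func):
    """Record each call at function entry, loosely like an injected _mcount call."""
    def wrapper(*args, **kwargs):
        caller = sys._getframe(1).f_code.co_name  # function that made this call
        arc_counts[(caller, func.__name__)] += 1
        return func(*args, **kwargs)
    return wrapper


def dump_profile(stream=None):
    """Loose analogue of _mcleanup: emit everything collected so far."""
    stream = stream or sys.stderr
    for (caller, callee), count in sorted(arc_counts.items()):
        stream.write(f"{caller} -> {callee}: {count} call(s)\n")


@profiled
def helper(n):
    return n * n


@profiled
def work():
    total = 0
    for i in range(3):
        total += helper(i)
    return total


if __name__ == "__main__":
    atexit.register(dump_profile)  # flush the table when the interpreter exits
    work()
```

Run as a script, the table is printed only when the interpreter exits, which mirrors the fact that gmon.out appears only after the profiled program terminates.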

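3   Workflow

Add the -pg option when compiling and linking.

Run the resulting binary.

After the program exits normally, a gmon.out file is created in the directory it was run from. An existing gmon.out file is overwritten.

Analyze the gmon.out file with the gprof tool.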
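The four steps above can be sketched as a small driver script. The file names demo.c, demo and report.txt are placeholders chosen for this example, and `run_workflow` assumes that gcc, gprof and the source file are actually present; run directly, the script only prints the commands it would execute.

```python
import subprocess
from pathlib import Path


def build_commands(source="demo.c", binary="demo"):
    """Return the argv lists for compile, link, run and report, in that order."""
    obj = Path(source).with_suffix(".o").name
    return [
        ["gcc", "-pg", "-c", source, "-o", obj],  # compile with profiling hooks
        ["gcc", "-pg", obj, "-o", binary],        # link, again with -pg
        [f"./{binary}"],                          # a normal exit writes ./gmon.out
        ["gprof", binary, "gmon.out"],            # turn gmon.out into a text report
    ]


def run_workflow(source="demo.c", binary="demo", report="report.txt"):
    """Run all four steps; needs gcc, gprof and the source file to exist."""
    *build_and_run, gprof_cmd = build_commands(source, binary)
    for cmd in build_and_run:
        subprocess.run(cmd, check=True)
    with open(report, "w") as fh:
        subprocess.run(gprof_cmd, stdout=fh, check=True)


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Compiling and linking are kept as separate commands here only to show that -pg appears in both.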

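4   Analyzing gprof output

Once gmon.out has been produced, the gprof tool from GNU binutils can analyze the data and convert it into a format that is easy to read and understand.

Typical usage: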

# gprof Binary-file gmon.out >report.txt

 

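Here, Binary-file is the program that was run (it can also be a library file the program calls into), gmon.out is the file produced earlier, and report.txt is the resulting analysis report. gprof offers many options for controlling what the report contains.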

 

4.1    Flat profile

Open the report file in a text editor:


 

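The first part of the report is the flat profile, a simple list of how each function was called, as shown in the figure above. The list is sorted by time in descending order; functions with equal time are then sorted by call count in descending order. The fields have the following meanings:

% time: the percentage of the program's total running time consumed by this function

Cumulative seconds: cumulative execution time, that is, the time spent in this function plus the time spent in all functions listed above it

Self seconds: the time spent in the function itself (summed over all calls); this value is the primary sort key of the list

Calls: the number of times the function was called; the field is blank if the function was never called or was not compiled with profiling

Self Ts/call: the average time spent in the function itself per call

Total Ts/call: the average time spent in the function and its descendants per call

Name: the function name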

 

In fact, the report itself gives a detailed explanation of these fields just below the list:

%        the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
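To see how the columns relate to one another, the sketch below derives them from per-function self time and call counts. The function name `flat_profile` and the sample numbers are invented for illustration; this is not gprof's code and does not parse a real report.

```python
def flat_profile(rows):
    """rows: (name, self_seconds, calls) tuples; calls may be None if unknown."""
    total = sum(self_s for _, self_s, _ in rows)
    # Sort by self time, then call count, both descending, then by name.
    ordered = sorted(rows, key=lambda r: (-r[1], -(r[2] or 0), r[0]))
    table, cumulative = [], 0.0
    for name, self_s, calls in ordered:
        cumulative += self_s  # running sum over this row and the rows above it
        table.append({
            "name": name,
            "pct_time": 100.0 * self_s / total if total else 0.0,
            "cumulative": cumulative,
            "self": self_s,
            "calls": calls,
            "self_ms_per_call": 1000.0 * self_s / calls if calls else None,
        })
    return table


if __name__ == "__main__":
    # Made-up measurements, purely for illustration.
    sample = [("parse", 0.30, 10), ("main", 0.10, 1), ("hash", 0.60, 1000)]
    for row in flat_profile(sample):
        print(f"{row['pct_time']:6.2f} {row['cumulative']:8.2f} "
              f"{row['self']:6.2f} {row['calls']:6} {row['name']}")
```

The cumulative column is simply a running sum of the self column in sorted order, so the last row's cumulative value equals the total sampled time.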

 

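4.2 Call graph

The second part of the report is the call graph, which shows the time consumed by each function and its descendants. Entries are sorted by time consumed in descending order and are organized by index, so the overall calling relationships are easy to trace by following the index numbers. The explanation of each element, printed right after the call graph, is convenient to have at hand: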

This table describes the call tree of the program, and was sorted by
 the total amount of time spent in each function and its children.

 Each entry in this table consists of several lines. The line with the
 index number at the left hand margin lists the current function.
 The lines above it list the functions that called this function,
 and the lines below it list the functions this one called.
 This line lists:
     index  A unique number given to each element of the table.
        Index numbers are sorted numerically.
        The index number is printed next to every function name so
        it is easier to look up where the function in the table.

     % time This is the percentage of the `total' time that was spent
        in this function and its children.  Note that due to
        different viewpoints, functions excluded by options, etc,
        these numbers will NOT add up to 100%.

     self   This is the total amount of time spent in this function.

     children   This is the total amount of time propagated into this
        function by its children.

     called This is the number of times the function was called.
        If the function called itself recursively, the number
        only includes non-recursive calls, and is followed by
        a `+' and the number of recursive calls.

     name   The name of the current function. The index number is
        printed after it.  If the function is a member of a
        cycle, the cycle number is printed between the
        function's name and the index number.


 For the function's parents, the fields have the following meanings:


     self   This is the amount of time that was propagated directly
        from the function into this parent.

     children   This is the amount of time that was propagated from
        the function's children into this parent.

     called This is the number of times this parent called the
        function `/' the total number of times the function
        was called.  Recursive calls to the function are not
        included in the number after the `/'.

     name   This is the name of the parent. The parent's index
        number is printed after it.  If the parent is a
        member of a cycle, the cycle number is printed between
        the name and the index number.

 If the parents of the function cannot be determined, the word
 `<spontaneous>' is printed in the `name' field, and all the other
 fields are blank.

 For the function's children, the fields have the following meanings:

     self   This is the amount of time that was propagated directly
        from the child into the function.

     children   This is the amount of time that was propagated from the
        child's children to the function.

     called This is the number of times the function called
        this child `/' the total number of times the child
        was called.  Recursive calls by the child are not
        listed in the number after the `/'.

     name   This is the name of the child. The child's index
        number is printed after it.  If the child is a
        member of a cycle, the cycle number is printed
        between the name and the index number.

 If there are any cycles (circles) in the call graph, there is an
 entry for the cycle-as-a-whole.  This entry shows who called the
 cycle (as parents) and the members of the cycle (as children.)
 The `+' recursive calls entry shows the number of function calls that
 were internal to the cycle, and the calls entry for each member shows,
 for that member, how many times it was called from other members of
 the cycle.
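The "propagated" times in the parent lines can be illustrated with a small calculation. The sketch assumes that every call to a function costs the same on average, so a child's total time is split among its parents in proportion to their call counts. The function name `propagate_to_parents` and the numbers are made up for this example.

```python
def propagate_to_parents(child_total_seconds, calls_by_parent):
    """Split a child's total time among parents in proportion to call counts."""
    total_calls = sum(calls_by_parent.values())
    if total_calls == 0:
        return {parent: 0.0 for parent in calls_by_parent}
    return {
        parent: child_total_seconds * calls / total_calls
        for parent, calls in calls_by_parent.items()
    }


if __name__ == "__main__":
    # A child that took 2.0 s in total, called once by "init" and 3 times by "loop".
    shares = propagate_to_parents(2.0, {"init": 1, "loop": 3})
    for parent, seconds in shares.items():
        print(f"{parent}: {seconds:.2f} s, called {3 if parent == 'loop' else 1}/4")
```

Because the split relies on call counts alone, a parent whose calls happen to be unusually expensive is charged no more than one whose calls are cheap.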

 

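5   Visualizing with DOT

A text report is enough for a small program, but for a large one it becomes too cluttered. When the focus is on calling relationships in particular, jumping back and forth through the text is uncomfortable.

Converting the text report into an image requires Python and dot, plus the gprof2dot.py script, which has to be downloaded.

dot is a tool provided by graphviz. On CentOS it can be installed with the following command:

# yum install graphviz

After installation, run: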

# python gprof2dot.py report.txt | dot -Tpng -o ast.png

Here report.txt is the text report produced by gprof earlier. A file named ast.png now appears in the current directory; open it to view the graph.
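For a sense of what dot consumes, here is a minimal sketch that turns a list of call arcs into DOT source. It is not how gprof2dot.py works internally and does not read a gprof report; the function name `arcs_to_dot` and the sample arcs are invented for illustration.

```python
def arcs_to_dot(arcs):
    """arcs: iterable of (caller, callee, calls). Returns DOT source text."""
    lines = ["digraph callgraph {"]
    for caller, callee, calls in arcs:
        # One directed edge per caller/callee pair, labelled with the call count.
        lines.append(f'    "{caller}" -> "{callee}" [label="{calls}x"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample_arcs = [("main", "work", 1), ("work", "helper", 3)]
    print(arcs_to_dot(sample_arcs))
```

Saved under a hypothetical name such as sketch.py, its output can be piped into dot in the same way: python sketch.py | dot -Tpng -o sketch.png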

 

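6   Limitations

6.1   Shared library support

Profiling support is added by the compiler, so to get profiling information from a shared library, that library must itself be compiled with -pg.

To profile system functions (such as those in libc), replace -lc with -lc_p so that the program links against libc_p.so or libc_p.a. Only then can the execution time of the underlying C library functions be measured.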

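6.2   User time vs. kernel time

gprof can only measure the user time an application consumes while running. It cannot measure time spent in kernel space, so it is of no help in analyzing kernel-mode calls. It is a poor fit for programs that spend a large share of their time in system calls.

In addition, times are obtained by sampling, so precision is limited. A function that runs only briefly may never be sampled, and it then shows up with no time in the output. This is why so many entries show a time of 0.00.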
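The effect of sampling can be reproduced with a small deterministic model. The sketch assumes a fixed sampling period of 0.01 seconds and charges one full period to whichever function is running when a tick fires. The function name `sample_profile` and the timeline are made up, and integer microseconds are used to avoid floating-point drift.

```python
def sample_profile(timeline_us, period_us=10_000):
    """timeline_us: (name, duration_us) pairs run back to back.

    Returns the estimated seconds per function as a sampling profiler would see them.
    """
    estimated_us = {name: 0 for name, _ in timeline_us}
    now = 0
    next_tick = period_us
    for name, duration in timeline_us:
        end = now + duration
        while next_tick <= end:  # every tick that fires while this function runs
            estimated_us[name] += period_us
            next_tick += period_us
        now = end
    return {name: us / 1_000_000 for name, us in estimated_us.items()}


if __name__ == "__main__":
    # Hypothetical run: 3 ms of setup, 95 ms of computation, 1 ms of teardown.
    timeline = [("setup", 3_000), ("compute", 95_000), ("teardown", 1_000)]
    for name, seconds in sample_profile(timeline).items():
        print(f"{name:10s} {seconds:.2f}")
```

In this model setup and teardown both finish between ticks, so they are reported as 0.00 even though they did run, and compute is credited with only the ticks that landed inside it.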

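6.3   Multithreading

gprof does not support multithreading well, because it relies on the ITIMER_PROF signal and only the main thread handles that signal. http://sam.zoy.org/writings/programming/gprof.html offers a workaround that installs a hook, but when I tested it with asterisk the results were poor: the analysis for child threads was always wrong.

6.4    Other

gmon.out is generated only when the process exits, which is still somewhat inconvenient in practice.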
