一)gprof
gprof是一个用来统计可执行程序数据的工具.它能帮助用户确定程序在哪个地方耗时过多.
被编译器生成指令后的代码将会被gprof捕获并进行计算,在gcc编译过程中使用-pg选项编译就会产生指令化工具,当执行编译后的二进制程序时,就会生成一个profile信息的概要文件.
gprof就是用这个文件对程序进行分析的,没有用-pg生成指令化工具的代码是不可测量的.
下面是一个演示程序,如下:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
double slow(double x)
{
return pow(x,1.12345);
}
double slower(double x)
{
return 1.0/x;
}
double slowest(double x)
{
return sqrt(x);
}
int main (int argc, char *argv[])
{
int i;
double x;
for (i = 0;i < 3000000; i++){
x = 100.0;
x = slow(x);
x = slower(x);
x = slowest(x);
}
}
我们要用-pg选项对它进行编译,同时为达到最好的效率,我们加上-O2.
编译:
gcc -lm -O2 -pg profme.c -o profme
执行profme程序,生成gmon.out文件.
./profme
ls -l gmon.out
-rw-r--r-- 1 root root 456 Mar 29 11:13 gmon.out
查一下gmon.out的文件类型:
file gmon.out
gmon.out: GNU prof performance data - version 1
执行gprof对程序进行分析:
gprof ./profme
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
37.71 0.03 0.03 3000000 10.06 10.06 slowest
37.71 0.06 0.03 main
25.14 0.08 0.02 3000000 6.70 6.70 slower
0.00 0.08 0.00 3000000 0.00 0.00 slow
中间略
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 12.43% of 0.08 seconds
index % time self children called name
<spontaneous>
[1] 100.0 0.03 0.05 main [1]
0.03 0.00 3000000/3000000 slowest [2]
0.02 0.00 3000000/3000000 slower [3]
0.00 0.00 3000000/3000000 slow [4]
-----------------------------------------------
0.03 0.00 3000000/3000000 main [1]
[2] 37.5 0.03 0.00 3000000 slowest [2]
-----------------------------------------------
0.02 0.00 3000000/3000000 main [1]
[3] 25.0 0.02 0.00 3000000 slower [3]
-----------------------------------------------
0.00 0.00 3000000/3000000 main [1]
[4] 0.0 0.00 0.00 3000000 slow [4]
-----------------------------------------------
后面略
注:
我们看到gprof默认会输出两组信息,即平台剖析数据(Flat profile)和调用图(Call graph).
我们先看平台剖析数据:
cumulative seconds:代表的是时间的累加,如上例,运行slowest函数(调用300万次)共用了0.03秒,此时cumulative seconds=0.03,
运行main函数用了0.03秒,此时cumulative seconds=0.06,运行slower函数(调用300万次)共用了0.02秒,此时cumulative seconds=0.08,
最后slow函数没有时间,最后cumulative seconds=0.08.
self seconds:代表的是函数本身的运行时间,比如slowest它占用的时间最长,被调用300万次,共用了0.03秒.
我们再来看一下调用图:
调用图给我们展现了children(子进程)/被调用函数的消耗时间,这对于找出问题函数有极大的帮助.
接下来我们可以将几个gmon.out文件累加起来,然后用gprof来做整理的分析.
首先我们要设定环境变量,使每次生成的gmon.out文件都有一个自己的PID做为后缀,如下:
export GMON_OUT_PREFIX=gmon.out
运行4次profme程序,即产生四个gmon.out.PID文件.
./profme
./profme
./profme
./profme
将gmon.out文件合起来做分析,如下:
gprof ./profme gmon.out.*
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
35.61 0.09 0.09 12000000 7.12 7.12 slower
27.23 0.15 0.07 12000000 5.45 5.45 slowest
25.14 0.21 0.06 main
12.57 0.24 0.03 12000000 2.51 2.51 slow
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 4.14% of 0.24 seconds
index % time self children called name
<spontaneous>
[1] 100.0 0.06 0.18 main [1]
0.09 0.00 12000000/12000000 slower [2]
0.07 0.00 12000000/12000000 slowest [3]
0.03 0.00 12000000/12000000 slow [4]
-----------------------------------------------
0.09 0.00 12000000/12000000 main [1]
[2] 35.4 0.09 0.00 12000000 slower [2]
-----------------------------------------------
0.07 0.00 12000000/12000000 main [1]
[3] 27.1 0.07 0.00 12000000 slowest [3]
-----------------------------------------------
0.03 0.00 12000000/12000000 main [1]
[4] 12.5 0.03 0.00 12000000 slow [4]
-----------------------------------------------
二)gcov
gcov用来确定代码覆盖,同时我们也可以用它来确认代码被执行了多少次,这在优化过程中非常有用.
我们用下面的程序做下说明:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
volatile double x;
int
main (int argc, char *argv[])
{
int i;
for (i = 0;i < 16000000; i++){
x = 1000.0;
int r = i & 0xf;
if (r <= 8){
x = pow(x,1.234);
}
else
if(r <= 11){
x = sqrt(x);
}
else
{
x = 1.0/x;
}
}
}
编译:
gcc -g -O2 -ftest-coverage -fprofile-arcs summer-proj.c -o summer-proj -pg -lm
注:
gcov主要使用.gcno和.gcda两个文件,所以在编译过程中要加入-ftest-coverage和-fprofile-arcs两个参数选项.
-ftest-coverage生成summer-proj.gcno
-fprofile-arcs生成summer-proj.gcda
运行summer-proj程序:
./summer-proj
使用gcov覆盖测试summer-proj程序,如下:
gcov ./summer-proj
File 'summer-proj.c'
Lines executed:100.00% of 10
summer-proj.c:creating 'summer-proj.c.gcov'
注:
此时gcov生成了summer-proj.c.gcov文件.
我们打开这个文件,查看其中内容:
./summer-proj
[root@test1 ~]# cat summer-proj.c.gcov
-: 0:Source:summer-proj.c
-: 0:Graph:summer-proj.gcno
-: 0:Data:summer-proj.gcda
-: 0:Runs:1
-: 0:Programs:1
-: 1:#include <stdio.h>
-: 2:#include <string.h>
-: 3:#include <stdlib.h>
-: 4:#include <time.h>
-: 5:#include <math.h>
-: 6:
-: 7:volatile double x;
-: 8:
-: 9:int
-: 10:main (int argc, char *argv[])
1: 11:{
-: 12: int i;
16000001: 13: for (i = 0;i < 16000000; i++){
16000000: 14: x = 1000.0;
-: 15:
16000000: 16: int r = i & 0xf;
-: 17:
16000000: 18: if (r <= 8){
9000000: 19: x = pow(x,1.234);
-: 20: }
-: 21: else
7000000: 22: if(r <= 11){
3000000: 23: x = sqrt(x);
-: 24: }
-: 25: else
-: 26: {
4000000: 27: x = 1.0/x;
-: 28: }
-: 29: }
1: 30:}
注:我们看到pow(x,1.234)被调用了9000000次,而sqrt(x)被调用了3000000,而1.0/x被调用了4000000.
符合当初程序设计的意图.
我们想知道谁用的时间最长,就要用到gprof,如下:
gprof --no-graph -l summer-proj gmon.out.5095
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
34.40 0.11 0.11 main (summer-proj.c:19 @ 8048ab2)
23.45 0.19 0.08 main (summer-proj.c:27 @ 8048a49)
12.51 0.23 0.04 main (summer-proj.c:23 @ 8048ad8)
12.51 0.27 0.04 main (summer-proj.c:23 @ 8048af0)
4.69 0.28 0.02 main (summer-proj.c:13 @ 8048a69)
4.69 0.30 0.02 main (summer-proj.c:14 @ 8048a9c)
3.13 0.31 0.01 main (summer-proj.c:11 @ 8048a8f)
1.56 0.31 0.01 main (summer-proj.c:11 @ 8048a20)
1.56 0.32 0.01 main (summer-proj.c:22 @ 8048a40)
1.56 0.32 0.01 main (summer-proj.c:18 @ 8048ab0)
注:
我们从中看出pow(x,1.234)函数调用占用了程序更多的时间.