Using C++ performance analysis tools (sanitizers, Valgrind, gprof, gperftools, perf)



1. time

The Linux time command measures how long a given command takes to run and how much of the system's resources it consumes.

Syntax:

time [options] COMMAND [arguments]

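Options:

  • -o FILE or --output=FILE: write the results to FILE. If FILE already exists, its contents are overwritten.

  • -a or --append: used together with -o; appends the results to the end of the file instead of overwriting it.

  • -f FORMAT or --format=FORMAT: use FORMAT as the output format string. When this option is not given, the default format is used. The format can also be set through the TIME environment variable, so it does not have to be specified on every invocation.

These options belong to GNU time (usually installed as /usr/bin/time); the time keyword built into shells such as bash and zsh does not accept them.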

Example:

➜  antlr4.9.2-develop git:(master) time date
Wed Jan 19 14:37:41 CST 2022
date  0.00s user 0.00s system 32% cpu 0.003 total

More: http://c.biancheng.net/linux/time.html
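The difference between the "user"/"system" figures and the "total" figure in the output above can also be observed from inside a program. The following is a minimal sketch, assuming a POSIX system where std::clock() counts processor time; the names Timing, measure, and sleepy_work are made up for this illustration.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Wall-clock and CPU time consumed by one call, in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Runs `work` once and reports both clocks. On POSIX systems std::clock()
// counts processor time used by the process, so time spent sleeping is
// not included in cpu_seconds.
template <typename Work>
Timing measure(Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// Hypothetical workload: it sleeps, so it uses wall time but almost no CPU time.
inline void sleepy_work() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling measure(sleepy_work) should give a wall time of roughly 50 ms and a CPU time close to zero, which mirrors a time report whose total is much larger than user plus system.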

2. Sanitizers

GitHub: https://github.com/google/Sanitizers

Documentation: https://github.com/google/sanitizers/wiki

Overview:

Sanitizers is an open-source tool set started by Google that includes AddressSanitizer, MemorySanitizer, ThreadSanitizer, and LeakSanitizer. The project began as part of LLVM, and GNU later added the tools to GCC as well: GCC 4.8 introduced Address and Thread Sanitizer, and GCC 4.9 added Leak Sanitizer and UB Sanitizer. All of them are powerful aids for finding hidden bugs.

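Features:

The checks are compiled into the application, so a sanitizer can inspect the program in depth and report an error at the moment it happens; AddressSanitizer, for example, aborts the process on the first invalid access. Leaks are reported by LeakSanitizer when the process exits.

Relevant flags:

  • Address errors: -fsanitize=address
  • Uninitialized memory reads: -fsanitize=memory (Clang only; GCC does not provide MemorySanitizer)
  • Memory leaks: -fsanitize=leak
  • Data races between threads: -fsanitize=thread
  • Undefined behavior: -fsanitize=undefined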

To make stack traces easier to follow, also keep the frame pointer by adding -fno-omit-frame-pointer.
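As an illustration of the kind of bug -fsanitize=undefined targets: signed integer overflow is undefined behavior in C++, and UBSan reports it at run time. The sketch below shows one way to avoid the overflow in the first place; checked_add is a hypothetical helper written for this example, not part of any sanitizer API.

```cpp
#include <limits>

// Adds a and b into *out when the sum fits in an int. Returns false instead
// of overflowing. A plain `a + b` that overflows is undefined behavior,
// which is what -fsanitize=undefined would flag.
bool checked_add(int a, int b, int* out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    *out = a + b;
    return true;
}
```

Replacing the body with an unconditional `*out = a + b;` and calling it with the largest int plus 1 is a quick way to see a UBSan report.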

Using a sanitizer with gcc/g++:

With gcc or g++, pass the sanitizer flag on the command line, for example:

g++ -fsanitize=address -g -fno-omit-frame-pointer test.cpp

Using a sanitizer in CMakeLists.txt:

Set the flags through either CMAKE_CXX_FLAGS or add_compile_options, for example:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer")

add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
link_libraries(-fsanitize=address)

2.1 Memory leaks: -fsanitize=leak

Usage example:

Create a file named memory_leak.cpp with the following contents, which leak memory:

#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    std::string* s1 = new std::string("hello world!");
    std::cout << *s1 << std::endl;

    return 0;
}

Write the CMakeLists.txt:

cmake_minimum_required(VERSION 3.10)
project (demo)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=leak -fno-omit-frame-pointer")

add_executable(memory_leak memory_leak.cpp)

Build and run:

➜  build cmake .. && make -j8
-- Configuring done
-- Generating done
-- Build files have been written to: /data/tangxing/verify/sanitizers/build
[ 50%] Building CXX object CMakeFiles/memory_leak.dir/memory_leak.cpp.o
[100%] Linking CXX executable memory_leak
[100%] Built target memory_leak
➜  build ./memory_leak       
hello world!

=================================================================
==18801==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7f672f8e3043 in operator new(unsigned long) ../../.././libsanitizer/lsan/lsan_interceptors.cc:222
    #1 0x400c0d in main /data/tangxing/verify/sanitizers/memory_leak.cpp:5
    #2 0x7f672ec33554 in __libc_start_main (/lib64/libc.so.6+0x22554)

SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).

The report shows that the leaked object was allocated on line 5 of memory_leak.cpp.
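One possible way to remove this leak, assuming the string only needs to live as long as its owner, is to hold it in a std::unique_ptr. The function name make_greeting is made up for this sketch.

```cpp
#include <memory>
#include <string>

// Returns the greeting owned by a std::unique_ptr, so the string is deleted
// automatically when the owner goes out of scope.
std::unique_ptr<std::string> make_greeting() {
    // C++11-compatible spelling; std::make_unique requires C++14.
    return std::unique_ptr<std::string>(new std::string("hello world!"));
}
```

In main, `auto s1 = make_greeting(); std::cout << *s1 << std::endl;` would print the same text, and LeakSanitizer should have nothing to report because the allocation is released when s1 is destroyed.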

2.2 Address errors: -fsanitize=address

Example code:

#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    int a[2] = {1, 0};
    int b = a[2];

    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer")

add_executable(memory_leak memory_leak.cpp)

Output:

=================================================================
==31176==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe3971eec8 at pc 0x000000400c15 bp 0x7ffe3971ee70 sp 0x7ffe3971ee68
READ of size 4 at 0x7ffe3971eec8 thread T0
    #0 0x400c14 in main /data/tangxing/verify/sanitizers/memory_leak.cpp:6
    #1 0x7f01100ee554 in __libc_start_main (/lib64/libc.so.6+0x22554)
    #2 0x400a68  (/data/tangxing/verify/sanitizers/build/memory_leak+0x400a68)

Address 0x7ffe3971eec8 is located in stack of thread T0 at offset 40 in frame
    #0 0x400b21 in main /data/tangxing/verify/sanitizers/memory_leak.cpp:4

  This frame has 1 object(s):
    [32, 40) 'a' (line 5) <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /data/tangxing/verify/sanitizers/memory_leak.cpp:6 in main
Shadow bytes around the buggy address:
  0x1000472dbd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbd90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000472dbdd0: 00 00 00 00 f1 f1 f1 f1 00[f3]f3 f3 00 00 00 00
  0x1000472dbde0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbe10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000472dbe20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==31176==ABORTING

The report shows an out-of-bounds read on line 6, one element past the end of the array a declared on line 5.
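For comparison, here is a sketch of a bounds-checked read that turns the same mistake into a recoverable error instead of a stack overflow. It assumes the array can be a std::array; read_checked is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Reads a[index] with bounds checking. Returns true and stores the value in
// *out on success; returns false when index is outside the array.
bool read_checked(const std::array<int, 2>& a, std::size_t index, int* out) {
    try {
        *out = a.at(index);  // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With this helper, asking for index 2 of a two-element array returns false rather than reading past the end of the stack object.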

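3. The Valgrind tool suite

Website: https://valgrind.org/

User manual: https://valgrind.org/docs/manual/manual.html

Visualization: https://github.com/jrfonseca/gprof2dot

Reference: https://blog.csdn.net/yanghao23/article/details/7514587

Overview: Valgrind is an instrumentation framework for building dynamic analysis tools. Its tools can automatically detect many memory-management and threading bugs and profile your programs in detail. You can also use Valgrind to build new tools. The Valgrind distribution currently includes seven production-quality tools: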

  • a memory error detector,

  • two thread error detectors,

  • a cache and branch-prediction profiler,

  • a call-graph generating cache and branch-prediction profiler,

  • and two different heap profilers.

  • It also includes an experimental SimPoint basic block vector generator.

3.1 memory error detector

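Memcheck is the most commonly used tool. It detects memory problems in a program: every read and write of memory is checked, and every call to malloc, free, new, and delete is intercepted. It can therefore detect the following problems:

1. Use of uninitialized memory;

2. Reading or writing memory after it has been freed;

3. Reading or writing past the end of a malloc'd block;

4. Reading or writing inappropriate areas of the stack;

5. Memory leaks, where the pointer to a block is lost for good;

6. Mismatched use of malloc/free and new/delete;

7. Overlapping dst and src pointers in memcpy() and related functions.

These are the problems that trouble C/C++ programmers the most, and Memcheck is a great help with them.

3.1.1 A simple example

Example code: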

#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    std::string* s1 = new std::string("hello world!");
    std::cout << *s1 << std::endl;

    int a[2] = {1, 0};
    int b = a[2];

    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(memcheck memcheck.cpp)

Usage:

valgrind --leak-check=full ./<executable>

Console output:

==8398== Memcheck, a memory error detector
==8398== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8398== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==8398== Command: ./memcheck
==8398== 
hello world!
==8398== 
==8398== HEAP SUMMARY:
==8398==     in use at exit: 32 bytes in 1 blocks
==8398==   total heap usage: 2 allocs, 1 frees, 72,736 bytes allocated
==8398== 
==8398== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8398==    at 0x4C2A593: operator new(unsigned long) (vg_replace_malloc.c:344)
==8398==    by 0x400ADD: main (memcheck.cpp:5)
==8398== 
==8398== LEAK SUMMARY:
==8398==    definitely lost: 32 bytes in 1 blocks
==8398==    indirectly lost: 0 bytes in 0 blocks
==8398==      possibly lost: 0 bytes in 0 blocks
==8398==    still reachable: 0 bytes in 0 blocks
==8398==         suppressed: 0 bytes in 0 blocks
==8398== 
==8398== For lists of detected and suppressed errors, rerun with: -s
==8398== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

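The report shows a memory leak allocated on line 5 of memcheck.cpp. The out-of-bounds read of a[2] is not reported, because Memcheck does not bounds-check arrays on the stack.

3.1.2 More examples

Example: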

#include <stdlib.h>
#include <malloc.h>
#include <string.h>

void test() {
    int *ptr = (int*)malloc(sizeof(int)*10);

    ptr[10] = 7; // out-of-bounds write

    memcpy(ptr +1, ptr, 5); // overlapping source and destination

    free(ptr);
    free(ptr);// double free

    int *p1;
    *p1 = 1; // write through an uninitialized pointer
}

int main(void) {
    test();
    return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(memcheck memcheck.cpp)

After building, run:

valgrind --leak-check=full ./<executable>

Results:

[100%] Built target memcheck
==1203== Memcheck, a memory error detector
==1203== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1203== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1203== Command: ./memcheck
==1203== 
==1203== Invalid write of size 4 # out-of-bounds write
==1203==    at 0x400600: test() (memcheck.cpp:8)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203==  Address 0x5b0aca8 is 0 bytes after a block of size 40 alloc'd
==1203==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==1203==    by 0x4005F3: test() (memcheck.cpp:6)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203== 
==1203== Source and destination overlap in memcpy(0x5b0ac84, 0x5b0ac80, 5) # overlapping copy
==1203==    at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==1203==    by 0x400621: test() (memcheck.cpp:10)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203== 
==1203== Invalid free() / delete / delete[] / realloc() # double free
==1203==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==1203==    by 0x400639: test() (memcheck.cpp:13)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203==  Address 0x5b0ac80 is 0 bytes inside a block of size 40 free'd
==1203==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==1203==    by 0x40062D: test() (memcheck.cpp:12)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203==  Block was alloc'd at
==1203==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==1203==    by 0x4005F3: test() (memcheck.cpp:6)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203== 
==1203== Use of uninitialised value of size 8 # uninitialized pointer
==1203==    at 0x40063E: test() (memcheck.cpp:16)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203== 
==1203== 
==1203== Process terminating with default action of signal 11 (SIGSEGV) # crash caused by writing through the uninitialized pointer
==1203==  Bad permissions for mapped region at address 0x400660
==1203==    at 0x40063E: test() (memcheck.cpp:16)
==1203==    by 0x40064F: main (memcheck.cpp:20)
==1203== 
==1203== HEAP SUMMARY:
==1203==     in use at exit: 0 bytes in 0 blocks
==1203==   total heap usage: 2 allocs, 3 frees, 72,744 bytes allocated
==1203== 
==1203== All heap blocks were freed -- no leaks are possible
==1203== 
==1203== Use --track-origins=yes to see where uninitialised values come from
==1203== For lists of detected and suppressed errors, rerun with: -s
==1203== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
[1]    1203 segmentation fault  valgrind --leak-check=full ./memcheck

Valgrind detected every one of our memory errors.
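For reference, a corrected counterpart of test() might look like the sketch below. It is only one possible fix, under the assumption that the intent was to write the last element, shift a few bytes, and store 1 through a valid pointer; test_fixed and its return value are made up for this illustration.

```cpp
#include <cstdlib>
#include <cstring>

// Corrected counterpart of test(). Returns the last array element plus the
// value written through p1, or -1 if the allocation fails.
int test_fixed() {
    const std::size_t n = 10;
    int* ptr = static_cast<int*>(std::malloc(sizeof(int) * n));
    if (ptr == nullptr) return -1;
    std::memset(ptr, 0, sizeof(int) * n);  // no uninitialized reads later

    ptr[n - 1] = 7;                 // the last valid index is n - 1, not n
    std::memmove(ptr + 1, ptr, 5);  // memmove permits overlapping ranges
    const int last = ptr[n - 1];

    std::free(ptr);                 // freed exactly once
    ptr = nullptr;

    int value = 0;
    int* p1 = &value;               // p1 points at a live object
    *p1 = 1;
    return last + *p1;
}
```

Running this version under valgrind --leak-check=full should produce none of the four errors shown above.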

3.2 call-graph generating

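Callgrind is a profiler similar to gprof, but it observes the running program in much finer detail and provides more information. Unlike gprof, it does not require special options when compiling the source, although compiling with debug information is recommended. Callgrind collects data while the program runs, builds the function call graph, and can optionally simulate the cache. When the run ends, it writes the profile data to a file, which callgrind_annotate can turn into a readable report.

Generating a graph from the data requires gprof2dot: https://github.com/jrfonseca/gprof2dot/blob/master/gprof2dot.py

It is a Python script. After downloading it, make it executable with chmod +x gprof2dot.py and put it in any directory on $PATH (I put it in /usr/bin), so that gprof2dot.py can be run directly from the terminal.

Example code: call.cpp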

#include <stdio.h>
#include <malloc.h>
#include <chrono>
#include <thread>

void test()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
void f()
{
    int i;
    for( i = 0; i < 5; i ++)
        test();
}
int main()
{
    f();
    printf("process is over!\n");
    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(call call.cpp)

Run Callgrind:

valgrind --tool=callgrind ./<executable>

When the run finishes, a profile file named callgrind.out.<pid> is created in the current directory. The following command prints the results:

callgrind_annotate callgrind.out.<pid>

The following command generates a graphical report instead:

gprof2dot.py -f callgrind callgrind.out.<pid> |dot -Tpng -o report.png

4. GNU gprof (GNU Profiler)

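See the gprof user manual: http://sourceware.org/binutils/docs-2.17/gprof/index.html

gprof is the profiler that accompanies gcc (it ships with GNU binutils). It reports how many times each function was called, how much time each one took, and the function call graph.

Steps:

(1) Compile with the -pg switch.

(2) Run the program (it must exit normally for the profile to be written).

(3) Run gprof to generate the report.

We reuse call.cpp from the previous section: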

#include <stdio.h>
#include <malloc.h>
#include <chrono>
#include <thread>

void test()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
void f()
{
    int i;
    for( i = 0; i < 5; i ++)
        test();
}
int main()
{
    f();
    printf("process is over!\n");
    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -pg -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(call call.cpp)

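Building and running the program produces a gmon.out file.

4.1 Text output

Run the following command:

gprof <options> <executable> gmon.out

It produces the output below, which is hard to read. Every time column is 0.00 because the program spends its time sleeping, and gprof only samples time spent executing on the CPU: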

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00       35     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> >::count() const
  0.00      0.00     0.00       15     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> >::duration<long, void>(long const&)
  0.00      0.00     0.00       10     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1l> >::count() const
  0.00      0.00     0.00        5     0.00     0.00  test()
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const
  0.00      0.00     0.00        5     0.00     0.00  void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::duration_cast<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duration_cast<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1l> > >::value, std::chrono::duration<long, std::ratio<1l, 1l> > >::type std::chrono::duration_cast<std::chrono::duration<long, std::ratio<1l, 1l> >, long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration_values<long>::zero()
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000000000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::ratio<1000000l, 1l>, long, false, true>::__cast<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1000l, 1l>, long, false, true>::__cast<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1l> >, std::ratio<1l, 1000l>, long, true, false>::__cast<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000000000l> >::duration<long, void>(long const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> >::zero()
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> >::duration<int, void>(int const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1000l> >::duration<long, std::ratio<1l, 1l>, void>(std::chrono::duration<long, std::ratio<1l, 1l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::chrono::duration<long, std::ratio<1l, 1l> >::duration<long, void>(long const&)
  0.00      0.00     0.00        5     0.00     0.00  bool std::chrono::operator<=<long, std::ratio<1l, 1000l>, long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  bool std::chrono::operator< <long, std::ratio<1l, 1000l>, long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&, std::chrono::duration<long, std::ratio<1l, 1000l> > const&)
  0.00      0.00     0.00        5     0.00     0.00  std::common_type<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::chrono::duration<long, std::ratio<1l, 1l> > >::type std::chrono::operator-<long, std::ratio<1l, 1000l>, long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&, std::chrono::duration<long, std::ratio<1l, 1l> > const&)
  0.00      0.00     0.00        1     0.00     0.00  f()
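The call counts in the table are exact, but sleeping consumes no CPU time, so there is nothing for the time columns to show. To see non-zero times, the body of test() could be swapped for CPU-bound work. The function below is a hypothetical stand-in written for this purpose; the arithmetic itself is arbitrary.

```cpp
#include <cstdint>

// Hypothetical CPU-bound replacement for the sleep in test():
// sums (i * i) % 7 for i in [0, n). The loop exists only to burn CPU time.
std::uint64_t busy_work(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc += (i * i) % 7;
    }
    return acc;
}
```

Calling busy_work with a large n from test() (and using the result, for example by printing it, so the compiler cannot discard the loop) should make test() show up with measurable self time in the flat profile.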

4.2 Graphical output

Run the following command:

gprof ./<executable> | gprof2dot.py |dot -Tpng -o report.png

This uses the same gprof2dot.py script as section 3.2 (https://github.com/jrfonseca/gprof2dot/blob/master/gprof2dot.py); it must be executable and located in a directory on $PATH.

This generates the call graph:

[Figure: call graph written to report.png]

5. gperftools (Google Performance Tools)

GitHub : https://github.com/gperftools/gperftools

Reference: https://github.com/NIGHTFIGHTING/gperftools-tutorial

Build and install:

# Download the gperftools source from GitHub and unpack it
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.9.1/gperftools-2.9.1.tar.gz
tar -xvf gperftools-2.9.1.tar.gz
cd gperftools-2.9.1
# Build
./configure
make -j8
# Install into the system directories
sudo make install

call.cpp :

#include <stdio.h>
#include <malloc.h>
#include <chrono>
#include <thread>

void test()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

void f()
{
    int i;
    for( i = 0; i < 100; i ++){
        test();
    }
}

int main()
{
    for(int i=0; i<100; i++) {
        f();
    }
    printf("process is over!\n");
    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
# SET(CMAKE_BUILD_TYPE "Release")
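SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(call call.cpp)
# Link the gperftools CPU profiler through the target instead of the compile flags
target_link_libraries(call profiler)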

Generate the profile:

CPUPROFILE=./test_capture.prof ./call

5.1 Text report

Command:

➜  build pprof ./call test_capture.prof --text       
Using local file ./call.
Using local file test_capture.prof.
Total: 3 samples
       2  66.7%  66.7%        2  66.7% __nanosleep_nocancel
       1  33.3% 100.0%        1  33.3% std::chrono::operator< 
       0   0.0% 100.0%        3 100.0% __libc_start_main
       0   0.0% 100.0%        3 100.0% _start
       0   0.0% 100.0%        3 100.0% f
       0   0.0% 100.0%        3 100.0% main
       0   0.0% 100.0%        1  33.3% std::chrono::operator<=
       0   0.0% 100.0%        3 100.0% std::this_thread::sleep_for
       0   0.0% 100.0%        3 100.0% test
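In this report the first column is the number of samples taken while the function itself was executing, and the fourth column is the number of samples in which the function was anywhere on the call stack. The sketch below shows how such counts can be derived from raw stack samples. It is an illustration of the idea, not pprof's implementation; Stack, Counts, and count_samples are made-up names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One sample is the call stack at the moment of sampling, outermost frame first.
using Stack = std::vector<std::string>;

struct Counts {
    std::map<std::string, int> flat;        // samples where the function was executing
    std::map<std::string, int> cumulative;  // samples where it was anywhere on the stack
};

Counts count_samples(const std::vector<Stack>& samples) {
    Counts c;
    for (const Stack& stack : samples) {
        if (stack.empty()) continue;
        ++c.flat[stack.back()];
        // A set keeps recursive functions from being counted twice per sample.
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const std::string& fn : unique) ++c.cumulative[fn];
    }
    return c;
}
```

Feeding it two samples that end in __nanosleep_nocancel and one that ends in std::chrono::operator< gives main a flat count of 0 and a cumulative count of 3, matching the corresponding row above.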

5.2 PDF report

Command:

pprof ./call test_capture.prof --pdf > prof.pdf

[Figure: call graph in prof.pdf]

6. perf

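Wiki: https://perf.wiki.kernel.org/index.php/Main_Page

perf is a profiling tool that lives in the Linux kernel source tree. It is based on event sampling, is driven by performance events, and is commonly used to find performance bottlenecks and locate hot code.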

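Profilers such as perf and OProfile all work by sampling the monitored program. In the simplest case they sample on the timer tick: each tick interrupt triggers a sample point, and at the sample point the profiler records the context the program was in. If a program spends 90% of its time in function foo(), then about 90% of the sample points should fall in the context of foo(). As long as the sampling frequency is high enough and the sampling period long enough, this inference is fairly reliable. Tick-driven sampling therefore shows where the program spends most of its time, which is where analysis should focus.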
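The sampling argument can be checked with a toy model. The sketch below replays a made-up execution timeline and takes one sample per fixed period; it models the principle only and has nothing to do with how perf is implemented. Timeline and sample_timeline are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A hypothetical execution timeline: each entry is (function name, microseconds spent).
using Timeline = std::vector<std::pair<std::string, long>>;

// Takes one sample every period_us microseconds, starting at time 0, and
// counts which function was running at each sample point.
std::map<std::string, int> sample_timeline(const Timeline& timeline, long period_us) {
    std::map<std::string, int> hits;
    if (period_us <= 0) return hits;  // a non-positive period would never advance
    long segment_start = 0;
    long next_sample = 0;
    for (const auto& segment : timeline) {
        const long segment_end = segment_start + segment.second;
        while (next_sample < segment_end) {
            ++hits[segment.first];
            next_sample += period_us;
        }
        segment_start = segment_end;
    }
    return hits;
}
```

For a timeline of 900 microseconds in foo followed by 100 microseconds in bar, sampling every 10 microseconds yields 90 hits in foo and 10 in bar, that is, the 90% share described above.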

call.cpp:

#include <stdio.h>
#include <malloc.h>
#include <chrono>
#include <thread>

void test()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

void f()
{
    int i;
    for( i = 0; i < 100; i ++){
        test();
    }
}

int main()
{
    for(int i=0; i<100; i++) {
        f();
    }
    printf("process is over!\n");
    return 0;
}

CMakeLists.txt :

cmake_minimum_required(VERSION 3.10)
project (demo)

set(CMAKE_CXX_STANDARD 11)

SET(CMAKE_BUILD_TYPE "Debug")
# SET(CMAKE_BUILD_TYPE "Release")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -Wno-attributes")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -Wno-attributes")

add_executable(call call.cpp)

6.1 Text report

Command:

perf record -e cpu-clock -g ./<executable>

Example:

➜  build perf record -e cpu-clock -g ./call 
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

process is over!
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (26 samples) ]
➜  build 

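Result:

[Figure: screenshot of the text report]

6.2 Flame graphs

The text report above is not very intuitive; a flame graph is another way to analyze the data.
Download the tool: git clone https://github.com/brendangregg/FlameGraph.git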

➜  build cd ..
# Download the tool
➜  perf git clone https://github.com/brendangregg/FlameGraph.git
Cloning into 'FlameGraph'...
remote: Enumerating objects: 1147, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1147 (delta 13), reused 13 (delta 5), pack-reused 1119
Receiving objects: 100% (1147/1147), 1.90 MiB | 0 bytes/s, done.
Resolving deltas: 100% (659/659), done.
➜  perf cd build 
# Parse perf.data with perf script
➜  build perf script -i perf.data &> perf.unfold
# Fold the call stacks in perf.unfold
➜  build ../FlameGraph/stackcollapse-perf.pl perf.unfold &> perf.folded
# Finally, generate the SVG
➜  build ../FlameGraph/flamegraph.pl perf.folded > perf.svg

perf.svg:
[Figure: flame graph in perf.svg]
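The folding step is the least obvious part of this pipeline. Each distinct call stack is collapsed into a single line of semicolon-separated frames followed by a sample count, and flamegraph.pl draws one box per frame with a width proportional to the counts. The sketch below reproduces that folded format in a simplified form (the real script also handles details such as the command name); fold_stacks is a made-up name.

```cpp
#include <map>
#include <string>
#include <vector>

// Collapses call stacks (outermost frame first) into folded lines of the form
// "frame1;frame2;frame3 count", one line per distinct stack, sorted by stack text.
std::vector<std::string> fold_stacks(const std::vector<std::vector<std::string>>& stacks) {
    std::map<std::string, int> counts;
    for (const auto& stack : stacks) {
        std::string key;
        for (const std::string& frame : stack) {
            if (!key.empty()) key += ';';
            key += frame;
        }
        if (!key.empty()) ++counts[key];
    }
    std::vector<std::string> lines;
    for (const auto& entry : counts) {
        lines.push_back(entry.first + " " + std::to_string(entry.second));
    }
    return lines;
}
```

Two samples of main, f, test and one sample of main, f fold into the lines "main;f 1" and "main;f;test 2".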

6.3 perf diff

After optimizing the program, we naturally want to see the effect:

perf diff perf.data perf.data.before

We removed the std::this_thread::sleep_for(std::chrono::milliseconds(1)); statement from the code above; the diff output is as follows:

[Figure: perf diff output]
