Using Valgrind

baboon_chen

Last modified: 2024-01-24 16:21:43

Category: linux commands. Tags: valgrind, memcheck, callgrind

First published: 2024-01-24 16:19:31

Article link: https://blog.csdn.net/i_just_smile/article/details/135824746

Contents

Introduction
Installation
How do you detect memory errors with Valgrind?
How do you use the other tools?
Summary

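Introduction

Valgrind is a suite of debugging and profiling tools. The most widely used one is Memcheck, which detects memory errors in C/C++ programs and helps you avoid crashes and unpredictable behavior.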

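The other tools see less use in practice. They are:

Cachegrind
A cache and branch-prediction profiler. It simulates the CPU's first-level instruction and data caches (I1 and D1) and a unified last-level cache (LL, which corresponds to L2 on a two-level machine), and pinpoints where cache hits and misses occur in the program. It also reports memory references and instruction counts per line of code, per function, per module, and for the whole program, which helps with optimization.
Callgrind
Effectively an extension of Cachegrind. In addition to everything Cachegrind reports, it produces the program's call graph.
Massif
A heap profiler. It produces a graph of heap usage over the program's run, including which parts of the program account for the most heap memory.
Helgrind
A thread error detector. It finds synchronization problems in multithreaded programs, such as data races (shared data accessed without a lock) and potential deadlocks caused by inconsistent lock ordering.
DRD
Similar to Helgrind, but it uses different analysis techniques and may therefore find different problems.
DHAT
A different kind of heap profiler.
…
The tool suite is still being updated.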

Download: Valgrind: Current Releases

Official documentation: Valgrind: Table of Contents

Installation

tar xjf valgrind-3.22.0.tar.bz2
cd valgrind-3.22.0

./configure
make -j4
make install

man valgrind

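The official documentation is extensive, so read the parts for the tool you care about, for example the Memcheck manual.

[Figure: the Memcheck section of the official documentation]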

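How do you detect memory errors with Valgrind?

Common memory problems include invalid reads and writes, invalid frees, use of uninitialised values or unaddressable memory, mismatched allocation and deallocation functions, and overlapping source and destination blocks (for example in memcpy). This article does not walk through every case; it focuses on how to run a program under Valgrind.

First, write a program that accesses memory illegally: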

#include <stdio.h>
#include <stdlib.h>

int main()
{
#if 0 // Memcheck does not bounds-check stack (or global) arrays, so an out-of-range access on this array would go undetected.
    int arr[5] = {1, 2, 3, 4, 5};
#else
    int *arr = (int *)malloc(5 * sizeof(int));
#endif
    arr[-1] = 0;

    printf("run ok.\n");

    return 0;
}

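When compiling, avoid optimization flags where possible: compiler optimizations can change how the code accesses memory and make the report inaccurate. Compiling with -g additionally lets Valgrind show source file names and line numbers in its stack traces.

Running Valgrind is simple:

valgrind [options] prog-and-args

It has many options. Apart from --tool=<toolname>, which selects the tool, I rarely find the others necessary.
Without --tool, Valgrind defaults to Memcheck and checks for memory errors.

Result: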

[root@localhost c]# valgrind ./a.out
==20219== Memcheck, a memory error detector
==20219== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==20219== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==20219== Command: ./a.out
==20219==
==20219== Invalid write of size 4
==20219==    at 0x4005F4: main (in /home/ch/c/a.out)
==20219==  Address 0x520e03c is 4 bytes before a block of size 20 alloc'd
==20219==    at 0x4C36081: malloc (vg_replace_malloc.c:442)
==20219==    by 0x4005E7: main (in /home/ch/c/a.out)
==20219==
run ok.
==20219==
==20219== HEAP SUMMARY:
==20219==     in use at exit: 20 bytes in 1 blocks
==20219==   total heap usage: 2 allocs, 1 frees, 1,044 bytes allocated
==20219==
==20219== LEAK SUMMARY:
==20219==    definitely lost: 20 bytes in 1 blocks
==20219==    indirectly lost: 0 bytes in 0 blocks
==20219==      possibly lost: 0 bytes in 0 blocks
==20219==    still reachable: 0 bytes in 0 blocks
==20219==         suppressed: 0 bytes in 0 blocks
==20219== Rerun with --leak-check=full to see details of leaked memory
==20219==
==20219== For lists of detected and suppressed errors, rerun with: -s
==20219== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

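The report is concise:

# The memory error that was detected
==20219== Invalid write of size 4

# Heap summary: 2 allocations, 1 free
# (the second allocation is most likely the 1,024-byte stdout buffer used by printf: 20 + 1,024 = 1,044)
==20219== HEAP SUMMARY:
==20219==     in use at exit: 20 bytes in 1 blocks
==20219==   total heap usage: 2 allocs, 1 frees, 1,044 bytes allocated

# Leak summary: 20 bytes definitely lost
==20219== LEAK SUMMARY:
==20219==    definitely lost: 20 bytes in 1 blocks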
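How do you use the other tools?

Next, a demonstration of Helgrind, the tool for detecting threading errors. Its manual is here: Helgrind: a thread error detector

Again, write a sample program, data_race.c: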

#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}

This code accesses a global variable from two threads without holding a lock, so it has a data race.

Compile the program and run it under Helgrind:

[root@localhost c]# gcc data_race.c -lpthread
[root@localhost c]#
[root@localhost c]# valgrind --tool=helgrind ./a.out
==5425== Helgrind, a thread error detector
==5425== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==5425== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==5425== Command: ./a.out
==5425==
==5425== ---Thread-Announcement------------------------------------------
==5425==
==5425== Thread #1 is the program's root thread
==5425==
==5425== ---Thread-Announcement------------------------------------------
==5425==
==5425== Thread #2 was created
==5425==    at 0x516DDB2: clone (in /usr/lib64/libc-2.28.so)
==5425==    by 0x4E5803E: create_thread (in /usr/lib64/libpthread-2.28.so)
==5425==    by 0x4E599F5: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==5425==    by 0x4C44542: pthread_create_WRK (hg_intercepts.c:445)
==5425==    by 0x4C45A49: pthread_create@* (hg_intercepts.c:478)
==5425==    by 0x400656: main (in /home/ch/c/a.out)
==5425==
==5425== ----------------------------------------------------------------
==5425==
==5425== Possible data race during read of size 4 at 0x601030 by thread #1
==5425== Locks held: none
==5425==    at 0x400657: main (in /home/ch/c/a.out)
==5425==
==5425== This conflicts with a previous write of size 4 by thread #2
==5425== Locks held: none
==5425==    at 0x400627: child_fn (in /home/ch/c/a.out)
==5425==    by 0x4C44736: mythread_wrapper (hg_intercepts.c:406)
==5425==    by 0x4E59179: start_thread (in /usr/lib64/libpthread-2.28.so)
==5425==    by 0x516DDC2: clone (in /usr/lib64/libc-2.28.so)
==5425==  Address 0x601030 is 0 bytes inside data symbol "var"
==5425==
==5425== ----------------------------------------------------------------
==5425==
==5425== Possible data race during write of size 4 at 0x601030 by thread #1
==5425== Locks held: none
==5425==    at 0x400660: main (in /home/ch/c/a.out)
==5425==
==5425== This conflicts with a previous write of size 4 by thread #2
==5425== Locks held: none
==5425==    at 0x400627: child_fn (in /home/ch/c/a.out)
==5425==    by 0x4C44736: mythread_wrapper (hg_intercepts.c:406)
==5425==    by 0x4E59179: start_thread (in /usr/lib64/libpthread-2.28.so)
==5425==    by 0x516DDC2: clone (in /usr/lib64/libc-2.28.so)
==5425==  Address 0x601030 is 0 bytes inside data symbol "var"
==5425==
==5425==
==5425== Use --history-level=approx or =none to gain increased speed, at
==5425== the cost of reduced accuracy of conflicting-access information
==5425== For lists of detected and suppressed errors, rerun with: -s
==5425== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

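The report describes how each thread was created, points out the variable involved in the possible data race, and ends with an error summary.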
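
One way to remove the race in data_race.c is to guard every access to the shared variable with a mutex. The sketch below is an assumption about how the sample could be fixed, not code from the Helgrind manual; counter, counter_lock, locked_child_fn and run_locked_increment are hypothetical names.

```c
#include <pthread.h>

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Child thread: increment the shared counter while holding the lock. */
static void *locked_child_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&counter_lock);
    counter++;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

/* Hypothetical driver: one child thread and the caller both increment
 * the counter under the same mutex.
 * Returns the final counter value, or -1 if the thread cannot be created. */
int run_locked_increment(void)
{
    pthread_t child;

    counter = 0;    /* no other thread exists yet, so no lock is needed */
    if (pthread_create(&child, NULL, locked_child_fn, NULL) != 0)
        return -1;

    pthread_mutex_lock(&counter_lock);
    counter++;
    pthread_mutex_unlock(&counter_lock);

    pthread_join(child, NULL);
    return counter; /* safe to read: the child has been joined */
}
```

Because both increments happen with the same lock held, a program that calls run_locked_increment() and is run with valgrind --tool=helgrind should no longer produce the two "Possible data race" reports.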

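Some tools produce output that needs a companion tool to turn it into a readable form. Massif output is read with ms_print, Callgrind output with callgrind_annotate, Cachegrind output with cg_annotate, and so on. These helpers are installed together with Valgrind.

Callgrind can show a program's call graph and can track threads. The following walks through its use.
As before, start with a sample program. It creates two threads that count primes using different methods, one slow and one fast, and compares their running times.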

#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/time.h>
#include <math.h>
#include <unistd.h>

#define RANGE 1000000

bool is_prime_1(int n) {
    if (n <= 1)
      return false;

    for(int i = 2; i <= sqrt(n); ++i)
        if (n % i == 0) return false;

    return true;
}

// Optimized algorithm
bool is_prime_2(int n) {
    if (n <= 3)
        return n > 1;

    // Only numbers of the form 6x-1 or 6x+1 can be prime
    if (n % 6 != 1 && n % 6 != 5)
        return false;

    // Check divisibility by candidates of the form 6k-1 and 6k+1 up to sqrt(n)
    for (int i = 5; i <= sqrt(n); i += 6)
        if (n % i == 0 || n % (i + 2) == 0)
            return false;

    return true;
}

void* fun1(void *arg) {
    int i = 0;
    int count = 0;
    struct timeval begin, end;

    gettimeofday(&begin, 0);
    for (i = 0; i < RANGE; i++)
        if (is_prime_1(i))
            ++count;
    gettimeofday(&end, 0);
    long seconds = end.tv_sec - begin.tv_sec;
    long microseconds = end.tv_usec - begin.tv_usec;
    double elapsed = seconds + microseconds*1e-6;
    printf("fun1 (is_prime_1): prime count is %d, time consumed: %lf\n\n", count, elapsed);

    return NULL;
}

void* fun2(void *arg) {
    int i = 0;
    int count = 0;
    struct timeval begin, end;

    gettimeofday(&begin, 0);
    for (i = 0; i < RANGE; i++)
        if (is_prime_2(i))
            ++count;
    gettimeofday(&end, 0);
    long seconds = end.tv_sec - begin.tv_sec;
    long microseconds = end.tv_usec - begin.tv_usec;
    double elapsed = seconds + microseconds*1e-6;
    printf("fun2 (is_prime_2): prime count is %d, time consumed: %lf\n\n", count, elapsed);

    return NULL;
}

int main() {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, fun1, NULL);
    pthread_join(t1, NULL);

    pthread_create(&t2, NULL, fun2, NULL);
    pthread_join(t2, NULL);

    return 0;
}
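
Both functions above evaluate sqrt(n) in the loop condition. A common variant replaces that with an integer comparison, as in the sketch below. It is only an illustration of a third candidate to profile, and whether it is measurably faster is exactly the kind of question Callgrind can answer. The names is_prime_3 and count_primes are hypothetical and do not appear in the original program.

```c
#include <stdbool.h>

/* Same 6k-1 / 6k+1 trial division as is_prime_2, but the loop bound is the
 * integer test i * i <= n instead of a sqrt() call in the loop condition. */
bool is_prime_3(int n)
{
    if (n <= 3)
        return n > 1;
    if (n % 2 == 0 || n % 3 == 0)
        return false;

    /* long long keeps i * i from overflowing for n close to INT_MAX */
    for (long long i = 5; i * i <= n; i += 6)
        if (n % i == 0 || n % (i + 2) == 0)
            return false;

    return true;
}

/* Count primes in [0, limit), mirroring the loops in fun1 and fun2. */
int count_primes(int limit)
{
    int count = 0;
    for (int i = 0; i < limit; i++)
        if (is_prime_3(i))
            ++count;
    return count;
}
```

Since it needs no math library call, this variant can be compiled without -lm.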

After compiling, run the program under Callgrind to produce the profile data.

[root@localhost c]# gcc demo.c -lpthread -lm
[root@localhost c]#
[root@localhost c]# valgrind --tool=callgrind ./a.out

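Callgrind has an option, --separate-threads=yes, that splits the profile by thread and writes one output file per thread, named callgrind.out.<pid>-<thread ID>.

The raw output file is not meant to be read directly. It records call relationships and their costs, which are best viewed as a graph, so it first has to be converted to a .dot file with the gprof2dot.py script. The .dot file is not the final report either: it contains DOT language, which tells Graphviz how to draw the graph.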

python gprof2dot.py -f callgrind -n10 -s callgrind.out.6204 > test.dot

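As the name suggests, gprof2dot.py is a Python script that turns profiler output into a graphical call graph. It started with gprof output and also accepts Callgrind output, which is what the -f callgrind option selects. Install it with pip install gprof2dot (you may need to sort out your pip index first), or download it from its PyPI page: gprof2dot · PyPI

At the risk of drifting off topic, one more aside: gprof is a profiling tool for Linux/Unix that analyzes and reports how a program spends its running time. Man page: GNU gprof

Graphviz is an open-source graph visualization toolkit whose main job is to render diagrams from a textual description of a graph. If you are interested, visit its website: Graphviz

Most Linux systems have Graphviz installed or available as a package. Its directed-graph tool, dot, produces the final call graph.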

dot -Tpng test.dot -o test.png

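Finally, download the image. It looks like this:

[Figure: call graph rendered by dot (test.png)]

The percentages in the graph are each function's share of the total cost that Callgrind measured (by default, instructions executed), which serves as a proxy for CPU time. Use the hot spots to decide where to optimize. If you prefer a format other than .png, dot supports many output types, for example svg: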

# -T selects the output format
# -o names the output file
dot -Tsvg test.dot -o test.svg

For more formats, see: Output Formats | Graphviz

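Summary

Valgrind has many tools, and a short article cannot cover them all. You only get a deeper understanding by using them on real problems. Still, I hope you never need it, for reasons that anyone who has needed it will understand.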
