Valgrind underground

5. Let's Go Deeper

Valgrind simulates an Intel x86 processor and runs the program under test on this synthetic processor. The synthetic and real processors are not exactly the same. Valgrind is compiled into a shared object, valgrind.so. A shell script, valgrind, sets the LD_PRELOAD environment variable to point to valgrind.so. This causes the .so to be loaded as an extra library into any subsequently executed dynamically linked ELF binary, which permits the program to be debugged.

The dynamic linker calls Valgrind's initialization function, at which point the synthetic CPU takes control from the real CPU. Other .so files may also be loaded in memory, and the dynamic linker calls the initialization function of each of them. The dynamic linker then calls the main function of the loaded program. When main returns, the synthetic CPU calls the finalization function of valgrind.so, which prints a summary of all detected errors and checks for memory leaks. Finally, the finalization function exits, handing control back from the synthetic CPU to the real one.
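The init/fini hooks described above can be imitated with the GCC and Clang constructor and destructor function attributes. This is a minimal sketch, not Valgrind's code: the function names are hypothetical, and for simplicity the hooks are compiled into the program itself. Built instead as a shared object (for example with gcc -shared -fPIC) and named in LD_PRELOAD, the constructor would be run by the dynamic linker before main and the destructor after main returns, the same two points at which valgrind.so takes and gives back control.

```c
#include <stdio.h>

/* Set by the hypothetical init hook so callers can confirm it ran before main(). */
int init_ran = 0;

/* Hypothetical stand-in for valgrind.so's initialization function.
   The constructor attribute makes it run before main(). */
__attribute__((constructor))
static void fake_valgrind_init(void)
{
    init_ran = 1;
    fprintf(stderr, "init: the synthetic CPU would take control here\n");
}

/* Hypothetical stand-in for the finalization function: runs after main()
   returns (or exit() is called), where an error summary and a leak check
   would be produced. */
__attribute__((destructor))
static void fake_valgrind_fini(void)
{
    fprintf(stderr, "fini: the error summary and leak check would go here\n");
}
```

Any main function linked with this file will find init_ran already equal to 1, and the "fini" line appears on stderr only after main has returned.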


5.1. How Valgrind Tracks Validity of Each Byte

For every byte it processes, the synthetic processor maintains 9 bits: 8 'V' (valid-value) bits and 1 'A' (valid-address) bit. The 'V' bits record whether each of the 8 bits in the byte holds a defined value, and the 'A' bit records whether the byte's address may legitimately be accessed. The 'V' bits are checked in only two situations:


when data is used for address generation, or


when a control-flow decision is to be made.


In either of these situations, an error is reported if the data is found to be undefined. No error is reported, however, when undefined data is merely copied or added.
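The rule that copying and adding never complain, while branching does, can be illustrated with a tiny shadow-value model. Everything here is a hypothetical simplification: shadow_byte and the shadow_* functions are made-up names, a V bit of 1 is taken to mean "valid" as in the text, and the addition rule (any invalid input bit makes the whole result invalid) is much coarser than what Memcheck actually does.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shadow of one byte: a V bit of 1 means the data bit is valid. */
typedef struct {
    uint8_t value;
    uint8_t vbits;   /* 8 'V' bits, one per data bit */
} shadow_byte;

int errors_reported = 0;

/* Copying carries the V bits along with the value; nothing is checked. */
shadow_byte shadow_copy(shadow_byte src)
{
    return src;
}

/* Crude approximation of addition: if any input bit is invalid, the whole
   result is marked invalid, but no error is reported yet. */
shadow_byte shadow_add(shadow_byte a, shadow_byte b)
{
    shadow_byte r;
    r.value = (uint8_t)(a.value + b.value);
    r.vbits = ((a.vbits & b.vbits) == 0xFF) ? 0xFF : 0x00;
    return r;
}

/* A control-flow decision is one of the two places where V bits are checked.
   The message text mimics the one Memcheck prints. */
int shadow_branch(shadow_byte cond)
{
    if (cond.vbits != 0xFF) {
        errors_reported++;
        fprintf(stderr, "Conditional jump or move depends on uninitialised value(s)\n");
    }
    return cond.value != 0;
}
```

With these definitions, copying an uninitialized byte and adding 1 to it leave errors_reported at zero; only branching on the sum increments it.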

Floating-point data is treated differently: the 'V' bits are checked whenever a floating-point value is read by a floating-point instruction. As a result, merely copying an uninitialized floating-point value produces an error, as the two programs below show.

```c
#include <stdlib.h>

int main()
{
        int *p, *a;

        p = malloc(10*sizeof(int));
        a = malloc(10*sizeof(int));
        a[3] = p[3];
        free(a);
        free(p);
        return 0;
}

/* produces no errors */
```

```c
#include <stdlib.h>

int main()
{
        float *p, *a;

        p = malloc(10*sizeof(float));
        a = malloc(10*sizeof(float));
        a[3] = p[3];
        free(a);
        free(p);
        return 0;
}

/* produces an error */
```

Every byte in memory (as opposed to in the CPU's registers) has an associated valid-address (A) bit, which indicates whether the program may access that memory location. When a program starts, the 'A' bits of all global variables are set. When malloc, new or any other memory-allocating function is called, the 'A' bits of the allocated bytes are set. When the block is released with free, delete or delete[], the corresponding 'A' bits are cleared. System calls also change the 'A' bits as appropriate.
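The 'A' bit bookkeeping for allocation and release can be sketched with a hypothetical toy heap. toy_malloc, toy_free and toy_is_accessible are illustrative names, the heap is a fixed 64-byte arena whose freed space is never reused, and one byte stands in for each 'A' bit to keep the code simple; Valgrind itself works on the program's real allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte arena with one 'A' flag per byte:
   1 = accessible, 0 = not accessible. */
#define ARENA_SIZE 64
static uint8_t abits[ARENA_SIZE];
static size_t next_free = 0;

/* Bump allocator: hands out the next n bytes and sets their A bits.
   Returns the block's offset, or -1 if it does not fit. */
long toy_malloc(size_t n)
{
    if (n == 0 || n > ARENA_SIZE - next_free)
        return -1;
    long off = (long)next_free;
    memset(&abits[next_free], 1, n);
    next_free += n;
    return off;
}

/* Releasing a block clears its A bits. */
void toy_free(long off, size_t n)
{
    if (off >= 0 && (size_t)off + n <= ARENA_SIZE)
        memset(&abits[off], 0, n);
}

/* The question a checked load asks: are all n bytes at off accessible? */
int toy_is_accessible(long off, size_t n)
{
    if (off < 0 || (size_t)off + n > ARENA_SIZE)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (!abits[(size_t)off + i])
            return 0;
    return 1;
}
```

Allocating 20 bytes (five 4-byte ints) makes offsets 0 to 19 accessible; the 4 bytes at offset 20, where a sixth int would sit, stay inaccessible, and after toy_free none of the block is accessible.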

When a value is loaded from memory, Valgrind checks the 'A' bit of each byte. If a byte's 'A' bit is set, its 'V' bits are carried along with the value; if the value turns out to be undefined when it is checked, an error is reported and its 'V' bits are then set to mark it as valid. This avoids a long chain of errors caused by the same value. If a loaded byte's 'A' bit is 0, an invalid-read error is reported and the byte's 'V' bits are forced to valid, even though the value itself is garbage.
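The load rule, and the reporting of each undefined value only once, can be modelled in the same style. This is again a hypothetical, byte-by-byte simplification with made-up names (shadow_load, shadow_use); a V bit of 1 means "valid".

```c
#include <stdint.h>

/* Hypothetical shadow byte: value, 8 'V' bits (1 = valid) and 1 'A' bit. */
typedef struct {
    uint8_t value;
    uint8_t vbits;
    uint8_t abit;
} shadow_byte;

int invalid_reads = 0;   /* loads from bytes whose A bit is 0 */
int undefined_uses = 0;  /* checked uses of values that were not fully valid */

/* Load: check the A bit first. An inaccessible byte is reported and its
   V bits are forced to valid, so later uses of it stay quiet. */
shadow_byte shadow_load(const shadow_byte *loc)
{
    shadow_byte r = *loc;
    if (!loc->abit) {
        invalid_reads++;
        r.vbits = 0xFF;
    }
    return r;
}

/* Checked use (address generation or a branch): report undefined data once,
   then mark it valid so the same value does not cause a chain of errors. */
void shadow_use(shadow_byte *v)
{
    if (v->vbits != 0xFF) {
        undefined_uses++;
        v->vbits = 0xFF;
    }
}
```

Loading an allocated but never-written byte and using it twice counts one undefined use. Loading a byte whose 'A' bit is 0 counts one invalid read and no undefined use, which is the behaviour described for p[5] in the program that follows.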

Have a look at the following program, and run it.

```c
#include <stdlib.h>

int main()
{
        int *p, i, j;

        p = malloc(5*sizeof(int));      /* valid elements are p[0]..p[4] */
        j = p[5];                       /* reads just past the end of the block */
        if (p[5] == 1)
                i = p[5]+1;
        free(p);
        return 0;
}
```

Here two errors occur, and both are caused by reading p[5]: the 4 bytes starting at address p + 5 (that is, 5*sizeof(int) bytes past the start of the block) lie just beyond the 5 ints that were allocated. When j = p[5] executes, that address is invalid, so an invalid-read error is reported and the 'V' bits of those 4 bytes are forced to valid; the same happens when if (p[5] == 1) executes. Therefore no uninitialized-value error is reported for either statement, even though the value read is garbage.


5.2. Cache Profiling

Modern x86 machines use two levels of caching, L1 and L2. L1 is a split cache consisting of an instruction cache (I1) and a data cache (D1); L2 is a unified cache.

A cache's configuration consists of its size, line size and associativity, which together determine its number of lines. If the data requested by the processor is found in the upper level of the hierarchy, the access is called a hit; if it is not found there, the access is a miss, and the lower level is then accessed to retrieve the block containing the requested data. On modern machines the processor searches L1 first. On a hit, the data or instruction is copied into a register. On a miss, L2 is searched; if that is a hit, the block is copied into L1 and from there into a register. If L2 also misses, main memory has to be accessed.
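A single cache level of the kind described above can be sketched as a set-associative cache. The parameters are taken from the D1 configuration that vg_annotate prints for the example program (16384 B, 32 B lines, 4-way), which gives 16384 / (32 × 4) = 128 sets. Least-recently-used replacement and all names in the code are assumptions of this sketch, not a description of cachegrind's source.

```c
#include <stdint.h>
#include <string.h>

/* Assumed D1 parameters: 16384 B total, 32 B lines, 4-way set associative. */
#define CACHE_SIZE 16384
#define LINE_SIZE  32
#define ASSOC      4
#define NSETS      (CACHE_SIZE / (LINE_SIZE * ASSOC))   /* 128 sets */

typedef struct {
    uint64_t tag[NSETS][ASSOC];   /* way 0 holds the most recently used line */
    int      valid[NSETS][ASSOC];
    unsigned long hits, misses;
} cache_t;

void cache_init(cache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Shift ways 0..i-1 down by one and put tag in way 0 (most recently used). */
static void move_to_front(uint64_t *tags, int *valid, int i, uint64_t tag)
{
    for (int j = i; j > 0; j--) {
        tags[j] = tags[j - 1];
        valid[j] = valid[j - 1];
    }
    tags[0] = tag;
    valid[0] = 1;
}

/* Look up one byte address: returns 1 on a hit, 0 on a miss.
   On a miss the line is brought in and the least recently used way is dropped. */
int cache_access(cache_t *c, uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;
    unsigned set = (unsigned)(block % NSETS);
    uint64_t tag = block / NSETS;

    for (int i = 0; i < ASSOC; i++) {
        if (c->valid[set][i] && c->tag[set][i] == tag) {
            move_to_front(c->tag[set], c->valid[set], i, tag);
            c->hits++;
            return 1;
        }
    }
    /* Miss: shifting every way down evicts the entry in the last way. */
    move_to_front(c->tag[set], c->valid[set], ASSOC - 1, tag);
    c->misses++;
    return 0;
}
```

Addresses 0 and 31 fall in the same 32-byte line, so the second access hits. Five addresses spaced 4096 bytes apart all map to set 0, so in a 4-way cache the fifth evicts the first. Chaining a second model with the L2 parameters behind this one, consulted only when cache_access returns 0, would mirror the L1-then-L2 search described above.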

Valgrind can also simulate these caches and report what happens in them while a program runs. To do this, first compile your program with the -g option as usual, then run it with the cachegrind shell script instead of valgrind.

Sample output:

```
==7436== I1  refs:      12,841
==7436== I1  misses:       238
==7436== L2i misses:       237
==7436== I1  miss rate:   1.85%
==7436== L2i miss rate:   1.84%
==7436==
==7436== D   refs:       5,914  (4,626 rd + 1,288 wr)
==7436== D1  misses:       357  (  324 rd +    33 wr)
==7436== L2d misses:       352  (  319 rd +    33 wr)
==7436== D1  miss rate:    6.0% (  7.0%   +   2.5%  )
==7436== L2d miss rate:    5.9% (  6.8%   +   2.5%  )
==7436==
==7436== L2 refs:          595  (  562 rd +    33 wr)
==7436== L2 misses:        589  (  556 rd +    33 wr)
==7436== L2 miss rate:     3.1% (  3.1%   +   2.5%  )
```

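L2i misses is the number of instruction fetches that missed in both I1 and L2.

L2d misses is the number of data references that missed in both D1 and L2.

Total number of data references = number of reads + number of writes (here 4,626 + 1,288 = 5,914).

L2 refs is the number of I1 misses plus D1 misses, since every L1 miss becomes an L2 reference (238 + 357 = 595).

Miss rate is the number of misses divided by the corresponding number of references. For the L2 lines the denominator is all references, instruction and data together: 589 / (12,841 + 5,914) ≈ 3.1%.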
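The summary percentages can be re-derived from the raw counts. This sketch assumes the displayed rates are truncated rather than rounded, which matches the sample (237 / 12,841 = 1.8456% is shown as 1.84%); miss_rate_x100, l2_refs and print_rate are hypothetical helper names.

```c
#include <stdio.h>

/* Miss rate in hundredths of a percent, truncated toward zero
   (185 means 1.85%). */
unsigned long miss_rate_x100(unsigned long misses, unsigned long refs)
{
    return refs ? misses * 10000UL / refs : 0UL;
}

/* Every I1 or D1 miss becomes an L2 reference. */
unsigned long l2_refs(unsigned long i1_misses, unsigned long d1_misses)
{
    return i1_misses + d1_misses;
}

/* Print a rate the way it appears in the summary, e.g. "1.85%". */
void print_rate(const char *label, unsigned long misses, unsigned long refs)
{
    unsigned long r = miss_rate_x100(misses, refs);
    printf("%-14s %lu.%02lu%%\n", label, r / 100, r % 100);
}
```

For example, print_rate("I1 miss rate:", 238, 12841) prints 1.85%, and miss_rate_x100(589, 12841 + 5914) gives 314, the 3.1% shown for L2.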

The cachegrind script also writes a file, cachegrind.out, containing line-by-line cache profiling information that is not meant to be read directly. The vg_annotate script interprets this file: run without arguments, it reads cachegrind.out and produces human-readable output.

When C, C++ or assembly source files are given to vg_annotate as arguments, it annotates each of their lines with the number of cache reads, writes, misses, etc. that the line caused. For example:

```
I1 cache:         16384 B, 32 B, 4-way associative
D1 cache:         16384 B, 32 B, 4-way associative
L2 cache:         262144 B, 32 B, 8-way associative
Command:          ./a.out
Events recorded:  Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown:     Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Thresholds:       99 0 0 0 0 0 0 0 0
Include dirs:
User annotated:   valg_flo.c
Auto-annotation:  off
```

User-annotated source: valg_flo.c:

```
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw

 .    .    .  .    .    .  .    .    .  #include<stdlib.h>
 .    .    .  .    .    .  .    .    .  int main()
 3    1    1  .    .    .  1    0    0  {
 .    .    .  .    .    .  .    .    .          float *p, *a;
 6    1    1  .    .    .  3    0    0          p = malloc(10*sizeof(float));
 6    0    0  .    .    .  3    0    0          a = malloc(10*sizeof(float));
 6    1    1  3    1    1  1    1    1          a[3] = p[3];
 4    0    0  1    0    0  1    0    0          free(a);
 4    0    0  1    0    0  1    0    0          free(p);
 2    0    0  2    0    0  .    .    .  }
```


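Ir = total instruction cache reads (the number of instructions executed).


I1mr = I1 cache read misses.


I2mr = L2 cache instruction read misses.


Dr = data cache reads (memory reads).


D1mr = D1 cache read misses.


D2mr = L2 cache data read misses.


Dw = data cache writes (memory writes).


D1mw = D1 cache write misses.


D2mw = L2 cache data write misses.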

