Valgrind(Memcheck)错误消息的解释

最新推荐文章于 2022-08-15 20:02:12 发布

fdtsaid

最新推荐文章于 2022-08-15 20:02:12 发布

阅读量1.4k

点赞数 1

分类专栏：编程语言

编程语言专栏收录该内容

10 篇文章 0 订阅

订阅专栏

Explanation of error messages from Memcheck

Memcheck issues a range of error messages. This section presents a quick summary of what error messages mean. The precise behaviour of the error-checking machinery is described in Details of Memcheck’s checking machinery.

4.2.1. Illegal read / Illegal write errors

For example:

Invalid read of size 4
   at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
   by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
 Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd

This happens when your program reads or writes memory at a place which Memcheck reckons it shouldn’t. In this example, the program did a 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied library libpng.so.2.1.0.9, which was called from somewhere else in the same library, called from line 326 of qpngio.cpp, and so on.

Memcheck tries to establish what the illegal address might relate to, since that’s often useful. So, if it points into a block of memory which has already been freed, you’ll be informed of this, and also where the block was freed. Likewise, if it should turn out to be just off the end of a heap block, a common result of off-by-one-errors in array subscripting, you’ll be informed of this fact, and also where the block was allocated. If you use the --read-var-info option Memcheck will run more slowly but may give a more detailed description of any illegal address.

In this example, Memcheck can’t identify the address. Actually the address is on the stack, but, for some reason, this is not a valid stack address – it is below the stack pointer and that isn’t allowed. In this particular case it’s probably caused by GCC generating invalid code, a known bug in some ancient versions of GCC.

Note that Memcheck only tells you that your program is about to access memory at an illegal address. It can’t stop the access from happening. So, if your program makes an access which normally would result in a segmentation fault, you program will still suffer the same fate – but you will get a message from Memcheck immediately prior to this. In this particular example, reading junk on the stack is non-fatal, and the program stays alive.

4.2.2. Use of uninitialised values

For example:

Conditional jump or move depends on uninitialised value(s)
   at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
   by 0x402E8476: _IO_printf (printf.c:36)
   by 0x8048472: main (tests/manuel1.c:8)

An uninitialised-value use error is reported when your program uses a value which hasn’t been initialised – in other words, is undefined. Here, the undefined value is used somewhere inside the printf machinery of the C library. This error was reported when running the following small program:

int main()
{
  int x;
  printf ("x = %d\n", x);
}

It is important to understand that your program can copy around junk (uninitialised) data as much as it likes. Memcheck observes this and keeps track of the data, but does not complain. A complaint is issued only when your program attempts to make use of uninitialised data in a way that might affect your program’s externally-visible behaviour. In this example, x is uninitialised. Memcheck observes the value being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, _IO_vfprintf has to examine the value of x so it can turn it into the corresponding ASCII string, and it is at this point that Memcheck complains.

Sources of uninitialised data tend to be:

Local variables in procedures which have not been initialised, as in the example above.
The contents of heap blocks (allocated with malloc, new, or a similar function) before you (or a constructor) write something there.

To see information on the sources of uninitialised data in your program, use the --track-origins=yes option. This makes Memcheck run more slowly, but can make it much easier to track down the root causes of uninitialised value errors.

4.2.3. Use of uninitialised or unaddressable values in system calls

Memcheck checks all parameters to system calls:

It checks all the direct parameters themselves, whether they are initialised.
Also, if a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressable and its contents are initialised.
Also, if the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is addressable.

After the system call, Memcheck updates its tracked information to precisely reflect any changes in memory state caused by the system call.

Here’s an example of two system calls with invalid parameters:

  #include <stdlib.h>
  #include <unistd.h>
  int main( void )
  {
    char* arr  = malloc(10);
    int*  arr2 = malloc(sizeof(int));
    write( 1 /* stdout */, arr, 10 );
    exit(arr2[0]);
  }

You get these complaints …

  Syscall param write(buf) points to uninitialised byte(s)
     at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
     by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
     by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
   Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
     at 0x259852B0: malloc (vg_replace_malloc.c:130)
     by 0x80483F1: main (a.c:5)

  Syscall param exit(error_code) contains uninitialised byte(s)
     at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
     by 0x8048426: main (a.c:8)

… because the program has (a) written uninitialised junk from the heap block to the standard output, and (b) passed an uninitialised value to exit. Note that the first error refers to the memory pointed to by buf (not buf itself), but the second error refers directly to exit’s argument arr2[0].

4.2.4. Illegal frees

For example:

Invalid free()
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
 Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)

Memcheck keeps track of the blocks allocated by your program with malloc/new, so it can know exactly whether or not the argument to free/delete is legitimate or not. Here, this test program has freed the same block twice. As with the illegal read/write errors, Memcheck attempts to make sense of the address freed. If, as here, the address is one which has previously been freed, you wil be told that – making duplicate frees of the same block easy to spot. You will also get this message if you try to free a pointer that doesn’t point to the start of a heap block.

4.2.5. When a heap block is freed with an inappropriate deallocation function

In the following example, a block allocated with new[] has wrongly been deallocated with free:

Mismatched free() / delete / delete []
   at 0x40043249: free (vg_clientfuncs.c:171)
   by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
   by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
   by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
 Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)

In C++ it’s important to deallocate memory in a way compatible with how it was allocated. The deal is:

If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.
If allocated with new, you must deallocate with delete.
If allocated with new[], you must deallocate with delete[].

The worst thing is that on Linux apparently it doesn’t matter if you do mix these up, but the same program may then crash on a different platform, Solaris for example. So it’s best to fix it properly. According to the KDE folks “it’s amazing how many C++ programmers don’t know this”.

The reason behind the requirement is as follows. In some C++ implementations, delete[] must be used for objects allocated by new[] because the compiler stores the size of the array and the pointer-to-member to the destructor of the array’s content just before the pointer actually returned. delete doesn’t account for this and will get confused, possibly corrupting the heap.

4.2.6. Overlapping source and destination blocks

The following C library functions copy some data from one memory block to another (or something similar): memcpy, strcpy, strncpy, strcat, strncat. The blocks pointed to by their src and dst pointers aren’t allowed to overlap. The POSIX standards have wording along the lines “If copying takes place between objects that overlap, the behavior is undefined.” Therefore, Memcheck checks for this.

For example:

==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)

You don’t want the two blocks to overlap because one of them could get partially overwritten by the copying.

You might think that Memcheck is being overly pedantic reporting this in the case where dst is less than src. For example, the obvious way to implement memcpy is by copying from the first byte to the last. However, the optimisation guides of some architectures recommend copying from the last byte down to the first. Also, some implementations of memcpy zero dst before copying, because zeroing the destination’s cache line(s) can improve performance.

The moral of the story is: if you want to write truly portable code, don’t make any assumptions about the language implementation.

4.2.7. Fishy argument values

All memory allocation functions take an argument specifying the size of the memory block that should be allocated. Clearly, the requested size should be a non-negative value and is typically not excessively large. For instance, it is extremely unlikly that the size of an allocation request exceeds 2**63 bytes on a 64-bit machine. It is much more likely that such a value is the result of an erroneous size calculation and is in effect a negative value (that just happens to appear excessively large because the bit pattern is interpreted as an unsigned integer). Such a value is called a “fishy value”. The size argument of the following allocation functions is checked for being fishy: malloc, calloc, realloc, memalign, new, new []. __builtin_new, __builtin_vec_new, For calloc both arguments are being checked.

For example:

==32233== Argument 'size' of function malloc has a fishy (possibly negative) value: -3
==32233==    at 0x4C2CFA7: malloc (vg_replace_malloc.c:298)
==32233==    by 0x400555: foo (fishy.c:15)
==32233==    by 0x400583: main (fishy.c:23)

In earlier Valgrind versions those values were being referred to as “silly arguments” and no back-trace was included.

4.2.8. Memory leak detection

Memcheck keeps track of all heap blocks issued in response to calls to malloc/new et al. So when the program exits, it knows which blocks have not been freed.

If --leak-check is set appropriately, for each remaining block, Memcheck determines if the block is reachable from pointers within the root-set. The root-set consists of (a) general purpose registers of all threads, and (b) initialised, aligned, pointer-sized data words in accessible client memory, including stacks.

There are two ways a block can be reached. The first is with a “start-pointer”, i.e. a pointer to the start of the block. The second is with an “interior-pointer”, i.e. a pointer to the middle of the block. There are several ways we know of that an interior-pointer can occur:

The pointer might have originally been a start-pointer and have been moved along deliberately (or not deliberately) by the program. In particular, this can happen if your program uses tagged pointers, i.e. if it uses the bottom one, two or three bits of a pointer, which are normally always zero due to alignment, in order to store extra information.
It might be a random junk value in memory, entirely unrelated, just a coincidence.
It might be a pointer to the inner char array of a C++ std::string. For example, some compilers add 3 words at the beginning of the std::string to store the length, the capacity and a reference count before the memory containing the array of characters. They return a pointer just after these 3 words, pointing at the char array.
Some code might allocate a block of memory, and use the first 8 bytes to store (block size - 8) as a 64bit number. sqlite3MemMalloc does this.
It might be a pointer to an array of C++ objects (which possess destructors) allocated with new[]. In this case, some compilers store a “magic cookie” containing the array length at the start of the allocated block, and return a pointer to just past that magic cookie, i.e. an interior-pointer. See this page for more information.
It might be a pointer to an inner part of a C++ object using multiple inheritance.

You can optionally activate heuristics to use during the leak search to detect the interior pointers corresponding to the stdstring, length64, newarray and multipleinheritance cases. If the heuristic detects that an interior pointer corresponds to such a case, the block will be considered as reachable by the interior pointer. In other words, the interior pointer will be treated as if it were a start pointer.

With that in mind, consider the nine possible cases described by the following figure.

     Pointer chain            AAA Leak Case   BBB Leak Case
     -------------            -------------   -------------
(1)  RRR ------------> BBB                    DR
(2)  RRR ---> AAA ---> BBB    DR              IR
(3)  RRR               BBB                    DL
(4)  RRR      AAA ---> BBB    DL              IL
(5)  RRR ------?-----> BBB                    (y)DR, (n)DL
(6)  RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
(7)  RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
(8)  RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
(9)  RRR      AAA -?-> BBB    DL              (y)IL, (n)DL

Pointer chain legend:
- RRR: a root set node or DR block
- AAA, BBB: heap blocks
- --->: a start-pointer
- -?->: an interior-pointer

Leak Case legend:
- DR: Directly reachable
- IR: Indirectly reachable
- DL: Directly lost
- IL: Indirectly lost
- (y)XY: it's XY if the interior-pointer is a real pointer
- (n)XY: it's XY if the interior-pointer is not a real pointer
- (_)XY: it's XY in either case

Every possible case can be reduced to one of the above nine. Memcheck merges some of these cases in its output, resulting in the following four leak kinds.

“Still reachable”. This covers cases 1 and 2 (for the BBB blocks) above. A start-pointer or chain of start-pointers to the block is found. Since the block is still pointed at, the programmer could, at least in principle, have freed it before program exit. “Still reachable” blocks are very common and arguably not a problem. So, by default, Memcheck won’t report such blocks individually.
“Definitely lost”. This covers case 3 (for the BBB blocks) above. This means that no pointer to the block can be found. The block is classified as “lost”, because the programmer could not possibly have freed it at program exit, since no pointer to it exists. This is likely a symptom of having lost the pointer at some earlier point in the program. Such cases should be fixed by the programmer.
“Indirectly lost”. This covers cases 4 and 9 (for the BBB blocks) above. This means that the block is lost, not because there are no pointers to it, but rather because all the blocks that point to it are themselves lost. For example, if you have a binary tree and the root node is lost, all its children nodes will be indirectly lost. Because the problem will disappear if the definitely lost block that caused the indirect leak is fixed, Memcheck won’t report such blocks individually by default.
“Possibly lost”. This covers cases 5–8 (for the BBB blocks) above. This means that a chain of one or more pointers to the block has been found, but at least one of the pointers is an interior-pointer. This could just be a random value in memory that happens to point into a block, and so you shouldn’t consider this ok unless you know you have interior-pointers.

(Note: This mapping of the nine possible cases onto four leak kinds is not necessarily the best way that leaks could be reported; in particular, interior-pointers are treated inconsistently. It is possible the categorisation may be improved in the future.)

Furthermore, if suppressions exists for a block, it will be reported as “suppressed” no matter what which of the above four kinds it belongs to.

The following is an example leak summary.

LEAK SUMMARY:
   definitely lost: 48 bytes in 3 blocks.
   indirectly lost: 32 bytes in 2 blocks.
     possibly lost: 96 bytes in 6 blocks.
   still reachable: 64 bytes in 4 blocks.
        suppressed: 0 bytes in 0 blocks.

If heuristics have been used to consider some blocks as reachable, the leak summary details the heuristically reachable subset of ‘still reachable:’ per heuristic. In the below example, of the 95 bytes still reachable, 87 bytes (56+7+8+16) have been considered heuristically reachable.

LEAK SUMMARY:
   definitely lost: 4 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 95 bytes in 6 blocks
                      of which reachable via heuristic:
                        stdstring          : 56 bytes in 2 blocks
                        length64           : 16 bytes in 1 blocks
                        newarray           : 7 bytes in 1 blocks
                        multipleinheritance: 8 bytes in 1 blocks
        suppressed: 0 bytes in 0 blocks

If --leak-check=full is specified, Memcheck will give details for each definitely lost or possibly lost block, including where it was allocated. (Actually, it merges results for all blocks that have the same leak kind and sufficiently similar stack traces into a single “loss record”. The --leak-resolution lets you control the meaning of “sufficiently similar”.) It cannot tell you when or how or why the pointer to a leaked block was lost; you have to work that out for yourself. In general, you should attempt to ensure your programs do not have any definitely lost or possibly lost blocks at exit.

For example:

8 bytes in 1 blocks are definitely lost in loss record 1 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:39)

88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:25)

The first message describes a simple case of a single 8 byte block that has been definitely lost. The second case mentions another 8 byte block that has been definitely lost; the difference is that a further 80 bytes in other blocks are indirectly lost because of this lost block. The loss records are not presented in any notable order, so the loss record numbers aren’t particularly meaningful. The loss record numbers can be used in the Valgrind gdbserver to list the addresses of the leaked blocks and/or give more details about how a block is still reachable.

The option --show-leak-kinds= controls the set of leak kinds to show when --leak-check=full is specified.

The of leak kinds is specified in one of the following ways:

a comma separated list of one or more of definite indirect possible reachable.
all to specify the complete set (all leak kinds).
none for the empty set.

The default value for the leak kinds to show is --show-leak-kinds=definite,possible.

To also show the reachable and indirectly lost blocks in addition to the definitely and possibly lost blocks, you can use --show-leak-kinds=all. To only show the reachable and indirectly lost blocks, use --show-leak-kinds=indirect,reachable. The reachable and indirectly lost blocks will then be presented as shown in the following two examples.

64 bytes in 4 blocks are still reachable in loss record 2 of 4
   at 0x........: malloc (vg_replace_malloc.c:177)
   by 0x........: mk (leak-cases.c:52)
   by 0x........: main (leak-cases.c:74)

32 bytes in 2 blocks are indirectly lost in loss record 1 of 4
   at 0x........: malloc (vg_replace_malloc.c:177)
   by 0x........: mk (leak-cases.c:52)
   by 0x........: main (leak-cases.c:80)

Because there are different kinds of leaks with different severities, an interesting question is: which leaks should be counted as true “errors” and which should not?

The answer to this question affects the numbers printed in the ERROR SUMMARY line, and also the effect of the --error-exitcode option. First, a leak is only counted as a true “error” if --leak-check=full is specified. Then, the option --errors-for-leak-kinds= controls the set of leak kinds to consider as errors. The default value is --errors-for-leak-kinds=definite,possible

fdtsaid

关注

1
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Valgrind(Memcheck)错误消息的解释

Explanation of error messages from MemcheckMemcheck issues a range of error messages. This section presents a quick summary of what error messages mean. The precise behaviour of the error-checking ma...
复制链接

扫一扫

专栏目录