Anatomy of a Program in Memory


Reposted from: http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory


Memory management is the heart of operating systems; it is crucial for both programming and system administration. In the next few posts I'll cover memory with an eye towards practical aspects, but without shying away from internals. While the concepts are generic, examples are mostly from Linux and Windows on 32-bit x86. This first post describes how programs are laid out in memory.

Each process in a multi-tasking OS runs in its own memory sandbox. This sandbox is the virtual address space, which in 32-bit mode is always a 4GB block of memory addresses. These virtual addresses are mapped to physical memory by page tables, which are maintained by the operating system kernel and consulted by the processor. Each process has its own set of page tables, but there is a catch. Once virtual addresses are enabled, they apply to all software running in the machine, including the kernel itself. Thus a portion of the virtual address space must be reserved to the kernel:

Kernel/User Memory Split

This does not mean the kernel uses that much physical memory, only that it has that portion of address space available to map whatever physical memory it wishes. Kernel space is flagged in the page tables as exclusive to privileged code (ring 2 or lower), hence a page fault is triggered if user-mode programs try to touch it. In Linux, kernel space is constantly present and maps the same physical memory in all processes. Kernel code and data are always addressable, ready to handle interrupts or system calls at any time. By contrast, the mapping for the user-mode portion of the address space changes whenever a process switch happens:

Process Switch Effects on Virtual Memory

Blue regions represent virtual addresses that are mapped to physical memory, whereas white regions are unmapped. In the example above, Firefox has used far more of its virtual address space due to its legendary memory hunger. The distinct bands in the address space correspond to memory segments like the heap, stack, and so on. Keep in mind these segments are simply a range of memory addresses and have nothing to do with Intel-style segments. Anyway, here is the standard segment layout in a Linux process:

Flexible Process Address Space Layout In Linux

When computing was happy and safe and cuddly, the starting virtual addresses for the segments shown above were exactly the same for nearly every process in a machine. This made it easy to exploit security vulnerabilities remotely. An exploit often needs to reference absolute memory locations: an address on the stack, the address for a library function, etc. Remote attackers must choose this location blindly, counting on the fact that address spaces are all the same. When they are, people get pwned. Thus address space randomization has become popular. Linux randomizes the stack, memory mapping segment, and heap by adding offsets to their starting addresses. Unfortunately the 32-bit address space is pretty tight, leaving little room for randomization and hampering its effectiveness.
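To see the randomization for yourself, a minimal sketch like the one below prints the address of a stack variable, a heap block, and a libc function; with ASLR enabled, the values change from one run to the next. (The variable names are illustrative, not from the original article.)

    /* Minimal sketch: observing address space randomization.
     * Run the program several times; with ASLR enabled the stack,
     * heap, and library addresses change between runs. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int on_stack;                      /* lives in the stack segment */
        void *on_heap = malloc(16);        /* lives in the heap          */

        printf("stack variable : %p\n", (void *)&on_stack);
        printf("heap block     : %p\n", on_heap);
        printf("libc function  : %p\n", (void *)&printf);  /* memory mapping segment */

        free(on_heap);
        return 0;
    }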

The topmost segment in the process address space is the stack, which stores local variables and function parameters in most programming languages. Calling a method or function pushes a new stack frame onto the stack. The stack frame is destroyed when the function returns. This simple design, possible because the data obeys strict LIFO order, means that no complex data structure is needed to track stack contents – a simple pointer to the top of the stack will do. Pushing and popping are thus very fast and deterministic. Also, the constant reuse of stack regions tends to keep active stack memory in the CPU caches, speeding up access. Each thread in a process gets its own stack.
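As a rough illustration of the frame-per-call behavior, the sketch below recurses a few levels and prints the address of a local variable; on the usual downward-growing x86 stack, each level's local sits at a lower address than the previous one. (The function name is made up for the example.)

    /* Minimal sketch: each call pushes a new stack frame, so the
     * address of a local variable moves toward lower addresses with
     * every level of recursion. */
    #include <stdio.h>

    static void frame(int depth)
    {
        int local;                                   /* lives in this frame */
        printf("depth %d: &local = %p\n", depth, (void *)&local);
        if (depth < 3)
            frame(depth + 1);                        /* pushes another frame */
    }

    int main(void)
    {
        frame(0);
        return 0;
    }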

It is possible to exhaust the area mapping the stack by pushing more data than it can fit. This triggers a page fault that is handled in Linux by expand_stack(), which in turn calls acct_stack_growth() to check whether it's appropriate to grow the stack. If the stack size is below RLIMIT_STACK (usually 8MB), then normally the stack grows and the program continues merrily, unaware of what just happened. This is the normal mechanism whereby stack size adjusts to demand. However, if the maximum stack size has been reached, we have a stack overflow and the program receives a Segmentation Fault. While the mapped stack area expands to meet demand, it does not shrink back when the stack gets smaller. Like the federal budget, it only expands.
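A small sketch, assuming a glibc/Linux environment, that simply reads the RLIMIT_STACK limit that acct_stack_growth() checks against:

    /* Minimal sketch: querying the stack size limit (RLIMIT_STACK,
     * commonly 8 MB by default). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            /* rlim_cur may be RLIM_INFINITY, which prints as a huge number */
            printf("soft stack limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
            printf("hard stack limit: %llu bytes\n", (unsigned long long)rl.rlim_max);
        }
        return 0;
    }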

Dynamic stack growth is the only situation in which access to an unmapped memory region, shown in white above, might be valid. Any other access to unmapped memory triggers a page fault that results in a Segmentation Fault. Some mapped areas are read-only, hence write attempts to these areas also lead to segfaults.
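For instance, a sketch along these lines maps a single read-only page and then writes to it; the store faults and the process is killed with a Segmentation Fault. (It uses mmap(), described in the next section, purely for illustration.)

    /* Minimal sketch: writing to a read-only mapping. The store below
     * faults because the page-table entries for the area forbid writes,
     * so the process receives SIGSEGV. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        char *p = mmap(NULL, 4096, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        printf("read-only page at %p\n", (void *)p);
        p[0] = 'x';                      /* write to read-only area -> segfault */

        return 0;                        /* never reached */
    }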

Below the stack, we have the memory mapping segment. Here the kernel maps contents of files directly to memory. Any application can ask for such a mapping via the Linux mmap() system call (implementation) or CreateFileMapping() / MapViewOfFile() in Windows. Memory mapping is a convenient and high-performance way to do file I/O, so it is used for loading dynamic libraries. It is also possible to create an anonymous memory mapping that does not correspond to any files, being used instead for program data. In Linux, if you request a large block of memory via malloc(), the C library will create such an anonymous mapping instead of using heap memory. 'Large' means larger than MMAP_THRESHOLD bytes, 128 kB by default and adjustable via mallopt().
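Here is a minimal sketch of an anonymous mapping, the same kind glibc creates for malloc() requests above MMAP_THRESHOLD; the 1 MB size is just an illustrative value.

    /* Minimal sketch: asking the kernel for an anonymous mapping
     * with mmap(). */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;   /* 1 MB, well above the 128 kB default threshold */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(p, 0, len);                  /* pages are faulted in on demand */
        printf("anonymous mapping at %p\n", p);

        munmap(p, len);
        return 0;
    }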

Speaking of the heap, it comes next in our plunge into address space. The heap provides runtime memory allocation, like the stack, meant for data that must outlive the function doing the allocation, unlike the stack. Most languages provide heap management to programs. Satisfying memory requests is thus a joint affair between the language runtime and the kernel. In C, the interface to heap allocation is malloc() and friends, whereas in a garbage-collected language like C# the interface is the new keyword.

If there is enough space in the heap to satisfy a memory request, it can be handled by the language runtime without kernel involvement. Otherwise the heap is enlarged via the brk() system call (implementation) to make room for the requested block. Heap management is complex, requiring sophisticated algorithms that strive for speed and efficient memory usage in the face of our programs' chaotic allocation patterns. The time needed to service a heap request can vary substantially. Real-time systems have special-purpose allocators to deal with this problem. Heaps also become fragmented, as shown below:

Fragmented Heap
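One way to watch the heap grow, assuming glibc on Linux, is to compare the program break reported by sbrk(0) before and after a burst of small allocations. This is only a sketch: the allocator may already have free room, so the break is not guaranteed to move on every run.

    /* Minimal sketch: watching the program break move as the heap grows. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *before = sbrk(0);           /* current end of the heap */

        for (int i = 0; i < 1024; i++)
            malloc(1024);                 /* small blocks come from the heap */

        void *after = sbrk(0);

        printf("program break before: %p\n", before);
        printf("program break after : %p\n", after);
        return 0;
    }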

Finally, we get to the lowest segments of memory: BSS, data, and program text. Both BSS and data store contents for static (global) variables in C. The difference is that BSS stores the contents of uninitialized static variables, whose values are not set by the programmer in source code. The BSS memory area is anonymous: it does not map any file. If you say static int cntActiveUsers, the contents of cntActiveUsers live in the BSS.

The data segment, on the other hand, holds the contents for static variables initialized in source code. This memory area is not anonymous. It maps the part of the program's binary image that contains the initial static values given in source code. So if you say static int cntWorkerBees = 10, the contents of cntWorkerBees live in the data segment and start out as 10. Even though the data segment maps a file, it is a private memory mapping, which means that updates to memory are not reflected in the underlying file. This must be the case, otherwise assignments to global variables would change your on-disk binary image. Inconceivable!

The data example in the diagram is trickier because it uses a pointer. In that case, the contents of pointer gonzo – a 4-byte memory address – live in the data segment. The actual string it points to does not, however. The string lives in the text segment, which is read-only and stores all of your code in addition to tidbits like string literals. The text segment also maps your binary file in memory, but writes to this area earn your program a Segmentation Fault. This helps prevent pointer bugs, though not as effectively as avoiding C in the first place. Here's a diagram showing these segments and our example variables:

ELF Binary Image Mapped Into Memory
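The sketch below recreates the example variables and prints their addresses; the string assigned to gonzo is illustrative rather than the one in the diagram. The output should show cntActiveUsers in the BSS, cntWorkerBees and the gonzo pointer in the data segment, and the string literal and main() in the read-only text area, which can be confirmed with the tools described next.

    /* Minimal sketch: placing example variables in the BSS, data,
     * and text segments and printing where they ended up. */
    #include <stdio.h>

    static int  cntActiveUsers;               /* uninitialized -> BSS  */
    static int  cntWorkerBees = 10;           /* initialized   -> data */
    static char *gonzo = "an example string"; /* pointer in data, string in text */

    int main(void)
    {
        printf("cntActiveUsers (BSS)  : %p\n", (void *)&cntActiveUsers);
        printf("cntWorkerBees  (data) : %p\n", (void *)&cntWorkerBees);
        printf("gonzo pointer  (data) : %p\n", (void *)&gonzo);
        printf("string literal (text) : %p\n", (void *)gonzo);
        printf("main           (text) : %p\n", (void *)&main);
        return 0;
    }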

You can examine the memory areas in a Linux process by reading the file /proc/pid_of_process/maps. Keep in mind that a segment may contain many areas. For example, each memory mapped file normally has its own area in the mmap segment, and dynamic libraries have extra areas similar to BSS and data. The next post will clarify what 'area' really means. Also, sometimes people say "data segment" meaning all of data + bss + heap.
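A tiny sketch that dumps the calling process's own areas by reading /proc/self/maps (the same file, with self standing in for the process's pid):

    /* Minimal sketch: printing this process's memory areas. */
    #include <stdio.h>

    int main(void)
    {
        FILE *maps = fopen("/proc/self/maps", "r");
        char line[512];

        if (maps == NULL) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof line, maps) != NULL)
            fputs(line, stdout);          /* one line per memory area */

        fclose(maps);
        return 0;
    }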

You can examine binary images using the nm and objdump commands to display symbols, their addresses, segments, and so on. Finally, the virtual address layout described above is the "flexible" layout in Linux, which has been the default for a few years. It assumes that we have a value for RLIMIT_STACK. When that's not the case, Linux reverts back to the "classic" layout shown below:

Classic Process Address Space Layout In Linux

That's it for virtual address space layout. The next post discusses how the kernel keeps track of these memory areas. Coming up we'll look at memory mapping, how file reading and writing ties into all this and what memory usage figures mean.





