Building a C++ program takes three steps:
(1) preprocessor: recognizes meta-information about the code
The preprocessor handles directives, which start with the # character, for example #include <iostream>:
-----#include <file>: inserts the specified header file into the code at the location of the directive
-----#define [key] [value]: replaces the key with the value everywhere; used to define constants or macros
-----#ifndef [key]
#define [key]
.............................
#endif
(an include guard: the enclosed content is processed only if [key] has not been defined yet)
(2) compiler: translates source code into machine-dependent object code
(3) linker: links all the individual object files together into an application (.exe).
Once the .exe file has been generated, we can run it.
For example, consider this program:
#include <stdio.h>

int g_i = 100;              /* A global variable */
int g_j;                    /* An uninitialized global variable */

int main(void)              /* A function */
{
    int l_i = 1;            /* A local variable */
    static int s_i = 2;     /* A static local variable */
    int c;

    for (c = 0; c < 1000; c++)
    {
        l_i += c;
    }
    return 0;
}
Compiling the program above produces an executable file (.exe).
When this code is compiled and linked, we have an executable “program”. When we execute the program, it is loaded into the virtual address space that the operating system allocates for it. In this post I will talk about how the instructions and the data of the program are arranged in that virtual address space.
Basically, the memory space of a program has two parts: the code segment, which holds the program’s executable instructions, and the data segment, which holds the data manipulated by those instructions.
Once the executable has been loaded, the program is laid out in the virtual address space as follows:
[Figure: memory layout of a program]
Code Segment
The code segment, more often called the text segment, usually starts at a low address and contains the executable instructions (code) of the program. The text segment is static and protected from modification: once loaded, it is normally mapped read-only, so the program cannot modify its content.
In our example, the text segment contains the machine instructions corresponding to our main() function, including the instruction that initializes the local variable l_i, the instruction that initializes the loop counter c, and the loop itself.
“Data Segment”
The “data segment” is more complicated than the code segment, and also more “active”. It can be further divided into 4 sections.
Initialized Data Segment
The Initialized Data Segment is the portion of memory that contains the global variables and static variables (both file-scope statics and static locals) that are explicitly initialized in the code. In our example, the global variable g_i and the static variable s_i are stored in the Initialized Data Segment.
If you read carefully enough, you may have noticed that I put the title of this section in quotes. That is because “data segment” usually refers to the Initialized Data Segment only. I used the term “Data Segment” in the title and in the illustration as a general term meaning “a segment containing data”, which covers the Initialized Data Segment, the Uninitialized Data Segment, the heap and the stack. If you are familiar with x86 assembly language, you would probably say “start a data section with the .DATA directive” and “allocate stack space with the .STACK directive”. Here, however, “Data Segment” refers to all the sections that are used to store data.
Uninitialized Data Segment
The Uninitialized Data Segment, also referred to as BSS, contains all the global variables and static variables that are not explicitly initialized by the programmer. These variables start out as zero. In our example, the global variable g_j will be stored in the Uninitialized Data Segment.
Stack Section and Heap Area
The stack section and the heap area face each other and occupy the rest of the program's virtual memory space. Usually, the stack starts at the highest address of the virtual memory space and grows toward lower addresses. The heap, in contrast, starts at the lowest address after the uninitialized data segment and grows toward higher addresses.
The stack section is used to store automatic variables (that is, non-static local variables) and the calling environment each time a function is called. In the stack section, space for variables is allocated dynamically by moving the stack pointer, which indicates the top of the stack, up and down. When a variable goes out of scope, the stack pointer simply moves up and the variable's space is no longer usable (popping moves the pointer up because the stack grows downward). This way of managing memory makes allocation on the stack very fast. I will talk about memory allocation and variable scope in future posts.
The heap area is a region often used for dynamic memory allocation and is managed by malloc, realloc and free. Allocating space for a new variable in the heap is usually much slower than on the stack, because the heap may contain non-contiguous regions caused by repeated allocation and freeing of space. The heap area is shared by all threads of the program’s process. In contrast, each thread possesses its own stack section.
high address  +-----------+
              |   stack   |  (grows downward)
              |     |     |
              |     V     |
              |           |
              |     ^     |
              |     |     |
              |   heap    |  (grows upward)
              +-----------+
              |   data    |
              +-----------+
              |   text    |
low address   +-----------+
We can have as many .text and .data blocks in the source code as we want. The assembler will consolidate all the .text blocks into the text segment, and all the .data blocks into the data segment.
EX1: Each subprogram in a MAL program should have its own .text block and its own .data block.
# Subprogram 1
        .data
# Variables for subprogram 1
        .text
# Subprogram body
        ret

# Subprogram 2
        .data
# Variables for subprogram 2
        .text
# Subprogram body
        ret

# Main program
        .data
# Variables for main
        .text
# Main body
        ret

Why use multiple .text and .data sections in a program?
Variables should be defined along with the subprogram that uses them, for the sake of readability.
From the assembler's point of view, all variables are global. The notion of a variable's scope in C or Java is enforced by the compiler, not the hardware. Hardware knows only about memory addresses, and the compiler must keep track of which addresses are used by each subprogram.
In assembly language, it is the programmer's responsibility to ensure that each subprogram accesses only its own variables. Although the assembler will not prevent it, you must ensure that no subprogram accesses variables of another subprogram. If you do, your program is not modular, and your subprograms are not independent, portable modules as they should be.
A subprogram should never be considered part of the larger program it is used in. It should be considered an independent module that can be used within any other program without modification.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. (For more details, please refer to the man page of size(1).)
1. Check the following simple C program
#include <stdio.h>

int main(void)
{
    return 0;
}

[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text data bss dec hex filename
960 248 8 1216 4c0 memory-layout
2. Let us add one global variable to the program and check the size of bss again (it grows from 8 to 12 bytes).
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    return 0;
}

[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text data bss dec hex filename
960 248 12 1220 4c4 memory-layout
3. Let us add one static variable which is also stored in bss.
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    static int i; /* Uninitialized static variable stored in bss */
    return 0;
}

[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text data bss dec hex filename
960 248 16 1224 4c8 memory-layout
4. Let us initialize the static variable, which will then be stored in the Data Segment (DS).
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}

[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text data bss dec hex filename
960 252 12 1224 4c8 memory-layout
5. Let us initialize the global variable, which will then be stored in the Data Segment (DS).
#include <stdio.h>

int global = 10; /* Initialized global variable stored in DS */

int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}

[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text data bss dec hex filename
960 256 8 1224 4c8 memory-layout