9.5. Segments and the Program Header Table
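The program header table is used only for executable files, shared libraries, and core files. It contains information about the segments in an ELF file and how to load them into memory. As mentioned earlier, a segment is a contiguous part of an ELF file whose contents share the same memory attributes when loaded into memory.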
Each entry in the 32-bit program header table has the following structure:
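typedef struct elf32_phdr {
    Elf32_Word p_type;    /* Segment type (one of the PT_* values) */
    Elf32_Off  p_offset;  /* Offset of the segment in the file */
    Elf32_Addr p_vaddr;   /* Virtual address of the segment in memory */
    Elf32_Addr p_paddr;   /* Physical address, where relevant */
    Elf32_Word p_filesz;  /* Size of the segment in the file */
    Elf32_Word p_memsz;   /* Size of the segment in memory */
    Elf32_Word p_flags;   /* Segment flags (memory attributes) */
    Elf32_Word p_align;   /* Required alignment of the segment */
} Elf32_Phdr;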
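As a rough illustration of this layout, the following Python sketch decodes one entry with the standard struct module. It assumes a little-endian 32-bit file (as on x86). The names PHDR32_FORMAT, Phdr32, and unpack_phdr32 are invented for this example, and the sample bytes are hand-encoded from the PHDR row of the readelf listing later in this section rather than read from a real file.

```python
import struct
from collections import namedtuple

# Eight 32-bit fields in the same order as Elf32_Phdr.
# "<" is an assumption: it selects little-endian byte order (e.g. x86).
PHDR32_FORMAT = "<8I"

# Hypothetical helper type for this sketch; not part of any library.
Phdr32 = namedtuple(
    "Phdr32",
    "p_type p_offset p_vaddr p_paddr p_filesz p_memsz p_flags p_align",
)


def unpack_phdr32(raw):
    """Decode one program header entry (32 bytes) into a named tuple."""
    return Phdr32(*struct.unpack(PHDR32_FORMAT, raw))


# Hand-encoded sample: type 6 (PT_PHDR), flags 5 (read + execute).
raw_entry = struct.pack(
    PHDR32_FORMAT, 6, 0x34, 0x08048034, 0x08048034, 0xE0, 0xE0, 5, 4
)
entry = unpack_phdr32(raw_entry)
entry_size = struct.calcsize(PHDR32_FORMAT)
print(entry_size, hex(entry.p_vaddr), hex(entry.p_filesz))
```

Each entry is 32 bytes, so a table of seven entries occupies 7 × 32 = 224 (0xe0) bytes.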
There are several types of segments, each with a different purpose and function. The valid values for p_type are defined in /usr/include/elf.h:
#define PT_NULL 0 /* Program header table entry unused */
#define PT_LOAD 1 /* Loadable program segment */
#define PT_DYNAMIC 2 /* Dynamic linking information */
#define PT_INTERP 3 /* Program interpreter */
#define PT_NOTE 4 /* Auxiliary information */
#define PT_SHLIB 5 /* Reserved */
#define PT_PHDR 6 /* Entry for header table itself */
#define PT_TLS 7 /* Thread-local storage segment */
#define PT_NUM 8 /* Number of defined types */
#define PT_LOOS 0x60000000 /* Start of OS-specific */
#define PT_GNU_EH_FRAME 0x6474e550 /* GCC .eh_frame_hdr segment */
#define PT_HIOS 0x6fffffff /* End of OS-specific */
#define PT_LOPROC 0x70000000 /* Start of processor-specific */
#define PT_HIPROC 0x7fffffff /* End of processor-specific */
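The reserved ranges in this list can be hard to read at a glance. The following sketch classifies a p_type value using only the constants shown above; the function name describe_p_type and the wording of the returned strings are made up for this example.

```python
# Values copied from the elf.h excerpt above.
PT_NAMES = {
    0: "PT_NULL",
    1: "PT_LOAD",
    2: "PT_DYNAMIC",
    3: "PT_INTERP",
    4: "PT_NOTE",
    5: "PT_SHLIB",
    6: "PT_PHDR",
    7: "PT_TLS",
    0x6474E550: "PT_GNU_EH_FRAME",
}
PT_LOOS, PT_HIOS = 0x60000000, 0x6FFFFFFF
PT_LOPROC, PT_HIPROC = 0x70000000, 0x7FFFFFFF


def describe_p_type(p_type):
    """Return a readable label for a p_type value (hypothetical helper)."""
    if p_type in PT_NAMES:
        return PT_NAMES[p_type]
    if PT_LOOS <= p_type <= PT_HIOS:
        return "OS-specific (0x%08x)" % p_type
    if PT_LOPROC <= p_type <= PT_HIPROC:
        return "processor-specific (0x%08x)" % p_type
    return "unknown (0x%08x)" % p_type


for value in (1, 3, 0x6474E550, 0x60000001, 0x70000001):
    print(describe_p_type(value))
```

Named values are checked first because PT_GNU_EH_FRAME itself falls inside the OS-specific range.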
The most “interesting” segment types are:
- PT_LOAD
- PT_DYNAMIC
- PT_INTERP
A PT_LOAD segment is a “loadable” segment, meaning that its contents are loaded into memory. The loadable segments contain everything a program needs in order to run its executable code.
A PT_DYNAMIC segment holds dynamic linking information. The run-time linker uses it to find all of the required shared libraries and to perform run-time linking.
The PT_INTERP segment points to the “.interp” section, which holds the name of the program interpreter for an executable. The program interpreter is responsible for getting a program up and running, using its own executable code to do so. In other words, it handles process initialization as far as dynamic loading, linking, and so on are concerned. See the section later in this chapter titled “Program Interpreter” for more information.
Let’s take a closer look at the program headers and segments of the executable foo:
penguin> readelf -l foo
Elf file type is EXEC (Executable file)
Entry point 0x8048540
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
INTERP 0x000114 0x08048114 0x08048114 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000
LOAD 0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW 0x1000
DYNAMIC 0x000810 0x08049810 0x08049810 0x000f0 0x000f0 RW 0x4
NOTE 0x000128 0x08048128 0x08048128 0x00020 0x00020 R 0x4
GNU_EH_FRAME 0x000748 0x08048748 0x08048748 0x00024 0x00024 R 0x4
<...>
In this listing, the first program header entry (PHDR) describes the program header table itself. The second entry describes an “INTERP” segment, a special segment that contains only the name of the program interpreter (more on this segment later in the chapter). The third and fourth entries describe LOAD segments (more on these later). The fifth entry describes the dynamic segment used for dynamic linking. The last two, NOTE and GNU_EH_FRAME, are special segments for vendor-specific information and exception handling and are not covered in this chapter.
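To make the phrase “starting at offset 52” in the listing above concrete, here is a minimal sketch of how a tool might walk the table. It assumes a little-endian 32-bit ELF image already held in a bytes object and uses the standard ELF32 header positions of e_phoff (byte 28), e_phentsize (byte 42), and e_phnum (byte 44). The function name iter_phdrs32 is hypothetical, and the demo image is a synthetic stand-in that carries only the first two rows of the listing, not the real foo.

```python
import struct


def iter_phdrs32(image):
    """Yield each program header entry as a tuple of eight integers:
    (p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align).
    Assumes a little-endian ELF32 image (hypothetical helper)."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    (e_phoff,) = struct.unpack_from("<I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    for index in range(e_phnum):
        yield struct.unpack_from("<8I", image, e_phoff + index * e_phentsize)


# Synthetic 52-byte ELF header: only the fields used above are filled in.
header = bytearray(52)
header[:4] = b"\x7fELF"
struct.pack_into("<I", header, 28, 52)       # e_phoff: table follows the header
struct.pack_into("<HH", header, 42, 32, 2)   # e_phentsize = 32, e_phnum = 2

# Two entries copied from the PHDR and INTERP rows of the listing.
table = struct.pack("<8I", 6, 0x34, 0x08048034, 0x08048034, 0xE0, 0xE0, 5, 4)
table += struct.pack("<8I", 3, 0x114, 0x08048114, 0x08048114, 0x13, 0x13, 4, 1)

demo_image = bytes(header) + table
demo_entries = list(iter_phdrs32(demo_image))
demo_types = [entry[0] for entry in demo_entries]
print(demo_types)
```

A fuller reader would also check the class and byte-order bytes in e_ident before choosing the unpack formats; this sketch leaves that out for brevity.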
The offset of a segment (the Offset column of its program header) is the actual offset of the segment in the ELF file. For example, we can find the program interpreter (INTERP) by looking at offset 0x114 in the executable foo:
penguin> hexdump -C foo | less
<...>
00000100  48 87 04 08 24 00 00 00  24 00 00 00 04 00 00 00  |H...$...$.......|
00000110  04 00 00 00 2f 6c 69 62  2f 6c 64 2d 6c 69 6e 75  |..../lib/ld-linu|
00000120  78 2e 73 6f 2e 32 00 00  04 00 00 00 10 00 00 00  |x.so.2..........|
00000130  01 00 00 00 47 4e 55 00  00 00 00 00 02 00 00 00  |....GNU.........|
At offset 0x114 (taken from the readelf output), the executable contains the name of the program interpreter: /lib/ld-linux.so.2. The string and its terminating NUL byte occupy 0x13 (19) bytes, which matches the FileSiz of the INTERP segment.
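The same lookup can be sketched in a few lines of Python. The helper name read_interp is invented for this example, and demo_image is a stand-in buffer (zero padding followed by the string), not the real contents of foo.

```python
def read_interp(image, p_offset, p_filesz):
    """Return the NUL-terminated path stored in a PT_INTERP segment
    (hypothetical helper for illustration)."""
    raw = image[p_offset:p_offset + p_filesz]
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Stand-in file contents: 0x114 bytes of padding, then the interpreter name.
interp_bytes = b"/lib/ld-linux.so.2\x00"
demo_image = bytes(0x114) + interp_bytes

# Offset and FileSiz come from the INTERP row of the readelf output.
interp = read_interp(demo_image, 0x114, 0x13)
print(interp)
```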
The virtual address (VirtAddr) is the address at which each segment is loaded into memory. The physical address (PhysAddr) is used only on platforms that work with actual physical addresses (that is, platforms that do not use virtual memory). With some rare exceptions, Linux uses virtual addresses, so Linux users can ignore the “physical address” field. The file size field (FileSiz) is the size of the segment on disk. The memory size (MemSiz) is the size of the segment after it has been loaded into memory. The distinction between the file size and the memory size is important. Some segments (such as the second LOAD segment in the output, with 0x1dc bytes in memory versus 0x1d4 on disk) are larger in memory than on disk. The extra memory holds sections such as .bss, which does not occupy any space in the ELF file.
The .bss section, located at the end of the data segment (the second LOAD segment in the output above), contains uninitialized variables. Allocating space in the ELF file for variables that have not been initialized is pointless because there are no values to store. However, when the data segment is loaded into memory, the additional space is allocated, making room for the uninitialized variables. These variables are always set to zero when a shared library or executable is loaded.
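The relationship between the two sizes can be modeled as copying FileSiz bytes from the file and then appending zeros up to MemSiz. The sketch below is only a model of that idea, not how the kernel actually maps segments; load_segment is a hypothetical name and the 0xAA file contents are made up.

```python
def load_segment(image, p_offset, p_filesz, p_memsz):
    """Return the in-memory contents of one segment as a bytearray:
    the bytes from the file followed by a zero-filled tail."""
    if p_memsz < p_filesz:
        raise ValueError("p_memsz must not be smaller than p_filesz")
    memory = bytearray(image[p_offset:p_offset + p_filesz])
    memory.extend(bytes(p_memsz - p_filesz))  # room for .bss, all zeros
    return memory


# Sizes from the second LOAD row: FileSiz 0x1d4, MemSiz 0x1dc.
demo_image = bytes(0x76C) + b"\xAA" * 0x1D4   # made-up file contents
segment = load_segment(demo_image, 0x76C, 0x1D4, 0x1DC)
zero_tail = len(segment) - 0x1D4
print(len(segment), zero_tail)
```

For foo, the zero-filled tail is only 8 bytes long, which is all the room its uninitialized variables need.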
The flags field (Flg) gives the memory attributes (R for read, W for write, E for execute) applied when the segment is loaded into memory. Lastly, the alignment field (Align) is the required byte alignment for the segment. An alignment of 0x1 means that there are no alignment requirements. An alignment of 0x4 means that the segment must start at an address on a 4-byte boundary.
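A small sketch may help tie the Flg and Align columns back to the raw p_flags and p_align fields. The PF_X, PF_W, and PF_R values are the ones defined in elf.h. The alignment check follows the rule in the ELF specification (an addition to this text) that p_vaddr and p_offset must be equal modulo p_align. The function names are invented for this example.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4   # flag bits as defined in elf.h


def flags_to_text(p_flags):
    """Render p_flags in the three-column style of the Flg field."""
    return "".join(
        letter if p_flags & bit else " "
        for letter, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X))
    )


def offset_congruent(p_offset, p_vaddr, p_align):
    """Check the ELF specification rule p_vaddr == p_offset (mod p_align).
    Alignments of 0 and 1 impose no requirement."""
    if p_align in (0, 1):
        return True
    return p_vaddr % p_align == p_offset % p_align


print(repr(flags_to_text(PF_R | PF_X)))   # first LOAD segment
print(repr(flags_to_text(PF_R | PF_W)))   # second LOAD segment
print(offset_congruent(0x76C, 0x0804976C, 0x1000))
```

Under this rule, the second LOAD segment is valid even though 0x0804976c is not itself a multiple of 0x1000: both its file offset and its virtual address leave the same remainder, 0x76c.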
Of particular importance are the addresses of the two LOAD segments. In order, these are the text and data segments, explained in more detail next.
An executable or shared library typically has one “text” segment (note: segment, not section) and one “data” segment. The term “text” in this context refers to the contents of the text segment, which should have read-only access when loaded into memory. Examples of what may be in the text segment include the ELF header, the text section (with executable code), string tables, and read-only data symbols (the .rodata section). The data segment contains sections that need to be written to during the lifetime of the process and hence are writable in memory. Examples of the sections in the data segment include initialized writable data symbols (the .data section), the global offset table, and the uninitialized memory (the .bss section).
The text and data segments are both “load” segments because they are loaded into memory and used by a process:
penguin> readelf -l foo | egrep LOAD
LOAD 0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000
LOAD 0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW 0x1000
After running the program foo, the address space looks like this (note: only the first part of the output is displayed):
penguin> head -4 /proc/4893/maps
08048000-08049000 r-xp 00000000 08:13 2044467 /home/wilding/src/Linuxbook/ELF/foo
08049000-0804a000 rw-p 00000000 08:13 2044467 /home/wilding/src/Linuxbook/ELF/foo
40000000-40012000 r-xp 00000000 08:13 1144740 /lib/ld-2.2.5.so
40012000-40013000 rw-p 00011000 08:13 1144740 /lib/ld-2.2.5.so
<...>
Notice the addresses of the two loaded segments for the executable foo (/home/wilding/src/Linuxbook/ELF/foo). The first mapping starts exactly at the “virtual address” defined for the first LOAD segment in the program headers. The second mapping starts at 0x08049000 rather than at the address specified in the ELF file, 0x0804976c. The reason is the alignment restriction for the second segment, 0x1000, which corresponds to a 4KB boundary. This boundary comes from the underlying hardware and its virtual address translation mechanism, which work in whole pages. The in-memory mapping therefore starts at a lower address that satisfies the alignment restriction. The segment's actual contents are still located at the original address of 0x0804976c.
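The rounding described here can be reproduced with simple arithmetic. The sketch below assumes a 4KB page size (matching the Align value of 0x1000) and uses a made-up helper name, mapped_range. It rounds the start of each LOAD segment down and its end up to a page boundary.

```python
def mapped_range(p_vaddr, p_memsz, page_size=0x1000):
    """Return (start, end) of the page-aligned region that covers a
    segment. page_size must be a power of two (hypothetical helper)."""
    mask = ~(page_size - 1)
    start = p_vaddr & mask                              # round down
    end = (p_vaddr + p_memsz + page_size - 1) & mask    # round up
    return start, end


# VirtAddr and MemSiz from the two LOAD rows of the readelf output.
text_range = mapped_range(0x08048000, 0x0076C)
data_range = mapped_range(0x0804976C, 0x001DC)
print("%08x-%08x" % text_range)
print("%08x-%08x" % data_range)
```

Under these assumptions, the results agree with the first two lines of the /proc/4893/maps output: 08048000-08049000 and 08049000-0804a000.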
We can see the system calls involved in loading a library into memory by using the strace command:
penguin> strace -o foo.st foo
This is a printf format string in baz
This is a printf format string in main
penguin> less foo.st
<...>
open("./i686/mmx/libfoo.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./i686/libfoo.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./mmx/libfoo.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./libfoo.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\7\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=7301, ...}) = 0
getcwd("/home/wilding/src/Linuxbook/ELF", 128) = 32
mmap2(NULL, 7556, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40014000
mprotect(0x40015000, 3460, PROT_NONE) = 0
mmap2(0x40015000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40015000
close(3) = 0
<...>
The strace output shows several open system calls made in the effort to locate the library libfoo.so. Once the library is found, the first 1024 bytes of the file are read, and then its contents are mapped into memory using mmap2. Notice that this first mapping has the memory attributes READ and EXEC, but not WRITE. Later, a second mmap2 call maps the data segment at a fixed address with the memory attributes READ and WRITE.
Segments are important for loading ELF files into memory, but the real functionality and contents are in the ELF sections. Let’s take a closer look at ELF sections next.