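CS:APP Chapter 7 Summary (the linking process in detail and common causes of link errors, the ELF format, the static keyword, building and using static and dynamic libraries, GOT/PLT, and linker-level wrappers for library functions)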

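This article takes a close look at how the linker works, in particular how it handles global variables and functions defined in different .o files. It introduces the ELF file format, including sections such as .text, .data, and .bss, and explains what .rel.text and .rel.data are for. It also discusses the difference between static and dynamic libraries and the role of the dynamic linker, including how the GOT and PLT are used during dynamic linking. Finally, it touches on loading dynamic libraries at run time and on library interpositioning.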

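The linker's job is to combine the code and data blocks of the individual .o files and relocate them so that every symbol can be found at run time.

What "relocatable" means: able to be relocated. Every .o file is relocatable until the linker has processed it.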

ELF stands for Executable and Linkable Format. A typical ELF relocatable object file contains the following sections:

.text The machine code of the compiled program.

.rodata Read-only data such as the format strings in printf statements, and jump tables for switch statements.

.data Initialized global and static C variables. Local C variables are maintained at run time on the stack and do not appear in either the .data or .bss sections.

.bss Uninitialized global and static C variables, along with any global or static variables that are initialized to zero. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file. At run time, these variables are allocated in memory with an initial value of zero.

.symtab The symbol table. Unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables. It does contain local symbols that are defined and referenced exclusively by module m. These correspond to static C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules.
A variable declared static is stored in the data segment (.data or .bss) and also gets an entry in this symbol table.

.rel.text A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified.
In other words, these are the relocation entries for the code section.

.rel.data In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.
In other words, these are the relocation entries for the data section.

.debug A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. It is only present if the compiler driver is invoked with the -g option.

.line A mapping between line numbers in the original C source program and machine code instructions in the .text section. It is only present if the compiler driver is invoked with the -g option.

.strtab A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers. A string table is a sequence of null-terminated character strings.
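
To make the section descriptions concrete, here is a small sketch of a C file annotated with where a typical GCC/ELF toolchain would place each object. All names are made up for illustration, and the exact placement can vary with the compiler and its flags; running readelf -S or objdump -t on the resulting .o file shows the real layout.

```c
int initialized_global = 42;     /* .data: initialized global */
int uninitialized_global;        /* .bss (or a COMMON symbol on older toolchains) */
int zero_global = 0;             /* .bss: explicitly initialized to zero */
static int file_private = 7;     /* .data, recorded as a local symbol in .symtab */

/* The pointer itself is initialized data; the string literal typically
 * goes to .rodata. Because the initial value is an address, the pointer
 * needs a relocation entry (.rel.data, named .rela.data on x86-64). */
const char *greeting = "hello";

/* The machine code of both functions goes to .text. */
int next_id(void)
{
    static int counter;          /* .bss: a local static, not on the stack */
    int step = 1;                /* stack or register; no .symtab entry */
    counter += step;
    return counter;
}

int sum_globals(void)
{
    return initialized_global + uninitialized_global
         + zero_global + file_private;
}
```

Because counter lives in .bss rather than on the stack, it keeps its value between calls: next_id() returns 1, then 2, and so on.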

C programmers use the static attribute to hide variable and function declarations inside the source file. Similarly, any global variable or function declared without the static attribute is public and can be accessed by any other module. It is good programming practice to protect your variables and functions with the static attribute wherever possible.
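By contrast, in object-oriented languages a static variable or function is one that belongs to the class rather than to an instance.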
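
As a minimal sketch of the C meaning of static, the hypothetical module below keeps its state and a helper private and exposes only two functions. The names (count, clamp, counter_add, counter_get) are invented for this example.

```c
/* counter.c: a hypothetical module */

static int count;                /* private: a local symbol, invisible to other files */

static int clamp(int v)          /* private helper, also a local symbol */
{
    return v < 0 ? 0 : v;
}

int counter_add(int delta)       /* public: a global symbol other modules may reference */
{
    count = clamp(count + delta);
    return count;
}

int counter_get(void)            /* public */
{
    return count;
}
```

Another file may define its own count or clamp without any conflict, because the linker never matches local symbols across modules.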

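How the linker handles global symbols that have the same name in different files:
Rule 1: Multiple strong symbols with the same name are not allowed.
Rule 2: Given a strong symbol and multiple weak symbols with the same name, the linker chooses the strong symbol.
Rule 3: Given multiple weak symbols with the same name, the linker chooses any one of the weak symbols.
Strong symbols are functions (such as main) and initialized global variables; weak symbols are uninitialized global variables. Rules 2 and 3 do not produce a link error, which makes them more dangerous than rule 1: a mismatch goes unreported and sits in the program like a hidden bomb.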
When in doubt, invoke the linker with a flag such as the gcc -fno-common flag, which triggers an error if it encounters multiply defined global symbols. Or use the -Werror option, which turns all warnings into errors.
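
The strong/weak rules really show up across two source files, which a single self-contained block cannot reproduce. As a stand-in, the sketch below uses the GCC/Clang __attribute__((weak)) extension to mark a definition as weak explicitly. This is a related mechanism, not the uninitialized-global case the book describes, and the function names are hypothetical. Note also that GCC 10 and later enable -fno-common by default, so duplicate uninitialized globals are rejected there unless -fcommon is given.

```c
/* Weak definition: used only when no other file supplies a strong one. */
__attribute__((weak)) int default_log_level(void)
{
    return 1;
}

/* If some other file in the link contained the strong definition
 *
 *     int default_log_level(void) { return 3; }
 *
 * the linker would silently pick that one, in the spirit of rule 2.
 *
 * The classic hidden bomb looks like this (two separate files):
 *
 *     foo.c:  int x = 1;          strong
 *     bar.c:  double x;           weak; writing it overwrites 8 bytes
 *                                 at the address of foo.c's 4-byte x
 */

int current_log_level(void)
{
    return default_log_level();
}
```

Linked on its own, this file uses the weak definition, so current_log_level() returns 1.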

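Possible ways to provide standard library functions:
Have the compiler recognize calls to the standard functions directly, as Pascal does. The drawback is that the compiler has to be updated every time a library function is added, removed, or changed.
Put all standard library functions in a single relocatable object module. The drawbacks are that every executable carries a full copy of the module and becomes large, and that changing any one function means recompiling the whole module.
Put each standard library function in its own relocatable object module. The drawback is that the programmer has to list every needed module by name on the link command line.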
On Linux systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.
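This resolves the problems listed above: on the command line you only name the archive, and the linker copies in just the member modules that the program actually references, not the whole archive.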

ld is the GNU linker.
ar creates archives. A .a file is essentially a collection of .o files (relocatable object modules).

The following two commands are equivalent:
gcc -static -o prog2c main2.o ./libvector.a
gcc -static -o prog2c main2.o -L. -lvector
The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time. The -lvector argument is a shorthand for libvector.a, and the -L. argument tells the linker to look for libvector.a in the current directory.
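
For reference, here is a sketch of what the two members of libvector.a might contain, modelled on the book's addvec and multvec routines. Showing both functions in one block is only for brevity; the split into addvec.c and multvec.c and the build commands in the comment are assumptions about how the archive would be produced.

```c
/* Assumed build steps:
 *   gcc -c addvec.c multvec.c
 *   ar rcs libvector.a addvec.o multvec.o
 *   gcc -static -o prog2c main2.o -L. -lvector
 */

/* addvec.c */
int addcnt = 0;                  /* counts calls to addvec */

void addvec(int *x, int *y, int *z, int n)
{
    addcnt++;
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];      /* element-wise sum */
}

/* multvec.c */
int multcnt = 0;                 /* counts calls to multvec */

void multvec(int *x, int *y, int *z, int n)
{
    multcnt++;
    for (int i = 0; i < n; i++)
        z[i] = x[i] * y[i];      /* element-wise product */
}
```

If main2.o calls only addvec, the linker copies addvec.o out of the archive and leaves multvec.o behind.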

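Linking takes two steps. The first is symbol resolution, which checks that every referenced symbol is defined in one of the linker's input files; if this step fails, the linker reports an undefined symbol error. The second is relocation.

The detailed procedure for step one, symbol resolution
The linker scans the relocatable object files and archives from left to right in the order they appear on the command line. It maintains a set E of object files that will be merged into the executable, a set U of symbols that are referenced but not yet defined, and a set D of symbols defined by the inputs seen so far. An object file is added to E, and U and D are updated with its references and definitions. For an archive, the linker checks each member against U; a member that defines a symbol in U is added to E and updates U and D, and this repeats until U and D stop changing, after which the remaining members are discarded. If U is nonempty when all inputs have been scanned, the linker prints an error and stops.

The point to watch in this procedure is "left to right": the order of the inputs matters, and a wrong order can cause link errors. The book analyzes the situation as follows: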
suppose foo.c calls functions in libx.a and libz.a that call functions in liby.a. Then libx.a and libz.a must precede liby.a on the command line:
linux> gcc foo.c libx.a libz.a liby.a
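If the same .o or .a appears more than once on a command line, the reason is the same: the libraries reference each other circularly. For example, if foo.c calls a function in libx.a that calls a function in liby.a that in turn calls a function in libx.a, then libx.a has to be listed again after liby.a (gcc foo.c libx.a liby.a libx.a).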
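
The ordering problem can be reproduced with a toy model of the linker's left-to-right scan. This is a simplified sketch, not how ld is implemented: symbols are encoded as bits in a mask, each input is either a single object file or an archive of members, and the scan tracks the set U of undefined symbols and the set D of defined ones. The types and the function resolve are hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned defs;               /* symbols this object file defines */
    unsigned refs;               /* symbols this object file references */
} Member;

typedef struct {
    int is_archive;              /* 0: a single .o file, 1: a .a archive */
    const Member *members;       /* one member for a .o, several for a .a */
    size_t count;                /* at most 32 in this toy model */
} Input;

/* Scan the inputs left to right and return the mask of symbols that are
 * still undefined at the end (0 means the link would succeed). */
unsigned resolve(const Input *inputs, size_t n)
{
    unsigned U = 0, D = 0;
    for (size_t i = 0; i < n; i++) {
        const Input *in = &inputs[i];
        unsigned taken = 0;      /* members of this input already pulled in */
        int changed;
        do {
            changed = 0;
            for (size_t m = 0; m < in->count; m++) {
                const Member *mem = &in->members[m];
                if (taken & (1u << m))
                    continue;
                /* A .o is always taken; an archive member is taken only
                 * if it defines a symbol that is currently undefined. */
                if (in->is_archive && !(mem->defs & U))
                    continue;
                taken |= 1u << m;
                D |= mem->defs;
                U = (U | mem->refs) & ~D;
                changed = 1;
            }
        } while (changed && in->is_archive);  /* archives: repeat to a fixed point */
    }
    return U;
}
```

With foo.o referencing a function in libx.a, which in turn references one in liby.a, the order foo.o libx.a liby.a leaves nothing unresolved. The order foo.o liby.a libx.a scans liby.a while U contains no symbol it defines, so its member is skipped and the symbol libx.a needs from it is still undefined at the end.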

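What does relocation actually do? By definition, it merges the input modules and assigns a run-time address to each part.
An example of "merging": the .data sections from the input modules are all merged into one section that will become the .data section for the output executable object file.
The linker first fixes the positions of the merged sections (such as .text and .data) and of the symbols defined in them, then patches every reference to those symbols in the code and data so that it points to the right run-time address.
When the assembler generates machine code, it does not know what the run-time addresses will be, so it emits a relocation entry for each such reference to tell the linker how to compute the final value. I did not study the exact computation closely; in short, the references have to be recomputed once the sections have been merged.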
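
For readers who do want the computation, the sketch below shows the arithmetic for the two basic x86-64 relocation types, R_X86_64_PC32 (PC-relative) and R_X86_64_32 (absolute). The function names are hypothetical, and the addresses in the usage note are illustrative values rather than output from a real link.

```c
#include <stdint.h>

/* Run-time address of the bytes to patch: the run-time address of the
 * section plus the offset stored in the relocation entry. */
uint64_t ref_address(uint64_t section_addr, uint64_t offset)
{
    return section_addr + offset;
}

/* R_X86_64_PC32: the stored value is the distance from the reference to
 * the symbol, adjusted by the addend. The addend is typically -4 because
 * the CPU adds the displacement to the address of the NEXT instruction,
 * which begins 4 bytes after the start of the 32-bit field. */
uint32_t reloc_pc32(uint64_t sym_addr, int64_t addend, uint64_t ref_addr)
{
    return (uint32_t)(sym_addr + (uint64_t)addend - ref_addr);
}

/* R_X86_64_32: the stored value is the symbol's absolute address
 * plus the addend. */
uint32_t reloc_abs32(uint64_t sym_addr, int64_t addend)
{
    return (uint32_t)(sym_addr + (uint64_t)addend);
}
```

Suppose a call instruction's 32-bit operand sits at offset 0xf in a .text section placed at 0x4004d0, and the callee ends up at 0x4004e8. The reference is then at 0x4004df, and reloc_pc32(0x4004e8, -4, 0x4004df) yields 0x5: the next instruction starts at 0x4004e3, and 0x4004e3 + 0x5 is the callee's address.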

The format of an executable object file differs slightly from that of a relocatable object file.
The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses. The .init section defines a small function, called _init, that will be called by the program's initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.

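.so: shared objects (Linux)
.dll: dynamic link libraries (Windows)
The problem with static linking is that many programs use functions such as scanf and printf, and each program ends up with its own copy of them in its code segment, which is wasteful. It would be better to share one copy.
Dynamic linking exists precisely because we want that sharing.
As the full name of the .so format suggests, a dynamic library is in essence the same kind of thing as a static library: object code.
Example command for building a .so file: gcc -shared -fpic -o libvector.so addvec.c multvec.c
The first option asks for a shared object, and the second asks for position-independent code. The output file and the input files follow.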
Code that can be loaded without needing any relocation is known as position independent code (PIC). Users direct GNU compilation systems to generate PIC code with the -fpic option to gcc. Shared libraries must always be compiled with this option.
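Resolving the addresses of global variables relies on the GOT (global offset table). The GOT sits at the beginning of the data segment, and each module that references global objects has its own GOT. Every process that loads a shared module gets a private copy of the module's GOT and of the rest of its read/write data; everything else in the module can be shared between processes. This saves as much space as possible (without the GOT, the code would have to be patched at load time, so each process would need its own copy of the code segment) and is also safer, because the code segment can stay read-only. The GOT relies on the following fact: no matter where a module is loaded in memory, the distance between any variable in its data segment and any instruction in its code segment is a constant, fixed once the sections have been merged at link time. The parts of an instruction that would otherwise need relocation are moved out and kept with the data, so the instructions themselves never change, while the data is copied per process.
Resolving function addresses needs both the PLT (procedure linkage table) and the GOT, and the explanation starts with lazy binding. A caller typically uses only a small fraction of the functions a .so exports, so the address of a function is not resolved until the first time it is called. The PLT lives in the code segment, and each module that calls functions in a shared library has its own PLT. The first entry, PLT[0], is special: it jumps into the resolver in the dynamic linker.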
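In summary, absolute addresses appear only in the GOT, while the PLT and the rest of the code use relative addressing, which is why the code segment is position-independent.
The chain of reasoning: dynamic linking was introduced to save space; a dynamic library has to be PIC; PIC requires resolving variables and functions through relative addresses; hence the GOT and the PLT.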
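
Lazy binding can be imitated in plain C with a function pointer that plays the role of a GOT entry. This is only an analogy under simplifying assumptions: the real PLT and GOT are built by the linker and patched by the dynamic linker, and every name below is made up for the illustration.

```c
static int resolve_count;        /* how many times the "dynamic linker" ran */

/* Stands in for a function that lives in a shared library. */
static int real_twice(int n)
{
    return 2 * n;
}

static int resolver_stub(int n);

/* Simulated GOT entry: it initially points at the resolver stub,
 * not at the real function. */
static int (*got_entry)(int) = resolver_stub;

/* Simulated resolver: runs on the first call only, patches the GOT
 * entry with the real address, then forwards the call. */
static int resolver_stub(int n)
{
    resolve_count++;
    got_entry = real_twice;
    return got_entry(n);
}

/* Simulated PLT entry: always jumps indirectly through the GOT entry. */
int call_via_plt(int n)
{
    return got_entry(n);
}

int resolutions(void)
{
    return resolve_count;
}
```

The first call to call_via_plt pays for the resolution; every later call goes straight to real_twice through the patched pointer, so resolutions() stays at 1. The code of call_via_plt never changes; only the data it jumps through does.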
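The dynamic linker does the following work (in the book's example, for the program prog2l, which links against libc.so and libvector.so):
It relocates the text and data of libc.so into some memory segment.
It relocates the text and data of libvector.so into another memory segment.
It relocates any references in prog2l to symbols defined by libc.so and libvector.so.
Once this work is done, the locations of the shared libraries stay fixed until the program finishes executing.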

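To use a dynamic library at run time, Linux provides an interface declared in dlfcn.h (dlopen, dlsym, dlclose, and dlerror); p. 729 of the book describes the interface and gives a demo.
Command for building a program that loads a library this way: gcc -rdynamic -o prog2r dll.c -ldl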
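
A minimal sketch of the dlfcn.h interface follows. It is not the book's demo: the helper call_from_library and its signature are invented here, and it assumes the looked-up symbol is a function taking and returning a double. On older glibc versions the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef double (*unary_fn)(double);

/* Load the shared library `lib`, look up the function `name`, and call it
 * on x. Returns 0 and stores the result in *out on success, -1 on failure. */
int call_from_library(const char *lib, const char *name, double x, double *out)
{
    void *handle = dlopen(lib, RTLD_LAZY);    /* load and link the library */
    if (handle == NULL) {
        fprintf(stderr, "%s\n", dlerror());
        return -1;
    }

    dlerror();                                /* clear any stale error state */
    unary_fn fn;
    *(void **)(&fn) = dlsym(handle, name);    /* POSIX idiom for function pointers */
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "%s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn(x);
    dlclose(handle);                          /* unload if no one else uses it */
    return 0;
}
```

On a glibc-based Linux system, something like call_from_library("libm.so.6", "cos", 0.0, &y) would be a typical call; the library file name is an assumption about the system, not something the interface guarantees.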

Linux linkers support a powerful technique, called library interpositioning, that allows you to intercept calls to shared library functions and execute your own code instead. Using interpositioning, you could trace the number of times a particular library function is called, validate and trace its input and output values, or even replace it with a completely different implementation.
Here's the basic idea: Given some target function to be interposed on, you create a wrapper function whose prototype is identical to the target function. Using some particular interpositioning mechanism, you then trick the system into calling the wrapper function instead of the target function. The wrapper function typically executes its own logic, then calls the target function and passes its return value back to the caller.
Interpositioning can occur at compile time, link time, or run time as the program is being loaded and executed.
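See Section 7.13 of the book for the concrete steps; I will come back to it when I need it. The wrapper approach resembles instrumentation, where extra code is inserted around existing calls.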
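
As a taste of the compile-time variant, the sketch below redirects malloc to a counting wrapper with a macro. It is a single-file simplification with hypothetical names (traced_malloc, make_buffer); in practice the macro would usually live in a local header found via -I. The link-time variant uses the linker's --wrap option with __wrap_malloc and __real_malloc, and the run-time variant uses LD_PRELOAD together with dlsym(RTLD_NEXT, "malloc").

```c
#include <stdlib.h>

static size_t malloc_calls;      /* number of intercepted calls */
static size_t malloc_bytes;      /* total bytes requested */

/* Wrapper with the same prototype as malloc. It is defined BEFORE the
 * macro below, so the malloc called here is still the real one. */
void *traced_malloc(size_t size)
{
    malloc_calls++;
    malloc_bytes += size;
    return malloc(size);
}

/* From here on, the preprocessor rewrites every malloc(...) call
 * in this file into a call to the wrapper. */
#define malloc(size) traced_malloc(size)

/* Ordinary client code: unaware that it is being traced. */
char *make_buffer(size_t n)
{
    return malloc(n);            /* expands to traced_malloc(n) */
}

size_t traced_calls(void) { return malloc_calls; }
size_t traced_bytes(void) { return malloc_bytes; }
```

After one call to make_buffer(32), traced_calls() reports 1 and traced_bytes() reports 32, while the caller still receives an ordinary heap pointer that it releases with free.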

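The book summarizes the GNU binutils tools for working with object files:
ar, strings, strip, nm, size, readelf, and objdump, plus ldd for listing the shared libraries an executable needs.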
