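The source code discussed in this article is from Linux 3.10.
1 Composition of an ELF file
An ELF file can be an executable, a relocatable object file (.o), or a shared library; a static library (.a) is an ar archive whose members are ELF object files. Executables are further divided, by how they were linked, into statically linked and dynamically linked ones.
Statically and dynamically linked ELF executables are naturally handled somewhat differently when the system loads them.
A statically linked executable needs no library images loaded and no dynamic linking when it is loaded and started. A dynamically linked executable needs the shared library images loaded and dynamic linking performed as part of loading and starting it.
The Linux kernel supports both statically and dynamically linked ELF images. Loading and starting an ELF image must be done by the kernel, whereas dynamic linking could in principle be implemented either in the kernel or in user space.
GNU/Linux divides the work of supporting dynamically linked ELF images as follows:
Loading and starting the ELF image is done in the Linux kernel, while dynamic linking is done in user space (glibc) by a helper program called the "interpreter"; loading and starting the interpreter is again the kernel's job.
Every ELF file begins with an ELF header that describes the file. On a 32-bit system the header has the following structure: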
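typedef struct elf32_hdr{
    unsigned char e_ident[EI_NIDENT]; //magic number and other identification bytes
    Elf32_Half e_type;      //file type
    Elf32_Half e_machine;   //machine type
    Elf32_Word e_version;   //version number
    Elf32_Addr e_entry;     /* Entry point: virtual address at which execution starts */
    Elf32_Off e_phoff;      //file offset of the program header (segment) table; segments form the loading view
    Elf32_Off e_shoff;      //file offset of the section header table; sections form the linking view
    Elf32_Word e_flags;
    Elf32_Half e_ehsize;    //size of the ELF header
    Elf32_Half e_phentsize; //size of one program header entry
    Elf32_Half e_phnum;     //number of program headers (segments)
    Elf32_Half e_shentsize; //size of one section header entry
    Elf32_Half e_shnum;     //number of sections
    Elf32_Half e_shstrndx;  //index of the section that holds the section name string table
} Elf32_Ehdr;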
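The header is 52 bytes long. For an executable, the most important information in it concerns the segments, because running the program means loading the relevant segments into memory. A segment can contain several sections; it is essentially a group of sections. Sections are mainly used at link time, while a segment groups sections with the same attributes so that the program can be loaded conveniently.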
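#include <stdio.h>
#include <unistd.h> /* sleep() */
int sub(int a, int b)
{
    return a - b;
}
int add(int a, int b)
{
    return sub(a, b);
}
int main(void)
{
    int a = 1;
    int b = 2;
    int res = add(a, b);
    while (1) {
        sleep(1); /* keep the process alive so that it can be inspected */
    }
    return 0;
}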
Above is a simple program. After compiling it into an executable, readelf shows the following file header:
lu@ubuntu:~/tmp/test_program$ arm-linux-readelf -h test
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x8384
Start of program headers: 52 (bytes into file)
Start of section headers: 2220 (bytes into file)
Flags: 0x5000002, has entry point, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 8
Size of section headers: 40 (bytes)
Number of section headers: 31
Section header string table index: 28
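The entry point address is 0x8384. Program headers describe segments; an executable has program headers, whereas an object file need not have any.
In an ELF executable the program headers come right after the ELF header. "Start of program headers" above shows that they begin at offset 52, and the ELF header is exactly 52 bytes long.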
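The relationship between the ELF header and the program header table can be sketched in user space. The following is a minimal sketch, not kernel code: the helper name elf32_locate_phdrs is hypothetical, it relies on the glibc header <elf.h>, and it assumes the file's byte order matches the host's.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: validates a 32-bit ELF header held in buf and
 * reports where the program header table lives.
 * Returns 0 on success, -1 if buf does not look like a usable ELF32 header.
 * Assumption: the file uses the same byte order as the host. */
int elf32_locate_phdrs(const unsigned char *buf, size_t len,
                       Elf32_Off *phoff, unsigned *phnum)
{
    Elf32_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));   /* copy to avoid unaligned access */

    /* The first four bytes must be 0x7f 'E' 'L' 'F'. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    /* An executable needs at least one program header of the expected size. */
    if (eh.e_phentsize != sizeof(Elf32_Phdr) || eh.e_phnum < 1)
        return -1;

    *phoff = eh.e_phoff;
    *phnum = eh.e_phnum;
    return 0;
}
```

For the test binary above, such a helper would be expected to report an offset of 52 and a count of 8, matching the readelf output.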
Now look at the segments of the test program in detail:
lu@ubuntu:~/tmp/test_program$ arm-linux-readelf -l test
Elf file type is EXEC (Executable file)
Entry point 0x8384
There are 8 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x000578 0x00008578 0x00008578 0x00068 0x00068 R 0x4
PHDR 0x000034 0x00008034 0x00008034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x00008134 0x00008134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.3]
LOAD 0x000000 0x00008000 0x00008000 0x005e4 0x005e4 R E 0x8000
LOAD 0x0005e4 0x000105e4 0x000105e4 0x00124 0x00128 RW 0x8000
DYNAMIC 0x0005f0 0x000105f0 0x000105f0 0x000f0 0x000f0 RW 0x4
NOTE 0x000148 0x00008148 0x00008148 0x00020 0x00020 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01
02 .interp
03 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
04 .init_array .fini_array .jcr .dynamic .got .data .bss
05 .dynamic
06 .note.ABI-tag
07
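The ELF header says there are 8 program headers, and readelf -l indeed shows 8. It also confirms what was said earlier: sections are packed into segments according to their attributes. Segment 03 mainly holds code-related sections and is executable (R E), whereas segment 04 holds data-related sections and is readable and writable (RW).
Next, the structure of a program header, which the operating system uses to load the program into virtual memory: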
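typedef struct elf32_phdr{
    Elf32_Word p_type;   //segment type; when loading, the kernel mainly cares about PT_LOAD, plus PT_INTERP (which names the dynamic interpreter) for dynamically linked files
    Elf32_Off p_offset;  //offset of the segment in the file
    Elf32_Addr p_vaddr;  //virtual address at which the segment is loaded
    Elf32_Addr p_paddr;  //physical address of the segment, usually equal to p_vaddr
    Elf32_Word p_filesz; //size of the segment in the file; if the segment contains bss, the bss is not counted
    Elf32_Word p_memsz;  //size of the segment in memory; this includes bss, so it can be larger than p_filesz
    Elf32_Word p_flags;  //segment permissions, usually a combination of R, W and X
    Elf32_Word p_align;  //alignment of the segment in the file and in memory
} Elf32_Phdr;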
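The difference between p_filesz and p_memsz, and the meaning of p_flags, can be illustrated with two small helpers. This is a sketch only; the function names are hypothetical and the types come from the glibc header <elf.h>.

```c
#include <elf.h>

/* Number of zero-filled bytes (typically .bss) at the end of a segment:
 * the part that exists in memory but is not stored in the file. */
Elf32_Word segment_zero_fill(const Elf32_Phdr *ph)
{
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}

/* Render p_flags in the three-column style readelf uses, e.g. "R E" or "RW ". */
void segment_flags_str(Elf32_Word flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'R' : ' ';
    out[1] = (flags & PF_W) ? 'W' : ' ';
    out[2] = (flags & PF_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

For the second LOAD segment of the test binary (FileSiz 0x124, MemSiz 0x128), segment_zero_fill gives 4 bytes, which is the part the loader has to provide as zeroed memory.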
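Looking at the test executable again: it contains LOAD and INTERP segments. The program is dynamically linked, so it needs an INTERP segment. Now build the same program with static linking and see how the segments differ: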
Elf file type is EXEC (Executable file)
Entry point 0x8130
There are 6 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x071798 0x00079798 0x00079798 0x01598 0x01598 R 0x4
LOAD 0x000000 0x00008000 0x00008000 0x72db0 0x72db0 R E 0x8000
LOAD 0x072db0 0x00082db0 0x00082db0 0x007a4 0x02034 RW 0x8000
NOTE 0x0000f4 0x000080f4 0x000080f4 0x00020 0x00020 R 0x4
TLS 0x072db0 0x00082db0 0x00082db0 0x00010 0x00028 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01 .note.ABI-tag .init .text __libc_freeres_fn .fini .rodata __libc_subfreeres __libc_atexit .ARM.extab .ARM.exidx .eh_frame
02 .tdata .init_array .fini_array .jcr .data.rel.ro .got .data .bss __libc_freeres_ptrs
03 .note.ABI-tag
04 .tdata .tbss
05
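After static linking, the DYNAMIC and INTERP segments are no longer needed. The source analysis below also shows where the kernel treats statically and dynamically linked files differently.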
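Whether a file is dynamically linked can therefore be decided by scanning its program headers for PT_INTERP, which is also what the kernel does later. A minimal user-space sketch, with the hypothetical helper name find_interp and types from <elf.h>:

```c
#include <elf.h>
#include <stddef.h>

/* Returns a pointer to the PT_INTERP entry of a program header table,
 * or NULL if the table has none (as in a statically linked executable). */
const Elf32_Phdr *find_interp(const Elf32_Phdr *phdrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (phdrs[i].p_type == PT_INTERP)
            return &phdrs[i];
    return NULL;
}
```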
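2 Loading an ELF executable: execve and do_execve
On Linux, running an executable usually means calling fork to create a new process and then calling execve in that new process to load the new executable. How a process forks is covered in the following article and is not repeated here: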
https://blog.csdn.net/oqqYuJi12345678/article/details/102828714
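From user space, the fork-then-execve pattern looks roughly like the sketch below. The helper name run_program is hypothetical, and the exit code 127 for a failed execve is only a common shell convention adopted here for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: runs path with argv in a child process and
 * returns its exit status, or -1 on error or abnormal termination. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: on success execve() does not return, because the
         * process image has been replaced by the new program. */
        execve(path, argv, environ);
        _exit(127);     /* reached only if execve() failed */
    }
    /* Parent: wait for the child to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything the rest of this article describes happens inside the execve() call in the child.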
Inside the kernel, the execve system call is handled mainly by do_execve:
int do_execve(const char *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp)
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
return do_execve_common(filename, argv, envp);
}
The arguments are the executable's file name, its argument list, and its environment variables. do_execve_common is analyzed next:
static int do_execve_common(const char *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp)
{
struct linux_binprm *bprm;
struct file *file;
struct files_struct *displaced;
bool clear_in_exec;
int retval;
const struct cred *cred = current_cred();
/*
* We move the actual failure in case of RLIMIT_NPROC excess from
* set*uid() to execve() because too many poorly written programs
* don't check setuid() return code. Here we additionally recheck
* whether NPROC limit is still exceeded.
*/
if ((current->flags & PF_NPROC_EXCEEDED) &&
atomic_read(&cred->user->processes) > rlimit(RLIMIT_NPROC)) {
retval = -EAGAIN;
goto out_ret;
}
/* We're below the limit (still or again), so we don't want to make
* further execve() calls fail. */
current->flags &= ~PF_NPROC_EXCEEDED;
retval = unshare_files(&displaced);
if (retval){
goto out_ret;
}
retval = -ENOMEM;
bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
if (!bprm){
goto out_files;
}
retval = prepare_bprm_creds(bprm);
if (retval)
{
goto out_free;
}
retval = check_unsafe_exec(bprm);
if (retval < 0)
{
goto out_free;
}
clear_in_exec = retval;
current->in_execve = 1;
//open the executable file
file = open_exec(filename);
retval = PTR_ERR(file);
if (IS_ERR(file)){
goto out_unmark;
}
//call sched_exec() to pick the least-loaded CPU for running the new program
sched_exec();
bprm->file = file;
bprm->filename = filename;
bprm->interp = filename;
--------------------------------------------------------------------------(1)
retval = bprm_mm_init(bprm);
if (retval)
{
goto out_file;
}
bprm->argc = count(argv, MAX_ARG_STRINGS);
if ((retval = bprm->argc) < 0)
{
goto out;
}
bprm->envc = count(envp, MAX_ARG_STRINGS);
if ((retval = bprm->envc) < 0)
{
goto out;
}
-----------------------------------------------------------------------------(2)
retval = prepare_binprm(bprm);
if (retval < 0)
{
goto out;
}
retval = copy_strings_kernel(1, &bprm->filename, bprm);
if (retval < 0)
{
goto out;
}
bprm->exec = bprm->p;
retval = copy_strings(bprm->envc, envp, bprm);
if (retval < 0)
{
goto out;
}
retval = copy_strings(bprm->argc, argv, bprm);
if (retval < 0)
{
goto out;
}
-------------------------------------------------------------------------(3)
retval = search_binary_handler(bprm);
if (retval < 0){
goto out;
}
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
acct_update_integrals(current);
free_bprm(bprm);
if (displaced)
put_files_struct(displaced);
return retval;
out:
if (bprm->mm) {
acct_arg_size(bprm, 0);
mmput(bprm->mm);
}
out_file:
if (bprm->file) {
allow_write_access(bprm->file);
fput(bprm->file);
}
out_unmark:
if (clear_in_exec)
current->fs->in_exec = 0;
current->in_execve = 0;
out_free:
free_bprm(bprm);
out_files:
if (displaced)
reset_files_struct(displaced);
out_ret:
return retval;
}
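(1) bprm_mm_init allocates a new mm_struct together with its page directory (which includes copying the kernel mappings) and reserves virtual address space for the user stack: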
static int bprm_mm_init(struct linux_binprm *bprm)
{
int err;
struct mm_struct *mm = NULL;
//allocate an mm_struct and copy the kernel page-table entries into it
bprm->mm = mm = mm_alloc();
err = -ENOMEM;
if (!mm)
goto err;
err = init_new_context(current, mm);
if (err)
goto err;
//reserve virtual memory for the user stack
err = __bprm_mm_init(bprm);
/* ... rest of the function omitted ... */
}
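(2) prepare_binprm mainly reads the beginning of the file, which contains the ELF header, into bprm->buf so that the header can be processed later: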
int prepare_binprm(struct linux_binprm *bprm)
{
umode_t mode;
struct inode * inode = file_inode(bprm->file);
int retval;
mode = inode->i_mode;
/* ... omitted ... */
bprm->cred_prepared = 1;
memset(bprm->buf, 0, BINPRM_BUF_SIZE);
return kernel_read(bprm->file, 0, bprm->buf, BINPRM_BUF_SIZE);
}
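(3) search_binary_handler() scans the formats list of linux_binfmt objects and tries the load_binary function of each one. The scan stops as soon as one of them loads the file successfully.
Linux supports several executable formats besides ELF. This lets Linux run programs compiled for other operating systems, such as MS-DOS programs or BSD Unix COFF executables. The kernel describes each executable format with a struct linux_binfmt.
For every executable format it supports, the kernel has a struct linux_binfmt, defined as follows: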
struct linux_binfmt {
struct list_head lh;
struct module *module;
int (*load_binary)(struct linux_binprm *);
int (*load_shlib)(struct file *);
int (*core_dump)(struct coredump_params *cprm);
unsigned long min_coredump; /* minimal dump size */
};
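It provides three methods for handling an executable:
load_binary
Sets up a new execution environment for the current process by reading the information stored in the executable file.
load_shlib
Dynamically binds a shared library to an already running process; it is invoked through the uselib() system call.
core_dump
Stores the execution context of the current process in a file named core. This file is usually created when the process receives a signal whose default action is "dump"; its format depends on the executable format of the program being run.
All linux_binfmt objects are kept in a linked list whose head is the formats variable. Elements are inserted and removed with register_binfmt() and unregister_binfmt(). During system startup, register_binfmt() is called for every executable format compiled into the kernel. It is also called when a module implementing a new executable format is loaded, and unregister_binfmt() is called when that module is unloaded.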
int search_binary_handler(struct linux_binprm *bprm)
{
unsigned int depth = bprm->recursion_depth;
int try,retval;
struct linux_binfmt *fmt;
pid_t old_pid, old_vpid;
/* ... omitted ... */
for (try=0; try<2; try++) {
read_lock(&binfmt_lock);
//walk the list of registered linux_binfmt objects
list_for_each_entry(fmt, &formats, lh) {
int (*fn)(struct linux_binprm *) = fmt->load_binary;
if (!fn)
continue;
if (!try_module_get(fmt->module))
continue;
read_unlock(&binfmt_lock);
bprm->recursion_depth = depth + 1;
//call this format's load function
retval = fn(bprm);
bprm->recursion_depth = depth;
if (retval >= 0) {
if (depth == 0) {
trace_sched_process_exec(current, old_pid, bprm);
ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
}
put_binfmt(fmt);
allow_write_access(bprm->file);
if (bprm->file)
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
proc_exec_connector(current);
return retval;
}
read_lock(&binfmt_lock);
put_binfmt(fmt);
if (retval != -ENOEXEC || bprm->mm == NULL){
break;
}
if (!bprm->file) {
read_unlock(&binfmt_lock);
{
return retval;
}
}
}
read_unlock(&binfmt_lock);
/* ... omitted ... */
}
return retval;
}
For ELF files the load function is load_elf_binary, which is the focus of this article.
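The "try each handler until one accepts the file" idea behind search_binary_handler can be modeled in user space. The sketch below is illustrative only: struct fmt, load_script, load_elf and find_handler are hypothetical stand-ins and leave out the locking, module reference counting and recursion handling of the real function.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct linux_binfmt (illustrative only). */
struct fmt {
    const char *name;
    int (*load)(const unsigned char *buf, size_t len);
};

/* Accepts files that start with "#!". */
static int load_script(const unsigned char *buf, size_t len)
{
    return (len >= 2 && buf[0] == '#' && buf[1] == '!') ? 0 : -ENOEXEC;
}

/* Accepts files that start with the ELF magic number. */
static int load_elf(const unsigned char *buf, size_t len)
{
    return (len >= 4 && memcmp(buf, "\177ELF", 4) == 0) ? 0 : -ENOEXEC;
}

static const struct fmt formats[] = {
    { "script", load_script },
    { "elf",    load_elf    },
};

/* Try each handler in turn and stop at the first one that accepts the file.
 * Returns the handler's name, or NULL if every handler returned -ENOEXEC. */
const char *find_handler(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
        if (formats[i].load(buf, len) == 0)
            return formats[i].name;
    return NULL;
}
```

As in the kernel, a handler that does not recognize the file reports -ENOEXEC and the search simply moves on to the next registered format.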
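3 Loading an ELF executable in detail
Analyzing how an ELF file is loaded amounts to analyzing load_elf_binary. The source of the function is listed first and then analyzed step by step: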
static int load_elf_binary(struct linux_binprm *bprm)
{
struct file *interpreter = NULL; /* to shut gcc up */
unsigned long load_addr = 0, load_bias = 0;
int load_addr_set = 0;
char * elf_interpreter = NULL;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata;
unsigned long elf_bss, elf_brk;
int retval, i;
unsigned int size;
unsigned long elf_entry;
unsigned long interp_load_addr = 0;
unsigned long start_code, end_code, start_data, end_data;
unsigned long reloc_func_desc __maybe_unused = 0;
int executable_stack = EXSTACK_DEFAULT;
unsigned long def_flags = 0;
struct pt_regs *regs = current_pt_regs();
struct {
struct elfhdr elf_ex;
struct elfhdr interp_elf_ex;
} *loc;
loc = kmalloc(sizeof(*loc), GFP_KERNEL);
if (!loc) {
retval = -ENOMEM;
goto out_ret;
}
/* Get the exec-header */
//prepare_binprm already read the first 128 bytes of the executable into bprm->buf; the first 52 bytes are the ELF header
loc->elf_ex = *((struct elfhdr *)bprm->buf);
retval = -ENOEXEC;
/* First of all, some simple consistency checks */
//check that the magic number matches
if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
goto out;
if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
goto out;
if (!elf_check_arch(&loc->elf_ex))
goto out;
if (!bprm->file->f_op || !bprm->file->f_op->mmap)
goto out;
/* Now read in all of the header information */
if (loc->elf_ex.e_phentsize != sizeof(struct elf_phdr))
goto out;
//an executable without program headers is rejected; as noted earlier, an executable must have program headers
if (loc->elf_ex.e_phnum < 1 ||
loc->elf_ex.e_phnum > 65536U / sizeof(struct elf_phdr))
goto out;
//total size of all program headers
size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
retval = -ENOMEM;
elf_phdata = kmalloc(size, GFP_KERNEL);
if (!elf_phdata)
goto out;
//read the program headers into elf_phdata in preparation for loading
retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
(char *)elf_phdata, size);
if (retval != size) {
if (retval >= 0)
retval = -EIO;
goto out_free_ph;
}
elf_ppnt = elf_phdata;
elf_bss = 0;
elf_brk = 0;
start_code = ~0UL;
end_code = 0;
start_data = 0;
end_data = 0;
//walk all program headers
for (i = 0; i < loc->elf_ex.e_phnum; i++) {
//a dynamically linked ELF file has a PT_INTERP segment, so its interpreter must be loaded
if (elf_ppnt->p_type == PT_INTERP) {
/* This is the program interpreter used for
* shared libraries - for now assume that this
* is an a.out format binary
*/
retval = -ENOEXEC;
if (elf_ppnt->p_filesz > PATH_MAX ||
elf_ppnt->p_filesz < 2)
{
goto out_free_ph;
}
retval = -ENOMEM;
//the PT_INTERP segment holds the path name of the interpreter
elf_interpreter = kmalloc(elf_ppnt->p_filesz,
GFP_KERNEL);
if (!elf_interpreter){
goto out_free_ph;
}
-------------------------------------------------------------------------------(1)
//p_offset is the offset of the segment in the ELF file; reading p_filesz bytes from there yields the interpreter's path name
retval = kernel_read(bprm->file, elf_ppnt->p_offset,
elf_interpreter,
elf_ppnt->p_filesz);
if (retval != elf_ppnt->p_filesz) {
if (retval >= 0)
retval = -EIO;
goto out_free_interp;
}
/* make sure path is NULL terminated */
retval = -ENOEXEC;
if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0'){
goto out_free_interp;
}
//open the dynamic interpreter by its path name
interpreter = open_exec(elf_interpreter);
retval = PTR_ERR(interpreter);
if (IS_ERR(interpreter)){
goto out_free_interp;
}
/*
* If the binary is not readable then enforce
* mm->dumpable = 0 regardless of the interpreter's
* permissions.
*/
would_dump(bprm, interpreter);
//the ELF interpreter is normally an ELF shared object, i.e. itself an ELF file,
//so read the header of that ELF file
retval = kernel_read(interpreter, 0, bprm->buf,
BINPRM_BUF_SIZE);
if (retval != BINPRM_BUF_SIZE) {
if (retval >= 0)
retval = -EIO;
goto out_free_dentry;
}
/* Get the exec headers */
//save the ELF header of the dynamic interpreter
loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
break;
}
elf_ppnt++;
}
elf_ppnt = elf_phdata;
for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
if (elf_ppnt->p_type == PT_GNU_STACK) {
if (elf_ppnt->p_flags & PF_X)
executable_stack = EXSTACK_ENABLE_X;
else
executable_stack = EXSTACK_DISABLE_X;
break;
}
/* Some simple consistency checks for the interpreter */
if (elf_interpreter) {
retval = -ELIBBAD;
/* Not an ELF interpreter */
if (memcmp(loc->interp_elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
goto out_free_dentry;
/* Verify the interpreter has a valid arch */
if (!elf_check_arch(&loc->interp_elf_ex)){
goto out_free_dentry;
}
}
/* Flush all traces of the currently running executable */
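//clean up the other threads of this process and the old address space; everything belonging to the old program image inherited from the parent is discarded here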
-----------------------------------------------------------------------(2)
retval = flush_old_exec(bprm);
if (retval)
{
goto out_free_dentry;
}
/* OK, This is the point of no return */
current->mm->def_flags = def_flags;
/* Do this immediately, since STACK_TOP as used in setup_arg_pages
may depend on the personality. */
SET_PERSONALITY(loc->elf_ex);
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
current->personality |= READ_IMPLIES_EXEC;
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
current->flags |= PF_RANDOMIZE;
//more cleanup, e.g. close inherited files marked close-on-exec and reset inherited signal handlers to their defaults
setup_new_exec(bprm);
/* Do this so that we can load the interpreter, if need be. We will
change some of these later */
current->mm->free_area_cache = current->mm->mmap_base;
current->mm->cached_hole_size = 0;
//move the user stack down to its final position; the space at the top of the stack is reserved for the arguments
retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
executable_stack);
if (retval < 0) {
send_sig(SIGKILL, current, 0);
goto out_free_dentry;
}
current->mm->start_stack = bprm->p;
/* Now we do a little grungy work by mmapping the ELF image into
the correct location in memory. */
//the ELF executable itself is loaded here: walk its program header table
for(i = 0, elf_ppnt = elf_phdata;
i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
int elf_prot = 0, elf_flags;
unsigned long k, vaddr;
//only PT_LOAD segments are processed; only segments of this type need to be loaded into memory
if (elf_ppnt->p_type != PT_LOAD)
continue;
if (unlikely (elf_brk > elf_bss)) {
unsigned long nbyte;
/* There was a PT_LOAD segment with p_memsz > p_filesz
before this one. Map anonymous pages, if needed,
and clear the area. */
retval = set_brk(elf_bss + load_bias,
elf_brk + load_bias);
if (retval) {
send_sig(SIGKILL, current, 0);
goto out_free_dentry;
}
nbyte = ELF_PAGEOFFSET(elf_bss);
if (nbyte) {
nbyte = ELF_MIN_ALIGN - nbyte;
if (nbyte > elf_brk - elf_bss)
nbyte = elf_brk - elf_bss;
if (clear_user((void __user *)elf_bss +
load_bias, nbyte)) {
/*
* This bss-zeroing can fail if the ELF
* file specifies odd protections. So
* we don't check the return value
*/
}
}
}
if (elf_ppnt->p_flags & PF_R)
elf_prot |= PROT_READ;
if (elf_ppnt->p_flags & PF_W)
elf_prot |= PROT_WRITE;
if (elf_ppnt->p_flags & PF_X)
elf_prot |= PROT_EXEC;
elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;
//virtual address at which this segment is to be loaded
vaddr = elf_ppnt->p_vaddr;
if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
elf_flags |= MAP_FIXED;
} else if (loc->elf_ex.e_type == ET_DYN) {
/* Try and get dynamic programs out of the way of the
* default mmap base, as well as whatever program they
* might try to exec. This is because the brk will
* follow the loader, and is not movable. */
#ifdef CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE
/* Memory randomization might have been switched off
* in runtime via sysctl or explicit setting of
* personality flags.
* If that is the case, retain the original non-zero
* load_bias value in order to establish proper
* non-randomized mappings.
*/
if (current->flags & PF_RANDOMIZE)
load_bias = 0;
else
load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#else
load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
}
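//map a contiguous range of the image file into the user virtual address space; the return value is the start address actually mapped
//load_bias + vaddr is normally non-zero, i.e. the mapping address is specified by the caller
//when the address is specified, a successful mapping normally returns that address rounded down to a page boundary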
error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
elf_prot, elf_flags, 0);
if (BAD_ADDR(error)) {
send_sig(SIGKILL, current, 0);
retval = IS_ERR((void *)error) ?
PTR_ERR((void*)error) : -EINVAL;
goto out_free_dentry;
}
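//whether dynamically or statically linked, the segment listings of test above show that there are normally two PT_LOAD segments: the first is the text (code) segment and the second is the data segment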
if (!load_addr_set) {
load_addr_set = 1;
//the text segment comes first, so the load address is computed from it
load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
//an ET_EXEC executable does not take this branch
if (loc->elf_ex.e_type == ET_DYN) {
load_bias += error -
ELF_PAGESTART(load_bias + vaddr);
load_addr += load_bias;
reloc_func_desc = load_bias;
}
}
k = elf_ppnt->p_vaddr;
//start_code is initialized to all 1s (~0UL); the first PT_LOAD segment of an ELF file is normally
//the code segment, so start_code gets its value when that segment is processed
if (k < start_code)
start_code = k;
//in the memory layout the data segment lies above the code segment, so start_data is always
//updated to the highest segment start address
if (start_data < k)
start_data = k;
/*
* Check to see if the section's size will overflow the
* allowed task size. Note that p_filesz must always be
* <= p_memsz so it is only necessary to check p_memsz.
*/
if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
elf_ppnt->p_memsz > TASK_SIZE ||
TASK_SIZE - elf_ppnt->p_memsz < k) {
/* set_brk can never work. Avoid overflows. */
send_sig(SIGKILL, current, 0);
retval = -EINVAL;
goto out_free_dentry;
}
k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
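//bss lies above the data, so elf_bss is always updated to the highest value of load address plus file size
//for the data segment, p_filesz is only the size stored in the file and does not include bss,
//so elf_bss ends up as the start address of the bss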
if (k > elf_bss)
elf_bss = k;
//if this segment is executable, update the end address of the code
if ((elf_ppnt->p_flags & PF_X) && end_code < k)
end_code = k;
if (end_data < k)
end_data = k;
//p_memsz is the size of the segment in memory, so elf_brk is the end address of the bss; for
//the data segment p_memsz can be larger than p_filesz
k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
if (k > elf_brk)
elf_brk = k;
}
loc->elf_ex.e_entry += load_bias;
elf_bss += load_bias;
elf_brk += load_bias;
start_code += load_bias;
end_code += load_bias;
start_data += load_bias;
end_data += load_bias;
/* Calling set_brk effectively mmaps the pages that we need
* for the bss and break sections. We must do this before
* mapping in the interpreter, to make sure it doesn't wind
* up getting placed where the bss needs to go.
*/
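//map the pages needed for bss and brk; the bss bytes in the last file-backed page are zeroed by padzero() below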
retval = set_brk(elf_bss, elf_brk);
if (retval) {
send_sig(SIGKILL, current, 0);
goto out_free_dentry;
}
if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
send_sig(SIGSEGV, current, 0);
retval = -EFAULT; /* Nobody gets to see this, but.. */
goto out_free_dentry;
}
if (elf_interpreter) {
//dynamically linked: the interpreter must be loaded
unsigned long interp_map_addr = 0;
//map the interpreter's ELF file into memory
---------------------------------------------------------------------------(3)
elf_entry = load_elf_interp(&loc->interp_elf_ex,
interpreter,
&interp_map_addr,
load_bias);
if (!IS_ERR((void *)elf_entry)) {
/*
* load_elf_interp() returns relocation
* adjustment
*/
----------------------------------------------------------------------------(4)
interp_load_addr = elf_entry;
elf_entry += loc->interp_elf_ex.e_entry;
}
if (BAD_ADDR(elf_entry)) {
force_sig(SIGSEGV, current);
retval = IS_ERR((void *)elf_entry) ?
(int)elf_entry : -EINVAL;
goto out_free_dentry;
}
reloc_func_desc = interp_load_addr;
allow_write_access(interpreter);
fput(interpreter);
kfree(elf_interpreter);
} else {
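//for a statically linked ELF file, the program's entry address is where execve returns to in user space
//the segments were mapped at the addresses they specify, so the virtual entry address fixed at link
//time is unchanged and is the entry point of the program; with dynamic linking the entry address
//used is that of the interpreter instead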
elf_entry = loc->elf_ex.e_entry;
if (BAD_ADDR(elf_entry)) {
force_sig(SIGSEGV, current);
retval = -EINVAL;
goto out_free_dentry;
}
}
kfree(elf_phdata);
set_binfmt(&elf_format);
#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
if (retval < 0) {
send_sig(SIGKILL, current, 0);
goto out;
}
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
install_exec_creds(bprm);
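//fill in the arguments, environment variables and other information the new image needs
//before the user-space image can start running, some information must be prepared for the target image and the interpreter,
//including the usual argc, envc and so on, plus the entries of the "auxiliary vector".
//this information is copied to user space so that it is on the user stack when the CPU enters the entry point of the
//interpreter or of the target image; the space reserved earlier at the top of the user stack holds these values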
retval = create_elf_tables(bprm, &loc->elf_ex,
load_addr, interp_load_addr);
if (retval < 0) {
send_sig(SIGKILL, current, 0);
goto out;
}
/* N.B. passed_fileno might not be initialized? */
current->mm->end_code = end_code;
current->mm->start_code = start_code;
current->mm->start_data = start_data;
current->mm->end_data = end_data;
current->mm->start_stack = bprm->p;
#ifdef arch_randomize_brk
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
#ifdef CONFIG_COMPAT_BRK
current->brk_randomized = 1;
#endif
}
#endif
if (current->personality & MMAP_PAGE_ZERO) {
/* Why this, you ask??? Well SVr4 maps page 0 as read-only,
and some applications "depend" upon this behavior.
Since we do not have the power to recompile these, we
emulate the SVr4 behavior. Sigh. */
error = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
MAP_FIXED | MAP_PRIVATE, 0);
}
#ifdef ELF_PLAT_INIT
/*
* The ABI may specify that certain registers be set up in special
* ways (on i386 %edx is the address of a DT_FINI function, for
* example. In addition, it may also specify (eg, PowerPC64 ELF)
* that the e_entry field is the address of the function descriptor
* for the startup routine, rather than the address of the startup
* routine itself. This macro performs whatever initialization to
* the regs structure is required as well as any relocations to the
* function descriptor entries when executing dynamically links apps.
*/
ELF_PLAT_INIT(regs, reloc_func_desc);
#endif
//overwrite the user context saved on system-call entry with the program entry address and the new user stack pointer
-------------------------------------------------------------------------------(5)
start_thread(regs, elf_entry, bprm->p);
retval = 0;
out:
kfree(loc);
out_ret:
return retval;
/* error cleanup */
out_free_dentry:
allow_write_access(interpreter);
if (interpreter)
fput(interpreter);
out_free_interp:
kfree(elf_interpreter);
out_free_ph:
kfree(elf_phdata);
goto out;
}
(1) A dynamically linked file has a PT_INTERP segment describing its dynamic interpreter. Here are the segments of the dynamically linked test file again:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x000578 0x00008578 0x00008578 0x00068 0x00068 R 0x4
PHDR 0x000034 0x00008034 0x00008034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x00008134 0x00008134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.3]
LOAD 0x000000 0x00008000 0x00008000 0x005e4 0x005e4 R E 0x8000
LOAD 0x0005e4 0x000105e4 0x000105e4 0x00124 0x00128 RW 0x8000
DYNAMIC 0x0005f0 0x000105f0 0x000105f0 0x000f0 0x000f0 RW 0x4
NOTE 0x000148 0x00008148 0x00008148 0x00020 0x00020 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
The content of the INTERP segment is the path name of the interpreter, /lib/ld-linux.so.3.
(2)flush_old_exec
int flush_old_exec(struct linux_binprm * bprm)
{
int retval;
/*
* Make sure we have a private signal table and that
* we are unassociated from the previous thread group.
*/
//kill all other threads in the current thread group
retval = de_thread(current);
if (retval)
goto out;
set_mm_exe_file(bprm->mm, bprm->file);
filename_to_taskname(bprm->tcomm, bprm->filename, sizeof(bprm->tcomm));
/*
* Release all of the old mmap stuff
*/
acct_arg_size(bprm, 0);
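//release the old mm_struct, which still describes the image inherited from the parent, and install the newly allocated one; from here on the process has left the parent's user-space image behind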
retval = exec_mmap(bprm->mm);
if (retval)
goto out;
bprm->mm = NULL; /* We're using it now */
set_fs(USER_DS);
current->flags &=
~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
flush_thread();
current->personality &= ~bprm->per_clear;
return 0;
out:
return retval;
}
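(3) load_elf_interp maps the ELF interpreter into memory. The interpreter is a shared object, and loading it works much like loading the executable; many of the steps are the same. Only the points worth noting are commented in the code below: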
static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
struct file *interpreter, unsigned long *interp_map_addr,
unsigned long no_base)
{
struct elf_phdr *elf_phdata;
struct elf_phdr *eppnt;
unsigned long load_addr = 0;
int load_addr_set = 0;
unsigned long last_bss = 0, elf_bss = 0;
unsigned long error = ~0UL;
unsigned long total_size;
int retval, i, size;
/* First of all, some simple consistency checks */
if (interp_elf_ex->e_type != ET_EXEC &&
interp_elf_ex->e_type != ET_DYN)
goto out;
if (!elf_check_arch(interp_elf_ex))
goto out;
if (!interpreter->f_op || !interpreter->f_op->mmap)
goto out;
/*
* If the size of this structure has changed, then punt, since
* we will be doing the wrong thing.
*/
if (interp_elf_ex->e_phentsize != sizeof(struct elf_phdr))
goto out;
//a shared object must also have program headers
if (interp_elf_ex->e_phnum < 1 ||
interp_elf_ex->e_phnum > 65536U / sizeof(struct elf_phdr))
goto out;
/* Now read in all of the header information */
size = sizeof(struct elf_phdr) * interp_elf_ex->e_phnum;
if (size > ELF_MIN_ALIGN)
goto out;
elf_phdata = kmalloc(size, GFP_KERNEL);
if (!elf_phdata)
goto out;
//read the interpreter's program headers so that they can be parsed below
retval = kernel_read(interpreter, interp_elf_ex->e_phoff,
(char *)elf_phdata, size);
error = -EIO;
if (retval != size) {
if (retval < 0)
error = retval;
goto out_close;
}
total_size = total_mapping_size(elf_phdata, interp_elf_ex->e_phnum);
if (!total_size) {
error = -EINVAL;
goto out_close;
}
eppnt = elf_phdata;
//
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
//for the interpreter, too, only PT_LOAD segments are processed
if (eppnt->p_type == PT_LOAD) {
int elf_type = MAP_PRIVATE | MAP_DENYWRITE;
int elf_prot = 0;
unsigned long vaddr = 0;
unsigned long k, map_addr;
if (eppnt->p_flags & PF_R)
elf_prot = PROT_READ;
if (eppnt->p_flags & PF_W)
elf_prot |= PROT_WRITE;
if (eppnt->p_flags & PF_X)
elf_prot |= PROT_EXEC;
vaddr = eppnt->p_vaddr;
if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
elf_type |= MAP_FIXED;
else if (no_base && interp_elf_ex->e_type == ET_DYN)
load_addr = -vaddr;
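//map the file into the virtual address space. Note that the interpreter's text segment is
//mapped first; it was linked with virtual address 0, i.e. without a fixed load address,
//so the kernel chooses the address, which keeps it from overlapping the code and data
//segments of the ELF executable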
map_addr = elf_map(interpreter, load_addr + vaddr,
eppnt, elf_prot, elf_type, total_size);
total_size = 0;
if (!*interp_map_addr) //record the address at which the code segment was mapped
*interp_map_addr = map_addr;
error = map_addr;
if (BAD_ADDR(map_addr))
goto out_close;
if (!load_addr_set &&
interp_elf_ex->e_type == ET_DYN) {
//record the load address obtained from the code segment; the data segment is then mapped
//at an offset from this result
load_addr = map_addr - ELF_PAGESTART(vaddr);
load_addr_set = 1;
}
/*
* Check to see if the section's size will overflow the
* allowed task size. Note that p_filesz must always be
* <= p_memsize so it's only necessary to check p_memsz.
*/
k = load_addr + eppnt->p_vaddr;
if (BAD_ADDR(k) ||
eppnt->p_filesz > eppnt->p_memsz ||
eppnt->p_memsz > TASK_SIZE ||
TASK_SIZE - eppnt->p_memsz < k) {
error = -ENOMEM;
goto out_close;
}
/*
* Find the end of the file mapping for this phdr, and
* keep track of the largest address we see for this.
*/
k = load_addr + eppnt->p_vaddr + eppnt->p_filesz;
//record the start address of the bss
if (k > elf_bss)
elf_bss = k;
/*
* Do the same thing for the memory mapping - between
* elf_bss and last_bss is the bss section.
*/
//record the end address of the bss
k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
if (k > last_bss)
last_bss = k;
}
}
//if there is a bss, zero-fill it
if (last_bss > elf_bss) {
/*
* Now fill out the bss section. First pad the last page up
* to the page boundary, and then perform a mmap to make sure
* that there are zero-mapped pages up to and including the
* last bss page.
*/
if (padzero(elf_bss)) {
error = -EFAULT;
goto out_close;
}
/* What we have mapped so far */
elf_bss = ELF_PAGESTART(elf_bss + ELF_MIN_ALIGN - 1);
/* Map the last of the bss segment */
error = vm_brk(elf_bss, last_bss - elf_bss);
if (BAD_ADDR(error))
goto out_close;
}
// return the load address of the interpreter
error = load_addr;
out_close:
kfree(elf_phdata);
out:
return error;
}
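(4) Because the virtual address of the interpreter's code segment is 0, interp_elf_ex.e_entry is simply the offset of the entry point within the code segment. Once the code segment has been mapped, the real entry address is interp_elf_ex.e_entry plus the load address. So for a statically linked program the entry address is the one specified in the ELF executable at link time, whereas for a dynamically linked program the entry address the kernel jumps to is that of the dynamic interpreter.
The interpreter's segment information shows that the virtual address of its executable code segment is indeed 0, which agrees with the analysis above: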
arm-linux-readelf -l /usr/local/arm/4.3.3/arm-none-linux-gnueabi/libc/armv4t/lib/ld-linux.so.3
Elf file type is DYN (Shared object file)
Entry point 0x7b0
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x01be80 0x0001be80 0x0001be80 0x004c8 0x004c8 R 0x4
LOAD 0x000000 0x00000000 0x00000000 0x1c3e0 0x1c3e0 R E 0x8000
LOAD 0x01cda8 0x00024da8 0x00024da8 0x00834 0x00918 RW 0x8000
DYNAMIC 0x01cf44 0x00024f44 0x00024f44 0x000b8 0x000b8 RW 0x4
GNU_EH_FRAME 0x01c348 0x0001c348 0x0001c348 0x0001c 0x0001c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
GNU_RELRO 0x01cda8 0x00024da8 0x00024da8 0x00258 0x00258 R 0x1
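The arithmetic in note (4) is a single addition, and on a running Linux system the values the kernel chose can be read back from the auxiliary vector that create_elf_tables builds. The sketch below is user-space code under stated assumptions: the function names are hypothetical, and getauxval() requires glibc 2.16 or later.

```c
#include <elf.h>
#include <sys/auxv.h>

/* Runtime entry point of an object linked at address 0: the address at
 * which it was actually mapped plus the e_entry stored in the file. */
unsigned long runtime_entry(unsigned long load_addr, unsigned long e_entry)
{
    return load_addr + e_entry;
}

/* AT_BASE holds the base address at which the kernel mapped the
 * interpreter (0 when there is no interpreter). */
unsigned long interp_base(void)
{
    return getauxval(AT_BASE);
}

/* AT_ENTRY holds the entry point of the executable itself, which the
 * interpreter jumps to once dynamic linking is finished. */
unsigned long program_entry(void)
{
    return getauxval(AT_ENTRY);
}
```

As a worked example, if the interpreter listed above is mapped at 0xb6f4d000 (the address of the first ld-2.8.so line in the memory map shown in this article, assuming it is the same file), then runtime_entry(0xb6f4d000, 0x7b0) gives 0xb6f4d7b0.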
The while loop in test keeps the process resident in memory, so its memory map can be inspected:
cat /proc/1035/maps
00008000-00009000 r-xp 00000000 00:09 797 /mnt/test //code segment of test
00010000-00011000 rwxp 00000000 00:09 797 /mnt/test //data segment of test
b6e0c000-b6f2c000 r-xp 00000000 00:09 461 /lib/libc-2.8.so
b6f2c000-b6f33000 ---p 00120000 00:09 461 /lib/libc-2.8.so
b6f33000-b6f35000 r-xp 0011f000 00:09 461 /lib/libc-2.8.so
b6f35000-b6f36000 rwxp 00121000 00:09 461 /lib/libc-2.8.so
b6f36000-b6f39000 rwxp 00000000 00:00 0
b6f39000-b6f45000 r-xp 00000000 00:09 482 /lib/libgcc_s.so.1
b6f45000-b6f4c000 ---p 0000c000 00:09 482 /lib/libgcc_s.so.1
b6f4c000-b6f4d000 rwxp 0000b000 00:09 482 /lib/libgcc_s.so.1
b6f4d000-b6f6a000 r-xp 00000000 00:09 499 /lib/ld-2.8.so //load address of the dynamic interpreter
b6f6f000-b6f71000 rwxp 00000000 00:00 0
b6f71000-b6f72000 r-xp 0001c000 00:09 499 /lib/ld-2.8.so
b6f72000-b6f73000 rwxp 0001d000 00:09 499 /lib/ld-2.8.so
be9c4000-be9e5000 rw-p 00000000 00:00 0 [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0 [vectors]
ld-linux.so.3 is in fact a link to ld-2.8.so. The interpreter is mapped at a high address, so it does not overlap the virtual addresses of the executable.
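The first two lines of the memory map can be reproduced from the program headers of test, because elf_map rounds each PT_LOAD segment out to page boundaries. The following is a simplified sketch of that rounding; the names are hypothetical and a 4 KiB page size is assumed.

```c
#define PAGE_SZ 0x1000UL   /* assumption: 4 KiB pages */

struct mapping {
    unsigned long start;    /* page-aligned start address of the mapping */
    unsigned long end;      /* page-aligned end address (exclusive) */
    unsigned long file_off; /* file offset that corresponds to start */
};

/* Page-align a PT_LOAD segment the way a loader has to before calling
 * mmap: the start is rounded down, the file offset is moved back by the
 * same amount, and the length is rounded up to whole pages. */
struct mapping page_align_segment(unsigned long vaddr, unsigned long offset,
                                  unsigned long filesz)
{
    unsigned long in_page = vaddr & (PAGE_SZ - 1);
    struct mapping m;

    m.start = vaddr - in_page;
    m.file_off = offset - in_page;
    m.end = m.start + ((filesz + in_page + PAGE_SZ - 1) & ~(PAGE_SZ - 1));
    return m;
}
```

For the data segment of test (VirtAddr 0x105e4, Offset 0x5e4, FileSiz 0x124) this yields the range 0x10000 to 0x11000 at file offset 0, which is what the second line of the map shows.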
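(5) start_thread is a macro that changes the return address and the stack pointer in the user-mode context saved at system-call entry. Once these have been replaced, the new program starts running when the system call returns:
//on system-call entry the user-mode context is saved on the kernel stack; the regs argument
//points directly at that saved area, so changing its contents directly changes what user mode sees on return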
#define start_thread(regs,pc,sp) \
({ \
unsigned long *stack = (unsigned long *)sp; \
memset(regs->uregs, 0, sizeof(regs->uregs)); \
if (current->personality & ADDR_LIMIT_32BIT) \
regs->ARM_cpsr = USR_MODE; \
else \
regs->ARM_cpsr = USR26_MODE; \
if (elf_hwcap & HWCAP_THUMB && pc & 1) \
regs->ARM_cpsr |= PSR_T_BIT; \
regs->ARM_cpsr |= PSR_ENDSTATE; \
regs->ARM_pc = pc & ~1; /* pc */ \
regs->ARM_sp = sp; /* sp */ \
regs->ARM_r2 = stack[2]; /* r2 (envp) */ \
regs->ARM_r1 = stack[1]; /* r1 (argv) */ \
regs->ARM_r0 = stack[0]; /* r0 (argc) */ \
nommu_start_thread(regs); \
})
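One detail of the macro is how bit 0 of the entry address is treated on ARM: it selects Thumb state and is masked out of the program counter. The sketch below models only that part in plain C; struct entry_state and split_entry are hypothetical names, and cpu_has_thumb stands in for the elf_hwcap & HWCAP_THUMB test.

```c
struct entry_state {
    unsigned long pc;   /* address the CPU starts executing at */
    int thumb;          /* non-zero if execution starts in Thumb state */
};

/* Split an entry address the way the start_thread macro above does:
 * bit 0 requests Thumb state (if the CPU supports it) and is always
 * cleared from the program counter. */
struct entry_state split_entry(unsigned long entry, int cpu_has_thumb)
{
    struct entry_state s;

    s.thumb = cpu_has_thumb && (entry & 1UL);
    s.pc = entry & ~1UL;
    return s;
}
```

For the entry point 0x8384 of test, bit 0 is clear, so execution would start in ARM state at exactly that address.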
For how a system call returns to user mode, see this post:
https://blog.csdn.net/oqqYuJi12345678/article/details/100746436
The new program really starts executing at the moment of return to user mode.
References: