Overview of ELF File Loading and Execution on Linux


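ELF is the most widely used executable format on Linux. To understand how the Linux kernel maps an ELF file precisely into its designated memory regions, I spent last weekend reading through the sys_execve part of the kernel carefully. Here is a summary: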

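1. The ELF format
ELF specifies where the text, bss, data and other segments should be placed in the process's virtual address space, and records which dynamic libraries the process needs.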
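To make the role of the program headers concrete, here is a minimal user-space sketch that walks the program header table of an ELF file and counts the entries of one type (for example PT_LOAD). The helper name count_phdrs_of_type is made up for this illustration, and the code assumes a Linux C library whose <link.h> provides the ElfW() macro (glibc does) and an ELF file of the machine's native word size.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>   /* ElfW(): selects Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count program headers whose p_type equals `type`.
 * Returns -1 if the file cannot be read or is not a native-class ELF file. */
int count_phdrs_of_type(const char *path, unsigned int type)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    ElfW(Ehdr) eh;
    int count = -1;

    /* Same sanity checks the kernel starts with: magic and header entry size. */
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        goto done;
    if (eh.e_phentsize != sizeof(ElfW(Phdr)))
        goto done;

    count = 0;
    for (unsigned int i = 0; i < eh.e_phnum; i++) {
        ElfW(Phdr) ph;
        long off = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            count = -1;
            goto done;
        }
        if (ph.p_type == type)
            count++;
    }

done:
    fclose(f);
    return count;
}
```

Calling count_phdrs_of_type("/proc/self/exe", PT_LOAD) from a main function reports how many loadable segments the running program has; passing PT_INTERP instead shows whether it requests an interpreter.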


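2. Rough execution flow of sys_execve
1) Open the ELF binary and read in the ELF header.
2) Discard the mm-related state inherited from the parent process.
3) Following the ELF headers, map the interpreter, text, data and other segments into memory (which also shows that Linux does not support compressed binaries),
set up the stack and so on, and update the mm contents.
4) "Forge" the process's kernel stack so that it is ready to return to user mode. The ip saved on the kernel stack points at the interpreter's entry point.
5) The sys_execve system call returns to user mode and the interpreter starts running (the interpreter is usually ld-linux.so.2 or similar).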

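What does the interpreter do once we are in user mode?

6) The interpreter loads the dynamic libraries for the user process and performs all the relocation and mapping work.
7) The interpreter jumps to the program's entry point (normally _start, which eventually calls main), and the program starts running.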
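The hand-off in steps 4) to 7) leaves traces that a running program can inspect. The sketch below reads two entries of the auxiliary vector that the kernel places on the new user stack: AT_ENTRY (the program's own entry point) and AT_BASE (the address at which the interpreter was mapped). It assumes glibc's getauxval() from <sys/auxv.h>; the function names are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Entry point of the main executable, as recorded by the kernel
 * in the auxiliary vector when it built the initial user stack. */
unsigned long aux_program_entry(void)
{
    return getauxval(AT_ENTRY);
}

/* Base address at which the interpreter was mapped. This is typically 0
 * when the program has no PT_INTERP segment (for example a static binary). */
unsigned long aux_interp_base(void)
{
    return getauxval(AT_BASE);
}

/* Print both values so they can be compared with /proc/self/maps. */
void print_exec_handoff(void)
{
    printf("AT_ENTRY (program entry point) = %#lx\n", aux_program_entry());
    printf("AT_BASE  (interpreter base)    = %#lx\n", aux_interp_base());
}
```

For a dynamically linked program the kernel starts execution inside the mapping that begins at AT_BASE, and the interpreter later jumps to the address reported as AT_ENTRY.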

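Several questions here deserve a closer look:
1> What does the kernel stack look like when sys_execve is called? How are user-mode arguments passed into the kernel?
Only after answering this can we see how the kernel returns to the interpreter's entry point.
A: See the material on Linux system calls. Linux handles system call arguments in one uniform way, which is well worth studying; I will write up its design in a separate post.
2> Where do the interpreter's arguments come from? How does the interpreter get to main?
A: This is puzzling if you think in terms of ordinary C function calls. Once you think at the assembly level, where the call stack can be adjusted and "forged" dynamically and deliberately, switching between functions and passing arguments becomes easy.
The kernel builds the argument stack the interpreter needs, and the interpreter leaves the stack in the state that the program's entry point (and ultimately main) expects. The user stack is set up in setup_arg_pages (create_elf_tables, annotated in the notes below, then modifies bprm->p further).
3> How does the kernel make sure each segment is mapped at the intended address?
A: By passing the MAP_FIXED flag to mmap.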
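The effect of MAP_FIXED can be tried from user space. The sketch below is only an illustration of the flag, not the kernel's elf_map path: it first reserves two pages so that the target address is known to belong to the caller (MAP_FIXED silently replaces whatever is already mapped there), then maps one page with MAP_FIXED and checks that it landed exactly at the requested address. The function name is made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a MAP_FIXED mapping landed exactly where requested,
 * 0 if it did not, and -1 if the initial reservation failed. */
int map_fixed_lands_at_hint(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Reserve two inaccessible pages; the second one is our fixed target. */
    char *area = mmap(NULL, 2 * (size_t)page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return -1;

    char *want = area + page;
    char *got = mmap(want, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    /* With MAP_FIXED the call either fails or returns exactly `want`. */
    int ok = (got == want);
    if (ok) {
        got[0] = 42;              /* the new page is readable and writable */
        ok = (got[0] == 42);
    }

    munmap(area, 2 * (size_t)page);
    return ok;
}
```

Without MAP_FIXED the address argument is only a hint and the kernel is free to pick another location, which is why the loader sets the flag for ET_EXEC images.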

Appended notes:

/* Replace the mm of the current task (current) with the mm passed in.
 * Called by int flush_old_exec(struct linux_binprm * bprm).
 * The old mm is dropped.
 */
static int exec_mmap(struct mm_struct *mm)
{
    struct task_struct *tsk;
    struct mm_struct * old_mm, *active_mm;

    /* Notify parent that we're no longer interested in the old VM */
    tsk = current;
    old_mm = current->mm;
    /* Release the current process's old mm (growing old is a scary thing!) */
    mm_release(tsk, old_mm);

    if (old_mm) {
        /* If the old mm is still in use (core dump in progress) we cannot go on */
        /*
         * Make sure that if there is a core dump in progress
         * for the old mm, we get out and die instead of going
         * through with the exec. We must hold mmap_sem around
         * checking core_state and changing tsk->mm.
         */
        down_read(&old_mm->mmap_sem);
        if (unlikely(old_mm->core_state)) {
            up_read(&old_mm->mmap_sem);
            return -EINTR;
        }
    }
    /* The old mm has been released; time to welcome the new bride */
    task_lock(tsk);
    /* active_mm is meaningful if the current thread is a kernel thread */
    active_mm = tsk->active_mm;
    /* The new mm moves in */
    tsk->mm = mm;
    tsk->active_mm = mm;
    /* From now on the new mm officially runs the household */
    activate_mm(active_mm, mm);
    task_unlock(tsk);
    /* Sets a few function pointers in mm. What for? */
    arch_pick_mmap_layout(mm);
    if (old_mm) {
        /* If old_mm still has not vanished by now,
         * it is because its sibling active_mm is backing it up
         */
        up_read(&old_mm->mmap_sem);
        BUG_ON(active_mm != old_mm);
        /* If someone else still holds the old mm, hand it over to them */
        mm_update_next_owner(old_mm);
        /* Remove the old mm from our own address book */
        mmput(old_mm);
        return 0;
    }
    /* Finally drop the old active_mm. Is this for the multi-threaded case? */
    mmdrop(active_mm);
    return 0;
}

/* Map the ELF file into the current process's virtual memory
 * Overall idea:
 *
 *
 */

/* Background
Complete Reference on ELF format: http://www.muppetlabs.com/~breadbox/software/ELF.txt

1.
To read the code below, it helps to know the layout of the ELF header:

typedef struct elf32_hdr{
    unsigned char e_ident[EI_NIDENT];  /* Magic Number */
    Elf32_Half    e_type;       /* ET_EXEC or ET_DYN: executable image or shared library */
    Elf32_Half    e_machine;    /* Target CPU type */
    Elf32_Word    e_version;    /* */
    Elf32_Addr    e_entry;      /* Entry point, usually the start of _start() */
    Elf32_Off     e_phoff;      /* Points to the start of the program header array */
    Elf32_Off     e_shoff;      /* Points to the start of the section header array,
                                   which marks out the "code section", "data section", etc. */
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;     /* Size of the ELF header itself */
    Elf32_Half    e_phentsize;  /* Size of one program header array element */
    Elf32_Half    e_phnum;      /* Number of program header array elements */
    Elf32_Half    e_shentsize;  /* Size of one section header array element */
    Elf32_Half    e_shnum;      /* Number of section header array elements */
    Elf32_Half    e_shstrndx;
} Elf32_Ehdr;

2. What does each program header contain?

typedef struct elf32_phdr{
    Elf32_Word    p_type;    /* Segment type; in particular PT_LOAD marks a loadable segment */
    Elf32_Off     p_offset;  /* Offset of the segment from byte 0 of the file */
    Elf32_Addr    p_vaddr;   /* Start address the segment occupies in the process address space once loaded */
    Elf32_Addr    p_paddr;   /* Ignored on operating systems that support paging */
    Elf32_Word    p_filesz;  /* Size of the segment in the file, in bytes. Some segments do not exist
                                in the file but still occupy memory; this field is then 0 */
    Elf32_Word    p_memsz;   /* Size of the segment in memory, in bytes. Some segments exist only
                                in the file and are not loaded into memory; this field is then 0 */
    Elf32_Word    p_flags;
    Elf32_Word    p_align;   /* Alignment */
} Elf32_Phdr;

3. What does each section header contain?
The section table is the ELF file as seen from the linking point of view. From that point of view the file
is split into many sections, each holding data for a different purpose; that data may also be referenced
by the program headers described above.

typedef struct elf64_shdr {
    Elf64_Word    sh_name;       /* Section name, index in string tbl */
    Elf64_Word    sh_type;       /* Type of section */
    Elf64_Xword   sh_flags;      /* Miscellaneous section attributes */
    Elf64_Addr    sh_addr;       /* Section virtual addr at execution */
    Elf64_Off     sh_offset;     /* Section file offset */
    Elf64_Xword   sh_size;       /* Size of section in bytes */
    Elf64_Word    sh_link;       /* Index of another section */
    Elf64_Word    sh_info;       /* Additional section information */
    Elf64_Xword   sh_addralign;  /* Section alignment */
    Elf64_Xword   sh_entsize;    /* Entry size if section holds table */
} Elf64_Shdr;

4. What is the difference between program headers and section headers?
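The linker and the loader look at an ELF file in completely different ways. The linker sees a collection of
logical sections described by the section header table (that is, it ignores the program header table).
The loader sees a collection of segments described by the program header table (and ignores the section
header table).
Diagram of the difference: http://img.ddvip.com/2009_09_10/1252583354_ddvip_9407.jpeg
Segments are the division seen from the image-loading point of view; sections are the division seen from
the linking point of view.
Taking Wine as an example:

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
   03     .data .dynamic .ctors .dtors .jcr .got .bss
   04     .dynamic
   05     .note.ABI-tag

http://blog.csdn.net/zytju1983/archive/2009/03/13/3985909.aspx

5. How is each segment guaranteed to be mapped at the expected virtual address?
The flags argument of mmap has a MAP_FIXED flag. When it is set, the mapping is placed at exactly the
requested address, and if that cannot be done the call returns an error.

6. Looking at the function as a whole, load_elf_binary does the following:
1) Reads the data of each ELF segment into memory and sets up the mappings
2) Loads the interpreter into memory and sets up its mappings (including the dynamic relocation process)
3) Sets up ip, sp, etc. in the regs structure, so that everything is ready to start the process

Open question: how does the interpreter hand control over to _main()?

My own tentative analysis:
After load_elf_binary has obtained the entry address of ld-linux.so.2 in eax, executing
    push eax
    ret
enters ld-linux.so.2 territory, where ld-linux.so.2 helps load the shared libraries.
Q1. How does it know which libraries to load? Where do its arguments come from?
Q2. How does it get back to main to run the main program once loading is complete?
A1. Through stack manipulation! Note that the two assembly instructions above are essentially equivalent
    to a jump. One can picture the jump as originating inside load_elf_binary, so that the interpreter's
    code shares the argument stack with load_elf_binary. (This is only a mental model: as the notes on
    start_thread further down show, the kernel really does this by setting ip and sp in pt_regs.)
A2. By unwinding the interpreter's stack and then running the user entry point, which leads to main.

The code below is taken from the GNU ELF interpreter and shows how ld.so completes the linking.

Code in http://ftp.gnu.org > gnu > glibc > glibc-2.5.tar.bz2 > glibc-2.5 > sysdeps > i386 > dl-machine.h

/* Initial entry point code for the dynamic linker.
   The C function `_dl_start' is the real entry point;
   its return value is the user program's entry point.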
*/
#define RTLD_START asm ("\n\
    .text\n\
    .align 16\n\
0:  movl (%esp), %ebx\n\
    ret\n\
    .align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
    # Note that _dl_start gets the parameter in %eax.\n\
    movl %esp, %eax\n\
    call _dl_start\n\
_dl_start_user:\n\
    # Save the user entry point address in %edi.\n\
    movl %eax, %edi\n\
    # Point %ebx at the GOT.\n\
    call 0b\n\
    addl $_GLOBAL_OFFSET_TABLE_, %ebx\n\
    # See if we were run as a command with the executable file\n\
    # name as an extra leading argument.\n\
    movl _dl_skip_args@GOTOFF(%ebx), %eax\n\
    # Pop the original argument count.\n\
    popl %edx\n\
    # Adjust the stack pointer to skip _dl_skip_args words.\n\
    leal (%esp,%eax,4), %esp\n\
    # Subtract _dl_skip_args from argc.\n\
    subl %eax, %edx\n\
    # Push argc back on the stack.\n\
    push %edx\n\
    # The special initializer gets called with the stack just\n\
    # as the application's entry point will see it; it can\n\
    # switch stacks if it moves these contents over.\n\
" RTLD_START_SPECIAL_INIT "\n\
    # Load the parameters again.\n\
    # (eax, edx, ecx, *--esp) = (_dl_loaded, argc, argv, envp)\n\
    movl _rtld_local@GOTOFF(%ebx), %eax\n\
    leal 8(%esp,%edx,4), %esi\n\
    leal 4(%esp), %ecx\n\
    movl %esp, %ebp\n\
    # Make sure _dl_init is run with 16 byte aligned stack.\n\
    andl $-16, %esp\n\
    pushl %eax\n\
    pushl %eax\n\
    pushl %ebp\n\
    pushl %esi\n\
    # Clear %ebp, so that even constructors have terminated backchain.\n\
    xorl %ebp, %ebp\n\
    # Call the function to run the initializers.\n\
    call _dl_init_internal@PLT\n\
    # Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
    leal _dl_fini@GOTOFF(%ebx), %edx\n\
    # Restore %esp _start expects.\n\
    movl (%esp), %esp\n\
    # Jump to the user's entry point.\n\
    jmp *%edi\n\
    .previous\n\
");

/* Call the OS-dependent function to set up life so we can do things like
   file access.  It will call `dl_main' (below) to do all the real work
   of the dynamic linker, and then unwind our frame and run the user
   entry point on the same stack we entered on.  */

Code in rtld.c ....
*/
static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
    struct file *interpreter = NULL; /* to shut gcc up */
    unsigned long load_addr = 0, load_bias = 0;
    int load_addr_set = 0;
    char * elf_interpreter = NULL;
    unsigned long error;
    struct elf_phdr *elf_ppnt, *elf_phdata;
    unsigned long elf_bss, elf_brk;
    int elf_exec_fileno;
    int retval, i;
    unsigned int size;
    unsigned long elf_entry;
    unsigned long interp_load_addr = 0;
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long reloc_func_desc = 0;
    int executable_stack = EXSTACK_DEFAULT;
    unsigned long def_flags = 0;
    struct {
        struct elfhdr elf_ex;
        struct elfhdr interp_elf_ex;
    } *loc;

    loc = kmalloc(sizeof(*loc), GFP_KERNEL);
    if (!loc) {
        retval = -ENOMEM;
        goto out_ret;
    }

    /* Get the exec-header */
    loc->elf_ex = *((struct elfhdr *)bprm->buf);

    retval = -ENOEXEC;
    /* First of all, some simple consistency checks */
    if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
        goto out;

    if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
        goto out;
    if (!elf_check_arch(&loc->elf_ex))
        goto out;
    /* The filesystem holding the ELF file must support mmap */
    if (!bprm->file->f_op||!bprm->file->f_op->mmap)
        goto out;

    /* Now read in all of the header information */
    if (loc->elf_ex.e_phentsize != sizeof(struct elf_phdr))
        goto out;
    if (loc->elf_ex.e_phnum < 1 ||
        loc->elf_ex.e_phnum > 65536U / sizeof(struct elf_phdr))
        goto out;
    /* Note: the ELF loader (as opposed to the linker) only uses the program headers.
     * Allocate space for the program headers below.
     * The program headers say how each segment is to be loaded into memory.
     */
    size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
    retval = -ENOMEM;
    elf_phdata = kmalloc(size, GFP_KERNEL);
    if (!elf_phdata)
        goto out;
    /* Read the program header part of the ELF file into the buffer */
    retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
                 (char *)elf_phdata, size);
    if (retval != size) {
        if (retval >= 0)
            retval = -EIO;
        goto out_free_ph;
    }
    /* The operations on the ELF file below presumably need an fd (?)
     */
    retval = get_unused_fd();
    if (retval < 0)
        goto out_free_ph;
    get_file(bprm->file);
    fd_install(elf_exec_fileno = retval, bprm->file);

    elf_ppnt = elf_phdata;
    elf_bss = 0;
    elf_brk = 0;

    start_code = ~0UL;
    end_code = 0;
    start_data = 0;
    end_data = 0;

    /* The code below walks the program header array three times:
     * the first pass handles the PT_INTERP segment,
     * the second pass handles the PT_GNU_STACK segment,
     * and only the third pass handles the PT_LOAD segments.
     * NOTE: PT_DYNAMIC is not handled here; it is left for the interpreter to map and relocate.
     * Each pass is annotated below.
     */

    /*
     * First pass: the PT_INTERP segment
     */
    for (i = 0; i < loc->elf_ex.e_phnum; i++) {
        if (elf_ppnt->p_type == PT_INTERP) {
            /* This is the program interpreter used for
             * shared libraries - for now assume that this
             * is an a.out format binary
             */

            retval = -ENOEXEC;
            if (elf_ppnt->p_filesz > PATH_MAX ||
                elf_ppnt->p_filesz < 2)
                goto out_free_file;

            retval = -ENOMEM;
            elf_interpreter = kmalloc(elf_ppnt->p_filesz,
                          GFP_KERNEL);
            if (!elf_interpreter)
                goto out_free_file;
            /* The PT_INTERP segment holds the name of the linker.
             * The ELF specification requires the OS to handle this segment first.
             * Its content looks like:
             *   /lib64/ld-linux-x86-64.so.2
             */
            retval = kernel_read(bprm->file, elf_ppnt->p_offset,
                         elf_interpreter,
                         elf_ppnt->p_filesz);
            if (retval != elf_ppnt->p_filesz) {
                if (retval >= 0)
                    retval = -EIO;
                goto out_free_interp;
            }
            /* make sure path is NULL terminated */
            retval = -ENOEXEC;
            if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0')
                goto out_free_interp;

            /*
             * The early SET_PERSONALITY here is so that the lookup
             * for the interpreter happens in the namespace of the
             * to-be-execed image.  SET_PERSONALITY can select an
             * alternate root.
             *
             * However, SET_PERSONALITY is NOT allowed to switch
             * this task into the new images's memory mapping
             * policy - that is, TASK_SIZE must still evaluate to
             * that which is appropriate to the execing application.
             * This is because exit_mmap() needs to have TASK_SIZE
             * evaluate to the size of the old image.
             *
             * So if (say) a 64-bit application is execing a 32-bit
             * application it is the architecture's responsibility
             * to defer changing the value of TASK_SIZE until the
             * switch really is going to happen - do this in
             * flush_thread().
             *      - akpm
             */
            SET_PERSONALITY(loc->elf_ex);
            /* Open the linker file; returns its file handle */
            interpreter = open_exec(elf_interpreter);
            retval = PTR_ERR(interpreter);
            if (IS_ERR(interpreter))
                goto out_free_interp;

            /*
             * If the binary is not readable then enforce
             * mm->dumpable = 0 regardless of the interpreter's
             * permissions.
             */
            if (file_permission(interpreter, MAY_READ) < 0)
                bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
            /* Read in the linker's headers */
            retval = kernel_read(interpreter, 0, bprm->buf,
                         BINPRM_BUF_SIZE);
            if (retval != BINPRM_BUF_SIZE) {
                if (retval >= 0)
                    retval = -EIO;
                goto out_free_dentry;
            }

            /* Get the exec headers */
            loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
            break;
        }
        elf_ppnt++;
    }

    /*
     * Second pass: the PT_GNU_STACK segment
     */
    elf_ppnt = elf_phdata;
    for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
        if (elf_ppnt->p_type == PT_GNU_STACK) {
            /* As the code shows, this segment only carries a flag;
             * it has no actual segment data
             */
            if (elf_ppnt->p_flags & PF_X)
                executable_stack = EXSTACK_ENABLE_X;
            else
                executable_stack = EXSTACK_DISABLE_X;
            break;
        }

    /* Check that the linker's ELF magic and target platform are valid */
    /* Some simple consistency checks for the interpreter */
    if (elf_interpreter) {
        retval = -ELIBBAD;
        /* Not an ELF interpreter */
        if (memcmp(loc->interp_elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
            goto out_free_dentry;
        /* Verify the interpreter has a valid arch */
        if (!elf_check_arch(&loc->interp_elf_ex))
            goto out_free_dentry;
    } else {
        /* Executables without an interpreter also need a personality */
        SET_PERSONALITY(loc->elf_ex);
    }

    /* Flush all traces of the currently running executable */
    retval = flush_old_exec(bprm);
    if (retval)
        goto out_free_dentry;

    /* OK, This is the point of no return */
    current->flags &= ~PF_FORKNOEXEC;
    current->mm->def_flags = def_flags;

    /* Do this immediately, since STACK_TOP as used in setup_arg_pages
       may depend on the personality.
     */
    SET_PERSONALITY(loc->elf_ex);
    if (elf_read_implies_exec(loc->elf_ex, executable_stack))
        current->personality |= READ_IMPLIES_EXEC;

    if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
        current->flags |= PF_RANDOMIZE;
    arch_pick_mmap_layout(current->mm);

    /* Do this so that we can load the interpreter, if need be.  We will
       change some of these later */
    current->mm->free_area_cache = current->mm->mmap_base;
    current->mm->cached_hole_size = 0;
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                 executable_stack);
    if (retval < 0) {
        send_sig(SIGKILL, current, 0);
        goto out_free_dentry;
    }

    current->mm->start_stack = bprm->p;

    /*
     * Third pass: the PT_LOAD segments
     */
    /* Now we do a little grungy work by mmaping the ELF image into
       the correct location in memory. */
    for(i = 0, elf_ppnt = elf_phdata;
        i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
        int elf_prot = 0, elf_flags;
        unsigned long k, vaddr;

        if (elf_ppnt->p_type != PT_LOAD)
            continue;

        if (unlikely (elf_brk > elf_bss)) {
            unsigned long nbyte;

            /* There was a PT_LOAD segment with p_memsz > p_filesz
               before this one. Map anonymous pages, if needed,
               and clear the area.  */
            retval = set_brk (elf_bss + load_bias,
                      elf_brk + load_bias);
            if (retval) {
                send_sig(SIGKILL, current, 0);
                goto out_free_dentry;
            }
            nbyte = ELF_PAGEOFFSET(elf_bss);
            if (nbyte) {
                nbyte = ELF_MIN_ALIGN - nbyte;
                if (nbyte > elf_brk - elf_bss)
                    nbyte = elf_brk - elf_bss;
                if (clear_user((void __user *)elf_bss +
                            load_bias, nbyte)) {
                    /*
                     * This bss-zeroing can fail if the ELF
                     * file specifies odd protections.
                     * So we don't check the return value
                     */
                }
            }
        }

        if (elf_ppnt->p_flags & PF_R)
            elf_prot |= PROT_READ;
        if (elf_ppnt->p_flags & PF_W)
            elf_prot |= PROT_WRITE;
        if (elf_ppnt->p_flags & PF_X)
            elf_prot |= PROT_EXEC;

        elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;

        vaddr = elf_ppnt->p_vaddr;
        if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
            /* Not dynamically positioned: it must be mapped at the
             * expected range, hence the MAP_FIXED flag
             */
            elf_flags |= MAP_FIXED;
        } else if (loc->elf_ex.e_type == ET_DYN) {
            /* Try and get dynamic programs out of the way of the
             * default mmap base, as well as whatever program they
             * might try to exec.  This is because the brk will
             * follow the loader, and is not movable.  */
#ifdef CONFIG_X86
            load_bias = 0;
#else
            load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
        }
        /* Key code:
         * map the corresponding segment contents of the file at vaddr
         */
        error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                elf_prot, elf_flags, 0);
        if (BAD_ADDR(error)) {
            send_sig(SIGKILL, current, 0);
            retval = IS_ERR((void *)error) ?
                PTR_ERR((void*)error) : -EINVAL;
            goto out_free_dentry;
        }
        /* This block runs only the first time through */
        if (!load_addr_set) {
            load_addr_set = 1;
            load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
            if (loc->elf_ex.e_type == ET_DYN) {
                load_bias += error -
                             ELF_PAGESTART(load_bias + vaddr);
                load_addr += load_bias;
                reloc_func_desc = load_bias;
            }
        }
        k = elf_ppnt->p_vaddr;
        if (k < start_code)
            start_code = k;
        if (start_data < k)
            start_data = k;

        /*
         * Check to see if the section's size will overflow the
         * allowed task size. Note that p_filesz must always be
         * <= p_memsz so it is only necessary to check p_memsz.
         */
        if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
            elf_ppnt->p_memsz > TASK_SIZE ||
            TASK_SIZE - elf_ppnt->p_memsz < k) {
            /* set_brk can never work. Avoid overflows.
             */
            send_sig(SIGKILL, current, 0);
            retval = -EINVAL;
            goto out_free_dentry;
        }

        k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;

        if (k > elf_bss)
            elf_bss = k;
        if ((elf_ppnt->p_flags & PF_X) && end_code < k)
            end_code = k;
        if (end_data < k)
            end_data = k;
        k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
        if (k > elf_brk)
            elf_brk = k;
    } /* end of for PT_LOAD */

    /* All the work on PT_LOAD yields the values below, plus
     * the memory segments that have now been mapped
     */
    loc->elf_ex.e_entry += load_bias;
    elf_bss += load_bias;
    elf_brk += load_bias;
    start_code += load_bias;
    end_code += load_bias;
    start_data += load_bias;
    end_data += load_bias;

    /* Calling set_brk effectively mmaps the pages that we need
     * for the bss and break sections.  We must do this before
     * mapping in the interpreter, to make sure it doesn't wind
     * up getting placed where the bss needs to go.
     */
    retval = set_brk(elf_bss, elf_brk);
    if (retval) {
        send_sig(SIGKILL, current, 0);
        goto out_free_dentry;
    }
    if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
        send_sig(SIGSEGV, current, 0);
        retval = -EFAULT; /* Nobody gets to see this, but.. */
        goto out_free_dentry;
    }
    /* Load the linker into memory and record its entry address */
    if (elf_interpreter) {
        unsigned long uninitialized_var(interp_map_addr);

        elf_entry = load_elf_interp(&loc->interp_elf_ex,
                        interpreter,
                        &interp_map_addr,
                        load_bias);
        if (!IS_ERR((void *)elf_entry)) {
            /*
             * load_elf_interp() returns relocation
             * adjustment
             */
            interp_load_addr = elf_entry;
            elf_entry += loc->interp_elf_ex.e_entry;
        }
        if (BAD_ADDR(elf_entry)) {
            force_sig(SIGSEGV, current);
            retval = IS_ERR((void *)elf_entry) ?
                    (int)elf_entry : -EINVAL;
            goto out_free_dentry;
        }
        reloc_func_desc = interp_load_addr;

        allow_write_access(interpreter);
        fput(interpreter);
        kfree(elf_interpreter);
    } else {
        elf_entry = loc->elf_ex.e_entry;
        if (BAD_ADDR(elf_entry)) {
            force_sig(SIGSEGV, current);
            retval = -EINVAL;
            goto out_free_dentry;
        }
    }

    kfree(elf_phdata);

    sys_close(elf_exec_fileno);

    set_binfmt(&elf_format);

#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
    retval = arch_setup_additional_pages(bprm, executable_stack);
    if (retval < 0) {
        send_sig(SIGKILL, current, 0);
        goto out;
    }
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */

    compute_creds(bprm);
    current->flags &= ~PF_FORKNOEXEC;
    /* This function does a lot and deserves careful analysis!
     * bprm->p is modified here.
     */
    retval = create_elf_tables(bprm, &loc->elf_ex,
              load_addr, interp_load_addr);
    if (retval < 0) {
        send_sig(SIGKILL, current, 0);
        goto out;
    }
    /* N.B. passed_fileno might not be initialized? */
    current->mm->end_code = end_code;
    current->mm->start_code = start_code;
    current->mm->start_data = start_data;
    current->mm->end_data = end_data;
    current->mm->start_stack = bprm->p;

#ifdef arch_randomize_brk
    if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1))
        current->mm->brk = current->mm->start_brk =
            arch_randomize_brk(current->mm);
#endif

    if (current->personality & MMAP_PAGE_ZERO) {
        /* Why this, you ask???  Well SVr4 maps page 0 as read-only,
           and some applications "depend" upon this behavior.
           Since we do not have the power to recompile these, we
           emulate the SVr4 behavior. Sigh. */
        down_write(&current->mm->mmap_sem);
        error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
                MAP_FIXED | MAP_PRIVATE, 0);
        up_write(&current->mm->mmap_sem);
    }

#ifdef ELF_PLAT_INIT
    /*
     * The ABI may specify that certain registers be set up in special
     * ways (on i386 %edx is the address of a DT_FINI function, for
     * example.  In addition, it may also specify (eg, PowerPC64 ELF)
     * that the e_entry field is the address of the function descriptor
     * for the startup routine, rather than the address of the startup
     * routine itself.
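     * This macro performs whatever initialization to
     * the regs structure is required as well as any relocations to the
     * function descriptor entries when executing dynamically links apps.
     */
    ELF_PLAT_INIT(regs, reloc_func_desc);
#endif
    /* start_thread does not live up to its name; prepare_user_thread() would fit better.
     * It stores elf_entry and user_stack into regs,
     * getting everything ready for the later start. The user-mode program really
     * starts at the moment sys_execve() returns to user mode!
     * http://lkml.indiana.edu/hypermail/linux/kernel/0105.2/0910.html
     */
    start_thread(regs, elf_entry, bprm->p);
    retval = 0;
out:
    kfree(loc);
out_ret:
    return retval;

    /* error cleanup */
out_free_dentry:
    allow_write_access(interpreter);
    if (interpreter)
        fput(interpreter);
out_free_interp:
    kfree(elf_interpreter);
out_free_file:
    sys_close(elf_exec_fileno);
out_free_ph:
    kfree(elf_phdata);
    goto out;
}

/*
How exactly are ip and sp switched? There is a trick involved!

sys_execve->do_execve->search_binary_handler->load_binary->load_elf_binary->(code above)

First, get the following questions straight:
1. During a system call, how are the user arguments and the user stack managed? Where are they saved?

Start with what the stack looks like on entry to the kernel:
( in the famous 8K space )

struct pt_regs {
    unsigned long bx;       /* pushed by SAVE_ALL after entering the kernel */      low address
    unsigned long cx;       /* pushed by SAVE_ALL after entering the kernel */
    unsigned long dx;       /* pushed by SAVE_ALL after entering the kernel */
    unsigned long si;       /* pushed by SAVE_ALL after entering the kernel */
    unsigned long di;       /* pushed by SAVE_ALL after entering the kernel */          ^
    unsigned long bp;       /* pushed by SAVE_ALL after entering the kernel */          ^
    unsigned long ax;       /* pushed by SAVE_ALL after entering the kernel */          ^
    unsigned long ds;       /* pushed by SAVE_ALL after entering the kernel */          ^
    unsigned long es;       /* pushed by SAVE_ALL after entering the kernel */          ^
    unsigned long fs;       /* pushed by SAVE_ALL after entering the kernel */          ^
    /* int  gs; */
    unsigned long orig_ax;  /* pushed by "push eax" after entering the kernel */
    unsigned long ip;       /* pushed automatically on the trap into the kernel */
    unsigned long cs;       /* pushed automatically on the trap into the kernel */
    unsigned long flags;    /* pushed automatically on the trap into the kernel */
    unsigned long sp;       /* pushed automatically on the trap into the kernel */
    unsigned long ss;       /* pushed automatically on the trap into the kernel */  high address
};

NOTE: the lower a field appears in the struct, the earlier it was pushed onto the stack.

Below is the 2.6 kernel code that runs right after switching to the kernel stack.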
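    # system call handler stub
ENTRY(system_call)
    RING0_INT_FRAME            # can't unwind into user space anyway
    pushl %eax                 # save orig_eax
    CFI_ADJUST_CFA_OFFSET 4
    # cld instruction
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
    # at this point esp points at the top of the pt_regs frame
    # (its lowest address, since the stack grows downward)
syscall_call:
    call *sys_call_table(,%eax,4)
    # the call target is sys_call_table+eax*4; eax holds the system call number,
    # so the target is the handler's entry point. The call pushes ip once more,
    # and esp then points at that return address.
    # Inside the service routine, pt_regs can therefore be reached through esp.
    movl %eax,PT_EAX(%esp)     # store the return value
syscall_exit:
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)   # make sure we don't miss an interrupt
                                   # setting need_resched or sigpending
                                   # between sampling and the iret
    ......

On a 64-bit machine, sys_execve disassembles as follows:
(objdump -d /lib/modules/2.6.18.8/source/arch/x86_64/kernel/process.o)

0000000000000000 <sys_execve>:
   0:   48 83 ec 28             sub    $0x28,%rsp          # make room for local variables on the stack
   4:   48 89 5c 24 08          mov    %rbx,0x8(%rsp)      # save some registers on the temporary stack
   9:   48 89 6c 24 10          mov    %rbp,0x10(%rsp)
   e:   48 89 d5                mov    %rdx,%rbp           # Why rdx? no idea.
  11:   4c 89 64 24 18          mov    %r12,0x18(%rsp)
  16:   4c 89 6c 24 20          mov    %r13,0x20(%rsp)

Let's not dwell on this... it is messy! The one thing to take away is that pt_regs has what you need.

As for how a Linux user process passes arguments to the system call interrupt handler,
Linux passes them in general-purpose registers, for example ebx, ecx and edx.
An obvious advantage of passing arguments in registers is this:
when the interrupt service routine is entered and the register values are saved,
the argument registers are automatically placed on the kernel-mode stack as well,
so no special handling of the argument registers is needed.

2. How does this cooperate with execve?
With the help of pt_regs, ip and esp can be set. For system calls such as execve,
replacing ip and esp is what achieves the sleight of hand of returning into a different program.

3. How does user mode get arguments onto the kernel stack?
As an example, when write is called in user mode:

write:
    pushl %ebx
    movl 8(%esp), %ebx     ; Linux's _syscall3 expands to this here,
    movl 12(%esp), %ecx    ; which is what makes register argument passing work
    movl 16(%esp), %edx    ; clearly, this process does not depend on the compiler
    movl $4, %eax
    int $0x80
    ....

Read more from this perfect online book-store:
http://my.safaribooksonline.com/0-596-00002-2/ch08-10-fm2xml
Chapter 8. System Calls > Anticipating Linux 2.4 - Pg. 241
*/

/*
 * sys_execve() executes a new program.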
 */
asmlinkage int sys_execve(struct pt_regs regs)
{
    int error;
    char * filename;

    filename = getname((char __user *) regs.bx);
    error = PTR_ERR(filename);
    if (IS_ERR(filename))
        goto out;
    error = do_execve(filename,
            (char __user * __user *) regs.cx,
            (char __user * __user *) regs.dx,
            &regs);
    if (error == 0) {
        /* Make sure we don't return using sysenter.. */
        set_thread_flag(TIF_IRET);
    }
    putname(filename);
out:
    return error;
}
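The handler above reads its three arguments back out of the saved registers (regs.bx, regs.cx, regs.dx on i386). From user space the same three arguments can be handed over with a raw system call, bypassing the libc wrapper. The sketch below is a minimal illustration under stated assumptions: exec_and_wait is a made-up helper name, it relies on the generic syscall() function from <unistd.h>, and the registers actually used depend on the architecture the program is built for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` in a child process through a raw execve
 * system call and return the child's exit status, or -1 on failure. */
int exec_and_wait(const char *path, char *const argv[], char *const envp[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* filename, argv and envp travel to the kernel in registers;
         * on success this call never returns, because the kernel has
         * replaced the saved ip and sp with those of the new program. */
        syscall(SYS_execve, path, argv, envp);
        _exit(127);   /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running it with path "/bin/sh" and argv { "sh", "-c", "exit 7", NULL } should report the shell's exit status; a path that does not exist makes execve fail, so the child falls through to _exit(127).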
