MachO文件格式

目录

0x1 前言

0x2 例子程序

0x3 MachO文件格式分析 

0xx1 Mach Header-可执行文件头

0xx2 Load Commands-加载命令

segment_command结构体定义:

dyld_info_command结构体

LC_SYMTAB结构体

 LC_LOAD_DYLINKER结构体

LC_UUID结构体 

 LC_LOAD_DYLIB结构体

 LC_RPATH结构体

0xx3 DATA(Section64)        

0xx4 Dynamic Loader Info

0xx5 Function Starts、Symbol Table、Dynamic Symbol Table、String Table、Code Signature

可执行文件运行过程


0x1 前言

刚刚接触IOS逆向,要从最基础的开始学起,看了一些视频之后,发现MachO文件挺重要的,HOOK的时候会用到,当然不止hook,理解他的文件格式显得很重要,今天就自己总结一下。

0x2 例子程序

例子程序就拿我写好的一个fishhook的好了,代码主要代码和工程目录如下:

//
//  ViewController.m
//  fishhook
//
//  Created by yrl on 2019/7/26.
//  Copyright © 2019 apple. All rights reserved.
//

#import "ViewController.h"
#import "fishhook.h"

@interface ViewController ()

@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    NSLog(@"123");//NSLog属于懒加载,调用才能看到
    //实现交换
    //定义rebinding结构体
    struct rebinding nslog;
    nslog.name = "NSLog";
    nslog.replacement = myNSLog;
    nslog.replaced = (void **)&sys_nslog;
    //定义结构体数组
    struct rebinding rebs[1] = {nslog};
    /* 用来重新绑定符号
     * arg1 存放rebinding结构体的数组
     * arg2 数组长度
     */
    rebind_symbols(rebs, 1);//两个参数 结构体数组、数组长度
    printf("修改完毕!!");
}
//-------------更改系统NSLOG函数-------------
//函数指针,用来保存原来的函数地址
static void(*sys_nslog)(NSString *format,...);
//定义一个新的函数
void myNSLog(NSString *format, ...){
    format = [format stringByAppendingString:@"\n勾上了!!"];
    sys_nslog(format);
}
-(void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{//屏幕触发事件
    NSLog(@"点击屏幕!!");
}

@end

0x3 MachO文件格式分析 

å¾å说ææå­

可以看出,Mach-O主要由以下3部分组成。

 Mach-O头部(mach header)。描述了Mach-O的CPU架构、文件类型以及加载命令等信息。

 加载命令(load command)。描述了文件中数据的具体组织结构,不同的数据类型使用不同的加载命令表示。

 Data。Data中每个段(segment)的数据都保存在这里,段的概念与ELF文件中段的概念类似。每个段都有一个或多个Section,它们存放了具体的数据与代码。

这里结束可视化工具,烂苹果(MachOview)来查看其文件格式,也可以用苹果自带的otool工具查看,其内容是一样的,不过MachO会显示的人性化一点。如下图:

两种方法得到的结果是一样的,不过个人觉得MachO会更直观和全面。

MachO文件格式,有这几个部分(下面的每一个部分都是代表一个结构体):

mach64 Header文件头
Load Commands加载命令
__TEXT文本(代码)段
__DATA数据段
Dynamic Loader Info动态加载信息
Function Starts入口函数
Symbol Table符号表
Dynamic Symbol Table动态库符号表
String Table

字符串表

Code Signature代码签名

0xx1 Mach Header-可执行文件头

用mac下的工具otool查看macho文件头信息:otool -h fishhook

Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x00           2    22       3096 0x00200085

上面是macho标准的文件头格式,对应的数据结构如下:

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC    0xfeedface  /* the mach magic number */
#define MH_CIGAM    0xcefaedfe  /* NXSwapInt(MH_MAGIC) */

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

 字段解释和举例解释如下:

字段说明例子(举例)
magic魔数头,系统加载器通过该字段快速判断文件适用于32位还是64位。0xfeedface是32位,0xfeedfacf是64位。例子中是0xfeedfacf为64位文件
cputypeCPU类型,该字段确保系统可以将合适的二进制文件在当前架构下运行例子中的值为

16777223(0x100007)为CPU_TYPE_X86_64 

cpusubtype

CPU指定子类型,inter、arm、powerpc等,详细描述其支持的CPU子类型

#define CPU_SUBTYPE_MASK    0xff000000    

#define CPU_SUBTYPE_LIB64    0x80000000

#define CPU_SUBTYPE_X86_ALL        ((cpu_subtype_t)3)

#define CPU_SUBTYPE_X86_64_ALL        ((cpu_subtype_t)3)

#define CPU_SUBTYPE_X86_ARCH1        ((cpu_subtype_t)4)

#define CPU_SUBTYPE_X86_64_H        ((cpu_subtype_t)8)

 

例子中的为3,定义为CPU_SUBTYPE_X86_64_ALL 
filetype

说明文件类型(可执行文件、库文件、核心转储文件、内河扩展文件、DYSM文件、动态库等)

#define    MH_OBJECT    0x1    编译过程中产生的  obj文件 (gcc -c xxx.c 生成xxx.o文件)

#define    MH_EXECUTE    0x2        可执行二进制文件 (/usr/bin/ls)

#define    MH_FVMLIB    0x3        

#define    MH_CORE        0x4        CoreDump (崩溃时的Dump文件)

#define    MH_PRELOAD    0x5        

#define    MH_DYLIB    0x6        动态库(/usr/lib/里面的那些共享库文件)

#define    MH_DYLINKER    0x7       连接器linker(/usr/lib/dyld文件)

#define    MH_BUNDLE    0x8        

#define    MH_DYLIB_STUB    0x9        

#define    MH_DSYM        0xa       

#define    MH_KEXT_BUNDLE    0xb   内核扩展文件 (自己开发的简单内核模块

#define	MH_OBJECT	0x1	/*relocatable object file */
#define	MH_EXECUTE	0x2	/*demand paged executable file */
#define	MH_FVMLIB	0x3	/*fixed VM shared library file */
#define	MH_CORE		0x4	/*core file */
#define	MH_PRELOAD	0x5	/*preloaded executable file */
#define	MH_DYLIB	0x6	/*dynamically bound shared library */
#define	MH_DYLINKER	0x7	/*dynamic link editor */
#define	MH_BUNDLE	0x8	/*dynamically bound bundle file */
#define	MH_DYLIB_STUB	0x9	/*shared library stub for static */
/*linking only, no section contents */
#define	MH_DSYM		0xa	/*companion file with only debug */
/*  sections */
#define	MH_KEXT_BUNDLE	0xb	/*x86_64 kexts */
例子中值为2MH-EXECUTE,二进制可执行文件
ncmds说明加载命令的条数例子中为22,说明有22条加载命令,对应Load Commands中的22项
sizeofcmds表示加载命令的大小例子中为3096字节
flags标志位,表示二进制文件支持的功能,主要是和系统加载,链接相关

 #define    MH_NOUNDEFS    0x1

#define    MH_INCRLINK    0x2

#define    MH_DYLDLINK    0x4

#define    MH_LAZY_INIT 0x40

#define    MH_TWOLEVEL    0x80

#define    MH_PIE 0x200000

例子中为0x200085,

MH-NOUNDEFS表示:目标没有未定义的符号,不存在链接依赖

MH-DYLDLINK表示:该目标文件是dyld的输入文件,无法被再次的静态链接

MH-TWOLEVEL表示:动态加载二级名称空间

MH-PIE表示:地址空间布局随机化

reserved64位的保留字段详细请看loader.h

系统先解释头文件、获得文件支持位数(32 、64),获得CPU类型,获得文件类型,加载命令条数和大小,获得文件标识。

0xx2 Load Commands-加载命令

在mach_header之后是Load Command加载命令,这些加载命令在Mach-O文件加载解析时,会被内核加载器或者动态链接器调用,基本的加载命令的数据结构如下:

struct load_command {
    uint32_t cmd;        /* type of load command */
    uint32_t cmdsize;    /* total size of command in bytes */
};

此结构对应的成员只有两个:cmd字段代表当前加载命令的类型,cmdsize字段代表当前加载命令的大小。

cmd的类型不同,所代表的加载命令的类型就不同,它的结构体也会有所不同。不同类型的加载命令会在load_command结构体后面加上一个或多个字段来表示特定的结构体信息。

在macOS系统进化的过程中,加载命令是更新比较频繁的一个数据结构体。加载命令的类型cmd的取值共有0x22种,它们的定义如下:

#define	LC_SEGMENT	0x1	/* segment of this file to be mapped */
#define	LC_SYMTAB	0x2	/* link-edit stab symbol table info */
#define	LC_SYMSEG	0x3	/* link-edit gdb symbol table info (obsolete) */
#define	LC_THREAD	0x4	/* thread */
#define	LC_UNIXTHREAD	0x5	/* unix thread (includes a stack) */
#define	LC_LOADFVMLIB	0x6	/* load a specified fixed VM shared library */
#define	LC_IDFVMLIB	0x7	/* fixed VM shared library identification */
#define	LC_IDENT	0x8	/* object identification info (obsolete) */
#define LC_FVMFILE	0x9	/* fixed VM file inclusion (internal use) */
#define LC_PREPAGE      0xa     /* prepage command (internal use) */
#define	LC_DYSYMTAB	0xb	/* dynamic link-edit symbol table info */
#define	LC_LOAD_DYLIB	0xc	/* load a dynamically linked shared library */
#define	LC_ID_DYLIB	0xd	/* dynamically linked shared lib ident */
#define LC_LOAD_DYLINKER 0xe	/* load a dynamic linker */
#define LC_ID_DYLINKER	0xf	/* dynamic linker identification */
#define	LC_PREBOUND_DYLIB 0x10	/* modules prebound for a dynamically */
				/*  linked shared library */
#define	LC_ROUTINES	0x11	/* image routines */
#define	LC_SUB_FRAMEWORK 0x12	/* sub framework */
#define	LC_SUB_UMBRELLA 0x13	/* sub umbrella */
#define	LC_SUB_CLIENT	0x14	/* sub client */
#define	LC_SUB_LIBRARY  0x15	/* sub library */
#define	LC_TWOLEVEL_HINTS 0x16	/* two-level namespace lookup hints */
#define	LC_PREBIND_CKSUM  0x17	/* prebind checksum */

/*
 * load a dynamically linked shared library that is allowed to be missing
 * (all symbols are weak imported).
 */
#define	LC_LOAD_WEAK_DYLIB (0x18 | LC_REQ_DYLD)

#define	LC_SEGMENT_64	0x19	/* 64-bit segment of this file to be
				   mapped */
#define	LC_ROUTINES_64	0x1a	/* 64-bit image routines */
#define LC_UUID		0x1b	/* the uuid */
#define LC_RPATH       (0x1c | LC_REQ_DYLD)    /* runpath additions */
#define LC_CODE_SIGNATURE 0x1d	/* local of code signature */
#define LC_SEGMENT_SPLIT_INFO 0x1e /* local of info to split segments */
#define LC_REEXPORT_DYLIB (0x1f | LC_REQ_DYLD) /* load and re-export dylib */
#define	LC_LAZY_LOAD_DYLIB 0x20	/* delay load of dylib until first use */
#define	LC_ENCRYPTION_INFO 0x21	/* encrypted segment information */
#define	LC_DYLD_INFO 	0x22	/* compressed dyld information */
#define	LC_DYLD_INFO_ONLY (0x22|LC_REQ_DYLD)	/* compressed dyld information only */

所有这些加载命令由系统内核加载器直接使用,或由动态链接器处理。其中几个常见的加载命令为LC_SEGMENT、LC_LOAD_DYLINKER、LC_LOAD_DYLIB、LC_MAIN、LC_CODE_SIGNATURE、LC_ENCRYPTION_INFO等。下面进行分析:

segment_command结构体定义:

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space.  If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

 例子:

含义
_PAGEZERO空指针陷阱段,映射到虚拟内存空间第一页,捕捉对NULL指针的引用
_TEXT代码段、只读数据段
_DATA读取和写入数据段
_LINKEDITdyld需要使用的信息,包括重定位、绑定、懒加载信息等

 

dyld_info_command结构体

/*
 * The dyld_info_command contains the file offsets and sizes of 
 * the new compressed form of the information dyld needs to 
 * load the image.  This information is used by dyld on Mac OS X
 * 10.6 and later.  All information pointed to by this command
 * is encoded using byte streams, so no endian swapping is needed
 * to interpret it. 
 */
struct dyld_info_command {
   uint32_t   cmd;		/* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
   uint32_t   cmdsize;		/* sizeof(struct dyld_info_command) */

    /*
     * Dyld rebases an image whenever dyld loads it at an address different
     * from its preferred address.  The rebase information is a stream
     * of byte sized opcodes whose symbolic names start with REBASE_OPCODE_.
     * Conceptually the rebase information is a table of tuples:
     *    <seg-index, seg-offset, type>
     * The opcodes are a compressed way to encode the table by only
     * encoding when a column changes.  In addition simple patterns
     * like "every n'th offset for m times" can be encoded in a few
     * bytes.
     */
    uint32_t   rebase_off;	/* file offset to rebase info  */
    uint32_t   rebase_size;	/* size of rebase info   */
    
    /*
     * Dyld binds an image during the loading process, if the image
     * requires any pointers to be initialized to symbols in other images.  
     * The rebase information is a stream of byte sized 
     * opcodes whose symbolic names start with BIND_OPCODE_.
     * Conceptually the bind information is a table of tuples:
     *    <seg-index, seg-offset, type, symbol-library-ordinal, symbol-name, addend>
     * The opcodes are a compressed way to encode the table by only
     * encoding when a column changes.  In addition simple patterns
     * like for runs of pointers initialzed to the same value can be 
     * encoded in a few bytes.
     */
    uint32_t   bind_off;	/* file offset to binding info   */
    uint32_t   bind_size;	/* size of binding info  */
        
    /*
     * Some C++ programs require dyld to unique symbols so that all
     * images in the process use the same copy of some code/data.
     * This step is done after binding. The content of the weak_bind
     * info is an opcode stream like the bind_info.  But it is sorted
     * alphabetically by symbol name.  This enable dyld to walk 
     * all images with weak binding information in order and look
     * for collisions.  If there are no collisions, dyld does
     * no updating.  That means that some fixups are also encoded
     * in the bind_info.  For instance, all calls to "operator new"
     * are first bound to libstdc++.dylib using the information
     * in bind_info.  Then if some image overrides operator new
     * that is detected when the weak_bind information is processed
     * and the call to operator new is then rebound.
     */
    uint32_t   weak_bind_off;	/* file offset to weak binding info   */
    uint32_t   weak_bind_size;  /* size of weak binding info  */
    
    /*
     * Some uses of external symbols do not need to be bound immediately.
     * Instead they can be lazily bound on first use.  The lazy_bind
     * are contains a stream of BIND opcodes to bind all lazy symbols.
     * Normal use is that dyld ignores the lazy_bind section when
     * loading an image.  Instead the static linker arranged for the
     * lazy pointer to initially point to a helper function which 
     * pushes the offset into the lazy_bind area for the symbol
     * needing to be bound, then jumps to dyld which simply adds
     * the offset to lazy_bind_off to get the information on what 
     * to bind.  
     */
    uint32_t   lazy_bind_off;	/* file offset to lazy binding info */
    uint32_t   lazy_bind_size;  /* size of lazy binding infs */
    
    /*
     * The symbols exported by a dylib are encoded in a trie.  This
     * is a compact representation that factors out common prefixes.
     * It also reduces LINKEDIT pages in RAM because it encodes all  
     * information (name, address, flags) in one small, contiguous range.
     * The export area is a stream of nodes.  The first node sequentially
     * is the start node for the trie.  
     *
     * Nodes for a symbol start with a byte that is the length of
     * the exported symbol information for the string so far.
     * If there is no exported symbol, the byte is zero. If there
     * is exported info, it follows the length byte.  The exported
     * info normally consists of a flags and offset both encoded
     * in uleb128.  The offset is location of the content named
     * by the symbol.  It is the offset from the mach_header for
     * the image.  
     *
     * After the initial byte and optional exported symbol information
     * is a byte of how many edges (0-255) that this node has leaving
     * it, followed by each edge.
     * Each edge is a zero terminated cstring of the addition chars
     * in the symbol, followed by a uleb128 offset for the node that
     * edge points to.
     *  
     */
    uint32_t   export_off;	/* file offset to lazy binding info */
    uint32_t   export_size;	/* size of lazy binding infs */
};

例子:

 

dymanic load info

说明

举例

重定向数据 rebase

demo中该段数据位 11 22 10 51

11: 高位0x10表示设置立即数类型,低位0x01表示立即数类型为指针 

 

22: 表示REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB + 2 重定向到数据段2。结合上面的信息,就是重定向到数据段2,该段数据信息为一个指针

 

结合数据段2的数据,获得一个重定向符号指针[0x100001924 -> _NSLog]

绑定数据 bind

在demo中进行动态绑定依赖的dyld的函数

U dyld_stub_binder

弱绑定数据 weak bind

用于弱绑定动态库,就像weak_framework一样

 

懒绑定数据 lazy bind

对于需要从动态库加载的函数符号

demo中有一个:

_NSLog

export数据

用于对外开放的函数

demo中有:

0x10000442f   __mh_execute_header

0x100004453  myNSLog

0x10000445b main

LC_SYMTAB结构体

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

例子:

 

 LC_LOAD_DYLINKER结构体

/*
 * A program that uses a dynamic linker contains a dylinker_command to identify
 * the name of the dynamic linker (LC_LOAD_DYLINKER).  And a dynamic linker
 * contains a dylinker_command to identify the dynamic linker (LC_ID_DYLINKER).
 * A file can have at most one of these.
 */
struct dylinker_command {
	uint32_t	cmd;		/* LC_ID_DYLINKER or LC_LOAD_DYLINKER */
	uint32_t	cmdsize;	/* includes pathname string */
	union lc_str    name;		/* dynamic linker's path name */
};

例子:

 

LC_UUID结构体 

/*
 * The uuid load command contains a single 128-bit unique random number that
 * identifies an object produced by the static link editor.
 */
struct uuid_command {
    uint32_t	cmd;		/* LC_UUID */
    uint32_t	cmdsize;	/* sizeof(struct uuid_command) */
    uint8_t	uuid[16];	/* the 128-bit uuid */
};

例子:

 LC_LOAD_DYLIB结构体

/*
 * Dynamicly linked shared libraries are identified by two things.  The
 * pathname (the name of the library as found for execution), and the
 * compatibility version number.  The pathname must match and the compatibility
 * number in the user of the library must be greater than or equal to the
 * library being used.  The time stamp is used to record the time a library was
 * built and copied into user so it can be use to determined if the library used
 * at runtime is exactly the same as used to built the program.
 */
struct dylib {
    union lc_str  name;			/* library's path name */
    uint32_t timestamp;			/* library's build time stamp */
    uint32_t current_version;		/* library's current version number */
    uint32_t compatibility_version;	/* library's compatibility vers number*/
};

/*
 * A dynamically linked shared library (filetype == MH_DYLIB in the mach header)
 * contains a dylib_command (cmd == LC_ID_DYLIB) to identify the library.
 * An object that uses a dynamically linked shared library also contains a
 * dylib_command (cmd == LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or
 * LC_REEXPORT_DYLIB) for each library it uses.
 */
struct dylib_command {
	uint32_t	cmd;		/* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB,
					   LC_REEXPORT_DYLIB */
	uint32_t	cmdsize;	/* includes pathname string */
	struct dylib	dylib;		/* the library identification */
};

例子:

 LC_RPATH结构体

/*
 * The rpath_command contains a path which at runtime should be added to
 * the current run path used to find @rpath prefixed dylibs.
 */
struct rpath_command {
    uint32_t	 cmd;		/* LC_RPATH */
    uint32_t	 cmdsize;	/* includes string */
    union lc_str path;		/* path to add to run path */
};

 例子:

Load Commands解析
segname cmd说明举例
LC_SEGMENT_64表示这是一个段加载命令,需要将它加载到对应的进程空间中 
LC_DYLD_INFO_ONLY动态库信息,根据该命令是真正动态库绑定,地址重定向重要的信息 
LC_SYMTAB符号表地址 
LC_DYSYMTAB动态符号表地址 
LC_LOAD_DYLINKER使用何种动态加载库,dyld的默认路径例子中用的是/usr/lib/dyld
LC_UUID文件的唯一标识,crash解析中也会有该只,去确定dysm文件和crash文件是匹配的 

LC_BUILD_VERSION

二进制文件要求的sdk的build版本

sdk 12.4

LC_SOURCE_VERSION构建该二进制文件使用的源代码版本例子中是 0.0

LC_MAIN

设置程序主线程的入口地址和栈大小例子中是入口地址:0x0000000000000EF0,栈大小:0
LC_LOAD_DYLIB加载额外的动态库,仔细看这个命令格式,动态库地址和名,当前版本号,兼容版本号,该设计比较合理,如果对于动态库有版本管理能力 
LC_RPATH

rpath_命令包含运行时应添加到的路径

用于查找前缀为@rpath的dylibs的当前运行路径

例子中为

@executable_path/Frameworks

LC_FUNCTION_STARTS函数起始地址表,指向了FUNCTION_STARTS的首地址例子中0x000044E0
LC_DATA_IN_CODE代码签名信息 

0xx3 DATA(Section64)        

Load Commands区域下来接着就是DATA区域,展开Load Commands下的LC_SEGMENT_64可以看到多个Section64,各个Section的具体信息可以在Load Commands紧接着的部分查看,它们是一一对应的:

section的数据结构如下:

/*
 * A segment is made up of zero or more sections.  Non-MH_OBJECT files have
 * all of their segments with the proper sections in each, and padded to the
 * specified segment alignment when produced by the link editor.  The first
 * segment of a MH_EXECUTE and MH_FVMLIB format file contains the mach_header
 * and load commands of the object file before its first section.  The zero
 * fill sections are always last in their segment (in all formats).  This
 * allows the zeroed segment padding to be mapped into memory where zero fill
 * sections might be. The gigabyte zero fill sections, those with the section
 * type S_GB_ZEROFILL, can only be in a segment with sections of this type.
 * These segments are then placed after all other segments.
 *
 * The MH_OBJECT format has all of its sections in one segment for
 * compactness.  There is no padding to a specified segment boundary and the
 * mach_header and load commands are not part of the segment.
 *
 * Sections with the same section name, sectname, going into the same segment,
 * segname, are combined by the link editor.  The resulting section is aligned
 * to the maximum alignment of the combined sections and is the new section's
 * alignment.  The combined sections are aligned to their original alignment in
 * the combined section.  Any padded bytes to get the specified alignment are
 * zeroed.
 *
 * The format of the relocation entries referenced by the reloff and nreloc
 * fields of the section structure for mach object files is described in the
 * header file <reloc.h>.
 */
struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

section节已经是最小的分类,大部分内容集中在__TEXT,__DATA这两段中,部分内容如下:

名称作用
TEXT.text只有可执行的机器码
TEXT.cstring去重后的C字符串
TEXT.const初始化过的常量
TEXT.stubs符号桩。本质上是一小段会直接跳入lazybinding的表对应项指针指向的地址的代码。
TEXT.stub_helper辅助函数。上述提到的lazybinding的表中对应项的指针在没有找到真正的符号地址的时候,都指向这。
TEXT.unwind_info用于存储处理异常情况信息
TEXT.eh_frame调试辅助信息
DATA.data初始化过的可变的数据
DATA.nl_symbol_ptr非lazy-binding的指针表,每个表项中的指针都指向一个在装载过程中,被动态链机器搜索完成的符号
DATA.la_symbol_ptrlazy-binding的指针表,每个表项中的指针一开始指向stub_helper
DATA.const没有初始化过的常量
DATA.mod_init_func初始化函数,在main之前调用
DATA.mod_term_func终止函数,在main返回之后调用
DATA.bss没有初始化的静态变量
DATA.common没有初始化过的符号声明

0xx4 Dynamic Loader Info

在:Loader Commands下的LC_DYLD_INFO_ONLY指向了它,分别有rebase info、Binding info、Weak Binding info、Lazy Binding info、Export Info。

几个重要的参数:

dymanic load info 
Rebase info重定向数据
Binding info进行动态绑定依赖的dyld的函数
Lazy Binding info需要从动态库加载的函数符号
Export info对外开放的函数

rebase修复的是指向当前镜像内部的资源指针(mach-o每次加载到内存中不是固定的地址)

binding就是将这个二进制调用的外部符号进行绑定的过程。 比如我们objc代码中需要使用到NSObject, 即符号OBJC_CLASS$_NSObject,但是这个符号又不在我们的二进制中,在系统库 Foundation.framework中,因此就需要binding这个操作将对应关系绑定到一起。

lazyBinding就是在加载动态库的时候不会立即binding, 当时当第一次调用这个方法的时候再实施binding。 做到的方法也很简单: 通过dyld_stub_binder 这个符号来做。 lazy binding的方法第一次会调用到dyld_stub_binder, 然后dyld_stub_binder负责找到真实的方法,并且将地址bind到桩上,下一次就不用再bind了。

weakBinding针对弱符号进行绑定的过程, 在c语言中,函数、初始化的全局变量、static变量是强符号,未初始化的全局变量是弱符号。 weakBinding这一步会把AllImages中所有含有弱符号的映像合并成一个列表,再进行binding. 所以这一步放在最后,等加载完成后在做操作。

Rebase Info表示重定位,数据是以命令码(命令码就是一个字节码)的形式,传递具体内容。高四位表示真正命令名,低四位表示一个立即数。00表示该类型命令结束.

定义的数据:

/*
 * The following are used to encode rebasing information
 */
#define REBASE_TYPE_POINTER					1
#define REBASE_TYPE_TEXT_ABSOLUTE32				2
#define REBASE_TYPE_TEXT_PCREL32				3

#define REBASE_OPCODE_MASK					0xF0
#define REBASE_IMMEDIATE_MASK					0x0F
#define REBASE_OPCODE_DONE					0x00
#define REBASE_OPCODE_SET_TYPE_IMM				0x10
#define REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB		0x20
#define REBASE_OPCODE_ADD_ADDR_ULEB				0x30
#define REBASE_OPCODE_ADD_ADDR_IMM_SCALED			0x40
#define REBASE_OPCODE_DO_REBASE_IMM_TIMES			0x50
#define REBASE_OPCODE_DO_REBASE_ULEB_TIMES			0x60
#define REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB			0x70
#define REBASE_OPCODE_DO_REBASE_ULEB_TIMES_SKIPPING_ULEB	0x80


/*
 * The following are used to encode binding information
 */
#define BIND_TYPE_POINTER					1
#define BIND_TYPE_TEXT_ABSOLUTE32				2
#define BIND_TYPE_TEXT_PCREL32					3

#define BIND_SPECIAL_DYLIB_SELF					 0
#define BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE			-1
#define BIND_SPECIAL_DYLIB_FLAT_LOOKUP				-2

#define BIND_SYMBOL_FLAGS_WEAK_IMPORT				0x1
#define BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION			0x8

#define BIND_OPCODE_MASK					0xF0
#define BIND_IMMEDIATE_MASK					0x0F
#define BIND_OPCODE_DONE					0x00
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM			0x10
#define BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB			0x20
#define BIND_OPCODE_SET_DYLIB_SPECIAL_IMM			0x30
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM		0x40
#define BIND_OPCODE_SET_TYPE_IMM				0x50
#define BIND_OPCODE_SET_ADDEND_SLEB				0x60
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB			0x70
#define BIND_OPCODE_ADD_ADDR_ULEB				0x80
#define BIND_OPCODE_DO_BIND					0x90
#define BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB			0xA0
#define BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED			0xB0
#define BIND_OPCODE_DO_BIND_ULEB_TIMES_SKIPPING_ULEB		0xC0


/*
 * The following are used on the flags byte of a terminal node
 * in the export information.
 */
#define EXPORT_SYMBOL_FLAGS_KIND_MASK				0x03
#define EXPORT_SYMBOL_FLAGS_KIND_REGULAR			0x00
#define EXPORT_SYMBOL_FLAGS_KIND_THREAD_LOCAL			0x01
#define EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION			0x04
#define EXPORT_SYMBOL_FLAGS_INDIRECT_DEFINITION			0x08
#define EXPORT_SYMBOL_FLAGS_HAS_SPECIALIZATIONS			0x10

如在这段中读到的第一个字节是0x11,那么这个命令掩码=0x10和立即数1

#define REBASE_OPCODE_SET_TYPE_IMM 0x10​ 说明当前命令是在选择重定位的数据的类型后面的立即数就是类型。

#define REBASE_TYPE_POINTER      1

#define REBASE_TYPE_TEXT_ABSOLUTE32      2

#define REBASE_TYPE_TEXT_PCREL32      3

​上面综合起来,就是设置当前的重定位类型为指针,这也是最多见的类型。

​这个重定位命令就结束了,再读取下一个字节,同样的方法判断命令掩码。

        如果是0x21,那么就是​REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB + 1。这个代表着定位重定位段到第1个段(通常是数据段),因为代码段只读,代码段的内容已经设计成位置无关(PIC position indepent code)​。后面还有一个ULEB类型的数据,代表在这个段中偏移多少。每当遇到DO_REBASE开头的命令,才会真正对上面一段关于重定位的信息来做重定位,具体怎么做我们不需要关心,编译器和链接器已经把哪些东西需要重定位写入了重定位表。

可以用dylyinfo获取这部分信息:xcrun dyldinfo -opcodes

rebase opcodes:
0x0000 REBASE_OPCODE_SET_TYPE_IMM(1)
0x0001 REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(2, 0x00000028)
0x0003 REBASE_OPCODE_DO_REBASE_ULEB_TIMES(19)
0x0005 REBASE_OPCODE_ADD_ADDR_IMM_SCALED(0x10)
0x0006 REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB(32)
0x0008 REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB(32)
0x000A REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB(16)
0x000C REBASE_OPCODE_DO_REBASE_IMM_TIMES(4)
0x000D REBASE_OPCODE_ADD_ADDR_IMM_SCALED(0x20)
0x000E REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB(56)
0x0010 REBASE_OPCODE_DO_REBASE_IMM_TIMES(6)
......

binding opcodes:
0x0000 BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(3)
0x0001 BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(0x00, _OBJC_METACLASS_$_NSObject)
0x001D BIND_OPCODE_SET_TYPE_IMM(1)
0x001E BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(0x02, 0x00000DA0)
0x0021 BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED(0x00000028)
0x0022 BIND_OPCODE_DO_BIND()
0x0023 BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(0x00, __objc_empty_cache)
0x0037 BIND_OPCODE_ADD_ADDR_ULEB(0xFFFFFFB8)
......

+ __stubs区和__stub_helper区是帮助动态链接器找到指定数据段__nl_symbol_ptr区,二进制文件用0x0000000000000000进行占位,在运行时,系统根据dynamic loader info信息,把占位符换为调用dylib的dyld_stub_binder函数的汇编指令。 

+ 当第一次调用完动态库中的符号后,动态链接器会根据dynamic loader info信息,把数据段__la_symbol_ptr指向正在的符号地址,而不是指向_nl_symbol_ptr区

0xx5 Function Starts、Symbol Table、Dynamic Symbol Table、String Table、Code Signature

Function Starts指明了函数的起始地址,描述(uleb128)、value的值

下面的按例子程序加载的流程说:

程序通过动态加载NSLog函数到DATA段的Lazy Symbol Pointers,(在加载之前vlaue值是不对的(站位的))加载后dyld将真实的NSLog的地址写入到Lazy Symbol Pointers中,这个表与Dynamic Symbol Table下的Indirect Symbols的表项一一对应,在Indirect Symbols表项的第一位找到NSLog函数的偏移0x79,然后在Symbol Table下的Symbols表的第0x79项找到NSLog函数,得到它在String Table中的索引(0x9b),然后在StringTable中找到NSLog字符串,在String Table定位:基址加偏移   0x4f1c+0x9b = 0x4fb7  ,在StringTable中找到对应位置0x4fb7

这就是这几个表之间的联系和作用,最后Code Signature是用来代码签名的文件。

可执行文件运行过程

根据上面总结。该段总结下可执行文件运行过程。 
我总结如下:

  • 解析mach-o文件
  • 设置运行环境参数 
    • 文本段VM映射参数
    • 加载命令
    • 动态库信息
    • 符号表地址信息
    • 动态符号表地址信息
    • 常亮字符串表地址信息
    • 动态库加载信息
    • 符号函数地址
    • 依赖动态库信息
    • 动态链接器地址信息
  • 根据动态库加载信息,把桩占位符,填写为指定调用_nl_symbol_ptr的汇编指令
  • 根据LC_MAIN的entry point调用指定entry offset偏移地址
  • 执行entry offset相关二进制(逻辑是按照汇编指令,进行运行)
  • 第一次运行到动态库函数时,进行一次懒加载动态绑定,并且动态链接器自动修改_la_symbol_ptr区的地址,指向动态库对应符号的地址
  • 第二次运行到动态库函数时,直接jmp到指定的符号地址

注意:系统很多动态库都是共有的,所以XOS做了共享库缓存优化,只要有相关进程使用过相关动态库,在另一进程,动态链接器在填桩时,直接会把桩_la_symbol_ptr区的地址,指向动态库对应符号的地址。

参考链接:

Mach-O文件格式和程序从加载到执行过程

解析Mach-O文件

macOS软件内幕

MAC系统中可执行文件格式(Mach-O)的学习(四)

Mach-O --- 动态库与静态库

深入剖析Macho (1)

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值