Mach-O Format Analysis

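0x00 Abstract

Life has no root or stem; it drifts like dust on the road. Scattered, it tumbles with the wind; this body is no longer its constant self.

(Tao Yuanming, "Miscellaneous Poems")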

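Mach-O is the executable file format on OS X, analogous to PE on Windows and ELF on Linux. Doing any further research without a thorough understanding of the Mach-O format and the knowledge around it is like building a castle in the air.

Every Mach-O file consists of a Mach-O header, followed by the load commands, and finally the data blocks.

The rest of this article analyzes the Mach-O format in detail.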

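0x01 A Brief Introduction to the Mach-O Format

The layout of a Mach-O file is shown in the figure below:

It is made up of the following parts:

Header: holds basic information about the Mach-O file, including the platform, the file type, and the number of load commands.
LoadCommands: this region immediately follows the header; when a Mach-O file is loaded, its data determines the memory layout.
Data: the actual contents of every segment are stored here, including the code and the data.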

0x02 Headers

2.1 Data structures

The header definitions can be found in the open-source kernel code.

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define	MH_MAGIC	0xfeedface	/* the mach magic number */
#define MH_CIGAM	0xcefaedfe	/* NXSwapInt(MH_MAGIC) */

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

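From the definitions of mach_header and mach_header_64 it is clear that the header's main job is to let the system quickly identify the runtime environment a Mach-O file targets and what type of file it is.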
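
As a minimal sketch of how the magic field can be used, the helper below classifies a 32-bit magic value using the four constants quoted above. The function name macho_bits is hypothetical (it is not a kernel or system API), and the constants are redefined locally so the snippet does not depend on Apple's headers.

```c
#include <stdint.h>

/* Magic values copied from the definitions quoted above. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/*
 * Hypothetical helper: returns 32 or 64 for a recognized Mach-O magic,
 * or 0 if the value is not one of the four magics above.
 * *swapped is set to 1 when the magic is one of the CIGAM (byte-swapped)
 * forms, meaning the file's byte order differs from the reader's.
 */
static int macho_bits(uint32_t magic, int *swapped)
{
    *swapped = (magic == MH_CIGAM || magic == MH_CIGAM_64);

    switch (magic) {
    case MH_MAGIC:
    case MH_CIGAM:
        return 32;
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return 64;
    default:
        return 0;
    }
}
```

A reader would call this on the first four bytes of the file and then choose between mach_header and mach_header_64 accordingly.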

2.2 Example

Let's analyze a real Mach-O file with some tools and look at its header concretely.

otool prints the Mach header, but the output is not very readable.

➜  bin otool -h git
git:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    17       1432 0x00200085

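Another tool, MachOView, shows the same information more clearly.

The magic number is 0xFEEDFACF, so this is a file for a 64-bit platform
CPU Type and CPU SubType are also easy to read: the file runs on x86_64 CPUs
File Type marks this file as an executable; it is analyzed in detail below
Flags marks four properties of this Mach-O file; they are analyzed in detail below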
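
The decimal cputype printed by otool (16777223) is easier to read once it is split into its architecture bits and base CPU type. The sketch below does that split. The constant values are copied from Apple's mach/machine.h as I understand them and are redefined locally, and the two helper names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in mach/machine.h, redefined here for portability. */
#define CPU_ARCH_MASK  0xff000000u  /* top byte: architecture/ABI bits */
#define CPU_ARCH_ABI64 0x01000000u  /* 64-bit ABI */
#define CPU_TYPE_X86   7u

/* Hypothetical helper: nonzero if the 64-bit ABI bit is set. */
static int cpu_is_64bit(uint32_t cputype)
{
    return (cputype & CPU_ARCH_ABI64) != 0;
}

/* Hypothetical helper: the CPU family with the architecture bits removed. */
static uint32_t cpu_base_type(uint32_t cputype)
{
    return cputype & ~CPU_ARCH_MASK;
}
```

For the git binary above, 16777223 is 0x01000007, which is CPU_TYPE_X86 with the CPU_ARCH_ABI64 bit set, in other words x86_64.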

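2.3 Field details

2.3.1 FileType

Mach-O is not used only for executables; it is also used for other kinds of files:

Kernel extensions
Libraries
Core dumps
…

The file types are defined in the source as follows: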

#define	MH_OBJECT	0x1		/* relocatable object file */
#define	MH_EXECUTE	0x2		/* demand paged executable file */
#define	MH_FVMLIB	0x3		/* fixed VM shared library file */
#define	MH_CORE		0x4		/* core file */
#define	MH_PRELOAD	0x5		/* preloaded executable file */
#define	MH_DYLIB	0x6		/* dynamically bound shared library */
#define	MH_DYLINKER	0x7		/* dynamic link editor */
#define	MH_BUNDLE	0x8		/* dynamically bound bundle file */
#define	MH_DYLIB_STUB	0x9		/* shared library stub for static */
					/*  linking only, no section contents */
#define	MH_DSYM		0xa		/* companion file with only debug */
					/*  sections */
#define	MH_KEXT_BUNDLE	0xb		/* x86_64 kexts */

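Here is an explanation of some commonly used file types.

File Type	Purpose	Example
MH_OBJECT	Intermediate object file (*.o) produced during compilation	gcc -c xxx.c generates xxx.o
MH_EXECUTE	Executable binary	/usr/bin/git
MH_CORE	Core dump	The dump file written on a crash
MH_DYLIB	Dynamic library	The library files under /usr/lib/
MH_DYLINKER	Dynamic linker	/usr/lib/dyld
MH_KEXT_BUNDLE	Kernel extension	A simple kernel module you develop yourself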
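
To connect the numeric filetype printed by otool (2 for the git binary) with the names above, a lookup can be sketched as follows. The numeric values are taken from the definitions quoted earlier; the function name filetype_name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: maps a mach_header filetype value to the name of
 * the corresponding MH_* constant, using the values quoted above.
 */
static const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x3: return "MH_FVMLIB";
    case 0x4: return "MH_CORE";
    case 0x5: return "MH_PRELOAD";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0x8: return "MH_BUNDLE";
    case 0x9: return "MH_DYLIB_STUB";
    case 0xa: return "MH_DSYM";
    case 0xb: return "MH_KEXT_BUNDLE";
    default:  return "unknown";
    }
}
```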

2.3.2 flags

The Mach-O header also carries some important load parameters for dyld. They are defined in the code as follows:

#define	MH_INCRLINK	0x2		/* the object file is the output of an
					   incremental link against a base file
					   and can't be link edited again */
#define MH_DYLDLINK	0x4		/* the object file is input for the
					   dynamic linker and can't be staticly
					   link edited again */
#define MH_BINDATLOAD	0x8		/* the object file's undefined
					   references are bound by the dynamic
					   linker when loaded. */
#define MH_PREBOUND	0x10		/* the file has its dynamic undefined
					   references prebound. */
#define MH_SPLIT_SEGS	0x20		/* the file has its read-only and
					   read-write segments split */
#define MH_LAZY_INIT	0x40		/* the shared library init routine is
					   to be run lazily via catching memory
					   faults to its writeable segments
					   (obsolete) */
#define MH_TWOLEVEL	0x80		/* the image is using two-level name
					   space bindings */
...
// The list is long; see the source if you are interested
// EXTERNAL_HEADERS/mach-o/loader.h

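Again, here is a brief introduction to a few of the more important ones.

Flag Type	Meaning
MH_NOUNDEFS	The object has no undefined symbols, so there are no unresolved link dependencies
MH_DYLDLINK	The object file is input for dyld and cannot be statically linked again
MH_PIE	The main executable may be loaded at a randomized address (ASLR)
MH_ALLOW_STACK_EXECUTION	Code on the stack may be executed; normally off by default
MH_NO_HEAP_EXECUTION	Code on the heap cannot be executed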

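
As an illustration, the flags value from the otool output earlier (0x00200085) can be decoded bit by bit. This is only a sketch: the table covers the flags quoted above plus MH_NOUNDEFS, MH_ALLOW_STACK_EXECUTION, MH_PIE and MH_NO_HEAP_EXECUTION, whose values I have taken from loader.h rather than from the excerpt, and the names known_flags and print_header_flags are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
    uint32_t    bit;
    const char *name;
};

/* A subset of the MH_* header flags; not the complete list. */
static const struct flag_name known_flags[] = {
    { 0x1u,       "MH_NOUNDEFS" },
    { 0x2u,       "MH_INCRLINK" },
    { 0x4u,       "MH_DYLDLINK" },
    { 0x8u,       "MH_BINDATLOAD" },
    { 0x10u,      "MH_PREBOUND" },
    { 0x20u,      "MH_SPLIT_SEGS" },
    { 0x40u,      "MH_LAZY_INIT" },
    { 0x80u,      "MH_TWOLEVEL" },
    { 0x20000u,   "MH_ALLOW_STACK_EXECUTION" },
    { 0x200000u,  "MH_PIE" },
    { 0x1000000u, "MH_NO_HEAP_EXECUTION" },
};

/* Prints the name of every known flag set in `flags`; returns how many. */
static int print_header_flags(uint32_t flags)
{
    int count = 0;

    for (size_t i = 0; i < sizeof known_flags / sizeof known_flags[0]; i++) {
        if (flags & known_flags[i].bit) {
            printf("%s\n", known_flags[i].name);
            count++;
        }
    }
    return count;
}
```

For 0x00200085 the set bits are 0x1, 0x4, 0x80 and 0x200000, that is MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL and MH_PIE, which matches the four properties MachOView reports for the git binary.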
2.4 Headers summary

0x03 Load Commands

This is the load_command data structure:

struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of command in bytes */
};

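The load commands follow immediately after the header, and the total size of all commands is already given in the Mach-O header (sizeofcmds). Once the header has been read, the rest of the data is loaded by parsing the load commands. I took a quick look at how the kernel parses Mach-O data; leaving the kernel's implementation details aside, the logic is quite simple.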

static
load_return_t
parse_machfile(
	struct vnode 		*vp,       
	vm_map_t		map,
	thread_t		thread,
	struct mach_header	*header,
	off_t			file_offset,
	off_t			macho_size,
	int			depth,
	int64_t			aslr_offset,
	int64_t			dyld_aslr_offset,
	load_result_t		*result
)
{
	[...] // extensive initialization and validation omitted here

		/*
		 * Loop through each of the load_commands indicated by the
		 * Mach-O header; if an absurd value is provided, we just
		 * run off the end of the reserved section by incrementing
		 * the offset too far, so we are implicitly fail-safe.
		 */
		offset = mach_header_sz;
		ncmds = header->ncmds;

		while (ncmds--) {
			/*
			 *	Get a pointer to the command.
			 */
			lcp = (struct load_command *)(addr + offset);
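			// lcp now points at the command currently being parsed
			oldoffset = offset;
			// oldoffset is the offset from the start of the mapped Mach-O file to the current command
			offset += lcp->cmdsize;
			// advance offset by the current command's size; it is now the offset from the start of the mapped file to the next command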
			/*
			 * Perform prevalidation of the struct load_command
			 * before we attempt to use its contents.  Invalid
			 * values are ones which result in an overflow, or
			 * which can not possibly be valid commands, or which
			 * straddle or exist past the reserved section at the
			 * start of the image.
			 */
			if (oldoffset > offset ||
			    lcp->cmdsize < sizeof(struct load_command) ||
			    offset > header->sizeofcmds + mach_header_sz) {
				ret = LOAD_BADMACHO;
				break;
			}
			// a sanity check only; unrelated to how the data is loaded into memory

			/*
			 * Act on struct load_command's for which kernel
			 * intervention is required.
			 */
			switch(lcp->cmd) {
			case LC_SEGMENT:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_SEGMENT_64:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_UNIXTHREAD:
				if (pass != 1)
					break;
				ret = load_unixthread(
						 (struct thread_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_MAIN:
				if (pass != 1)
					break;
				if (depth != 1)
					break;
				ret = load_main(
						 (struct entry_point_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_LOAD_DYLINKER:
				if (pass != 3)
					break;
				if ((depth == 1) && (dlp == 0)) {
					dlp = (struct dylinker_command *)lcp;
					dlarchbits = (header->cputype & CPU_ARCH_MASK);
				} else {
					ret = LOAD_FAILURE;
				}
				break;
			case LC_UUID:
				if (pass == 1 && depth == 1) {
					ret = load_uuid((struct uuid_command *) lcp,
							(char *)addr + mach_header_sz + header->sizeofcmds,
							result);
				}
				break;
			case LC_CODE_SIGNATURE:
				[...]
				ret = load_code_signature(
					(struct linkedit_data_command *) lcp,
					vp,
					file_offset,
					macho_size,
					header->cputype,
					result);
				[...]
				break;
#if CONFIG_CODE_DECRYPTION
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				if (pass != 3)
					break;
				ret = set_code_unprotect(
					(struct encryption_info_command *) lcp,
					addr, map, slide, vp, file_offset,
					header->cputype, header->cpusubtype);
				if (ret != LOAD_SUCCESS) {
					printf("proc %d: set_code_unprotect() error %d "
					       "for file \"%s\"\n",
					       p->p_pid, ret, vp->v_name);
					/* 
					 * Don't let the app run if it's 
					 * encrypted but we failed to set up the
					 * decrypter. If the keys are missing it will
					 * return LOAD_DECRYPTFAIL.
					 */
					 if (ret == LOAD_DECRYPTFAIL) {
						/* failed to load due to missing FP keys */
						proc_lock(p);
						p->p_lflag |= P_LTERM_DECRYPTFAIL;
						proc_unlock(p);
					 }
					 psignal(p, SIGKILL);
				}
				break;
#endif
			default:
				/* Other commands are ignored by the kernel */
				ret = LOAD_SUCCESS;
				break;
			}
			if (ret != LOAD_SUCCESS)
				break;
		}
		if (ret != LOAD_SUCCESS)
			break;
	}

	[...] // post-load processing omitted here
}

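3.1 The cmdsize field

The lines to focus on here are the first few inside the while loop; they show how the cmdsize field of load_command is used to step through the Mach-O file's data.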

...
lcp = (struct load_command *)(addr + offset);
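// lcp now points at the command currently being parsed
oldoffset = offset;
// oldoffset is the offset from the start of the mapped Mach-O file to the current command
offset += lcp->cmdsize;
// advance offset by the current command's size; it is now the offset from the start of the mapped file to the next command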
...
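
The same walk can be sketched outside the kernel. The function below steps through a buffer that holds only the load commands, advancing by cmdsize each time. It is not the kernel's code: the name walk_load_commands is hypothetical, it validates each command before advancing (the kernel excerpt advances first and checks afterwards), and it assumes the buffer is in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};

/*
 * Hypothetical helper: walks `ncmds` load commands stored in `buf`, where
 * `sizeofcmds` is the size of the whole load-command region (the header's
 * sizeofcmds). Returns the number of commands walked, or -1 if a command
 * is too small or would run past the end of the region.
 */
static int walk_load_commands(const uint8_t *buf, uint32_t ncmds,
                              uint32_t sizeofcmds)
{
    uint32_t offset = 0;  /* invariant: offset <= sizeofcmds */
    int walked = 0;

    while (ncmds--) {
        struct load_command lc;

        /* There must be room for at least the fixed 8-byte prefix. */
        if (sizeofcmds - offset < sizeof lc)
            return -1;
        memcpy(&lc, buf + offset, sizeof lc);

        /* cmdsize must cover the prefix and stay inside the region. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1;

        offset += lc.cmdsize;  /* now points at the next command */
        walked++;
    }
    return walked;
}
```

Checking before adding means the offset never exceeds sizeofcmds, so the arithmetic cannot wrap around; the kernel reaches the same goal with its oldoffset > offset test.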

3.2 The cmd field

switch(lcp->cmd) {
			case LC_SEGMENT:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_SEGMENT_64:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_UNIXTHREAD:
				if (pass != 1)
					break;
				ret = load_unixthread(
						 (struct thread_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_MAIN:
				if (pass != 1)
					break;
				if (depth != 1)
					break;
				ret = load_main(
						 (struct entry_point_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_LOAD_DYLINKER:
				if (pass != 3)
					break;
				if ((depth == 1) && (dlp == 0)) {
					dlp = (struct dylinker_command *)lcp;
					dlarchbits = (header->cputype & CPU_ARCH_MASK);
				} else {
					ret = LOAD_FAILURE;
				}
				break;
			case LC_UUID:
				if (pass == 1 && depth == 1) {
					ret = load_uuid((struct uuid_command *) lcp,
							(char *)addr + mach_header_sz + header->sizeofcmds,
							result);
				}
				break;
			case LC_CODE_SIGNATURE:
				[...]
				ret = load_code_signature(
					(struct linkedit_data_command *) lcp,
					vp,
					file_offset,
					macho_size,
					header->cputype,
					result);
				[...]
				break;
#if CONFIG_CODE_DECRYPTION
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				if (pass != 3)
					break;
				ret = set_code_unprotect(
					(struct encryption_info_command *) lcp,
					addr, map, slide, vp, file_offset,
					header->cputype, header->cpusubtype);
				if (ret != LOAD_SUCCESS) {
					printf("proc %d: set_code_unprotect() error %d "
					       "for file \"%s\"\n",
					       p->p_pid, ret, vp->v_name);
					/* 
					 * Don't let the app run if it's 
					 * encrypted but we failed to set up the
					 * decrypter. If the keys are missing it will
					 * return LOAD_DECRYPTFAIL.
					 */
					 if (ret == LOAD_DECRYPTFAIL) {
						/* failed to load due to missing FP keys */
						proc_lock(p);
						p->p_lflag |= P_LTERM_DECRYPTFAIL;
						proc_unlock(p);
					 }
					 psignal(p, SIGKILL);
				}
				break;
#endif
			default:
				/* Other commands are ignored by the kernel */
				ret = LOAD_SUCCESS;
				break;
			}

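As this code shows, a different function handles the load depending on the value of the cmd field. The table below lists what the different command types do in the kernel code.

Command type	Handler	Purpose
LC_SEGMENT, LC_SEGMENT_64	load_segment	Loads the segment's data and maps it into the process's address space
LC_LOAD_DYLINKER	load_dylinker	Loads the dynamic linker, /usr/lib/dyld
LC_UUID	load_uuid	Loads the 128-bit unique ID
LC_THREAD	load_thread	Starts a Mach thread without allocating stack space
LC_UNIXTHREAD	load_unixthread	Starts a UNIX thread
LC_CODE_SIGNATURE	load_code_signature	Loads the code signature so it can be verified
LC_ENCRYPTION_INFO	set_code_unprotect	Sets up decryption of an encrypted binary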

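0x04 Segment & Section

When data is loaded, what mostly gets loaded is LC_SEGMENT or LC_SEGMENT_64. The purposes of the other load commands were introduced briefly in the previous section and are not explored further here.

The data structures for LC_SEGMENT and LC_SEGMENT_64 look like this: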


struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};


struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

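As you can see, most of these fields help the kernel map the segment into virtual memory. The field to pay attention to is nsects, which gives the number of sections in the segment. Sections are where the actual useful data lives.

The section data structures are as follows: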

struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};
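
The comment on cmdsize ("includes sizeof section_64 structs") means the section headers sit directly after the segment command inside the same load command. The sketch below checks that a 64-bit segment command is large enough to hold the nsects section headers it claims. The structs mirror the definitions quoted above, with vm_prot_t written as int32_t (an assumption made so the snippet builds without Apple's headers), and the function name segment64_sections_fit is hypothetical.

```c
#include <stdint.h>

/* Local mirrors of the structures quoted above. */
struct segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    int32_t  maxprot;   /* vm_prot_t in the original */
    int32_t  initprot;  /* vm_prot_t in the original */
    uint32_t nsects;
    uint32_t flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/*
 * Hypothetical helper: nonzero if cmdsize is large enough to contain the
 * segment command itself plus nsects section_64 headers.
 */
static int segment64_sections_fit(const struct segment_command_64 *seg)
{
    uint64_t needed = sizeof *seg +
                      (uint64_t)seg->nsects * sizeof(struct section_64);
    return seg->cmdsize >= needed;
}
```

With these layouts a segment command is 72 bytes and each section header is 80 bytes, so a segment with two sections needs a cmdsize of at least 232.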

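Apart from the fields that likewise support memory mapping, all you need to know when learning the Mach-O format is that different sections serve different purposes.

Section	Purpose
__text	Code
__cstring	Hard-coded strings
__const	Variables declared with the const keyword
__DATA.__bss	The bss section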

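Sections are already the finest-grained classification, so the many other, more complex sections are not listed one by one here. When you run into a section type you have not seen, you can look it up in Apple's documentation.

0x05 Summary

A careful analysis of the Mach-O format gives a better understanding of how Mach-O files are loaded, and lays the groundwork for studying dyld or other modules on OS X.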


PS:

Discussion is very welcome, and please point out anything I got wrong :)

Original: http://turingh.github.io/2016/03/07/mach-o文件格式分析/  Author: mrh
