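Backtracing the thread's call stack above gave us a set of addresses, so symbolication should take an address as input and produce a symbol as output. The interface could look something like this: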
- (NSString *)symbolicateAddress:(uintptr_t)addr;
In practice, however, we have to rely on dyld functions and data structures:
/*
* Structure filled in by dladdr().
*/
typedef struct dl_info {
const char *dli_fname; /* Pathname of shared object */
void *dli_fbase; /* Base address of shared object */
const char *dli_sname; /* Name of nearest symbol */
void *dli_saddr; /* Address of nearest symbol */
} Dl_info;
extern int dladdr(const void *, Dl_info *);
DESCRIPTION
These routines provide additional introspection of dyld beyond that provided by dlopen() and dladdr()
_dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count
to iterate all images is not thread safe, because another thread may be adding or removing images during
the iteration.
_dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index. If
image_index is out of range, NULL is returned.
_dyld_get_image_vmaddr_slide() returns the virtual memory address slide amount of the image indexed by
image_index. If image_index is out of range zero is returned.
_dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to
be owned by dyld and should not be deleted. If image_index is out of range NULL is returned.
We also want to know whether the lookup succeeded, so the interface evolves into:
bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info)
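Dl_info is filled in with the result of the lookup.
3. Algorithm outline
Symbolicating an address is fairly direct: find the memory image the address belongs to, locate that image's symbol table, and then match the target address against the symbols in that table.
(Image source: Apple's official documentation)
The outline below only describes the general direction and does not cover every detail, such as the ASLR-based slide.
// ASLR-based slide: https://en.wikipedia.org/wiki/Address_space_layout_randomization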
/**
* When the dynamic linker loads an image,
* the image must be mapped into the virtual address space of the process at an unoccupied address.
* The dynamic linker accomplishes this by adding a value "the virtual memory slide amount" to the base address of the image.
*/
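3.1 Finding the image that contains the address
At first I was pleasantly surprised to find an API for this, but unfortunately it is not available on iPhone: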
extern bool _dyld_image_containing_address(const void* address)
__OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_3,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
So we have to work it out ourselves.
How?
A segment defines a range of bytes in a Mach-O file and the addresses and memory protection attributes at which those bytes are mapped into virtual memory when the dynamic linker loads the application. As such, segments are always virtual memory page aligned. A segment contains zero or more sections.
We walk every segment and check whether the target address falls inside the range that segment covers:
/*
* The segment load command indicates that a part of this file is to be
* mapped into the task's address space. The size of this segment in memory,
* vmsize, maybe equal to or larger than the amount to map from this file,
* filesize. The file is mapped starting at fileoff to the beginning of
* the segment in memory, vmaddr. The rest of the memory of the segment,
* if any, is allocated zero fill on demand. The segment's maximum virtual
* memory protection and initial virtual memory protection are specified
* by the maxprot and initprot fields. If the segment has sections then the
* section structures directly follow the segment command and their size is
* reflected in cmdsize.
*/
struct segment_command { /* for 32-bit architectures */
uint32_t cmd; /* LC_SEGMENT */
uint32_t cmdsize; /* includes sizeof section structs */
char segname[16]; /* segment name */
uint32_t vmaddr; /* memory address of this segment */
uint32_t vmsize; /* memory size of this segment */
uint32_t fileoff; /* file offset of this segment */
uint32_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
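/**
* @brief Checks whether a segment_command contains addr, based on the segment's virtual address and size.
* vmaddr is a link-time value, so addr must already have the image's ASLR slide subtracted.
*/
bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
if (cmdPtr->cmd == LC_SEGMENT) {
const struct segment_command *segPtr = (const struct segment_command *)cmdPtr;
if (addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize)) {
return true;
}
}
// 64-bit images use LC_SEGMENT_64 / struct segment_command_64 and need the same check.
return false;
}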
With that, we can find the image that contains the target address.
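To make the search concrete, here is a minimal, portable sketch of the image lookup. It does not call dyld; DemoSegment, DemoImage and demo_imageIndexContainingAddress are hypothetical stand-ins invented for illustration. On a device, the slide would come from _dyld_get_image_vmaddr_slide() and the segments from walking the load commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for segment_command and a dyld image. */
typedef struct {
    uint64_t vmaddr;  /* link-time start address of the segment */
    uint64_t vmsize;  /* size of the segment in memory */
} DemoSegment;

typedef struct {
    const char *name;
    uint64_t slide;           /* ASLR slide applied when the image was loaded */
    const DemoSegment *segs;
    size_t nsegs;
} DemoImage;

#define DEMO_NOT_FOUND ((size_t)-1)

/* Returns the index of the image whose segments cover addr, or DEMO_NOT_FOUND. */
static size_t demo_imageIndexContainingAddress(const DemoImage *images,
                                               size_t count, uint64_t addr) {
    assert(images != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* vmaddr is a link-time value, so remove the slide before comparing. */
        uint64_t unslid = addr - images[i].slide;
        for (size_t s = 0; s < images[i].nsegs; s++) {
            const DemoSegment *seg = &images[i].segs[s];
            if (unslid >= seg->vmaddr && unslid - seg->vmaddr < seg->vmsize) {
                return i;
            }
        }
    }
    return DEMO_NOT_FOUND;
}
```

The range test is written as unslid - vmaddr < vmsize rather than unslid < vmaddr + vmsize so that a segment near the top of the address space cannot overflow the sum.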
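3.2 Locating the symbol table of the target image
Symbols are collected and the symbol table is built across the compile and link stages, so we will not go into that here. All we need to know is that, besides the __TEXT code segment and the __DATA data segment, there is also a __LINKEDIT segment that contains the symbol table: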
The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.
So we first need to locate the __LINKEDIT segment. Again quoting Apple's official documentation:
Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.
We walk every segment and compare its name with __LINKEDIT:
usr/include/mach-o/loader.h
#define SEG_LINKEDIT "__LINKEDIT"
Next we look for the symbol table:
/**
* Quoted from "The Mac Hacker's Handbook":
* The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment. The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables. Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.
*/
In other words, we need to combine the __LINKEDIT segment_command (see the structure above) with the LC_SYMTAB load_command (see the structure below) to locate the symbol table:
/*
* The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
* "stab" style symbol table information as described in the header files
* <nlist.h> and <stab.h>.
*/
struct symtab_command {
uint32_t cmd; /* LC_SYMTAB */
uint32_t cmdsize; /* sizeof(struct symtab_command) */
uint32_t symoff; /* symbol table offset */
uint32_t nsyms; /* number of symbol table entries */
uint32_t stroff; /* string table offset */
uint32_t strsize; /* string table size in bytes */
};
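From the above we know that LC_SYMTAB describes where the string table and the symbol table sit inside __LINKEDIT. The symbol table holds each symbol's address and the position of its string (the function name) in the string table. Finding the string for a stack address therefore takes the following steps:
- Let the current stack address be address.
- The ASLR slide is imageVMAddrSlide = (uintptr_t)_dyld_get_image_vmaddr_slide(idx);
- addressWithSlide = address - imageVMAddrSlide; despite its name, this is the address with the slide removed, so it can be compared against link-time values.
- Get the base of __LINKEDIT in the target image. This base is computed from values inside the image (vmaddr - fileoff), so it is not yet the base as loaded in memory. To get the in-memory base we add the ASLR slide; call the result segmentBase.
- Add the symbol table offset from LC_SYMTAB to segmentBase to get symbolTable, that is, BS_NLIST* symbolTable = (BS_NLIST*)(segmentBase + symtabCmd->symoff);
- Likewise, stringTable = segmentBase + symtabCmd->stroff;
- Walk symbolTable and compare every symbol.value with addressWithSlide, keeping the symbol.value closest to addressWithSlide. Two conditions apply. First, addr >= symbol.value: addr is an instruction inside some function, so it must be greater than or equal to that function's entry address, which is the symbol's value. Second, symbol.value is nearest to addr: the entry address closest to the instruction address is the more accurate match.
- info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddrSlide); gives the symbol address closest to address. address - symbol address = slide, and this slide is exactly the "slide" printed in a crash stack (the number after the +).
- Get the string (function name) for the symbol: info->dli_sname = (char*)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);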
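The nearest-symbol step in the list above can be tried in isolation with a small portable sketch. DemoSymbol and demo_nearestSymbol are hypothetical names; DemoSymbol keeps only the two nlist fields the search needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct nlist_64. */
typedef struct {
    uint32_t strx;   /* offset of the name in the string table */
    uint64_t value;  /* link-time address of the symbol, 0 if external */
} DemoSymbol;

/* Returns the symbol with the largest value <= unslidAddr, or NULL if none. */
static const DemoSymbol *demo_nearestSymbol(const DemoSymbol *syms,
                                            size_t count, uint64_t unslidAddr) {
    assert(syms != NULL || count == 0);
    const DemoSymbol *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (syms[i].value == 0 || syms[i].value > unslidAddr) {
            continue;  /* external symbol, or entry point after the address */
        }
        if (best == NULL || syms[i].value > best->value) {
            best = &syms[i];
        }
    }
    return best;
}
```

Among all entry points at or below the address, the largest one is the closest, so comparing values directly is equivalent to tracking the smallest distance. The symbol table is not assumed to be sorted, which is why the sketch scans every entry.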
The code below computes the base of __LINKEDIT:
uintptr_t bs_segmentBaseOfImageIndex(const uint32_t idx) {
const struct mach_header* header = _dyld_get_image_header(idx);
// Look for a segment command and return the file image address.
uintptr_t cmdPtr = bs_firstCmdAfterHeader(header);
if(cmdPtr == 0) {
return 0;
}
for(uint32_t i = 0;i < header->ncmds; i++) {
const struct load_command* loadCmd = (struct load_command*)cmdPtr;
/*
The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.
*/
if(loadCmd->cmd == LC_SEGMENT) {
const struct segment_command* segmentCmd = (struct segment_command*)cmdPtr;
if(strcmp(segmentCmd->segname, SEG_LINKEDIT) == 0) {
return segmentCmd->vmaddr - segmentCmd->fileoff;
}
}
else if(loadCmd->cmd == LC_SEGMENT_64) {
const struct segment_command_64* segmentCmd = (struct segment_command_64*)cmdPtr;
if(strcmp(segmentCmd->segname, SEG_LINKEDIT) == 0) {
return (uintptr_t)(segmentCmd->vmaddr - segmentCmd->fileoff);
}
}
cmdPtr += loadCmd->cmdsize;
}
return 0;
}
And here is the symbolication code:
bool bs_dladdr(const uintptr_t address, Dl_info* const info) {
info->dli_fname = NULL;
info->dli_fbase = NULL;
info->dli_sname = NULL;
info->dli_saddr = NULL;
const uint32_t idx = bs_imageIndexContainingAddress(address);
if(idx == UINT_MAX) {
return false;
}
const struct mach_header* header = _dyld_get_image_header(idx);
//ASLR slide
const uintptr_t imageVMAddrSlide = (uintptr_t)_dyld_get_image_vmaddr_slide(idx);
const uintptr_t addressWithSlide = address - imageVMAddrSlide;
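// Base of __LINKEDIT computed from the image itself (vmaddr - fileoff); check it before adding the slide
const uintptr_t linkeditBase = bs_segmentBaseOfImageIndex(idx);
if(linkeditBase == 0) {
return false;
}
// Base address in memory
const uintptr_t segmentBase = linkeditBase + imageVMAddrSlide;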
info->dli_fname = _dyld_get_image_name(idx);
info->dli_fbase = (void*)header;
// Find symbol tables and get whichever symbol is closest to the address.
const BS_NLIST* bestMatch = NULL;
uintptr_t bestDistance = ULONG_MAX;
uintptr_t cmdPtr = bs_firstCmdAfterHeader(header);
if(cmdPtr == 0) {
return false;
}
for(uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
const struct load_command* loadCmd = (struct load_command*)cmdPtr;
if(loadCmd->cmd == LC_SYMTAB) {
const struct symtab_command* symtabCmd = (struct symtab_command*)cmdPtr;
const BS_NLIST* symbolTable = (BS_NLIST*)(segmentBase + symtabCmd->symoff);
const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
for(uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
// If n_value is 0, the symbol refers to an external object.
if(symbolTable[iSym].n_value != 0) {
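// 1. addr >= symbol.value: addr is an instruction inside some function, so it must be at or after that function's entry address, which is the symbol's value.
// 2. symbol.value is nearest to addr: the entry address closest to the instruction address is the more accurate match.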
uintptr_t symbolBase = symbolTable[iSym].n_value;
uintptr_t currentDistance = addressWithSlide - symbolBase;
if((addressWithSlide >= symbolBase) &&
(currentDistance <= bestDistance)) {
bestMatch = symbolTable + iSym;
bestDistance = currentDistance;
}
}
}
if(bestMatch != NULL) {
info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddrSlide);
if(bestMatch->n_desc == 16)
{
// This image has been stripped. The name is meaningless, and
// almost certainly resolves to "_mh_execute_header"
info->dli_sname = NULL;
}
else
{
info->dli_sname = (char*)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
if(*info->dli_sname == '_')
{
info->dli_sname++;
}
}
break;
}
}
cmdPtr += loadCmd->cmdsize;
}
return true;
}
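One thing needs introducing here: ASLR, short for Address Space Layout Randomization. ASLR was brought into Linux with kernel 2.6.12 in 2005. It randomizes parts of a process's address space to make it harder for an attacker to predict target addresses, which lowers the risk of the process being successfully attacked. Mainstream operating systems such as Linux and Windows all use it today.
Many blogs state symbol address = stack address - slide, and many people say this slide is the ASLR offset. I have a doubt about that.
Since ASLR slide = _dyld_get_image_vmaddr_slide(idx), the ASLR slide of any given image is constant, yet in our crash stack the slide differs between frames of the same image. So the slide in that formula is not the ASLR slide. As shown below, the slides of the UIKit frames are all different: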
0 BSBacktraceLogger 0x0000000102bfcc98 -[ViewController viewDidLoad] + 152
1 UIKit 0x000000018df7eee0 <redacted> + 1020
2 UIKit 0x000000018df7eacc <redacted> + 28
3 UIKit 0x000000018df6fd60 <redacted> + 136
4 UIKit 0x000000018df6eb94 <redacted> + 272
5 UIKit 0x000000018dffc6a8 <redacted> + 48
6 UIKit 0x000000018df722f0 <redacted> + 3660
7 UIKit 0x000000018df3f65c <redacted> + 1680
8 UIKit 0x000000018e56fa0c <redacted> + 784
9 UIKit 0x000000018df3ee4c <redacted> + 160
10 UIKit 0x000000018df3ece8 <redacted> + 240
11 UIKit 0x000000018df3db78 <redacted> + 724
12 UIKit 0x000000018ebd372c <redacted> + 296
13 UIKit 0x000000018df3d268 <redacted> + 432
14 UIKit 0x000000018e9b89b8 <redacted> + 220
15 UIKit 0x000000018eb06ae8 _performActionsWithDelayForTransitionContext + 112
16 UIKit 0x000000018df3cc88 <redacted> + 248
17 UIKit 0x000000018df3c624 <redacted> + 368
18 UIKit 0x000000018df3965c <redacted> + 540
19 UIKit 0x000000018df393ac <redacted> + 364
20 FrontBoardServices 0x0000000186ba0470 <redacted> + 364
21 FrontBoardServices 0x0000000186ba8d6c <redacted> + 224
22 libdispatch.dylib 0x0000000102f6d220 _dispatch_client_callout + 16
23 libdispatch.dylib 0x0000000102f79850 _dispatch_block_invoke_direct + 232
24 FrontBoardServices 0x0000000186bd4878 <redacted> + 36
25 FrontBoardServices 0x0000000186bd451c <redacted> + 404
26 FrontBoardServices 0x0000000186bd4ab8 <redacted> + 56
27 CoreFoundation 0x000000018434b404 <redacted> + 24
28 CoreFoundation 0x000000018434ac2c <redacted> + 276
29 CoreFoundation 0x000000018434879c <redacted> + 1204
30 CoreFoundation 0x0000000184268da8 CFRunLoopRunSpecific + 552
31 GraphicsServices 0x000000018624b020 GSEventRunModal + 100
32 UIKit 0x000000018e24978c UIApplicationMain + 236
33 BSBacktraceLogger 0x0000000102bfd098 main + 124
34 libdyld.dylib 0x0000000183cf9fc0 <redacted> + 4
So to get the final address in memory, we still need to add the ASLR slide:
info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddrSlide);
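To keep the two quantities apart, here is a small sketch that separates the per-image ASLR slide from the per-frame offset printed after the +. DemoFrame and demo_describeFrame are hypothetical names. In the example that follows, the frame addresses and offsets come from frames 0 and 33 of the trace above, while the ASLR slide 0x2bf8000 and the n_value numbers are made up for illustration, because the trace does not reveal them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: what a backtrace line prints for one frame. */
typedef struct {
    uint64_t symbolAddress;  /* runtime entry point of the function */
    uint64_t offset;         /* the "+ N" printed in the backtrace */
} DemoFrame;

/* nValue is the link-time symbol value; aslrSlide is constant per image. */
static DemoFrame demo_describeFrame(uint64_t frameAddress,
                                    uint64_t nValue, uint64_t aslrSlide) {
    DemoFrame f;
    f.symbolAddress = nValue + aslrSlide;     /* same as dli_saddr */
    assert(frameAddress >= f.symbolAddress);  /* frame lies inside the function */
    f.offset = frameAddress - f.symbolAddress;
    return f;
}
```

With the assumed slide, demo_describeFrame(0x102bfcc98, 0x100004c00, 0x2bf8000) yields offset 152 and demo_describeFrame(0x102bfd098, 0x10000501c, 0x2bf8000) yields offset 124: one image, one ASLR slide, two different offsets.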
/*
* This is the symbol table entry structure for 64-bit architectures.
*/
struct nlist_64 {
union {
uint32_t n_strx; /* index into the string table */
} n_un;
uint8_t n_type; /* type flag, see below */
uint8_t n_sect; /* section number or NO_SECT */
uint16_t n_desc; /* see <mach-o/stab.h> */
uint64_t n_value; /* value of this symbol (or stab offset) */
};
Once we have found the matching nlist structure, we can use .n_un.n_strx to locate the corresponding symbol name in the string table.
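A minimal sketch of that last lookup, using a plain character array in place of the real string table. demo_symbolName is a hypothetical helper; the bounds check against the table size (strsize in symtab_command) is an addition that the code earlier in this post does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the NUL-terminated name stored at offset strx, skipping one
 * leading '_' as the symbolication code above does, or NULL if strx
 * lies outside the table.
 */
static const char *demo_symbolName(const char *stringTable,
                                   uint32_t strSize, uint32_t strx) {
    assert(stringTable != NULL);
    if (strx >= strSize) {
        return NULL;
    }
    const char *name = stringTable + strx;
    return (*name == '_') ? name + 1 : name;
}
```

For a table laid out as "\0_main\0_viewDidLoad\0plain", an n_strx of 1 resolves to "main" and an n_strx of 7 resolves to "viewDidLoad".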