iOS Runtime Internals: static_init() Explained

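This series of posts is my set of notes from reading the source code. If other iOS developers are also reading the runtime source, I would be glad to exchange ideas. To make discussion easier I have created a WeChat group (iOS技术讨论群, "iOS tech discussion group"). To join, add me on WeChat: zhujinhui207407 (please include the note "ios" when adding me). My blog at http://www.kyson.cn is also updated regularly, and discussion there is welcome.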

The full version of this article is available in my column: xiaozhuanlan.com/runtime

Background

Set a breakpoint in the file objc-runtime-new.mm. When it is hit, the call stack contains the following functions:

static_init()

and

_objc_init()

Both functions are familiar: _objc_init() was covered in the previous article, and static_init() is called from inside _objc_init(). It is defined as follows:

/***********************************************************************
* static_init
* Run C++ static constructor functions.
* libc calls _objc_init() before dyld would call our static constructors, 
* so we have to do it ourselves.
**********************************************************************/
static void static_init()
{
    size_t count;
    Initializer *inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        inits[i]();
    }
}

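The comment tells us that static_init runs the C++ static constructors, and that it has to do so because libc calls _objc_init() before dyld would get around to calling libobjc's static constructors. That sentence is hard to digest at first glance. Even more puzzling, the frame just before the breakpoint is not static_init but a function named _GLOBAL__sub_I_objc_runtime_new.

Stepping into that frame in the debugger, I found many calls to functions with names like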

__cxx_global_var_init

What do these functions do? That is the first question this article addresses. Once it is answered, we will read the static_init source together.
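As a rough mental model, and assuming clang's usual code generation, each global that needs dynamic initialization gets a small __cxx_global_var_init-style function, and one _GLOBAL__sub_I_<file> function per source file calls them in order. The sketch below writes such functions by hand. DemoLock, varInit0, varInit1, and globalSubI are hypothetical names used only for illustration; they are not runtime symbols.

```cpp
#include <cassert>
#include <new>

// Hypothetical lock type whose constructor must run before first use.
struct DemoLock {
    bool ready;
    DemoLock() : ready(true) {}
};

// Raw storage standing in for two global variables. The pointers stay
// null until the matching constructor has actually run.
alignas(DemoLock) static unsigned char g_storageA[sizeof(DemoLock)];
alignas(DemoLock) static unsigned char g_storageB[sizeof(DemoLock)];
static DemoLock *g_lockA = nullptr;
static DemoLock *g_lockB = nullptr;

// Hand-written stand-ins for the per-variable init functions.
static void varInit0() {
    assert(g_lockA == nullptr);            // construct only once
    g_lockA = new (g_storageA) DemoLock(); // placement new runs the constructor
}
static void varInit1() {
    assert(g_lockB == nullptr);
    g_lockB = new (g_storageB) DemoLock();
}

// Stand-in for the per-file function that runs every initializer in order.
static void globalSubI() {
    varInit0();
    varInit1();
}
```

Until globalSubI() is called, neither lock has been constructed, which is the situation the runtime would be in if nobody ran its static constructors.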

Analysis

To explain the code above, let's run an experiment.

In Xcode, put the following code in main.m (renamed to main.mm so that it compiles as Objective-C++):

#include <cstdio>

class Person{
public:
    Person(){
        printf("Person::Person()");
    }

    ~Person(){
        printf("Person::~Person()");
    }
};

Person kyson;

int main() {
    return 0;
}

Running it prints:

Person::Person()Person::~Person()

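This shows that both the constructor and the destructor of Person ran. Readers who are unsure about C++ constructors and destructors may want to brush up on the syntax first. The point I want to stress is that all we wrote was a global variable definition: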

Person kyson;

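So why do the constructor and destructor run? A little debugging shows that the line Person kyson; executes before main(). This contradicts the common belief that only +load methods run before main(). What exactly does the system do before main() runs, and which of those steps can we hook? With that question in mind, let's dig into C++ global variables.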
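To make the ordering observable without a debugger, the following sketch records constructor calls in a log. Tracer and eventLog are hypothetical helper names. Within a single source file, globals with dynamic initialization are constructed in definition order, and all of them are constructed before main() starts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static: constructed on first use, so it is safe to
// call from the constructors of other globals.
static std::vector<std::string> &eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical helper that records the moment it is constructed.
struct Tracer {
    explicit Tracer(const char *name) {
        assert(name != nullptr);
        eventLog().push_back(name);
    }
};

// Two globals: their constructors run before main(), in this order.
static Tracer g_first("first");
static Tracer g_second("second");
```

By the time the first statement of main() executes, eventLog() already holds both entries, which is the same effect seen with Person kyson; in the experiment above.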

C++ global variable initialization

Experiment

In _objc_init(), delete the line

    static_init();

With that line removed, the program crashes. The crash call stack shows that the crash happens inside pthread_rwlock_wrlock. The cause lies in these four global definitions:

rwlock_t runtimeLock;
rwlock_t selLock;
mutex_t cacheUpdateLock;
recursive_mutex_t loadMethodLock;

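With static_init() removed, the constructors of these four globals never run, so the locks are used before they have been initialized. That solves the mystery.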

Let's look more closely at static_init itself. Its core is getLibobjcInitializers, and jumping to its definition shows:

GETSECT(getLibobjcInitializers,       Initializer,     "__objc_init_func");

where GETSECT is a macro:

#define GETSECT(name, type, sectname)                                   \
type *name(const headerType *mhdr, size_t *outCount) {              \
    return getDataSection<type>(mhdr, sectname, nil, outCount);     \
}                                                                   \
type *name(const header_info *hi, size_t *outCount) {               \
    return getDataSection<type>(hi->mhdr(), sectname, nil, outCount); \
}

The GETSECT line above therefore expands to:

Initializer *getLibobjcInitializers(const headerType *mhdr, size_t *outCount) {
    return getDataSection<Initializer>(mhdr, "__objc_init_func", nil, outCount);
}

Initializer *getLibobjcInitializers(const header_info *hi, size_t *outCount) {
    return getDataSection<Initializer>(hi->mhdr(), "__objc_init_func", nil, outCount);
}
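The same "one macro use, two overloads" pattern can be tried outside the runtime. This is a portable sketch only: DemoHeader, DemoHeaderInfo, demoDataSection, DEMO_GETSECT, and getDemoInts are hypothetical stand-ins for headerType, header_info, getDataSection, GETSECT, and getLibobjcInitializers, and the lookup is reduced to a single name comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for headerType and header_info.
struct DemoHeader {
    const char *sectname; // name of the single section this header carries
    void *data;           // start of that section's contents
    size_t count;         // number of elements stored there
};
struct DemoHeaderInfo {
    const DemoHeader *header;
    const DemoHeader *mhdr() const { return header; }
};

// Simplified stand-in for getDataSection<T>.
template <typename T>
T *demoDataSection(const DemoHeader *mhdr, const char *sectname, size_t *outCount) {
    assert(mhdr != nullptr);
    bool found = std::strcmp(mhdr->sectname, sectname) == 0;
    if (outCount) *outCount = found ? mhdr->count : 0;
    return found ? static_cast<T *>(mhdr->data) : nullptr;
}

// Same shape as GETSECT: one macro use stamps out two overloads.
#define DEMO_GETSECT(name, type, sectname)                              \
type *name(const DemoHeader *mhdr, size_t *outCount) {                  \
    return demoDataSection<type>(mhdr, sectname, outCount);             \
}                                                                       \
type *name(const DemoHeaderInfo *hi, size_t *outCount) {                \
    return demoDataSection<type>(hi->mhdr(), sectname, outCount);       \
}

DEMO_GETSECT(getDemoInts, int, "__demo_ints")
```

Callers can pass either a DemoHeader pointer or a DemoHeaderInfo pointer to getDemoInts; overload resolution picks the matching version, just as with the two getLibobjcInitializers overloads.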

getDataSection is implemented as follows:

// Look for a __DATA or __DATA_CONST or __DATA_DIRTY section 
// with the given name that stores an array of T.
template <typename T>
T* getDataSection(const headerType *mhdr, const char *sectname, 
                  size_t *outBytes, size_t *outCount)
{
    unsigned long byteCount = 0;
    T* data = (T*)getsectiondata(mhdr, "__DATA", sectname, &byteCount);
    if (!data) {
        data = (T*)getsectiondata(mhdr, "__DATA_CONST", sectname, &byteCount);
    }
    if (!data) {
        data = (T*)getsectiondata(mhdr, "__DATA_DIRTY", sectname, &byteCount);
    }
    if (outBytes) *outBytes = byteCount;
    if (outCount) *outCount = byteCount / sizeof(T);
    return data;
}

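The comment already makes the purpose clear: the function fetches the data of a "section". What is a section, and why read data from one? The rest of this article goes through that in detail.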
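Two details of getDataSection are easy to miss: it tries the three data segments in a fixed order, and it converts the byte size into an element count with byteCount / sizeof(T). The portable sketch below mimics both against an in-memory table instead of a real Mach-O image. DemoSection, lookupDemoSection, and findInDataSegments are hypothetical names; lookupDemoSection only plays the role of getsectiondata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical in-memory description of one section.
struct DemoSection {
    const char *segname;
    const char *sectname;
    void *bytes;
    unsigned long size; // size in bytes
};

// Plays the role of getsectiondata: exact match on segment and section name.
static void *lookupDemoSection(const DemoSection *sections, size_t n,
                               const char *seg, const char *sect,
                               unsigned long *size) {
    for (size_t i = 0; i < n; i++) {
        if (std::strcmp(sections[i].segname, seg) == 0 &&
            std::strcmp(sections[i].sectname, sect) == 0) {
            *size = sections[i].size;
            return sections[i].bytes;
        }
    }
    *size = 0;
    return nullptr;
}

// Mirrors getDataSection: try each data segment in turn, then derive
// the element count from the byte count.
template <typename T>
T *findInDataSegments(const DemoSection *sections, size_t n,
                      const char *sect, size_t *outCount) {
    assert(sect != nullptr);
    static const char *const kSegments[] = {"__DATA", "__DATA_CONST", "__DATA_DIRTY"};
    unsigned long byteCount = 0;
    T *data = nullptr;
    for (const char *seg : kSegments) {
        data = static_cast<T *>(lookupDemoSection(sections, n, seg, sect, &byteCount));
        if (data) break;
    }
    if (outCount) *outCount = byteCount / sizeof(T);
    return data;
}
```

A section that lives in __DATA_CONST is still found, because the lookup falls through from __DATA to the next segment name.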

Analysis

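To understand sections, we first need to know the executable file format used on Apple platforms, Mach-O. Wikipedia describes it as follows: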

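Mach-O file. Mach-O, short for Mach Object file format, is a file format for executables, object code, dynamic libraries, and core dumps. As a replacement for the a.out format, Mach-O offers more extensibility and faster access to information in the symbol table. Mach-O was once used by most operating systems based on the Mach kernel. NeXTSTEP, Darwin, and Mac OS X use it as the native format for executables, libraries, and object code, while GNU Hurd, which also uses GNU Mach as its microkernel, uses ELF rather than Mach-O as its standard binary format.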

In other words, executables on both iOS and the Mac are Mach-O files. Someone may object: isn't the iOS file type IPA? Let's see what an IPA really is, again according to Wikipedia:

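A file with the .ipa extension is an iOS application package; the name stands for iPhone application archive. IPA files are usually encrypted with Apple's FairPlay DRM technology. Each IPA file bundles an ARM executable together with the app's resource files and can only be installed on an iPhone, iPod Touch, or iPad. Changing the extension to .zip lets you unpack the file and inspect the contents of the package.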

So an IPA file is really just an archive that contains a Mach-O executable. How, then, is a Mach-O file laid out? Apple's documentation has the answer:

Overview of the Mach-O Executable Format

Mach-O is the native executable format of binaries in OS X and is the preferred format for shipping code. An executable format determines the order in which the code and data in a binary file are read into memory. The ordering of code and data has implications for memory usage and paging activity and thus directly affects the performance of your program.

A Mach-O binary is organized into segments. Each segment contains one or more sections. Code or data of different types goes into each section. Segments always start on a page boundary, but sections are not necessarily page-aligned. The size of a segment is measured by the number of bytes in all the sections it contains and rounded up to the next virtual memory page boundary. Thus, a segment is always a multiple of 4096 bytes, or 4 kilobytes, with 4096 bytes being the minimum size.

The segments and sections of a Mach-O executable are named according to their intended use. The convention for segment names is to use all-uppercase letters preceded by double underscores (for example, __TEXT); the convention for section names is to use all-lowercase letters preceded by double underscores (for example, __text).

There are several possible segments within a Mach-O executable, but only two of them are of interest in relation to performance: the __TEXT segment and the __DATA segment.

The passage above is quoted from Introduction to Code Size Performance Guidelines.

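Plain text alone rarely gives a clear picture, and the widely circulated diagram of the Mach-O structure helps here. In that diagram, a Mach-O file consists of three main parts: the header, the load commands, and the data. I won't go into what each part means here; later articles will cover them step by step. For now it is enough to know that some of the sections in the Data part can be written to and read from. The function for reading is: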

extern uint8_t *getsectiondata(
    const struct mach_header_64 *mhp,
    const char *segname,
    const char *sectname,
    unsigned long *size);

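This explains the static_init() function from the beginning of the article: it looks up the data of the __objc_init_func section, obtains an array of Initializer function pointers, and calls them in order.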
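The control flow of static_init() can be reproduced in a few lines of portable C++. In this sketch the table of function pointers is an ordinary array rather than the contents of the __objc_init_func section. getDemoInitializers, demoStaticInit, initA, and initB are hypothetical names, and Initializer is assumed here to be a plain pointer to a function taking no arguments and returning nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed shape of an initializer: a function with no arguments and no result.
using Initializer = void (*)();

// Records the order in which initializers ran.
static std::vector<int> g_order;

static void initA() { g_order.push_back(1); }
static void initB() { g_order.push_back(2); }

// Ordinary array standing in for the section that stores the pointers.
static Initializer g_demoInits[] = {initA, initB};

// Plays the role of getLibobjcInitializers: return the table and its length.
static Initializer *getDemoInitializers(size_t *outCount) {
    *outCount = sizeof(g_demoInits) / sizeof(g_demoInits[0]);
    return g_demoInits;
}

// Same loop as static_init(): call every initializer in table order.
static void demoStaticInit() {
    size_t count = 0;
    Initializer *inits = getDemoInitializers(&count);
    for (size_t i = 0; i < count; i++) {
        assert(inits[i] != nullptr);
        inits[i]();
    }
}
```

Calling demoStaticInit() once runs initA and then initB, which is the "fetch the pointer array, then call each entry" behavior described above.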