iOS进阶之底层原理-应用程序加载(dyld加载流程、类与分类的加载)

iOS应用程序的入口是main函数,那么main函数之前系统做了什么呢?

我们定义一个类方法load,打断点,查看栈进程,我们发现dyld做了很多事,接下来就来探究到底dyld做了什么。

什么是dyld

dyld是英文 the dynamic link editor的简写,翻译过来就是动态链接器,是苹果操作系统的一个重要的组成部分。在 iOS/Mac OSX系统中,仅有很少量的进程只需要内核就能完成加载,基本上所有的进程都是动态链接的,所以 Mach-O镜像文件中会有很多对外部的库和符号的引用,但是这些引用并不能直接用,在启动时还必须要通过这些引用进行内容的填补,这个填补工作就是由 动态链接器dyld来完成的,也就是符号绑定。系统内核在加载 Mach-O文件时,都需要用 dyld链接程序,将程序加载到内存中。

dyld加载流程

根据程序堆栈,程序入口为dyld_start,我们从官网下载dyld源码,找到start函数入口,由于按照截图内直接复制查找找不到,但是start肯定是一个函数,所以全局搜索为截图内容,这就是程序入口了。

//
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader);

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
  • 重定位dyld地址,rebaseDyld
  • 为堆栈canary设置随机值__guard_setup
  • 进入dyld的main函数

main

  • 检查环境变量,有设置环境变量,打印对应内容
  • 加载共享缓存
  • 链接共享缓存
  • 初始化可执行主程序(这是重点,我们将在这个里面加载镜像,与程序main函数链接)
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
    // 省略一堆代码 展示关键的代码
	checkEnvironmentVariables(envp);
	defaultUninitializedFallbackPaths(envp);
    if ( sEnv.DYLD_PRINT_OPTS )
		printOptions(argv);
	if ( sEnv.DYLD_PRINT_ENV ) 
		printEnvironmentVariables(envp);

	// load shared cache
	checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH,     mainExecutableSlide);
    mapSharedCache(mainExecutableSlide);

	// instantiate ImageLoader for main executable
	sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
	gLinkContext.mainExecutable = sMainExecutable;
	gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
	

	// run all initializers
	initializeMainExecutable(); 
}

初始化可执行主程序

initializeMainExecutable->runInitializers->processInitializers->recursiveInitialization,recursiveInitialization这边就是重点了

void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}
	
	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
}

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	processInitializers(context, thisThread, timingInfo, up);
}

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
}
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
			// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
			
			// initialize this image
			bool hasInitializers = this->doInitialization(context);

			// let anyone know we finished initializing this image
			fState = dyld_image_state_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_initialized, this, NULL);
	
}
  • 在第一个notifySingle前,是让对象知道我们在初始化这个镜像
  • doInitialization初始化(内部会注册回调main函数,接下来会详细介绍)

第二个notifySingle,就是去调用load_images函数

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
	if ( state == dyld_image_state_mapped ) {
		// <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
		// <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
		if (!image->inSharedCache()
			|| (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
			dyld_uuid_info info;
			if ( image->getUUID(info.imageUUID) ) {
				info.imageLoadAddress = image->machHeader();
				addNonSharedCacheImageUUID(info);
			}
		}
	}
	if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
		uint64_t t0 = mach_absolute_time();
		dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
		(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		uint64_t t1 = mach_absolute_time();
		uint64_t t2 = mach_absolute_time();
		uint64_t timeInObjC = t1-t0;
		uint64_t emptyTime = (t2-t1)*100;
		if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
			timingInfo->addTime(image->getShortName(), timeInObjC);
		}
	}
    // mach message csdlc about dynamically unloaded images
	if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
		notifyKernel(*image, false);
		const struct mach_header* loadAddress[] = { image->machHeader() };
		const char* loadPath[] = { image->getPath() };
		notifyMonitoringDyld(true, 1, loadAddress, loadPath);
	}
}

我们为什么会知道以上流程呢,因为最开始的堆栈打印,notifySingle之后就直接是load_images了,notifySingle源码内也没有相关的直接调用的。这里就有个小细节,*sNotifyObjCInit直接函数调用,这就是关键了,这是个回调函数,全局搜索,在registerObjCNotifiers内部找到了sNotifyObjCInit的赋值,接下来就要探索registerObjCNotifiers什么时候调用了。同样是全局搜索,我们发现在_dyld_objc_notify_register这个函数中调用了,之后再全局搜索就没有什么调用了。这时候,就需要换个思维了,直接打断点。

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;
    // 省略了部分代码
}

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

函数回调

我们发现在doInitialization之后还做了相应的处理,从这个函数开始跟随着堆栈流程doInitialization->doModInitFunctions->libSystem_initializer..->_objc_init->_dyld_objc_notify_register

这样,完整的一个流程就走完了在objc源码内,查找_objc_init,的确在内部有注册_dyld_objc_notify_register的一个回调。

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

/***********************************************************************
* static_init
* Run C++ static constructor functions.
* libc calls _objc_init() before dyld would call our static constructors, 
* so we have to do it ourselves.
**********************************************************************/
static void static_init()
{
    size_t count;
    auto inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        inits[i]();
    }
    auto offsets = getLibobjcInitializerOffsets(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        UnsignedInitializer init(offsets[i]);
        init();
    }
}
  • environ环境处理

  • tls线程处理

  • static_init 运行了系统级别的全局C++的静态构造函数,所以load方法会在这个之后调用

  • runtime_init

    • objc::unattachedCategories.init(32)和objc::allocatedClasses.init();

  • exception_init异常回调

总结

  • _dyld_start
  • dyld::_main

    • 检查环境变量,有设置则去打印

    • 加载共享缓存,链接共享缓存

  • 可执行主程序镜像

    • 初始化镜像

    • 读取镜像文件到内存也是_dyld_objc_notify_register传过来的第一个参数回调map_images

    • doInitialization->doModInitFunctions->libSystem_initializer..->_objc_init->_dyld_objc_notify_register,递归初始化,注册回调函数

  • notifySingle发送通知,调用回调函数load_images

  • 最后回到程序入口main函数

map_images

这个函数主要作用是读取镜像文件到内存中,

  • 第一次进来会创建表
  • 修复sel指针,从header中读取sel存到sels一个表中
  • 将cls加入到表中
  • protocol加入到表中,并重映射
  • 非懒加载类重新加入表中并初始化realizeClassWithoutSwift
    • 递归创建rw,supercls,metacls,并将创建的这些赋值给cls,建立父类与元类的关系
    • methodizeClass(Fixes up cls's method list, protocol list, and property list.)
      • 将方法列表、属性列表、协议列表从ro中写入到rw中
        • 通过attachLists分别添加method_list_t、property_list_t、protocol_list_t
void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}
void 
map_images_nolock(unsigned mhCount, const char * const mhPaths[],
                  const struct mach_header * const mhdrs[]){
        _read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
}

void _read_images(header_info **hList, uint32_t hCount, int totalClasses, int unoptimizedTotalClasses)
{
    if (!doneOnce){
        // 创建表
        // namedClasses
        // Preoptimized classes don't go in this table.
        // 4/3 is NXMapTable's load factor
        int namedClassesSize = 
            (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
        gdb_objc_realized_classes =
            NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);
    }
    // Fix up @selector references
    static size_t UnfixedSelectors;
    {
        // 修复sel指针
    }
    for (EACH_HEADER) {
        for (i = 0; i < count; i++) {
            Class cls = (Class)classlist[I];
            // readClass :将class加入到表中
            Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);

            if (newCls != cls  &&  newCls) {// 这里不会来到
                // Class was moved but not deleted. Currently this occurs 
                // only when the new class resolved a future class.
                // Non-lazily realize the class below.
                resolvedFutureClasses = (Class *)
                    realloc(resolvedFutureClasses, 
                            (resolvedFutureClassCount+1) * sizeof(Class));
                resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
            }
        }
    }

    // Category discovery MUST BE Late to avoid potential races
    // when other threads call the new category code before
    // this thread finishes its fixups.

    // +load handled by prepare_load_methods()

    // Realize non-lazy classes (for +load methods and static instances)
    for (EACH_HEADER) {
        classref_t const *classlist = hi->nlclslist(&count);
        for (i = 0; i < count; i++) {
            Class cls = remapClass(classlist[i]);
            if (!cls) continue;
            // 再加一次进入表中,防止遗漏
            addClassTableEntry(cls);

            // 这里是重点,初始化类,创建rw,设置data,supercls
            realizeClassWithoutSwift(cls, nil);
        }
    }
}

static Class realizeClassWithoutSwift(Class cls, Class previously)
{
    runtimeLock.assertLocked();

    class_rw_t *rw;
    Class supercls;
    Class metacls;

    if (!cls) return nil;

    auto ro = (const class_ro_t *)cls->data();
    auto isMeta = ro->flags & RO_META;
    if (ro->flags & RO_FUTURE) {// 不会来到
        // This was a future class. rw data is already allocated.
        rw = cls->data();
        ro = cls->data()->ro();
        ASSERT(!isMeta);
        cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
    } else {
        // Normal class. Allocate writeable class data.
        rw = objc::zalloc<class_rw_t>();
        rw->set_ro(ro);
        rw->flags = RW_REALIZED|RW_REALIZING|isMeta;
        cls->setData(rw);
    }

    supercls = realizeClassWithoutSwift(remapClass(cls->getSuperclass()), nil);
    metacls = realizeClassWithoutSwift(remapClass(cls->ISA()), nil);

// SUPPORT_NONPOINTER_ISA
#endif

    // Update superclass and metaclass in case of remapping
    cls->setSuperclass(supercls);
    cls->initClassIsa(metacls);

    // Reconcile instance variable offsets / layout.
    // This may reallocate class_ro_t, updating our ro variable.
    if (supercls  &&  !isMeta) reconcileInstanceVariables(cls, supercls, ro);

    // Set fastInstanceSize if it wasn't set already.
    cls->setInstanceSize(ro->instanceSize);

    if (supercls) {
        addSubclass(supercls, cls);
    } else {
        addRootClass(cls);
    }

    // Attach categories 这里是将ro内容赋值到rw中
    methodizeClass(cls, previously);

    return cls;
}

static void methodizeClass(Class cls, Class previously)
{
    runtimeLock.assertLocked();

    bool isMeta = cls->isMetaClass();
    auto rw = cls->data();
    auto ro = rw->ro();
    auto rwe = rw->ext();

    // Methodizing for the first time
    if (PrintConnecting) {
        _objc_inform("CLASS: methodizing class '%s' %s", 
                     cls->nameForLogging(), isMeta ? "(meta)" : "");
    }

    // Install methods and properties that the class implements itself.
    method_list_t *list = ro->baseMethods();
    if (list) {
        prepareMethodLists(cls, &list, 1, YES, isBundleClass(cls), nullptr);
        if (rwe) rwe->methods.attachLists(&list, 1);
    }

    property_list_t *proplist = ro->baseProperties;
    if (rwe && proplist) {
        rwe->properties.attachLists(&proplist, 1);
    }

    protocol_list_t *protolist = ro->baseProtocols;
    if (rwe && protolist) {
        rwe->protocols.attachLists(&protolist, 1);
    }

    // Root classes get bonus method implementations if they don't have 
    // them already. These apply before category replacements.
    if (cls->isRootMetaclass()) {
        // root metaclass
        addMethod(cls, @selector(initialize), (IMP)&objc_noop_imp, "", NO);
    }

    // Attach categories.
    if (previously) {
        if (isMeta) {
            objc::unattachedCategories.attachToClass(cls, previously,
                                                     ATTACH_METACLASS);
        } else {
            // When a class relocates, categories with class methods
            // may be registered on the class itself rather than on
            // the metaclass. Tell attachToClass to look for those.
            objc::unattachedCategories.attachToClass(cls, previously,
                                                     ATTACH_CLASS_AND_METACLASS);
        }
    }
    objc::unattachedCategories.attachToClass(cls, cls,
                                             isMeta ? ATTACH_METACLASS : ATTACH_CLASS);


}

load_images

读取镜像文件内容

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

void call_load_methods(void)
{
    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}
  • 准备加载oad方法
    • 递归schedule_class_load,先将有load方法的父类加入可执行loadable_list表中

    • 将有load方法的分类加入loadable_list表中

  • 先加载类的load方法,再加载分类的load方法

有load方法,那么initialize是什么时候调用呢

在之前我们讲过消息发送,会进入lookUpImpOrForward,在这个函数内调用realizeAndInitializeIfNeeded_locked-->initializeNonMetaClass这个方法,会先递归父类,然后再如果类有实现+initialize,那么就会调用。也就是说,类在第一次发送消息的时候,父类有实现的话,会优先调用,再调用子类的initialize方法。

static Class
realizeAndInitializeIfNeeded_locked(id inst, Class cls, bool initialize)
{
    runtimeLock.assertLocked();
    if (slowpath(!cls->isRealized())) {
        cls = realizeClassMaybeSwiftAndLeaveLocked(cls, runtimeLock);
        // runtimeLock may have been dropped but is now locked again
    }

    if (slowpath(initialize && !cls->isInitialized())) {
        cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
        // runtimeLock may have been dropped but is now locked again

        // If sel == initialize, class_initialize will send +initialize and
        // then the messenger will send +initialize again after this
        // procedure finishes. Of course, if this is not being called
        // from the messenger then it won't happen. 2778172
    }
    return cls;
}

void initializeNonMetaClass(Class cls)
{
    ASSERT(!cls->isMetaClass());

    Class supercls;
    bool reallyInitialize = NO;

    // Make sure super is done initializing BEFORE beginning to initialize cls.
    // See note about deadlock above.
    supercls = cls->getSuperclass();
    if (supercls  &&  !supercls->isInitialized()) {
        initializeNonMetaClass(supercls);
    }
    
// 省略了很多判断代码
    callInitialize(cls);
    lockAndFinishInitializing(cls, supercls);
    
}

拓展

如果有很多流程不知道怎么走,可以参考逻辑教育Cooci老师分享的可执行源码,这样我们可以打断点进行调试。dyld有检查环境变量,根据这个变量可以打印相关信息,对于我们程序优化有很大的帮助。在终端输入export OBJC_HELP=1可以打印

就拿这个OBJC_PRINT_LOAD_METHODS为例,这是打印所有的load方法,这样就会在程序运行的时候打印出,在以后的启动优化中,可以减少自己创建的类load方法调用,用initialize和dispatch_once替代。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值