Selector uniquing in the dyld shared cache

Reposted on November 19, 2015, 16:39:56

I ran into a few terms I did not fully understand while reading an English-language textbook, so I am reposting this article.

Mac OS X Snow Leopard cuts in half the launch-time overhead of starting the Objective-C runtime, and simultaneously saves a few hundred KB of memory per app. This comes for free to every app, courtesy of one of the few pieces of Mac OS X that lives below even the Objective-C runtime: dyld.

1. dyld and the shared cache

dyld is the dynamic loader and linker. When your process starts, dyld loads your executable and its shared libraries into memory, links the cross-library C function and variable references together, and starts execution on its way towards main().

In theory a shared library could be different every time your program is run. In practice, you get the same version of the shared libraries almost every time you run, and so does every other process on the system. The system takes advantage of this by building the dyld shared cache. The shared cache contains a copy of many system libraries, with most of dyld’s linking and loading work done in advance. Every process can then share that shared cache, saving memory and launch time.

(Incidentally, the shared cache beats the pants off the pre-Leopard prebinding system that was supposed to achieve the same optimizations. Remember the post-install “Optimizing System Performance” step that often took longer than the install itself? That was prebinding being updated. Rebuilding the shared cache is so blazingly fast that the installer doesn’t bother to report it anymore.)

2. Objective-C selector uniquing

Leopard’s dyld shared cache is great for C code, but it didn’t do anything to help Objective-C’s startup overhead. The single biggest launch cost for Objective-C is selector uniquing. The app and every shared library contain their own copies of selector names like “alloc” and “init”. The runtime needs to choose a single canonical SEL pointer value for each selector name, and then update the metadata for every call site and method list to use the blessed unique value. This means building a big hash table (memory), calling strcmp() a lot (time), and modifying copy-on-write metadata (more memory).
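
To make the cost concrete, here is a minimal sketch of selector uniquing as a toy model in Python. It is not the Objective-C runtime's actual code: the `SelectorTable` class, the `fix_up_image` function, and the numeric "addresses" are all hypothetical stand-ins. Each image carries its own copy of every selector name, and the table picks the first address it sees as the canonical one.

```python
"""Toy model of selector uniquing; not the real Objective-C runtime."""


class SelectorTable:
    """Maps each selector name to one canonical 'SEL' address (hypothetical)."""

    def __init__(self):
        self._canonical = {}  # selector name -> canonical address

    def unique(self, name, address):
        # The first address registered for a name becomes the blessed value.
        return self._canonical.setdefault(name, address)


def fix_up_image(table, selrefs):
    """Rewrite one image's selector references in place.

    selrefs is a list of (name, address) pairs, one per call site or
    method-list entry. Returns how many entries were modified, which
    stands in for dirtied copy-on-write metadata.
    """
    changed = 0
    for i, (name, address) in enumerate(selrefs):
        canonical = table.unique(name, address)
        if canonical != address:
            selrefs[i] = (name, canonical)
            changed += 1
    return changed


# Two made-up images, each with its own copies of "alloc" and "init".
app = [("alloc", 0x1000), ("init", 0x1008), ("alloc", 0x1000)]
lib = [("alloc", 0x2000), ("init", 0x2008), ("description", 0x2010)]

table = SelectorTable()
app_changed = fix_up_image(table, app)  # app is first, so its copies win
lib_changed = fix_up_image(table, lib)  # lib's "alloc" and "init" get rewritten
```

After both passes, every reference to a given name holds the same address, so comparing selectors reduces to comparing addresses. The dictionary stands in for the big hash table, and each rewritten entry stands in for a modified copy-on-write page.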

There are tens of thousands of unique selectors present in a typical process. If you run strings /usr/lib/libobjc.dylib on Leopard you can see the thirty-thousand-line built-in selector table that was a previous attempt to reduce the memory cost. Even so the cost goes up with every new class and method added to Cocoa.framework; left unchecked, an identical app would take longer to launch and use more memory after every OS upgrade.

The obvious solution? Do the work of selector uniquing in the dyld shared cache. Build a selector table into the shared cache itself, and update the selector references in the cached copy of the shared libraries. Then you save memory because every process shares the same selector table, and save time because the runtime does not need to rebuild it during every app launch. The runtime only needs to fix the selector references from the app itself. The catch? Selectors are too dynamic to be implemented as C symbols, so the shared cache construction tool needed to be taught how to read and write Objective-C’s metadata.
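
The split between work done once at cache-build time and work done at every launch can be sketched the same way. This is a toy model under stated assumptions, not the real `update_dyld_shared_cache` or runtime logic: the function names, library contents, and addresses are invented for illustration.

```python
"""Toy model: unique selectors once in the shared cache, then reuse per launch."""


def build_shared_cache(libraries):
    """Offline step: unique selectors across all cached libraries once.

    libraries maps a library name to its list of (selector, address) pairs.
    The references are rewritten in place; the shared table is returned.
    """
    table = {}
    for selrefs in libraries.values():
        for i, (name, address) in enumerate(selrefs):
            selrefs[i] = (name, table.setdefault(name, address))
    return table


def launch(shared_table, app_selrefs):
    """Per-launch step: only the app's own references need fixing.

    Returns the process-private table of selectors the cache did not know.
    """
    private = {}
    for i, (name, address) in enumerate(app_selrefs):
        if name in shared_table:
            canonical = shared_table[name]  # read-only and shared by all processes
        else:
            canonical = private.setdefault(name, address)
        app_selrefs[i] = (name, canonical)
    return private


# Made-up cached libraries and a made-up app.
libraries = {
    "Foundation": [("alloc", 0x2000), ("init", 0x2008)],
    "AppKit": [("alloc", 0x3000), ("drawRect:", 0x3008)],
}
shared_table = build_shared_cache(libraries)

app = [("alloc", 0x1000), ("myAppMethod", 0x1008)]
private_table = launch(shared_table, app)
```

In this model the per-process table holds only the selectors that are unique to the app, which is why both the launch-time work and the private memory shrink.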

3. Optimization WIN

Snow Leopard’s dyld shared cache uniques Objective-C selectors, and Snow Leopard’s Objective-C runtime recognizes when the selectors in a shared library are already uniqued courtesy of the shared cache. About half of the runtime’s initialization time is eliminated, making warm app launch several tenths of a second faster. Typical memory savings are 200-500 KB per process, adding up to a few megabytes system-wide. When this optimization ships on the iPhone OS side, it’s estimated to save 1 MB on a 128 MB device. The iPhone performance team would pay any number of arms and legs for that kind of gain.

You can watch the system in action with various debugging flags.

$ sudo /usr/bin/update_dyld_shared_cache -debug -verify
[…]
update_dyld_shared_cache: for x86_64, uniquing objc selectors
update_dyld_shared_cache: for x86_64, found 68761 unique objc selectors
update_dyld_shared_cache: for x86_64, 541736/590908 bytes (91%) used in libobjc unique selector section
update_dyld_shared_cache: for x86_64, updated 205230 selector references

$ OBJC_PRINT_PREOPTIMIZATION=YES /usr/bin/defaults
objc[424]: PREOPTIMIZATION: selector preoptimization ENABLED (version 3)
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /usr/lib/libobjc.A.dylib
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation

You can estimate the memory savings with the allmemory tool. Record post-launch memory usage of an app run with and without environment variable OBJC_DISABLE_PREOPTIMIZATION=YES. Look for the count of dirty pages; each dirty page is 4 KB eaten by that process. With 64-bit TextEdit I see the dirty page count jump from 725 to 1069 after disabling the optimization. This is an overestimate - many of those pages would have been not-dirty in Leopard because of the old built-in selector table - but it does show the magnitude of the win.
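
The arithmetic behind that TextEdit comparison is simple enough to write down. The helper name below is hypothetical; the page counts and the 4 KB page size are the figures quoted above.

```python
PAGE_SIZE_KB = 4  # each dirty page costs 4 KB, as stated above


def dirty_page_cost_kb(pages_enabled, pages_disabled):
    """Extra dirty memory, in KB, when the optimization is disabled."""
    return (pages_disabled - pages_enabled) * PAGE_SIZE_KB


extra_pages = 1069 - 725
savings_kb = dirty_page_cost_kb(725, 1069)
print(f"{extra_pages} extra dirty pages = {savings_kb} KB")
# prints "344 extra dirty pages = 1376 KB"
```

That figure is the upper bound already described as an overestimate, which is consistent with typical savings of 200-500 KB per process.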

The Objective-C runtime does more than just selector uniquing during launch. Future improvements to the dyld shared cache may precompute some of that other work, to further improve launch time, save memory, and reduce the cost of linking to Objective-C code that you don’t actually use. But selector uniquing as seen in Snow Leopard is by far the biggest bang for the buck.

Original page: http://www.sealiesoftware.com/blog
