Mac OS X Snow Leopard cuts in half the launch-time overhead of starting the Objective-C runtime, and simultaneously saves a few hundred KB of memory per app. This comes for free to every app, courtesy of one of the few pieces of Mac OS X that lives below even the Objective-C runtime: dyld.
dyld and the shared cache
dyld is the dynamic loader and linker. When your process starts, dyld loads your executable and its shared libraries into memory, links the cross-library C function and variable references together, and starts execution on its way towards main().
In theory a shared library could be different every time your program is run. In practice, you get the same version of the shared libraries almost every time you run, and so does every other process on the system. The system takes advantage of this by building the dyld shared cache. The shared cache contains a copy of many system libraries, with most of dyld’s linking and loading work done in advance. Every process can then share that shared cache, saving memory and launch time.
(Incidentally, the shared cache beats the pants off the pre-Leopard prebinding system that was supposed to achieve the same optimizations. Remember the post-install “Optimizing System Performance” step that often took longer than the install itself? That was prebinding being updated. Rebuilding the shared cache is so blazingly fast that the installer doesn’t bother to report it anymore.)
Objective-C selector uniquing
Leopard’s dyld shared cache was great for C code, but it did nothing to reduce Objective-C’s startup overhead. The single biggest launch cost for Objective-C is selector uniquing. The app and every shared library contain their own copies of selector names like “alloc” and “init”. The runtime needs to choose a single canonical SEL pointer value for each selector name, and then update the metadata for every call site and method list to use the blessed unique value. This means building a big hash table (memory), calling strcmp() a lot (time), and modifying copy-on-write metadata (more memory).
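To make the cost concrete, here is a minimal sketch of selector uniquing in C. It is not the runtime's actual implementation: the fixed-size open-addressing table, the FNV-1a hash, and the names sel_unique() and fix_selector_refs() are all assumptions made up for illustration. Selectors are modeled as plain C string pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size table; the real runtime's data structures differ. */
#define SEL_TABLE_SIZE 1024            /* must be a power of two */

static const char *sel_table[SEL_TABLE_SIZE];
static size_t sel_count;

/* FNV-1a string hash; any reasonable string hash would do here. */
static size_t hash_name(const char *s) {
    size_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical pointer for `name`, registering it on first sight. */
const char *sel_unique(const char *name) {
    size_t i = hash_name(name) & (SEL_TABLE_SIZE - 1);
    while (sel_table[i] != NULL) {
        if (strcmp(sel_table[i], name) == 0)
            return sel_table[i];           /* already uniqued */
        i = (i + 1) & (SEL_TABLE_SIZE - 1); /* linear probing */
    }
    /* Always leave one empty slot so the probe loop terminates. */
    assert(sel_count + 1 < SEL_TABLE_SIZE);
    sel_table[i] = name;
    sel_count++;
    return name;
}

/* Rewrite one image's selector references to the canonical pointers.
 * Returns how many references had to be modified; in a real process
 * each modification can dirty a copy-on-write page. */
size_t fix_selector_refs(const char **refs, size_t n) {
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        const char *canon = sel_unique(refs[i]);
        if (canon != refs[i]) {
            refs[i] = canon;
            changed++;
        }
    }
    return changed;
}
```

Calling fix_selector_refs() once per loaded image shows all three costs at small scale: the table itself, a strcmp() on every probe that lands on an occupied slot, and a write for every reference that was not already canonical.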
There are tens of thousands of unique selectors present in a typical process. If you run strings /usr/lib/libobjc.dylib on Leopard, you can see the thirty-thousand-line built-in selector table that was a previous attempt to reduce the memory cost. Even so, the cost goes up with every new class and method added to Cocoa.framework; left unchecked, an identical app would take longer to launch and use more memory after every OS upgrade.
The obvious solution? Do the work of selector uniquing in the dyld shared cache. Build a selector table into the shared cache itself, and update the selector references in the cached copy of the shared libraries. Then you save memory because every process shares the same selector table, and save time because the runtime does not need to rebuild it during every app launch. The runtime only needs to fix the selector references from the app itself. The catch? Selectors are too dynamic to be implemented as C symbols, so the shared cache construction tool needed to be taught how to read and write Objective-C’s metadata.
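The division of labor described above can be sketched in C as well. This is an illustration under stated assumptions, not the shared cache's real format: the five-entry sorted array stands in for the prebuilt table, and cached_sel_lookup(), app_sel_unique(), and the small per-process array are hypothetical names. The linear scan over app-only selectors is for brevity only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the table built into the shared cache: sorted in
 * strcmp() order, read-only, and shared by every process.
 * The contents are illustrative. */
static const char *const prebuilt_sels[] = {
    "alloc", "dealloc", "init", "release", "retain"
};
#define PREBUILT_COUNT (sizeof prebuilt_sels / sizeof prebuilt_sels[0])

/* Small per-process table for selectors that only the app defines. */
#define APP_SEL_MAX 64
static const char *app_sels[APP_SEL_MAX];
static size_t app_sel_count;

/* bsearch() comparator: key is a C string, elem points at a table slot. */
static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Look a name up in the prebuilt table without modifying anything. */
const char *cached_sel_lookup(const char *name) {
    const char *const *hit = bsearch(name, prebuilt_sels, PREBUILT_COUNT,
                                     sizeof prebuilt_sels[0], cmp_name);
    return hit ? *hit : NULL;
}

/* Unique a selector referenced by the app: prefer the shared table,
 * and fall back to the per-process table only for app-only names. */
const char *app_sel_unique(const char *name) {
    const char *canon = cached_sel_lookup(name);
    if (canon != NULL)
        return canon;
    for (size_t i = 0; i < app_sel_count; i++) {
        if (strcmp(app_sels[i], name) == 0)
            return app_sels[i];
    }
    assert(app_sel_count < APP_SEL_MAX);
    app_sels[app_sel_count++] = name;
    return name;
}
```

In this model the libraries' references never pass through app_sel_unique() at launch, because they were already rewritten when the cache was built. Only the app's own references are fixed up, and only app-only selectors cost per-process memory.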
Snow Leopard’s dyld shared cache uniques Objective-C selectors, and Snow Leopard’s Objective-C runtime recognizes when the selectors in a shared library are already uniqued courtesy of the shared cache. About half of the runtime’s initialization time is eliminated, making warm app launch several tenths of a second faster. Typical memory savings are 200-500 KB per process, adding up to a few megabytes system-wide. When this optimization ships on the iPhone OS side, it’s estimated to save 1 MB on a 128 MB device. The iPhone performance team would pay any number of arms and legs for that kind of gain.
You can watch the system in action with various debugging flags.
$ sudo /usr/bin/update_dyld_shared_cache -debug -verify
update_dyld_shared_cache: for x86_64, uniquing objc selectors
update_dyld_shared_cache: for x86_64, found 68761 unique objc selectors
update_dyld_shared_cache: for x86_64, 541736/590908 bytes (91%) used in libobjc unique selector section
update_dyld_shared_cache: for x86_64, updated 205230 selector references
$ OBJC_PRINT_PREOPTIMIZATION=YES /usr/bin/defaults
objc: PREOPTIMIZATION: selector preoptimization ENABLED (version 3)
objc: PREOPTIMIZATION: honoring preoptimized selectors in /usr/lib/libobjc.A.dylib
objc: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
objc: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
objc: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
You can estimate the memory savings with the allmemory tool. Record the post-launch memory usage of an app run with and without the environment variable OBJC_DISABLE_PREOPTIMIZATION=YES. Look for the count of dirty pages; each dirty page is 4 KB eaten by that process. With 64-bit TextEdit I see the dirty page count jump from 725 to 1069 after disabling the optimization, a difference of 344 pages or about 1.3 MB. This is an overestimate (many of those pages would have been not-dirty in Leopard because of the old built-in selector table), but it does show the magnitude of the win.
The Objective-C runtime does more than just selector uniquing during launch. Future improvements to the dyld shared cache may precompute some of that other work, to further improve launch time, save memory, and reduce the cost of linking to Objective-C code that you don’t actually use. But selector uniquing as seen in Snow Leopard is by far the biggest bang for the buck.