Welcome back to another Friday Q&A. I apologize for the unannounced hiatus in posts. It's not due to anything interesting, just a shortage of time. Friday Q&A will continue, and I will continue to aim for my regular biweekly postings. For today's article, I have a little story about an autorelease call that didn't do what it was supposed to do.
The Setup
ARC is a lovely technology but it doesn't cover everything. Sometimes you need to use CoreFoundation objects and you're back in the world of manual memory management.
Normally, that's no problem. I did manual memory management for many years, and while I enjoy not doing it with ARC, I still remember how. However, ARC makes some things a bit more difficult than they used to be. In particular, sometimes you want to autorelease a CoreFoundation object. Without ARC, you might write something like this:
CFDictionaryRef MakeDictionary(void) {
    CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    // Put some stuff in the dictionary here perhaps
    [(id)dict autorelease];
    return dict;
}
This gives you nice memory management semantics, where the caller is not responsible for releasing the return value, just like we're used to with most Cocoa methods. It takes advantage of the fact that all CoreFoundation objects are also Objective-C objects, and an autorelease is a way to balance a CoreFoundation Create call.
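To make the ownership rules concrete, here is a toy model in C of a Create call balanced by an autorelease. It is a sketch under simplifying assumptions, not how CoreFoundation or the Objective-C runtime actually implement reference counting. Object, Pool, object_create, object_autorelease, pool_drain, and make_object are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted CF/ObjC object (hypothetical). */
typedef struct {
    int refcount;
    bool *destroyed; /* set to true when the object is freed */
} Object;

/* Toy autorelease pool: a fixed-size list of deferred releases. */
#define POOL_CAPACITY 16
typedef struct {
    Object *pending[POOL_CAPACITY];
    int count;
} Pool;

/* "Create" rule: the caller receives one owned reference. */
static Object *object_create(bool *destroyed) {
    Object *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->refcount = 1;
    obj->destroyed = destroyed;
    *destroyed = false;
    return obj;
}

static void object_release(Object *obj) {
    if (--obj->refcount == 0) {
        *obj->destroyed = true;
        free(obj);
    }
}

/* autorelease: hand the owned reference to the pool; no release yet. */
static Object *object_autorelease(Pool *pool, Object *obj) {
    assert(pool->count < POOL_CAPACITY);
    pool->pending[pool->count++] = obj;
    return obj;
}

/* Draining the pool performs every deferred release. */
static void pool_drain(Pool *pool) {
    for (int i = 0; i < pool->count; i++)
        object_release(pool->pending[i]);
    pool->count = 0;
}

/* Analogue of MakeDictionary: create, then autorelease, so the caller
   gets a usable object that it does not have to release. */
static Object *make_object(Pool *pool, bool *destroyed) {
    return object_autorelease(pool, object_create(destroyed));
}
```

Calling make_object returns an object whose single reference is owned by the pool. The object stays alive until pool_drain runs, and then the deferred release destroys it.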
This code no longer works with ARC, because the call to autorelease is not permitted. To solve this, Apple helpfully provided us with a CFAutorelease function which does the same thing and can be used with ARC. Unfortunately, it's only available as of iOS 7 and Mac OS X 10.9. For those of us who need to support older OS releases, we have to improvise.
My solution was to get the selector for autorelease using the sel_getUid runtime call, which sneaks past ARC's rules. Then I'd look up the implementation of that method on the CoreFoundation object's class and call it directly, thus accomplishing the same thing as [(id)dict autorelease]. My code looked like this:
CFDictionaryRef MakeDictionary(void) {
    CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    // Put some stuff in the dictionary here perhaps
    SEL autorelease = sel_getUid("autorelease");
    IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
    ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);
    return dict;
}
Note: if you ended up here because you need code to accomplish this, do not use this code. As we will see shortly, it's broken. If you want working code for this, see the section titled The Fix below.
I tested this code and everything worked fine. A little later, another programmer on the project reported that it was consistently crashing for him. Fortunately, I was able to replicate his crash without too much difficulty. However, it took a while to figure out just what was going on.
The Crash
This code does not crash itself. However, it can cause a crash in subsequent code. For example:
CFDictionaryRef dict = MakeDictionary();
NSLog(@"Testing.");
NSLog(@"%@", dict);
This crashes on the second NSLog line. The stack trace looks like a typical memory management crash:
frame #0: 0x00007fff917980a3 libobjc.A.dylib`objc_msgSend + 35
frame #1: 0x00007fff97175184 Foundation`_NSDescriptionWithLocaleFunc + 41
frame #2: 0x00007fff9077bd94 CoreFoundation`__CFStringAppendFormatCore + 7332
frame #3: 0x00007fff907aa313 CoreFoundation`_CFStringCreateWithFormatAndArgumentsAux + 115
frame #4: 0x00007fff907e1b9b CoreFoundation`_CFLogvEx + 123
frame #5: 0x00007fff9719ed0c Foundation`NSLogv + 79
frame #6: 0x00007fff9719ec98 Foundation`NSLog + 148
It seems that the dictionary is being destroyed before the second NSLog call. But how can that be? We called autorelease in the function, and the autorelease pool has not yet been drained. The release that will balance the CoreFoundation Create call hasn't happened yet, so the object should still exist.
The Assembly
After poking at the code in various ways, I decided to read the assembly code generated by the compiler. There wasn't much to the code I wrote, so whatever problem there was must have been deeper.
Here's the x86-64 assembly output for the broken MakeDictionary function:
_MakeDictionary: ## @MakeDictionary
.cfi_startproc
Lfunc_begin0:
.loc 1 11 0 ## test.m:11:0
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
subq $32, %rsp
movabsq $0, %rax
.loc 1 12 0 prologue_end ## test.m:12:0
Ltmp5:
movq %rax, %rdi
movq %rax, %rsi
movq %rax, %rdx
movq %rax, %rcx
callq _CFDictionaryCreateMutable
leaq L_.str(%rip), %rdi
movq %rax, -8(%rbp)
.loc 1 15 0 ## test.m:15:0
callq _sel_getUid
movq %rax, -16(%rbp)
.loc 1 16 0 ## test.m:16:0
movq -8(%rbp), %rax
movq %rax, %rdi
callq _object_getClass
movq -16(%rbp), %rsi
movq %rax, %rdi
callq _class_getMethodImplementation
movq %rax, -24(%rbp)
.loc 1 17 0 ## test.m:17:0
movq -24(%rbp), %rax
movq -8(%rbp), %rcx
movq -16(%rbp), %rsi
movq %rcx, %rdi
callq *%rax
movq %rax, %rdi
callq _objc_retainAutoreleasedReturnValue
movq %rax, %rdi
callq _objc_release
.loc 1 19 0 ## test.m:19:0
movq -8(%rbp), %rax
addq $32, %rsp
popq %rbp
ret
Pretty straightforward here. Since no real calculations are done, we can just look at the sequence of callq instructions to see what functions are called. It calls CFDictionaryCreateMutable, sel_getUid, object_getClass, and class_getMethodImplementation. Then there's an indirect call through the function pointer, which is where it actually makes the autorelease call. ARC then hops in and does some pointless but harmless work on the return value from the call: it retains it with objc_retainAutoreleasedReturnValue and then immediately releases it with objc_release. The function then returns the dictionary to the caller.
Mostly Harmless
It took me a little while to realize what was going on, but then it was obvious. I said that the ARC calls inserted are "pointless but harmless." In fact, they are anything but!
One of the interesting features that came with ARC is fast handling of autoreleased return values. This sort of pattern is extremely common with ARC:
// callee
obj = [[SomeClass alloc] init];
[obj setup];
return [obj autorelease];
// caller
obj = [[self method] retain];
[obj doStuff];
[obj release];
A human programmer would typically omit the retain and release calls in the caller, but ARC is more paranoid. Those extra calls would make ARC code a bit slower, which is where the fast autorelease handling comes in.
There is some extremely fancy and mind-bending code in the Objective-C runtime's implementation of autorelease. Before actually autoreleasing the object, it first inspects the caller's code. If it sees that the caller is going to immediately call objc_retainAutoreleasedReturnValue, it skips the autorelease completely. Instead, it stashes the object in a known location, which signals that the autorelease was skipped.
objc_retainAutoreleasedReturnValue cooperates in this scheme. Before calling retain, it first checks that known location. If it contains the right object, it skips the retain. The net result is that the above code is effectively transformed into this:
// callee
obj = [[SomeClass alloc] init];
[obj setup];
return obj;
// caller
obj = [self method];
[obj doStuff];
[obj release];
This is faster because it skips the autorelease pool entirely. That saves three message sends and the accompanying work: the autorelease, the caller's retain, and the eventual release sent by the autorelease pool. It also allows the object to be destroyed earlier, reducing memory and cache pressure.
The beautiful thing about this technique is that because the runtime checks the caller's code before making this optimization, everything is perfectly compatible with code that doesn't participate in the scheme. If the caller does something else with the return value, then the runtime simply calls autorelease and everything works normally.
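The handshake can be modeled in a few lines of C. In this sketch, toy_autorelease_return plays the role of the runtime's autorelease and toy_retain_autoreleased_return plays the role of objc_retainAutoreleasedReturnValue. The following are assumptions made for illustration: both function names, the caller_accepts flag (which stands in for the runtime's inspection of the caller's machine code), and the single global slot (the real runtime keeps this per thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical). */
typedef struct {
    int refcount;
    bool *destroyed; /* set to true when the object is freed */
} Object;

/* Toy autorelease pool holding deferred releases. */
#define POOL_CAPACITY 16
typedef struct {
    Object *pending[POOL_CAPACITY];
    int count;
} Pool;

/* The "known location". The real runtime keeps this per thread. */
static Object *stashed_return_value = NULL;

static Object *object_create(bool *destroyed) {
    Object *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->refcount = 1;
    obj->destroyed = destroyed;
    *destroyed = false;
    return obj;
}

static void object_retain(Object *obj) { obj->refcount++; }

static void object_release(Object *obj) {
    if (--obj->refcount == 0) {
        *obj->destroyed = true;
        free(obj);
    }
}

static void pool_drain(Pool *pool) {
    for (int i = 0; i < pool->count; i++)
        object_release(pool->pending[i]);
    pool->count = 0;
}

/* Callee side: caller_accepts stands in for inspecting the caller's code. */
static Object *toy_autorelease_return(Pool *pool, Object *obj,
                                      bool caller_accepts) {
    if (caller_accepts) {
        stashed_return_value = obj; /* skip the pool entirely */
        return obj;
    }
    assert(pool->count < POOL_CAPACITY);
    pool->pending[pool->count++] = obj; /* ordinary autorelease */
    return obj;
}

/* Caller side: skip the retain if the callee stashed this very object. */
static Object *toy_retain_autoreleased_return(Object *obj) {
    if (stashed_return_value == obj) {
        stashed_return_value = NULL; /* ownership handed straight over */
        return obj;
    }
    object_retain(obj);
    return obj;
}
```

On the fast path, nothing touches the pool and the retain count never rises above 1, so the caller's final release destroys the object immediately. On the slow path, the object is both in the pool and retained, exactly as the unoptimized code would leave it.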
So this code is not pointless after all. What, then, is the point of the retain immediately followed by release in the assembly above? It allows the caller to participate in this scheme even though it's not using the return value. It would be correct to simply omit them, but in that case the fast autorelease path would be lost. It ends up being faster to make these two extra calls, at least in the common case.
Nor is this code harmless. The harm here is exactly that fast autorelease path. To ARC, an autorelease in a function or method followed by a retain in the caller is just a way to pass ownership around. However, that's not what's going on in this code. This code is trying to put the object into the autorelease pool no matter what. ARC's clever optimization bypasses that attempt. The runtime skips the autorelease, objc_retainAutoreleasedReturnValue skips its retain, and the objc_release that follows ends up balancing the original Create call. As a result, the dictionary is destroyed immediately instead of being placed in the autorelease pool for later destruction.
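Here is the same failure in miniature, as a self-contained C toy rather than real runtime code. make_dictionary mimics the broken function: it asks for an autorelease and ignores the result, then performs the retain/release pair that ARC inserted. The fast_path flag is a hypothetical stand-in for whether the runtime recognizes that pair, and all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy reference-counted object and pool (hypothetical). */
typedef struct {
    int refcount;
    bool *destroyed; /* set to true when the object is freed */
} Object;

#define POOL_CAPACITY 16
typedef struct {
    Object *pending[POOL_CAPACITY];
    int count;
} Pool;

static Object *stashed = NULL; /* the runtime's "known location" */

static void object_release(Object *obj) {
    if (--obj->refcount == 0) {
        *obj->destroyed = true;
        free(obj);
    }
}

static void pool_drain(Pool *pool) {
    for (int i = 0; i < pool->count; i++)
        object_release(pool->pending[i]);
    pool->count = 0;
}

/* autorelease, with or without the fast path. */
static Object *toy_autorelease(Pool *pool, Object *obj, bool fast_path) {
    if (fast_path) {
        stashed = obj; /* pool skipped: no later release is scheduled */
    } else {
        assert(pool->count < POOL_CAPACITY);
        pool->pending[pool->count++] = obj;
    }
    return obj;
}

static Object *toy_retain_autoreleased(Object *obj) {
    if (stashed == obj)
        stashed = NULL; /* retain skipped */
    else
        obj->refcount++;
    return obj;
}

/* Mimics the broken MakeDictionary. The real function returns dict,
   which is already dangling when the fast path was taken. */
static void make_dictionary(Pool *pool, bool *destroyed, bool fast_path) {
    Object *dict = malloc(sizeof *dict);
    assert(dict != NULL);
    dict->refcount = 1; /* the Create call's reference */
    dict->destroyed = destroyed;
    *destroyed = false;

    /* What the source asked for: put dict in the pool. */
    Object *result = toy_autorelease(pool, dict, fast_path);
    /* What ARC added because the cast said the call returns id. */
    object_release(toy_retain_autoreleased(result));
}
```

When fast_path is set, the dictionary is freed before make_dictionary returns, which is what the second NSLog tripped over. When it is clear, the object survives until the pool drains.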
Root Cause
It all comes down to the function pointer cast used when making the autorelease call:
((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);
I wrote it like this because that's what the type is. The autorelease method returns id and takes two (normally implicit) parameters: self and the selector being sent. I changed the self parameter to CFTypeRef instead of id for convenience, but left the return type as id since that's what it really is in the underlying autorelease method. It shouldn't matter, since the return value is ignored anyway.
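That return type is this code's downfall. I was careful to avoid ARC's meddling for the most part, but that id return type tells ARC that the call produces a retainable object. ARC therefore inserts the objc_retainAutoreleasedReturnValue and objc_release calls seen in the assembly. Those calls enable the fast autorelease path, and that causes the dictionary to be immediately destroyed.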
The Fix
Once all of this is known, the fix is easy. Get ARC out of the picture by having the call return CFTypeRef instead of id. Here's the complete function with the fix:
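#import <CoreFoundation/CoreFoundation.h>
#import <objc/runtime.h>

CFDictionaryRef MakeDictionary(void) {
    CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    // Put some stuff in the dictionary here perhaps
    SEL autorelease = sel_getUid("autorelease");
    IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
    // Declaring the return type as CFTypeRef rather than id keeps ARC from
    // inserting objc_retainAutoreleasedReturnValue/objc_release after this call.
    ((CFTypeRef (*)(CFTypeRef, SEL))imp)(dict, autorelease);
    return dict;
}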
Dumping the assembly shows that ARC is now out of the picture:
_MakeDictionary: ## @MakeDictionary
.cfi_startproc
Lfunc_begin0:
.loc 1 11 0 ## test.m:11:0
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
subq $32, %rsp
movabsq $0, %rax
.loc 1 12 0 prologue_end ## test.m:12:0
Ltmp5:
movq %rax, %rdi
movq %rax, %rsi
movq %rax, %rdx
movq %rax, %rcx
callq _CFDictionaryCreateMutable
leaq L_.str(%rip), %rdi
movq %rax, -8(%rbp)
.loc 1 15 0 ## test.m:15:0
callq _sel_getUid
movq %rax, -16(%rbp)
.loc 1 16 0 ## test.m:16:0
movq -8(%rbp), %rax
movq %rax, %rdi
callq _object_getClass
movq -16(%rbp), %rsi
movq %rax, %rdi
callq _class_getMethodImplementation
movq %rax, -24(%rbp)
.loc 1 17 0 ## test.m:17:0
movq -24(%rbp), %rax
movq -8(%rbp), %rcx
movq -16(%rbp), %rsi
movq %rcx, %rdi
callq *%rax
.loc 1 19 0 ## test.m:19:0
movq -8(%rbp), %rcx
movq %rax, -32(%rbp) ## 8-byte Spill
movq %rcx, %rax
addq $32, %rsp
popq %rbp
ret
Architectures
One question remains: why did this code work for me initially, while my colleague only uncovered the crash later?
The answer is actually pretty simple, once everything else is known. This is an iOS project. I tested the code in the simulator, while he tried it on a real iPhone. The runtime function that performs the fast autorelease check is called callerAcceptsFastAutorelease. It's architecture-specific since it's inspecting machine code. If you look at the version used in the 32-bit iOS simulator, the problem becomes apparent:
# elif __i386__ && TARGET_IPHONE_SIMULATOR
static bool callerAcceptsFastAutorelease(const void *ra)
{
    return false;
}
In short, the fast autorelease handling is not implemented for the 32-bit iOS simulator. It makes sense that it wouldn't be, since it would take a non-trivial amount of effort to implement and debug. Meanwhile, ARC is not supported on i386 for Mac programs, so the only way to hit this path on i386 is to run in the simulator. There's no real point in putting effort into extreme optimizations that will only apply to simulator apps.
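To give a feel for what inspecting machine code means here, the following C sketch checks a byte buffer for the x86-64 sequence seen in the assembly above. That sequence is movq %rax, %rdi (bytes 48 89 c7) followed by a callq with a 32-bit relative displacement (e8 plus four bytes) to an expected target. This is a simplified model, not the runtime's code. The function name and the Arch enum are hypothetical. My understanding is that the real x86-64 check goes further: it follows the call into a symbol stub and compares the stub's jump pointer against objc_retainAutoreleasedReturnValue. The i386 simulator branch mirrors the excerpt above by always returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical architecture selector for this toy. */
typedef enum { ARCH_X86_64, ARCH_I386_SIMULATOR } Arch;

/* Reads a little-endian 32-bit value and sign-extends it. */
static int64_t read_rel32(const uint8_t *p) {
    int64_t v = (int64_t)p[0] | ((int64_t)p[1] << 8) |
                ((int64_t)p[2] << 16) | ((int64_t)p[3] << 24);
    if (v >= 0x80000000LL)
        v -= 0x100000000LL;
    return v;
}

/* code: machine code buffer; ra: offset of the return address in it;
   target: offset of the function the caller must call next. */
static bool toy_caller_accepts_fast_autorelease(Arch arch, const uint8_t *code,
                                                size_t code_len, size_t ra,
                                                size_t target) {
    static const uint8_t mov_rax_rdi[3] = { 0x48, 0x89, 0xc7 };

    if (arch == ARCH_I386_SIMULATOR)
        return false;               /* fast path never taken */
    if (code_len < 8 || ra > code_len - 8)
        return false;               /* need 3 + 5 bytes after ra */
    if (memcmp(code + ra, mov_rax_rdi, sizeof mov_rax_rdi) != 0)
        return false;               /* not movq %rax, %rdi */
    if (code[ra + 3] != 0xe8)
        return false;               /* not callq rel32 */

    /* rel32 is relative to the end of the 5-byte call instruction. */
    int64_t next = (int64_t)ra + 8;
    int64_t dest = next + read_rel32(code + ra + 4);
    return dest == (int64_t)target;
}
```

Because the displacement is measured from the end of the 5-byte call, the call's destination is the return address plus 8 plus the displacement.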
Aside
Before writing this article, I first wrote a small test case so I could easily experiment and examine the problem in isolation. However, there was a big problem: the test case didn't work! Or rather, it did work just fine, and refused to crash. The code was really simple, roughly:
int main(int argc, char **argv)
{
    @autoreleasepool {
        CFDictionaryRef dict = MakeDictionary();
        NSLog(@"Testing.");
        NSLog(@"%@", dict);
    }
    return 0;
}
There isn't much room for error there, so it was baffling why it wouldn't crash.
After many single-steps through assembly in the debugger, I realized that it had to do with dyld lazy binding. References to external functions aren't fully bound when a program is initially loaded. Instead, a stub is generated which has enough information to complete the binding the first time the call is made. On the first call to an external function, the address for that function is looked up, the stub is rewritten to point to it, and then the function call is made. Subsequent calls go directly to the function. By binding lazily, program startup time is improved and time isn't wasted looking up functions that are never called.
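Lazy binding can be modeled with a function pointer that initially points at a resolver. In this hypothetical C sketch, twice stands in for an external function, twice_stub_pointer for the stub's lazily bound pointer, and lazy_resolver for dyld's binder. The real mechanism lives in dyld and in the binary's stub and lazy-pointer sections rather than in a C variable, so treat this as an illustration only.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*IntFn)(int);

static int resolver_calls = 0; /* how many times binding happened */

/* The "external" function that the resolver finds on first use. */
static int twice(int x) { return 2 * x; }

static int lazy_resolver(int x);

/* The stub's pointer starts out aimed at the resolver, i.e. unbound. */
static IntFn twice_stub_pointer = lazy_resolver;

/* First call: look up the target, patch the pointer, then forward. */
static int lazy_resolver(int x) {
    resolver_calls++;
    twice_stub_pointer = twice; /* "bind" the symbol */
    return twice_stub_pointer(x);
}

/* Every call site goes through the stub's pointer. */
static int call_twice(int x) { return twice_stub_pointer(x); }

/* A check that compares the stub's pointer with the real function. */
static bool is_bound(void) { return twice_stub_pointer == twice; }
```

A check like is_bound fails before the first call and succeeds afterward. This mirrors why the first run of the test program didn't crash.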
That means that on the very first run of this code, the call to objc_retainAutoreleasedReturnValue isn't fully bound. Because it's not fully bound, callerAcceptsFastAutorelease doesn't realize that the call is to objc_retainAutoreleasedReturnValue. Because it doesn't see the call to objc_retainAutoreleasedReturnValue, the fast autorelease path isn't used. The dictionary goes into the autorelease pool as was originally intended, and the code works... once.
Once I figured that out, it was trivial to force the crash by inserting a loop:
int main(int argc, char **argv)
{
    while(1) @autoreleasepool {
        CFDictionaryRef dict = MakeDictionary();
        NSLog(@"Testing.");
        NSLog(@"%@", dict);
    }
    return 0;
}
The loop reliably crashes on the second iteration. The first time through triggers lazy binding of objc_retainAutoreleasedReturnValue, which then allows the next call to take the fast autorelease path and trigger the bug.
This has little consequence for normal programs, which will perform the lazy binding for functions like these early on. It ended up being a severe complicating factor for a small test program, though.
Conclusion
ARC is great technology, but sometimes it's necessary to work around it. When working around it, you have to be sure you really work around it, and not give it any opportunity to jump in. If you do, it might decide to eliminate what looks like a useless autorelease call, causing your objects to be instantaneously destroyed instead of being peacefully returned to the caller.