Table of Contents
- objc_msgSend() Tour Part 1: The Road Map
- objc_msgSend() Tour Part 2: Setting the Stage
- objc_msgSend() Tour Part 3: The Fast Path
- objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends
In the first three parts, I gave an overview, explained a bit of the ABI used by Objective-C, and took a near instruction by instruction tour of what happens on the fast path of Objective-C method dispatch. By fast path, I mean what happens 99.9% of the time; a very fast, no overhead, no function call, no locking, set of instructions that grabs the method implementation from the cache and does a tail-call jump to that implementation.
The slow path is used rarely: as little as once per unique selector invoked per class, with the goal of filling the cache so that the slow path is never used again for that selector/class combination. A handful of operations will cause a class’s cache to be flushed: method swizzling, category loading, and the like.
Note that during +initialize, methods won’t always be cached. Yet another reason to not do any real work during +initialize!
In part 3, the cache lookup loop contained a NULL check and, if NULL was encountered in the cache, then the code jumped to a cache miss label. It looked like this (with the original source interleaved):
// movq buckets(%a5, %a4), %r11 // method = cache->buckets[bytes/8]
0x0000512d movq 0x10(%r8,%rcx),%r11
// testq %r11, %r11 // if (method == NULL)
0x00005132 testq %r11,%r11
// je LCacheMiss$1 // goto cacheMissLabel
0x00005135 je 0x0000515c
Note that the disassembly shows that the cache miss label is located at address 0x0000515c. Not at all coincidentally, that is exactly where this particular post’s tour starts. Namely, what happens when the cache lookup misses and the cache must be filled.
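To make the miss-then-fill behavior concrete, here is a minimal C sketch of a lookup that tries a small cache first and only calls a slow lookup on a miss. Everything in it is hypothetical (the `cache_` names, the table size, the hash); it is not the runtime's data structure. It also compares selectors with `strcmp` for simplicity, whereas the real messenger compares selector pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cache_imp)(void);            /* stand-in for IMP */

struct cache_bucket { const char *sel; cache_imp imp; };

#define CACHE_SIZE 8                       /* power of two, so masking works */
static struct cache_bucket cache_table[CACHE_SIZE];
static int cache_slow_calls;               /* counts trips through the slow path */

static int cache_answer(void) { return 42; }

/* Hypothetical slow path: pretend to search the class's method lists. */
static cache_imp cache_slow_lookup(const char *sel) {
    (void)sel;
    cache_slow_calls++;
    return cache_answer;
}

static unsigned cache_hash(const char *sel) {
    unsigned h = 0;
    while (*sel) h = h * 31u + (unsigned char)*sel++;
    return h & (CACHE_SIZE - 1);
}

/* Fast path first; on a miss, call the slow path and fill the cache. */
static cache_imp cache_lookup(const char *sel) {
    assert(sel != NULL);
    unsigned start = cache_hash(sel);
    for (unsigned n = 0; n < CACHE_SIZE; n++) {
        struct cache_bucket *b = &cache_table[(start + n) & (CACHE_SIZE - 1)];
        if (b->sel == NULL) {              /* the NULL check: cache miss */
            b->sel = sel;
            b->imp = cache_slow_lookup(sel);
            return b->imp;
        }
        if (strcmp(b->sel, sel) == 0)      /* cache hit */
            return b->imp;
    }
    return cache_slow_lookup(sel);         /* table full: resolve without caching */
}
```

Calling `cache_lookup("description")` twice goes through `cache_slow_lookup` only on the first call; the second call is answered from the table.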
When I referred to this code path as the slow path, I wasn’t kidding! This is the one spot where the messenger actually makes a call into a C function which is then responsible for traipsing about the runtime metadata to resolve the method and fill the cache. Beyond that, the C function, _class_lookupMethodAndLoadCache(), is also responsible for ensuring that any +initialize methods of the class (and superclasses) are invoked prior to the method itself being invoked. This also implies that objc_msgSend() must effectively be recursively safe across this particular call site.
And that requirement leads to the need to preserve all of the various registers and push a stack frame for the purposes of making the call. This is actually considerably more involved than a normal call site because the runtime is effectively hijacking the method invocation call to call something totally alien to the original method’s implementation!
0x0000515c pushq %rbp
0x0000515d movq %rsp,%rbp
0x00005160 subq $0x000000c0,%rsp
The above saves the caller’s frame pointer, copies the current stack pointer into %rbp, and then bumps the stack pointer down by 0xc0 (192) bytes. This is to make room for…
0x00005167 movdqa %xmm0,0xffffff40(%rbp)
0x0000516f movdqa %xmm1,0xffffff50(%rbp)
0x00005177 movdqa %xmm2,0xffffff60(%rbp)
0x0000517f movdqa %xmm3,0xffffff70(%rbp)
0x00005187 movdqa %xmm4,0x80(%rbp)
0x0000518c movdqa %xmm5,0x90(%rbp)
0x00005191 movdqa %xmm6,0xa0(%rbp)
0x00005196 movdqa %xmm7,0xb0(%rbp)
0x0000519b movq %r10,0xc0(%rbp)
0x0000519f movq %rax,0xc8(%rbp)
0x000051a3 movq %rdi,0xd0(%rbp)
0x000051a7 movq %rsi,0xd8(%rbp)
0x000051ab movq %rdx,0xe0(%rbp)
… all of the registers that need to be pushed onto the stack. Note that %r10 doesn’t actually have to be pushed; it is a scratch register. Someday, %r10 might disappear from that instruction stream.
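The 0xc0 figure can be checked with a little arithmetic. The sketch below rests on two assumptions of mine, not statements from the listing. First, the short offsets such as 0x80(%rbp) are 8-bit displacements that the disassembler printed unsigned, so they are really negative offsets from %rbp, just like the 0xffffff40 style ones. Second, eight 8-byte integer slots are used, matching the eight slots (0xc0 through 0xf8) that the restore sequence reads back. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a displacement as printed by the disassembler.
   bits is the width of the encoded displacement (8 or 32 here). */
static int64_t sign_extend(uint64_t raw, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    uint64_t sign = (uint64_t)1 << (bits - 1);
    raw &= (sign << 1) - 1;                 /* keep only the low `bits` bits */
    return (int64_t)(raw ^ sign) - (int64_t)sign;
}

/* Bytes needed to spill 16-byte XMM registers and 8-byte integer registers. */
static int spill_bytes(int xmm_regs, int int_regs) {
    return xmm_regs * 16 + int_regs * 8;
}
```

Under those assumptions, `sign_extend(0xffffff40u, 32)` is -192, so %xmm0 lands at the very bottom of the new frame, `sign_extend(0xf8, 8)` is -8, so the last integer slot sits just below the saved %rbp, and `spill_bytes(8, 8)` comes out to 192, which is the 0xc0 subtracted from %rsp.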
0x000051af movq (%rdi),%rdi
0x000051b2 movq %rsi,%rsi
0x000051b5 callq __class_lookupMethodAndLoadCache
0x000051ba movq %rax,%r11
Now we get to the actual function call itself.
IMP _class_lookupMethodAndLoadCache(Class cls, SEL sel)
You can find the declaration and implementation in objc-class.m.
If you had read Greg’s “So You Crashed in Objc_MsgSend”, you would know that %rdi contains the first argument and %rsi contains the second argument when calling a function.
The movq (%rdi),%rdi instruction dereferences the contents of %rdi and shoves the result into %rdi. Effectively, it grabs the isa of the targeted object and passes it as the first parameter to the function. In the case of an instance, this will be the class of the instance. In the case of a class, it will be the metaclass of the class.
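In C terms, that single instruction is a load of the pointer stored at offset 0 of the receiver. The sketch below uses made-up struct layouts (`isa_class`, `isa_object`) whose only relevant property is that the isa pointer is the first field; the metaclass's own isa is left NULL purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: the only property relied on is that isa comes first. */
struct isa_class {
    struct isa_class *isa;      /* a class's isa points at its metaclass */
    const char *name;
};

struct isa_object {
    struct isa_class *isa;      /* an instance's isa points at its class */
    int payload;
};

/* The C equivalent of movq (%rdi),%rdi: load the pointer stored at offset 0. */
static struct isa_class *isa_of(const void *receiver) {
    assert(receiver != NULL);
    return *(struct isa_class *const *)receiver;
}

static struct isa_class  isa_meta     = { NULL, "Meta" };
static struct isa_class  isa_klass    = { &isa_meta, "Klass" };
static struct isa_object isa_instance = { &isa_klass, 7 };
```

Here `isa_of(&isa_instance)` yields `&isa_klass`, and `isa_of(&isa_klass)` yields `&isa_meta`, mirroring the instance and class cases described above.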
The movq %rsi,%rsi instruction makes no sense to me. It is clearly loading the second argument, but it is effectively a no-op since the source and destination are the same and the movq instruction doesn’t set or reset any of the processor’s status flags. Then again, I could easily be missing something.
As it turns out, the movq %rsi,%rsi instruction only looks like nonsense in this context. It is a no-op here, but I forgot that this is actually an expanded macro (thanks, Greg, for the explanation!). If we return to the original source and grab the lines from the macro, you will see:
movq $0, %a1
movq $1, %a2
call __class_lookupMethodAndLoadCache
The movq $1, %a2 instruction generates the movq %rsi,%rsi instruction when expanded. Note that the source is a parameter to the macro, though! For other variants of objc_msgSend() (the ones that return values on the stack and not in registers), the “self” argument is actually in %a2 and “_cmd” is in %a3. Thus, the reason for the sometimes meaningless instruction.
If this particular codepath were performance sensitive to the degree where one or two instructions matter, the assembly macro language does have conditionals that could be used to eliminate the instruction when source and destination are the same.
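As a loose analogy only, here is the same idea expressed with C preprocessor macros rather than assembler macros. The register numbering, the `regs` array, and the macro names are all invented for illustration. `MOVE` always emits the copy, while `MOVE_IF_DIFFERENT` skips it when the source and destination are the same register.

```c
#include <assert.h>

/* Toy register numbers; A1..A3 stand in for the first three argument registers. */
enum { A1, A2, A3, REG_COUNT };

static long regs[REG_COUNT];
static int moves_executed;

/* Unconditional move, like the macro line as written. */
#define MOVE(src, dst) \
    do { regs[dst] = regs[src]; moves_executed++; } while (0)

/* Move only when source and destination differ, as a conditional would. */
#define MOVE_IF_DIFFERENT(src, dst) \
    do { if ((src) != (dst)) MOVE(src, dst); } while (0)

/* Count the moves needed to get the selector from sel_reg into A2. */
static int selector_moves(int sel_reg, int trim) {
    assert(sel_reg >= 0 && sel_reg < REG_COUNT);
    moves_executed = 0;
    if (trim)
        MOVE_IF_DIFFERENT(sel_reg, A2);
    else
        MOVE(sel_reg, A2);
    return moves_executed;
}
```

With the selector already in A2, `selector_moves(A2, 0)` performs one pointless move and `selector_moves(A2, 1)` performs none; with the selector in A3, as in the stack-return variants, the move is needed either way.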
Finally, the return value is passed back in register %rax and is stowed away into register %r11 for use shortly.
0x000051bd movdqa 0xffffff40(%rbp),%xmm0
0x000051c5 movdqa 0xffffff50(%rbp),%xmm1
0x000051cd movdqa 0xffffff60(%rbp),%xmm2
0x000051d5 movdqa 0xffffff70(%rbp),%xmm3
0x000051dd movdqa 0x80(%rbp),%xmm4
0x000051e2 movdqa 0x90(%rbp),%xmm5
0x000051e7 movdqa 0xa0(%rbp),%xmm6
0x000051ec movdqa 0xb0(%rbp),%xmm7
0x000051f1 movq 0xc0(%rbp),%r10
0x000051f5 movq 0xc8(%rbp),%rax
0x000051f9 movq 0xd0(%rbp),%rdi
0x000051fd movq 0xd8(%rbp),%rsi
0x00005201 movq 0xe0(%rbp),%rdx
0x00005205 movq 0xe8(%rbp),%rcx
0x00005209 movq 0xf0(%rbp),%r8
0x0000520d movq 0xf8(%rbp),%r9
0x00005211 movq %rbp,%rsp
0x00005214 popq %rbp
And… all these instructions are simply to restore all the registers back to the state they were in prior to the call to _class_lookupMethodAndLoadCache().
0x00005215 cmpq %r11,%r11
0x00005218 jmp *%r11d
Finally, dispatch! The cmpq sets the processor’s status flags to indicate a non-structure return value. Note that these two instructions are not part of the method lookup macro; they are found in the messenger itself (and will be different in the other messengers).
_class_lookupMethodAndLoadCache() will never return NULL; hence, there is no need for a NULL check above. If a method isn’t found, then _class_lookupMethodAndLoadCache() returns the address of the forwarding handler. Because forwarding may actually come into play often, the forwarding handler is actually put into the cache such that future invocations can leverage the fast path.
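The never-returns-NULL contract can be sketched in a few lines of C. The method table, the `fwd_` names, and the string-returning functions below are all made up for illustration, and selectors are compared with `strcmp` for simplicity. The point is only the shape of the fallback: an unknown selector resolves to a forwarding handler instead of NULL, so the caller can jump to the result unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*fwd_imp)(void);      /* stand-in for IMP */

struct fwd_method { const char *sel; fwd_imp imp; };

static const char *fwd_hello(void)   { return "hello"; }
static const char *fwd_handler(void) { return "forwarded"; }  /* stand-in forwarding handler */

static const struct fwd_method fwd_methods[] = {
    { "hello", fwd_hello },
};

/* Never returns NULL: an unknown selector resolves to the forwarding handler. */
static fwd_imp fwd_lookup(const char *sel) {
    assert(sel != NULL);
    size_t count = sizeof fwd_methods / sizeof fwd_methods[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(fwd_methods[i].sel, sel) == 0)
            return fwd_methods[i].imp;
    }
    return fwd_handler;
}
```

`fwd_lookup("hello")` returns the real implementation, while `fwd_lookup("missing")` returns `fwd_handler`, which is still safe to call.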
That is it; that is both the fast and the slow path of method invocation.
But what are these instructions?? There appear to be some leftovers!?!
0x0000521b movq 0x000b32be(%rip),%rdi
0x00005222 testq %rdi,%rdi
0x00005225 jneq 0x0000510a
0x0000522b movq $0x00000000,%rax
0x00005232 movq $0x00000000,%rdx
0x00005239 xorps %xmm0,%xmm0
0x0000523c xorps %xmm1,%xmm1
0x0000523f ret
Way back, as the very first two instructions of objc_msgSend(), there was a nil check. If the target of method invocation is nil, then the messenger jumps here.
This is the nil handling code. The last five instructions (the two movq, the two xorps, and the ret) take care of actually zeroing out all of the return value registers and returning control to the caller. This also explains why some types of return values are undefined on message-to-nil. Message-to-nil can only safely zero out values that are returned in the return registers. There isn’t enough metadata in the C ABI to know how much of the stack to zero for values returned on the stack.
The first three instructions deal with the nil receiver. The first, movq 0x000b32be(%rip),%rdi, loads the value contained in the _objc_nilReceiver global into the register %rdi. The testq %rdi,%rdi instruction sets the processor’s zero flag in the status register if the value contained in %rdi is zero. The jneq 0x0000510a instruction will jump if the value is not zero. That address is actually back into objc_msgSend and, in particular, basically does a dispatch to the nil message receiver with the same selector.
I wrote up how to write your own nil receiver quite some time ago.
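The control flow of the nil handling code reads roughly like the following C sketch. The names (`nil_object`, `nil_receiver_global`, `nil_dispatch`) are stand-ins chosen for this example, and the single integer return value stands in for the several return registers that the real code zeroes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nil_object { int tag; };

/* Stand-in for the _objc_nilReceiver global; NULL means none is installed. */
static struct nil_object *nil_receiver_global = NULL;

/* A sample object that a test or caller can install as the nil receiver. */
static struct nil_object nil_stand_in = { 99 };

/* Stand-in for a normal dispatch to a non-nil receiver. */
static intptr_t nil_dispatch(struct nil_object *receiver) {
    assert(receiver != NULL);
    return receiver->tag;
}

static intptr_t nil_send(struct nil_object *receiver) {
    if (receiver == NULL) {                  /* nil check at the top of the messenger */
        receiver = nil_receiver_global;      /* load the nil receiver global */
        if (receiver == NULL)
            return 0;                        /* nothing installed: zero and return */
    }
    return nil_dispatch(receiver);           /* otherwise dispatch as usual */
}
```

With nothing installed, `nil_send(NULL)` returns 0. After `nil_receiver_global = &nil_stand_in;`, the same call is redirected to the stand-in object.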
0x00005240 movq %rdi,%rax
0x00005243 ret
These final two instructions are what happens when you invoke one of the ignored selectors under GC. They effectively cause the method to return self; by moving the target of the method call from the first argument register %rdi into the return value register %rax.
Note that this is also the reason why -retainCount returns such an outrageous value under GC; it is the address of the object.
There you have it. That is every instruction that may be executed in objc_msgSend() from beginning to end.