objc_msgSend() Tour Part 3: The Fast Path
Table of Contents
- objc_msgSend() Tour Part 1: The Road Map
- objc_msgSend() Tour Part 2: Setting the Stage
- objc_msgSend() Tour Part 3: The Fast Path
- objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends
In any case, with the foundation set (the id of the object to be targeted in %rdi and the selector of the method to be invoked in %rsi), we can jump into objc_msgSend() and understand exactly what happens, instruction by instruction. More specifically, the compiler issues a call into objc_msgSend(), which sets up a stack frame for objc_msgSend(). Through tail call optimization, that frame becomes the stack frame of the called method, and the method implementation that objc_msgSend() jumps to issues a ret instruction to unwind the stack back to the original caller’s frame.
It is pretty easy to correlate the disassembly with the comments and code in the original source file. However, if you ever need to step through the messenger (si steps by instruction in gdb), the disassembly will be easier to follow because it is closer to what you actually see during a debug session.
Almost all method dispatches take what is called the “fast path”. That is, objc_msgSend() finds the implementation in the method cache and passes control to the implementation. Since this is the most common path, it is a good opportunity to break the tour of objc_msgSend() into two parts: the fast path and the slow path (with administrivia).
- Check for ignored selectors (GC) and short-circuit.
    // this is the first instruction of objc_msgSend
    0x000050f4 cmpq $0xfffeb010,%rsi
    0x000050fb jeq 0x00005240 return;
The cmpq instruction compares the selector of the method call in register %rsi with the value $0xfffeb010. If they are equal, a jump is made to a couple of instructions further below. Those instructions effectively return self to the caller as the result of the message send.
Under GC, the selectors retain, release, autorelease, retainCount and dealloc are re-mapped to the ignored selector. Thus, when any of these methods are invoked under GC, objc_msgSend() very quickly returns self without otherwise doing anything. In dual-mode code, a great percentage of method calls are these ignored selectors and, thus, this makes dual-mode code faster (while not significantly penalizing non-GC code; a fraction of a percent overhead in a typical application).
- Check for nil target.
    0x00005101 testq %rdi,%rdi
    0x00005104 jeq 0x0000521b
The testq instruction is effectively an AND of its operands without storing the resulting bits. “Odd”, you say, “why AND something with itself?”. As it turns out, this is the fastest way to test whether something is 0. Effectively, this is testing the target of the method invocation, the self parameter, to see if it is zero. If it is zero, the zero status flag (a bit in the CPU’s status register) will be set, and the jeq (jump if equal) instruction checks the zero flag and makes the jump if it is set.
What happens at 0x0000521b is explained below.
- Find the IMP on the class of the target and jump to it
    0x0000510a movq (%rdi),%r11
Now we are into the actual dispatch process. The first step is to grab the class of the target object. To do this, the contents of register %rdi are dereferenced and the result is put into %r11. Remember that the first instance variable in NSObject is the object’s isa pointer where the isa is really just a pointer to the class of the object!
Thus, this code grabs a pointer to the class of the object (or the metaclass of the class, if we are messaging a class object) and shoves it into %r11.
Once we have that, the search is on!
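The three steps so far can be restated in C. This is only a sketch under stated assumptions: sketch_sel, sketch_class, sketch_object, IGNORED_SEL and fast_path_prologue are hypothetical names invented for illustration, selectors are modeled as plain pointers, and the real messenger is hand-written assembly that jumps rather than returns a status code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real runtime types are richer than this. */
typedef const void *sketch_sel;            /* a selector is a uniqued pointer */
typedef struct sketch_class { int placeholder; } sketch_class;
typedef struct sketch_object {
    sketch_class *isa;                     /* first field: pointer to the class */
} sketch_object;

/* The constant from the disassembly above; only meaningful for that build. */
#define IGNORED_SEL ((sketch_sel)(uintptr_t)0xfffeb010u)

typedef enum { RETURN_SELF, NIL_RECEIVER, SEARCH_CACHE } prologue_result;

prologue_result fast_path_prologue(sketch_object *self, sketch_sel sel,
                                   sketch_class **cls_out)
{
    assert(cls_out != NULL);
    if (sel == IGNORED_SEL)                /* cmpq $0xfffeb010,%rsi ; jeq */
        return RETURN_SELF;
    /* testq %rdi,%rdi: self AND self is zero only when self is zero. */
    if (((uintptr_t)self & (uintptr_t)self) == 0)
        return NIL_RECEIVER;
    *cls_out = self->isa;                  /* movq (%rdi),%r11 */
    return SEARCH_CACHE;
}
```

Note that the order of the checks mirrors the instruction order: the ignored-selector comparison runs before the nil test, so the receiver is only dereferenced once both have passed.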
- Search the class’s method cache for the method IMP
This next set of instructions is actually the CacheLookup macro, if you are looking at the original source file.
    0x0000510d movq %rcx,0xe0(%rsp)
    0x00005112 movq %r8,0xe8(%rsp)
    0x00005117 movq %r9,0xf0(%rsp)
Any time you see a sequence of movq operations whose source operands are registers and whose destinations are a series of (%rsp)-relative locations, you can assume that the contents of the registers are being saved on the stack for preservation, and that you will find nearly the same sequence of instructions, with operands reversed, near the end of the code block.
In this case, it is preserving registers %rcx, %r8 and %r9, most likely because those registers are going to be used during the cache lookup.
And this is the actual cache lookup code. It is rather dense.
    0x0000511c movq 0x10(%r11),%r8
    0x00005120 movl (%r8),%r9d
    0x00005123 shlq $0x03,%r9
    0x00005127 movq %rsi,%rcx
    0x0000512a andq %r9,%rcx
    0x0000512d movq 0x10(%r8,%rcx),%r11
    0x00005132 testq %r11,%r11
    0x00005135 je 0x0000515c
    0x00005137 addq $0x08,%rcx
    0x0000513b andq %r9,%rcx
    0x0000513e cmpq (%r11),%rsi
    0x00005141 jne 0x0000512d
The source, though, is not quite so dense. The above is produced by this bit of assembly in a macro:
    movq cache(%r11), %a5      // cache = class->cache
    //
    movl mask(%a5), %a6d
    shlq $$3, %a6              // %a6 = cache->mask << 3
    mov $0, %a4                // bytes = sel
    andq %a6, %a4              // bytes &= (mask << 3)
This grabs the cache out of the class and then sets up the various bits used to look in the cache for the method entry corresponding to the selector. The "method triplet" alluded to near the end is a triplet of data containing the selector, the IMP (the function pointer that is the method's implementation), and a C string containing type identifiers for the arguments (and return value) of the method. The goal of the following code is to find that triplet or jump to the code that loads the cache (the slow path).
    //
    // search the receiver's cache
    // r11 = method (soon)
    // a4 = bytes
    // a5 = cache
    // a6 = mask << 3
    // $0 = sel
    LMsgSendProbeCache_$1:
    movq buckets(%a5, %a4), %r11    // method = cache->buckets[bytes/8]
    testq %r11, %r11                // if (method == NULL)
    je LCacheMiss$1                 // goto cacheMissLabel
    //
    addq $$8, %a4                   // bytes += 8
    andq %a6, %a4                   // bytes &= (mask << 3)
    cmpq method_name(%r11), $0      // if (method_name != sel)
    jne LMsgSendProbeCache_$1       // goto loop
    //
    // cache hit, r11 = method triplet
    //
This is the actual cache lookup code. It loops across the cache and searches for an entry that has the same SEL as the desired selector. Note that the selector is a C string, but the selectors are also uniqued, thus allowing the cache lookup to simply compare the string address -- the SEL -- with the one in the cache entry (the cmpq instruction). If a NULL is encountered, the cache has been exhausted and, thus, execution jumps to the cache miss handler (the LCacheMiss$1 label).
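The probe loop may be easier to follow in C. The following is a sketch, not the runtime's code: sketch_method, sketch_cache and cache_probe are hypothetical names, the bucket array is fixed at 8 entries for brevity, and the field layout is an assumption chosen so that buckets sits at offset 0x10 on a 64-bit target, matching the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef const char *sketch_sel;            /* uniqued, so compared by address */
typedef void (*sketch_imp)(void);

typedef struct sketch_method {             /* the "method triplet" */
    sketch_sel  name;
    const char *types;
    sketch_imp  imp;
} sketch_method;

typedef struct sketch_cache {
    uintptr_t      mask;                   /* bucket count minus one */
    uintptr_t      occupied;
    sketch_method *buckets[8];             /* fixed size for this sketch only */
} sketch_cache;

/* Returns the cached entry for sel, or NULL on a cache miss.
 * Like the assembly, this relies on at least one bucket being empty;
 * a completely full cache without a match would loop forever. */
sketch_method *cache_probe(const sketch_cache *cache, sketch_sel sel)
{
    assert(cache != NULL);
    uintptr_t scaled_mask = cache->mask << 3;        /* shlq $3: mask in bytes */
    uintptr_t bytes = (uintptr_t)sel & scaled_mask;  /* first byte offset to try */
    for (;;) {
        sketch_method *method = cache->buckets[bytes >> 3];
        if (method == NULL)
            return NULL;                             /* je LCacheMiss */
        bytes = (bytes + 8) & scaled_mask;           /* next bucket, wrapping */
        if (method->name == sel)
            return method;                           /* cache hit */
    }
}
```

The selector's address doubles as the hash: masking it with mask << 3 yields a byte offset that is already a multiple of the 8-byte bucket size, which is why the assembly never needs a separate multiply or divide.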
The macro source looks a little different from ordinary assembly because it is a macro and, thus, is expanded into the instruction stream wherever it is used. When you see things like $0 and $1, those are the parameters passed to the macro. Because jumping to a label is a common task and because the macro is expanded multiple times, the label contains an argument so that it can be made unique per use.
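The same label-uniquing trick exists in the C preprocessor, which may make the idea more familiar. In this sketch, FIND_INDEX and find_two are hypothetical names; the tag parameter is pasted into each label, playing the role that $1 plays in LMsgSendProbeCache_$1.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search written with goto, to mirror the assembly style.
 * C labels have function scope, so two expansions of this macro in one
 * function would clash unless each expansion gets its own tag. */
#define FIND_INDEX(tag, array, count, key, result)    \
    do {                                              \
        size_t i_##tag = 0;                           \
    probe_##tag:                                      \
        if (i_##tag == (count)) {                     \
            (result) = -1;                            \
            goto done_##tag;                          \
        }                                             \
        if ((array)[i_##tag] != (key)) {              \
            i_##tag++;                                \
            goto probe_##tag;                         \
        }                                             \
        (result) = (long)i_##tag;                     \
    done_##tag:;                                      \
    } while (0)

/* Expands the macro twice in the same function, with distinct tags. */
void find_two(const int *values, size_t count, int key_a, int key_b,
              long *index_a, long *index_b)
{
    assert(values != NULL && index_a != NULL && index_b != NULL);
    FIND_INDEX(a, values, count, key_a, *index_a);
    FIND_INDEX(b, values, count, key_b, *index_b);
}
```

Using the same tag for both expansions would be rejected by the compiler as a duplicate label, which is the problem the macro argument in the assembly source avoids.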
    0x00005143 movq 0xe0(%rsp),%rcx
    0x00005148 movq 0xe8(%rsp),%r8
    0x0000514d movq 0xf0(%rsp),%r9
Restore the previously preserved register values.
    0x00005152 movq 0x10(%r11),%r11
    0x00005156 cmpq %r11,%r11
    0x00005159 jmp *%r11d
The IMP was found! Or, more precisely, the address of the structure containing the cache entry for the method (the method triplet) was put into %r11. The movq with source 0x10(%r11) takes the address in %r11, adds an offset of 0x10, which is where the IMP is stored, and copies a quad-word’s worth of data from that location back into %r11.
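The 0x10 offset falls out of the layout of the triplet. As a sketch, assuming the field order selector, type string, IMP (an assumption that is consistent with the offsets in the disassembly; sketch_method, imp_from_entry and imp_offset are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

typedef const char *sketch_sel;
typedef void *(*sketch_imp)(void *self, sketch_sel cmd);

typedef struct sketch_method {
    sketch_sel  method_name;    /* offset 0x00: what cmpq compares with %rsi */
    const char *method_types;   /* offset 0x08 with 8-byte pointers */
    sketch_imp  method_imp;     /* offset 0x10 with 8-byte pointers */
} sketch_method;

/* C equivalent of movq 0x10(%r11),%r11: follow the entry pointer, load the IMP. */
sketch_imp imp_from_entry(const sketch_method *entry)
{
    assert(entry != NULL);
    return entry->method_imp;
}

/* Byte offset of the IMP field within the triplet. */
size_t imp_offset(void)
{
    return offsetof(sketch_method, method_imp);
}
```

On a target with 8-byte pointers, imp_offset() evaluates to 16, which is the 0x10 seen in the instruction.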
jmp is an unconditional jump, so control transfers to the IMP that was stored into %r11. So why the comparison of a register with itself, with no test of the result, on the line before? Comparing a value with itself always yields “equal”, so that instruction leaves the zero flag set, and the flag is later used to determine what kind of call is being made.
If you were to look at the source, you would see the comment // set nonstret (eq) for forwarding. “nonstret” refers to non-structure return. The Objective-C forwarding mechanism needs to know whether the method returns a structure because, if so, the self and _cmd arguments will be in different registers. Normally, the C compiler takes care of these shenanigans automatically (including when to turn method calls into calls to objc_msgSend_stret() or objc_msgSend_fpret()), but the forwarding mechanism has to be able to take apart the stack frame and, thus, needs to have a means of detecting which function ABI is in use.
Greg Parker, as is often the case, has a brilliant post discussing why objc_msgSend_stret() exists. There is also objc_msgSend_fpret(), equally well described on Greg’s weblog, which is used to handle certain kinds of floats/doubles (the set changes per architecture).
What follows is the rest of the instruction stream, which handles the “slow path”. That is, it is used to fully resolve methods that are not found in the cache.


December 22nd, 2009 at 2:14 pm
Thanks for the very helpful annotated tour! This stuff is fascinating, and I think it’s important to understand how things work from top to bottom.