objc_msgSend() Tour Part 3: The Fast Path

Table of Contents

  1. objc_msgSend() Tour Part 1: The Road Map
  2. objc_msgSend() Tour Part 2: Setting the Stage
  3. objc_msgSend() Tour Part 3: The Fast Path
  4. objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends

With the foundation set (the id of the target object in %rdi and the selector of the method to be invoked in %rsi), we can jump into objc_msgSend() and understand exactly what happens, instruction by instruction. More specifically, the compiler issues a call into objc_msgSend(). That call sets up a stack frame for objc_msgSend() which, through tail call optimization, becomes the stack frame for the called method. The method implementation that objc_msgSend() jumps to then issues a ret instruction to unwind the stack back to the original caller’s frame.

It is pretty easy to correlate the disassembly with the comments and code in the original source file. However, if you ever need to step through the messenger (si steps by instruction in gdb), the disassembly shown here will be easier to follow because it is closer to what you actually see during a debug session.

Almost all method dispatches take what is called the “fast path”. That is, objc_msgSend() finds the implementation in the method cache and passes control to the implementation. Since this is the most common path, it is a good opportunity to break the tour of objc_msgSend() into two parts: the fast path and the slow path (with administrivia).

  1. Check for ignored selectors (GC) and short-circuit.
    // this is the first instruction of objc_msgSend
    0x000050f4  cmpq        $0xfffeb010,%rsi
    0x000050fb  jeq         0x00005240                    return;

    The cmpq instruction compares the selector of the method call in register %rsi with the value $0xfffeb010. If they are equal, execution jumps to a couple of instructions further down. Those instructions effectively return self to the caller as the result of the message send.

    Under GC, the selectors retain, release, autorelease, retainCount and dealloc are re-mapped to the ignored selector. Thus, when any of these methods are invoked under GC, objc_msgSend() very quickly returns self without otherwise doing anything. In dual-mode code, a great percentage of method calls are these ignored selectors and, thus, this makes dual-mode code faster (while not significantly penalizing non-GC code; a fraction of a percent overhead in a typical application).

  2. Check for nil target.
    0x00005101  testq       %rdi,%rdi
    0x00005104  jeq         0x0000521b

    The testq instruction is effectively an AND of its two operands without storing the resulting bits. “Odd”, you say, “why AND something with itself?” As it turns out, this is the fastest way to test whether something is 0. Effectively, this is testing the target of the method invocation (the self parameter) to see if it is zero. If it is zero, the zero status flag (a bit in the CPU’s status register) will be set, and the jeq (jump if equal) instruction will check the zero flag and make the jump if it is set.

    What happens at 0x0000521b is explained below.
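The two entry checks above can be sketched in C. This is only an illustration of the control flow, not the runtime's code: every name here (sketch_entry_checks, SKETCH_IGNORED_SEL, the result enum) is hypothetical, and the ignored selector is modeled as the address of a private marker string rather than the constant seen in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ignored-selector constant. */
static const char sketch_ignored_marker[] = "<ignored>";
#define SKETCH_IGNORED_SEL (sketch_ignored_marker)

typedef enum {
    SKETCH_RETURN_SELF,   /* cmpq/jeq taken: ignored selector under GC  */
    SKETCH_NIL_RECEIVER,  /* testq/jeq taken: the receiver is nil       */
    SKETCH_DISPATCH       /* fall through to the class and cache lookup */
} sketch_entry_result;

/* Mirrors the order of the first four instructions: the selector is
 * checked before the receiver, so an ignored selector short-circuits
 * even when the receiver is nil. */
sketch_entry_result sketch_entry_checks(const void *self, const char *op)
{
    assert(op != NULL);            /* sketch-only precondition */

    if (op == SKETCH_IGNORED_SEL)  /* cmpq $ignored,%rsi ; jeq */
        return SKETCH_RETURN_SELF;
    if (self == NULL)              /* testq %rdi,%rdi ; jeq    */
        return SKETCH_NIL_RECEIVER;
    return SKETCH_DISPATCH;
}
```

Note that both checks are plain pointer comparisons; nothing is dereferenced until they have passed.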

  3. Find the IMP on the class of the target and jump to it
    0x0000510a  movq        (%rdi),%r11

    Now we are into the actual dispatch process. The first step is to grab the class of the target object. To do this, the contents of register %rdi are dereferenced and the result is put into %r11. Remember that the first instance variable in NSObject is the object’s isa pointer where the isa is really just a pointer to the class of the object!

    Thus, this code grabs a pointer to the class of the object (or the metaclass of the class, if we are messaging a class object) and shoves it into %r11.

    Once we have that, the search is on!
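In C terms, that single movq amounts to reading the first pointer-sized field of the object. A minimal sketch, assuming a hypothetical object layout (sketch_object, sketch_class and the payload field are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct sketch_class {
    int unused;                   /* contents do not matter here */
};

struct sketch_object {
    struct sketch_class *isa;     /* first field, so it sits at offset 0 */
    int payload;                  /* hypothetical instance variable */
};

/* Roughly what movq (%rdi),%r11 does: load the pointer stored at the
 * address held in self. */
struct sketch_class *sketch_get_class(const void *self)
{
    assert(self != NULL);         /* nil was filtered out by the testq */
    return *(struct sketch_class *const *)self;
}
```

Because isa is the first field, no offset is needed. That is why the disassembly shows a bare (%rdi) rather than something like 0x8(%rdi).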

  1. Search the class’s method cache for the method IMP

    This next set of instructions is actually the CacheLookup macro, if you are looking at the original source file.

    0x0000510d  movq        %rcx,0xe0(%rsp)
    0x00005112  movq        %r8,0xe8(%rsp)
    0x00005117  movq        %r9,0xf0(%rsp)

    Any time you see a chunk of code that involves a sequence of movq operations with the source operand being registers and the destination being a series of (%rsp) relative destinations, you can pretty much assume that the contents of the registers are being pushed onto the stack for preservation and that you’ll find pretty much the exact same sequence of instructions with operands reversed near the end of the code block.

    In this case, it is preserving registers %rcx, %r8 and %r9, most likely because those registers are going to be used during cache lookup!

    And this is the actual cache lookup code. It is rather dense.

    0x0000511c  movq        0x10(%r11),%r8
    0x00005120  movl        (%r8),%r9d
    0x00005123  shlq        $0x03,%r9
    0x00005127  movq        %rsi,%rcx
    0x0000512a  andq        %r9,%rcx
    0x0000512d  movq        0x10(%r8,%rcx),%r11
    0x00005132  testq       %r11,%r11
    0x00005135  je          0x0000515c
    0x00005137  addq        $0x08,%rcx
    0x0000513b  andq        %r9,%rcx
    0x0000513e  cmpq        (%r11),%rsi
    0x00005141  jne         0x0000512d

    The source, though, is not quite so dense. The above is produced by this bit of assembly in a macro:

    	movq	cache(%r11), %a5 // cache = class->cache
    	movl	mask(%a5), %a6d
    	shlq	$$3, %a6		 // %a6 = cache->mask << 3
    	mov	$0, %a4			     // bytes = sel
    	andq	%a6, %a4		 // bytes &= (mask << 3)

    This grabs the cache out of the class and then sets up the various bits used to look in the cache for the method entry corresponding to the selector. The "method triplet" mentioned in the final comment of the macro code below is a triplet of data containing the selector, the IMP (the function pointer that is the method's implementation), and a C string containing type identifiers for the arguments (and return value) of the method. The goal of the following code is to find that triplet or jump to the code that loads the cache (the slow path).

    	// search the receiver's cache
    	// r11 = method (soon)
    	// a4 = bytes
    	// a5 = cache
    	// a6 = mask << 3
    	// $0 = sel
    	movq	buckets(%a5, %a4), %r11	// method = cache->buckets[bytes/8]
    	testq	%r11, %r11			// if (method == NULL)
    	je	LCacheMiss$1			//   goto cacheMissLabel
    	addq	$$8, %a4			// bytes += 8
    	andq	%a6, %a4			// bytes &= (mask << 3)
    	cmpq	method_name(%r11), $0		// if (method_name != sel)
    	jne	LMsgSendProbeCache_$1	//   goto loop
    	// cache hit, r11 = method triplet

    This is the actual cache lookup code. It loops across the cache and searches for an entry that has the same SEL as the desired selector. Note that the selector is a C string, but the selectors are also uniqued, thus allowing the cache lookup to simply compare the string address -- the SEL -- with the one in the cache entry (the cmpq instruction). If a NULL is encountered, the cache has been exhausted and, thus, execution jumps to the cache miss handler (the LCacheMiss$1 label).

    This assembly looks a little different than most because it is actually a macro and, thus, is expanded into the instruction stream when used. Thus, when you see things like $0 and $1 those are actually the parameters passed to the macro. Because jumping to a label is a common task and because the macro is expanded multiple times, the label contains an argument such that the label can be made unique per use.
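The probe loop is perhaps easier to follow as C. The following is a rough sketch under several assumptions: the struct layouts are hypothetical stand-ins chosen to match the offsets in the disassembly (mask at 0, buckets at 0x10, selector at offset 0 of each entry), the bucket array is fixed at eight slots to keep the example self-contained, mask is assumed to be a power of two minus one, and the cache is assumed to always contain at least one empty bucket so that the loop terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_imp)(void);

struct sketch_method {                 /* the "method triplet" */
    const char *name;                  /* selector, compared by address */
    const char *types;                 /* type encoding string */
    sketch_imp  imp;                   /* implementation pointer */
};

#define SKETCH_MAX_BUCKETS 8           /* fixed size, sketch only */

struct sketch_cache {
    uintptr_t mask;                    /* bucket count minus one */
    uintptr_t occupied;
    struct sketch_method *buckets[SKETCH_MAX_BUCKETS];
};

/* Returns the cached entry for sel, or NULL on a cache miss. */
struct sketch_method *sketch_cache_lookup(const struct sketch_cache *cache,
                                          const char *sel)
{
    assert(cache->mask < SKETCH_MAX_BUCKETS);
    assert(((cache->mask + 1) & cache->mask) == 0);

    uintptr_t scaled_mask = cache->mask << 3;           /* shlq $3    */
    uintptr_t bytes = (uintptr_t)sel & scaled_mask;     /* mov, andq  */

    for (;;) {
        struct sketch_method *method = cache->buckets[bytes / 8];
        if (method == NULL)                             /* testq, je  */
            return NULL;                                /* cache miss */
        bytes = (bytes + 8) & scaled_mask;              /* addq, andq */
        if (method->name == sel)                        /* cmpq, jne  */
            return method;                              /* cache hit  */
    }
}
```

As in the assembly, the offset for the next probe is computed before the selector comparison, and the AND with the scaled mask is what wraps the probe back around to the first bucket.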

    0x00005143  movq        0xe0(%rsp),%rcx
    0x00005148  movq        0xe8(%rsp),%r8
    0x0000514d  movq        0xf0(%rsp),%r9

    Restore the previously preserved register values.

    0x00005152  movq        0x10(%r11),%r11
    0x00005156  cmpq        %r11,%r11
    0x00005159  jmp         *%r11d

    IMP was found! Or, actually, the address of the structure containing the cache entry for the method (the method triplet) was put into %r11. The movq with source 0x10(%r11) dereferences the address in %r11 and copies a quad word’s worth of data from that address plus an offset of 0x10, which is where the IMP is stored.

    jmp is an unconditional jump. The IMP that was stored into %r11 will be jumped to. So, why compare the register with itself on the line prior without testing the result? Comparing a value with itself always sets the zero flag (the “eq” condition), and that flag serves as a signal of what kind of call is being made.

    If you were to look at the source, you would see the comment // set nonstret (eq) for forwarding. “nonstret” refers to non-structure return. The Objective-C forwarding mechanism needs to know whether the method returns a structure because, if so, the self and _cmd arguments will be in different registers. Normally, the C compiler takes care of these shenanigans automatically (including when to turn method calls into calls to objc_msgSend_stret() or objc_msgSend_fpret()), but the forwarding mechanism has to be able to take apart the stack frame and, thus, needs to have a means of detecting which function ABI is in use.
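The final load and jump can be approximated in C as an indirect call through the function pointer stored in the triplet. This is a sketch only: the types and names are hypothetical, the IMP signature is simplified to (self, selector), and C has no way to express the flag-setting cmpq, so that part is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*sketch_imp)(void *self, const char *sel);

struct sketch_method {                 /* the "method triplet" */
    const char *name;                  /* offset 0x00 on a 64-bit target */
    const char *types;                 /* offset 0x08 */
    sketch_imp  imp;                   /* offset 0x10 */
};

/* Example implementation used to exercise the sketch: returns self. */
void *sketch_return_self(void *self, const char *sel)
{
    (void)sel;
    return self;
}

/* movq 0x10(%r11),%r11 followed by jmp: load the IMP and transfer
 * control to it. A C compiler may turn this call into a tail jump, but
 * it is not required to. */
void *sketch_invoke(const struct sketch_method *method, void *self,
                    const char *sel)
{
    assert(method != NULL && method->imp != NULL);
    return method->imp(self, sel);
}
```

The real messenger never returns to itself. Because it jumps rather than calls, the implementation's ret goes straight back to whoever called objc_msgSend().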

    Greg Parker, as is often the case, has a brilliant post discussing why objc_msgSend_stret() exists. There is also objc_msgSend_fpret() (equally well described on Greg’s weblog), which is used to handle certain kinds of float/double return values (the set changes per architecture).

What follows is the rest of the instruction stream, which handles the “slow path”. That is, it is used to fully resolve methods that are not found in the cache.

6 Responses to “objc_msgSend() Tour Part 3: The Fast Path”

  1. Andy Matuschak says:

    Thanks for the very helpful annotated tour! This stuff is fascinating, and I think it’s important to understand how things work from top to bottom.
