objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends

Table of Contents

  1. objc_msgSend() Tour Part 1: The Road Map
  2. objc_msgSend() Tour Part 2: Setting the Stage
  3. objc_msgSend() Tour Part 3: The Fast Path
  4. objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends

In the first three parts, I gave an overview, explained a bit of the ABI used by Objective-C, and took a near instruction-by-instruction tour of what happens on the fast path of Objective-C method dispatch. By fast path, I mean what happens 99.9% of the time: a very fast, no overhead, no function call, no locking set of instructions that grabs the method implementation from the cache and does a tail-call jump to that implementation.

The slow path is used rarely: as little as once per unique selector invoked per class, with the goal of filling the cache such that the slow path is never used again for that selector/class combination. A handful of operations will cause a class’s cache to be flushed: method swizzling, category loading, and the like.

Note that during +initialize, methods won’t always be cached. Yet another reason to not do any real work during +initialize!

In part 3, the cache lookup loop contained a NULL check and, if NULL was encountered in the cache, then the code jumped to a cache miss label. It looked like this (with the original source interleaved):

// 	movq	buckets(%a5, %a4), %r11	// method = cache->buckets[bytes/8]
0x0000512d  movq        0x10(%r8,%rcx),%r11
// 	testq	%r11, %r11			// if (method == NULL)
0x00005132  testq       %r11,%r11
// 	je	LCacheMiss$1			//   goto cacheMissLabel
0x00005135  je          0x0000515c

Note that the disassembly shows that the cache miss label is located at address 0x0000515c. Not at all coincidentally, that is exactly where this particular post’s tour starts. Namely, what happens when the cache lookup misses and the cache must be filled.

When I referred to this code path as the slow path, I wasn’t kidding! This is the one spot where the messenger actually makes a call into a C function which is then responsible for traipsing about the runtime metadata to resolve the method and fill the cache. Beyond that, the C function — _class_lookupMethodAndLoadCache() — is also responsible for ensuring that any +initialize methods of the class (and superclasses) are invoked prior to the method itself being invoked. This also implies that objc_msgSend() must effectively be recursively safe across this particular call site.
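To make the division of labor concrete, here is a minimal C sketch of the idea, not the runtime's actual code. Every name in it (toy_class, toy_find, toy_lookup_and_load_cache and so on) is hypothetical, the cache holds a single entry, and selectors are compared with strcmp() where the real messenger compares selector pointers. It only shows the ordering described above: superclasses are initialized first, the method is resolved by walking the superclass chain, and the cache is filled so that the next send skips the slow path.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: every type and name here is hypothetical, not the runtime's. */
typedef int (*toy_imp)(void);

typedef struct toy_class {
    struct toy_class *superclass;
    const char *method_name;   /* one method per class keeps the sketch short */
    toy_imp method_imp;
    const char *cached_name;   /* one-entry stand-in for the method cache */
    toy_imp cached_imp;
    int initialized;           /* set once the stand-in for +initialize has run */
} toy_class;

int toy_slow_path_calls = 0;
int toy_initialize_calls = 0;

/* Superclasses are initialized before the class itself, each at most once. */
void toy_ensure_initialized(toy_class *cls) {
    if (cls == NULL || cls->initialized) return;
    toy_ensure_initialized(cls->superclass);
    cls->initialized = 1;
    toy_initialize_calls++;
}

/* Slow path: initialize, walk the superclass chain, then fill the cache. */
toy_imp toy_lookup_and_load_cache(toy_class *cls, const char *sel) {
    toy_slow_path_calls++;
    toy_ensure_initialized(cls);
    for (toy_class *c = cls; c != NULL; c = c->superclass) {
        if (c->method_name != NULL && strcmp(c->method_name, sel) == 0) {
            cls->cached_name = c->method_name;
            cls->cached_imp = c->method_imp;
            return c->method_imp;
        }
    }
    return NULL; /* not found; this sketch does not model forwarding */
}

/* Fast path first; fall back to the slow path only on a cache miss. */
toy_imp toy_find(toy_class *cls, const char *sel) {
    if (cls->cached_name != NULL && strcmp(cls->cached_name, sel) == 0)
        return cls->cached_imp;
    return toy_lookup_and_load_cache(cls, sel);
}

/* Sample method used when exercising the sketch. */
int toy_answer(void) { return 42; }
```

With a base class that implements "answer" and a derived class that implements nothing, two calls to toy_find() on the derived class should enter the slow path only once, and both classes should end up initialized.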

And that requirement leads to the need to preserve all of the various registers and push a stack frame for the purposes of making the call. This is actually considerably more involved than a normal call site because the runtime is effectively hijacking the method invocation call to call something totally alien to the original method’s implementation!

0x0000515c  pushq       %rbp
0x0000515d  movq        %rsp,%rbp
0x00005160  subq        $0x000000c0,%rsp

The above saves the caller’s frame pointer, makes the current stack pointer the new frame pointer, and then bumps the stack pointer down by 0xc0 (192) bytes. This is to make room for…

0x00005167  movdqa      %xmm0,0xffffff40(%rbp)
0x0000516f  movdqa      %xmm1,0xffffff50(%rbp)
0x00005177  movdqa      %xmm2,0xffffff60(%rbp)
0x0000517f  movdqa      %xmm3,0xffffff70(%rbp)
0x00005187  movdqa      %xmm4,0x80(%rbp)
0x0000518c  movdqa      %xmm5,0x90(%rbp)
0x00005191  movdqa      %xmm6,0xa0(%rbp)
0x00005196  movdqa      %xmm7,0xb0(%rbp)
0x0000519b  movq        %r10,0xc0(%rbp)
0x0000519f  movq        %rax,0xc8(%rbp)
0x000051a3  movq        %rdi,0xd0(%rbp)
0x000051a7  movq        %rsi,0xd8(%rbp)
0x000051ab  movq        %rdx,0xe0(%rbp)

… all of the registers that need to be pushed onto the stack. Note that %r10 doesn’t actually have to be pushed — it is a scratch register. Someday, %r10 might disappear from that instruction stream.

0x000051af  movq        (%rdi),%rdi
0x000051b2  movq        %rsi,%rsi
0x000051b5  callq       __class_lookupMethodAndLoadCache
0x000051ba  movq        %rax,%r11

Now we get to the actual function call itself.

IMP _class_lookupMethodAndLoadCache(Class cls, SEL sel)

You can find the declaration and implementation in objc-class.m.

If you had read Greg’s “So You Crashed in Objc_MsgSend”, you would know that %rdi contains the first argument and %rsi contains the second argument when calling a function.

The movq (%rdi),%rdi instruction dereferences the contents of %rdi and shoves the result into %rdi. Effectively, it grabs the isa of the targeted object and passes it as the first parameter to the function. In the case of an instance, this will be the class of the instance. In the case of a class, it will be the metaclass of the class.
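In C terms, that single instruction is a pointer dereference of the receiver. The sketch below uses hypothetical stand-in structs (isa_class, isa_object) rather than the runtime's real types. It assumes only what the paragraph above states: the first word of an instance points at its class, and the first word of a class points at its metaclass.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the first word of every object is its isa pointer. */
typedef struct isa_class {
    struct isa_class *isa;      /* a class's isa is its metaclass */
    const char *name;
} isa_class;

typedef struct isa_object {
    isa_class *isa;             /* an instance's isa is its class */
} isa_object;

/* C equivalent of movq (%rdi),%rdi: load the word the receiver points to. */
isa_class *isa_of(void *receiver) {
    return *(isa_class **)receiver;
}
```

Calling isa_of() on an instance yields its class, and calling it on that class yields the metaclass, which is why the same instruction serves both instance and class methods.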

Frankly, the movq %rsi,%rsi instruction makes no sense to me. It is clearly loading the second argument, but it is effectively a no-op since the source and destination are the same and the movq instruction doesn’t set or reset any of the processor’s status flags. Then again, I could easily be missing something.

The movq %rsi,%rsi instruction looks like nonsense in this context. It is a no-op here, but I forgot that this is actually an expanded macro (thanks, Greg, for the explanation!). If we return to the original source and grab the lines from the macro, you will see:

	movq	$0, %a1
	movq	$1, %a2
	call	__class_lookupMethodAndLoadCache

The movq $1, %a2 instruction generates the movq %rsi,%rsi instruction when expanded. Note that the source is a parameter to the macro, though! For other variants of objc_msgSend() — for the ones that return values on the stack and not in registers — the “self” argument is actually in %a2 and “_cmd” is in %a3. Thus, the reason for the sometimes meaningless instruction.

If this particular codepath were performance sensitive to the degree where one or two instructions matter, the assembly macro language does have conditionals that could be used to eliminate the instruction when source and destination are the same.
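The effect is easy to imitate outside the assembler. The following C sketch is purely illustrative: mac_expand_move() and mac_expand_move_if_needed() are hypothetical helpers that format an instruction string the way a macro with a source-register parameter would, and the second one applies the kind of conditional mentioned above. It is not how the assembler's macro facility is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that mimics expanding "movq $N, %a2":      */
/* src is the macro argument, dst is the fixed argument register. */
int mac_expand_move(const char *src, const char *dst, char *out, size_t cap) {
    return snprintf(out, cap, "movq %s,%s", src, dst);
}

/* Variant with a conditional: emit nothing when the move would be a self-move. */
int mac_expand_move_if_needed(const char *src, const char *dst, char *out, size_t cap) {
    if (strcmp(src, dst) == 0) {
        if (cap > 0) out[0] = '\0';
        return 0;
    }
    return mac_expand_move(src, dst, out, cap);
}
```

Passing "%rsi" as the source reproduces the redundant movq %rsi,%rsi seen in the disassembly, while passing "%rdx" (where _cmd lives in the stack-return variants) produces a move that actually does something.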

Finally, the return value is passed back in register %rax and is stowed away into register %r11 for use shortly.

0x000051bd  movdqa      0xffffff40(%rbp),%xmm0
0x000051c5  movdqa      0xffffff50(%rbp),%xmm1
0x000051cd  movdqa      0xffffff60(%rbp),%xmm2
0x000051d5  movdqa      0xffffff70(%rbp),%xmm3
0x000051dd  movdqa      0x80(%rbp),%xmm4
0x000051e2  movdqa      0x90(%rbp),%xmm5
0x000051e7  movdqa      0xa0(%rbp),%xmm6
0x000051ec  movdqa      0xb0(%rbp),%xmm7
0x000051f1  movq        0xc0(%rbp),%r10
0x000051f5  movq        0xc8(%rbp),%rax
0x000051f9  movq        0xd0(%rbp),%rdi
0x000051fd  movq        0xd8(%rbp),%rsi
0x00005201  movq        0xe0(%rbp),%rdx
0x00005205  movq        0xe8(%rbp),%rcx
0x00005209  movq        0xf0(%rbp),%r8
0x0000520d  movq        0xf8(%rbp),%r9
0x00005211  movq        %rbp,%rsp
0x00005214  popq        %rbp

And… all these instructions are simply to restore all the registers back to the state they were in prior to the call to _class_lookupMethodAndLoadCache().
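The 0xc0 bytes reserved earlier line up exactly with what gets restored here: eight 16-byte XMM registers plus eight 8-byte general-purpose registers. The C sketch below checks that arithmetic. It assumes that the displacements in the listing are negative offsets from %rbp that the disassembler printed as raw bytes (so 0x80 means -0x80 and 0xffffff40 means -0xc0). That is my reading of the output rather than something stated in the post, and the helper names are made up for the example.

```c
#include <stdint.h>

enum {
    FRAME_XMM_COUNT = 8,   /* %xmm0 through %xmm7, 16 bytes each */
    FRAME_GPR_COUNT = 8,   /* %r10, %rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
    FRAME_SIZE = FRAME_XMM_COUNT * 16 + FRAME_GPR_COUNT * 8
};

/* Reinterpret a raw 8-bit displacement as a signed two's complement value. */
int frame_disp8(uint8_t raw) {
    return raw >= 0x80 ? (int)raw - 0x100 : (int)raw;
}

/* Reinterpret a raw 32-bit displacement as a signed two's complement value. */
long long frame_disp32(uint32_t raw) {
    return raw >= 0x80000000u ? (long long)raw - 0x100000000LL : (long long)raw;
}

/* Offset from %rbp of the i-th XMM slot and the i-th general-purpose slot. */
int frame_xmm_offset(int i) { return -FRAME_SIZE + 16 * i; }
int frame_gpr_offset(int i) { return -FRAME_SIZE + 16 * FRAME_XMM_COUNT + 8 * i; }
```

Under that reading, %xmm0 sits at the very bottom of the frame (-0xc0), %r10 starts the general-purpose area at -0x40, and %r9 occupies the last slot at -0x08, directly below the saved %rbp.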

0x00005215  cmpq        %r11,%r11
0x00005218  jmp         *%r11d

Finally, dispatch! The cmpq sets the processor’s status flags to indicate a non-structure return value. Note that the above two instructions are not part of the method lookup macro; they are found in the messenger itself (and differ from the corresponding instructions in the other messengers).

Note that _class_lookupMethodAndLoadCache() will never return NULL and, hence, there is no need for a NULL check above. If a method isn’t found, then _class_lookupMethodAndLoadCache() returns the address of the forwarding handler. Because forwarding may actually come into play often, the forwarding handler is actually put into the cache such that future invocations can leverage the fast path.
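A small C sketch of that contract follows. It is a toy with hypothetical names (fwd_class, fwd_find, fwd_forward_handler), a one-entry cache, and string comparison instead of selector pointer comparison. The only behavior it is meant to show is that the lookup always hands back something callable and that the forwarding handler is cached like any other implementation.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: names and layout are hypothetical. */
typedef int (*fwd_imp)(void);

typedef struct fwd_class {
    const char *method_name;
    fwd_imp method_imp;
    const char *cached_name;   /* one-entry stand-in for the method cache */
    fwd_imp cached_imp;
} fwd_class;

int fwd_lookup_calls = 0;

int fwd_real_method(void) { return 1; }
int fwd_forward_handler(void) { return -1; }  /* stand-in for the forwarding handler */

/* Never returns NULL: an unknown selector resolves to the forwarding */
/* handler, and that result is cached like any other implementation.  */
fwd_imp fwd_lookup_and_load_cache(fwd_class *cls, const char *sel) {
    fwd_lookup_calls++;
    fwd_imp imp = fwd_forward_handler;
    if (cls->method_name != NULL && strcmp(cls->method_name, sel) == 0)
        imp = cls->method_imp;
    cls->cached_name = sel;
    cls->cached_imp = imp;
    return imp;
}

/* Cache check first; the caller can jump to the result without a NULL test. */
fwd_imp fwd_find(fwd_class *cls, const char *sel) {
    if (cls->cached_name != NULL && strcmp(cls->cached_name, sel) == 0)
        return cls->cached_imp;
    return fwd_lookup_and_load_cache(cls, sel);
}
```

Sending an unknown selector twice reaches fwd_lookup_and_load_cache() only the first time; the second send finds the forwarding handler in the cache.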

That is it; that is both the fast and the slow path of method invocation.

But what are these instructions?? There appear to be some leftovers!?!

0x0000521b  movq        0x000b32be(%rip),%rdi
0x00005222  testq       %rdi,%rdi
0x00005225  jneq        0x0000510a
0x0000522b  movq        $0x00000000,%rax
0x00005232  movq        $0x00000000,%rdx
0x00005239  xorps       %xmm0,%xmm0
0x0000523c  xorps       %xmm1,%xmm1
0x0000523f  ret

Way back as the very first two instructions to objc_msgSend() there was a nil check. If the target of method invocation is nil, then jeq 0x0000521b.

This is the nil handling code. The last five instructions (the two movq, two xorps, and ret) take care of actually zeroing out all of the return value registers and returning control to the caller. This also explains why some types of return values are undefined on message-to-nil. Message-to-nil can only safely zero out values that are returned in the return registers. There isn’t enough metadata in the C ABI to know how much of the stack to zero for values returned on the stack.

The first three instructions handle the nil receiver. The movq 0x000b32be(%rip),%rdi instruction loads the value contained in the _objc_nilReceiver global into the register %rdi. The testq %rdi,%rdi instruction sets the processor’s zero flag in the status register if the value contained in %rdi is zero. The jneq 0x0000510a instruction will jump if the value is not zero. That address is back inside objc_msgSend and, in particular, the jump basically does a dispatch to the nil message receiver with the same selector.
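Expressed as C control flow, the nil handling reads roughly like the sketch below. The names are hypothetical: nil_receiver stands in for the _objc_nilReceiver global, and nil_dispatch() stands in for the normal dispatch path. Only the order of the checks is taken from the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct nil_object { int tag; } nil_object;

/* Stand-in for the _objc_nilReceiver global; NULL means none is installed. */
nil_object *nil_receiver = NULL;

/* Stand-in for a normal dispatch: returns the receiver's tag. */
intptr_t nil_dispatch(nil_object *self) { return self->tag; }

/* Mirrors the order of the checks in the nil-handling code. */
intptr_t nil_send(nil_object *self) {
    if (self == NULL) {
        if (nil_receiver != NULL)
            return nil_dispatch(nil_receiver);  /* jump back in with the nil receiver */
        return 0;                               /* zero the return registers and return */
    }
    return nil_dispatch(self);
}
```

With no nil receiver installed, nil_send(NULL) returns zero; once nil_receiver points at an object, the same call is dispatched to that object instead.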

I wrote up how to write your own nil receiver quite some time ago.

0x00005240  movq        %rdi,%rax
0x00005243  ret

These final two instructions are what happens when you invoke one of the ignored selectors under GC. It effectively causes the method to return self; by moving the target of the method call from the first argument register %rdi into the return value register %rax.

Note that this is also the reason why -retainCount returns such an outrageous value under GC; it is the address of the object.
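The same two instructions can be written as a one-line C function. In this sketch both function names are hypothetical; the second simply shows what a caller expecting an integer count ends up reading from the return register.

```c
#include <stdint.h>

/* C equivalent of "movq %rdi,%rax; ret": hand the first argument back unchanged. */
void *ign_return_self(void *self) {
    return self;
}

/* A caller expecting an integer count reads the same return register, */
/* so what it sees is the receiver's address.                          */
uintptr_t ign_retain_count(void *self) {
    return (uintptr_t)ign_return_self(self);
}
```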

There you have it. That is every instruction that may be executed in objc_msgSend() from beginning to end.


6 Responses to “objc_msgSend() Tour Part 4: Method Lookup & Some Odds and Ends”

  1. gparker says:

    Frankly, the movq %rsi,%rsi instruction makes no sense to me. It is clearly loading the second argument, but it is effectively a no-op since the source and destination are the same and the movq instruction doesn’t set or reset any of the processor’s status flags.

    The instruction does nothing. It’s a side effect of the macros used to generate the code; in some other variants of objc_msgSend(), the selector is not yet in %rsi, and the instruction at that point puts it there.

  2. Jean-Daniel says:

    Thank you for this series of articles. I never had the courage to “decrypt” this assembly to understand how it works.

    I saw in sources that objc 2 introduce a new _fixup variants of obj-c message functions. Are these variants significantly different than the one you describe here ?

  3. Tan says:

    Isn’t “movq %rsi,%rsi instruction” the side effect of the macros? At least, that’s what I had been led to believe. Thank’s for this series though, real great resource.

  4. Tom Dalling says:

    Wow, you went into a crazy amount of detail. Really interesting stuff. At the moment I’m playing around with swizzling, NSInvocation and , so these articles could come in really handy. Thanks.

  5. Rob Napier says:

    Thanks for a really useful series. Do you have a pointer to more information about the “method triplet” that you allude to? Looking at the code, the cache seems made up of a method_name pointer (offset 0) and a method_imp pointer (offset 16). I assume that the signature string-pointer is at offset 8 and is just excluded in the structure definition because it’s not needed in this file? Or am I misreading it? This seems to match a method_t. Is a method_t the same as a “method triplet?” (I’m having trouble finding how or if method_t relates to Method; I don’t immediately see how runtime.h compiles for OBJC2 since objc_method appears to be #ifdef’d out.)

  6. Hari Karam Singh says:

    Excellent article! I was scanning for an explanation why various sources discourage the use of objc message calls in CoreAudio render callbacks. Are there any additional implications with multiple threads? Perhaps it’s just because of the number of cycles required in the slow case. This is easily remedied by ensuring the IMP cache is populated which I’ve read is faster than even C++ method calls. Any thoughts?
