Surprising ARC performance characteristics

The project I’m working on at the moment has quite tight performance constraints. It needs to start up quickly, do its work at a particular rate and, being an iOS app, there’s a hard limit on how much RAM can be used.

The team’s got quite friendly with Instruments, watching the time profile, memory allocations, thread switches[*] and storage access, trying to discover where we can trade one resource off against another.

[*] This is a topic for a different post, but “dispatch_async() all the things” is a performance “solution” that brings its own problems.

It was during one of these sessions that I noticed a hot loop in the app was spending a lot of time in a C++ collection type called objc::DenseMap. This is apparently used by objc_retain() and objc_release(), the functions used to implement reference counting when Objective-C code is compiled using ARC.

The loop was implemented using the block-based interface, -[NSDictionary enumerateKeysAndObjectsUsingBlock:]. The arguments to a block are strong references, so each key and object was being retained on entry to the block and released on return. Multiply that by thousands of objects in the collection and tens of iterations per second, and that was a non-trivial amount of time spent in memory-management functions.

I started to think about other data types in which I could express the same collection—is there something in the C++ standard library I could use?

I ended up using a different interface to the same data type – something proposed by my colleague, Mo. Since Cocoa was released, Foundation data types have been accessible via the CoreFoundation C API[**]. The key difference as far as modern applications are concerned is that the C API uses void * to refer to its contents rather than id. As a result, and with appropriate use of bridging casts, ARC doesn’t try to retain and release the objects.

[**] I think that Foundation on OPENSTEP was designed in the same way, but that the C API wasn’t exposed until the OS X 10.0 release.

So this:

[myDictionary enumerateKeysAndObjectsUsingBlock: ^(id key, id object, BOOL *stop) {
  //...
}];

became this:

CFDictionaryRef myCFDictionary = (__bridge CFDictionaryRef)myDictionary;
CFIndex count = CFDictionaryGetCount(myCFDictionary);
const void *keys[count];
const void *values[count];
CFDictionaryGetKeysAndValues(myCFDictionary, keys, values);

for (CFIndex i = 0; i < count; i++)
{
  __unsafe_unretained id key = (__bridge id)keys[i];
  __unsafe_unretained id value = (__bridge id)values[i];
  //...
}

which turned out to be about 12% faster in this case.

I’ll finish by addressing an open question from earlier: when should I consider ditching Foundation/CoreFoundation completely? There are times when it’s appropriate to move away from those data types. Foundation’s adaptive collections are very fast a lot of the time, choosing different internal representations under different conditions – but they aren’t always the best choice.

For loops that enumerate over a collection, like the one investigated in this post, a C or C++ structure representation is a good fit when the loop body sends a lot of messages. Hacks like IMP caching can also help, in which this:

for (MyObject *foo in bar)
{
  [foo doThing];
}

becomes this:

#import <objc/runtime.h> // for class_getMethodImplementation()

SEL doThingSelector = @selector(doThing);
IMP doThingImp = class_getMethodImplementation([MyObject class], doThingSelector);
// Under ARC, an IMP must be cast to a typed function pointer before calling.
void (*doThing)(id, SEL) = (void (*)(id, SEL))doThingImp;

for (MyObject *foo in bar)
{
  doThing(foo, doThingSelector);
}

If you’ve got lots (like tens of thousands, or hundreds of thousands) of instances of a class, Objective-C adds a measurable memory overhead through the isa pointers (each object contains a pointer to its class) and the look-aside table that tracks retain counts. Switching to a different representation can regain that space: in return for losing dynamic dispatch and reference-counted memory management—automatic or otherwise.

About Graham

I make it faster and easier for you to create high-quality code.
This entry was posted in code-level, performance, software-engineering.