Surprising ARC performance characteristics

The project I’m working on at the moment has quite tight performance constraints. It needs to start up quickly, do its work at a particular rate and, being an iOS app, there’s a hard limit on how much RAM can be used.

The team’s got quite friendly with Instruments, watching the time profile, memory allocations, thread switches[*] and storage access trying to discover where we can trade one off in favour of another.

[*] this is a topic for a different post, but “dispatch_async() all the things” is a performance “solution” that brings its own problems.

It was during one of these sessions that I noticed a hot loop in the app was spending a lot of time in a C++ collection type called objc::DenseMap. This is apparently used by objc_retain() and objc_release(), the functions used to implement reference counting when Objective-C code is compiled using ARC.

The loop was implemented using the closure interface, -[NSDictionary enumerateKeysAndValuesUsingBlock:. Apparently the arguments to a block are strong references, so each was being retained on entering the block and released on return. Multiply by thousands of objects in the collection and tens of iterations per second, and that was a non-trivial amount of time to spend in memory management functions.

I started to think about other data types in which I could express the same collection—is there something in the C++ standard library I could use?

I ended up using a different interface to the same data type – something proposed by my colleague, Mo. Since Cocoa was released, Foundation data types have been accessible via the CoreFoundation C API[**]. The key difference as far as modern applications are concerned is that the C API uses void * to refer to its content rather than id. As a result, and with appropriate use of bridging casts, ARC doesn’t try to retain and release the objects.

[**]I think that Foundation on OPENSTEP was designed in the same way, but that the C API wasn’t exposed until the OS X 10.0 release.

So this:

[myDictionary enumerateKeysAndObjectsUsingBlock: ^(id key, id object, BOOL *stop) {
  //...
}];

became this:

CFDictionaryRef myCFDictionary = (__bridge CFDictionaryRef)myDictionary;
CFIndex count = CFDictionaryGetCount(myCFDictionary);
void *keys[count];
void *values[count];
CFDictionaryGetKeysAndValues(myCFDictionary, keys, values);

for (CFIndex i = 0; i < count; i++)
{
  __unsafe_unretained id key = (__bridge id)keys[i];
  __unsafe_unretained id value = (__bridge id)values[i];
  //...
}

which turned out to be about 12% faster in this case.

I’ll finish by addressing an open question from earlier, when should I consider ditching Foundation/CoreFoundation completely? There are times when it’s appropriate to move away from those data types. Foundation’s adaptive algorithms are very fast a lot of the time, choosing different representations under different conditions – but aren’t always the best choice.

Considering loops that enumerate over a collection like the loop investigated in this post, a C++ or C structure representation is good if the loop is calling a lot of messages. Hacks like IMP caching can also help, in which this:

for (MyObject *foo in bar)
{
  [foo doThing];
}

becomes this:

SEL doThingSelector = @selector(doThing);
IMP doThingImp = class_getMethodImplementation([MyObject class], doThingSelector);

for (MyObject *foo in bar)
{
  doThingImp(foo, doThingSelector);
}

If you’ve got lots (like tens of thousands, or hundreds of thousands) of instances of a class, Objective-C will add a measurable memory impact in the isa pointers (each object contains a pointer to its class), and the look aside table that tracks retain counts. Switching to a different representation can regain that space: in return for losing dynamic dispatch and reference-counted memory management—automatic or otherwise.

Sideloading content into iOS apps

All non-trivial apps visualise content in some form, whether it’s game levels embedded in the app, data loaded from some internet service, or something else.

In many cases the developer who’s writing the Objective-C code isn’t going to be the person who creates or prepares this content. In the case of embedded content, this can lead to a slow feedback loop—the content experts create a database or some other assets, then send it to the developer. The developer prepares a build using the new assets, uploading it to TestFlight or some other ad-hoc distribution centre. Then the content people can download that app to see their content in the context of the application it’s designed for.

There’s a simple way to close this loop, letting content creators see the app with their latest changes as they make them. That is to use iTunes File Sharing to load the content via the app’s Documents folder.

If you have a line like this:

NSString *pathToContent = [[NSBundle mainBundle] pathToResource: @"myDatabase" ofType: @"sqlite"];

Change it to use a function like this:

NSString *pathToPotentiallySideloadedFile(NSString *filename, NSString *type)
{
    NSString *pathInDocumentsFolder = [[[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject] stringByAppendingPathComponent: filename] stringByAppendingPathExtension: type];
    if (pathInDocumentsFolder)
        return pathInDocumentsFolder;
    else
        return [[NSBundle mainBundle] pathForResource: filename ofType: type];
}

//...
NSString *pathToContent = pathToPotentiallySideloadedFile(@"myDatabase", @"sqlite");

Now if people working on your app have a file in their Documents folder with the same name as the one used in the app, it’ll load their version. So, how do they get it in there?

You need to make a simple change to your app’s Info.plist:

    <key>UIFileSharingEnabled</key>
    <true/>

Now when anybody with the app connects their device to iTunes, they’ll be able to use file sharing to add their own content. Don’t forget to turn this off before you go live!

I mentioned at the beginning of this post that this technique can be used for networked apps. Obviously there isn’t really any difficulty getting updated content into a network-driven app; or if there is, someone did it wrong.

It’s the opposite problem you have: keeping the content fixed. If your online component—be it a CMS, a data feed from an API, or something else—is getting new data you can’t always ensure that the app is looking at the same stuff in testing. Indeed, sometimes I’ve found the CMS developers changing the data format without telling anyone; if you’re investigating a particular condition related to the state of the data, it can be hard to reproduce.

You can use the iTunes File Sharing technique to load a specific version of the app’s data without relying on the network connection and the server giving you the same output. This is great for regression testing, as you can ensure that only your code is changing between runs.

Object-Oriented callback design

One of the early promises of object-oriented programming, encapsulated in the design of the Smalltalk APIs, was a reduction – or really an encapsulation – of the complexity of code. Many programmers believe that the more complex a method or function is, the harder it is to understand and to maintain. Some developers even use tools to measure the complexity quantitatively, in terms of the number of loops or conditions present in the function’s logic. Get this “cyclomatic complexity” figure too high, and your build fails. Unfortunately many class APIs have been designed that don’t take the complexity of client code into account. Here’s an extreme example: the NSStreamDelegate protocol from Foundation.

-(void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)streamEvent;

This is not so much an abstraction of the underlying C functions as a least-effort adaptation into Objective-C. Every return code the lower-level functionality exposes is mapped onto a code that’s funnelled into one place; this delegate callback. Were you trying to read or write the stream? Did it succeed or fail? Doesn’t matter; you’ll get this one callback. Any implementation of this protocol looks like a big bundle of if statements (or, more tersely, a big bundle of cases in a switch) to handle each of the possible codes. The default case has to handle the possibility that future version of the API adds a new event to the list. Whenever I use this API, I drop in the following implementation that “fans out” the different events to different handler methods.

-(void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)streamEvent
{
  switch(streamEvent)
  {
  case NSStreamEventOpenCompleted:
    [self streamDidOpen: stream];
    break;
  //...
  default:
    NSAssert(NO, @"Apple changed the NSStream API");
    [self streamDidSomethingUnexpected: stream];
    break;
  }
}

Of course,

NSStream is an incredibly old class. We’ve learned a lot since then, so modern callback techniques are much better, aren’t they? In my opinion, they did indeed get better for a bit. But then something happened that led to a reduction in the quality of these designs. That thing was the overuse of blocks as callbacks. Here’s an example of what I mean, taken from Game Center’s authentication workflow.

@property(nonatomic, copy) void(^authenticateHandler)(UIViewController *viewController, NSError *error)

Let’s gloss over, for a moment, the fact that simply setting this property triggers authentication. There are three things that could happen as a result of calling this method: two are the related ideas that authentication could succeed or fail (related, but diametrically opposed). The third is that the API needs some user input, so wants the app to present a view controller for data entry. Three things, one entry point. Which do we need to handle on this run? We’re not told; we have to ask. This is the antithesis of accepted object-oriented practice. In this particular case, the behaviour required on event-handling is rich enough that a delegate protocol defining multiple methods would be a good way to handle the interaction:

@protocol GKLocalPlayerAuthenticationDelegate

@required
-(void)localPlayer: (GKLocalPlayer *)localPlayer needsToPresentAuthenticationInterface: (UIViewController *)viewController;
-(void)localPlayerAuthenticated: (GKLocalPlayer *)localPlayer;
-(void)localPlayer: (GKLocalPlayer *)localPlayer failedToAuthenticateWithError: (NSError *)error;

@end

The simpler case is where some API task either succeeds or fails. Smalltalk had a pattern for dealing with this which could be both supported in Objective-C, and extended to cover asynchronous design. Here’s how you might encapsulate a boolean success state with error handling in an object-oriented fashion.

typedef id(^conditionBlock)(NSError **error);
typedef void(^successHandler)(id result);
typedef void(^failureHandler)(NSError *error);
- (void)ifThis:(conditionBlock)condition then:(successHandler)success otherwise:(failureHandler)failure
{
  __block NSError *error;
  __block id result;
  if ((result = condition(&error)))
    success(result);
  else
    failure(error);
}

Now you’re telling client code whether your operation worked, not requiring that it ask. Each of the conditions is explicitly and separately handled. This is a bit different from Smalltalk’s condition handling, which works by sending the ifTrue:ifFalse: message to an object that knows which Boolean state it represents. The ifThis:then:otherwise: message needs to deal with the common Cocoa idiom of describing failure via an error object – something a Boolean wouldn’t know about.[] However, the Smalltalk pattern *is possible while still supporting the above requirements: see the coda to this post. This method could be exposed directly as API, or it can be used to service conditions inside other methods:

@implementation NSFileManager (BlockDelete)

- (void)deleteFileAtPath:(NSString *)path success:(successHandler)success failure:(failureHandler)failure
{
    [self ifThis: ^(NSError **error){
        return [self removeItemAtPath: path error: error]?@(1):nil;
    } then: success otherwise: failure];
}

@end

int main(int argc, const char * argv[])
{
    @autoreleasepool {
        [[NSFileManager defaultManager] deleteFileAtPath: @"/private/tmp" success: ^(id unused){
            NSLog(@"Holy crap, you deleted the temporary folder!");
        } failure: ^(NSError *error){
            NSLog(@"Meh, that didn't work. Here's why: %@", error);
        }];
    }
    return 0;
}

[*]As an aside, there’s no real reason that Cocoa needs to use indirect error pointers. Consider the following API:

-(id)executeFetchRequest:(NSFetchRequest *)fetchRequest

The return value could be an NSArray or an NSError. The problem with this is that in almost all cases this puts some ugly conditional code into the API’s client—though only the same ugly condition you currently have to do in testing the return code before examining the error. This separation of success and failure handlers encapsulates that condition in code the client author doesn’t need to see.

Coda: related pattern

I realised after writing this post that the Smalltalk-esque ifTrue:ifFalse: style of conditional can be supported, and leads to some interesting possibilities. First, consider defining an abstract Outcome class:

@interface Outcome : NSObject

- (void)ifTrue:(successHandler)success ifFalse: (failureHandler)failure;

@end

You can now define two subclasses which know what outcome they represent and the supporting data:

@interface Success : Outcome

+ (instancetype)successWithResult: (id)result;

@end

@interface Failure : Outcome

+ (instancetype)failureWithError: (NSError *)error;

@end

The implementation of these two classes is very similar, you can infer the behaviour of Failure from the behaviour of Success:

@implementation Success
{
  id _result;
}

+ (instancetype)successWithResult: (id)result
{
  Success *success = [self new];
  success->_result = result;
  return success;
}

- (void)ifTrue:(successHandler)success ifFalse:(failureHandler)failure
{
  success(_result);
}
@end

But that’s not the end of it. You could use the -ifThis:then:otherwise: method above to implement a Deferred outcome, which doesn’t evaluate its result until someone asks for it. Or you could build a Pending result, which starts the evaluation in the background, resolving to success or failure on completion. Or you could do something else.