On NSInvocation

I was going to get down to doing some writing, but then I got some new kit I needed to set up, so that isn’t going to happen. Besides which, I was talking to one developer about NSInvocation and writing to another about NSInvocation, then another asked about NSInvocation. So now seems like a good time to talk about NSInvocation.

What is NSInvocation?

Well, we could rely on Apple’s NSInvocation class reference to tell us that

An NSInvocation is an Objective-C message rendered static, that is, it is an action turned into an object.

This means that you can construct an invocation describing sending a particular message to a particular object, without actually sending the message. At some later point you can send the message as rendered, or you can change the target, or any of the parameters. This “store-and-forward” messaging makes implementing some parts of an app very easy, and represents a realisation of a design pattern called Command.
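
To make that concrete, here’s a minimal sketch of building and firing an invocation; the downloader object, its -fetchURL: method and the url variable are hypothetical stand-ins:

NSMethodSignature *sig = [downloader methodSignatureForSelector: @selector(fetchURL:)];
NSInvocation *inv = [NSInvocation invocationWithMethodSignature: sig];
[inv setTarget: downloader];
[inv setSelector: @selector(fetchURL:)];
[inv setArgument: &url atIndex: 2]; // indices 0 and 1 are reserved for self and _cmd
// ...later, possibly after changing the target or the argument:
[inv invoke];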

How is that useful?

The Gang of Four describes Command like this:

Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

Well, what is NSInvocation other than a request encapsulated as an object?

You can imagine that this would be useful in a distributed system, such as a remote procedure call (RPC) setup. In such a situation, code in the client process sends a message to its RPC library, which is actually acting as a proxy for the remote service. The library bundles up the invocation and passes it to the remote service, where the RPC implementation works out which object in the server process is being messaged and invokes the message on that target.

Spoiler alert: that really is how Distributed Objects on Mac OS X operates. NSInvocation instances can be serialised over a port connection and sent to remote processes, where they get deserialised and invoked.

An undo manager, similarly, works using the Command pattern and NSInvocation. Registering an undo action creates an invocation, describing what would need to be done to revert some user action. This invocation is placed on a queue, so the undo operations are all recorded in order. When the user hits Cmd-Z, the undo manager sends the most recent undo invocation to its target.
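
Cocoa’s NSUndoManager exposes exactly this: messages sent to the proxy returned by -prepareWithInvocationTarget: are recorded as invocations rather than performed. A minimal sketch, assuming hypothetical undoManager and color ivars:

- (void)setColor: (NSColor *)aColor {
    // Record an invocation that would restore the current colour; the undo
    // manager sends it back to self when the user hits Cmd-Z.
    [[undoManager prepareWithInvocationTarget: self] setColor: color];
    [color autorelease];
    color = [aColor retain];
}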

Similarly, an operation queue is just a list of requests that can be invoked later…this also sounds like it could be a job for NSInvocation (though to be sure, blocks are also used, which is another implementation of the same pattern).

The remaining common application of Command is for sending the same method to all of the objects in a collection. You would construct an invocation for the first object, then for each object in the collection change the invocation’s target before invoking it.
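
Something like this sketch, where the widgets array and its objects’ -refresh method are hypothetical:

// Build the invocation once, then retarget it for each member.
NSMethodSignature *sig = [[widgets lastObject] methodSignatureForSelector: @selector(refresh)];
NSInvocation *inv = [NSInvocation invocationWithMethodSignature: sig];
[inv setSelector: @selector(refresh)];
for (id widget in widgets) {
    [inv setTarget: widget];
    [inv invoke];
}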

Got a Concrete Example?

OK, here’s one. You can use +[NSThread detachNewThreadSelector:toTarget:withObject:] to spawn a new thread. Because every thread in an iOS application needs its own autorelease pool, you need to create an autorelease pool at the beginning of the target selector’s method and release it at the end. Without using the Command pattern, this means one or more of:

  • Having a memory leak, if you can’t edit the method implementation
  • Having boilerplate autorelease pool code on every method that might – sometime – be called on its own thread
  • Having a wrapper method for any method that might – sometime – need to be called with or without a surrounding pool.

Sucks, huh? Let’s see if we can make that any better with NSInvocation and the Command pattern.

- (id)newResultOfAutoreleasedInvocation:(NSInvocation *)inv {
    id returnValue = nil;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [inv invoke];
    NSMethodSignature *signature = [inv methodSignature];
    if ([signature methodReturnLength] > 0) {
        // strncmp() is nonzero when the first character of the return type
        // differs from @encode(id), i.e. when the return value is not an object.
        if (strncmp([signature methodReturnType], @encode(id), 1)) {
            // Non-object return value: box the raw bytes in an NSValue.
            char *buffer = malloc([signature methodReturnLength]);
            if (buffer != NULL) {
                [inv getReturnValue: buffer];
                returnValue = [NSValue valueWithBytes: buffer objCType: [signature methodReturnType]];
                free(buffer);
            }
        }
        else {
            [inv getReturnValue: &returnValue];
        }
        // Retain before draining the pool, so the caller receives an owned object.
        [returnValue retain];
    }
    [pool release];
    return returnValue;
}

Of course, we have to return a retained object, because the NSAutoreleasePool at the top of the stack when the invocation is fired off no longer exists. That’s why the method name is prefixed with “new”: it’s a hint to the analyser that the method will return a retained object.

The other trick here is the mess involving NSValue. That, believe it or not, is a convenience, so that the same method can be used to wrap invocations that have non-object return values. Of course, using NSInvocation means we’re subject to its limitations: we can’t use variadic methods or those that return a union type.

Now, for any method you want to call on a separate thread (or in an operation, or from a dispatch queue, or…) you can use this wrapper method to ensure that it has an autorelease pool in place without having to grub into the method implementation or write a specific wrapper method.
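
Here’s a hedged sketch of what a call site might look like; processor, -processLogFile: and logFile are hypothetical. Because -processLogFile: returns void, the wrapper returns nil and nothing leaks when the thread machinery discards the result:

NSMethodSignature *sig = [processor methodSignatureForSelector: @selector(processLogFile:)];
NSInvocation *inv = [NSInvocation invocationWithMethodSignature: sig];
[inv setTarget: processor];
[inv setSelector: @selector(processLogFile:)];
[inv setArgument: &logFile atIndex: 2];
[inv retainArguments]; // the invocation is about to cross a thread boundary
[NSThread detachNewThreadSelector: @selector(newResultOfAutoreleasedInvocation:)
                         toTarget: self
                       withObject: inv];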

A side note on doing Objective-C properly: this method compares the result of -[NSMethodSignature methodReturnType] with a specific type using the @encode() keyword. Objective-C type encodings are documented to be C strings, and there’s even a page in the documentation listing the current values returned by @encode. It’s best not to hard-code those documented values, as Apple might choose to change or extend them in the future: @encode() always yields whatever encoding the compiler currently uses.

On comment docs

Something I’m looking at right now is generation of (in my case, HTML) API documentation from some simple markup format. The usual way to do this is by writing documentation markup inline in the source code, using specially formatted comments in header files.

The point

Some people argue that well-written source code should be its own documentation. Well, that’s true, it is: but it’s documentation with limited utility. Source code documents exactly two things:

  • For the compiler’s benefit, the machine instructions that the compiler should generate
  • For the programmer’s benefit, the machine instructions that the compiler should generate

Developing a high-level model of how software works from its source code is possible, but mentally taxing. It’s not designed for that. It would be like asking an ant to map the coastline of Africa: it can be done, but the information available is at entirely the wrong scale.

Several other approaches for high-level documentation of software systems exist. Of course, each of them is not actually the source code, but a model: a map of Africa is not actually Africa, but if you want to know what the coastline of Africa looks like then the map is a very useful model.

Comment documentation is one such model of software. Well-written comment docs explain why you might use a method or class, and how its properties or parameters help you to use it. It gives you a sense of how the classes fit together, and how you can exploit them. They’re called comment docs because they go into comments right alongside the source they document, usually marked up with particular tokens. As an example of a token, many documentation comment systems let me use a line like this:

@author Graham J. Lee

to indicate that I wrote a particular part of the project.
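
Scaled up to a whole method, a documentation comment might look like this; the method is hypothetical, and the markup is the Javadoc style that Doxygen understands:

/**
 * Frobnicates the receiver the given number of times.
 *
 * @param count The number of frobnication passes to perform.
 * @return YES if every pass succeeded, NO otherwise.
 * @author Graham J. Lee
 */
- (BOOL)frobnicateTimes: (NSUInteger)count;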

Of course, this documentation could go anywhere, so why put it into the header files? For a start, you already have the header files, so you’re not having to maintain two parallel hierarchies of content. Also, the proximity of the documentation to the source code means that there’s a higher probability (still not unity, but higher) that a developer who changes the intent or usage of a method will remember to update the documentation. Additionally, it means that the documentation can be no more verbose than required: anything that is obvious from the source (the names of methods and their parameters, for example) can be discovered from the source. This is not inconsistent with my earlier statement about source-as-documentation: you can easily find out a method signature without having to grub through the actual program instructions.

Finally, it means that when you’re working with the source, you have the documentation right there. This is one very common way to interact with comment documentation. The other way is to use a tool to create some friendly formatted output (for me, the goal is HTML) by reading the source files and extracting the useful information from the comments.

Doxygen

I have for the last forever (alright, three years) used Doxygen, for a couple of reasons:

  • The other people on my team at the time I adopted it were already using it
  • Since then, it has gained support for generating Xcode documentation sets, which can be viewed inside the Xcode organizer.

It’s not great, though. It’s a very complicated tool that tries to do all things for all people, so the configuration is huge and needs a lot of consideration. You can customise the look of the output using a CSS file but you pretty much need to as the default output looks like arse. Making it actually output different stuff is trickier, as it’s a C++ project and I’ve only ever learned enough C++ to customise Clang.

HeaderDoc

So I decided to fish around for alternatives. Of course, we know that Apple uses (and ships) HeaderDoc, but it uses its own comment format. Luckily, its comment format is identical to Javadoc, which is the format used by Doxygen. You can get headerdoc2html to look for the /** trigger by passing the -j flag.

Speaking of headerdoc2html, this is one of two programs in the HeaderDoc distribution. The other is gatherheaderdoc, which generates a table of contents from a collection of output files. Each of these is a well-written and well-documented Perl script, which makes extension and modification super-easy.

Neither extension nor modification should be immediately required, in fact. The tool understands all of the Objective-C language features, and some extra bonus things like availability macros and groups of related methods. The default output basically looks like someone forgot to apply the style sheet to developer.apple.com. It’s trivial to configure HeaderDoc to use an external style sheet, and you can even create a custom template HTML file so that things appear in whatever fashion you want. Another useful configuration point is the C preprocessor, which you can get HeaderDoc to run the source through before interpreting the documentation.

Autogsdoc

The GNUstep project uses its own comment documentation tool called Autogsdoc. The fact that the autogsdoc documentation has buggy HTML does not fill one with confidence.

In fact, Autogsdoc’s output is unstyled XHTML 1.0, so it would be easy to style it to look more useful. Some projects that I’ve seen appear to have frames-based HTML, which is unfortunate.

Autogsdoc uses inline comment documentation, the same as the other options we’ve looked at. However, its markup is significantly different, as it uses SGML-style tags inside the documentation. It has a (well-documented, of course) Objective-C implementation with a good split of responsibilities between different classes. Hacking about on its innards shouldn’t be too hard for any motivated Cocoa coder.

Footnote

I know this has been annoying you all since the first paragraph in the section on HeaderDoc: */.

On Being a Software Person

On Wednesday I spoke at Qcon London, about “Mobile App Security and Privacy: You’re Doing It Wrong (and so am I)” as part of @akosma’s track on iOS and Android. The whole track was full of win: particularly, if you ever get the opportunity to hear @fraserspeirs talk about how his school is using iPads to change the way they teach, please do take the opportunity. You will find out a lot about how to write apps that people can (and will) use.

Thursday, I was part of a London iPhone Developer Group panel, alongside @akosma and @bmf, where we talked about $5 Xcode, when iOS and Mac OS X will finally converge, and why you can’t sell software to an Android user. Oh, and we prototyped the UI for Photoshop for iPad.

Seriously, we did that. Why? Because there was sentiment in the discussion group that desktops couldn’t die because it was impossible to do Photoshop on an iPad. This is annoying. In order to show the group that doing Photoshop on the iPad might be possible, I made us do it. Now it turns out that once you have done it, it is possible.

[Update: @akosma has since pointed out that Adobe has already done Photoshop on iOS, and that this whole conversation was redundant before we began.]

What this really shows us is that in order to be a Software Person, you need to take seriously the idea that writing software might be possible. It often is possible, and if you try it you’re more likely to get a useful (and, of course, saleable) app out the end than if you don’t try it.

Conveniently, this is compatible with a meme that I started in my Qcon talk, which goes like this:

If you do not know x, then you cannot be a software engineer.

So, if you do not know tenacity, then you cannot be a software engineer. Conversely, you must also know when discretion is the better part of valour, and it is time to give up. No-one likes a death march project, when the cost of development has gone way beyond any likely return, or when the software you’re writing is no longer relevant.

The following is a list of things that you must know (or be able to hire someone to know for you) in order to be a software engineer. It is not exhaustive: these are merely the ones that I either came up with in my talk, or that Mike and I talked about while learning about evolution.

  • Tenacity
  • Discretion
  • Marketing
  • User experience
  • Estimation
  • Humility
  • When to be a dick
  • Testing
  • Human nature
  • Why phishing is successful
  • Requirements engineering
  • Why you would want to be a software engineer

Some people also add a task called “coding” to this list. Please add your own in the comments.

On Singleton(s)

I woke up this morning to a discussion on Twitter over how different implementations of the Singleton pattern compare. This is like comparing your herpes: no matter whose is better or more efficient, you still have unsightly blisters.

Overview: wtf is a Singleton?

Singleton is one of the software design patterns originally collected in the famous Design Patterns: Elements of Reusable Object-Oriented Software by the “Gang of Four”. They describe the intent of the pattern thus:

Ensure a class only has one instance, and provide a global point of access to it.

The other useful source of design pattern information is Cocoa Design Patterns by Erik Buck and Don Yacktman. In addition to giving examples of Singleton found in the Foundation and AppKit frameworks, this book shows how to implement a Singleton class in Objective-C. The key features are:

  • A single access point to retrieve the single instance.
  • Initialization of the single instance.
  • Protection against accidentally creating another instance or deleting the single instance.

When would you use that?

Well here’s the thing: I won’t say that I “never would” use Singleton, but I will certainly say that it isn’t the most reached-for tool in my belt.

The usual reason is “this class models something of which there is only one thing”. This is the most absurd thing you’ll ever hear. There’s only one print spooler, so surely it must be a singleton. Right, that only works right up until the point where you need two print spoolers. When do you need a second print spooler? I’ll come onto that in the next section.

Similarly, just because there’s one filesystem, doesn’t necessarily mean that you only need one filesystem object. It certainly doesn’t mean you need to enforce there’s only one filesystem object.

Go on then, wise guy, when do you need the second singleton?

When you test the code that uses the first one. You don’t want your unit tests to talk to the real filesystem, or the real database, or the real print spool. You want your unit tests to use a Mock Object, which means they need the second, fake, instance of your filesystem object or whatever.

That’s the real problem with Singleton: often, in your app, it makes sense to provide a shared instance of an object (particularly one that has state relevant to the entire app). However, that’s not the same as requiring that no-one ever create another instance of the object.

So how would you do it, then?

Let’s assume I have a need for an application-wide Framistan instance. I have a couple of options:

  • Create the usual +[Framistan sharedFramistan] class method, as Singleton implementers would, but without all that refcount-avoiding cruft that normally goes with it.
  • Create a -sharedFramistan method on the application delegate, accessed as [[NSApp delegate] sharedFramistan]. After all, I have a need for an application-wide instance, and the app delegate is where stuff associated with the application lives.

The option I go for depends on whether I think a process needs a shared instance of the class, or whether I think this app does. Usually, the latter option gets implemented first, and I change it later when I come to write the second app that uses the same class.
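
For the first option, a minimal sketch looks like this: lazy creation of a shared instance, with nothing stopping a test from doing [[Framistan alloc] init] to get its own. (It isn’t thread-safe as written; create the instance early or serialise access if that matters.)

+ (Framistan *)sharedFramistan {
    static Framistan *sharedInstance = nil;
    if (sharedInstance == nil) {
        sharedInstance = [[Framistan alloc] init];
    }
    return sharedInstance;
}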

Either way, when I need to use the shared instance, I use it like this:

[someFrobnicator abdjulateWithFramistan: [Framistan sharedFramistan]];

Passing the shared instance in means that I still get to pass in other instances, for example in a test I can do [testFrobnicator abdjulateWithFramistan: mockFramistan];.

I don’t like that. Does anyone do something different?

Yes, some people solve Singleton…with another Singleton. The mind boggles. Anyway, what these people do is to create a FramistanFacade Singleton whose job is to manage access to the Framistan Singleton. Now your real Framistan can be an honest Singleton class, but clients talk to the (also-Singleton) FramistanFacade class, which decides what instance of what class it wants to talk to itself.

In Objective-C, the FramistanFacade can use message forwarding to act as a Proxy object, hiding the interaction between Façade and real object. Of course, now that you’re managing the real instance behind the Façade, the real class doesn’t need to be a Singleton because a different class is already managing how its instances are used.
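
As a hedged sketch of that forwarding, where FramistanFacade and its -currentFramistan method are hypothetical: any message the façade doesn’t recognise gets bounced to whichever instance it currently manages, which is NSInvocation earning its keep again:

- (NSMethodSignature *)methodSignatureForSelector: (SEL)aSelector {
    NSMethodSignature *signature = [super methodSignatureForSelector: aSelector];
    if (signature == nil) {
        signature = [[self currentFramistan] methodSignatureForSelector: aSelector];
    }
    return signature;
}

- (void)forwardInvocation: (NSInvocation *)anInvocation {
    // Retarget the message at whichever instance the façade manages.
    [anInvocation invokeWithTarget: [self currentFramistan]];
}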

Conclusion

The debates over how best to implement Singleton in Objective-C are redundant because the Singleton pattern is never what you need. Often you do need a shared instance of a class, but enforcing that no other instance ever gets used is detrimental to testing.

When you do need a shared instance of a class in your app, ensure that you do not close the door to using alternate instances. The need comes up more often than you might expect.

On, or rather in, Seattle

I’ve never been to Washington before, so I’m looking forward to Voices That Matter: iPhone Developers Conference in April. Of course, you know I like the sound of my own voice enough to be speaking: my talk this year will be about Test-Driven Development of iOS apps. I heard from attendees in Philly that they wanted my unit testing talk to be more meat and less waffle, and that’s the talk I’ll be giving this time.

I also like the sound of other people’s voices, and the schedule this time looks spot on: fellow sideburn-wielders Mike Lee and Andy Ihnatko are talking, as is Aaron Hillegass: all worth listening to. I’m a bit nervous about talking to an empty room as I’m scheduled up against Cat Shive who gives a good talk too.

If you’re interested in attending, now is the time to register. Early bird pricing ends Friday, but whether you sign up now or next week you can use promo code SEASPK2 to get an extra $100 off. Right now, that means attendance is only $395.

By the way, I’m sounding out things to do while I’m in Seattle, so any locals who have expert knowledge on where to eat and what to see please feel welcome to comment. I’m definitely up for visiting 1, Microsoft Way in nearby Redmond. If you work for Microsoft, offer to run a tour ;-).

On repeatable builds

One of the key features of software engineering, as distinct from cowboy coding or hacking, is that it should be repeatable. That doesn’t mean that you should do the same project twice in identical ways from beginning to end: that would be a waste of time (and you can bet the requirements have moved the second time around). But it does mean that two people, investigating the same project at the same point in the project, should be able to reproduce the same results.

What does that mean for builds?

If you ask me to investigate a bug a customer reported with version 1.5.3 of your app, then I should be able to build version 1.5.3 of your app even if you’re now working on version 3.0.2. The product I end up with by building version 1.5.3 should be exactly the same as the one you’d end up with, and also exactly the thing the customer was using.

In practice, this means that your build should be as simple as possible. Given access to your source repository, any developer should be able to see what they need to check out, to correspond to a particular version of your app (a released version, a development branch, whatever). They should then hit the one button that gets that product built.

If there are any dependencies in the build process, these should be handled by the build process. If a particular version of your app uses a particular version of a framework or plug-in, that should be codified in the source for that version of your app. Leave nothing to chance, external configuration or intelligent interaction (which are basically the same thing).

Why should I care?

Well, for a start, if you don’t know that the thing you’re debugging and the thing your customer’s using are the same, then you don’t know whether you can even find the customer’s bug, let alone fix it. There are ancillary benefits too.

Get new developers up to speed quickly

One of the things that I do to differentiate from other security consultants is offer a half-day “I just want you to fix this problem” service, which is a bit like Apple DTS or MSDN Technical Incidents for security (but with a security boffin on tap, and also available on Android ;->).

Now, if you hire me for half a day, you probably want me to spend that half-day solving your problem. I certainly want to spend my time that way. Neither of us wants me to spend a couple of hours trying to work out how your product’s built. Complicated build procedures are the top reason for failed/unreproducible builds. Every time a developer has to think about how to build the product, that’s a point where he can introduce a mistake.

Anyone can turn out a new release

You’ve just found out about a crashing bug in your million-dollar app, but you’re already on that Caribbean cruise you’ve bought pre-emptively with the profits. No problem: you use the satellite phone to call a junior developer, and tell him to fix the bug and submit a new build to iTunes Connect.

…can he do it? Will he get it right? If not, can your company last the two weeks until you get back from burning the profits?

You can migrate to new hardware

Maybe you finally got that new MacBook Air you’ve been lusting after, or the Mac Pro to go in your evil lair. Or your dev computer just died, and you need to get a new one up and running. And your backups failed.

If you forget how to set up your dev environment, or you get it a bit wrong, you’ll waste time and get angry. Then you’ll get it more wrong.

You can automate your builds

You’ve seen those open source projects like WebKit that do nightly builds? I do that too. In fact, I have a build triggered whenever I commit source code to version control. If the build succeeds, it runs some tests. If any of that fails, I get an e-mail.

It’s called Continuous Integration, and provides very rapid feedback on the “health” of a project. But to get it done, you need for your builds to be unattended.

OK, I’m convinced. How do I do it?

Let’s take a look at the independent parts of your typical iOS app, and see what it would take to ensure that we always get the same version of them for every build.

Source code

You already use version control, right? Right. Do you tag your releases?

A tag in git or subversion (or any other version control system that supports them, but life’s too short) is just a way to name a particular commit. In git, it really is just a name (and optionally some other info) attached to a certain commit. In subversion, it’s actually a new branch, but one that you shouldn’t commit any changes to.

So let’s say you’ve just finished version 1.0, it doesn’t contain any known bugs, and it’s time to submit it to the store. You should tag the version of the source that corresponds to what you submit as the version 1.0 candidate.
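
For example (the tag names are a matter of taste):

# git: attach a name to the current commit; tags aren't pushed by default
git tag -a v1.0 -m "version 1.0 as submitted to the store"
git push origin v1.0

# subversion: a tag really is a copy you promise not to commit to
svn copy ^/trunk ^/tags/1.0 -m "tag version 1.0"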

Now, if you ever need to go back to version 1.0, to address an issue reported by the store reviewers, or to investigate a bug reported by a customer, you just check out that tagged version of your source. If you need another developer to look into something related to version 1.0, then she checks out that tagged version.

Of course, checking out a particular tag only gets you the same built product if the source under version control represents a complete definition of the product. Let’s see a few cases where that might not be true.

Subprojects

Apps sometimes contain classes that have been developed as a separate project, particularly collections of helper or utility classes.

Building version 1.0 of the app requires building whatever version of the helpers was used in version 1.0. If you change the two independently, and don’t have a way to keep both in sync, then you don’t have a way to build the same product again. That sucks.

Building a particular version of your app should automatically bring along the correct version of a subproject. The easiest way to do that is with git submodules or its equivalent in your preferred SCM. If that’s not possible for whatever reason, then you can use a shell script as part of the build process. Keep the shell script in version control so you get the correct version of the script that checks out the correct version of the subproject.
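
With git submodules that might look like this sketch, where the URL and paths are placeholders:

# pin the helper project at a specific commit inside your app's repository
git submodule add git://example.com/helpers.git Helpers
git commit -m "pin helpers at the revision this version builds against"

# collaborators, after checking out any tag or branch of your app:
git submodule update --init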

Building your app should automatically build the subproject. There is nothing worse than sitting and waiting for a build, only to find random link errors at the end. You send an email to the project maintainer, then after the weekend’s over she sends a reply saying “yeah, you need to build this project first”. Gah!

In Xcode, adding a subproject is as simple as dragging the subproject’s file into the list of files in your main project. You then edit your app target, and add the target for the subproject as a dependency for the app target. Now Xcode checks when you build the app whether it needs to build the subproject first. You’re guaranteed to link against an up-to-date version, not whatever old cruft Xcode found in your search paths.

Third-party frameworks/libraries

Sometimes, your project depends on someone else’s framework or library. You have that library, but you don’t have the source to it.

Check the headers and the binary for the library into version control. Yes, it’s icky. But now, when someone else checks out a particular revision of your source, they automatically get the correct revision of the library too, and compile with the correct headers.

Set up the header/library search paths for your Xcode target such that Xcode searches the folder where the revision-controlled library lives, not whatever nasty old version you, I or someone else happens to have knocking around /usr/local/. This path should be relative to $(SRCROOT), i.e. it should say how to get from where your project is to where that library is.

It should not be an absolute path. Other developers don’t have the same path structure as you. They probably have a different username, for a start, so /Users/leeg/Library/Frameworks doesn’t exist for them. They may keep your source in a disk image. If you’re using continuous integration, the source probably gets checked out into a temporary workspace that moves every time. The key thing is don’t rely on the environment being the same for every build; rely on the checked-out source being sufficient to completely describe the build.
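
In an xcconfig file (or the target’s build settings) that might look like the following sketch, assuming the library was checked in under a Vendor folder next to the project:

// hypothetical layout: headers and binary committed under Vendor/
HEADER_SEARCH_PATHS = $(SRCROOT)/Vendor/include
LIBRARY_SEARCH_PATHS = $(SRCROOT)/Vendor/lib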

Interface Builder plug-ins

More common in the Mac world than iOS, IB plugins provide access to custom objects in Interface Builder and allow you to add them to your XIBs, inspect their properties and so on. If Xcode can’t find an ibplugin needed to compile a XIB, then builds start failing.

Again, check the IB plugin (or the source needed to build it) into version control. If you commit the source, don’t forget to make it a dependency of your app target so it definitely gets built when people try to build your app.

You don’t need an IB plug-in installed to build an Xcode project that relies on it! In your app’s target editor, you can set the paths to the required plug-ins or the search paths for ibtool to find plug-ins when it needs them. Remember to make these paths relative.

A developer trying to edit your XIBs will need to install the plug-in, but now he knows where to find it: he takes the version that was checked out when he checked out your app.

That all sounds hard to set up.

Not really. Create a new user account on your Mac, check out your app, and build it. Fix the problems until there are none. If you can, find a Mac you’ve never coded on before and repeat the same process.

Conclusion

You can avoid wasting a lot of time, and having failed or incorrect builds, by instituting a simple, repeatable build process. There should be exactly one thing to check out of version control, and important versions of that thing should be tagged. Opening the Xcode project that was in version control and hitting ‘build’ should be all that’s needed to get your product built, no matter how complicated the project and how many libraries it depends on. Hitting that ‘build’ button on one version of the source should always build the same product, on anybody’s computer.

On squeezing out that last ounce of performance

As I get confused by a component of an application that should be network-bound actually being limited by CPU availability, I get reminded of the times in my career that I’ve dealt with application performance.

I used to work on a platform for distributing MMS and SMS messages, written using GNUstep, Linux and PostgreSQL. I had a duplicate of the production hardware stack sitting in a data centre in our office, which ran its own copy of the production software and even had a copy of the production data. I could use this to run simulations or even replays of real events, finding the locations of the slowdowns and trying various hypotheses to remove the bottlenecks.

My next job was working on antivirus. The world of antivirus evaluations is dominated by independent testers, who produce huge bake-off articles comparing the various products. For at least a decade the business of actually detecting the stuff has been routine, so awards like VB100 are meaningless. In addition to detection stats, analysts like VB and AV-Comparatives measure resource consumption, and readers take those measurements seriously.

That’s because they don’t want to use anti-virus, and they didn’t pay for their RAM and CPUs to waste it on software they don’t want to use. So given a bunch of apps that all do the same thing, they’ll look at which does it with less impact on everything else. This means that performance is an important requirement of new projects in AV software: on the product I worked on we had a defined set of performance tests. A new project release could not ship, regardless of how shiny the new features were, if the tests took 5% or more extra time or RAM than the current shipping version, on like hardware. What that really means is that, due to developments in hardware, AV software was getting monotonically faster up until a few years ago.

Since then, my relationship with performance optimisation has been more sporadic. I’ve worked on contracts to speed up iOS apps, and even almost took a performance analysis and improvement job on a mobile phone operating system team. But what I usually do is make software work (and make it secure), with making it work in such a way that people can actually use it being a part of that. Here, then, are my Reflections On Making Efficient Software™.

Start at the beginning.

You may have heard the joke about a man on a driving holiday who gets lost and asks a local for directions. The local thinks for a bit, and says “well to get to where you’re going, I wouldn’t start from here if I were you”.

Performance analysis can be like that, too. If you build up all of the functionality first, and optimise it later, you will almost certainly not get a well-performing product. Furthermore, fixing it will be very expensive. You may be able to squeeze a few kilobytes out here, or get a couple of percent speed increase there, but basically the resources used by your app will not change much.

The reason is simple: most of the performance characteristics of your app are baked into the top-level architecture. You need to start thinking about how your app will perform when you start to design what the various parts do and how they fit together. If you don’t design out the architectural bottlenecks, you’ll be stuck with them no matter how good your implementation is.

I’ve been involved with projects like this. It gets to near the ship date, and the app works but is useless because it consumes all of the RAM and/or takes too long to do any work. The project manager gets a developer in (hello!) to address the performance issues. After a few weeks, the developer has managed to understand the code, do some analysis, and improved things by a couple of percent. It’s gone from “sucky” to “quite sucky”, and the ship date isn’t any further away.

An example: If you build a component that can process, say, ten objects per second, then hands them on to another component that can display results at 100 objects per second, you’re always limited to 10 Hz give or take. You might get 12, you’ll never get 100 without replacing the first component or the whole app. Both of these options are more expensive after you’ve written the component than before. Which leads me on to the next top tip…

Simulate, simulate, simulate

So how are you supposed to know how fast your putative architecture is going to run on paper? That’s easy: simulate each component. If you believe that you’ll usually get data from the network at, say, 100 objects/sec, then write a driver that sends fake objects at about 100 Hz. If you think that might spike at 10,000 objects/sec, then simulate that spike too. You’ll be able to see what it takes to develop an app that can respond to those demands.

What’s more, you’ll be able to drop your real components into the simulated environment, and see how they really handle the situations you cook up. You can even use these harnesses as an integration test framework at the intermediate level (i.e. larger than classes, smaller than the whole app).

Your simulated components should use the same interfaces to the filesystem and each other that the real code will use, and the same frameworks or libraries. But they shouldn’t do any real work. E.g. if you’ve got a component that should read a JSON stream from the network, break it into objects, do around 10ms of work on each object and post a notification after each one is finished, you can write a simulation to do that using the JSON library you plan on deploying, the sleep() function and NSNotificationCenter. Then you can play around with its innards, trying out operation queues, dispatch queues, caching and other techniques to see how the system responds.
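
A sketch of such a stand-in, using usleep() (from <unistd.h>) in place of the 10ms of real work; the notification name is a placeholder:

- (void)simulateProcessingObjects: (NSArray *)objects {
    for (id object in objects) {
        usleep(10000); // roughly 10ms, standing in for the real per-object work
        [[NSNotificationCenter defaultCenter] postNotificationName: @"ObjectProcessedNotification"
                                                            object: object];
    }
}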

Performance isn’t all about threads

Yes, Apple has a good concurrency programming guide. Yes, dispatch queues are new and cool. But no, not everything is sped up by addition of threads. I’m not going to do the usual meaningless micrometrics of adding ten million objects to a set, because no-one ever does that in real code.

The point is that doing stuff in the background is great for exactly one thing: getting that stuff off of the UI thread. For any other putative benefits, you need to measure. Perhaps threading will speed it up. Perhaps there aren’t any good concurrent algorithms. Perhaps the scheduling overhead will get in the way.

And of course you need to measure performance in an approximation to the customer’s environment. Your 12-core Mac Pro probably runs your multithreaded code in a different way than your user’s MacBook Air (especially if the Pro has a spinning disk). And the iPad is nothing like the simulator, of course.

Speed, memory, work: choose any two

You can make it faster and use less RAM by not doing as much. You can make it faster and do the same amount of work by caching. You can use less RAM and do the same amount of work by reducing the working set. Only very broken code can gain advances in speed and memory use while not changing the outcome.

Measure early and measure often

As I said at the top, there’s no point baking the app and then sprinkling with performance fairy dust at the end. You need to know what you’re aiming for, whether it’s achievable, and how it’s going throughout development.

You must have some idea of what constitutes acceptable performance, so devise tests that discover whether the app is meeting those performance requirements. Run these tests periodically throughout development. If you find that a recent change slowed things down, or caused too much memory to be used, now is a good time to fix that. This is where that simulation comes in useful, so you can get an idea about the system’s overall performance even before you’ve written it all.

On Timeless Programming Books

Recently, the Dog Spanner wrote about Programming With Quartz, a book written at the tail end of 2005 but which is still useful to Mac developers everywhere. I have to agree, this book is still on my shelf and gets an airing every now and then when I need to do battle with custom drawing (even on iOS).

Today, I had a related moment of epiphany concerning the immortal nature of a book, as a problem with memory allocation saw me reaching for Mac OS X Internals: A Systems Approach by Amit Singh. This book was released just before Apple’s transition to Intel hardware, but is still for me a definitive reference on how Mac OS X works. If I need to know about the innards of HFS+, the memory allocator or anything at a similar level, it is to this book I turn.

There’s never really been anything that investigates the higher level components of Mac OS X in quite the same depth. When I need to refresh my knowledge of the UNIX APIs, I turn first to Advanced UNIX Programming, a book first published in 1985(!); but it is the 2004 edition that is still a canonical description of programming in the UNIX environment.

Notice that each of these books is around five years old, and yet still I find myself referring to them frequently. In the fast-changing world of software development, where books on the iPhone 3 SDK are woefully outdated, that’s a great achievement.

Honourable mention: a book I have on my shelf that deserves discussion here is Object Oriented Programming: An Evolutionary Approach by Brad Cox. Written in 1986, this lays out his vision for component-based software development using object oriented programming languages. In it he describes a little language he created called Objective-C and uses it to explain his vision. This book demonstrates that not all computing principles are long-lived. Today’s programming practices look nothing like the “Software ICs” of OOP, although we’re still stuck using Objective-C.

On the broken(?) Mac App Store

A day after the Mac App Store was launched, people are reporting that it has been cracked. There are two separate stories here, a vapourware circumvention of the FairPlay DRM used to generate the receipts and a report that certain apps aren’t validating the receipts properly. We can ignore the first case for the moment: it’s important, and if it’s true then Apple needs to fix it (and co-ordinate updating the validation code with us third-party developers). But for the moment, it’s more important that developers are implementing the protections that are in place in their applications – it’s those applications that are supposed to be protected.

Let’s skip, for the moment, the question of whether DRM or anti-cracking mechanisms are ethically right, worthwhile, or how much effort you want to put into them. Apple have done most of the legwork, in providing a vendor-signed receipt that’s part of your signed app bundle. What you need to do is:

  • Check whether you have a receipt
  • Check whether Apple signed the receipt you have
  • Check whether the receipt is valid for your product
  • Check whether the receipt is valid for this version of your product
  • Check whether the receipt is valid for this computer
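
To give a flavour of the first check only, here’s a minimal sketch; everything interesting (and everything a cracker attacks) lives in the checks it omits. Exiting with status 173 asks the store for a fresh receipt:

NSString *receiptPath = [[[NSBundle mainBundle] bundlePath]
    stringByAppendingPathComponent: @"Contents/_MASReceipt/receipt"];
if (![[NSFileManager defaultManager] fileExistsAtPath: receiptPath]) {
    exit(173); // no receipt: let the store supply one, then relaunch
}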

That’s it in a nutshell. Of course, some nutshells surround very big and complex nuts, and that’s true in this case:

  • There’s some good example code for receipt validation at github/roddi/ValidateStoreReceipt, if you’re going to use it then don’t just paste it wholesale. If everyone uses the same code then it’s super-easy for someone to detect and strip that code from each instance.
  • Check at runtime, as well as at startup. If you just check at startup, then all an attacker needs to do is patch main() to jump straight into NSApplicationMain() and your app runs for free.
  • Code obfuscation is not a very effective tool. Having worked in anti-virus, I know it’s much easier to classify code based on what it does than what it is, and it’s quite easy to find the code that opens the receipt file, or calls exit(173). That said, some of the commercial obfuscation companies offer a guaranteed service, so you can still protect your revenue after the app gets cracked.
  • Update: I have been advised privately and seen in a blog post that people are recommending hard-coding their app bundle IDs and version numbers into the binary rather than using Info.plist, because that file can be edited. Well, so can the app binary…and in either case you’d need to re-sign the product with a valid certificate to continue, because Apple have used the kill flag:
    heimdall:~ leeg$ codesign -dvvvv /Applications/Twitter.app/
    Executable=/Applications/Twitter.app/Contents/MacOS/Twitter
    Identifier=com.twitter.twitter-mac
    Format=bundle with Mach-O universal (i386 x86_64)
    CodeDirectory v=20100 size=12452 flags=0x200(kill) hashes=616+3 location=embedded
    CDHash=8e0736639d79a108a5a1ebe89f928d1da0d49d94
    Signature size=4169
    Authority=Apple Mac OS Application Signing
    Authority=Apple Worldwide Developer Relations Certification Authority
    Authority=Apple Root CA
    Info.plist entries=21
    Sealed Resources rules=4 files=78
    Internal requirements count=2 size=344
    

    Changing a hard-coded string in a binary file is not difficult. You can of course obfuscate the string, but the motivated cracker still finds the point where the comparison is made (particularly easily if you use NSStrings). Really, how far you want to go depends on how much you’re willing to spend.

Of course, Fuzzy Aliens Ltd has already been implementing receipt validation for customers, so if this is too hard for you or you don’t have the time… ;-)

A last word on publicising receipt-validation vulnerabilities

You and I both make our living by selling software, or by selling services to people who sell software. Crowing on the interwebs about how this application or that application doesn’t validate its receipts properly is not cool, because you are shitting on your own doorstep. There is no public benefit to public disclosure of that class of vulnerability, because DRM is not a user security feature. Don’t do that. Send the developer a private message explaining your findings. Give them a chance to put extra effort into protecting their product, if that’s what they want to do.

Protecting source code

As I mentioned on the missing iDeveloper.tv Live episode, one of the consequences of the Gawker hack was that their source code for their internal software was leaked into the Internet. I doubt any of my readers would want that to happen to their code, so I’m going to share the details of how I protect my clients’ code when I’m working. Maybe some of this will work for you.

In the office, I work at a desktop iMac. This has an external Time Machine backup disk and a Dropbox for off-site storage. However, client code does _not_ go into Dropbox. Instead I keep a separate, encrypted sparse disk image for each project I’m working on. The password for each is different. As well as protecting against snooping, this helps stop cross-contamination. I rarely have two such images mounted at once. Note that it’s not just source that goes into these images: build products, notes, Instruments traces, and images all go into the encrypted containers.
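
Creating one of those images is a one-liner; the name, size and filesystem here are placeholders, and hdiutil prompts for the image’s passphrase:

hdiutil create -encryption AES-256 -type SPARSE -fs HFS+J \
        -volname ClientProject -size 4g ClientProject.sparseimage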

Obviously that means a lot of passwords, and no, I can’t remember them all. I use a keychain. It locks automatically when not in use, and has a passphrase that’s different from my login passphrase.

The devices I test on are all encrypted where available (if a client needs me to test on an iPhone 3G, then I can, but it isn’t encrypted). They are passphrase locked, set to require passphrase immediately. And I NEVER take them away from the desk before deleting any developer builds, unless I need to do something special like a real-world location services test.

I rarely do coding work on the laptop, but when I do I copy the appropriate encrypted image onto it. The laptop additionally has FileVault configured, though I’m evaluating full-disk encryption options. Keychain configuration as above, additionally with a password required on wake from sleep or screensaver, and a firmware password.

For pushing work back to the clients, most clients use github or bitbucket which offer SSL-encrypted connections to the repositories. Personally, I have a self-run repo host available over HTTPS or SSH, but will probably move that to a github-like service because life’s too short. Their security policy seems acceptable to me.