More Excel-lent Adventures

I previously wrote about Excel as the most successful IDE:

Now what makes a spreadsheet better as a development environment is difficult to say; I’m unaware of anyone having researched it.

That research is indeed extant, and the story is well-told in A Small Matter of Programming. While Professor Nardi’s focus is on end-user programming, reading her book raises questions about gaps in professional programmer tools.

Specifically, there are gaps in the realm of collaboration. Programming tools support only a few limited forms of collaboration:

  • individual work
  • a team of independent individuals working on a shared project
  • pair programmers
  • code review

For everything else, there’s a whiteboard. Why? What’s missing?

Contractually-obligated testing

About a billion years ago, Bertrand Meyer (he of Open-Closed Principle fame) introduced a programming language called Eiffel. It had a feature called Design by Contract, which let you define constraints that your program had to adhere to during execution. It’s a bit like convincing C compilers to emit checks for rules like integer underflow everywhere in your code, except that you write your own rules.

To see what that’s like, let’s try a little Objective-C (I suppose I could use Eiffel, as EiffelStudio is in Homebrew, but I didn’t). Here’s my untested, un-contractual Objective-C Stack class.

@interface Stack : NSObject

- (void)push:(id)object;
- (id)pop;

@property (nonatomic, readonly) NSInteger count;

@end

static const int kMaximumStackSize = 4;

@implementation Stack
{
    __strong id buffer[4];
    NSInteger _count;
}

- (void)push:(id)object
{
    buffer[_count++] = object;
}

- (id)pop
{
    id object = buffer[--_count];
    buffer[_count] = nil;
    return object;
}

@end

Seems pretty legit. But now I’ll write out the contract: the rules to which this class will adhere, provided its users do too. Firstly, an invariant: the count will never go below 0 or above the maximum number of objects. Objective-C doesn’t actually have any syntax for this, unlike Eiffel, so it looks just a little bit messy.

@interface Stack : ContractObject

- (void)push:(id)object;
- (id)pop;

@property (nonatomic, readonly) NSInteger count;

@end

static const int kMaximumStackSize = 4;

@implementation Stack
{
    __strong id buffer[4];
    NSInteger _count;
}

- (NSDictionary *)contract
{
    NSPredicate *countBoundaries = [NSPredicate predicateWithFormat: @"count BETWEEN %@",
                                    @[@0, @(kMaximumStackSize)]];
    NSMutableDictionary *contract = [@{@"invariant" : countBoundaries} mutableCopy];
    [contract addEntriesFromDictionary:[super contract]];
    return contract;
}

- (void)in_push:(id)object
{
    buffer[_count++] = object;
}

- (id)in_pop
{
    id object = buffer[--_count];
    buffer[_count] = nil;
    return object;
}

@end

I said the count must never go outside of this range. In fact, the invariant need only hold before and after calls to public methods: it’s allowed to be broken during execution. If you’re wondering how this interacts with threading: confine ALL the things! Anyway, let’s see whether the contract is adhered to.

int main(int argc, char *argv[]) {
    @autoreleasepool {
        Stack *stack = [Stack new];
        for (int i = 0; i < 10; i++) {
            [stack push:@(i)];
            NSLog(@"stack size: %ld", (long)[stack count]);
        }
    }
}

2014-08-11 22:41:48.074 ContractStack[2295:507] stack size: 1
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 2
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 3
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 4
2014-08-11 22:41:48.076 ContractStack[2295:507] *** Assertion failure in -[Stack forwardInvocation:], ContractStack.m:40
2014-08-11 22:41:48.077 ContractStack[2295:507] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException',
 reason: 'invariant count BETWEEN {0, 4} violated after call to push:'

Erm, oops. OK, this looks pretty useful. I’ll add another clause: the caller isn’t allowed to call -pop unless there are objects on the stack.

- (NSDictionary *)contract
{
    NSPredicate *countBoundaries = [NSPredicate predicateWithFormat: @"count BETWEEN %@",
                                    @[@0, @(kMaximumStackSize)]];
    NSPredicate *containsObjects = [NSPredicate predicateWithFormat: @"count > 0"];
    NSMutableDictionary *contract = [@{@"invariant" : countBoundaries,
             @"pre_pop" : containsObjects} mutableCopy];
    [contract addEntriesFromDictionary:[super contract]];
    return contract;
}

So I’m not allowed to hold it wrong in this way, either?

int main(int argc, char *argv[]) {
    @autoreleasepool {
        Stack *stack = [Stack new];
        id foo = [stack pop];
    }
}

2014-08-11 22:46:12.473 ContractStack[2386:507] *** Assertion failure in -[Stack forwardInvocation:], ContractStack.m:35
2014-08-11 22:46:12.475 ContractStack[2386:507] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException',
 reason: 'precondition count > 0 violated before call to pop'

No, good. Having a contract is a bit like having unit tests, except that the unit tests are always running whenever your object is being used. Try out Eiffel; it’s pleasant to have real syntax for this, though really the Objective-C version isn’t so bad.

Finally, the contract is implemented by some simple message interception (try doing that in your favourite modern programming language, non-Rubyists!).

@interface ContractObject : NSObject
- (NSDictionary *)contract;
@end

static SEL internalSelector(SEL aSelector);

@implementation ContractObject

- (NSDictionary *)contract { return @{}; }

- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector
{
    NSMethodSignature *sig = [super methodSignatureForSelector:aSelector];
    if (!sig) {
        sig = [super methodSignatureForSelector:internalSelector(aSelector)];
    }
    return sig;
}

- (void)forwardInvocation:(NSInvocation *)inv
{
    SEL realSelector = internalSelector([inv selector]);
    if ([self respondsToSelector:realSelector]) {
        NSDictionary *contract = [self contract];
        NSPredicate *alwaysTrue = [NSPredicate predicateWithValue:YES];
        NSString *calledSelectorName = NSStringFromSelector([inv selector]);
        inv.selector = realSelector;
        NSPredicate *invariant = contract[@"invariant"]?:alwaysTrue;
        NSAssert([invariant evaluateWithObject:self],
            @"invariant %@ violated before call to %@", invariant, calledSelectorName);
        NSString *preconditionKey = [@"pre_" stringByAppendingString:calledSelectorName];
        NSPredicate *precondition = contract[preconditionKey]?:alwaysTrue;
        NSAssert([precondition evaluateWithObject:self],
            @"precondition %@ violated before call to %@", precondition, calledSelectorName);
        [inv invoke];
        NSString *postconditionKey = [@"post_" stringByAppendingString:calledSelectorName];
        NSPredicate *postcondition = contract[postconditionKey]?:alwaysTrue;
        NSAssert([postcondition evaluateWithObject:self],
            @"postcondition %@ violated after call to %@", postcondition, calledSelectorName);
        NSAssert([invariant evaluateWithObject:self],
            @"invariant %@ violated after call to %@", invariant, calledSelectorName);
    }
}

@end

SEL internalSelector(SEL aSelector)
{
    return NSSelectorFromString([@"in_" stringByAppendingString:NSStringFromSelector(aSelector)]);
}

On a re-read you realise this isn’t really about Swift

It’s a bit early to have formed an opinion on a recently-announced programming language, but as the requisite number of people have asked what mine is (i.e. at least zero) I thought I’d type and see what happens.

Rules in programming tend to be bullshit. This is about one-third of a talk I’m giving later in the year, so I’ll leave that train of thought alone in case anyone’s going to be in the audience.

Anyway, knowing this, we can observe the exceptions to any rule people tend to throw at us about programming languages. For example: “Static types for good engineering, dynamic types for exploration”. Make sure you’re using your types well, and notice how many engineering practices come from programmers who know dynamic languages.

We could add “you can’t do good tooling on dynamic languages”. O RLY.

Having thus realised that the rules are nonsense, an expert would actually sit down and think about what they wanted to get out of a language in the contexts in which they thought they’d use it. When I did that, I decided on something that goes in entirely the opposite direction to Swift. Starting with ObjC, I’m more likely to end up with Self than with Swift (but then I’ve already told you that I think Sun Microsystems did awesome shit, so that may be no surprise). It probably just means that I’m not an expert, though.

If you’re going to use types, you may as well go all-in on using tuples. You’ll probably see plenty of definitions of a tuple, so let me confuse things by adding another:

A tuple is an element drawn from the Cartesian product of its member types.

What this means is that, unlike boring old-fashioned ordered collections, there’s a known number of things in a tuple and each is of a known type. This is useful for the vexing question of signalling errors, as you can just return something like a (Result?, Error?), as an industrialised society ought to be doing. It also means that almost every situation in which a method’s interface accepts or returns a collection can be replaced with one in which it accepts a known number of things of known type.
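Objective-C, this blog’s usual language, doesn’t have tuples, but the “known number of things of known type” point can be sketched with a plain C struct. The RGBColor type and Pen class below are made up for illustration, not taken from any real API:

// Instead of a method that takes an NSArray whose length and contents the
// compiler cannot check...
// - (void)setStrokeComponents:(NSArray *)numbers;
// ...pass a known number of members of known type:
typedef struct {
    double red;
    double green;
    double blue;
} RGBColor;

@interface Pen : NSObject
- (void)setStrokeColor:(RGBColor)color;
@end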

But anyway, for the most part our programming languages allow us to accidentally introduce problems that we shouldn’t need to solve, but the biggest problems we actually have to solve lie elsewhere.

I use mocks and I’m happy with that

Both Kent Beck and Martin Fowler have said that they don’t use mock objects in their test-driven development. I do. I use them mostly for the sense described first in my BNR blog post on Mock Objects, namely to stand in for a thing that can receive messages I want to send, but that does not yet exist.

If you look at the code in Test-Driven iOS Development, you’ll find that it uses plenty of test doubles but none of them is a mock object. What has changed in my worldview to move from not-mocking to mocking in that time?

The key information that gave me the insight was this message (pardon the pun!) from Alan Kay on object-oriented programming:

The big idea is “messaging” – that is what the kernal[sic] of Smalltalk/Squeak is all about (and it’s something that was never quite completed in our Xerox PARC phase). The Japanese have a small word – ma – for “that which is in between” – perhaps the nearest English equivalent is “interstitial”. The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

What I’m really trying to do is to define the network of objects connected by message sending, but the tool I have makes me think about objects and what they’re doing. To me, mock objects are the ability to subvert the tool, and force it to let me focus on the ma.
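As a concrete (and entirely hypothetical) illustration of that first sense, standing in for a collaborator that can receive messages but does not yet exist, here is a minimal hand-rolled stand-in. The PaymentProcessor protocol and Checkout class are invented for this sketch; they do not come from the book or the BNR post:

@protocol PaymentProcessor <NSObject>
- (void)chargeAmount:(NSDecimalNumber *)amount;
@end

// The mock records the message it receives, so a test can assert on the
// conversation between objects before any real PaymentProcessor exists.
@interface MockPaymentProcessor : NSObject <PaymentProcessor>
@property (nonatomic, strong) NSDecimalNumber *chargedAmount;
@end

@implementation MockPaymentProcessor
- (void)chargeAmount:(NSDecimalNumber *)amount
{
    self.chargedAmount = amount;
}
@end

// In a test, the conversation is the thing being asserted on:
//   MockPaymentProcessor *processor = [MockPaymentProcessor new];
//   Checkout *checkout = [[Checkout alloc] initWithPaymentProcessor:processor];
//   [checkout completePurchase];
//   XCTAssertEqualObjects(processor.chargedAmount,
//                         [NSDecimalNumber decimalNumberWithString:@"9.99"]);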

Code longevity

I recently wrote about the impending centenary of applied computing; a time when we could reflect on the first hundred years to make it easier for people to progress beyond our position into the second hundred years. This necessitates looking at the things we’ve tried, the things that succeeded and the things that failed. It involves recalling and describing the good ideas and the bad ideas.

So, did the bad ideas fail and the good ideas succeed? Can we declare that because something worked, it must have been a success? Is length of service a great proxy for quality of principle?

Let’s start by looking at the lifetime of some of the trappings of applied computing. I’m writing this on the smartphone shown in the picture below. It is, among the many computers I own that claim to be computers and could reasonably be described as modern, one of only two that is not running a recent variant of a minicomputer game–loading system.

Surface RT and Lumia 925

Now is that a fair assessment? Certainly all the Macs, iOSes, Androids (and even routers and television streamy box things) in the house are based on Unix, and Unix is the thing of the 1970s minicomputer. I’ve even used that idea to explain why we still have to deal with PDP-8 problems in iPhones. But is it fair to assume that because the name has lasted, then the idea has been preserved? Did Unix succeed, or has it been replaced by different things with the same name? That happens a lot; is today’s ethernet really the same ethernet that Bob Metcalfe and colleagues at PARC invented? Conversely, just because the name changed is everything new? Does Windows NT really represent a clean break in 1993?

There’s certainly some core, a kernel (f’nar) of the modern Unix that, whether in code or philosophy, can be traced back to the original system (and indeed beyond). But is that there because it’s still a good idea, or because there’s no impetus to remove it? Or even because it’s a bad idea, but removing it would be expensive?

As we’re already talking about Unix, let’s talk about C. In his talk Null References: The Billion-Dollar Mistake, Tony Hoare describes his own mistake as being the introduction of a null reference. He then says that C’s mistake (C follows Algol in having null references, but it also lacks subscript bounds checking) is an order of magnitude worse. In fact, Hoare also identified a third problem: he says that it’s a good idea to permit a program failure to be diagnosed just from the error message and the high-level program source text. However, runtime failures in C usually end up with a core dump and/or a stack trace through the instructions of the target machine environment.

We can easily wonder just how much (expensive) programmer time has been lost disassembling stack traces, matching up debugger symbols and interpreting core dumps, but without figures for that I’ll generously assume that it’s an order of magnitude smaller than the losses due to buffer overflows. Now that’s only a tens-of-billions-of-dollars value of mistake, and C is the substrate for trillions of dollars of value of industry. So do we say that on balance, C is 99% a Good Thing™? Is it a bad idea that nonetheless enabled plenty of good ones?

[Incidentally, and without wanting to derail the central thesis of this post, I disagree with Hoare’s numbers. Symantec is merely one of the largest companies in the information security sector, with annual revenue in their most recent report of $6.9B. That’s a small part of the total value sunk into that sector, which I’ll guess has an annual magnitude of multiple tens of billions. A large fraction of the problems addressed by infosec can be attributed to C’s lack of bounds checking, so there’s probably an annual spend of around ten billion dollars on fixing that problem. Assuming those businesses have sustainable revenues over multiple years, the integrated cost is well into the hundreds of billions. That only revises the estimated impact on the C software industry from ‘fractions of a per cent’ to ‘a per cent’ though.]

Perhaps it’s fair to say that C was a good idea when it arose, and that it’s since been found to have deficiencies that haven’t yet become expensive enough to warrant decommissioning it. There’s an assumption of rational action in there that I think it’s fair to question, though: am I assuming that C is not worth replacing just because it has not been replaced? Might there actually be other factors involved?

Yes, there might. It’s possible that there are organisations out there for whom C is more expensive than its worth, but where the sunk cost fallacy stops them from moving on. Or organisations who stick with C because their platform vendor gives them a C toolset, even where free or paid alternatives would be cheaper [in fact that would point to a difficulty with any holistic evaluation: that the cost to the people who provide development environments and the cost to the people who consume development environments depends on different factors, and the power in the market is biased towards a few large providers. Welcome to economics]. Or organisations who stick with C because of a perception of a large community of users, which is (perceived to be) more useful than striking out alone with better tools.

It’s also possible that moves in the other direction are based on non-rational factors: organisations that seek novelty rather than improvement, or who move away from C because a vendor convinces them that their alternative is better regardless of objective truth.

It turns out that the simple question we wanted to ask about applied computing: “What works?” leads to such a complex and maybe even chaotic system of forces acting in multiple dimensions that answering it will be very difficult. This doesn’t mean that an answer should not be sought, but that finding the answer will combine expertise from many different fields. Particularly, something that survives for a long time doesn’t necessarily work: it could just be that people are afraid of the alternatives, or haven’t really considered them.

How much programming language is enough?

Many programmers have opinions on programming languages. Maybe, if I present an opinion on programming languages, I can pass myself off as a programmer.

An old debate in psychology and anthropology is that of nature vs nurture, the discussion over which characteristics of humans and their personalities are innate and which are learned or otherwise transferred.

We can imagine two extremists in this debate turning their attention to programming languages. On the one hand, you might imagine that if the ability to write a computer program is somehow innate, then there is a way of expressing programming concepts that is closely attuned to that innate representation. Find this expression, and everyone will be able to program as fast as they can think. Although there’ll still be arguments over bracket placement, and Dijkstra will still tell you it’s rubbish.

On the other hand, you might imagine that the mind is a blank slate, onto which can be writ any one (or more?) of diverse patterns. Then the way in which you will best express a computer program is dependent on all of your experiences and interactions, with the idea of a “best” way therefore being highly situated.

We will leave this debate behind. It seems that programming shares some brain machinery with learning other languages, and when it comes to deciding whether language is innate or learned we’re still on shaky ground. It seems unlikely on ethical grounds that Nim Chimpsky will ever be joined by Charles Babboonage, anyway.

So, having decided that there’s still an open question, there must exist somewhere into which I can insert my uninvited opinion. I had recently been thinking that a lot of the ceremony and complexity surrounding much of modern programming has little to do with it being difficult to represent a problem to a computer, and everything to do with there being unnecessary baggage in the tools and languages themselves. That is to say that contrary to Fred Brooks’s opinion, we are overwhelmed with Incidental Complexity in our art. That the mark of expertise in programming is being able to put up with all the nonsense programming makes you do.

From this premise, it seems clear that less complex programming languages are desirable. I therefore look admiringly at tools like Self, Io and Scheme, which all strive for a minimum number of distinct parts.

However, Clemens Szyperski from Microsoft puts forward a different argument in this talk. He works on the most successful development environment. In the talk, Szyperski suggests that experienced programmers make use of, and seek out, more features in a programming language to express ideas concisely, using different features for different tasks. Beginners, on the other hand, benefit from simpler languages where there is less to impede progress. So, what now? Does the “less is more” principle only apply to novice programmers?

Maybe the experienced programmers Szyperski identified are not experts. There’s an idea that many programmers are expert beginners, which would seem to fit Szyperski’s model. The beginner is characterised by a microscopic, non-holistic view of their work. They are able to memorise and apply heuristic rules that help them to make progress.

The expert beginner is someone who has simply learned more rules. To the expert beginner, there is a greater number of heuristics to choose from. You can imagine that if each rule is associated with a different piece of programming language grammar, then it’d be easier to remember the (supposed) causality behind “this situation calls for that language feature”.

That leaves us with some interesting open questions. What would a programming tool suitable for experts (or the proficient) look like? Do we have any? Alan Kay is fond of saying that we’re stuck with novice-friendly user experiences that don’t permit learning or acquiring expertise:

There is the desire of a consumer society to have no learning curves. This tends to result in very dumbed-down products that are easy to get started on, but are generally worthless and/or debilitating. We can contrast this with technologies that do have learning curves, but pay off well and allow users to become experts (for example, musical instruments, writing, bicycles, etc. and to a lesser extent automobiles).

Perhaps, while you could never argue that common programming languages don’t have learning curves, they are still “generally worthless and/or debilitating”. Perhaps it’s true that expertise at programming means expertise at jumping through the hoops presented by the programming language, not expertise at telling a computer how to solve a problem in the real world.

ClassBrowser: warts and all

I previously gave a sneak peek of ClassBrowser, a dynamic execution environment for Objective-C. It’s not anything like ready for general use (in fact it can’t really do ObjC very well at all), but it’s at the point where you can kick the tyres and contribute pull requests. Here’s what you need to know:

Have a lot of fun!

ClassBrowser is distributed under the terms of the University of Illinois/NCSA licence (because it is based partially on code distributed with clang, which is itself under that licence).

A sneaky preview of ClassBrowser

Let me start with a few admissions. Firstly, I have been computering for a good long time now, and I still don’t really understand compilers. Secondly, work on my GNUstep Web side-project has tailed off for a while, because I decided I wanted to try something out to learn about the compiler before carrying on with that work. This post will mostly be about that something.

My final admission: if you saw my presentation on the ObjC runtime you will have seen an app called “ClassBrowser” where I showed that some of the Foundation classes are really in the CoreFoundation library. Well, there were two halves to the ClassBrowser window, and I only showed you the top half that looked like the Smalltalk class browser. I’m sorry.

So what’s the bottom half?

This is what the bottom half gives me. It lets me go from this:

ClassBrowser before

via this:

Who are you calling a doIt?

to this:

ClassBrowser after

What just happened?

You just saw some C source being compiled to LLVM bitcode, which is then compiled just-in-time to native code and executed, all inside that browser app.

Why?

Well why not? Less facetiously:

  • I’m a fan of Smalltalk. I want to build a thing that’s sort of a Smalltalk, except that rather than being the Smalltalk language on the Objective-C runtime (like F-Script or objective-smalltalk), it’ll be the Objective-C(++) language on the Objective-C runtime. So really, a different way of writing Objective-C.
  • I want to know more about how clang and LLVM work, and this is as good a way as any.
  • I think, when it actually gets off the ground, this will be a faster way of writing test-first Objective-C than anything Xcode can do. I like Xcode 5, I think it’s the better Xcode 4 that I always wanted, but there are gains to be had by going in a completely different direction. I just think that whoever strikes out in such a direction should not release their research project as a new version of Xcode :-).

Where can I get me a ClassBrowser?

You can’t, yet. There’s some necessary housekeeping that needs to be done before a first release, replacing some “research” hacks with good old-fashioned tested code and ensuring GNUstep-GUI compatibility. Once that’s done, a rough-and-nearly-ready first open source release will ensue.

Then there’s more work to be done before it’s anything like useful. In particular, while it’s possible to use it to run Objective-C, it’s far from pleasant. I’ve had some great advice from LLVM IRC on how to address that and will try to get it to happen soon.

By your _cmd

This post is a write-up of a talk I gave at Alt Tech Talks: London on the Objective-C runtime. Seriously though, you should’ve been there.

The Objective-C runtime?

That’s the name of the library of C functions that implement the nuts and bolts of Objective-C. Objects could just be represented as C structures, and methods could just be implemented as C functions. In fact they sort of are, but with some extra capabilities. These structures and functions are wrapped in this collection of runtime functions that allows Objective-C programs to create, inspect and modify classes, objects and methods on the fly.
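As a quick taste of the inspection side, here is a small sketch (not from the talk) that asks the runtime which instance methods a class defines directly:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

int main(void)
{
    @autoreleasepool {
        unsigned int count = 0;
        // Only returns methods defined on NSString itself, not inherited ones.
        Method *methods = class_copyMethodList([NSString class], &count);
        for (unsigned int i = 0; i < count; i++) {
            NSLog(@"NSString implements %s", sel_getName(method_getName(methods[i])));
        }
        free(methods);
    }
    return 0;
}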

It’s the Objective-C runtime library that works out which methods get executed, too. The [object doSomething] syntax does not directly resolve a method and call it. Instead, a message is sent to the object (which gets called the receiver in this context). The runtime library gives objects the opportunity to look at the message and decide how to respond to it. Alan Kay has repeatedly said that message-passing is the important part of Smalltalk (from which Objective-C derives), not objects:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea.

The big idea is “messaging” – that is what the kernal[sic] of Smalltalk/Squeak is all about (and it’s something that was never quite completed in our Xerox PARC phase). The Japanese have a small word – ma – for “that which is in between” – perhaps the nearest English equivalent is “interstitial”. The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

Msg send

Indeed in one article describing the Smalltalk virtual machine the programming technique is called the message-passing or messaging paradigm. “Object-Oriented” is used to describe the memory management system.

Throughout this talk and post I’m talking about the ObjC runtime, but there are many. They all support an object’s introspective and message-receiving capabilities, but they have different features and work in different ways (for example, Apple’s runtime sends messages in one step, but the GNU runtime looks up messages and then invokes the discovered function in two steps). All of the discussion below relates to Apple’s most modern runtime library (the one that is delivered as part of OS X since 10.5 and iOS).
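To make the dispatch step concrete, here is a sketch of roughly what a message send lowers to on Apple’s runtime. On recent SDKs you have to cast objc_msgSend to the right function type yourself:

#import <Foundation/Foundation.h>
#import <objc/message.h>

int main(void)
{
    @autoreleasepool {
        NSString *greeting = @"hello";
        // What you write:
        NSUInteger length = [greeting length];
        // Approximately what the compiler emits: a call through objc_msgSend
        // with the receiver and selector as the first two arguments.
        NSUInteger sameLength = ((NSUInteger (*)(id, SEL))objc_msgSend)(greeting, @selector(length));
        NSLog(@"%lu == %lu", (unsigned long)length, (unsigned long)sameLength);
    }
    return 0;
}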

In the talk, I decided to examine a few specific areas of the runtime library’s behaviour. I looked for things that I wanted to understand better, and came up with questions I wanted to answer as part of the talk.

Dynamic class creation

Can I implement Key-Value Observing?

While I was preparing the talk, a post called KVO considered harmful started to get a lot of coverage. That post raises a lot of valid criticisms of the Key-Value Observing API, but rather than throw away the Observer pattern I wanted to explore a new implementation.

The observed (pardon the pun) behaviour of KVO is to privately subclass the observed object’s class, so that it can customise the object’s behaviour to call the KVO callback. That’s done through a function called objc_duplicateClass; unfortunately, the documentation tells us that we should not call this function ourselves.

It’s still possible to implement an Observer pattern that uses the same secret-subclass behaviour, by allocating and registering a “class pair”. What’s a class pair? Well each class in Objective-C is really two classes: the class object defines the instance methods, and the “metaclass” defines the class methods. So each class is really a singleton instance of its metaclass.

The ObserverPattern implementation shows how this works. When you add an observer to an object, the receiver first works out whether it’s an instance of the observable class. If it needs to create that class, it does so: adding our own implementations of -dealloc to clean up after ourselves, and -class so that, like KVO observable objects, the generated class name doesn’t appear when you ask an observed object its type.

Having created the class, the code goes on to add a setter for the conventional Key-Value Coding selector name for the property: this setter grabs the old and new values of the property and invokes the callback which was supplied as a block object. Because we can, the block is dispatched asynchronously.

Notice that the -addObserverForKey:withBlock: method uses object_setClass() to replace the receiver’s class with the newly-constructed class. The main effect of this is to change the way messages are resolved onto methods, but you need to be careful that the original and replaced class have the same instance variable layout too. Instance variables are looked up via the runtime too, and changing the class could alter where the runtime thinks the bytes are for any given variable.
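Condensed into a toy example (a sketch of the technique, not the actual ObserverPattern code), the dynamic-subclass dance looks something like this:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// An IMP for the override we'll add to the dynamic subclass.
static NSString *SecretSubclassDescription(id self, SEL _cmd)
{
    return @"I have been secretly subclassed";
}

int main(void)
{
    @autoreleasepool {
        NSObject *object = [NSObject new];
        // Allocate and register a class pair as a subclass of the object's class...
        Class observable = objc_allocateClassPair([object class], "ObservableNSObject", 0);
        class_addMethod(observable, @selector(description),
                        (IMP)SecretSubclassDescription, "@@:");
        objc_registerClassPair(observable);
        // ...then swap the object's class so messages resolve to the new methods.
        // (Real code must check whether the class already exists, and keep the
        // instance variable layout identical, as described above.)
        object_setClass(object, observable);
        NSLog(@"%@", object);
    }
    return 0;
}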

We have a little extra hurdle to overcome in storing the collection of observer tokens, because there’s nowhere to put them. Adding an instance variable to the ObserverPattern[…] class would not work, as instances of that class are never actually allocated. The objects involved have the instance variables of their initial class, which won’t include space for the observers.

The Objective-C runtime provides for this situation by giving us associated objects. Any object can have what is, conceptually, a dictionary of other objects maintained by the runtime. Observed objects can store and retrieve their observer tokens via associated references, and no extra instance variables are needed.
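A minimal sketch of that mechanism (again, not the ObserverPattern code itself); the key only needs to be an address that is unique to this use:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

static char kObserversKey;

int main(void)
{
    @autoreleasepool {
        NSObject *observed = [NSObject new];
        // Attach a mutable array to the object without declaring any ivar.
        objc_setAssociatedObject(observed, &kObserversKey,
                                 [NSMutableArray array],
                                 OBJC_ASSOCIATION_RETAIN_NONATOMIC);
        NSMutableArray *observers = objc_getAssociatedObject(observed, &kObserversKey);
        [observers addObject:@"an observer token"];
        NSLog(@"%@", objc_getAssociatedObject(observed, &kObserversKey));
    }
    return 0;
}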

A little problem in the ObserverPattern implementation will become clear if you run it enough times. The observation callbacks are sent asynchronously, and can be delivered out of sequence. That means the observer can’t actually tell what the final state of the observed key is, because the “new value” received in the callback might have already been replaced. I left this fun issue in to demonstrate that KVO’s synchronous implementation is a feature, not a bug.

Creating objects

What are those extra bytes for?

When you create an Objective-C object, the runtime lets you allocate some extra storage at the end of the space reserved for its instance variables. What’s the point of that? All you can do is get a pointer to the start of the space (using object_getIndexedIvars)…hmm, indexed ivars. Well, I suppose an array is a pretty obvious use of indexed ivars…

Let’s build NSArray! There are two things to see in SimpleArray: the most obvious is the use of the class cluster pattern. The reason is that the object returned from +alloc—where we’d normally allocate space for the object—cannot know how big it’s going to be. We need to use the arguments to -initWithObjects:count: to know how many objects there are in the array. So +alloc returns a placeholder, which is then able to allocate and return the real array object.
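Here is a stripped-down sketch of just the extra-bytes part, not the SimpleArray class cluster itself. Note that class_createInstance() is unavailable under ARC, so this fragment assumes manual reference counting:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

@interface TinyBuffer : NSObject
@end

@implementation TinyBuffer
@end

int main(void)
{
    @autoreleasepool {
        NSUInteger capacity = 4;
        // Ask the runtime for room for four object pointers after the ivars.
        TinyBuffer *buffer = [class_createInstance([TinyBuffer class],
                                                   capacity * sizeof(id)) init];
        // The extra bytes live immediately after the instance variables.
        id *slots = (id *)object_getIndexedIvars(buffer);
        slots[0] = @"stored in the extra bytes";
        NSLog(@"%@", slots[0]);
        [buffer release];
    }
    return 0;
}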

One obvious question to ask is why we’d do this at all. Why not just use calloc() to grab an appropriately-sized buffer in which to store the object pointers? The answer is to do with a low-level performance concern called locality of reference. We know from the design of the array class that pretty much every time the array pointer is used, the buffer pointer will be used too. Putting them next to each other in RAM means we don’t have to look off at some dereferenced pointer just to find another pointer.

Message dispatch

Just how does message forwarding work?

One of the powerful features of Objective-C is that an object doesn’t have to implement a method when it’s compiled to be able to respond to messages with that selector name. It can lazily resolve the methods, or it can forward them to another object, or it can raise an error, or it can do something else. But something about this feature was bugging me: message forwarding (which happens in the runtime) calls -forwardInvocation:, passing it an NSInvocation object. But NSInvocation is defined in Foundation: does the runtime library need to “know” about Foundation to work?

I tracked down what was going on and found that no, it does not need to know about Foundation. The runtime lets applications define the forwarding function that gets called when objc_msgSend() can’t find the implementation for a selector. On startup, CoreFoundation[+] injects the forwarding function that does -forwardInvocation:. So presumably my application can do its own thing, right?

Let’s build Ruby! OK, not all of Ruby. But Ruby has a #method_missing function that gets called when an object receives a message it doesn’t understand, which is much more similar to Smalltalk’s approach than to Objective-C’s. Using objc_setForwardHandler, it’s possible to implement methodMissing: in our Objective-C classes.
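Here is a hedged sketch of the idea. It takes the lazier route through Foundation’s documented forwarding path (-methodSignatureForSelector: and -forwardInvocation:) rather than installing a handler with objc_setForwardHandler, but the effect is the same: any unknown message lands in the methodMissing: handler.

#import <Foundation/Foundation.h>

@interface RubyishObject : NSObject
- (void)methodMissing:(SEL)aSelector;
@end

@implementation RubyishObject

- (void)methodMissing:(SEL)aSelector
{
    NSLog(@"undefined method '%@'", NSStringFromSelector(aSelector));
}

- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector
{
    NSMethodSignature *signature = [super methodSignatureForSelector:aSelector];
    if (!signature) {
        // Claim that any unknown selector takes no arguments and returns nothing.
        signature = [NSMethodSignature signatureWithObjCTypes:"v@:"];
    }
    return signature;
}

- (void)forwardInvocation:(NSInvocation *)invocation
{
    [self methodMissing:[invocation selector]];
}

@end

int main(void)
{
    @autoreleasepool {
        RubyishObject *object = [RubyishObject new];
        // ARC may warn about performSelector: with a dynamic selector; that's
        // fine for this demonstration.
        [object performSelector:NSSelectorFromString(@"someMessageNobodyWrote")];
    }
    return 0;
}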

Conclusion

The Objective-C runtime is a powerful way to add a lot of dynamic behaviour to an application for very little work. Some developers don’t use it much beyond swizzling methods for debugging, but it has facilities that make it a powerful tool for real application code too.


[+]CoreFoundation and Foundation are really siblings, and they each expose pieces of the other’s implementation, but one has a C interface and the other an Objective-C interface. Various Objective-C classes are actually part of CoreFoundation, including NSInvocation and the related NSMethodSignature class. NSObject is not in either of these libraries: it’s now defined in the runtime itself, so that the runtime’s memory management functions know about -retain, -release and so on[++]. On the other hand, most of the *behaviour* of NSObject is implemented by categories higher up. And, of course, this is all implementation detail and the locations of these classes could be (and are) moved between versions of the frameworks.

[++]Other languages like Smalltalk and Ruby have a simple base class that does nothing except know how to be an object, called ProtoObject or BaseObject. You could imagine the runtime supplying—and being coupled to—ProtoObject, and (Core)Foundation supplying NSObject and NSProxy as subclasses of ProtoObject.