Object-Oriented Programming in Objective-C

UIKonf 1995 Keynote : Object-Oriented Programming in Objective-C


Welcome to the keynote for UIKonf 1995. I’m really excited for what 1995 will bring. Customers are upgrading to last year’s OpenStep release, which means that we get to use the new APIs and the best platform around. And really, there are no competitors. OS/2 Warp doesn’t seem to be getting any more traction than previous versions, and indeed Microsoft seems to be competing against its own products with Windows. Their biggest release this year, the delayed Windows 93, looks like being a warmed-over version of MS-DOS and Windows 3, which certainly can’t be compared with a full Unix system like OpenStep. 1995 will go down in history as the year of NeXT on the desktop.

I want to talk about a crisis in software design, and that is the object-oriented crisis. Well, I suppose it isn’t really, it’s the procedural crisis again. But now we pretend that our procedural code is object-oriented, and Objective-C is the weapon that enacts this travesty.

What’s the main benefit of Objective-C over Smalltalk? It’s C. Rather than attempt to graft some foreign function interface onto Smalltalk and make us write adaptors for all of our C code, Brad Cox had the insight that he could write a simple message-sending library and a syntax preprocessor on the C language, and let us add all of the object-oriented programming we’d get from Smalltalk on top of our existing C code.

What’s the main drawback of Objective-C over Smalltalk? It’s C. Rather than being able to rely on the object-oriented properties of programs to help us understand them, we can just write a load of C code that we wrap up in methods and call it “object-oriented”. This is the source of our crisis. In 1992, Brad Cox claimed that Object-Oriented Programming (or Message/Object Programming as he also called it) was “the silver bullet” that Fred Brooks claimed didn’t exist. That it would help us componentise software into isolated units that can be plugged together, like integrated circuits bought from a catalogue and assembled into a useful product on a circuit board.

This idea of encapsulated software components is not new, and was presented at the NATO conference on software engineering in 1968. At that conference, Doug McIlroy presented an invited paper called “Mass-Produced Software Components” in which he suggested that the software industry would not be “industrialised” until it supported a subindustry of comprehensible, interchangeable, well-specified components. Here’s the relevant quote, the gendered pronouns are from the original.

The most important characteristic of a software components industry is that it will offer families of routines for any given job. No user of a particular member of a family should pay a penalty, in unwanted generality, for the fact that he is employing a standard model routine. In other words, the purchaser of a component from a family will choose one tailored to his exact needs. He will consult a catalogue offering routines in varying degrees of precision, robust- ness, time-space performance, and generality. He will be confident that each routine in the family is of high quality — reliable and efficient. He will expect the routine to be intelligible, doubtless expressed in a higher level language appropriate to the purpose of the component, though not necessarily instantly compilable in any processor he has for his machine. He will expect families of routines to be constructed on rational principles so that families fit together as building blocks. In short, he should be able safely to regard components as black boxes.

Is Object-Oriented Programming actually that silver bullet? Does it give us the black-box component catalogue McIlroy hoped for? Most of us will never know, because most of us aren’t writing Object-Oriented software.

I think this gulf between Object-Oriented design and principles as described during its first wave by people like Alan Kay, Brad Cox, and Bertrand Meyer is only going to broaden, as we dilute the ideas so that we can carry on programming in C. When I heard that a team inside Sun had hired a load of NeXT programmers and were working on a post-Objective-C OO programming environment, I was excited. We know about the problems in Objective-C, the difficulties with working with both objects and primitive types, and the complexity of allowing arbitrary procedural code in methods. When the beta of Oak came out this year I rushed to request a copy to try.

Java, as they’ve now renamed it, is just an easier Objective-C. It has garbage collection, which is very welcome, but otherwise is no more OO than our existing tools. You can still write C functions (called static methods). You can still write long, imperative procedures. There are still primitive types, and Java has a different set of problems associated with matching those up to its object types. Crucially, Java makes it a lot harder to do any higher-order messaging, which means that if it becomes adopted we’ll see a lot more C and a lot less OO.

The meat

I thought I’d write an object-oriented program in modern, OPENSTEP Objective-C, just to see whether it can even be done. I won’t show you all the code, but I will show you the highlights where the application looks strongly OO, and the depths where it looks a lot like C. I don’t imagine that you’ll instantly go away and rewrite all of your code, indeed many of you won’t like my style because it’s so different from usual Objective-C. My hope is that it gives you a taste of what OOP can be about, inspires you to reflect on your own approach, and encourages you to take more from other ideas about programming than some sugar that superficially wraps C.

Despite a suggestion from Marvin, the paranoid android, I’m going to talk to you about Life.


Always be returning

Every method in this application returns a value, preferably an object, except where that isn’t allowed due to assumptions made by the OpenStep frameworks. Where the returned object is of the same type, the client code should use the returned object, even if the implementation is to return self. Even things that look like mutators follow this pattern: with the effect that it doesn’t matter to a client whether an object is mutable or immutable because it’ll use it the same way. Compare a mutable object:

  foo = aFoo;
  return self;

with an immutable one:

  return [self copyWithReplacementFoo:aFoo];

and the client doesn’t know which it’s using:

[[myThing setFoo:aDifferentFoo] doSomeStuff];

That means it’s easy to change the implementation between these two choices, without affecting the client code (which doesn’t need to assume it can keep messaging the same object). In Life, I built a mutable Grid class, but observed that it could be swapped for an immutable one. I didn’t need to change the app or the tests to make it happen.

This obviously has its limitations, but within those limitations is a great approach. If multiple objects in a system share a collaborator, then you need to choose whether that collaborator is mutable (in which case all correlated objects see all updates, even those made elsewhere in the system) or immutable (in which case collaborating objects will only see their own updates, unless some extra synchronisation is in place). Using the same interface for both lets you defer this decision, or even try both to understand the implications.

Always be messaging

In this Life application there are no C control statements: no if, no for, no while. Everything is done by sending messages to objects. Two examples are relevant, and both come from the main logic of the “game”. If you’ve seen the rules of Life, you’ve probably seen them expressed in a form like this:

If a cell is dead and has X neighbours, it becomes living otherwise it remains dead. If a cell is living and has Y neighbours, it remains living otherwise it becomes dead.

It seems that there are three if statements to be written: one to find out whether a cell is living or dead, and another in each branch to decide what the next state should be.

The first can be dealt with using the class hierarchy. A class (at some theoretical level what I mean here is “a constructor”, but that term is not really used in Objective-C so I’m going to say “class” which is the closest term) can often be a substitute for an if statement, by replacing the condition with polymorphism.

In this case, the question “is a cell living or dead” can be answered by sending a message to either a DeadCell or a LivingCell. The question disappears, because it was sent to an object that pre-emptively knew the answer.

@interface LivingCell : Cell

@interface DeadCell : Cell

Now each can answer the question “what is my next state?” without needing to test what its current state is. But how to do that without that other if statement? There’s a finite number of possible outcomes, keyed on an integer (the number of living neighbours the cell has), which means it’s a question that can be answered by sending a message to an array. Here’s how the Cell does it.

-tickOnGrid:grid atX:(NSInteger)x y:(NSInteger)y;
  return [[self potentialStates]
         [self neighboursOnGrid:grid atX:x y:y]];

Each of the two subclasses of Cell knows what its potential states are:

static id nextStatesFromLiving;

  nextStatesFromLiving = [[NSArray alloc] initWithObjects:

-potentialStates { return nextStatesFromLiving; }

Why write the program like this? To make it easier to understand. There’s an idea in the procedural programming world that the cyclomatic complexity – the number of conditions and loops in a function – should be minimised. If all a method is doing is messaging other objects, the complexity is minimal: there are no conditional paths, and no loops.

My second example is how a loop gets removed from a method: with recursion, and appropriate choice of target object. One of the big bones of contention early on in NeXT’s use of Objective-C was changing the behaviour of nil. In Smalltalk, this happens:

nil open.
MessageNotUnderstood: receiver of "open" is nil

In Objective-C, nothing happens (by default, anyway). But this turns out to be useful. Just as the class of an object can stand in for conditional statements, the nil-ness of an object can stand in for loop termination. A method does its thing, and sends itself to the receiver of the next iteration. When that receiver becomes nil, the message is swallowed and the loop stops.

All loops in Life that are based on doing something a certain number of times are based on a category method on NSNumber.

@implementation NSNumber (times)

-times:target perform:(SEL)action
  return [self times:target perform:action withObject:nil];

-times:target perform:(SEL)action withObject:object
  return [[self nonZero] realTimes:target

-realTimes:target perform:(SEL)action withObject:object
  [target performSelector:action withObject:object];
  return [[NSNumber numberWithInteger:[self integerValue] - 1]
       times:target perform:action withObject:object];

  return ([self integerValue] != 0)?self:nil;


Notice that the conditional expression doesn’t violate the “no if statements” rule, because it’s an expression not a statement. There’s one thing that happens, that yields one of two values. The academic functional programming community recently rallied around the Haskell language, which provides the same distinction: no conditional statements, easy conditional expressions.

Never be sequencing

Related to the idea of keeping methods straightforward to understand is ensuring that they don’t grow too long. Ideally a method is one line long: it returns the result of messaging another object. Then there are a couple of slightly larger cases:

  • a fluent paragraph, in which the result of one message is the receiver for another message. If this gets too deep, though, your method probably has too many coupled concerns associated with a Law of Demeter violation.
  • something that would be a fluent paragraph, but you’ve introduced a local variable to make it easier to read (or debug; Objective-C debuggers are not good with nested messages).
  • a method sends a message, then returns another more relevant object. Common examples replace some collection object then return self.

Finally, there are times when you just can’t do this because you depend on some API that won’t let you. A common case when working with AppKit is methods that return void; you’re forced to sequence them because you can’t compose them. The worst offender in Life is the drawing code which uses AppKit’s supposedly object-oriented API, but which is really just a sequence of procedures wrapped in message-sending sugar (that internally use some hidden context that was set up on the way into the drawRect: method).

  float beginningHorizontal = (float)x/(float)n;
  float beginningVertical = (float)y/(float)n;
  float horizontalExtent = 1.0/(float)n;
  float verticalExtent = 1.0/(float)n;
  NSSize boundsSize = [view bounds].size;
  NSRect cellRectangle = 
           beginningHorizontal * boundsSize.width,
           beginningVertical * boundsSize.height,
           horizontalExtent * boundsSize.width,
           verticalExtent * boundsSize.height
  NSBezierPath *path =  [NSBezierPath bezierPathWithRect:cellRectangle];
  [[NSColor colorWithCalibratedWhite:[denizen population] alpha:1.0] set];
  [path stroke];
  [[NSColor colorWithCalibratedWhite:(1.0 - [denizen population])   alpha:1.0] set];
  [path fill];

The underlying principle here is “don’t make a load of different assignments in one place”, whether those are explicit with the = operator, or implicitly hidden behind message-sends. And definitely, as suggested by Bertrand Meyer, don’t combine assignment with answering questions.

Higher-order messages

You’ve already seen that Life’s loop implementation uses the target-action pattern common to AppKit views, and so do the views in the app. This is a great way to write an algorithm once, but leave it open to configuration for use in new situations.

It’s also a useful tool for reducing boilerplate code and breaking up complex conditional statements: if you can’t represent each condition by a different object, represent them each by a different selector. An example of that is the menu item validation in Life, which is all implemented on the App delegate.

  id action = NSStringFromSelector([menuItem action]);
  SEL validateSelector = NSSelectorFromString([@"validate"
  return [[self performSelector:validateSelector withObject:menuItem]

  return @(timer == nil);

  return @(timer != nil);

  return @(timer == nil);

  return @(YES);

We have four simple methods that document what menu item they’re validating, and one pretty simple method (a literate paragraph that’s been expanded with local variables) for choosing one of those to run.

There’s no message-forwarding needed in the Life app, and indeed you can do some good higher-order messaging without ever needing to use it. You could imagine a catch-all for unhandled -validate<menuitem>: messages in the above code, which might be useful.

Objective-C’s type checks are wrong for Objective-C

Notice that all of the examples I’ve shown here have used the NeXTSTEP <=3.3 style of method declaration, with (implicit) id return and parameter types. That’s not because I’m opposed to type systems, although I am opposed to using one that introduces problems instead of solving them.

What do I want to do with objects? I want to send messages to receivers. Therefore, if I have a method that sends a message to an object, what I care about is whether the recipient can handle that message (I don’t even care whether the handling is organised upfront, or at runtime). The OPENSTEP compiler’s type checks let me know whether an object is of a particular class, or conforms to a particular protocol, but both of these are more restrictive than the tests that I actually need. Putting the types in leads to warnings in correct code more than it leads to detection of incorrect code, so I leave them out.

Of course, as with Objective-C’s successes, Objective-C’s mistakes also have predecessors in the Smalltalk world. As of last year (1994) the Self language used runtime type-feedback to inline message-sending operations (something that Objective-C isn’t doing…yet, though I note that in Java all “message sends” are inline function calls instead) and there’s an ongoing project to rewrite the “Blue Book” library to allow the same in a Smalltalk environment.

What is really needed is a form of “structural type system”, where the shape of a parameter is considered rather than the name of the parameter. I don’t know of any currently-available object-oriented languages that have something similar to this, but I’ve heard INRIA are working on an OO variant of Caml that has structural typing.

But what about performance?

It’s important to separate object-oriented design, which is what I’ve advocated here, from object-oriented implementation, which is the code I ended up with as a result. What the design gave me was:

  • freedom to change implementations of various algorithms (there are about twice as many commits to production code in Life as there are to test code)
  • hyper-cohesive and decoupled objects or categories
  • very short and easy to understand methods

The implementation ends up in a particular corner of the phase space:

  • greater stack depths
  • lots of method resolutions
  • lots of objects
  • lots of object replacements

but if any of this becomes a problem, it should be uninvasive to replace an object or a cluster of objects with a different implementation because of the design choices made. This is where Objective-C’s C becomes useful: you can rip out all of that weird nested message stuff and just write a for loop if that’s better. Just because you shouldn’t start there, doesn’t mean you can’t end up there.

An example of this in Life was that pretty early in development, I realised the Cell class didn’t actually need any state. A Grid could tell a Cell what the surrounding population was, and the Cell was just the algorithm that encapsulated the game’s rules. That meant I could turn a Grid of individual Cells into a Flyweight pattern, so there are only two instances of Cell in the whole app.


I’m not so much worried for OOP itself, which is just one way of thinking about programs without turning them into imperative procedures. I’m worried for our ability to continue writing quality software.

Compare my list of OO properties with a list of functional programming properties, and you’ll see the same features: immutable values, composed functions, higher-order functions, effectless reasoning over effectful sequencing. The crisis of 1995 shows that we took our existing procedural code, observed these benefits of OOP, then carried on writing procedural code while calling it OOP. That, unsurprisingly, isn’t working, and I wouldn’t be surprised if people blame it on OOP with its mutable values, side-effecting functions, and imperative code.

But imagine what’ll happen when we’ve all had enough of OOP. We’ll look to the benefits some other paradigm, maybe those of functional programming (which are, after all, the same as the benefits of OOP), and declare how it’s the silver bullet just as Brad Cox did. Then we’ll add procedural features to functional programming languages, or the other way around, to reduce the “learning curve” and make the new paradigm accessible. Then we’ll all carry on writing C programs, calling them the new paradigm. Then that won’t work, and we’ll declare functional programming (or whatever comes next) as much of a failure as object-oriented programming. As we will with whatever comes after that.

Remember, there is no silver bullet. There is just complexity to be governed.

Getting better at doing it wrong

For around a month at the end of last year, I kept a long text note called “doing doing it wrong right”. I was trying to understand error handling in programming, look at some common designs and work out a plan for cleaning up some error-handling code I was working with myself (mercifully someone else, with less analysis paralysis, has taken on that task now).

Deliciously the canonical writing in this field is by an author with the completely apt name Goodenough. His Structured Exception Handling and Exception Handling: Issues and a Proposed Notation describe the problem pretty completely and introduce the idea of exceptions that can be thrown on an interesting condition and caught at the appropriate level of abstraction in the caller.

As an aside, his articles show that exception handling can be used for general control flow. Your long-running download task can throw the “I’m 5% complete now” exception, which is caught to update the UI before asking the download to continue. Programming taste moved away from doing that.

In the Cocoa world, exceptions have never been in favour, probably because they’re too succinct. In their place, multi-if statement complex handling code is introduced:

NSError *error = nil;
id thing = [anObject giveMeAThing:&error];
if (!thing) {
  [self handleError:error];
id otherThing = [thing doYourThing:&error];
if (!otherThing) {
  [self handleError:error];
id anotherThing = [otherThing someSortOfThing:&error];

…and so it goes.

Yesterday in his NSMeetup talk on Swift, Saul Mora reminded me of the nil sink pattern in Objective-C. Removing all the error-handling from the above, a fluent (give or take the names) interface would look like this:

id anotherThing = [[[anObject giveMeAThing] doYourThing] someSortOfThing];

The first method in that chain to fail would return nil, which due to the message-sink behaviour means that everything subsequent to it preserves the nil and that’s what you get out. Saul had built an equivalent thing with option types, and a function Maybe a -> (a -> Maybe b) -> Maybe b to hide all of the option-unwrapping conditionals.

Remembering this pattern, I think it’s possible to go back and tidy up my error cases:

NSError *error = nil;
id anotherThing = [[[anObject giveMeAThing:&error]
if (!anotherThing) {
  [self handleError:error];

Done. Whichever method goes wrong sets the error and returns nil. Everything after that is sunk, which crucially means that it can’t affect the error. As long as the errors generated are specific enough to indicate what went wrong, whether it’s possible to recover (and if so, how) and whether anything needs cleaning up (and if so, what) then this approach is…good enough.

The Design of the Bazaar

In The Design of Design, Fred Brooks makes an interesting point about ESR’s description of the Bazaar model of Linux (and, by extension, “Open Source”) development.

Linux was actually designed in a cathedral. The design was supplied by Unix, where Linux was to be a work-alike replacement for a particular component. There was even a functional specification: the GNU utilities already existed and the kernel had to support them.

Hiding behind messages

A problem I think about every so often is how to combine the software design practice of hiding implementations behind interfaces with the engineering practice of parallel execution. What are the trade-offs between making parallelism explicit and information hiding? Where are some lines that can be drawn?

Why do we need abstractions for this stuff, when the libraries we have already work? Those libraries make atomic features like threads and locks explicit, or they make change of control flow explicit, and those are things we can manage in one place for the benefit of the rest of our application. Nobody likes to look at the dispatch terrace:


Previous solutions have involved object confinement, where every object has its own context and runs its code there, and the command bus, where you ask for work to be done but don’t get to choose where.

Today’s solution is a bit more explicit, but not much. For every synchronous method, create an asynchronous version:

@interface MyObject : Awaitable

- (NSString *)expensiveCalculation;
- async_expensiveCalculation;


@implementation MyObject

- (NSString *)expensiveCalculation
  return @"result!";


int main(int argc, const char * argv[]) {
  @autoreleasepool {
    MyObject *o = [MyObject new];
    id asyncResult = [o async_expensiveCalculation];
    // you could explicitly wait for the calculation...
    NSLog(@"Result initial: %@", [[asyncResult await]   substringToIndex:1]);
    // ...but why bother?
    NSLog(@"Shouty result: %@", [asyncResult uppercaseString]);
  return 0;

This is more of a fork-and-join approach than fire and forget. The calling thread carries on running until it actually needs the result of the calculation, at which point it waits (if necessary) for the callee to complete before continuing. It’ll be familiar to programmers on other platforms as async/await.

The implementation is – blah blah awesome power of the runtime – a forwardInvocation: method that looks for messages with the async marker, patches their selector and creates a proxy object to invoke them in the background. That proxy is then written into the original invocation as its return value. Not shown: a pretty straightforward category implementing -[NSInvocation copyWithZone:].

@implementation Awaitable

- (SEL)suppliedSelectorForMissingSelector:(SEL)aSelector
  NSString *selectorName = NSStringFromSelector(aSelector);
  NSString *realSelectorName = nil;
  if ([selectorName hasPrefix:@"async_"]) {
    realSelectorName = [selectorName substringFromIndex:6];
  return NSSelectorFromString(realSelectorName);

- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector
  NSMethodSignature *aSignature =   [super methodSignatureForSelector:aSelector];
  if (aSignature == nil) {
    aSignature =    [super methodSignatureForSelector:[self suppliedSelectorForMissingSelector:aSelector]];
  return aSignature;

- (void)forwardInvocation:(NSInvocation *)anInvocation
  SEL trueSelector =    [self suppliedSelectorForMissingSelector:[anInvocation selector]];
  NSInvocation *cachedInvocation = [anInvocation copy];
  [cachedInvocation setSelector:trueSelector];
  CBox *box = [CBox cBoxWithInvocation:cachedInvocation];
  [anInvocation setReturnValue:&box];


Why is the proxy object called CBox? No better reason than that I built this while reading a reflection on Concurrent Smalltalk where that’s the name of this object too.

@interface CBox : NSProxy

+ (instancetype)cBoxWithInvocation:(NSInvocation *)inv;
- await;


@implementation CBox
  NSInvocation *invocation;
  NSOperationQueue *queue;

+ (instancetype)cBoxWithInvocation:(NSInvocation *)inv
  CBox *box = [[self alloc] init];
  box->invocation = [inv retain];
  box->queue = [NSOperationQueue new];
  NSInvocationOperation *op = [[NSInvocationOperation alloc]    initWithInvocation:inv];
  [box->queue addOperation:op];
  return [box autorelease];

- init
  return self;

- await
  [queue waitUntilAllOperationsAreFinished];
  id returnValue;
  [invocation getReturnValue:&returnValue];
  return [[returnValue retain] autorelease];

- (void)dealloc
  [queue release];
  [invocation release];
  [super dealloc];

- forwardingTargetForSelector:(SEL)aSelector
  return [self await];


You don’t always need some huge library to clean things up. Here are about 70 lines of Objective-C that abstract an implementation of concurrent programming and stop my application code from having to interweave the distinct responsibilities of what it’s trying to do and how it’s trying to do it.

On opinionation

I’ve realised that when I read that a tool or framework is “opinionated”, I interpret that as meaning that I’m going to have to spend time on working out how to express my solution in its terms. I have enough trouble trying to work out how to express my solution in terms of my problem, so I’m probably going to avoid that tool or framework.

…and in the end there will be the command line.

You’re pretty happy with the car that the dealer is showing you. It looks comfortable, stylish, and has all of the features you want. There’s a lot of space in the trunk for your luggage. The independent reviews that you’ve seen agree with the marketing literature: once this vehicle gets out onto the open road, it’s nippy and agile and a joy to drive.

You can’t help but think that she isn’t being completely open with you though. To get into the roomy interior and luxurious driver’s seat, you have to climb over a huge black box, twice the height of the cabin itself and by far the longest part of the car. Not to detract from the experience, the manufacturers have put in an automatic platform that lifts you from the ground to the door and returns you gently to earth. But the box is still there.

You ask the dealer about this box, and initially she deflects your questions by talking about the excellent mileage, which is demonstrated by the SpecRoad 2000 report. Then she tells you how great the view of the road is from the high situation of the driver’s seat. Eventually, you ask enough times, and she relents.

“That’s just the starter,” she explains, fiddling with a catch on the door in the rear of the box. “It’s just used to get the petrol motor going, but you don’t need to worry about it. Well, not much.”

Finally, she frees the catch and opens the box. To your astonishment, inside the box are four horses, sullenly eating grain from their nosebags and pawing their hooves on the ground. You can see that they are reined into a system that pulls the rear axle of the car as if it were an old-style carriage. The dealer continues.

“As I said, these cars just use the horses to initially pull the car along until the engine starts up and takes over. It’s how we’ve always built our cars, by layering the modern components over the traditional carriage system. Because the horse-and-carriage arrangement is so stable having been perfected over decades, we can use it as a solid base for our high-tech automobiles. You really won’t notice that it’s there. We send out new grain and clear up any, um, exhaust automatically, so it’s completely invisible. OK every so often one of the horses gets sick or needs re-shoeing and then you can’t use your car at all, but that’s pretty rare. Mostly.”

Again, your curiosity is getting the better of you. In the front of the horses’ cabin, leather reins run from the two leading horses to another boxed-off area. The dealer sees you looking at it, and tries to lead you back out to the showroom, but you persist. With a resigned sigh, she opens yet another hatch into this deeper chamber.

Inside, you are astonished to see a man holding the reins, ready to pull the horses along. “Something has to get the horses started,” the dealer explains, “and this is how we’ve always done it. Our walking technology is even more robust than our horse-drawn system. Don’t worry about any of that though, let me show you the independent temperature zones in the car’s climate control system.”

That’s how it works

In the dim and distant past, barely 672 days after time itself began, the Unix time-sharing system was introduced to the world. It’s a thing for big computers that lets multiple people use them at the same time, without getting in each others’ way. It might not have been the most capable system (which would’ve been Multics, the system which Unix was based on), but due to the fact that AT&T weren’t allowed to sell it, Unix did become popular. By the time this happened, Unix had been rewritten in C so the combination of C, Unix, and tools written atop like roff were what became popular.

Eventually, as small computers became more powerful, they became capable of running C and Unix too. And so they did. People designed processors that were optimised for Unix, other people designed computers that used these processors, and other people brought Unix to these computers. Each workstation itself may have only had a single user, but they were designed to be used together on a network. As the designers had decided that the network is the computer, and the network did have multiple users, it was still a multi-user system, and so the quotas and protections of a time-sharing system still made sense.

Onward and downwards, Unix marched inexorably. As it did so, it dragged its own history with it. As the extremities of Europe became the backdrop to large stone columns with Latin-inscribed capitals, so ever-smaller computers found themselves the backdrop to the Unix kernel and shell. To get there, the biological and technological distinctiveness of each new environment had to be added to Unix’s own.

Compare the Unix workstation to the personal computer. A Unix workstation was designed to run Unix, so its ROM program could look for file systems, find one with the /vmunix program on it, and run that program. The PC was designed…well, it’s not clear what it was designed for, though it was likely to do the same things that CP/M could do on other small computers. If you don’t have an operating system, many of them will give you the infamous NO ROM BASIC message.

Regardless, the bootstrap program in a PC’s ROM certainly isn’t looking for a Unix, or an NT OS kernel, or anything else in particular. It just wants to run whatever comes next. So it looks for a program called the secondary bootloader, and runs that. Then the secondary bootloader itself looks around for the filesystem with /vmlinuz or whatever the Unix (or Unix-like) boot file is called, and runs that.

Magnify and Enhance

The story doesn’t end at the kernel. Getting there, the kernel discovers the hardware available (even though this has been done once or twice already) and then gets on with one of its functions, which is to be a bootloader for a Unix program. Whether that program is initor some newer replacement, that has to start before the computer is properly running a Unix.

One of init‘s tasks is to start up the Unix programs that you want running on the computer, the launch procedure is still not complete. init might follow the instructions in a script called rc, or it could use all the scripts in a folder called init.d or SystemStarter, or it could launch svc.startd and let that decide what to start, or maybe something different happens. Once that procedure has run to completion, the computer is probably doing whatever it was that you bought it for, or at least waiting for you to tell it what that is.


So many different computers go through that complex process – servers, desktops, laptops, mobile phones, tablets, network routers, watches, television receivers, 3D printers. If you have an idea for a novel application of computing hardware, the first step is to stand back and protect your ears from the whomp of four decades of history being dumped in a huge black box on the computer, then you can get cracking.

You want to make a phone? A small device to be used for real-time communication by a single person? whomp comes the megalith.

You want to make a web server? A computer usually dedicated to running three functions (converting input into database requests, converting database responses into output, and tracking which input deserves which output)? whomp comes the megalith.

You want a network appliance? Something that nobody’s going to use at all, that sits in the corner turning 802.11 datagrams into 802.3 datagrams? whomp comes the megalith.

There’s not much point looking at Unix as an architecture or a system of interdependent components in these applications. whomp. It’s a big black box that can be used to get other boxes moving, like the horses used to start a car’s engine. In the 1980s and into the beginning of the 1990s, there were arguments about whether monolithic kernels were better than microkernels. Now, these arguments are redundant: the whole of Unix is itself a megakernel for OS X, Android, iOS, Firefox OS, your routers, network switches and databases.

But the big black box is black because of what’s found at the top of the megalith. It’s a tar pit, sucking in the lower layers of whatever’s perched above. Yesterday, a Unix system would’ve been programmed via the Bourne Shell, a sort of dynamic compromise for the lack of message-passing in C. Today, once the dust has cleared from the whomp, you can see that the Bourne shell is accompanied in the softer layers of tar by Tcl, Perl, Python, Ruby, and other once high-flying programs that got too close to the pit.

Why that’s good

The good news is that Unix isn’t particularly broken. Typically a computer based on Unix can remain working for at least long enough that either the batteries run out or a software update means you have to turn it off anyway.

Because Unix is everywhere, everybody knows Unix. Or they know something that was once built on Unix and has been subsumed into Unix, the remains of which can just be seen and touched in the higher strata of the tar. Maybe they only really know how to generate JSON structures in Ruby, but that’s OK because your next-generation doorbell will have a Ruby interpreter deposited with the whomp of the megalith.

And if something isn’t particularly broken, then there’s not much point in throwing it away for something new. Novelty for its own sake was the death of Taligent, the death of Be, and the death of countless startups and projects who want to do something like X, but newer.

Why that’s bad

The bad news is that Unix is horrendously broken. You can have a supposedly safe runtime environment for your program, but the bottom of this environment is sticking into the tar pit that is C and Unix. Your program can still get into trouble because it’s running on Java and Java is written in C and C is where the trouble comes from.

The idea is that you stay at the top of the megalith, and it just starts your computer and stops you from worrying about the low-altitude parts of the machine. That’s only roughly true though, and lower-down pieces of the megalith sometimes prove themselves to have crumbled under weathering and the pressure from the weight above. If your computer has experienced a kernel panic in the last year, it’s probably because the graphics driver wasn’t very well-written. That’s a prop that has to be inserted into the bottom of the megalith to keep it upright, but people make those props out of balsa wood and don’t check the size of the holes they need to fit the props into.

Treating Unix as the kernel of your modern system means ignoring the fact that Unix is itself a whole operating system, and that your UEFI boot process also loaded another other operating system just to get that other operating system to load your operating system. The outer system displays inner-system problems, being constrained by the same constraints that your Unix flavour imposes. Because Unix is hidden, these become arbitrary-seeming constraints that your developers simply know as always having been there.

What should be done

A couple of decades ago, there were people who knew that PC operating systems like Mac OS and MS-DOS weren’t particularly good, and needed replacing. Some of them looked with envy at the smooth megalith that was Unix, and whomp here it arrived on their desktop machines: MkLinux, Debian, NeXTStep, Solaris, 386BSD and others. Others thought that the best approach was to start again with systems designed to support the desktop paradigm and using modern design techniques and technology advances: they made BeOS, Windows NT and others.

Systems like this (including modern BeOS-inspired Haiku, and Amiga-inspired AROS) are typically described by their project politburos as “efficient”, “lean” and other words generally considered to be antonymical to “a GNU distribution”.

They also tend to have few users in comparison to Mac OS X, GNU and other systems. Partly this is just a marketing concern, that’s irrelevant when such systems are free: if the one that works for you works for you, it shouldn’t matter how many other people it also works for. In practice there is a serious consideration to the install base. The more people who use an operating system, the more people there are who want applications for that system and therefore (hopefully) more people will want to write applications for that system.

If Linus’s Law (that many eyes make bugs shallow; a statement of wishful thinking that should actually be attributed to Eric S. Raymond) actually held true, then one might expect that more popular systems would suffer fewer bugs. Perhaps more popular systems end up with higher expectations, and therefore gain newer features faster, thus gaining bugs faster than people could fix them?

Presumably as the only point to Unix these days is to be a stable stratum on which to layer other things, there are numerous companies and individuals who would benefit from it being stable. We can accept that all of this complexity is going nowhere except upward, and that the megalith will continue to grow inexorably as more components fall into the tar pit. With that being the case, all of the companies and individuals involved could standardise on a single implementation of the megalith. They could all shore up the same foundations and fix the same cracks.

What I think I want to do

I often choose to rank potential solutions to technical problems in a two-dimensional graph, because if you can reduce any difficult question down to four quadrants then you can make a killing as a consultant. In this case, the axes are political acceptability and technical quality.

+-------------------+-------------------+ T
|                   |                   | e
|  Awkward  genius  |     Slam-dunk     | c
|                   |                   | h
+-------------------+-------------------+ n
|                   |                   | i
|   Feverish rant   | Saleable band-aid | c
|                   |                   | a
+-------------------+-------------------+ l

A completely new system might be a great idea technically, but is unlikely to get any traction. There may be all sorts of annoying problems that make current systems a bit disappointing, but no-one’s suffering badly enough to consider a kill or cure option. The conditions for a radically novel system becoming snapped up by an incumbent to replace their existing technical debt don’t really exist, and haven’t for decades (Commodore bought Amiga to get their new system, but in the 1990s Apple just needed a system that was already a warmed-over workstation Unix).

In fact despite the view of the software sector being a high-tech industry, it’s both socially and technologically very conservative. It’s rare for completely new ideas to take hold, and what’s taken for progress can often be seen more realistically as a partially-directed form of Brownian motion. As already discussed, this isn’t completely bad, because it stops new risks being introduced. The counterpoint to that melody is that it stops old risks from being removed, too.

Getting a lot of developer traction around a single Unix system therefore has a higher likelihood, in fact it’s already happened. It’s not necessarily the best approach technically, because it means rather than replacing that huge megalith we just agreed was a (very large) millstone, we resign ourselves to patching up and stabilising the same megalith together. Given that one penguin-based megalith is already used in far more contexts than any other, this seems more likely to be acceptable to more people beset by the crumbly megalith problem.

There’s room in the world for both solutions, too. What I call a more acceptable solution is really just easier to accept now, and the conditions can change over time. Ignoring the crumbling megalith could eventually produce a crisis, and slicing the Gordian knot could then be an acceptable solution. Until that crisis hits, there will be the kernel, the command-line, and the continuing echos of that original, deafening whomp.

Layers of Distraction

A discussion I was involved in over on Facebook reminded me of some other issues I’d already drafted for this blog, so I stuck the two together and here we are.

Software systems can often be seen as aggregations of strata, with higher layers making use of the services in the lower layers. You’ll often see a layered architecture diagram looking like a flat and well-organised collection of boiled sweets.

As usual, it’s the interstices rather than the objects themselves that are of interest. Where two layers come together, there’s usually one of a very small number of different transformations taking place. The first is that components above the boundary can express instructions that any computer could run, and they are transformed into instructions suitable for this computer. That’s what the C compiler does, it’s what the x86 processor does (it takes IA-32 instructions, which any computer could run, and turns them into the microcode which it can run), it’s what device drivers do.

The second is that it turns one set of instructions any computer could run into another set that any computer could run. If you promise not to look too closely the Smalltalk virtual machine does this, by turning instructions in the Smalltalk bytecode into instructions in the host machine language.

The third is that it turns a set of computer instructions in a specific domain into the general-purpose instructions that can run on the computer (sometimes this computer, sometimes any computer). A function library turns requests to do particular things into the machine instructions that will do them. A GUI toolkit takes requests to draw buttons and widgets and turns them into requests to draw lines and rectangles. The UNIX shell turns an ordered sequence of suggestions to run programs into the collection of C library calls and machine instructions implied by the sequence.

The fourth is turning a model of a problem I might want solving into a collection of instructions in various computer domains. Domain-specific languages sit here, but usually this transition is handled by expensive humans.

Many transitions can be found in the second and third layers, so that we can turn this computer into any computer, and then build libraries on any computer, then build a virtual machine atop those libraries, then build libraries for the virtual machine, then build again in that virtual machine, then finally put the DOM and JavaScript on top of that creaking mess. Whether we can solve anybody’s problems from the top of the house of cards is a problem to be dealt with later.

You’d hope that from the outside of one boundary, you don’t need to know anything about the inside: you can use the networking library without needing to know what device is doing the networking, you can draw a button without needing to know how the lines get onto the screen, you can use your stock-trading language without needing know what Java byte codes are generated. In other words, both abstractions and refinements do not leak.

As I’ve gone through my computing career, I’ve cared to different extents about different levels of abstraction and refinement. That’s where the Facebook discussion came in: there are many different ways that a Unix system can start up. But when I’m on a desktop computer, I not only don’t care which way the desktop starts up, I don’t want to have to deal with it. Whatever the relative merits of SMF, launchd, SysV init, /etc/rc, SystemStarter, systemd or some other system, the moment I need to even know which is in play is the moment that I no longer want to use this desktop system.

I have books here on processor instruction sets, but the most recent (and indeed numerous) are for the Motorola 68k family. Later than that and I’ll get away with mostly not knowing, looking up the bits I do need ad hoc, and cursing your eyes if your debugger drops me into a disassembly.

So death to the trope that you can’t understand one level of abstraction (or refinement) without understanding the layers below it. That’s only true when the lower layers are broken, though I accept that that is probably the case.

The next phase in technological convergence will be harder than the last, because it can’t be solved with technology. Last time the devices converged, some phone makers just needed to buy a photoelectric detector, a lens, and licenses for some MP3 patents.

But how can the various tab-sized computers I carry – my bank cards, SIMs, passport, building door card, transport cards, office ID – be integrated, when they mean different things to different people? Technologically, it’s really bad to keep them separate because you can’t just hold a bag of RFID tokens up to a reader and expect the right thing to happen. Socially, it’s really bad to converge them; I can’t imagine my bank, employer, train conductor and government all agreeing to the same terms on identifying me, nor would one company be an acceptable clearing house for all of that so rather than N distinct tokens we’d end up with N tokens you can use Anywhere™*.

More on Layers

I was told yesterday that entity-relationship diagrams can be OK as high level descriptions of database schemata, but are not appropriate for designing a database. Enough information is missing that they are not able to model the problem.

Could the same be true of layer diagrams? Perhaps they’re OK as stratospheric guides to your software, but not useful for designing that software as too much information is missing.

I don’t think that’s the case, though. Layer diagrams tend to be pretty much interchangeable between systems, so that I can’t really tell which system I’m looking at from the layer cake. Add the difficulty that I probably can’t tell how the layers communicate, and certainly can’t tell how the subsystems within the layers are composed. All I can tell is that you like drawing boxes, but not too many boxes.