Conflicts in my mental model of Objective-C

My worldview as it relates to the writing of software in Objective-C contains many items that are at odds with one another. I either need to resolve them or to live with the cognitive dissonance, gradually becoming more insane as the conflicting items hurl one another at my cortex.

Of the programming environments I’ve worked with, I believe that Objective-C and its frameworks are the most pleasant. On the other hand, I think that Objective-C was a hack, and that the frameworks are not without their design mistakes, regressions and inconsistencies.

I believe that Objective-C programmers are correct to side with Alan Kay in saying that the designers of C++ and Java missed out on the crucial part of object-oriented programming, which is message passing. However, I also believe that ObjC missed out on a crucial part of object-oriented programming, which is the compiler as an object. Decades of optimising the compile-link-debug-edit cycle have been spent solving the wrong problem. On which topic, I feel conflicted by the fact that we’ve got this Smalltalk-like dynamic language support, but can have our products canned for picking the same selector name as some internal secret stuff in someone else’s code.

I feel disappointed that in the last decade, we’ve just got tools that can do the same thing but in more places. On the other hand, I don’t think it’s Apple’s responsibility to break the world; their mission should be to make existing workflows faster, with new excitement being optional or third-party. It is both amazing and slightly saddening that if you defrosted a cryogenically-preserved NeXT application programmer, they would just need to learn reference counting, blocks and a little new syntax and style before they’d be up to speed with iOS apps (and maybe protocols, depending on when you threw them in the cooler).

Ah, yes, Apple. The problem with a single vendor driving the whole community around a language or other technology is that the successes or failures of the technology inevitably get caught up in the marketing messages of that vendor, and the values and attitudes ascribed to that vendor. The problem with a community-driven technology is that it can take you longer than the life of the Sun just to agree how lambdas should work. It’d be healthy for there to be other popular platforms for ObjC programming, except for the inconsistencies and conflicts that would produce. It’s great that GNUstep, Cocotron and Apportable exist and are as mature as they are, but “popular” is not quite the correct adjective for them.

Fundamentally I fear a world in which programmers think JavaScript is acceptable. Partly because JavaScript, but mostly because when a language is introduced and people avoid it for ages, then just because some CEO says all future websites must use it they start using it, that’s not healthy. Objective-C was introduced and people avoided it for ages, then just because some CEO said all future apps must use it they started using it.

I feel like I ought to do something about some of that. I haven’t, and perhaps that makes me the guy who comes up to a bunch of developers, says “I’ve got a great idea” and expects them to make it.

Representativeness in Software Engineering Research

The first paragraph describes the context of this post in relation to the blog on which it originally appeared, not blog.securemacprogramming.com.

For this post, I wanted to go a little bit meta. One focus of this blog will be on whether results from academic software engineering are applicable to the work I do as a commercial software developer, so it was hard to pass up this Microsoft Research paper on representativeness of research.

In a nutshell, the problem is this: imagine that you read some study that shows, for example, that schedule slippage on a software project is significantly lessened if developers are given two digestive biscuits and a cup of tea at 4pm on working days. The study examined the breaktime habits of developers on 500 open source projects. This sounds quite convincing. If this thing with the tea and biscuits is true across five hundred projects, it must be applicable to my project, right?

That doesn’t follow. There are many reasons why it might not follow: the study may be biased. The analysis may be wrong. The biscuit thing may be significant but not the root cause of success. The authors may have selected projects that would demonstrate the biscuit outcome. Projects that had initially signed up but got delayed might have dropped out. This paper evaluates one cause of bias: the projects used in the study aren’t representative of the project you’re doing.

It’s a fallacy to assume that just because a study has a large sample size, its results can be generalised to the population: that only follows if the sample is representative of the population. Imagine a world in which all software projects are written in either Java or LISP. Now it doesn’t matter whether I select 10 projects or 10,000 projects: a sample of LISP practices will not necessarily tell us anything about how to conduct a Java project.

Conversely a study that investigates both Java and LISP projects can—in this restricted universe, and with the usual “all other things being equal” caveat—tell us something generally about software projects independent of language. However, choice of language is only one dimension in which software can be measured: the size, activity, number of developers, licence and other factors can all be relevant. Therefore the possible phase space of important factors can be multidimensional.

In this paper the authors develop a framework, based on work in medicine and other fields, for measuring and maximising representativeness of a sample by appropriate selection of projects along the dimensions of the problem space. They apply the framework to recent research.

What they discovered, tabulated in Table II of the paper, is that while a very small, carefully-selected sample can be surprisingly representative (50 out of ~20k projects represented ~15% of their problem space), the ~200 projects they could find analysed in recent research only scored around 9% on their representativeness metric. However in certain dimensions the studies were highly representative, many covering 100% of the phase space in specific dimensions.

Conclusions

A fact that jumped out at me, because of the field I work in, is that there are 245 Objective-C projects in the universe studied by this paper (the projects indexed on Ohloh) and that not one of these is covered by any of the studies they analysed. That could mean that my own back yard is ripe for the picking: that there are interesting results to be determined by analysing Objective-C projects and comparing those results with received wisdom.

In the discussion section, the authors point out that just because a study is not general, does not mean it is not useful. You may not be able to generalise a result gleaned from analysing (say) Java developer tools to all software development, but if what you’re interested in is Java developer tools then that is not a problem.

What this paper gives us, then, is not necessarily a tool that commercial developers can run out and use. It gives us some important quantitative context on evaluating the research that we do read. And, should we want to analyse our own work and investigate hypotheses about factors affecting our projects, it gives us a framework to understand just how representative those analyses would be.

At the old/new interface: jQuery in WebObjects

It turns out to be really easy to incorporate jQuery into an Objective-C WebObjects app (targeting GNUstep Web). In fact, it doesn’t really touch the Objective-C source at all. I defined a WOJavaScript object that loads jQuery itself from the application’s web server resources folder, so it can be reused across multiple components:

jquery_script:WOJavaScript {scriptFile="jquery-2.0.2.js"}

Then in the components where it’s used, any field that needs uniquely identifying should have a CSS identifier, which can be bound via WebObjects’s id binding. In this example, a text field for entering an email address in a form will only be enabled if the user has checked a “please contact me” checkbox.

email_field:WOTextField {value=email; id="emailField"}
contact_boolean:WOCheckBox {checked=shouldContact; id="shouldContact"}

The script itself can reside in the component’s HTML template, or in a WOJavaScript that looks in the app’s resources folder or returns JavaScript that’s been prepared by the Objective-C code.

<script>
function toggleEmail() {
    var emailField = $("#emailField");
    var isChecked = $("#shouldContact").prop("checked");
    emailField.prop("disabled", !isChecked);
    if (!isChecked) {
        emailField.val("");
    }
}
$(document).ready(function() {
    toggleEmail();
    $("#shouldContact").click(toggleEmail);
});
</script>
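
For the last of those options, something like this should work, assuming GSWeb’s WOJavaScript supports the same scriptString binding as WebObjects (the toggleScript method name is invented for this example):

toggle_script:WOJavaScript {scriptString=toggleScript}

The component would then implement a -toggleScript method returning the JavaScript source as a string, giving the Objective-C side full control over what gets emitted.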

I’m a complete newbie at jQuery, but even so that was easier than expected. I suppose the lesson to learn is that old technology isn’t necessarily incapable technology. People like replacing their web backend frameworks every year or so; whether there’s a reason (beyond caprice) warrants investigation.

The code you wrote six months ago

We have this trope in programming that you should hate the code you wrote six months ago. This is a figurative way of saying that you should be constantly learning and assimilating new ideas, so that you can look at what you were doing earlier this year and have new ways of doing it.

It would be more accurate, though less visceral, to say “you should be proud that the code you wrote six months ago was the best you could do with the knowledge you then had, and you should be able to see ways to improve upon it with the learning you’ve accomplished since then”. If you actually hate the code, well, that suggests that you think anyone who doesn’t have the knowledge you have now is an idiot. That kind of mentality is actually deleterious to learning, because you’re not going to listen to anyone for whom you have Set the Bozo Bit, including your younger self.

I wrote a lot about learning and teaching in APPropriate Behaviour, and thinking about that motivates me to scale this question up a bit. Never mind my code, how can we ensure that any programmer working today can look at the code I was writing six months ago and identify points for improvement? How can we ensure that I can look at the code any other programmer was working on six months ago, and identify points for improvement?

My suggestion is that programmers should know (or, given the existence of the internet, know how to use the index of) the problems that have already come before, how we solved them, and why particular solutions were taken. Reflecting back on my own career I find a lot of problems I introduced by not knowing things that had already been solved: it wasn’t until about 2008 that I really understood automated testing, a topic that was already being discussed back in 1968. Object-oriented analysis didn’t really click for me until later, even though Alan Kay and a lot of other really clever people had been working on it for decades. We’ll leave discussion of parallel programming aside for the moment.

So perhaps I’m talking about building, disseminating and updating a shared body of knowledge. The building part has already been done, but I’m not sure I’ve ever met anyone who’s read the whole SWEBOK or referred to any part of it in their own writing or presentations, so we’ll call the dissemination part a failure.

Actually, as I said we only really need an index, not the whole BOK itself: these do exist for various parts of the programming endeavour. Well, maybe not indices so much as catalogues; summaries of the state of the art occasionally with helpful references back to the primary material. Some of them are even considered “standards”, in that they are the go-to places for the information they catalogue:

  • If you want an algorithm, you probably want The Art of Computer Programming or Numerical Recipes. Difficulties: you probably won’t understand what’s written in there (the latter book in particular assumes a bunch of degree-level maths).
  • If you want idioms for your language, look for a catalogue called “Effective <name of your language>”. Difficulty: some people will disagree with the content here just to be contrary.
  • If you want a pattern, well! Have we got a catalogue for you! In fact, have we got more catalogues than distinct patterns! There’s the Gang of Four book, the PLoP series, and more. If you want a catalogue that looks like it’s about patterns but is actually comprised of random internet commentators trying to prove they know more than Alistair Cockburn, you could try out the Portland Pattern Repository. Difficulty: you probably won’t know what you’re looking for until you’ve already read it—and a load of other stuff.

I’ve already discussed how conference talks are a double-edged sword when it comes to knowledge sharing: they reach a small fraction of the practitioners, take information from an even smaller fraction, and typically set up a subculture with its own values distinct from programming in the large. The same goes for company-internal knowledge sharing programs. I know a few companies that run such programs (we do where I work, and Etsy publish the talks from theirs). They’re great for promoting research, learning and sharing within the company, but you’re always aware that you’re not necessarily discovering things from without.

So I consider this one of the great unsolved problems in programming at the moment. In fact, let me express it as two distinct questions:

  1. How do I make sure that I am not reinventing wheels, solving problems that no longer need solving or making mistakes that have already been fixed?
  2. A new (and, for the sake of this discussion, inexperienced) programmer joins my team. How do I help this person understand the problems that have already been solved, the mistakes that have already been made, and the wheels that have already been invented?

Solve this, and there are only two things left to do: fix concurrency, name things, and improve bounds checking.

Lighter UIViewControllers

The first issue of Objective-C periodical objc.io has just been announced:

Issue #1 is about lighter view controllers. The introduction tells you a bit more about this issue and us. First, Chris writes about lighter view controllers. Florian expands on this topic with clean table view code. Then Daniel explores view controller testing. Finally, in our first guest article, Ricki explains view controller containment.

Each of the articles is of high quality; I thoroughly recommend that you read it. I powered through the issue this morning and have already put myself on the mailing list for the next one. I think it’s worth quite a few dollars more than the $0 they’re asking.

On a more self-focussed note, I’m pleased to note that two of the articles cite my work. This isn’t out of some arrogant pleasure at seeing my name on someone else’s website. One of my personal goals has been to teach the people who teach the other people: to ensure that what I learn becomes a part of the memeplex and isn’t forgotten and reinvented a few years later. In APPropriate Behaviour a few of the chapters discuss the learning and teaching of software making. I consider it one of the great responsibilities of our discipline, to ensure the mistakes we had to make are not made by those who come after us.

The objc.io team have done a great job of researching their articles, taking the knowledge that has been found and stories that have been told and synthesising new knowledge and new stories from them. I’m both proud of and humbled by my small, indirect role in this effort.

APPropriate Behaviour is complete!

APPropriate Behaviour, the book on things programmers do that aren’t programming, is now complete! The final chapter – a philosophy of software making – has been added, concluding the book.

Just because it’s complete, doesn’t mean it’s finished: as my understanding of what we do develops I’ll probably want to correct things, or add new anecdotes or ideas. Readers of the book automatically get free updates whenever I create them in the future, so I hope that this is a book that grows with us.

As ever, the introduction to the book has instructions on joining the book’s Glassboard to discuss the content or omissions from the content. I look forward to reading what you have to say about the book in the Glassboard.

While the recommended purchase price of APPropriate Behaviour is $20, the minimum price now that it’s complete is just $10. Looking at the prices paid by the 107 readers who bought it while it was still being written, $10 is below the median price (so most people chose to pay more than $10) and the modal price (so the most common price chosen by readers was higher than $10).

A little about writing the book: I had created the outline of the book last Summer, while thinking about the things I believed should’ve been mentioned in Code Complete but were missing. I finally decided that it actually deserved to be written toward the end of the year, and used National Novel Writing Month as an excuse to start on the draft. A sizeable portion of the draft typescript was created in that month; enough to upload to LeanPub and start getting feedback from early readers. I really appreciate the help and input those early readers, along with other people I’ve talked to about the material, have given, both in preparing APPropriate Behaviour and in understanding my career and our industry.

Over the next few months, I tidied up that first draft, added new chapters, and extended the existing material. The end result – the 11th release including that first draft – is 141 pages of reflection over the decade in which I’ve been paid to make software: not a long time, but still nearly 15% of the sector’s total lifespan. I invite you to grab a copy from LeanPub and share in my reflections on that decade, and consider what should happen in the next.

Specifications for interchanging objects

One of the interesting aspects of Smalltalk and similar languages, including Objective-C and Ruby, is that while the object model exposes a hierarchy of classes, consumers of objects in these environments are free to ignore the position of an object in that hierarchy. The hierarchy can be thought of as a convenience: on the one hand for people building objects (“this object does all the same stuff as instances of its parent class, and then some”), and on the other for people consuming objects (“you can treat this object like it’s one of these types further up the hierarchy”).

So you might think that -isKindOfClass: represents a test for “I can use this object like I would use one of these objects”. There are two problems with this. As with any boolean test, the failure modes are false positives and false negatives.

A false positive is when an object passes the test, but actually can’t be treated as an instance of the parent type. In a lot of recent object-oriented code this is a rare problem. The idea of the Liskov Substitution Principle, if not its precise intent as originally stated, has become entrenched in the Object-Oriented groupthink.

I’ve worked with code from the 1980s, though, where these false positives exist: an obvious example is “closing off” particular selectors. A parent class defines some interface, then subclasses inherit from that class, overriding methods to call [self doesNotRecognize: _cmd] for features of the parent that aren’t relevant in the subclass. This is still possible today, though done infrequently.
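
The idiom looks something like this (class names invented for illustration):

@implementation ImmutableShape // assume a subclass of some mutable Shape class

// "Close off" the inherited mutator: callers get an NSInvalidArgumentException
// instead of the parent's behaviour.
- (void)setBounds: (NSRect)newBounds
{
    [self doesNotRecognize: _cmd];
}

@end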

False negatives occur when an object fails the -isKindOfClass: test but actually could be used in the way your software intends. In Objective-C (though neither in Smalltalk[*] nor Ruby), nil _does_ satisfy client code’s needs in a lot of cases but never passes the hierarchy test. Similarly, you could easily arrange for an object to respond to all the same selectors as another object, and to have the same dynamic behaviour, but to be in an unrelated position in the hierarchy. You _can_ use an OFArray like you can use an NSArray, but it isn’t a kind of NSArray.

[*] There is an implementation of an Objective-C style Null object for Squeak.
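
To see the false negative with nil concretely:

NSArray *things = nil;
NSUInteger count = [things count];                      // 0, no crash: acceptable to many clients
BOOL isArray = [things isKindOfClass: [NSArray class]]; // NO: fails the hierarchy test anyway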

Obviously if the test is broken, we should change the test. False negatives can be addressed by testing for protocols (again, in the languages I’ve listed, this only applies to Objective-C and MacRuby). Protocols are unfortunately named in this instance: they basically say “this object responds to any selector in this list”. We could then say that rather than testing for an object being a kind of UIView, we need an object that conforms to the UIDrawing protocol. This protocol doesn’t exist, but we could say that.
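
Mechanically, the two kinds of test look like this (using a protocol that does exist):

// Hierarchy test: false negatives for compatible but unrelated objects.
BOOL isArray = [candidate isKindOfClass: [NSArray class]];
// Protocol test: does the object declare that it implements these selectors?
BOOL claimsConformance = [candidate conformsToProtocol: @protocol(UITableViewDataSource)];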

Problems exist here. An object that responds to all of the selectors doesn’t necessarily conform to the protocol, so we still have false negatives. The developer of the class might have forgotten to declare the protocol (though not in MacRuby, where protocol tests are evaluated dynamically), or the object could forward unknown selectors to another object which does conform to the protocol.

There’s still a false positive issue too: ironically protocol conformance only tells us what selectors exist, not the protocol in which they should be used. Learning an interface from a protocol is like learning a language from a dictionary, in that you’ve been told what words exist but not what order they should be used in or which ones it’s polite to use in what circumstances.

Consider the table view data source. Its job is to tell the table view how many sections there are, how many rows there are in each section, and what cell to display for each row. An object that conforms to the data source protocol does not necessarily do that. An object that tells the table there are three sections but crashes if you ask how many rows are in any section beyond the first conforms to the protocol, but doesn’t have the correct dynamic behaviour.
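
For example, this sketch (invented for illustration) conforms to the protocol and compiles cleanly, yet has exactly that broken dynamic behaviour:

#import <UIKit/UIKit.h>

@interface BrokenDataSource : NSObject <UITableViewDataSource>
@end

@implementation BrokenDataSource

- (NSInteger)numberOfSectionsInTableView: (UITableView *)tableView
{
    return 3; // promises three sections...
}

- (NSInteger)tableView: (UITableView *)tableView numberOfRowsInSection: (NSInteger)section
{
    // ...but blows up for two of them.
    NSAssert(section == 0, @"only the first section is implemented");
    return 1;
}

- (UITableViewCell *)tableView: (UITableView *)tableView cellForRowAtIndexPath: (NSIndexPath *)indexPath
{
    return [[UITableViewCell alloc] initWithStyle: UITableViewCellStyleDefault
                                  reuseIdentifier: nil];
}

@end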

We have tools for verifying the dynamic behaviour of objects. In his 1996 book Superdistribution: Objects as Property on the Electronic Frontier, Brad Cox describes a black box test of an object’s dynamic behaviour, in which test code messages the object then asserts that the object responds in expected ways. This form of test was first implemented in a standard fashion, to my knowledge, in 1998 by Kent Beck as a unit test.

Unit tests are now also a standard part of the developer groupthink, including tests as specification under the name Test-Driven Development. But we still use them in a craft way, as bespoke specifications for our one-of-a-kind classes. What we should really do is make more use of these tests: substituting dynamic specification tests for our static, error-prone type tests.

A table view does not need something that responds to the data source selectors, it needs something that behaves like a data source. So let’s create some tests that any data source should satisfy, and bundle them up as a specification that can be tested at runtime. Notice that these aren’t quite unit tests in that we’re not testing our data source, we’re testing any data source. We could define some new API to test for satisfactory behaviour:

- (void)setDataSource: (id <UITableViewDataSource>)dataSource {
  NSAssert([Demonstrate that: dataSource satisfies: [Specification for: @protocol(UITableViewDataSource)]],
           @"Data source must behave like a table view data source");
  _dataSource = dataSource;
  [self reloadData];
}

But perhaps with new language and framework support, it could look like this:

- (void)setDataSource: (id @<UITableViewDataSource>)dataSource {
  NSAssert([dataSource satisfiesSpecification: @specification(UITableViewDataSource)],
           @"Data source must behave like a table view data source");
  _dataSource = dataSource;
  [self reloadData];
}

You could imagine that in languages that support design-by-contract, such as Eiffel, the specification of a collaborator could be part of the contract of a class.

In each case, the expression inside the assertion handler would find and run the test specification appropriate for the collaborating object. Yes this is slower than doing the error-prone type hierarchy or conformance tests. No, that’s not a problem: we want to make it right before making it fast.

Treating test fixtures as specifications for collaboration between objects, rather than (or in addition to) one-off tests for one-off classes, opens up new routes for collaboration between the developers of the objects. Framework vendors can supply specifications as enhanced documentation. Framework consumers can supply specifications of how they’re using the frameworks as bug reports or support questions: vendors can add those specifications to a regression testing arsenal. Application authors can create specifications to send to contractors or vendors as acceptance tests. Vendors could demonstrate that their code is “a drop-in replacement” for some other code by demonstrating that both pass the same specification.

But finally it frees object-oriented software from the tyranny of the hierarchy. The promise of duck typing has always been tempered by the dangers, because we haven’t been able to show that our duck typed objects actually can quack like ducks until it’s too late.

On designing collections

Introduction

This post explores the pros and cons of following the design rule “Objects responsible for collections of other objects should expose an interface to the collection, not the collection itself”. Examples and other technical discussion are in Objective-C, assuming Foundation classes and idioms.

Example Problem

Imagine you were building Core Data today. You get to NSEntityDescription, which encapsulates the information about an entity in the Managed Object Model, including its attributes and relationships, collectively “properties”. You have two options:

  1. Let client code have access to the actual collection of properties.
  2. Make client code work with the properties via API on NSEntityDescription or abstract collection interfaces.

In reality, NSEntityDescription does both, but not with the same level of control. As external clients of the class we can’t see whether the collection objects are internal state of the entity description or are themselves derived from its real state, although this LGPL implementation from GSCoreData does return its real instance variable in one case. However, this implementation is old enough not to show another aspect of Apple’s class: their entity description class itself conforms to NSFastEnumeration.

How to give access to the actual collection

This is the easiest case.

@implementation NSEntityDescription
{
    NSArray *_properties;
}

- (NSArray *)properties
{
    return _properties;
}

@end

A common (but trivial) extension to this is to return an autoreleased copy of the instance variable, particularly in the case where the instance variable itself is mutable but clients should be given access to a read-only snapshot of the state. A less-common technique is to build a custom subclass of NSArray using an algorithm specific to this application.
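
The copying variant is a one-line change (assuming ARC, and that the instance variable is internally an NSMutableArray):

- (NSArray *)properties
{
    // Hand out a read-only snapshot rather than the live internal state.
    return [_properties copy];
}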

How to provide abstract collection access

There are numerous ways to do this, so let’s take a look at a few of them.

Enumerator

It doesn’t matter how a collection of things is implemented: if you can supply each element in turn when asked, you can give out an enumerator. This is how NSFileManager lets you walk through the filesystem. Until a few years ago, it was how NSTableView let you see each of the selected columns. It’s how PSFeed works, too.

The good news is, this is usually really easy. If you’re already using a Foundation collection class, it can already give you an appropriate NSEnumerator.

- (NSEnumerator *)propertyEnumerator
{
    return [_properties objectEnumerator];
}

You could also provide your own NSEnumerator subclass to work through the collection in a different way (for example if the collection doesn’t actually exist but can be derived lazily).
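
A minimal sketch of such a subclass, with hypothetical -countOfProperties and -propertyAtIndex: methods standing in for whatever lazy derivation the entity supports:

@interface LazyPropertyEnumerator : NSEnumerator

- (instancetype)initWithEntity: (NSEntityDescription *)entity;

@end

@implementation LazyPropertyEnumerator
{
    NSEntityDescription *_entity;
    NSUInteger _index;
}

- (instancetype)initWithEntity: (NSEntityDescription *)entity
{
    if ((self = [super init])) {
        _entity = entity;
    }
    return self;
}

// NSEnumerator subclasses only need to override -nextObject; returning nil
// signals the end of the collection.
- (id)nextObject
{
    if (_index >= [_entity countOfProperties]) {
        return nil;
    }
    return [_entity propertyAtIndex: _index++];
}

@end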

Fast enumeration

This is basically the same thing as “Enumerator”, but has different implementation details. In addition, the overloaded for keyword in Objective-C provides a shorthand syntax for looping over collections that conform to the NSFastEnumeration protocol.

Conveniently, NSEnumerator conforms to the protocol so it’s possible to go from “Enumerator” to “Fast enumeration” very easily. All of the Foundation collection classes also implement the protocol, so you could do this:

- (id <NSFastEnumeration>)properties
{
    return _properties;
}

Another option—the one that NSEntityDescription actually uses—is to implement the -countByEnumeratingWithState:objects:count: method yourself. A simple implementation passes through to a Foundation collection class that already gets the details right. There are a lot of details, but a custom implementation could do the things a custom NSEnumerator subclass could.
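
A pass-through version, delegating the tricky bookkeeping to the collection that already gets it right, might look like this:

- (NSUInteger)countByEnumeratingWithState: (NSFastEnumerationState *)state
                                  objects: (__unsafe_unretained id [])buffer
                                    count: (NSUInteger)len
{
    // The backing array fills in the state, buffer and mutation guard for us.
    return [_properties countByEnumeratingWithState: state objects: buffer count: len];
}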

Object subscripting

If you’ve got a collection that’s naturally indexed, or naturally keyed, you can let people access that collection without giving them the specific implementation that holds the collected objects. The subscripting methods let you answer questions like “what is the object at index 3?” or “which object is named ‘Bob’?”. As with “Fast enumeration” there is syntax support in the language for conveniently using these features in client code.

- (id)objectAtIndexedSubscript: (NSUInteger)idx
{
    return _properties[idx];
}
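
The keyed variant is analogous; here a hypothetical _propertiesByName dictionary stands in as the backing store:

- (id)objectForKeyedSubscript: (NSString *)name
{
    return _propertiesByName[name];
}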

Key-Value Coding

Both ordered and unordered collections can be hidden behind the to-many Key-Value Coding accessors. These methods also give client code the opportunity to use KVC’s mutable proxy collections, treating the real implementation (whatever it is) as if it were an NSMutableArray or NSMutableSet.

- (NSUInteger)countOfProperties
{
    return [_properties count];
}

- (NSPropertyDescription *)objectInPropertiesAtIndex: (NSUInteger)index
{
    return _properties[index];
}

- (NSArray *)propertiesAtIndexes: (NSIndexSet *)indexes
{
    return [_properties objectsAtIndexes: indexes];
}
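
Client code can then ask KVC for a proxy and treat the properties as an array; mutating through the proxy additionally requires the mutation accessors (-insertObject:inPropertiesAtIndex: and friends) to be implemented:

NSMutableArray *proxy = [entityDescription mutableArrayValueForKey: @"properties"];
[proxy addObject: aPropertyDescription]; // forwarded to the KVC accessors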

Block Application (or Higher-Order Messaging)

You could decide not to give out access to the collection at all, but to allow clients to apply work to the objects in that collection.

- (void)enumeratePropertiesWithBlock: (void (^)(NSPropertyDescription *propertyDescription))workBlock
{
    NSUInteger count = [_properties count];
    // dispatch_apply submits the block once per index and waits for all of
    // them to finish; _myQueue should be a concurrent queue to get parallelism.
    dispatch_apply(count, _myQueue, ^(size_t index) {
        workBlock(_properties[index]);
    });
}

Higher-order messaging is basically the same thing but without the blocks. Clients could supply a selector or an invocation to apply to each object in the collection.
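
A sketch of the selector flavour, leaning on a Foundation method that already does the work:

- (void)makePropertiesPerformSelector: (SEL)aSelector
{
    // NSArray implements the simple higher-order-messaging case for us.
    [_properties makeObjectsPerformSelector: aSelector];
}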

Visitor

Like higher-order messaging, Visitor is a pattern that lets clients apply code to the objects in a collection regardless of the implementation or even logical structure of the collection. It’s particularly useful where client code is likely to need to maintain some sort of state across visits; a compiler front-end might expose a Visitor interface that clients use to construct indices or symbol tables.

// NSEntityDescriptionVisitor is a protocol we would define, declaring -visit:.
- (void)acceptPropertyVisitor: (id <NSEntityDescriptionVisitor>)aVisitor
{
    for (NSPropertyDescription *property in _properties)
    {
        [aVisitor visit: property];
    }
}

Benefits of Hiding the Collection

By hiding the implementation of a collection behind its interface, you’re free to change the implementation whenever you want. This is particularly true of the more abstract approaches like Enumerator, Fast enumeration, and Block Application, where clients only find out that there is a collection, and none of the details of whether it’s indexed or not, sparse or not, precomputed or lazy and so on. If you started with an array but realise an indexed set or even an unindexed set would be better, there’s no problem in making that change. Clients could iterate over objects in the collection before, they can now, nothing has changed—but you’re free to use whatever algorithms make most sense in the application. That’s a specific example of the Principle of Least Knowledge.

With the Key-Value Coding approach, you may be able to answer some questions, such as those posed by KVC’s collection operators, without actually doing all the work to instantiate the collected objects.

Additionally, there could be reasons to control use of the collected objects. For mutable collections it may be better to allow Block Application than to let clients take an array copy that will become stale. Giving out the mutable collection itself would lead to all sorts of trouble, but that can be avoided just with a copy.

Benefits of Providing the Collection

It’s easier. Foundation already provides classes that can do collections and all of the things you might want to do with them (including enumeration, KVC, and application), and you can just use those implementations without too much work: though notice that you can do that while still following the Fast enumeration pattern.

It’s also conventional. Plenty of examples exist of classes that give you a collection of things they’re responsible for, like subviews, child view controllers, table view columns and so on (though notice that the table view previously gave an Enumerator of selected columns, and lets you Apply Block to row views). Doing the same thing follows a Principle of Least Surprise.

When integrating your components into an app, there may be expectations that a given collection is used in a context where a Foundation collection class is expected. If your application uses an array controller, you need an array: you could expect the application to provide an adaptor, or you could use Key-Value Coding and supply the proxy array, but it might be easiest to just give the application access to the collection in the first place.

Finally, the Foundation collection classes are abstract, so you still get some flexibility to change the implementation by building custom subclasses.

Conclusion

There isn’t really a “best” way to do collections: there are benefits and limitations to any technique. By understanding the alternatives we can choose the best one for any situation (or be like NSEntityDescription and do a bit of each). In general, though, giving interfaces to get at or apply work to elements of the collection gives you more flexibility and control over how the collection is maintained, at the cost of being more work.

Coda

“More than one thing” isn’t necessarily a collection. It could be better modelled as a source, that generates objects over time. It could be a cursor that shows the current value but doesn’t remember earlier ones, a bit like an Enumerator. It could be something else. The above argument doesn’t consider those cases.

On rewriting your application

I’m really far behind on podcasts. I have a long commute, and listen to one audiobook every month, filling the slack time with a selection of podcasts. It happens that between two really long books (Cryptonomicon by Neal Stephenson and The Three Musketeers by Alexandre Dumas, both of which I’d recommend) and quite a few snow days I’ve managed to fall behind.

This is the context in which I listened to Episode 77 of iDeveloper Live, on whether to rewrite or not rewrite an application. I was meant to be on that program but had to pull out due to a combination of being ill and pressing on with writing what would become Discworld: the Ankh-Morpork Map. I imagine that one of these two factors was the cause of the other.

Anyway, listening to the podcast made me decide to put my case forward. It’s unfortunate that I wasn’t part of the live discussion but I hope that what I have to say still has value. I’d recommend listening to what Scotty, Uli, John and Pilky have to say on the original podcast, too. If I make references to the discussion in the podcast, I’ll apologise in advance for failing to remember which person said what.

The view from the outside

The point that got me thinking about this was the marketing for a version x.0 of an app. I don’t remember what the app was, I think Danny Greg probably posted the link to it. The release described how the app was “completely rewritten” from the previous version; an announcement that’s not uncommon in software release notes.

My position is this: A complete rewrite is neither a feature nor a benefit. It’s a warning. It says “warning: you might like the previous version, in fact that’s probably why you’re buying it, but what we’re now selling you has nothing in common with it”. It says “warning: the workflow you’re accustomed to in this app may no longer work, or may have different bugs than the ones you’ve discovered how to work around”. It says “warning: this product and the previous version are similar in name alone”. This is what I infer when I read that your product has been rewritten.

The view from the inside

As programmers, we’re saying “but I’ve got to rewrite this, it’s so old and crappy”. Why is it old and crappy? Is it because we weren’t trained to write readable code, and we weren’t trained to read code? Someone on the podcast referred to the “you should hate code you wrote six months ago or you’re not learning” trope. No. You should look at code you were writing six months ago and see how it could be improved. You should be able to decide whether to make those improvements, depending on whether they would benefit the product.

Many of the projects I’ve worked on have taken more than six months to complete. In each case, we could have either released it, or we could still be in a cycle of finding code that was modified more than six months ago, looking at it in disgust, throwing it away and writing it again—and waiting for the new regression bug reports to come in.

Bear in mind that source code is a liability, not an asset. When you tear something out to rewrite it from scratch, you’re using up time and money to create a thing that provides the same value as the thing you’re replacing. It’s at times like this that we enjoy waving our hands and talking about Technical Debt. Martin Fowler:

The tricky thing about technical debt, of course, is that unlike money it’s impossible to measure effectively. The interest payments hurt a team’s productivity, but since we CannotMeasureProductivity, we can’t really see the true effect of our technical debt.

We can’t really see the true effect. So how do you know that this rewrite is going to pay off? If you have some problem now it might be clear that this problem can be addressed more cheaply with a rewrite than by extending or modifying existing code. This is the situation Uli described in the podcast; they’d used some third-party library in version 1, which got them a quick release but had its problems. Having learned from those problems, they decided to replace that library in version 2.

Where you have problems, you can solve them either by modification or by rewriting. If you think that some problem might occur in the future, then leave it: you don’t make a profit by solving problems that no-one has.

A case study

While I had a few jobs in computing beforehand, all of which required that I write code, my first job where the title meant “someone who writes code” started about six years ago, at a company that makes anti-virus software. I was a developer (and would quickly become lead developer, despite not being ready) on the Mac version of the software.

This software had what could be described as an illustrious history: it was older than some of the readers of this blog (which is not uncommon: Cocoa is old enough to vote in the UK and UNIX is middle-aged). It started life as a Lightspeed/THINK C product, then became a PowerPlant product at around the time of the PowerPC transition. When I started working on it in 2007 the PowerPlant user interface still existed, but it did look out of place and dated. In addition, Apple were making noises about the library not being supportable on future versions of Mac OS X, so the first project for the new team was to build a new UI in Cocoa.

We immediately set out getting things wrong more quickly than any other team in the company. The lead developer when I joined had plenty of experience on the MS-DOS and Windows versions of the product, but had never worked on a Mac nor in Objective-C: then I became the lead developer, having worked in Objective-C but never in a team of more than one person. I won’t go into all of the details but the project ended up taking multiple times its estimated duration: not surprising, when the team had never worked together and none of the members had worked on this type of project so our estimates were really random numbers.

At the outset of this project, being untrained in the reading of other people’s code, I was dismayed by what I saw. I repeatedly asked for permission to rewrite other parts of the system that had copyright dates from when Kurt Cobain was still making TV appearances. I was repeatedly refused: the correct decision as while the old code had its bugs it was a lot more stable than what we were writing, and as it had already been written it cost a lot less to supply to the customer[*].

Toward the eventual end of the project, I asked my manager why we hadn’t given up on it after a year of getting things wrong, declared that a learning exercise and started over. Essentially, why couldn’t we take the “I hate the code I wrote last year, let’s start from scratch” approach. His answer was that at least one person would’ve got frustrated and quit after having their code thrown away; then we’d have no product and also no team so would not be in a better position.[*]

Eventually the product did get out of the door, and while I’m no longer involved with it I can tell that the version shipping today still has most of the moving parts that were developed during my time and before. Gradual improvement, responding to changes in what customers want and what suppliers provide, has stood that product in good stead for over two decades.

[*] It’s important to separate these two arguments from the Sunk Cost Fallacy. In neither case are we including the money spent on prior work. The first paragraph says “from today’s perspective, what we already have is free and what you haven’t written is not free, but they both do the same thing”. The second paragraph says “from today’s perspective, finishing what you’ve done costs a lot of money. Starting afresh costs a lot of money and introduces social disruption. But they both do the same thing.”

A two-dimensional dictionary

What?

A thing I made has just been open-sourced by my employers at Agant: the AGTTwoDimensionalDictionary works a bit like a normal dictionary, except that the keys are CGPoints, meaning we can find all the objects within a given rectangle.

Why?

A lot of time on developing Discworld: The Ankh-Morpork Map was spent on performance optimisation: there’s a lot of stuff to get moving around a whole city. As described by Dave Addey, the buildings on the map were traced and rendered into separate images so that we could let characters go behind them. This means that there are a few thousand of those little images, and whenever you’re panning the map around the app has to decide which images are visible, put them in the correct place (in three dimensions; remember people can be in front of or behind the buildings) and draw everything.

A first pass involved creating a set containing all of the objects, then looping over them to find out which were within the screen region. This was too slow. Implementing this 2-d index instead made the lookup take about 20% of the original time for only a few tens of kilobytes more memory, so that’s where we are now. It’s also why the data type doesn’t currently do any rebalancing of its tree: it had already become fast enough for the app it was built in. This is a key part of performance work: know which battles are worth fighting. About one month of full-time development went into optimising this app, and it would’ve been more if we hadn’t been measuring where the most benefit could be gained. By the time we started releasing betas, every code change was measured in Instruments before being accepted.

Anyway, we’ve open-sourced it so it can be fast enough for your app, too.

How?

There’s a data structure called the multidimensional binary tree or k-d tree, and this dictionary is backed by that data structure. I couldn’t find an implementation of that structure I could use in an iOS app, so cracked open the Objective-C++ and built this one.

Objective-C++? Yes. There are two reasons for using C++ in this context: one is that the structure actually does get accessed often enough in the Discworld app that dynamic dispatch all the way down adds a significant time penalty. The other is that the structure contains enough objects that having a spare isa pointer per node adds a significant memory penalty.

But then there’s also a good reason for using Objective-C: it’s an Objective-C app. My controller objects shouldn’t have to be written in a different language just to use some data structure. Therefore I reach for the only application of ObjC++ that should even be permitted to compile: an implementation in one language that exposes an interface in the other. Even the unit tests are written in straight Objective-C, because that’s how the class is supposed to be used.
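
To make the structure concrete, here is a minimal sketch in plain Objective-C of how a 2-d tree answers a rectangle query. This is not the AGTTwoDimensionalDictionary implementation (which, for the reasons above, uses C++ for its nodes); the names are invented for illustration.

#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

@interface TwoDTreeNode : NSObject

@property (nonatomic, assign) CGPoint key;
@property (nonatomic, strong) id value;
@property (nonatomic, strong) TwoDTreeNode *left;
@property (nonatomic, strong) TwoDTreeNode *right;
@property (nonatomic, assign) NSUInteger depth; // even depths split on x, odd on y

- (void)addObjectsInRect: (CGRect)rect toArray: (NSMutableArray *)results;

@end

@implementation TwoDTreeNode

- (void)addObjectsInRect: (CGRect)rect toArray: (NSMutableArray *)results
{
    if (CGRectContainsPoint(rect, self.key)) {
        [results addObject: self.value];
    }
    BOOL splitOnX = (self.depth % 2 == 0);
    CGFloat splitCoordinate = splitOnX ? self.key.x : self.key.y;
    CGFloat queryMin = splitOnX ? CGRectGetMinX(rect) : CGRectGetMinY(rect);
    CGFloat queryMax = splitOnX ? CGRectGetMaxX(rect) : CGRectGetMaxY(rect);
    // Only descend into a subtree if the query rectangle overlaps its side of
    // the splitting line; this pruning is what makes the lookup fast.
    // (Messaging a nil child is a harmless no-op in Objective-C.)
    if (queryMin <= splitCoordinate) {
        [self.left addObjectsInRect: rect toArray: results];
    }
    if (queryMax >= splitCoordinate) {
        [self.right addObjectsInRect: rect toArray: results];
    }
}

@end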