More security processes go wrong

I just signed a piece of card so that I could take a picture of it, clean it up and attach it to a document, pretending that I’d printed the document out, signed it, and scanned it back in. I do that about once a year (it was more frequent when I ran my own business, but then I only signed the piece of card once).

Just a little reminder: it’s not having my signature that should be valued, it’s having seen me perform the act of signing. Signatures can easily be duplicated. If you’ve decided that I’m me, and you’ve seen me put my signature to a document, from that moment on you can know that I signed that document. If you didn’t see it, but got a validated statement from a known notary that they saw it, then fair enough. If you didn’t see it, and a notary didn’t see it, then all you know is that you have a sheet of paper containing some words and my signature. This should tell you nothing about how the two came into proximity.

Posted in Authentication, Vulnerability | Comments Off on More security processes go wrong

Could effortless lecturers make everything seem too easy?

From the British Psychological Society blog: Engaging lecturers can breed overconfidence.

The students who’d seen the smooth lecturer thought they would do much better than did the students who saw the awkward lecturer, consistent with the idea that a fluent speaker breeds confidence. In fact, both groups of students fared equally well in the test. In the case of the students in the fluent lecturer condition, this wasn’t as good as they’d predicted. Their greater confidence was misplaced.

I’m speaking at a couple of conferences later this year (iOSDev UK in Aberystwyth in September and NSScotland in Edinburgh in October), and will endeavour to be exactly as exciting as the material deserves: a capability to which my track record can attest.

Posted in psychology, Talk | Comments Off on Could effortless lecturers make everything seem too easy?

Objective-C, dependencies, linking

In the most recent episode of Edge Cases, Wolf and Andrew discuss dependency management, specifically as it pertains to Objective-C applications that import libraries using the Cocoapods tool.

In one app I worked on a few years ago, two different libraries each tried to include (as part of the libraries themselves, not as dependencies) the Reachability classes from Apple’s sample code. The result was duplicate symbol definitions, because my executable was trying to link both (identical) definitions of the classes. Removing one of the source files from the build fixed it, but how could we avoid getting into that situation in the first place?

One way explored in the podcast is to namespace the classes. So Mister Framework could rename their Reachability to MRFReachability, and Framework-O-Tron could rename theirs to FOTReachability. Now we have exactly the same code included twice, under two different names. They don’t conflict, but they are identical, so our binary is bigger than it needs to be.

It’d be great if they both encoded their dependency on a common class but didn’t try to include it themselves, so we could just fetch that class and use it in both places. Cocoapods’s dependency resolution allows for that, and will work well when both frameworks want exactly the same Reachability class. However, we hit a problem again when they want different versions of a library that share the same symbol names.

Imagine that the two frameworks were written using different versions of libosethegame. The developers changed the interface when they went from v1.0 to v1.1, and Framework-O-Tron is still based on the older version of the interface. So just selecting the newer version won’t work. Of course, neither does just selecting the older version. Can we have both versions of libosethegame, used by the two different frameworks, without ending up back with the symbol collision error?

At least in principle, yes we can. The dynamic loader, dyld (also described in the podcast), supports a two-level namespace for dynamic libraries. Rather than linking against the osethegame library with -losethegame, you could deploy both libosethegame.1.0.0.dylib and libosethegame.1.1.0.dylib. One framework links with -losethegame.1.0, the other with -losethegame.1.1. Both are deployed, and because they were linked with different names the two-level namespace resolves each symbol from the correct version of the library, and all is well.

Of course, if you’ve got dynamic libraries and the library vendor is willing to do a little work, they can just ship one version that supports all previous behaviour, looking at which version of the library the client was linked against to decide what behaviour to provide. Despite Mac OS X providing a versioned framework bundle layout, Apple has never (to my knowledge) shipped different versions of the system frameworks. Instead, the frameworks’ developers use the Mach-O load commands of an executable to find which version of their library it was linked against, and supply behaviour equivalent to that version.
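
As a sketch of how such a check might look (the play functions and the 1.1 cut-off are invented for illustration; this isn’t necessarily how Apple implements it), dyld will report the version of a library that the running executable was linked against:

#import <mach-o/dyld.h>

// Hypothetical declarations standing in for the two behaviours.
extern void osethegame_play_the_1_0_way(void);
extern void osethegame_play_the_1_1_way(void);

void osethegame_play(void)
{
    // dyld packs the linked-against version as (major << 16) | (minor << 8) | patch,
    // and returns -1 if the executable wasn't linked against the library at all.
    int32_t linkedVersion = NSVersionOfLinkTimeLibrary("osethegame");
    if (linkedVersion != -1 && linkedVersion < ((1 << 16) | (1 << 8)))
    {
        osethegame_play_the_1_0_way();   // clients built against 1.0 get the old behaviour
    }
    else
    {
        osethegame_play_the_1_1_way();   // everyone else gets the current behaviour
    }
}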

These approaches do rather depend on being able to use the dynamic linker. We can’t, on iOS, at the moment.

Posted in code-level, tool-support | Leave a comment

When security procedures go bad

My password with my bank may as well be “I can’t remember, can we go through the security questions please?” That’s my answer so many times when they ask, and every time it gets me in via a slightly tedious additional verification step. Losing customers probably represents a greater financial risk to them than fraud on any individual account, so they don’t seem to take the password thing too seriously.

Posted in Uncategorized | Comments Off on When security procedures go bad

Specifications for interchanging objects

One of the interesting aspects of Smalltalk and similar languages, including Objective-C and Ruby, is that while the object model exposes a hierarchy of classes, consumers of objects in these environments are free to ignore the position of an object in that hierarchy. The hierarchy can be thought of as a convenience in two directions: for people building objects (“this object does all the same stuff as instances of its parent class, and then some”) and for people consuming objects (“you can treat this object like it’s one of these types further up the hierarchy”).

So you might think that -isKindOfClass: represents a test for “I can use this object like I would use one of these objects”. There are two problems with this test. As with any boolean test, they are false positives and false negatives.

A false positive is when an object passes the test, but actually can’t be treated as an instance of the parent type. In a lot of recent object-oriented code this is a rare problem. The idea of the Liskov Substitution Principle, if not its precise intent as originally stated, has become entrenched in the Object-Oriented groupthink.

I’ve worked with code from the 1980s, though, where these false positives exist: an obvious example is “closing off” particular selectors. A parent class defines some interface, then subclasses inherit from that class, overriding selectors to call [self doesNotRecognize:] for features of the parent that aren’t relevant in the subclass. This is still possible today, though done infrequently.
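
A contrived sketch of the pattern (the classes here are invented):

#import <Foundation/Foundation.h>

// The parent declares some interface…
@interface Shape : NSObject
- (void)setCornerCount: (NSUInteger)cornerCount;
@end

@implementation Shape
- (void)setCornerCount: (NSUInteger)cornerCount { /* remember the corner count */ }
@end

// …and the subclass "closes off" a selector that makes no sense for it.
@interface Circle : Shape
@end

@implementation Circle
- (void)setCornerCount: (NSUInteger)cornerCount {
  [self doesNotRecognize: _cmd];
}
@end

An instance of Circle passes the -isKindOfClass: test for Shape, but can’t be used everywhere a Shape can: a false positive.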

False negatives occur when an object fails the -isKindOfClass: test but actually could be used in the way your software intends. In Objective-C (though neither in Smalltalk[*] nor Ruby), nil _does_ satisfy client code’s needs in a lot of cases but never passes the hierarchy test. Similarly, you could easily arrange for an object to respond to all the same selectors as another object, and to have the same dynamic behaviour, but to be in an unrelated position in the hierarchy. You _can_ use an OFArray like you can use an NSArray, but it isn’t a kind of NSArray.

[*] There is an implementation of an Objective-C style Null object for Squeak.

Obviously if the test is broken, we should change the test. False negatives can be addressed by testing for protocols (again, in the languages I’ve listed, this only applies to Objective-C and MacRuby). Protocols are unfortunately named in this instance: they basically say “this object responds to any selector in this list”. We could then say that rather than testing for an object being a kind of UIView, we need an object that conforms to the UIDrawing protocol. This protocol doesn’t exist, but we could say that.
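
Concretely, with UIDrawing standing in for that hypothetical protocol, the test would look something like this:

#import <UIKit/UIKit.h>

// UIDrawing is hypothetical: a declared list of the drawing-related selectors
// this client actually relies on.
@protocol UIDrawing <NSObject>
- (void)drawRect: (CGRect)rect;
- (void)setNeedsDisplay;
@end

// Test for the behaviour we need, rather than for a position in the hierarchy.
BOOL canDrawForUs(id candidate) {
  return [candidate conformsToProtocol: @protocol(UIDrawing)];
}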

Problems exist here. An object that responds to all of the selectors doesn’t necessarily conform to the protocol, so we still have false negatives. The developer of the class might have forgotten to declare the protocol (though not in MacRuby, where protocol tests are evaluated dynamically), or the object could forward unknown selectors to another object which does conform to the protocol.

There’s still a false positive issue too: ironically protocol conformance only tells us what selectors exist, not the protocol in which they should be used. Learning an interface from a protocol is like learning a language from a dictionary, in that you’ve been told what words exist but not what order they should be used in or which ones it’s polite to use in what circumstances.

Consider the table view data source. Its job is to tell the table view how many sections there are, how many rows there are in each section, and what cell to display for each row. An object that conforms to the data source protocol does not necessarily do that. An object that tells the table there are three sections but crashes if you ask how many rows are in any section beyond the first conforms to the protocol, but doesn’t have the correct dynamic behaviour.

We have tools for verifying the dynamic behaviour of objects. In his 1996 book Superdistribution: Objects as Property on the Electronic Frontier, Brad Cox describes a black box test of an object’s dynamic behaviour, in which test code messages the object then asserts that the object responds in expected ways. This form of test was first implemented in a standard fashion, to my knowledge, in 1998 by Kent Beck as a unit test.

Unit tests are now also a standard part of the developer groupthink, including tests as specification under the name Test-Driven Development. But we still use them in a craft way, as bespoke specifications for our one-of-a-kind classes. What we should really do is make more use of these tests: replacing our static, error-prone type tests with dynamic specification tests.

A table view does not need something that responds to the data source selectors, it needs something that behaves like a data source. So let’s create some tests that any data source should satisfy, and bundle them up as a specification that can be tested at runtime. Notice that these aren’t quite unit tests in that we’re not testing our data source, we’re testing any data source. We could define some new API to test for satisfactory behaviour:

- (void)setDataSource: (id <UITableViewDataSource>)dataSource {
  NSAssert([Demonstrate that: dataSource satisfies: [Specification for: @protocol(UITableViewDataSource)]], @"Data source must satisfy the UITableViewDataSource specification");
  _dataSource = dataSource;
  [self reloadData];
}

But perhaps with new language and framework support, it could look like this:

- (void)setDataSource: (id @<UITableViewDataSource>)dataSource {
  NSAssert([dataSource satisfiesSpecification: @specification(UITableViewDataSource)], @"Data source must satisfy the UITableViewDataSource specification");
  _dataSource = dataSource;
  [self reloadData];
}

You could imagine that in languages that support design-by-contract, such as Eiffel, the specification of a collaborator could be part of the contract of a class.

In each case, the expression inside the assertion handler would find and run the test specification appropriate for the collaborating object. Yes this is slower than doing the error-prone type hierarchy or conformance tests. No, that’s not a problem: we want to make it right before making it fast.
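
As a very rough sketch of what such a specification could check at runtime (the class and its API are invented, and a real specification would cover much more, including cell provision):

#import <UIKit/UIKit.h>

// Invented API: a reusable specification that any table view data source should
// satisfy. It exercises the dynamic behaviour described above: every section the
// candidate claims to have must report a row count without throwing.
@interface TableViewDataSourceSpecification : NSObject
+ (BOOL)isSatisfiedBy: (id <UITableViewDataSource>)dataSource forTableView: (UITableView *)tableView;
@end

@implementation TableViewDataSourceSpecification

+ (BOOL)isSatisfiedBy: (id <UITableViewDataSource>)dataSource forTableView: (UITableView *)tableView {
  NSInteger sections = 1;
  if ([dataSource respondsToSelector: @selector(numberOfSectionsInTableView:)]) {
    sections = [dataSource numberOfSectionsInTableView: tableView];
  }
  if (sections < 0) {
    return NO;
  }
  for (NSInteger section = 0; section < sections; section++) {
    @try {
      if ([dataSource tableView: tableView numberOfRowsInSection: section] < 0) {
        return NO;
      }
    }
    @catch (NSException *exception) {
      return NO;   // the "three sections but crashes on the second" object fails here
    }
  }
  return YES;
}

@end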

Treating test fixtures as specifications for collaboration between objects, rather than (or in addition to) one-off tests for one-off classes, opens up new routes for collaboration between the developers of the objects. Framework vendors can supply specifications as enhanced documentation. Framework consumers can supply specifications of how they’re using the frameworks as bug reports or support questions: vendors can add those specifications to a regression testing arsenal. Application authors can create specifications to send to contractors or vendors as acceptance tests. Vendors could demonstrate that their code is “a drop-in replacement” for some other code by demonstrating that both pass the same specification.

But finally it frees object-oriented software from the tyranny of the hierarchy. The promise of duck typing has always been tempered by the dangers, because we haven’t been able to show that our duck typed objects actually can quack like ducks until it’s too late.

Posted in documentation, OOP, software-engineering, TDD, tool-support | 1 Comment

Can Objective-C be given safe categories?

That was the subject of this lunchtime’s vague thinking out loud. The problems with categories are well-known: you can override the methods already declared on a class, or the methods provided in another category (and therefore another category can replace your implementations too). Your best protection is to use ugly wartifying prefixes in the hope that your bewarted method names don’t collide with everybody else’s bewarted method names.

A particular problem with categories, and one that’s been observed in the wild, is when you add a method in a category that is, at some later time, added to the original implementation of the class itself. Other consumers of the class (including the framework it’s part of) may be expecting to work with the first-party implementation, not your substitution. If the first-party method has a different binary interface to yours (e.g. one of you returns a primitive value and the other a struct), as happened to a lot of people with NSArray around the end of the 1990s, prepare to start crashinating.

Later implementations of similar features in other languages have avoided this problem by refusing to add methods that already exist, and by ensuring that even if multiple extensions define the same method they can all coexist and the client code expresses exactly which one it’s referring to. Can we add any of this safety to Objective-C?

Partially. We could design a function for adding a collection of methods from a “category” to a class at runtime, that only adds them if the class doesn’t already implement them. class_addCategory() shows what this might look like, but it only supports non-struct-returning instance methods.
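
A simplified sketch of that approach (renamed here so it isn’t mistaken for the linked implementation, and ignoring class methods, struct returns and thread safety):

#import <objc/runtime.h>
#include <stdlib.h>

// Copy the instance methods defined by a "category" class onto a target class,
// but only where the target doesn't already define them itself.
BOOL class_addCategorySketch(Class target, Class category, BOOL replaceExisting)
{
    unsigned int methodCount = 0;
    Method *methods = class_copyMethodList(category, &methodCount);
    BOOL addedAll = YES;
    for (unsigned int i = 0; i < methodCount; i++)
    {
        SEL name = method_getName(methods[i]);
        IMP implementation = method_getImplementation(methods[i]);
        const char *types = method_getTypeEncoding(methods[i]);
        if (replaceExisting)
        {
            class_replaceMethod(target, name, implementation, types);
        }
        else if (!class_addMethod(target, name, implementation, types))
        {
            // class_addMethod() refuses to replace a method the target class itself
            // already defines, which is exactly the safety we're after.
            addedAll = NO;
        }
    }
    free(methods);
    return addedAll;
}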

If class_addCategory(target, source, NO) succeeds, then the methods you were trying to add did not exist on the target class before you called the function. However, you cannot be sure that they weren’t being added while your call was in progress, and you can’t know later that they weren’t clobbered by someone else at some point between successfully adding the methods and using them. Also, if class_addCategory() fails, you may find the only reasonable course of action is to not use the methods you were trying to add: the only thing you know about their implementation is that it either doesn’t exist or isn’t the one you were expecting. This is at odds with a hypothetical purist notion of Object-Oriented Programming where you send messages to objects and don’t care what happens as a result.

There are plenty of ways to work around the limitations of categories: composition is the most likely to succeed (more likely than subclassing, which suffers the same collision problem: a later version of the superclass might define a method with the same name as one you’ve chosen, which you’re now clobbering). It doesn’t let you replace methods on a class—a tool that, like most in the programmer’s utility belt, is both occasionally useful and occasionally abuseful.

Coda

I should point out that I’m not a fan of taking away the potentially dangerous tools. Many people who see the possibility for a language feature to be abused argue that it should never be used or that languages that don’t offer it should be preferred. This is continuum-fallacy nonsense, to which I do not subscribe. Use whatever language features help you to produce a working, comprehensible, valuable software system: put in whatever protections you want to guard against existing or likely problems.

Posted in code-level, OOP | Comments Off on Can Objective-C be given safe categories?

APPropriate Behaviour is almost done

I just pushed another update to APPropriate Behaviour, my work on the things programmers do that aren’t programming. There’s some refinement to the existing material to be done, and a couple of short extra chapters to finish and add. But then it will be complete!

The recommended price of APPropriate Behaviour is $20. While it’s been under development, I’ve allowed readers interested in a sneak peek to buy APPropriate Behaviour at any price above $5. Once the final chapters are in place, the recommended price will remain $20 but the minimum price will be increasing. If you’ve been pondering buying it but haven’t yet, I recommend you do so now to get a bargain. Even if you buy it while I’m still working on it, you’ll get free updates for life as I add new material and make corrections.

As a little taster of things to come, the two remaining chapters are:

  • The ethics of making software
  • The philosophy of making software

Can’t wait to see what that means? Neither can I!

Posted in advancement of the self, books | Leave a comment

As the Kaiser Chiefs might say: Ruby ruby ruby n00bie

Imagine someone took the training wheels off of Objective-C. That’s how I currently feel.

Bike with Training Wheels: image credit Break

I’ve actually had a long—erm, not quite “love-hate”, more “‘sup?-meh”—relationship with Ruby. I’ve long wanted to tinker but never really had a project where I could make it fit; I did learn a little about Rails a couple of years back but didn’t then get to put it into practice. Recently I’ve been able to do some Real Work™ with Ruby, and wanted to share the experience.

Bear in mind that when I say I’ve been working with Ruby, I mean that I’ve been writing Objective-C in Ruby. This becomes clear when we see one of the problems I’ve been facing: I couldn’t work out how to indicate that a variable exposes some interface, until I realised I didn’t need to. Ruby takes the idea of duck typing much further than Objective-C does: using Ruby is much more like Smalltalk in that you don’t care what an object is, you care what it does. Currently no tools really support that way of working (and so Stockholm Syndrome-wielding developers will tell you that you don’t need such tools; just vi and a set of tests); the first warning I get when I’ve made a mistake is usually an exception backtrace. Something I had to learn quite quickly is that Ruby and Objective-C have different ideas of nil: Ruby behaves as the gods intend and lets you put nil into collections; but Objective-C behaves as the gods intend and lets you treat nil as a null object.

The problems I’ve been facing have largely involved learning how things are conventionally done. One example is that a library I was using took a particular parameter and treated it as a constant. Apparently Matz is a big fan of Fortran, but only early hipster Fortran before they sold out and added implicit none (around the time they fired their bass player and started playing the bigger venues). So Ruby provides its own implicit convention: constants have to be named starting with an uppercase letter. Otherwise you get told this:

wrong constant name parameter-value

Erm, that’s it. Not “you should try calling it ParameterValue”, or “constants must start with a capital letter”. Not even “this is not a good name for a constant”; who else interpreted that as “you gave the name of the wrong constant”? I think I’ve been spoiled by the improvements to the clang diagnostics over the last couple of years, but I found some of Ruby’s messages confusing and unhelpful. This is often the case with software that relies on convention: once you know the conventions you can go really fast, but when you don’t know them you feel like you’re being ignored or that it’s being obtuse.[*]

[*] When I asked for help on this issue I was told I suggest you pick of[sic] a good Ruby book or watch some Ruby tutorials on YouTube; you’ll be pleased to know that the interpreter wasn’t the only ignorant or obtuse tool I had to deal with.

These are very neophyte problems though, and once I got past them I found that I was able to make good progress with the language. I was using LightTable and RubyMine for editing, and found that I could work really quickly with a combination of those editors and irb. Having an interactive environment or a REPL is amazing for trying out little things, particularly when you’re new at a language and don’t know what’s going to work. It’s a bit cumbersome for more involved tests, but the general execute-test cycle is much faster than with Objective-C.

Speaking of tests, I know that if you ask four Ruby developers how to write unit tests you’ll get six different answers and at least eighteen of them will have moved on to Node.JS. I’ve been using MiniTest, as it’s part of the standard library and so involved the least configuration to get going.

I also took the opportunity to install MacRuby and have a go at building a Mac app, using Cocoa Bindings on the UI side to work with controllers and models that I’d written in Ruby. This isn’t the first exposure I’ve had to a bridged environment: I’ve done a lot of Perl-Cocoa with CamelBones, the PerlObjCBridge and ObjectiveFramework. MacRuby isn’t like those bridges though, in that (as I understand it) MacRuby builds Ruby’s object model on top of NSObject and the Objective-C runtime so Ruby objects actually are ObjC objects. It means there’s less manual gluing: e.g. in Perl you might do:

my $string = NSString->alloc->initWithCString_encoding_("Hello", NSUTF8StringEncoding);

In MacRuby that becomes:

string = "Hello"

That’s not to say there’s no boilerplate. I found that by-reference return values need the creation of a Pointer object on the Ruby side to house the returned object reference, which looks like this:

error = Pointer.new(:object)
saveResult = string.writeToFile path, atomically: false, encoding: NSUTF8StringEncoding, error: error

For a long time, I’ve thought that there would be mileage in suggesting programmers use a different language than Objective-C for building applications in Cocoa, relying on ObjC as the systems language. Ruby could be that thing. The object models are very similar, so there isn’t a great deal of mind-twisting going on in exposing Objective-C classes and objects in Ruby. There’s a lot less “stuff you do that shuts the compiler up”; though ObjC has seen a reduction in that itself of late, it still relies on C and all of its idiosyncrasies. Whether it’s actually better for some developers, and if so for whom, would need study.

Summarising, Ruby feels a lot like Objective-C without the stabilisers. You can work with objects and methods in a very similar way. The fast turnaround afforded by having an interactive shell and no compile-link waiting means you can go very quickly. The fact that you don’t get the same up-front analysis and reporting of problems means you can easily drive into a wall at full tilt. But at least you did so while you were having fun.

Posted in OOP, ruby | Comments Off on As the Kaiser Chiefs might say: Ruby ruby ruby n00bie

On designing collections

Introduction

This post explores the pros and the cons of following the design rule “Objects responsible for collections of other objects should expose an interface to the collection, not the collection itself”. Examples and other technical discussion are in Objective-C, assuming Foundation classes and idioms.

Example Problem

Imagine you were building Core Data today. You get to NSEntityDescription, which encapsulates the information about an entity in the Managed Object Model, including its attributes and relationships, collectively “properties”. You have two options:

  1. Let client code have access to the actual collection of properties.
  2. Make client code work with the properties via API on NSEntityDescription or abstract collection interfaces.

In reality, NSEntityDescription does both, but not with the same level of control. As external clients of the class we can’t see whether the collection objects are internal state of the entity description or are themselves derived from its real state, although this LGPL implementation from GSCoreData does return its real instance variable in one case. However, this implementation is old enough not to show another aspect of Apple’s class: their entity description class itself conforms to NSFastEnumeration.

How to give access to the actual collection

This is the easiest case.

@implementation NSEntityDescription
{
    NSArray *_properties;
}

- (NSArray *)properties
{
    return _properties;
}

@end

A common (but trivial) extension to this is to return an autoreleased copy of the instance variable, particularly in the case where the instance variable itself is mutable but clients should be given access to a read-only snapshot of the state. A less-common technique is to build a custom subclass of NSArray using an algorithm specific to this application.
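
As a sketch of that less-common technique (the class is invented): NSArray’s primitive methods are -count and -objectAtIndex:, so a subclass that implements those two can compute its contents with whatever algorithm suits the application.

// An invented example: present a range of integers without ever storing them.
@interface RangeArray : NSArray

- (instancetype)initWithLength: (NSUInteger)length;

@end

@implementation RangeArray
{
    NSUInteger _length;
}

- (instancetype)initWithLength: (NSUInteger)length
{
    if ((self = [super init]))
    {
        _length = length;
    }
    return self;
}

- (NSUInteger)count
{
    return _length;
}

- (id)objectAtIndex: (NSUInteger)index
{
    if (index >= _length)
    {
        [NSException raise: NSRangeException format: @"index %lu beyond length %lu",
            (unsigned long)index, (unsigned long)_length];
    }
    return @(index);   // computed on demand, not stored
}

@end

Everything else NSArray offers (enumeration, fast enumeration, -objectsAtIndexes: and so on) is built on those two primitives, so the subclass gets it for free.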

How to provide abstract collection access

There are numerous ways to do this, so let’s take a look at a few of them.

Enumerator

It doesn’t matter how a collection of things is implemented: if you can supply each element in turn when asked, you can give out an enumerator. This is how NSFileManager lets you walk through the filesystem. Until a few years ago, it was how NSTableView let you see each of the selected columns. It’s how PSFeed works, too.

The good news is, this is usually really easy. If you’re already using a Foundation collection class, it can already give you an appropriate NSEnumerator.

- (NSEnumerator *)propertyEnumerator
{
    return [_properties objectEnumerator];
}

You could also provide your own NSEnumerator subclass to work through the collection in a different way (for example if the collection doesn’t actually exist but can be derived lazily).
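
A sketch of such a subclass (invented for illustration): NSEnumerator’s only primitive method is -nextObject, so the enumerated “collection” never needs to exist at all.

// Enumerate the first N square numbers without any backing collection.
@interface SquareNumberEnumerator : NSEnumerator

- (instancetype)initWithCount: (NSUInteger)count;

@end

@implementation SquareNumberEnumerator
{
    NSUInteger _count;
    NSUInteger _cursor;
}

- (instancetype)initWithCount: (NSUInteger)count
{
    if ((self = [super init]))
    {
        _count = count;
    }
    return self;
}

- (id)nextObject
{
    if (_cursor >= _count)
    {
        return nil;   // nil signals the end of the enumeration
    }
    NSNumber *square = @(_cursor * _cursor);
    _cursor++;
    return square;
}

@end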

Fast enumeration

This is basically the same thing as “Enumerator”, but has different implementation details. In addition, the overloaded for keyword in Objective-C provides a shorthand syntax for looping over collections that conform to the NSFastEnumeration protocol.

Conveniently, NSEnumerator conforms to the protocol so it’s possible to go from “Enumerator” to “Fast enumeration” very easily. All of the Foundation collection classes also implement the protocol, so you could do this:

- (id <NSFastEnumeration>)properties
{
    return _properties;
}

Another option—the one that NSEntityDescription actually uses—is to implement the -countByEnumeratingWithState:objects:count: method yourself. A simple implementation passes through to a Foundation collection class that already gets the details right. There are a lot of details, but a custom implementation could do the things a custom NSEnumerator subclass could.
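
The pass-through implementation is a one-liner:

- (NSUInteger)countByEnumeratingWithState: (NSFastEnumerationState *)state
                                  objects: (id __unsafe_unretained [])buffer
                                    count: (NSUInteger)len
{
    // Let the backing array do all of the protocol's bookkeeping.
    return [_properties countByEnumeratingWithState: state objects: buffer count: len];
}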

Object subscripting

If you’ve got a collection that’s naturally indexed, or naturally keyed, you can let people access that collection without giving them the specific implementation that holds the collected objects. The subscripting methods let you answer questions like “what is the object at index 3?” or “which object is named ‘Bob’?”. As with “Fast enumeration” there is syntax support in the language for conveniently using these features in client code.

- (id)objectAtIndexedSubscript: (NSUInteger)idx
{
    return _properties[idx];
}
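
The keyed flavour (“which object is named ‘Bob’?”) follows the same pattern; here the lookup strategy stays hidden from the client:

- (id)objectForKeyedSubscript: (NSString *)propertyName
{
    // Supports entityDescription[@"name"] without exposing the backing collection.
    for (NSPropertyDescription *property in _properties)
    {
        if ([[property name] isEqualToString: propertyName])
        {
            return property;
        }
    }
    return nil;
}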

Key-Value Coding

Both ordered and unordered collections can be hidden behind the to-many Key-Value Coding accessors. These methods also give client code the opportunity to use KVC’s mutable proxy collections, treating the real implementation (whatever it is) as if it were an NSMutableArray or NSMutableSet.

- (NSUInteger)countOfProperties
{
    return [_properties count];
}

- (NSPropertyDescription *)objectInPropertiesAtIndex: (NSUInteger)index
{
    return _properties[index];
}

- (NSArray *)propertiesAtIndexes: (NSIndexSet *)indexes
{
    return [_properties objectsAtIndexes: indexes];
}
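
With those accessors in place, client code gets an ordered, array-like view of the properties without ever seeing the real collection (mutating through the proxy would additionally need the insert/remove accessor pair, which isn’t shown here). Assuming entityDescription is an instance of the class above:

// Reading through KVC: returns a proxy backed by the accessors above.
NSArray *properties = [entityDescription valueForKey: @"properties"];

// An NSMutableArray-style proxy over the same accessors.
NSMutableArray *orderedProperties = [entityDescription mutableArrayValueForKey: @"properties"];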

Block Application (or Higher-Order Messaging)

You could decide not to give out access to the collection at all, but to allow clients to apply work to the objects in that collection.

- (void)enumeratePropertiesWithBlock: (void (^)(NSPropertyDescription *propertyDescription))workBlock
{
    NSUInteger count = [_properties count];
    dispatch_apply(count, _myQueue, ^(size_t index) {
        workBlock(_properties[index]);
    });
}

Higher-order messaging is basically the same thing but without the blocks. Clients could supply a selector or an invocation to apply to each object in the collection.
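
A sketch of the selector-based flavour (the method name is invented; Foundation’s own -makeObjectsPerformSelector: is the same shape of API, exposed on the collection itself):

- (void)makePropertiesPerformSelector: (SEL)aSelector
{
    // Under ARC the compiler warns about -performSelector: with a dynamic
    // selector; this sketch also ignores return values and arguments entirely.
    for (NSPropertyDescription *property in _properties)
    {
        [property performSelector: aSelector];
    }
}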

Visitor

Like higher-order messaging, Visitor is a pattern that lets clients apply code to the objects in a collection regardless of the implementation or even logical structure of the collection. It’s particularly useful where client code is likely to need to maintain some sort of state across visits; a compiler front-end might expose a Visitor interface that clients use to construct indices or symbol tables.

- (void)acceptPropertyVisitor: (id <NSEntityDescriptionVisitor>)aVisitor
{
    for (NSPropertyDescription *property in _properties)
    {
        [aVisitor visit: property];
    }
}

Benefits of Hiding the Collection

By hiding the implementation of a collection behind its interface, you’re free to change the implementation whenever you want. This is particularly true of the more abstract approaches like Enumerator, Fast enumeration, and Block Application, where clients only find out that there is a collection, and none of the details of whether it’s indexed or not, sparse or not, precomputed or lazy and so on. If you started with an array but realise an indexed set or even an unindexed set would be better, there’s no problem in making that change. Clients could iterate over objects in the collection before, they can now, nothing has changed—but you’re free to use whatever algorithms make most sense in the application. That’s a specific example of the Principle of Least Knowledge.

With the Key-Value Coding approach, you may be able to answer some questions in your class, such as those posed by the Key-Value Coding collection operators, without actually doing all the work to instantiate the collected objects.
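
For example, with -countOfProperties in place, a client can ask for the count through a collection operator and (assuming the usual lazy KVC proxy behaviour) the entity never has to materialise an array of property objects:

// The @count operator ends up calling -countOfProperties via the to-many proxy.
NSNumber *numberOfProperties = [entityDescription valueForKeyPath: @"properties.@count"];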

Additionally, there could be reasons to control use of the collected objects. For mutable collections it may be better to allow Block Application than to let clients take an array copy, which will become stale. Giving out the mutable collection itself would lead to all sorts of trouble, but that can be avoided just with a copy.

Benefits of Providing the Collection

It’s easier. Foundation already provides classes that can do collections and all of the things you might want to do with them (including enumeration, KVC, and application) and you can just use those implementations without too much work: though notice that you can do that while still following the Fast Enumeration pattern.

It’s also conventional. Plenty of examples exist of classes that give you a collection of things they’re responsible for, like subviews, child view controllers, table view columns and so on (though notice that the table view previously gave an Enumerator of selected columns, and lets you Apply Block to row views). Doing the same thing follows a Principle of Least Surprise.

When integrating your components into an app, there may be expectations that a given collection is used in a context where a Foundation collection class is expected. If your application uses an array controller, you need an array: you could expect the application to provide an adaptor, or you could use Key-Value Coding and supply the proxy array, but it might be easiest to just give the application access to the collection in the first place.

Finally, the Foundation collection classes are abstract, so you still get some flexibility to change the implementation by building custom subclasses.

Conclusion

There isn’t really a “best” way to do collections: each technique has its benefits and limitations. By understanding the alternatives we can choose the best one for any situation (or be like NSEntityDescription and do a bit of each). In general, though, providing interfaces that get at, or apply work to, elements of the collection gives you more flexibility and control over how the collection is maintained, at the cost of being more work.

Coda

“More than one thing” isn’t necessarily a collection. It could be better modelled as a source, that generates objects over time. It could be a cursor that shows the current value but doesn’t remember earlier ones, a bit like an Enumerator. It could be something else. The above argument doesn’t consider those cases.

Posted in Foundation, OOP, software-engineering | Comments Off on On designing collections

On rewriting your application

I’m really far behind on podcasts. I have a long commute, and listen to one audiobook every month, filling the slack time with a selection of podcasts. It happens that between two really long books (Cryptonomicon by Neal Stephenson and The Three Musketeers by Alexandre Dumas, both of which I’d recommend) and quite a few snow days I’ve managed to fall behind.

This is the context in which I listened to Episode 77 of iDeveloper Live, on whether to rewrite or not rewrite an application. I was meant to be on that program but had to pull out due to a combination of being ill and pressing on with writing what would become Discworld: the Ankh-Morpork Map. I imagine that one of these two factors was the cause of the other.

Anyway, listening to the podcast made me decide to put my case forward. It’s unfortunate that I wasn’t part of the live discussion but I hope that what I have to say still has value. I’d recommend listening to what Scotty, Uli, John and Pilky have to say on the original podcast, too. If I make references to the discussion in the podcast, I’ll apologise in advance for failing to remember which person said what.

The view from the outside

The point that got me thinking about this was the marketing for a version x.0 of an app. I don’t remember what the app was, I think Danny Greg probably posted the link to it. The release described how the app was “completely rewritten” from the previous version; an announcement that’s not uncommon in software release notes.

My position is this: A complete rewrite is neither a feature nor a benefit. It’s a warning. It says “warning: you might like the previous version, in fact that’s probably why you’re buying it, but what we’re now selling you has nothing in common with it”. It says “warning: the workflow you’re accustomed to in this app may no longer work, or may have different bugs than the ones you’ve discovered how to work around”. It says “warning: this product and the previous version are similar in name alone”. This is what I infer when I read that your product has been rewritten.

The view from the inside

As programmers, we’re saying “but I’ve got to rewrite this, it’s so old and crappy”. Why is it old and crappy? Is it because we weren’t trained to write readable code, and we weren’t trained to read code? Someone on the podcast referred to the “you should hate code you wrote six months ago or you’re not learning” trope. No. You should look at code you were writing six months ago and see how it could be improved. You should be able to decide whether to make those improvements, depending on whether they would benefit the product.

Many of the projects I’ve worked on have taken more than six months to complete. In each case, we could have either released it, or we could still be in a cycle of finding code that was modified more than six months ago, looking at it in disgust, throwing it away and writing it again—and waiting for the new regression bug reports to come in.

Bear in mind that source code is a liability, not an asset. When you tear something out to rewrite it from scratch, you’re using up time and money to create a thing that provides the same value as the thing you’re replacing. It’s at times like this that we enjoy waving our hands and talking about Technical Debt. Martin Fowler:

The tricky thing about technical debt, of course, is that unlike money it’s impossible to measure effectively. The interest payments hurt a team’s productivity, but since we CannotMeasureProductivity, we can’t really see the true effect of our technical debt.

We can’t really see the true effect. So how do you know that this rewrite is going to pay off? If you have some problem now it might be clear that this problem can be addressed more cheaply with a rewrite than by extending or modifying existing code. This is the situation Uli described in the podcast; they’d used some third-party library in version 1, which got them a quick release but had its problems. Having learned from those problems, they decided to replace that library in version 2.

Where you have problems, you can solve them either by modification or by rewriting. If you think that some problem might occur in the future, then leave it: you don’t make a profit by solving problems that no-one has.

A case study

While I had a few jobs in computing beforehand, all of which required that I write code, my first job where the title meant “someone who writes code” started about six years ago, at a company that makes anti-virus software. I was a developer (and would quickly become lead developer, despite not being ready) on the Mac version of the software.

This software had what could be described as an illustrious history: it was older than some of the readers of this blog (which is not uncommon: Cocoa is old enough to vote in the UK and UNIX is middle-aged). It started life as a Lightspeed/THINK C product, then became a PowerPlant product at around the time of the PowerPC transition. When I started working on it in 2007 the PowerPlant user interface still existed, but it did look out of place and dated. In addition, Apple were making noises about the library not being supportable on future versions of Mac OS X, so the first project for the new team was to build a new UI in Cocoa.

We immediately set out getting things wrong more quickly than any other team in the company. The lead developer when I joined had plenty of experience on the MS-DOS and Windows versions of the product, but had never worked on a Mac nor in Objective-C: then I became the lead developer, having worked in Objective-C but never in a team of more than one person. I won’t go into all of the details but the project ended up taking multiple times its estimated duration: not surprising, when the team had never worked together and none of the members had worked on this type of project so our estimates were really random numbers.

At the outset of this project, being untrained in the reading of other people’s code, I was dismayed by what I saw. I repeatedly asked for permission to rewrite other parts of the system that had copyright dates from when Kurt Cobain was still making TV appearances. I was repeatedly refused: the correct decision, because while the old code had its bugs it was a lot more stable than what we were writing, and as it had already been written it cost a lot less to supply to the customer.[*]

Toward the eventual end of the project, I asked my manager why we hadn’t given up on it after a year of getting things wrong, declared that a learning exercise and started over. Essentially, why couldn’t we take the “I hate the code I wrote last year, let’s start from scratch” approach? His answer was that at least one person would’ve got frustrated and quit after having their code thrown away; then we’d have no product and also no team, so would not be in a better position.[*]

Eventually the product did get out of the door, and while I’m no longer involved with it I can tell that the version shipping today still has most of the moving parts that were developed during my time and before. Gradual improvement, responding to changes in what customers want and what suppliers provide, has stood that product in good stead for over two decades.

[*] It’s important to separate these two arguments from the Sunk Cost Fallacy. In neither case are we including the money spent on prior work. The first paragraph says “from today’s perspective, what we already have is free and what you haven’t written is not free, but they both do the same thing”. The second paragraph says “from today’s perspective, finishing what you’ve done costs a lot of money. Starting afresh costs a lot of money and introduces social disruption. But they both do the same thing.”

Posted in Business, software-engineering | Leave a comment