Does the history of making software exist?

A bit of a repeated theme in the construction of APPropriate Behaviour has been that I’ve tried to position certain terms or concepts in their historical context, and found it difficult or impossible to do so with sufficient rigour. There’s an extent to which I don’t want the book to become historiographical, so I have avoided going too deep into that angle, but I have discovered that either no-one else has, or that if they have, I can’t find their work.

What often happens is that I can find a history, or even many histories, but that these aren’t reliable. I already wrote in the last post on this blog about the difficulties in interpreting references to the 1968 NATO conference; well, today I read another two sources that give another two descriptions of the conference and how it kicked off the software crisis. Articles like the one linked in the post above help to defuse some of the myths and partisan histories, but only in very specific domains such as the term “software crisis”.

Occasionally I discover a history that has been completely falsified, such as the great sequence of research papers that “prove” some programmers are ten (or 25, or 1000) times more productive than others, or those that “prove” bugs cost 100x more to fix in maintenance. Again, it’s possible to find specific deconstructions of these memes (mainly by reading Laurent Bossavit), but having discovered that the emperor is naked, we have no replacement garments with which to clothe him.

There are a very few subjects where I think the primary and secondary literature necessary to construct a history exist, but where I lack the expertise or, frankly, the patience to pursue it. For example, you could write a history of the phrase “software engineering”, and how it was introduced to suggest a professionalism beyond the craft discipline that went before it, only to become a symbol of lumbering lethargy among adherents of the craft discipline that came after it. Such a thing might take a trained historian armed with a good set of library cards a few years to complete (the book The Computer Boys Take Over covers part of this story, though it is written for the lay reader and not the software builder). But what of more technical ideas? Where is the history of “Object-Oriented”? Does that phrase mean the same thing in 2013 as in 1983? Does it even mean the same thing to different people in 2013?

Of course there is no such thing as an objective history. A history is an interpretation of a collection of sources, which are themselves interpretations drawn from biased or otherwise limited fonts of knowledge. The thing about a good history is that it lets you see what’s behind the curtain. The sources used will all be listed, so you can decide whether they lead you to the same conclusions as the author. It concerns me that we either don’t have, or I don’t have access to, resources allowing us to situate what we’re trying to do today in the narrative of everything that has gone before and will go hence. That we operate in a field full of hype and innuendo, and lack the tools to detect Humpty Dumptyism and other revisionist rhetoric.

With all that said, are the histories of the software industry out there? I don’t mean the collectors like the museums, who do an important job but not the one I’m interested in here. I mean the histories that help us understand our own work. Do degrees in computer science, to the extent they consider “real world” software making at all, teach the history of the discipline? Not the “assemblers were invented in 1949 and the first binary tree was coded in 19xy” history, but the rise and fall of various techniques, fads, disciplines, and so on? Or have I just volunteered for another crazy project?

I hope not; I haven’t got a good track record at remembering my library cards. Answers on a tweet, please.

An observation designed to aid the reading of books on software

Wherever a book on writing software describes the 1968 NATO conference in Garmisch on Software Engineering, consider whether the clarity of the argument can be improved by adding the following parenthetical clause:

[…], a straw man version of an otherwise real conference that took place in 1968, […]

Usually it can. The proceedings of the conference, which were written post facto by the editors and typists locking themselves in a hotel room with tapes of the sessions and typewriters in various states of repair, are available at Brian Randell’s website along with reflections on their creation. Does the report actually contain the fact presented in whichever book you’re reading now?

Probably not. The article “Crisis, What Crisis?” Reconsidering the Software Crisis of the 1960s and the Origins of Software Engineering investigates the position of the 1968 report in the rhetoric of the software industry and the reliance of secondary authors on its content. The conclusion is that the report was largely ignored for about a decade, after which it suddenly became the thing that kickstarted the software crisis and software engineering.

It would only be a little satirical to say “the software crisis was invented circa 1980 by Edsger Dijkstra, who postulated its origins in the NATO conference of 1968, a straw man conference” etc.

Retiring the “Apple developers are insular” meme

There’s an old trope used in discussions of Mac and iOS developers, that says they’re too inward-looking. They only think about software in ways that have been “blessed” by Apple, their platform vendor. I’m pretty sure that I’ve used this meme myself, though I couldn’t find an example in a short Bing for the topic. It’s now time to put that meme out to pasture (though, please, not out to stud. We don’t want that thing breeding.)

“Apple-supplied” is a broad church

In the time I’ve been using Macs, the Apple-supplied tools have included: C, C++, Objective-C, five different assemblers, Java, AppleScript, perl, python, ruby (both vanilla and MacRuby), Tcl, bash, csh, JavaScript, LISP and PHP. Perhaps more. Admittedly, on the iOS side the options are fewer: but do you know anyone who’s found their way around all of modern C++? You can be a programmer who never leaves the aforementioned collection of languages and yet is familiar with procedural, object-oriented, structured, functional and template programming techniques. There’s no need to learn Haskell just to score developer points.

There is more to heaven and earth

“The community” has actually provided even more options than those listed above. RubyMotion, MonoTouch, MonoMac, PhoneGap/Cordova, wxWidgets, Titanium: these and more provide options for developing for Apple’s platform with third party tools, languages and APIs. To claim that the Apple-based community is insular is to choose an exclusive subset of the community, ignoring all of the developers who, well, aren’t that insular subset. If playing that sort of rhetorical game is acceptable then we aren’t having grown-up discussions. Well, don’t blame me, you started it.

Find out how many iOS apps are built with C#, or Lua, or JavaScript, or Ruby. Now see if you can say with conviction that the community of iOS app developers pays attention to nothing outside the field of Objective-C.

Not everyone need be a generalist

Back when Fred Brooks was writing about the failures of the System/360 project in his book “The Mythical Man-Month” and the article “No Silver Bullet”, he suggested that instead of building armies of programmers to create software, the focus should be on creating small, focussed surgical teams with a limited number of people assuming the roles required. The “surgeon” was played by the “chief programmer”, somewhere between a software architect and a middle manager.

One of the roles on these “chief programmer teams” was the language lawyer. It’s the job of the language lawyer to know the programming language and interfaces inside-out, to suggest smarter or more efficient ways of doing what’s required of the software. They’re also great at knowing what happens at edge-case uses of the language (remember the previous post on the various things that happen when you add one to an integer?) which is great for those last-minute debugging pushes towards the end of a project.
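
(For a flavour of the edge cases a language lawyer thrives on, here is a small illustrative C snippet, not taken from that earlier post, showing that “add one to an integer” already behaves in three different ways depending on the type involved.)

#include <limits.h>
#include <stdio.h>

int main(void)
{
  unsigned int u = UINT_MAX;
  printf("%u\n", u + 1u);      /* well-defined: unsigned arithmetic wraps to 0 */

  float f = 16777216.0f;       /* 2^24: the next representable float is 2 away */
  printf("%.1f\n", f + 1.0f);  /* prints 16777216.0; the increment is lost */

  int i = INT_MAX;
  (void)i;                     /* i + 1 here would be undefined behaviour, so we don't */
  return 0;
}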

Having language lawyers is a good thing. If some people want to focus on knowing a small area of the field inside-out rather than having broader, but shallower, coverage, that’s a good thing. These are people who can do amazing things with real code on real projects.

It doesn’t help any discussion

Even if the statement were true, and if its truth in some way pointed to a weakness in the field and its practitioners, there are more valuable things to do than to express the statement. We need some internet-age name for the following internet-age rhetorical device:

I believe P is true. I state P. Therefore I have made the world better.

If you think that I haven’t considered some viewpoint and my way of working or interacting with other developers suffers as a result, please show that thing to me. Preferably in a friendly, compelling fashion that explains the value. Telling me I’m blinkered may be true, but is unlikely to change my outlook. Indeed, I may be inclined to find that distasteful and stop listening; the “don’t read the comments” meme is predicated on the belief that short, unkind statements are not worth paying attention to.

Conclusion

Absorption of external ideas does exist in our community; claiming that it doesn’t is a fallacy. Not everyone need learn everything about the entirety of software making in order to contribute; claiming that they should is a fallacy. Making either of these claims is in itself not helpful. Therefore there’s no need to continue the “Apple developers are insular” meme, and I shan’t.

If you find exciting ideas from other areas of software development, share them with those who will absorb. Worry not about people who don’t listen, but rather wonder what they know and which parts of that you haven’t discovered yet.

Server-side Objective-C

Recently, Kevin Lawler posted an “Informal Technical Note” saying that Apple could clean up on licence sales if only they’d support web backend development. There are only two problems with this argument: it’s flawed, and the precondition probably won’t be met. I’m sure there is an opportunity for server-side programming with Objective-C, but it won’t be met by Apple.

The argument is flawed

The idea is that the community is within a gnat’s crotchet of using ObjC on the web, if only ObjC were slightly better. This represents an opportunity for Apple:

  1. Licensing fees
  2. Sales of Macs for development
  3. Increase share of Objective-C at the expense of Java
  4. Get more devs capable with Objective-C, which is necessary for OSX & iOS development
  5. Developer good will
  6. Steer development on the web

Every one of these "opportunities" seems either inconsequential or unrealistic. Since the dot-com crash, much web server development has been done on free platforms with free tools: LAMP, Java/Scala/Clojure + Tomcat + Spring + Hibernate + Eclipse, Ruby on Rails, Node.js, you name it. The software’s free, you pay for the hardware (directly or otherwise) and the developers. The opportunities for selling licences are few and far between—there are people who will pay good money for good developer tools that save them a good amount of time, but most developers are not those people. The money is made in support and in consultancy. This is why Oracle still exists, and Sun doesn’t.

Of course, Apple already knows this, having turned the $n*10^4-per-license NeXT developer tools into a set of free developer tools.

Speaking of sales, the argument about selling Macs to developers is one that made sense in 2000, when Apple still needed to convince the computer-buying public that the new NeXT-based platform had a future; back then, selling to technologists and early adopters was definitely a thing. You could make a flaccid but plausible argument that Java and TextMate 1 provided an important boost to the platform. You can’t argue that the same’s true today. Developers already have Macs. Apple is defending their position from what has so far been lacklustre competition; there’s no need for them to chase every sale to picky developers any more.

I’ll sum up the remaining points as not being real opportunities for Apple, and move on. For Objective-C to win, Java does not have to lose (and for that matter, for Apple to win, Objective-C does not have to win; they’ve already moved away from Apple BASIC, Microsoft BASIC, Pascal and C-with-Carbon). Having ObjC backend developers won’t improve the iOS ecosystem any more than Windows 8 has benefitted from all the VB and C# developers in the world. “Developer good will” is a euphemism for “pandering to fickle bloggers”, and I’ve already argued that Apple no longer needs to do that. And Apple already has a strong position in directing the web, due to controlling the majority of clients. Who cares whether they get their HTML from ObjC or COBOL?

It probably won’t happen

Even if Craig Federighi saw that list and did decide Apple’s software division needed a slice of the server pie, it would mean reversing Apple’s 15-year slow exit from the server and services market.

Apple stopped making servers last year, due to a lack of demand. Because OS X is only licensed to run on Apple-badged hardware, even when virtualised, there’s no datacentre-friendly way to run it. The Mac Mini server is a brute-force solution: rather than redundant PSUs, you have redundant servers. Rather than lights-out management, you hope some of the redundant servers stay up. Rather than fibre channel-attached storage, you have, well, nothing. And so on.

OS X Server has been steadily declining in both features and quality. The App Store reviews largely coincide with my experience—you can’t even rely on an upgrade from a supported configuration of 10.N (N ≤ 7) to 10.8 to leave your server working properly any more.

Apple have a server product that (barely) lets a group of people in the same building share wikis and calendars. They separately have WebObjects: a web application platform that they haven’t updated in four years and no longer provide support for. One of their biggest internal server deployments is based on WebObjects (with, apparently, an Oracle stack): almost all of their others aren’t. iCloud is run on external services. They internally use J2EE and Spring MVC.

So Apple have phased out their server hardware and software, and the products they do have do not appear to be well-supported. This is consistent with Tim Cook’s repeated statement of “laser focus” on their consumer products; not so much with the idea that Apple is about to ride the Objective-C unicorn into web server town.

But that doesn’t mean it won’t happen

If there is a growth of server-side Objective-C programming, it’s likely to come from people working without, around or even despite Apple. The options as they currently exist:

  • Objective-Cloud is, putting it crudely, Cocoa as a Service. It’s a good solution as it caters to the “I’m an iOS app maker who just needs to do a little server work” market; in the same way that Azure is a good (first-party) platform for Microsoft developers.
  • GNUstepWeb is based on a platform that’s even older than Apple’s WebObjects. My own attempts to use it for web application development have hit a couple of walls: the GNUstep community has not shown interest in accepting my patches; the frameworks need a lot of love to do modern things like AJAX, REST or security; and even with the help of someone at Heroku I couldn’t get Vulcan to build the framework.
  • Using any Objective-C environment such as GNUstep or the Cocotron, you could build something new, or even old-school CGI binaries (a minimal sketch of the CGI approach follows this list).
  • If it were me, I’d fork GNUstep and GSW. I’d choose one deployment platform, one web server, and one database, and I’d support the hell out of that platform only. I’d sell that as a hosted platform with the usual tiered support. The applications needed to do the sales, CRM and so on? Written on that platform. As features are needed, they get added; and the support apps are suitable for turning into the tutorials and sample code that help to reduce the support effort.

    Of course, that’s just me.
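
To make the CGI option concrete, here is a minimal, hypothetical sketch of such a binary written against Foundation (GNUstep, the Cocotron or Apple’s Foundation would all do). It isn’t taken from any existing project; the web server runs the executable once per request, passes request metadata in environment variables, and reads the response from standard output.

#include <stdio.h>
#import <Foundation/Foundation.h>

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        // CGI passes request metadata in environment variables.
        NSDictionary *environment = [[NSProcessInfo processInfo] environment];
        NSString *path = [environment objectForKey: @"PATH_INFO"];
        if (path == nil) path = @"/";

        // A real application would HTML-escape anything taken from the request.
        NSString *body = [NSString stringWithFormat:
            @"<html><body><h1>Hello from Objective-C</h1><p>You requested %@</p></body></html>\n",
            path];
        NSUInteger length = [body lengthOfBytesUsingEncoding: NSUTF8StringEncoding];

        // A CGI response is headers, a blank line, then the body, all on standard output.
        printf("Content-Type: text/html; charset=utf-8\r\n");
        printf("Content-Length: %lu\r\n\r\n", (unsigned long)length);
        fputs([body UTF8String], stdout);
    }
    return 0;
}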

Can code be “readable”?

Did Isaac Asimov write good stories?

Different people will answer that question in different ways. People who don’t read English and don’t have access to a translation will probably be unable to answer. People who don’t like science fiction (and who haven’t been introduced to his mystery stories) will likely say ‘no’, on principle. Other people will like what he wrote. Some will like some of what he wrote. Others will accept that he did good work but “that it isn’t really for me”.

The answers above are all based on a subjective interpretation, both of Asimov’s work and the question that was asked. You could imagine an interpretation in the form of an appeal to satisfaction: who was the author writing for, and how does the work achieve the aim of satisfying those people? What themes was the author exploring, and how does the work achieve the goal of conveying those themes? These questions were, until the modern rise of literary theory, key ways in which literary criticism analysed texts.

Let us take these ideas and apply them to programming. We find that we demand of our programmers not “can you please write readable code?”, but “can you consider what the themes and audience of this code are, and write in a way that promotes the themes among members of that audience?” The themes are the problems you’re trying to solve, and the constraints on solving them. The audience is, well, it’s the audience; it’s the people who will subsequently have to read and understand the code as a quasi-exclusive collection.

We also find that we can no longer ask the objective-sounding question “did this coder write good code?” Nor can we ask “is this code readable?” Instead, we ask “how does this code convey its themes to its audience?”

In conclusion, then, a sound approach to writing readable code requires author and reader to meet in the middle. The author must decide who will read the code, and how to convey the important information to those readers. The reader must analyse the code in terms of how it satisfies this goal of conveyance, not whether they enjoyed the indentation strategy or dislike dots on principle.

Source code is not software written in a human-readable notation. It’s an essay, written in executable notation.

I published a new book!

Executive summary: it’s called APPropriate Behaviour, head over to the LeanPub site to check it out.

For quite a while, I’ve noticed that posts here are moving away from nuts and bolts code towards questions about evaluating my own performance, working with other developers and the industry in general.

I decided to spend some time working on these and related thoughts, trying to derive some consistent narrative as well as satisfying myself that these ideas were indeed going somewhere. I quickly ended up with about half of a novel-length book.

The other half is coming soon, but in the meantime the book is already published in preview state. To quote from the introduction:

this book is about the things that go into being a programmer that aren’t specifically the programming. It starts fairly close to home, with chapters on development tools, on supporting your own programming needs, and on other “software engineering” practices that programmers should understand and make use of. But by the end of the book we’ll be talking about psychology and metacognition — about understanding how you the programmer function and how to improve that functioning.

As I said, this is currently in very much a preview state—only about half of the content is there, it hasn’t been reviewed, and the thread that runs through it has dropped a few stitches. However, even if you buy the book now you’ll get free updates forever, so you’ll see new chapters and changes as they’re made.

At this early stage I’m particularly interested in any feedback readers have. I’ve set up a Glassboard for the book—in the Glassboard app, use invite code XVSSV to join the discussion.

I hope you enjoy APPropriate Behaviour!

Surprising ARC performance characteristics

The project I’m working on at the moment has quite tight performance constraints. It needs to start up quickly, do its work at a particular rate and, being an iOS app, there’s a hard limit on how much RAM can be used.

The team’s got quite friendly with Instruments, watching the time profile, memory allocations, thread switches[*] and storage access, trying to discover where we can trade one off in favour of another.

[*] this is a topic for a different post, but “dispatch_async() all the things” is a performance “solution” that brings its own problems.

It was during one of these sessions that I noticed a hot loop in the app was spending a lot of time in a C++ collection type called objc::DenseMap. This is apparently used by objc_retain() and objc_release(), the functions used to implement reference counting when Objective-C code is compiled using ARC.

The loop was implemented using the closure interface, -[NSDictionary enumerateKeysAndObjectsUsingBlock:]. Apparently the arguments to a block are strong references, so each was being retained on entering the block and released on return. Multiply by thousands of objects in the collection and tens of iterations per second, and that was a non-trivial amount of time to spend in memory management functions.

I started to think about other data types in which I could express the same collection—is there something in the C++ standard library I could use?

I ended up using a different interface to the same data type – something proposed by my colleague, Mo. Since Cocoa was released, Foundation data types have been accessible via the CoreFoundation C API[**]. The key difference as far as modern applications are concerned is that the C API uses void * to refer to its content rather than id. As a result, and with appropriate use of bridging casts, ARC doesn’t try to retain and release the objects.

[**]I think that Foundation on OPENSTEP was designed in the same way, but that the C API wasn’t exposed until the OS X 10.0 release.

So this:

[myDictionary enumerateKeysAndObjectsUsingBlock: ^(id key, id object, BOOL *stop) {
  //...
}];

became this:

// Toll-free bridge the NSDictionary to its CoreFoundation counterpart.
CFDictionaryRef myCFDictionary = (__bridge CFDictionaryRef)myDictionary;
CFIndex count = CFDictionaryGetCount(myCFDictionary);
const void *keys[count];
const void *values[count];
CFDictionaryGetKeysAndValues(myCFDictionary, keys, values);

for (CFIndex i = 0; i < count; i++)
{
  // __unsafe_unretained stops ARC inserting retain/release calls for these references.
  __unsafe_unretained id key = (__bridge id)keys[i];
  __unsafe_unretained id value = (__bridge id)values[i];
  //...
}

which turned out to be about 12% faster in this case.

I’ll finish by addressing an open question from earlier: when should I consider ditching Foundation/CoreFoundation completely? There are times when it’s appropriate to move away from those data types. Foundation’s adaptive algorithms are very fast a lot of the time, choosing different representations under different conditions, but they aren’t always the best choice.

Considering loops that enumerate over a collection, like the loop investigated in this post, a C or C++ structure representation is good if the loop body sends a lot of messages. Hacks like IMP caching can also help, in which this:

for (MyObject *foo in bar)
{
  [foo doThing];
}

becomes this:

// Look up the implementation once (class_getMethodImplementation is declared in
// <objc/runtime.h>), instead of dispatching through objc_msgSend on every iteration.
SEL doThingSelector = @selector(doThing);
IMP doThingImp = class_getMethodImplementation([MyObject class], doThingSelector);

for (MyObject *foo in bar)
{
  doThingImp(foo, doThingSelector);
}

If you’ve got lots (like tens of thousands, or hundreds of thousands) of instances of a class, Objective-C will add a measurable memory impact in the isa pointers (each object contains a pointer to its class) and the look-aside table that tracks retain counts. Switching to a different representation can regain that space, in return for losing dynamic dispatch and reference-counted memory management (automatic or otherwise).
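
To make that trade-off concrete, here is a hypothetical sketch of swapping a tiny Objective-C class for a plain C struct when you have a huge population of small, homogeneous instances. The Particle type is invented for illustration; it isn’t from the project described above.

// A plain struct instead of an NSObject subclass: no isa pointer, no entry in
// the retain-count table, and the whole population can live in one contiguous
// allocation.
typedef struct {
    double x, y;
    double velocityX, velocityY;
} Particle;

// A C function replaces what would have been a message send per instance.
static void ParticleStep(Particle *particle, double dt)
{
    particle->x += particle->velocityX * dt;
    particle->y += particle->velocityY * dt;
}

// Usage: one calloc() for the lot, rather than one object per particle.
// Particle *particles = calloc(count, sizeof(Particle));
// for (size_t i = 0; i < count; i++) ParticleStep(&particles[i], dt);
// free(particles);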

Object-Oriented callback design

One of the early promises of object-oriented programming, encapsulated in the design of the Smalltalk APIs, was a reduction – or really an encapsulation – of the complexity of code. Many programmers believe that the more complex a method or function is, the harder it is to understand and to maintain. Some developers even use tools to measure the complexity quantitatively, in terms of the number of loops or conditions present in the function’s logic. Get this “cyclomatic complexity” figure too high, and your build fails. Unfortunately, many class APIs have been designed in ways that don’t take the complexity of client code into account. Here’s an extreme example: the NSStreamDelegate protocol from Foundation.

-(void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)streamEvent;

This is not so much an abstraction of the underlying C functions as a least-effort adaptation into Objective-C. Every return code the lower-level functionality exposes is mapped onto a code that’s funnelled into one place: this delegate callback. Were you trying to read or write the stream? Did it succeed or fail? Doesn’t matter; you’ll get this one callback. Any implementation of this protocol looks like a big bundle of if statements (or, more tersely, a big bundle of cases in a switch) to handle each of the possible codes. The default case has to handle the possibility that a future version of the API adds a new event to the list. Whenever I use this API, I drop in the following implementation that “fans out” the different events to different handler methods.

-(void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)streamEvent
{
  switch(streamEvent)
  {
  case NSStreamEventOpenCompleted:
    [self streamDidOpen: stream];
    break;
  //...
  default:
    NSAssert(NO, @"Apple changed the NSStream API");
    [self streamDidSomethingUnexpected: stream];
    break;
  }
}

Of course, NSStream is an incredibly old class. We’ve learned a lot since then, so modern callback techniques are much better, aren’t they? In my opinion, they did indeed get better for a bit. But then something happened that led to a reduction in the quality of these designs. That thing was the overuse of blocks as callbacks. Here’s an example of what I mean, taken from Game Center’s authentication workflow.

@property(nonatomic, copy) void(^authenticateHandler)(UIViewController *viewController, NSError *error);

Let’s gloss over, for a moment, the fact that simply setting this property triggers authentication. There are three things that could happen as a result of setting this handler: two are the related ideas that authentication could succeed or fail (related, but diametrically opposed). The third is that the API needs some user input, so wants the app to present a view controller for data entry. Three things, one entry point. Which do we need to handle on this run? We’re not told; we have to ask. This is the antithesis of accepted object-oriented practice. In this particular case, the behaviour required on event-handling is rich enough that a delegate protocol defining multiple methods would be a good way to handle the interaction:

@protocol GKLocalPlayerAuthenticationDelegate

@required
-(void)localPlayer: (GKLocalPlayer *)localPlayer needsToPresentAuthenticationInterface: (UIViewController *)viewController;
-(void)localPlayerAuthenticated: (GKLocalPlayer *)localPlayer;
-(void)localPlayer: (GKLocalPlayer *)localPlayer failedToAuthenticateWithError: (NSError *)error;

@end

The simpler case is where some API task either succeeds or fails. Smalltalk had a pattern for dealing with this which could be both supported in Objective-C and extended to cover asynchronous design. Here’s how you might encapsulate a boolean success state with error handling in an object-oriented fashion.

typedef id(^conditionBlock)(NSError **error);
typedef void(^successHandler)(id result);
typedef void(^failureHandler)(NSError *error);
- (void)ifThis:(conditionBlock)condition then:(successHandler)success otherwise:(failureHandler)failure
{
  __block NSError *error;
  __block id result;
  if ((result = condition(&error)))
    success(result);
  else
    failure(error);
}

Now you’re telling client code whether your operation worked, not requiring that it ask. Each of the conditions is explicitly and separately handled. This is a bit different from Smalltalk’s condition handling, which works by sending the ifTrue:ifFalse: message to an object that knows which Boolean state it represents. The ifThis:then:otherwise: message needs to deal with the common Cocoa idiom of describing failure via an error object – something a Boolean wouldn’t know about.[*] However, the Smalltalk pattern is possible while still supporting the above requirements: see the coda to this post. This method could be exposed directly as API, or it can be used to service conditions inside other methods:

@implementation NSFileManager (BlockDelete)

- (void)deleteFileAtPath:(NSString *)path success:(successHandler)success failure:(failureHandler)failure
{
    [self ifThis: ^(NSError **error){
        return [self removeItemAtPath: path error: error]?@(1):nil;
    } then: success otherwise: failure];
}

@end

int main(int argc, const char * argv[])
{
    @autoreleasepool {
        [[NSFileManager defaultManager] deleteFileAtPath: @"/private/tmp" success: ^(id unused){
            NSLog(@"Holy crap, you deleted the temporary folder!");
        } failure: ^(NSError *error){
            NSLog(@"Meh, that didn't work. Here's why: %@", error);
        }];
    }
    return 0;
}

[*]As an aside, there’s no real reason that Cocoa needs to use indirect error pointers. Consider the following API:

-(id)executeFetchRequest:(NSFetchRequest *)fetchRequest

The return value could be an NSArray or an NSError. The problem with this is that in almost all cases it puts some ugly conditional code into the API’s client—though it’s only the same ugly condition you currently have to write when testing the return code before examining the error. The separation of success and failure handlers encapsulates that condition in code the client author doesn’t need to see.
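
For illustration, that ugly conditional looks something like this in client code; executeFetchRequest: here is the hypothetical single-argument variant being discussed (not Core Data’s real executeFetchRequest:error:), and context and fetchRequest are assumed to exist already.

id outcome = [context executeFetchRequest: fetchRequest];
if ([outcome isKindOfClass: [NSError class]])
{
    // The failure branch: every caller has to remember to write this test.
    NSLog(@"fetch failed: %@", outcome);
}
else
{
    NSLog(@"fetched %lu objects", (unsigned long)[outcome count]);
}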

Coda: related pattern

I realised after writing this post that the Smalltalk-esque ifTrue:ifFalse: style of conditional can be supported, and leads to some interesting possibilities. First, consider defining an abstract Outcome class:

@interface Outcome : NSObject

- (void)ifTrue:(successHandler)success ifFalse: (failureHandler)failure;

@end

You can now define two subclasses which know what outcome they represent and the supporting data:

@interface Success : Outcome

+ (instancetype)successWithResult: (id)result;

@end

@interface Failure : Outcome

+ (instancetype)failureWithError: (NSError *)error;

@end

The implementation of these two classes is very similar; you can infer the behaviour of Failure from the behaviour of Success:

@implementation Success
{
  id _result;
}

+ (instancetype)successWithResult: (id)result
{
  Success *success = [self new];
  success->_result = result;
  return success;
}

- (void)ifTrue:(successHandler)success ifFalse:(failureHandler)failure
{
  success(_result);
}
@end

But that’s not the end of it. You could use the -ifThis:then:otherwise: method above to implement a Deferred outcome, which doesn’t evaluate its result until someone asks for it. Or you could build a Pending result, which starts the evaluation in the background, resolving to success or failure on completion. Or you could do something else.
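
As a flavour of that Deferred idea, here is a minimal sketch. The class is my own invention, reusing the conditionBlock, successHandler and failureHandler types defined earlier: it captures a condition block and doesn’t run it until someone asks which branch to take.

@interface Deferred : Outcome

+ (instancetype)deferredWithCondition: (conditionBlock)condition;

@end

@implementation Deferred
{
  conditionBlock _condition;
}

+ (instancetype)deferredWithCondition: (conditionBlock)condition
{
  Deferred *deferred = [self new];
  deferred->_condition = [condition copy];
  return deferred;
}

- (void)ifTrue:(successHandler)success ifFalse:(failureHandler)failure
{
  // The condition is only evaluated when somebody inspects the outcome.
  NSError *error;
  id result = _condition(&error);
  if (result)
    success(result);
  else
    failure(error);
}

@end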

An apology to readers of Test-Driven iOS Development

I made a mistake. Not a typo or a bug in some pasted code (actually I’ve made some of those, too). I perpetuated what seems (now that I analyse it) to be a big myth in software engineering. I uncritically quoted some work without checking its authority, and now find it lacking. As an author, it’s my responsibility to ensure that what I write represents the best of my knowledge and ability, so that you can take what I write and use it to be better at this than I am.

In propagating incorrect information, I’ve let you down. I apologise unreservedly. It’s my fervent hope that this post explains what I got wrong, why it’s wrong, and what we can all learn from this.

The mistake

There are two levels to this. The problem lies in Table 1.1 at the top of page 4 (in the print edition) of Test-Driven iOS Development. Here it is:

Table 1-1 from TDiOSD

As explained in the book, this table is reproduced from an earlier publication:

Table 1.1, reproduced from Code Complete, 2nd Edition, by Steve McConnell (Microsoft Press, 2004), shows the results of a survey that evaluated the cost of fixing a bug as a function of the time it lay “dormant” in the product. The table shows that fixing bugs at the end of a project is the most expensive way to work, which makes sense…

The first mistake I made was simply that I seem to have made up the bit about it being the result of a survey. I don’t know where I got that from. In McConnell, it’s titled “Average Cost of Fixing Defects Based on When They’re Introduced and Detected” (Table 3-1, at the top of page 30). It’s introduced in a paragraph marked “HARD DATA”, and is accompanied by an impressive list of references in the table footer. McConnell:

The data in Table 3-1 shows that, for example, an architecture defect that costs $1000 to fix when the architecture is being created can cost $15,000 to fix during system test.

As already covered, the first problem is that I misattributed the data in the table. The second problem, and the one that in my opinion I’ve let down my readers the most by not catching, is that the table is completely false.

Examining the data

I was first alerted to the possibility that something was fishy with these data by the book The Leprechauns of Software Engineering by Laurent Bossavit. His Appendix B has broad coverage of the literature that claims to report the exponential increase in the cost of bug fixing.

It was this that got me worried, but I thought that deciding the table was broken on the basis of another book would be just as bad as relying on it because it appeared in CC2E in the first place. I therefore set myself the following question:

Is it possible to go from McConnell’s Table 3-1 to data that can be used to reconstruct the table?

My results

The first reference is Design and Code Inspections to Reduce Errors in Program Development. In a perennial problem for practicing software engineers, I can’t read this paper: I subscribe to the IEEE Xplore library, but they still want more cash to read this title. Laurent Bossavit, author of the aforementioned Leprechauns book, pointed out that the IEEE often charge for papers that are available for free elsewhere, and that this is the case with this paper (download link).

The paper anecdotally mentions a 10-100x factor as a result of “the old way” of working (i.e. without code inspections). The study itself looks at the amount of time saved by adding code reviews to projects that hitherto didn’t do code reviews; even if it did have numbers that correspond to this table I’d find it hard to say that the study (based on a process where the code was designed such that each “statement” in the design document corresponded to 3-10 estimated code statements, and all of the code was written in the PL/S language before a compiler pass was attempted) has relevance to modern software practice. In such a scheme, even a missing punctuation symbol is a defect that would need to be detected and reworked (not picked up by an IDE while you’re typing).

The next reference I looked at was Boehm and Turner’s “Balancing Agility and Discipline”. McConnell doesn’t tell us where in this book he was looking, and it’s a big book. Appendix E has a lot of secondary citations supporting “The steep version of the cost-of-change curve”, but quotes figures from “100:1” to “5:1” comparing “requirements phase” changes to “post-delivery” changes. All defect fixes are changes but not all changes are defect fixes, so these numbers can’t be used to build Table 3-1.

The graphs shown from studies of agile Java projects have “stories” on the x-axis and “effort to change” in person-hours on the y-axis; again, not about defects. These numbers are inconsistent with the table in McConnell anyway. As we shall see later, Boehm has trouble getting his data to fit agile projects.

“Software Defect Removal” by Dunn is another book, which I couldn’t find.

“Software Process Improvement at Hughes Aircraft” (Humphrey, Snyder, and Willis 1991). The only reference to cost here is a paragraph on “cost/performance index” on page 22. The authors say (with no supporting data; the report is based on a confidential study) that costs for software projects at Hughes were 6% over budget in 1987, and 3% over budget in 1990. There’s no indication of whether this was related to the costs of fixing defects, or the “spread” of defect discovery/fix phases. In other words this reference is irrelevant to constructing Table 3-1.

The other report from Hughes Aircraft is “Hughes Aircraft’s Widespread Deployment of a Continuously Improving Software Process” by Ron R. Willis, Robert M. Rova et al. This is the first reference I found to contain useful data! The report is looking at 38,000 bugs: the work of nearly 5,000 developers over dozens of projects, so this could even be significant data.

It’s a big report, but Figure 25 is the part we need. It’s a set of tables that relate the fix time (in person-days) of defects when comparing the phase that they’re fixed with the phase in which they’re detected.

Unfortunately, this isn’t the same as comparing the phase they’re discovered with the phase they’re introduced. One of the three tables (the front one, which obscures parts of the other two) looks at “in-phase” bugs: ones that were addressed with no latency. Wide differences in the numbers in this table (0.36 days to fix a defect in requirements analysis vs 2.00 days to fix a defect in functional test) make me question the presentation of Table 3-1: why put “1” in all of the “in-phase” entries in that table?

Using these numbers, and a little bit of guesswork about how to map the headings in this figure to Table 3-1, I was able to use this reference to construct a table like Table 3-1. Unfortunately, my “table like Table 3-1” was nothing like Table 3-1. Far from showing an incremental increase in bug cost with latency, the data look like a mountain range. In almost all rows the relative cost of a fix in System Test is greater than in maintenance.

I then looked at “Calculating the Return on Investment from More Effective Requirements Management” by Leffingwell. I have to hope that this 2003 webpage is a reproduction of the article cited by McConnell as a 1997 publication, as I couldn’t find a paper of that date.

This reference contains no primary data, but refers to “classic” studies in the field:

Studies performed at GTE, TRW, and IBM measured and assigned costs to errors occurring at various phases of the project life-cycle. These statistics were confirmed in later studies. Although these studies were run independently, they all reached roughly the same conclusion: If a unit cost of one is assigned to the effort required to detect and repair an error during the coding stage, then the cost to detect and repair an error during the requirements stage is between five to ten times less. Furthermore, the cost to detect and repair an error during the maintenance stage is twenty times more.

These numbers are promisingly similar to McConnell’s: although he talks about the cost to “fix” defects while this talks about “detecting and repairing” errors. Are these the same things? Was the testing cost included in McConnell’s table or not? How was it treated? Is the cost of assigning a tester to a project amortised over all bugs, or did they fill in time sheets explaining how long they spent discovering each issue?

Unfortunately Leffingwell himself is already relying on secondary citation: the reference for “Studies performed at GTE…” is a 520-page book, long out of print, called “Software Requirements—Objects Functions and States”. We’re still some degrees away from actual numbers. Worse, the citation at “confirmed in later studies” is to Understanding and controlling software costs by Boehm and Papaccio, which gets its numbers from the same studies at GTE, TRW and IBM! Leffingwell is bolstering the veracity of one set of numbers by using the same set of numbers.

A further reference in McConnell, “An Economic Release Decision Model”, is to the proceedings of a 1999 conference on Applications of Software Measurement. If these proceedings have actually been published anywhere, I can’t find them: the one URL I discovered was to a “cybersquatter” domain. I was privately sent the PowerPoint slides that comprise this citation. It’s a summary of then-retired Grady’s experiences with software testing, and again contains no primary data or even verifiable secondary citations. Bossavit describes a separate problem where one of the graphs in this presentation is consistently misattributed and misapplied.

The final reference provided by McConnell is What We Have Learned About Fighting Defects, a non-systematic literature review carried out in an online “e-Workshop” in 2002.

Section 3.1 of the report is “the effort to find and fix”. The 100:1 figure is supported by “general data” which are not presented and not cited. Actual cited figures are 117:1 and 135:1 for “severe” defects from two individual studies, and 2:1 for “non-severe” defects (a small collection of results).

The report concludes:

“A 100:1 increase in effort from early phases to post-delivery was a usable heuristic for severe defects, but for non-severe defects the effort increase was not nearly as large. However, this heuristic is appropriate only for certain development models with a clearly defined release point; research has not yet targeted new paradigms such as extreme programming (XP), which has no meaningful distinction between “early” and “late” development phases.”

A “usable heuristic” is not the verification I was looking for – especially one that’s only useful when practising software development in a way that most of my readers wouldn’t recognise.

Conclusion

If there is real data behind Table 3-1, I couldn’t find it. It was unprofessional of me to incorporate the table into my own book—thereby claiming it to be authoritative—without validating it, and my attempts to do so have come up wanting. I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering, I apologise for including it, and I resolve to critically appraise material I use in my work in the future.

Does that thing you like doing actually work?

Genuine question. I’ve written before about Test-Driven Development, and I’m sure some of you practice it: can you show evidence that it’s better than (or, for that matter, evidence that it’s worse than) some other practice? Statistically significant evidence?

How about security? Can you be confident that there’s a benefit to spending any money or time on information security countermeasures? On what should it be spent? Which interventions are most successful? Can you prove that?

I am, of course, asking whether there’s any evidence in software engineering. I ask rhetorically, because I believe that there isn’t—or there isn’t a lot that’s in a form useful to practitioners. A succinct summary of this position comes courtesy of Anthony Finkelstein:

For the most part our existing state-of-practice is based on anecdote. It is, at its very best quasi-evidence-based. Few key decisions from the choice of an architecture to the configuration of tools and processes are based on a solid evidential foundation. To be truthful, software engineering is not taught by reference to evidence either. This is unacceptable in a discipline that aspires to engineering science. We must reconstruct software engineering around an evidence-based practice.

Now there is a discipline of Evidence-Based Software Engineering, but herein lies a bootstrapping problem that deserves examination. Evidence-Based [ignore the obvious jokes, it’s a piece of specific jargon that I’m about to explain] practice means summarising the significant results in scientific literature and making them available to practitioners, policymakers and other “users”. The primary tools are the systematic literature review and its statistics-heavy cousin, the meta-analysis.

Wait, systematic literature review? What literature? Here’s the problem with trying to do EBSE in 2012. Much software engineering goes on behind closed doors in what employers treat as proprietary or trade-secret processes. Imagine that a particular project is delayed: most companies won’t publish that result because they don’t want competitors to know that their projects are delayed.

Even for studies, reports and papers that do exist, they’re not necessarily accessible to the likes of us common programmers. Let’s imagine that I got bored and decided to do a systematic literature survey of whether functional programming truly does lead to fewer concurrency issues than object-oriented programming.[*] I’d be able to look at articles in the ACM Digital Library, on the ArXiv pre-print server, and anything that’s in Leamington Spa library (believe me, it isn’t much). I can’t read IEEE publications, the BCS Computer Journal, or many others because I can’t afford to subscribe to them all. And there are probably tons of journals I don’t even know about.

[*]Results of asking about this evidence-based approach to paradigm selection revealed that either I didn’t explain myself very well or people don’t like the idea of evidence mucking up their current anecdotal world views.

So what do we do about this state of affairs? Actually, to be more specific: if our goal is to provide developers with better access to evidence from our field, what do we do?

I don’t think traditional journals can be the answer. If they’re pay-to-read, developers will never see them. If they’re pay-to-write, the people who currently aren’t supplying any useful evidence still won’t.

So we need something lighter weight, free to contribute to and free to consume; and we probably need to accept that it then won’t be subject to formal peer review (in exactly the same way that Wikipedia isn’t).

I’ve argued before that a great place for this work to be done is the Free Software Foundation. They’ve got the components in place: a desire to prove that their software is preferable to commercial alternatives; public development projects with some amount of central governance; volunteer coders willing to gain karma by trying out new things. They (or if not them, Canonical or someone else) could easily become the home of demonstrable quality in software production.

Could the proprietary software developers be convinced to open up on information about what practices do or don’t work for them? I believe so, but it wouldn’t be easy. Iteratively improving practices is a goal for both small companies following Lean Startup and similar techniques, and large enterprises interested in process maturity models like CMMI. Both of these require you to know what metrics are important; to measure, test, improve and iterate on those metrics. This can be done much more quickly if you can combine your results from those of other teams—see what already has or hasn’t worked elsewhere and learn from that.

So that means that everyone will benefit if everyone else is publishing their evidence. But how do you bootstrap that? Who will be first to jump from a culture of silence to a culture of sharing, the people who give others the benefit of their experience before they get anything in return?

I believe that this is the role of the platform companies. These are the companies whose value lies not only in their own software, but in the software created by ISVs on their platforms. If they can help their ISVs to make better software more efficiently, they improve their own position in the market.