I use mocks and I’m happy with that

Both Kent Beck and Martin Fowler have said that they don’t use mock objects in their test-driven development. I do. I use them mostly for the sense described first in my BNR blog post on Mock Objects, namely to stand in for a thing that can receive messages I want to send, but that does not yet exist.

If you look at the code in Test-Driven iOS Development, you’ll find that it uses plenty of test doubles but none of them is a mock object. What has changed in my worldview to move from not-mocking to mocking in that time?

The key information that gave me the insight was this message (pardon the pun!) from Alan Kay on object-oriented programming:

The big idea is “messaging” – that is what the kernal[sic] of Smalltalk/Squeak is all about (and it’s something that was never quite completed in our Xerox PARC phase). The Japanese have a small word – ma – for “that which is in between” – perhaps the nearest English equivalent is “interstitial”. The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

What I’m really trying to do is to define the network of objects connected by message sending, but the tool I have makes me think about objects and what they’re doing. To me, mock objects are a way to subvert the tool and force it to let me focus on the ma.
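
As a concrete sketch of what that looks like in OCUnit, here’s a hand-rolled mock standing in for a collaborator that doesn’t exist yet, so the test can pin down the message I want to send. (Every name here is hypothetical; none of this is code from BrowseOverflow or any real project.)

#import <SenTestingKit/SenTestingKit.h>

// The message I want to send, designed before any real fetcher exists.
@protocol QuestionFetcher <NSObject>
- (void)fetchQuestionWithID:(NSInteger)questionID;
@end

// The mock: it records the message so the test can inspect the conversation.
@interface MockQuestionFetcher : NSObject <QuestionFetcher>
@property (nonatomic, assign) NSInteger requestedID;
@end

@implementation MockQuestionFetcher
@synthesize requestedID;
- (void)fetchQuestionWithID:(NSInteger)questionID {
    self.requestedID = questionID;
}
@end

// The object under test only needs something that speaks the protocol.
@interface QuestionListPresenter : NSObject
@property (nonatomic, strong) id <QuestionFetcher> fetcher;
- (void)userSelectedQuestionWithID:(NSInteger)questionID;
@end

@implementation QuestionListPresenter
@synthesize fetcher;
- (void)userSelectedQuestionWithID:(NSInteger)questionID {
    [self.fetcher fetchQuestionWithID: questionID];
}
@end

@interface MessageDesignTests : SenTestCase
@end

@implementation MessageDesignTests
- (void)testSelectingAQuestionSendsTheFetchMessage {
    QuestionListPresenter *presenter = [[QuestionListPresenter alloc] init];
    MockQuestionFetcher *fetcher = [[MockQuestionFetcher alloc] init];
    presenter.fetcher = fetcher;
    [presenter userSelectedQuestionWithID: 1234];
    STAssertEquals(fetcher.requestedID, (NSInteger)1234,
                   @"The presenter should ask its fetcher for the selected question");
}
@end

The interesting output of that test isn’t the mock itself; it’s the QuestionFetcher protocol, the shape of the message that passes between the two objects.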

An apology to readers of Test-Driven iOS Development

I made a mistake. Not a typo or a bug in some pasted code (actually I’ve made some of those, too). I perpetuated what seems (now that I analyse it) to be a big myth in software engineering. I uncritically quoted some work without checking its authority, and I now find it lacking. As an author, it’s my responsibility to ensure that what I write represents the best of my knowledge and ability, so that you can take what I write and use it to be better at this than I am.

In propagating incorrect information, I’ve let you down. I apologise unreservedly. It’s my fervent hope that this post explains what I got wrong, why it’s wrong, and what we can all learn from this.

The mistake

There are two levels to this. The problem lies in Table 1.1 at the top of page 4 (in the print edition) of Test-Driven iOS Development. Here it is:

Table 1-1 from TDiOSD

As explained in the book, this table is reproduced from an earlier publication:

Table 1.1, reproduced from Code Complete, 2nd Edition, by Steve McConnell (Microsoft Press, 2004), shows the results of a survey that evaluated the cost of fixing a bug as a function of the time it lay “dormant” in the product. The table shows that fixing bugs at the end of a project is the most expensive way to work, which makes sense…

The first mistake I made was simply that I seem to have made up the bit about it being the result of a survey. I don’t know where I got that from. In McConnell, it’s titled “Average Cost of Fixing Defects Based on When They’re Introduced and Detected” (Table 3-1, at the top of page 30). It’s introduced in a paragraph marked “HARD DATA”, and is accompanied by an impressive list of references in the table footer. McConnell:

The data in Table 3-1 shows that, for example, an architecture defect that costs $1000 to fix when the architecture is being created can cost $15,000 to fix during system test.

As already covered, the first problem is that I misattributed the data in the table. The second problem, and the one where I feel I’ve let my readers down the most by not catching it, is that the table is completely false.

Examining the data

I was first alerted to the possibility that something was fishy with these data by the book The Leprechauns of Software Engineering by Laurent Bossavit. His Appendix B gives broad coverage of the literature that claims to report the exponential increase in the cost of fixing bugs.

It was this that got me worried, but I thought that deciding the table was broken on the basis of another book would be just as bad as relying on it because it appeared in CC2E in the first place. I therefore set myself the following question:

Is it possible to go from McConnell’s Table 3-1 to data that can be used to reconstruct the table?

My results

The first reference is Design and Code Inspections to Reduce Errors in Program Development. In a perennial problem for practicing software engineers, I can’t read this paper: I subscribe to the IEEE Xplore library, but they still want more cash to read this title. Laurent Bossavit, author of the aforementioned Leprechauns book, pointed out that the IEEE often charge for papers that are available for free elsewhere, and that this is the case with this paper (download link).

The paper anecdotally mentions a 10-100x factor as a result of “the old way” of working (i.e. without code inspections). The study itself looks at the amount of time saved by adding code reviews to projects that hitherto didn’t do code reviews. Even if it did have numbers that correspond to this table, I’d find it hard to say that the study has relevance to modern software practice: it is based on a process in which the code was designed so that each “statement” in the design document corresponded to 3-10 estimated code statements, and all of the code was written in the PL/S language before a compiler pass was attempted. In such a scheme, even a missing punctuation symbol is a defect that would need to be detected and reworked, not picked up by an IDE while you’re typing.

The next reference I looked at was Boehm and Turner’s “Balancing Agility and Discipline”. McConnell doesn’t tell us where in this book he was looking, and it’s a big book. Appendix E has a lot of secondary citations supporting “The steep version of the cost-of-change curve”, but quotes figures from “100:1” to “5:1” comparing “requirements phase” changes with “post-delivery” changes. All defect fixes are changes, but not all changes are defect fixes, so these numbers can’t be used to build Table 3-1.

The graphs shown from studies of agile Java projects have “stories” on the x-axis and “effort to change” in person-hours on the y-axis; again, not about defects. These numbers are inconsistent with the table in McConnell anyway. As we shall see later, Boehm has trouble getting his data to fit agile projects.

“Software Defect Removal” by Dunn is another book, which I couldn’t find.

“Software Process Improvement at Hughes Aircraft” (Humphrey, Snyder, and Willis 1991): the only reference to cost here is a paragraph on “cost/performance index” on page 22. The authors say (with no supporting data; the report is based on a confidential study) that software projects at Hughes were 6% over budget in 1987 and 3% over budget in 1990. There’s no indication of whether this was related to the cost of fixing defects, or to the “spread” of defect discovery/fix phases. In other words, this reference is irrelevant to constructing Table 3-1.

The other report from Hughes Aircraft is “Hughes Aircraft’s Widespread Deployment of a Continuously Improving So” by Ron R. Willis, Robert M. Rova et al. This is the first reference I found to contain useful data! The report looks at 38,000 bugs: the work of nearly 5,000 developers over dozens of projects, so this could even be significant data.

It’s a big report, but Figure 25 is the part we need. It’s a set of tables that relate the fix time (in person-days) of defects, comparing the phase in which they’re fixed with the phase in which they’re detected.

Unfortunately, this isn’t the same as comparing the phase they’re discovered with the phase they’re introduced. One of the three tables (the front one, which obscures parts of the other two) looks at “in-phase” bugs: ones that were addressed with no latency. Wide differences in the numbers in this table (0.36 days to fix a defect in requirements analysis vs 2.00 days to fix a defect in functional test) make me question the presentation of Table 3-1: why put “1” in all of the “in-phase” entries in that table?

Using these numbers, and a little bit of guesswork about how to map the headings in this figure to Table 3-1, I was able to use this reference to construct a table like Table 3-1. Unfortunately, my “table like Table 3-1” was nothing like Table 3-1. Far from showing an incremental increase in bug cost with latency, the data look like a mountain range. In almost all rows the relative cost of a fix in System Test is greater than in maintenance.

I then looked at “Calculating the Return on Investment from More Effective Requirements Management” by Leffingwell. I have to hope that this 2003 webpage is a reproduction of the article cited by McConnell as a 1997 publication, as I couldn’t find a paper of that date.

This reference contains no primary data, but refers to “classic” studies in the field:

Studies performed at GTE, TRW, and IBM measured and assigned costs to errors occurring at various phases of the project life-cycle. These statistics were confirmed in later studies. Although these studies were run independently, they all reached roughly the same conclusion: If a unit cost of one is assigned to the effort required to detect and repair an error during the coding stage, then the cost to detect and repair an error during the requirements stage is between five to ten times less. Furthermore, the cost to detect and repair an error during the maintenance stage is twenty times more.

These numbers are promisingly similar to McConnell’s, although he talks about the cost to “fix” defects while Leffingwell talks about “detecting and repairing” errors. Are these the same things? Was the testing cost included in McConnell’s table or not? How was it treated? Is the cost of assigning a tester to a project amortised over all bugs, or did testers fill in time sheets explaining how long they spent discovering each issue?

Unfortunately Leffingwell himself is already relying on secondary citation: the reference for “Studies performed at GTE…” is a 520-page book, long out of print, called “Software Requirements—Objects Functions and States”. We’re still some degrees away from actual numbers. Worse, the citation at “confirmed in later studies” is to Understanding and controlling software costs by Boehm and Papaccio, which gets its numbers from the same studies at GTE, TRW and IBM! Leffingwell is bolstering the veracity of one set of numbers by using the same set of numbers.

A further reference in McConnell, “An Economic Release Decision Model”, is to the proceedings of a 1999 conference on Applications of Software Measurement. If these proceedings have actually been published anywhere, I can’t find them: the one URL I discovered led to a “cybersquatter” domain. I was privately sent the PowerPoint slides that comprise this citation. It’s a summary of the then-retired Grady’s experiences with software testing, and again contains no primary data or even verifiable secondary citations. Bossavit describes a separate problem where one of the graphs in this presentation is consistently misattributed and misapplied.

The final reference provided by McConnell is What We Have Learned About Fighting Defects, a non-systematic literature review carried out in an online “e-Workshop” in 2002.

Section 3.1 of the report is “the effort to find and fix”. The 100:1 figure is supported by “general data” which are not presented and not cited. Actual cited figures are 117:1 and 135:1 for “severe” defects from two individual studies, and 2:1 for “non-severe” defects (a small collection of results).

The report concludes:

“A 100:1 increase in effort from early phases to post-delivery was a usable heuristic for severe defects, but for non-severe defects the effort increase was not nearly as large. However, this heuristic is appropriate only for certain development models with a clearly defined release point; research has not yet targeted new paradigms such as extreme programming (XP), which has no meaningful distinction between “early” and “late” development phases.”

A “usable heuristic” is not the verification I was looking for – especially one that’s only useful when practising software development in a way that most of my readers wouldn’t recognise.

Conclusion

If there is real data behind Table 3-1, I couldn’t find it. It was unprofessional of me to incorporate the table into my own book, thereby claiming it to be authoritative, without validating it; my attempts to validate it since have come up wanting. I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering, I apologise for including it, and I resolve to critically appraise material I use in my work in the future.

BrowseOverflow as a Code Kata

This article was originally posted over at InformIT.

My goal in writing Test-Driven iOS Development was to take readers from not knowing how to write a test for their iOS apps, to understanding the TDD workflow and how it could work for them. That mirrored the journey that I had taken in learning about test-driven development, and that had led me to wanting to write a book to share what I’d learned with my peers.

This has an interesting effect on the structure of the book. Not all of the sample code from the BrowseOverflow app is shown (though it’s all available on GitHub). This isn’t an accident of the editorial process; it’s a feature: for any test shown in the book, there are numerous different ways you could write application code that would all be just as good. Anything that causes the test to pass, and doesn’t make the other tests fail, is fine.
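
As a tiny, hypothetical illustration (this is not the book’s actual Topic class or its real tests, just the shape of the idea), here is one test and two implementations that both satisfy it:

#import <SenTestingKit/SenTestingKit.h>

// Hypothetical, simplified interface.
@interface Topic : NSObject
- (id)initWithName:(NSString *)aName;
- (NSString *)name;
@end

@interface TopicTests : SenTestCase
@end

@implementation TopicTests
- (void)testTopicRemembersItsName {
    Topic *topic = [[Topic alloc] initWithName: @"iPhone"];
    STAssertEqualObjects([topic name], @"iPhone",
                         @"A topic should report the name it was created with");
}
@end

// One Topic that passes: keep the name in an instance variable.
@implementation Topic {
    NSString *topicName;
}
- (id)initWithName:(NSString *)aName {
    if ((self = [super init])) {
        topicName = [aName copy];
    }
    return self;
}
- (NSString *)name { return topicName; }
@end

// A different Topic that passes the same test (only one of these
// implementations would exist in any given BrowseOverflow):
//
// @implementation Topic {
//     NSDictionary *attributes;
// }
// - (id)initWithName:(NSString *)aName {
//     if ((self = [super init])) {
//         attributes = [NSDictionary dictionaryWithObject: [aName copy]
//                                                  forKey: @"name"];
//     }
//     return self;
// }
// - (NSString *)name { return [attributes objectForKey: @"name"]; }
// @end

The test doesn’t care which one you write; that freedom is what makes the exercise repeatable.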

Just as the many-worlds model in quantum theory says that there are many branches of the universe that are created every time a decision is made, so there’s a “many-BrowseOverflows” model of Test-Driven iOS Development. Every time a test is written, there are many different solutions to passing the test, leading to there being multiple potential BrowseOverflows. The code that you see on GitHub is just one possible BrowseOverflow, but any other code that satisfies the same tests is one of the other possible BrowseOverflows.

This means that you can treat the book like a kata: the Japanese martial art technique of improving a practice by repeating it over and over. The first time you read Test-Driven iOS Development, you can choose to follow the example code very closely. Where the code isn’t given in the book, you might choose to look at the source code to understand how I solved the problems posed by the tests. But the end of the book is not the end of the journey: you can go back, taking the tests but implementing all of the app code yourself. You can do this as many times as you want, trying to find new ways to write code that produces a BrowseOverflow app.

Finally, when you’re more confident with the red-green-refactor way of working, you can write a BrowseOverflow app that’s entirely your own creation. You can define what the app should do, create tests that express those requirements and write the code to implement it however you like. This is a great way to test out new ways of working, for example different test frameworks like GHUnit or CATCH. It also lets you try out different ways of writing the application code: you could write the same app but trying to use more blocks, or try to use the smallest number of properties in any class, or any other challenge you want to set yourself. Because you know what the software should be capable of, you’re free to focus on whatever skill you’re trying to exercise.

The debugger of royalty

We’ve all got little libraries of code or scripts that help us with debugging. Often these are for logging information in a particular way, or wrapping logs/tests such that they’re only invoked in Debug builds but not in production. Or they clean up your IDE’s brainfarts.
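
To give one representative sketch (the specifics are assumptions rather than code from any particular project; this is just the common pattern), such a debug-only logging wrapper might look like this, assuming the DEBUG flag is defined only in Debug build configurations:

#ifdef DEBUG
// In Debug builds, log with some useful context prepended.
#define MyCompanyDebugLog(fmt, ...) NSLog((@"%s [line %d] " fmt), __PRETTY_FUNCTION__, __LINE__, ##__VA_ARGS__)
#else
// In Release builds, compile away to nothing.
#define MyCompanyDebugLog(fmt, ...) do {} while (0)
#endif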

Having created these debug libraries, how are you going to get your production code to use them? Are you really going to sprinkle MyCompanyDebugLog(fmt,…) messages throughout your app?

Introducing step one on the road to sanity: the debug proxy. This is useful when you want to find out how a particular class gets used, e.g. when it provides callbacks that will be invoked by a framework. You can intercept all the messages to the object, and inspect them as you see fit. Here’s the code (written with the assumption that ARC is enabled):

FZADebugProxy.h

#import <Foundation/Foundation.h>

@interface FZADebugProxy : NSProxy

- (id)initWithTarget: (NSObject *)aTarget;

@end

FZADebugProxy.m

#import "FZADebugProxy.h"

@implementation FZADebugProxy {
    NSObject *target;
}

// Note: no call to [super init] here. NSProxy doesn't implement -init, so
// there's nothing to call; the proxy simply remembers its target.
- (id)initWithTarget:(NSObject *)aTarget {
    target = aTarget;
    return self;
}

// Borrow the target's method signatures so that the runtime can build a
// correctly-typed NSInvocation for any message the target understands.
- (NSMethodSignature *)methodSignatureForSelector:(SEL)sel {
    NSMethodSignature *signature = [target methodSignatureForSelector: sel];
    if (signature == nil) {
        signature = [super methodSignatureForSelector: sel];
    }
    return signature;
}

- (BOOL)respondsToSelector:(SEL)aSelector {
    return [target respondsToSelector: aSelector] ? YES : [super respondsToSelector: aSelector];
}

// Every message sent to the proxy ends up here; forward it unchanged to the
// target. The selector is pulled out into a local variable purely so that a
// breakpoint in this method can inspect it easily (see below).
- (void)forwardInvocation:(NSInvocation *)invocation {
    invocation.target = target;
    SEL aSelector = [invocation selector];
    (void)aSelector;
    [invocation invoke];
}

@end

And no, there isn’t a bug in the -initWithTarget: method: NSProxy doesn’t implement -init, so there’s no [super init] to call. The slightly clumsy extraction of the selector in -forwardInvocation: is done to avoid a common problem with using Objective-C inside the debugger, where it decides it doesn’t know the return type of objc_msgSend() and refuses to call the method.

You would use it like this. Here, I’ve modified the app delegate from BrowseOverflow to use a proxy object for the object configuration – a sort of domain-specific IoC container.

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    BrowseOverflowViewController *firstViewController = [[BrowseOverflowViewController alloc] initWithNibName: nil bundle: nil];
    // The important change: wrap the real object configuration in the debug proxy.
    firstViewController.objectConfiguration = (BrowseOverflowObjectConfiguration *)[[FZADebugProxy alloc] initWithTarget: [[BrowseOverflowObjectConfiguration alloc] init]];
    TopicTableDataSource *dataSource = [[TopicTableDataSource alloc] init];
    [dataSource setTopics: [self topics]];
    firstViewController.dataSource = dataSource;
    self.navigationController.viewControllers = [NSArray arrayWithObject: firstViewController];
    self.window.rootViewController = self.navigationController;
    [self.window makeKeyAndVisible];
    return YES;
}

The line that assigns firstViewController.objectConfiguration is the important change. The cast silences the compiler’s strict type-checking when it comes to property assignment, because it doesn’t believe that the NSProxy subclass is of the correct type. Remember this is only debug code that you’re not going to commit: you could switch to a plain old setter, suppress the warning using diagnostic pragmas, or do whatever you want here.
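
If you’d prefer to drop the cast and silence only this one assignment, a diagnostic pragma is one option. A sketch (the exact warning group is an assumption; use whichever name appears in the warning Xcode actually emits):

    // Hypothetical alternative to the cast: suppress this assignment's warning only.
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wincompatible-pointer-types"
    firstViewController.objectConfiguration = [[FZADebugProxy alloc] initWithTarget: [[BrowseOverflowObjectConfiguration alloc] init]];
#pragma clang diagnostic pop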

At this point, it’s worth running the unit tests and using the app to convince yourself that the behaviour hasn’t changed at all.

So, how do you use it? Shouldn’t there be an NSLog() or something in the proxy class so you can see when the target’s messaged?

No.

Step two on the road to sanity is to avoid printf()-based debugging in all of its forms. What you want to do here is to use Xcode’s debugger actions so that you don’t hard-code your debugging inspection capabilities into your source code.

Set a breakpoint in -[FZADebugProxy forwardInvocation:]. This breakpoint will be hit whenever the target object is messaged. Now right-click on the breakpoint marker in the Xcode source editor’s gutter and choose “Edit Breakpoint…” to bring up this popover.

Xcode's breakpoint popover.
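
If the screenshot doesn’t come through, one way to get the selector logged (an assumption about the specifics, not a transcription of the popover) is a “Debugger Command” action along the lines of:

    print (char *)aSelector

This works because, in practice, a SEL points at the selector’s name, and -forwardInvocation: deliberately parks the selector in the local variable aSelector for exactly this purpose.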

In this case, I’ve set the breakpoint to log the selector that was invoked, and crucially to continue after evaluation so that my app doesn’t stop in the debugger every time the target object is messaged. After a bit of a play with the app in the simulator, the debug log looks like this:

GNU gdb 6.3.50-20050815 (Apple version gdb-1752) (Sat Jan 28 03:02:46 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin".sharedlibrary apply-load-rules all
Attaching to process 3898.
Pending breakpoint 1 - ""FZADebugProxy.m":37" resolved
Current language:  auto; currently objective-c
0x10271:	 "stackOverflowManager"
0x102a0:	 "avatarStore"
0x10271:	 "stackOverflowManager"
0x10271:	 "stackOverflowManager"
0x102a0:	 "avatarStore"

Pretty nifty, yes? You can do a lot with Xcode’s breakpoint actions: running a shell script or an AppleScript are interesting options (you could have Xcode send you an iMessage every time it sends your target an Objective-C message). Speaking out the names of selectors is fun for a very short while, but not overly useful.

Xcode’s breakpoint actions give you a much more powerful debugging capability than NSLog(). By using breakpoint actions on an Objective-C proxy object, you can create highly customisable aspect-oriented techniques for debugging your code.

Test-Driven iOS Development

Here it is, after more than a year in the making, the book that they really did want you to read! Test-Driven iOS Development (Developer’s Library) (affiliate link) has finally hit the stores[*].

I wrote this book for the simple reason that it didn’t exist. As with Professional Cocoa Application Security (Wrox Professional Guides) (another affiliate link), I knew that the topic was something many people were interested in, myself included. I wanted a single place I could go to find out about using Xcode for writing unit tests, and about how the Test-Driven Development approach could be applied to writing Cocoa code in Objective-C.

Well, it turned out that this place didn’t exist, so I tried my best to create it. In the book you’ll see how the BrowseOverflow app was built entirely in a test-first fashion, using OCUnit and Xcode 4. Along the way, I hope you’ll see how TDD can be useful for your own projects.

If you’re interested in finding out more about test-driven development for iOS apps, I’ll be giving a talk on the subject at iOSDev UK in Aberystwyth in July. I’m happy to field questions about the book or about TDD here or over on Twitter, and of course my unit testing video training course is still available from iDeveloper.tv.

Happy testing!

[*]The availability appears to vary by country and format, so if it isn’t in your favourite (e)bookstore yet just be patient and keep refreshing that page like it’s the WWDC site.