More on UIAutomation tests

Update

The information below is mostly redundant. After filing a bug report with Apple, their engineers determined that the Xcode-detected set of macro actions (find a text field, double click, enter text) weren’t working because the double click action wasn’t editing the text field. It is possible to use UIAutomation Tests, you just have to carefully review the UI actions and determine that they have the effect expected, particularly after letting Xcode record UI macros.

Original Post

Unfortunately my work to organise UIAutomation tests has hit the stumbling block that the UI Automation runner doesn’t use the main thread for main-thread-only APIs.

In Xcode 9 and High Sierra, the authors of that post I just linked found that it was possible to turn off the main thread checker in the Test configuration of the build scheme and get working tests anyway. Unfortunately that doesn’t work for me in Xcode 10 and Mojave: the main thread checker isn’t killing the app: the TSM subsystem is just refusing to do its thing. So my tests can’t do straightforward things like write text into a text field. Unfortunately this is a “it’s not me, it’s you” moment, and I don’t think I can carry on using Xcode’s UI tests for my goals.

However, I still want to be able to write “end-to-end” level tests to drive my development. I have (at least) three ways to proceed:

  • I could find a third party library and discover whether it has the main thread problem. Calabash doesn’t support Mac apps, and the other examples I can find (Cucumberish and TABTestKit) both rely on UI Automation so presumably don’t address the main thread problem.
  • I could write the tests in AppleScript. That would be a good way to build up the AppleScript UI for the app, but it doesn’t represent an end-to-end test of the GUI.
  • I could write the tests using NSApplication.sendEvent(_ event:) to simulate clicks, scrolls and text entry, and use the unit test runner to host them. That could work, but I have my doubts (I would guess that the runner is synchronous and stalls the main thread).

I discovered that it is possible to write the test “at the UI level” but in the unit runner, using a combination of key events and AppKit API like sendAction( to:). The trade-offs of this approach:

  • it takes longer, as the abstractions needed to easily find and use AppKit controls don’t (currently) exist
  • it doesn’t use the Accessibility interface so isn’t an accessibility audit at the same time as a correctness test
  • you don’t hit the same problems as with the UI Automation runner
  • it’s much faster

This may be the best approach for now, though I’d welcome other views.

given-when-then in XCTest

I started writing a new Mac app, and I started doing it by driving the implementation through Xcode UI Automation tests. But then it turned out I was driving the test infrastructure as much as the tests, and it’s that I want to talk about.

Given, When, Then

My (complete, Xcode UI Automation) test looks like this:

func testAddingANoteResultsInANoteBeingAdded() {
    given("An empty notebook")
    when("I add a note to the notebook")
    then("There is a note in the notebook")
}

The test case class has an object called a World, which holds, well, the test’s world. There are two parts to this.

The World holds regular expressions associated with blocks, where each block does some part of the test if its associated regular expression matched the description of the test. As an example, my test fixture sets up this association:

try world.then(matchingExpectation: "^There is a note in the notebook$",
                work: { _, world in
                guard let notebook:LabraryNotebook
                  = world.getFromState("TheNotebook") as? LabraryNotebook else {
                    XCTFail("No notebook to test")
                    return
                }
                XCTAssertEqual(notebook.countOfNotes(), 1,
                  "There should be one row in the notes table")
})

We’ll get back to how that block is implemented later. For the moment, I want to make it clear that this is a way to organise a UI test (or, indeed, any other functional test) using XCTest: it is not a new test framework. The test case class still subclasses XCTestCase, and assertions are still made with the XCTAssert* macros/functions. That’s just all wrapped up in this given/when/then structure.

Let’s look at the block’s two parameters: the first is an array of the regular expression’s capture groups so that you can find out information about the test specification, should you want.

The other argument is a reference to the World, which enables the second feature of the World: as state storage so that each part of the test can communicate with later parts. Notice that the when clause in my test says it adds a note to “the notebook”, and the then clause checks that there is a note in “the notebook”. How do they both use the same notebook object? The when clause stores it on the World using world.storeInState(), and the then clause retrieves it with world.getFromState().

Page Objects

Rather than putting XCUIElement goop directly in my test blocks, I use an abstraction called the Page Object pattern, popular among people writing browser tests in Selenium. This puts an adapter between my tests and my UI controls, so the test says (for example) app.newDocument() and the Application page object knows that that means finding the “File” menu, clicking it, then clicking the “New” menu item.

The way to create a new document in a Cocoa app has not changed since 1987 and may not change soon. But the details of my own UI surely will, and will change at a different rate than the goals of the people using it. While someone may want to add a note for the rest of time, there may not always be an “Add Note” button. So my test can continue to say:

when("I add a note to the notebook")

but the page object for a document can change from:

func addANote() {
    let app = XCUIApplication()
    let window = app.windows[documentName]
    let control = window.buttons["Add Note"]
    control.click()
}

to whatever will find and drive the interface in my redesigned application.

Would you like this?

I’m happy to package the given/when/then organisation up and release it under an open source licence so that you can use it in your own apps. As I’ve only just written the code, I’ve yet to do that, but it’s coming! I’m aware that there are multiple ways of getting/using Swift libraries, so if you’re interested please let me know whether you would expect to use an Xcode project that builds a framework, a Swift PM package, a CocoaPod or a Carthage…cart… so I can support you using the software in your way.

To become a beginner, first become an expert

We have a whole load of practices in programming that only really work well if you’re already good at whatever the process is supposed to help with.

Scrum is a process improvement framework, but only if you already know how to do process improvement. If you don’t, then Scrum is just the baseline mini-waterfall process with a chance to air your dirty laundry every fortnight.

Agile is good at helping you embrace change, but only if you’re already good enough at managing change to understand which changes should be embraced.

#NoEstimates helps you avoid the overhead of estimates, but only if you’re already good enough at estimates to know that you always write user stories that take 0.5-2 days to implement.

TDD helps you design your APIs, but only if you’re already good enough at API design to understand things like dependency injection and loose coupling.

Microservices help you isolate modules, but only if you’re already good enough at modularity not to get swamped in HTTP calls.

This is all very well for selling consultancy (“if your [agile] isn’t working, then you aren’t [agiling] hard enough, let me [agile] you some more”) but where’s the on-ramp?

Working Effectively with Legacy Code

I gave a talk to my team at ARM today on Working Effectively with Legacy Code by Michael Feathers. Here are some notes I made in preparation, which are somewhat related to the talk I gave.

This may be the most important book a software developer can
read. Why? Because if you don’t, then you’re part of the problem.

It’s obviously a lot easier and a lot more enjoyable to work on
greenfield projects all the time. You get to choose this week’s
favourite technologies and tools, put things together in the ways that
suit you now, and make progress because, well anything is progress
when there’s nothing there already. But throwing away an existing
system and starting from scratch makes it easy to throw away the
lessons learned in developing that system. It may be ugly, and patched
up all over the place, but that’s because each of those patches was
needed. They each represent something we learned about the product
after we thought we were done.

The new system is much more likely to look good from the developer’s
perspective
, but what about the users’? Do they want to pay again
for development of a new system when they already have one that mostly
works? Do they want to learn again how to use the software? We have
this strange introspective notion that professionalism in software
development means things that make code look good to other coders:
Clean Code, “well-crafted” code. But we should also have some
responsibility to those people who depend on us and who pay our way,
and that might mean taking the decision to fix the mostly-working
thing.

A digression: Lehman’s Laws

Manny Lehman identified three different categories of software system:
those that are exactly specified, those that implement
well-understood procedures, and those that are influenced by the
environment in which they run. Most software (including ours) comes
into that last category, and as the environment changes so must the
software, even if there were no (known) problems with it at an earlier
point in its evolution.

He expressed
Laws governing the evolution of software systems,
which govern how the requirements for new development are in conflict
with the forces that slow down maintenance of existing systems. I’ll
not reproduce the full list here, but for example on the one hand the
functionality of the system must grow over time to provide user
satisfaction, while at the same time the complexity will increase and
perceived quality will decline unless it is actively maintained.

Legacy Code

Michael Feather’s definition of legacy code is code without tests. I’m
going to be a bit picky here: rather than saying that legacy code is
code with no tests, I’m going to say that it’s code with
insufficient tests
. If I make a change, can I be confident that I’ll
discover the ramifications of that change?

If not, then it’ll slow me down. I even sometimes discard changes
entirely, because I decide the cost of working out whether my change
has broken anything outweighs the interest I have in seeing the change
make it into the codebase.

Feathers refers to the tests as a “software vice”. They clamp the
software into place, so that you can have more control when you’re
working on it. Tests aren’t the only tools that do this: assertions
(and particularly Design by Contract) also help pin down the software.

How do I test untested code?

The apparent way forward then when dealing with legacy code is to
understand its behaviour and encapsulate that in a collection of unit
tests. Unfortunately, it’s likely to be difficult to write unit tests
for legacy code, because it’s all tightly coupled, has weird and
unexpected dependencies, and is hard to understand. So there’s a
catch-22: I need to make tests before I make changes, but I need to
make changes before I can make tests.

Seams

Almost the entire book is about resolving that dilemma, and contains a
collection of patterns and techniques to help you make low-risk
changes to make the code more testable, so you can introduce the tests
that will help you make the high-risk changes. His algorithm is:

  1. identify the “change points”, the things that need modifying to
    make the change you have to make.
  2. find the “test points”, the places around the change points where
    you need to add tests.
  3. break dependencies.
  4. write the tests.
  5. make the changes.

The overarching model for breaking dependencies is the “seam”. It’s a
place where you can change the behaviour of some code you want to
test, without having to change the code under test itself. Some examples:

  • you could introduce a constructor argument to inject an object
    rather than using a global variable
  • you could add a layer of indirection between a method and a
    framework class it uses, to replace that framework class with a
    test double
  • you could use the C preprocessor to redefine a function call to use
    a different function
  • you can break an uncohesive class into two classes that collaborate
    over an interface, to replace one of the classes in your tests

Understanding the code

The important point is that whatever you, or someone else, thinks
the behaviour of the code should be, actually your customers have paid
for the behaviour that’s actually there and so that (modulo bugs) is
the thing you should preserve.

The book contains techniques to help you understand the existing code
so that you can get those tests written in the first place, and even
find the change points. Scratch refactoring is one technique: look
at the code, change it, move bits out that you think represent
cohesive functions, delete code that’s probably unused, make notes in
comments…then just discard all of those changes. This is like Fred
Brooks’s recommendation to “plan to throw one away”, you can take what
you learned from those notes and refactorings and go in again with a
more structured approach.

Sketching is another technique recommended in the book. You can draw
diagrams of how different modules or objects collaborate, and
particularly draw networks of what parts of the system will be
affected by changes in the part you’re looking at.

Yes, you may delete tests

A frequently-presented objection to the concept of writing automated tests is that it ossifies the implementation of the system under test. “If I’ve got all the tests you’re proposing,” I hear, “then I won’t be able to make any changes without breaking the tests.” There are two aspects to this problem.

Tests should make assertions about the externally-observable behaviour of the system under test. If you have tests that assert that a particular private method does its thing in a particular way, then yes, you are baking in the assumption that this private method works in a particular way. And yes, that makes it harder to change things. But it isn’t necessary.

Any test that is no longer useful is no longer useful: delete it. Those tests that helped you to build the thing were useful while you were building the thing: if they’re not helping you now then just get rid of them.

When you’re building a component, you’re probably thinking quite deeply about the algorithm it will implement, and how it will communicate with its collaborators. Thinking about those will mean writing tests about them, and whenever those tests are in the suite they’ll require the algorithm be implemented in a particular way, and that the module talk to its collaborators in a particular way. Do you no longer care about those details, as long as it does the right thing? Delete the tests, and just keep the ones that make sure it does the right thing.

Contractually-obligated testing

About a billion years ago, Bertrand Meyer (he of Open-Closed Principle fame) introduced a programming language called Eiffel. It had a feature called Design by Contract, that let you define constraints that your program had to adhere to in execution. Like you can convince C compilers to emit checks for rules like integer underflow everywhere in your code, except you can write your own rules.

To see what that’s like, here’s a little Objective-C (I suppose I could use Eiffel, as Eiffel Studio is in homebrew, but I didn’t). Here’s my untested, un-contractual Objective-C Stack class.

@interface Stack : NSObject

- (void)push:(id)object;
- (id)pop;

@property (nonatomic, readonly) NSInteger count;

@end

static const int kMaximumStackSize = 4;

@implementation Stack
{
    __strong id buffer[4];
    NSInteger _count;
}

- (void)push:(id)object
{
    buffer[_count++] = object;
}

- (id)pop
{
    id object = buffer[--_count];
    buffer[_count] = nil;
    return object;
}

@end

Seems pretty legit. But I’ll write out the contract, the rules to which this class will adhere provided its users do too. Firstly, some invariants: the count will never go below 0 or above the maximum number of objects. Objective-C doesn’t actually have any syntax for this like Eiffel, so this looks just a little bit messy.

@interface Stack : ContractObject

- (void)push:(id)object;
- (id)pop;

@property (nonatomic, readonly) NSInteger count;

@end

static const int kMaximumStackSize = 4;

@implementation Stack
{
    __strong id buffer[4];
    NSInteger _count;
}

- (NSDictionary *)contract
{
    NSPredicate *countBoundaries = [NSPredicate predicateWithFormat: @"count BETWEEN %@",
                                    @[@0, @(kMaximumStackSize)]];
    NSMutableDictionary *contract = [@{@"invariant" : countBoundaries} mutableCopy];
    [contract addEntriesFromDictionary:[super contract]];
    return contract;
}

- (void)in_push:(id)object
{
    buffer[_count++] = object;
}

- (id)in_pop
{
    id object = buffer[--_count];
    buffer[_count] = nil;
    return object;
}

@end

I said the count must never go outside of this range. In fact, the invariant must only hold before and after calls to public methods: it’s allowed to be broken during the execution. If you’re wondering how this interacts with threading: confine ALL the things!. Anyway, let’s see whether the contract is adhered to.

int main(int argc, char *argv[]) {
    @autoreleasepool {
        Stack *stack = [Stack new];
        for (int i = 0; i < 10; i++) {
            [stack push:@(i)];
            NSLog(@"stack size: %ld", (long)[stack count]);
        }
    }
}

2014-08-11 22:41:48.074 ContractStack[2295:507] stack size: 1
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 2
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 3
2014-08-11 22:41:48.076 ContractStack[2295:507] stack size: 4
2014-08-11 22:41:48.076 ContractStack[2295:507] *** Assertion failure in -[Stack forwardInvocation:], ContractStack.m:40
2014-08-11 22:41:48.077 ContractStack[2295:507] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException',
 reason: 'invariant count BETWEEN {0, 4} violated after call to push:'

Erm, oops. OK, this looks pretty useful. I’ll add another clause: the caller isn’t allowed to call -pop unless there are objects on the stack.

- (NSDictionary *)contract
{
    NSPredicate *countBoundaries = [NSPredicate predicateWithFormat: @"count BETWEEN %@",
                                    @[@0, @(kMaximumStackSize)]];
    NSPredicate *containsObjects = [NSPredicate predicateWithFormat: @"count > 0"];
    NSMutableDictionary *contract = [@{@"invariant" : countBoundaries,
             @"pre_pop" : containsObjects} mutableCopy];
    [contract addEntriesFromDictionary:[super contract]];
    return contract;
}

So I’m not allowed to hold it wrong in this way, either?

int main(int argc, char *argv[]) {
    @autoreleasepool {
        Stack *stack = [Stack new];
        id foo = [stack pop];
    }
}

2014-08-11 22:46:12.473 ContractStack[2386:507] *** Assertion failure in -[Stack forwardInvocation:], ContractStack.m:35
2014-08-11 22:46:12.475 ContractStack[2386:507] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException',
 reason: 'precondition count > 0 violated before call to pop'

No, good. Having a contract is a bit like having unit tests, except that the unit tests are always running whenever your object is being used. Try out Eiffel; it’s pleasant to have real syntax for this, though really the Objective-C version isn’t so bad.

Finally, the contract is implemented by some simple message interception (try doing that in your favourite modern programming language of choice, non-Rubyists!).

@interface ContractObject : NSObject
- (NSDictionary *)contract;
@end

static SEL internalSelector(SEL aSelector);

@implementation ContractObject

- (NSDictionary *)contract { return @{}; }

- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector
{
    NSMethodSignature *sig = [super methodSignatureForSelector:aSelector];
    if (!sig) {
        sig = [super methodSignatureForSelector:internalSelector(aSelector)];
    }
    return sig;
}

- (void)forwardInvocation:(NSInvocation *)inv
{
    SEL realSelector = internalSelector([inv selector]);
    if ([self respondsToSelector:realSelector]) {
        NSDictionary *contract = [self contract];
        NSPredicate *alwaysTrue = [NSPredicate predicateWithValue:YES];
        NSString *calledSelectorName = NSStringFromSelector([inv selector]);
        inv.selector = realSelector;
        NSPredicate *invariant = contract[@"invariant"]?:alwaysTrue;
        NSAssert([invariant evaluateWithObject:self],
            @"invariant %@ violated before call to %@", invariant, calledSelectorName);
        NSString *preconditionKey = [@"pre_" stringByAppendingString:calledSelectorName];
        NSPredicate *precondition = contract[preconditionKey]?:alwaysTrue;
        NSAssert([precondition evaluateWithObject:self],
            @"precondition %@ violated before call to %@", precondition, calledSelectorName);
        [inv invoke];
        NSString *postconditionKey = [@"post_" stringByAppendingString:calledSelectorName];
        NSPredicate *postcondition = contract[postconditionKey]?:alwaysTrue;
        NSAssert([postcondition evaluateWithObject:self],
            @"postcondition %@ violated after call to %@", postcondition, calledSelectorName);
        NSAssert([invariant evaluateWithObject:self],
            @"invariant %@ violated after call to %@", invariant, calledSelectorName);
    }
}

@end

SEL internalSelector(SEL aSelector)
{
    return NSSelectorFromString([@"in_" stringByAppendingString:NSStringFromSelector(aSelector)]);
}

Reflections on “Is TDD Dead”

The first thing I noticed that I needed to change as a result of watching the Is TDD Dead? series is that I started out with a defensive mindset. If I believe in the dogma of a rule, then presumably I’m still a beginner who hasn’t yet learned that there are other rules, each of which has limited applicability. So lesson one is to practise more, to find the limitations of the techniques I use, and to understand what to do when those limits are met.

An unsurprising aspect of the discussion was that the thing I call TDD is not the thing that others call TDD. At some point recently I got converted to the Mockist school. Part of me (the part that doesn’t value my social life) is inclined to write a second edition of test-driven iOS development but then keep both editions available, because the style will have changed so much between the two.

Anyway it is more germane to note that none of the three speakers has a lot of time for mocks. I need to understand this, the experiences they have had that I have not, and the things that I do not know about mocks. Perhaps they view the sorts of tests that I build as the structural tests that they would delete once the system is working.

Is TDD Dead? My questions

These are my questions for parts 5 and 6 of Is TDD Dead?. I’d like to start by thanking the panellists for publishing their discussions.

TDD the Principle

Kent and Martin, why is it that you practise test-driven development? What do you get from it? David, how do you get those same things?

What could change about the way we write software to make TDD redundant or obsolete for Kent and Martin? What could change about the way that TDD is performed to make it useful or beneficial for David?

TDD the Practise

David, is there a way in which we could retain the test-first idea of TDD, but avoid the design problems you encounter in the production code? Are you able to distil a short list of design guidelines that avoid your “design damage”? Kent and Martin, how could TDDers follow those guidelines in their work?

TDD the Community

How have all three of you felt about the community reaction to this discussion? What has been good? What has been bad? What has been missing? Is there information the community could supply to help developers evaluate and choose or discard TDD for their work that is missing? Are there questions developers should ask before choosing whether to adopt or avoid TDD that you believe are not being raised?

I use mocks and I’m happy with that

Both Kent Beck and Martin Fowler have said that they don’t use mock objects in their test-driven development. I do. I use them mostly for the sense described first in my BNR blog post on Mock Objects, namely to stand in for a thing that can receive messages I want to send, but that does not yet exist.

If you look at the code in Test-Driven iOS Development, you’ll find that it uses plenty of test doubles but none of them is a mock object. What has changed in my worldview to move from not-mocking to mocking in that time?

The key information that gave me the insight was this message, pardon the pun! from Alan Kay on object-oriented programming:

The big idea is “messaging” – that is what the kernal[sic] of Smalltalk/Squeak
is all about (and it’s something that was never quite completed in our
Xerox PARC phase). The Japanese have a small word – ma – for “that which
is in between” – perhaps the nearest English equivalent is “interstitial”.
The key in making great and growable systems is much more to design how its
modules communicate rather than what their internal properties and
behaviors should be.

What I’m really trying to do is to define the network of objects connected by message sending, but the tool I have makes me think about objects and what they’re doing. To me, mock objects are the ability to subvert the tool, and force it to let me focus on the ma.

It’s about solving problems

As ever, there’s a touchstone issue on the programmers’ corner of the intarwebs (the programmers’ corner is actually the same intarwebs everyone else is using, just we model it with geometry so it can have a corner). Here it is:

Alan Kelly predicted that by 2022, TDD will become a prerequisite for employment as a programmer. Let’s leave aside for a moment the issue discussed here and in APPropriate Behaviour, that there is no arbiter of programmer employability.

Responses did not take long in coming. Some TDDers cannot believe that TDD is not already required. Some non-TDDers are incensed that non-TDD is not considered valid. I particularly like this tweet about TDD’s place:

I like to explain that a lot of [TDD is] just a substitute for a decent type system ;).

You know what? That might be true, I’m not sure. Functional programming education has, so far, been to me like a pure map of me onto a slightly older version of me without the side effect of imparting knowledge of functional programming. Bear in mind that I’ve been paid to write LISP in the past, and I struggle with FP. I just can’t get from knowing what functions are to big-picture views of functional programming.

I wouldn’t know the mathematics of a type system if it came up and quacked at me. As far as I know, Hindley is the name of a serial killer and now I’m looking warily at Milner. But I’m fine with that, and I’m fine with you not being fine with that. It’s just that we think about things in different ways.

Other people think about things in other different ways too. And I’m fine with that, too. Let’s look some more at tests.

Many forms of automated test follow a common pattern: Assemble, Act, Assert or Given, When, Then. Let’s talk about Eiffel, a language invented by the only person I know whose opinions have a stronger identity than the person who holds them.

An Eiffel programmer would look at Given and see that this is describing a particular state of the preconditions of the system under test. Or, using theories, a range of preconditions. And they would look at Then, and observe that this is making statements about the postconditions of the system under test.

This Eiffel programmer would probably then ask why you expect them to do all of this, when they already generalised this out in designing the system’s contract. Why write all these tests when the program itself can evaluate its own preconditions and postconditions as it’s chugging along, and you can design to those?

Just as this starts to sound reasonable, a Prolog programmer asks why you’d ever write the When bit at all. Surely you just tell the computer what you’ve got and what you want, and it works out how to get there? (I tried, and it said “NO.” Oh well.)

There are loads of different ways to write software. Many of these are not better nor worse than others. In fact, to even have that discussion you’d need to be able to express and agree upon what “better” means, and I’m not sure we’re there. Some are prevalent because they let people reason about their problems in a way that feels comfortable, or efficient. Others are prevalent because a vendor threw lots of money into marketing. Some of them feel comfortable or efficient because we’ve become used to thinking in those ways, rather than because they were a natural fit onto our ab initio thoughts.

But remember, your goal is not to write software. Your goal is to solve problems, and often software is part of the solution. The tools, practices and principles we have now may be adequate, but they may not be best. They might suit some of us, and not others.

That’s fine. I know how I like to work, and I’m happy to help other people find out about it and try it out. But I’m also happy to find out about and try out other things. I’ll champion what I do, while learning about what others do. And all along I’ll champion the higher-level goal of solving problems, and the professional standard of being able to demonstrate confidence in our solutions.

Maybe I should go and read about type systems now.