I gave a talk to my team at ARM today on Working Effectively with Legacy Code by Michael Feathers. Here are some notes I made in preparation, which are somewhat related to the talk I gave.
This may be the most important book a software developer can
read. Why? Because if you don’t, then you’re part of the problem.
It’s obviously a lot easier and a lot more enjoyable to work on
greenfield projects all the time. You get to choose this week’s
favourite technologies and tools, put things together in the ways that
suit you now, and make progress because, well anything is progress
when there’s nothing there already. But throwing away an existing
system and starting from scratch makes it easy to throw away the
lessons learned in developing that system. It may be ugly, and patched
up all over the place, but that’s because each of those patches was
needed. They each represent something we learned about the product
after we thought we were done.
The new system is much more likely to look good from the developer’s
perspective, but what about the users’? Do they want to pay again
for development of a new system when they already have one that mostly
works? Do they want to learn again how to use the software? We have
this strange introspective notion that professionalism in software
development means things that make code look good to other coders:
Clean Code, “well-crafted” code. But we should also have some
responsibility to those people who depend on us and who pay our way,
and that might mean taking the decision to fix the mostly-working
A digression: Lehman’s Laws
Manny Lehman identified three different categories of software system:
those that are exactly specified, those that implement
well-understood procedures, and those that are influenced by the
environment in which they run. Most software (including ours) comes
into that last category, and as the environment changes so must the
software, even if there were no (known) problems with it at an earlier
point in its evolution.
Laws governing the evolution of software systems,
which govern how the requirements for new development are in conflict
with the forces that slow down maintenance of existing systems. I’ll
not reproduce the full list here, but for example on the one hand the
functionality of the system must grow over time to provide user
satisfaction, while at the same time the complexity will increase and
perceived quality will decline unless it is actively maintained.
Michael Feather’s definition of legacy code is code without tests. I’m
going to be a bit picky here: rather than saying that legacy code is
code with no tests, I’m going to say that it’s code with
insufficient tests. If I make a change, can I be confident that I’ll
discover the ramifications of that change?
If not, then it’ll slow me down. I even sometimes discard changes
entirely, because I decide the cost of working out whether my change
has broken anything outweighs the interest I have in seeing the change
make it into the codebase.
Feathers refers to the tests as a “software vice”. They clamp the
software into place, so that you can have more control when you’re
working on it. Tests aren’t the only tools that do this: assertions
(and particularly Design by Contract) also help pin down the software.
How do I test untested code?
The apparent way forward then when dealing with legacy code is to
understand its behaviour and encapsulate that in a collection of unit
tests. Unfortunately, it’s likely to be difficult to write unit tests
for legacy code, because it’s all tightly coupled, has weird and
unexpected dependencies, and is hard to understand. So there’s a
catch-22: I need to make tests before I make changes, but I need to
make changes before I can make tests.
Almost the entire book is about resolving that dilemma, and contains a
collection of patterns and techniques to help you make low-risk
changes to make the code more testable, so you can introduce the tests
that will help you make the high-risk changes. His algorithm is:
- identify the “change points”, the things that need modifying to
make the change you have to make.
- find the “test points”, the places around the change points where
you need to add tests.
- break dependencies.
- write the tests.
- make the changes.
The overarching model for breaking dependencies is the “seam”. It’s a
place where you can change the behaviour of some code you want to
test, without having to change the code under test itself. Some examples:
- you could introduce a constructor argument to inject an object
rather than using a global variable
- you could add a layer of indirection between a method and a
framework class it uses, to replace that framework class with a
- you could use the C preprocessor to redefine a function call to use
a different function
- you can break an uncohesive class into two classes that collaborate
over an interface, to replace one of the classes in your tests
Understanding the code
The important point is that whatever you, or someone else, thinks
the behaviour of the code should be, actually your customers have paid
for the behaviour that’s actually there and so that (modulo bugs) is
the thing you should preserve.
The book contains techniques to help you understand the existing code
so that you can get those tests written in the first place, and even
find the change points. Scratch refactoring is one technique: look
at the code, change it, move bits out that you think represent
cohesive functions, delete code that’s probably unused, make notes in
comments…then just discard all of those changes. This is like Fred
Brooks’s recommendation to “plan to throw one away”, you can take what
you learned from those notes and refactorings and go in again with a
more structured approach.
Sketching is another technique recommended in the book. You can draw
diagrams of how different modules or objects collaborate, and
particularly draw networks of what parts of the system will be
affected by changes in the part you’re looking at.