Working Effectively with Legacy Code

I gave a talk to my team at ARM today on Working Effectively with Legacy Code by Michael Feathers. Here are some notes I made in preparation, which are somewhat related to the talk I gave.

This may be the most important book a software developer can
read. Why? Because if you don’t, then you’re part of the problem.

It’s obviously a lot easier and a lot more enjoyable to work on
greenfield projects all the time. You get to choose this week’s
favourite technologies and tools, put things together in the ways that
suit you now, and make progress because, well anything is progress
when there’s nothing there already. But throwing away an existing
system and starting from scratch makes it easy to throw away the
lessons learned in developing that system. It may be ugly, and patched
up all over the place, but that’s because each of those patches was
needed. They each represent something we learned about the product
after we thought we were done.

The new system is much more likely to look good from the developer’s
perspective
, but what about the users’? Do they want to pay again
for development of a new system when they already have one that mostly
works? Do they want to learn again how to use the software? We have
this strange introspective notion that professionalism in software
development means things that make code look good to other coders:
Clean Code, “well-crafted” code. But we should also have some
responsibility to those people who depend on us and who pay our way,
and that might mean taking the decision to fix the mostly-working
thing.

A digression: Lehman’s Laws

Manny Lehman identified three different categories of software system:
those that are exactly specified, those that implement
well-understood procedures, and those that are influenced by the
environment in which they run. Most software (including ours) comes
into that last category, and as the environment changes so must the
software, even if there were no (known) problems with it at an earlier
point in its evolution.

He expressed
Laws governing the evolution of software systems,
which govern how the requirements for new development are in conflict
with the forces that slow down maintenance of existing systems. I’ll
not reproduce the full list here, but for example on the one hand the
functionality of the system must grow over time to provide user
satisfaction, while at the same time the complexity will increase and
perceived quality will decline unless it is actively maintained.

Legacy Code

Michael Feather’s definition of legacy code is code without tests. I’m
going to be a bit picky here: rather than saying that legacy code is
code with no tests, I’m going to say that it’s code with
insufficient tests
. If I make a change, can I be confident that I’ll
discover the ramifications of that change?

If not, then it’ll slow me down. I even sometimes discard changes
entirely, because I decide the cost of working out whether my change
has broken anything outweighs the interest I have in seeing the change
make it into the codebase.

Feathers refers to the tests as a “software vice”. They clamp the
software into place, so that you can have more control when you’re
working on it. Tests aren’t the only tools that do this: assertions
(and particularly Design by Contract) also help pin down the software.

How do I test untested code?

The apparent way forward then when dealing with legacy code is to
understand its behaviour and encapsulate that in a collection of unit
tests. Unfortunately, it’s likely to be difficult to write unit tests
for legacy code, because it’s all tightly coupled, has weird and
unexpected dependencies, and is hard to understand. So there’s a
catch-22: I need to make tests before I make changes, but I need to
make changes before I can make tests.

Seams

Almost the entire book is about resolving that dilemma, and contains a
collection of patterns and techniques to help you make low-risk
changes to make the code more testable, so you can introduce the tests
that will help you make the high-risk changes. His algorithm is:

  1. identify the “change points”, the things that need modifying to
    make the change you have to make.
  2. find the “test points”, the places around the change points where
    you need to add tests.
  3. break dependencies.
  4. write the tests.
  5. make the changes.

The overarching model for breaking dependencies is the “seam”. It’s a
place where you can change the behaviour of some code you want to
test, without having to change the code under test itself. Some examples:

  • you could introduce a constructor argument to inject an object
    rather than using a global variable
  • you could add a layer of indirection between a method and a
    framework class it uses, to replace that framework class with a
    test double
  • you could use the C preprocessor to redefine a function call to use
    a different function
  • you can break an uncohesive class into two classes that collaborate
    over an interface, to replace one of the classes in your tests

Understanding the code

The important point is that whatever you, or someone else, thinks
the behaviour of the code should be, actually your customers have paid
for the behaviour that’s actually there and so that (modulo bugs) is
the thing you should preserve.

The book contains techniques to help you understand the existing code
so that you can get those tests written in the first place, and even
find the change points. Scratch refactoring is one technique: look
at the code, change it, move bits out that you think represent
cohesive functions, delete code that’s probably unused, make notes in
comments…then just discard all of those changes. This is like Fred
Brooks’s recommendation to “plan to throw one away”, you can take what
you learned from those notes and refactorings and go in again with a
more structured approach.

Sketching is another technique recommended in the book. You can draw
diagrams of how different modules or objects collaborate, and
particularly draw networks of what parts of the system will be
affected by changes in the part you’re looking at.

The package management paradox

There was no need to build a package management system since CPAN, and yet npm is the best.
Wait, what?

Every time a new programming language or framework is released, people seem to decide that:

  1. It needs its own package manager.

  2. Simple algorithms need to be rewritten from scratch in “pure” $language/framework and distributed as packages in this package manager.

This is not actually true. Many programming languages – particularly many of the trendy ones – have a way to call C functions, and a way to expose their own routines as C functions. Even C++ has this feature. This means that you don’t need any new packaging system, if you can deploy packages that expose C functions (whatever the implementation language) then you can use existing code, and you don’t need to rewrite everything.

So there hasn’t been a need for a packaging system since at least CPAN, maybe earlier.

On the other hand, npm is the best packaging system ever because people actually consume existing code with it. It’s huge, there are tons of libraries, and so people actually think about whether this thing they’re doing needs new code or the adoption of existing code. It’s the realisation of the OO dream, in which folks like Brad Cox said we’d have data sheets of available components and we’d pull the components we need and bind them together in our applications.

Developers who use npm are just gluing components together into applications, and that’s great for software.

Dogmatic paradigmatism

First, you put all of your faith in structured programming, and you got burned. You found it hard to associate the operations in your software with the data upon which they act, and to make sure that the expectations made on the data in one place are satisfied when that data has been modified in that other place, or over there in yet another place. Clearly structured programming is broken.

Then, you put all of your faith in object-oriented programming, and you got burned. You found it hard to follow the flow of a program when it jumps in and out of different classes, and to see which parts were coupled to what. Clearly object-oriented programming is broken.

Then, you put all of your faith in functional programming, and you got burned. You found it hard to represent real business processes in terms of immutable data structures and pure functions, and to express changes to the operating environment without using side effects. Clearly functional programming is broken.

Or maybe it’s you. Maybe, rather than relying on faith to make these conceptual thought frameworks do what you need from them, you could have thought about the concepts.

Layers of Distraction

A discussion I was involved in over on Facebook reminded me of some other issues I’d already drafted for this blog, so I stuck the two together and here we are.

Software systems can often be seen as aggregations of strata, with higher layers making use of the services in the lower layers. You’ll often see a layered architecture diagram looking like a flat and well-organised collection of boiled sweets.

As usual, it’s the interstices rather than the objects themselves that are of interest. Where two layers come together, there’s usually one of a very small number of different transformations taking place. The first is that components above the boundary can express instructions that any computer could run, and they are transformed into instructions suitable for this computer. That’s what the C compiler does, it’s what the x86 processor does (it takes IA-32 instructions, which any computer could run, and turns them into the microcode which it can run), it’s what device drivers do.

The second is that it turns one set of instructions any computer could run into another set that any computer could run. If you promise not to look too closely the Smalltalk virtual machine does this, by turning instructions in the Smalltalk bytecode into instructions in the host machine language.

The third is that it turns a set of computer instructions in a specific domain into the general-purpose instructions that can run on the computer (sometimes this computer, sometimes any computer). A function library turns requests to do particular things into the machine instructions that will do them. A GUI toolkit takes requests to draw buttons and widgets and turns them into requests to draw lines and rectangles. The UNIX shell turns an ordered sequence of suggestions to run programs into the collection of C library calls and machine instructions implied by the sequence.

The fourth is turning a model of a problem I might want solving into a collection of instructions in various computer domains. Domain-specific languages sit here, but usually this transition is handled by expensive humans.

Many transitions can be found in the second and third layers, so that we can turn this computer into any computer, and then build libraries on any computer, then build a virtual machine atop those libraries, then build libraries for the virtual machine, then build again in that virtual machine, then finally put the DOM and JavaScript on top of that creaking mess. Whether we can solve anybody’s problems from the top of the house of cards is a problem to be dealt with later.

You’d hope that from the outside of one boundary, you don’t need to know anything about the inside: you can use the networking library without needing to know what device is doing the networking, you can draw a button without needing to know how the lines get onto the screen, you can use your stock-trading language without needing know what Java byte codes are generated. In other words, both abstractions and refinements do not leak.

As I’ve gone through my computing career, I’ve cared to different extents about different levels of abstraction and refinement. That’s where the Facebook discussion came in: there are many different ways that a Unix system can start up. But when I’m on a desktop computer, I not only don’t care which way the desktop starts up, I don’t want to have to deal with it. Whatever the relative merits of SMF, launchd, SysV init, /etc/rc, SystemStarter, systemd or some other system, the moment I need to even know which is in play is the moment that I no longer want to use this desktop system.

I have books here on processor instruction sets, but the most recent (and indeed numerous) are for the Motorola 68k family. Later than that and I’ll get away with mostly not knowing, looking up the bits I do need ad hoc, and cursing your eyes if your debugger drops me into a disassembly.

So death to the trope that you can’t understand one level of abstraction (or refinement) without understanding the layers below it. That’s only true when the lower layers are broken, though I accept that that is probably the case.

The reasonable effectiveness of developer tools

In goals upon goals upon goals, I suggested that a fixation on developer tools is misplaced. This is not to say that developer tools are unhelpful, nor that they can’t have a significant impact on our work.

Consider the following, over-restricted, definition of what a programmer does:

A programmer’s responsibility is to turn a computer into a solution to somebody’s problem.

We have plenty of tools designed to stop you having to consider the details of this computer when doing that: assemblers, compilers, device drivers, hardware abstraction layers, virtual machines, memory managers and so on. Then we have tools to speed up aspects of working in those abstractions: build systems, IDEs and the like. And tools that help make sure you moved in the correct direction: testing tools, analysers and the like.

Whether we have tools that help you move from an abstract view of your computer to even an abstract view of your problem depends strongly on your problem domain, and the social norms of programmers in that space. Science is fairly well-supplied, for example, with both commercial and open source tools.

But many developers will be less lucky, or less aware of the tools at their disposal. Having been taken from “your computer…” to “any computer…” by any of a near-infinite collection of generic developer tools, they will then get to “…can solve this problem” by building their own representations of the aspects of the problem. In this sense, programming is still done the way we did it in the 1970s, by deciding what our problem is and how we can model bits of it in a computer.

It’s here, in the bit where we try to work out whether we’re building a useful thing that really solves the problems real people really have, that there are still difficulties, unnecessary costs and incidental complexity. Therefore it’s here where judicious selection and use of tools can be of benefit, as their goals support our goals of supporting our users’ goals.

And that’s why I think that developer tools are great, even while warning against fixating upon them. Fixate on the things that need to be done, then discover (or create) tools to make them faster, better and redundant.

Code longevity

I recently wrote about the impending centenary of applied computing; a time when we could reflect on the first hundred years to make it easier for people to progress beyond our position into the second hundred years. This necessitates looking at the things we’ve tried, the things that succeeded and the things that failed. It involves recalling and describing the good ideas and the bad ideas.

So, did the bad ideas fail and the good ideas succeed? Can we declare that because something worked, it must have been a success? Is length of service a great proxy for quality of principle?

Let’s start by looking at the lifetime of some of the trappings of applied computing. I’m writing this on the smartphone shown in the picture below. It is, among the many computers I own that claim to be computers and could reasonably be described as modern, one of only two that is not running a recent variant of a minicomputer game–loading system.

Surface RT and Lumia 925

Now is that a fair assessment? Certainly all the Macs, iOSes, Androids (and even routers and television streamy box things) in the house are based on Unix, and Unix is the thing of the 1970s minicomputer. I’ve even used that idea to explain why we still have to deal with PDP-8 problems in iPhones. But is it fair to assume that because the name has lasted, then the idea has been preserved? Did Unix succeed, or has it been replaced by different things with the same name? That happens a lot; is today’s ethernet really the same ethernet that Bob Metcalfe and colleagues at PARC invented? Conversely, just because the name changed is everything new? Does Windows NT really represent a clean break in 1993?

There’s certainly some core, a kernel (f’nar) of the modern Unix that, whether in code or philosophy, can be traced back to the original system (and indeed beyond). But is that there because it’s still a good idea, or because there’s no impetus to remove it? Or even because it’s a bad idea, but removing it would be expensive?

As we’re already talking about Unix, let’s talk about C. In his talk Null References: The Billion-Dollar Mistake, Tony Hoare describes his own mistake as being the introduction of a null reference. He then says that C’s mistake (C follows Algol in having null references, but it also lacks have subscript bounds checking) is an order of magnitude worse. In fact, Hoare also identified a third problem: he says that it’s a good idea to permit a program failure to be diagnosed just from the error message and the high-level program source text. However, runtime failures in C usually end up with a core dump and/or a stack trace through the instructions of the target machine environment.

We can easily wonder just how much (expensive) programmer time has been lost disassembling stack traces, matching up debugger symbols and interpreting core dumps, but without figures for that I’ll generously assume that it’s an order of magnitude smaller than the losses due to buffer overflows. Now that’s only a tens-of-billions-of-dollars value of mistake, and C is the substrate for trillions of dollars of value of industry. So do we say that on balance, C is 99% a Good Thing™? Is it a bad idea that nonetheless enabled plenty of good ones?

[Incidentally, and without wanting to derail the central thesis of this post, I disagree with Hoare’s numbers. Symantec is merely one of the largest companies in the information security sector, with annual revenue in their most recent report of $6.9B. That’s a small part of the total value sunk into that sector, which I’ll guess has an annual magnitude of multiple tens of billions. A large fraction of the problems addressed by infosec can be attributed to C’s lack of bounds checking, so that there’s probably just an annual impact of around ten billion dollars working on fixing the problem. Assuming those businesses have sustainable revenues over multiple years, the integrated cost is well into the hundreds of billions. That only revises the estimated impact on the C software industry from ‘fractions of a per cent’ to ‘a per cent’ though.]

Perhaps it’s fair to say that C was a good idea when it arose, and that it’s since been found to have deficiencies that haven’t yet become expensive enough to warrant decommissioning it. There’s an assumption of rational action in there that I think it’s fair to question, though: am I assuming that C is not worth replacing just because it has not been replaced? Might there actually be other factors involved?

Yes, there might. It’s possible that there are organisations out there for whom C is more expensive than its worth, but where the sunk cost fallacy stops them from moving on. Or organisations who stick with C because their platform vendor gives them a C toolset, even where free or paid alternatives would be cheaper [in fact that would point to a difficulty with any holistic evaluation: that the cost to the people who provide development environments and the cost to the people who consume development environments depends on different factors, and the power in the market is biased towards a few large providers. Welcome to economics]. Or organisations who stick with C because of a perception of a large community of users, which is (perceived to be) more useful than striking out alone with better tools.

It’s also possible that moves in the other direction are based on non-rational factors: organisations that seek novelty rather than improvement, or who move away from C because a vendor convinces them that their alternative is better regardless of objective truth.

It turns out that the simple question we wanted to ask about applied computing: “What works?” leads to such a complex and maybe even chaotic system of forces acting in multiple dimensions that answering it will be very difficult. This doesn’t mean that an answer should not be sought, but that finding the answer will combine expertise from many different fields. Particularly, something that survives for a long time doesn’t necessarily work: it could just be that people are afraid of the alternatives, or haven’t really considered them.

Preparing for Computing’s Big One-Oh-Oh

However you slice the pie, we’re between two and three decades away from the centenary celebration for applied computing (which is of course significantly after theoretical or hypothetical advances made by the likes of Lovelace, Turing and others). You might count the anniversary of Colossus in 2043, the ENIAC in 2046, or maybe something earlier (and arguably not actually applied) like the Z3 or ABC (both 2041). Whichever one you pick, it’s not far off.

That means that the time to start organising the handover from the first century’s programmers to the second is now, or perhaps a little earlier. You can see the period from the 1940s to around 1980 as a time of discovery, when people invented new ways of building and applying computers because they could, and because there were no old ways yet. The next three and a half decades—a period longer than my life—has been a period of rediscovery, in which a small number of practices have become entrenched and people occasionally find existing, but forgotten, tools and techniques to add to their arsenal, and incrementally advance the entrenched ones.

My suggestion is that the next few decades be a period of uncovery, in which we purposefully seek out those things that have been tried, and tell the stories of how they are:

  • successful because they work;
  • successful because they are well-marketed;
  • successful because they were already deployed before the problems were understood;
  • abandoned because they don’t work;
  • abandoned because they are hard;
  • abandoned because they are misunderstood;
  • abandoned because something else failed while we were trying them.

I imagine a multi-volume book✽, one that is to the art of computer programming as The Art Of Computer Programming is to the mechanics of executing algorithms on a machine. Such a book✽ would be mostly a guide, partly a history, with some, all or more of the following properties:

  • not tied to any platform, technology or other fleeting artefact, though with examples where appropriate (perhaps in a platform invented for the purpose, as MIX, Smalltalk, BBC BASIC and Oberon all were)
  • informed both by academic inquiry and practical experience
  • more accessible than the Software Engineering Body of Knowledge
  • as accepting of multiple dissenting views as Ward’s Wiki
  • at least as honest about our failures as The Mythical Man-Month
  • at least as proud of our successes as The Clean Coder
  • more popular than The Celestial Homecare Omnibus

As TAOCP is a survey of algorithms, so this book✽ would be a survey of techniques, practices and modes of thought. As this century’s programmer can go to TAOCP to compare algorithms and data structures for solving small-scale problems then use selected algorithms and data structures in their own work, so next century’s applier of computing could go to this book✽ to compare techniques and ways of reasoning about problems in computing then use selected techniques and reasons in their own work. Few people would read such a thing from cover to cover. But many would have it to hand, and would be able to get on with the work of invention without having to rewrite all of Doug Engelbart’s work before they could get to the new stuff.

It's dangerous to go alone! Take this.

✽: don’t get hung up on the idea that a book is a collection of quires of some pigmented flat organic matter bound into a codex, though.

On too much and too little

In the following text, remember that words like me or I are to be construed in the broadest possible terms.

It’s easy to be comfortable with my current level of knowledge. Or perhaps it’s not the value, but the derivative of the value: the amount of investment I’m putting into learning a thing. Anyway, it’s easy to tell stories about why the way I’m doing it is the right, or at least a good, way to do it.

Take, for example, object-oriented design. We have words to describe insufficient object-oriented design. Spaghetti Code, or a Big Ball of Mud. Obviously these are things that I never succumb to, but other people do. So clearly (actually, not clearly at all, but that’s beside the point) there is some threshold level of design or analysis practice that represents an acceptable minimum. Whatever that value is, it’s less than the amount that I do.

Interestingly there are also words to describe the over-application of object-oriented design. Architecture Astronauts, for example, are clearly people who do too much architecture (in the same way that NASA astronauts got carried away with flying and overdid it, I suppose). It’s so cold up in space that you’ll catch a fever, resulting in Death by UML Fever. Clearly I am only ever responsible for tropospheric architecture, thus we conclude that there is some acceptable maximum threshold for analysis and design too.

The really convenient thing is that my current work lies between these two limits. In fact, I’m comfortable in saying that it always has.

But wait. I also know that I’m supposed to hate the code that I wrote six months ago, probably because I wasn’t doing enough of whatever it is that I’m doing enough of now. But I don’t remember thinking six months ago that I was below the threshold for doing acceptable amounts of the stuff that I’m supposed to be doing. Could it be, perhaps, that the goalposts have conveniently moved in that time?

Of course they have. What’s acceptable to me now may not be in the future, either because I’ve learned to do more of it or because I’ve learned that I was overdoing it. The trick is not so much in recognising that, but in recognising that others who are doing more or less than me are not wrong, they could in fact be me at a different point on my timeline but with the benefit that they exist now so I can share my experiences with them and work things out together. Or they could be someone with a completely different set of experiences, which is even more exciting as I’ll have more stories to swap.

When it comes to techniques and devices for writing software, I tend to prefer overdoing things and then finding out which bits I don’t really need after all, rather than under-application. That’s obviously a much larger cognitive and conceptual burden, but it stems from the fact that I don’t think we really have any clear ideas on what works and what doesn’t. Not much in making software is ever shown to be wrong, but plenty of it is shown to be out of fashion.

Let me conclude by telling my own story of object-oriented design. It took me ages to learn object-oriented thinking. I learned the technology alright, and could make tools that used the Objective-C language and Foundation and AppKit, but didn’t really work out how to split my stuff up into objects. Not just for a while, but for years. A little while after that Death by UML Fever article was written, my employer sent me to Sun to attend their Object-Oriented Analysis and Design Using UML course.

That course in itself was a huge turning point. But just as beneficial was the few months afterward in which I would architecturamalise all the things, and my then-manager wisely left me to it. The office furniture was all covered with whiteboard material, and there soon wasn’t a bookshelf or cupboard in my area of the office that wasn’t covered with sequence diagrams, package diagrams, class diagrams, or whatever other diagrams. I probably would’ve covered the external walls, too, if it wasn’t for Enterprise Architect. You probably have opinions(TM) of both of the words in that product’s name. In fact I also used OmniGraffle, and dia (my laptop at the time was an iBook G4 running some flavour of Linux).

That period of UMLphoria gave me the first few hundred hours of deliberate practice. It let me see things that had been useful, and that had either helped me understand the problem or communicate about it with my peers. It also let me see the things that hadn’t been useful, that I’d constructed but then had no further purpose for. It let me not only dial back, but work out which things to dial back on.

I can’t imagine being able to replace that experience with reading web articles and Stack Overflow questions. Sure, there are plenty of opinions on things like OOA/D and UML on the web. Some of those opinions are even by people who have tried it. But going through that volume of material and sifting the experience-led advice from the iconoclasm or marketing fluff, deciding which viewpoints were relevant to my position: that’s all really hard. Harder, perhaps, than diving in and working slowly for a few months while I over-practice a skill.