What’s better than semver?

Many software libraries are released with version “numbers” that follow a scheme called Semantic Versioning. A semantic version is three numbers separated by dots, of the form x.y.z, where:

  • if x is zero, all bets are off. Otherwise;
  • z increments “if only backwards compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior.”

Problem one: there is no such thing as an “internal change that fixes incorrect behavior” that is “backwards compatible”. If a library has a function f() in its public API, I could be relying on any observable behaviour of f() (potentially but pathologically including its running time or memory use, but here I’ll only consider return values or environment changes for given inputs).

If they “fix” “incorrect” behaviour, the library maintainers may have broken the package for me. I would need a comprehensive collection of contract or integration tests to know that I can still use version x.y.z' if version x.y.z was working for me. This is the worst situation, because the API looks like it hasn’t changed: all of the places where I call functions or create objects still do something, they just might not do the right thing any more.
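A minimal sketch of the kind of contract test I mean. The `slugify` functions are made up for illustration and stand in for f() before and after a patch release; the stored-URL scenario is also an assumption.

```python
# Hypothetical library function as shipped in version x.y.z.
# Its "incorrect" behaviour: trailing spaces become a trailing separator.
def slugify_v1(title: str) -> str:
    return title.lower().replace(" ", "-")


# Hypothetical patch release x.y.z': the maintainers "fix" the behaviour
# by stripping leading and trailing separators.
def slugify_v2(title: str) -> str:
    return title.lower().replace(" ", "-").strip("-")


def contract_holds(slugify) -> bool:
    """Pin the observable behaviour that my code relies on.

    Assumption: my application stored URLs built from the old output,
    trailing separator and all, so that output is part of my contract.
    """
    return slugify("Hello World ") == "hello-world-"


print(contract_holds(slugify_v1))  # the version that was working for me
print(contract_holds(slugify_v2))  # the "backwards compatible" bug fix
```

Every call site still runs under the patch release; only a test that pins the old output notices that the answer has changed.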

Problem two: as I relaxed the dependency on running time or memory use, a refactoring could represent a non-breaking change. Semver has nowhere to record truly backwards compatible changes, because bugfixes are erroneously considered backwards compatible.

  • y increments “if new, backwards compatible functionality is introduced to the public API”.

This is fine. I get new stuff that I’m not (currently) using, but you haven’t broken anything I do use.

Problem three: an increment to y “MAY include patch level changes”. So I can’t just quietly take in the new functionality and decide whether I need it on my own time, because the library maintainers have rolled in all of their supposedly-backwards-compatible-but-not-really changes so I still don’t know whether this version works for me.

  • x increments “if any backwards incompatible changes are introduced to the public API”.

Problem four: I’m not looking at the same library any more. It has the same name, but it could be completely rewritten, have any number of internal behaviour changes, and any number of external interface changes. It might not do what I want any more, or might do it in a way that doesn’t suit the needs of my application.

On the plus side

The dots are fine. I’m happy with the dots. Please do not feel the need to leave a comment if you are unhappy with the dots or can come up with some contrived reason why “dots are harmful”, as I don’t care.

Better: meaningful versioning

I would prefer to use a version scheme that looks like z.w.y:

  • y has the meaning it does in semver, except that it MUST NOT include patch level changes. If a package maintainer has added new things or deprecated (but not removed) old things, then I can use the package still.
  • z has the meaning it does in semver, except that we stop pretending that bug fixes can be backwards compatible.
  • w is incremented if non-behavioural changes are implemented; for example if internals are refactored, caches are introduced or removed, or private data structures are changed. These are changes that probably mean I can use the package still, but if I needed particular performance attributes from the library then it is on me to discover whether the new version still meets my needs.

There is no room for x in this scheme. If a maintainer wants to write a new, incompatible library, they can use a new name.
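As a rough sketch of how a consumer might read a z.w.y version, here is one way to turn a version change into a list of things to check. The `Version` and `review_needed` names are hypothetical, and the sketch assumes the three components are independent counters, which the scheme above does not specify.

```python
from typing import NamedTuple


class Version(NamedTuple):
    z: int  # behaviour changes, including what semver calls bug fixes
    w: int  # non-behavioural changes: refactoring, caches, private data
    y: int  # additions and deprecations only, never removals

    @classmethod
    def parse(cls, text: str) -> "Version":
        z, w, y = (int(part) for part in text.split("."))
        return cls(z, w, y)


def review_needed(old: Version, new: Version) -> list[str]:
    """List what I have to check before moving from old to new."""
    checks = []
    if new.z != old.z:
        checks.append("behaviour changed: re-run my contract tests")
    if new.w != old.w:
        checks.append("internals changed: re-check performance if I rely on it")
    # A change in y alone only adds or deprecates things, so there is
    # nothing to check; I can adopt the new features on my own time.
    return checks


print(review_needed(Version.parse("3.1.4"), Version.parse("3.1.9")))
print(review_needed(Version.parse("3.1.4"), Version.parse("4.2.4")))
```

The point of the sketch is that each component answers exactly one question, so an empty list really does mean "nothing to do".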

Different: don’t use versions

This is more work for me, but less work for the package maintainer. If they are maintaining a change log (which they are, as they are using version control) and perhaps a medium for announcing important changes including security and bug fixes and new features, then I can pick the commit that I discover does what I need. I can maintain my own tree (and should be anyway, in case the maintainer decides to delete their upstream repo) and can cherry-pick the changes that are useful for me, leaving out the ones that are harmful for me.

This is more work for me than the z.w.y scheme because now I have to understand the impact of each change. It is the same amount of work as the semver x.y.z scheme, because then I had to understand the impact of each change too, as changes to any of the three version components could potentially include supposedly-backwards-compatible-but-not-really changes.

To become a beginner, first become an expert

We have a whole load of practices in programming that only really work well if you’re already good at whatever the process is supposed to help with.

Scrum is a process improvement framework, but only if you already know how to do process improvement. If you don’t, then Scrum is just the baseline mini-waterfall process with a chance to air your dirty laundry every fortnight.

Agile is good at helping you embrace change, but only if you’re already good enough at managing change to understand which changes should be embraced.

#NoEstimates helps you avoid the overhead of estimates, but only if you’re already good enough at estimates to know that you always write user stories that take 0.5-2 days to implement.

TDD helps you design your APIs, but only if you’re already good enough at API design to understand things like dependency injection and loose coupling.

Microservices help you isolate modules, but only if you’re already good enough at modularity not to get swamped in HTTP calls.

This is all very well for selling consultancy (“if your [agile] isn’t working, then you aren’t [agiling] hard enough, let me [agile] you some more”) but where’s the on-ramp?

Reasoning about reasoning about software

Functional programmers like to claim that you can’t reason about mutable state programs. Some thoughts:

  • the first half of the book A Discipline of Programming by Edsger W. Dijkstra tells you how to do it. That half of the book is approximately 100 pages (the remainder of the book is worked examples).
  • object-oriented programming breaks a software system up into separate systems running miniature, message-driven programs as if on separate computers. Therefore the consideration of “mutable state” can be split in two: the state internal to the object and the state external to the object which sends messages to the object but is ignorant of its internals. If you can’t split the state that way, you have bad encapsulation.
  • The reasoning done about the external and internal behaviours had better match at the interface. Design by contract probably helps here.
  • Given a state S, an operation O can be defined as \(O(args \times S) \rightarrow (R \times S')\), i.e. it returns a result R and updates the state to S'.
  • However, Bertrand Meyer introduced Command-Query Separation in the 1980s, so you only need to know \(O(args \times S) \rightarrow (R \times S)\) and \(O(args \times S) \rightarrow (\emptyset \times S')\).
  • Various history “traces” can be considered equivalent and therefore a lot of knowledge about the historical state transitions elided, simplifying the reasoning. For example, given a well-designed stack, it is impossible to distinguish the history of stack.push(3); stack.pop(); stack.push(7) from stack.push(7).
  • Various operations on the state are irrelevant to the behaviour of an operation under consideration. In reasoning about the final operation in a = 3; b = 7; c = 9; stack.push(2) you do not need to consider the assignment operations (and indeed their presence may indicate a cohesion problem in your design).
  • The one remaining source of difficulty is aliasing; I do need to know about the elided operations in the sequence x = 7; y = &x; ...; z = f(x), because any of them could change x through y. This is aliasing, not mutable state.
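A small sketch of three of the points above: Command-Query Separation, equivalent traces, and aliasing. The `Stack` class and the `observe` helper are my own illustrations, and unlike the stack in the example above, `pop` here is a pure command that returns nothing.

```python
class Stack:
    """A stack that follows Command-Query Separation."""

    def __init__(self) -> None:
        self._items: list[int] = []

    # Commands: change the state, return nothing.
    def push(self, item: int) -> None:
        self._items.append(item)

    def pop(self) -> None:
        self._items.pop()

    # Queries: return a result, leave the state alone.
    def top(self) -> int:
        return self._items[-1]

    def size(self) -> int:
        return len(self._items)


def observe(stack: Stack) -> tuple[int, int]:
    """Everything a client can learn through this sketch's queries."""
    return (stack.size(), stack.top())


# Two different histories...
long_history = Stack()
long_history.push(3)
long_history.pop()
long_history.push(7)

short_history = Stack()
short_history.push(7)

# ...that a client cannot tell apart through the public interface.
print(observe(long_history) == observe(short_history))

# Aliasing is the case where elided operations do matter: nothing in
# the middle line mentions x, yet x is different afterwards.
x = [7]
y = x         # y is now another name for the same list
y.append(9)   # stands in for the elided "..." operations
print(x)
```

The equivalence only covers what the queries expose, which is the point: the private list is free to have any history that leads to the same observations.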

The Atoms of Programming

In the world of physics, there are many different models that can be used, though typically each of them has different applicability to different contexts. At the small scale, quantum physics is a very useful model, while Newtonian physics will yield evidently incorrect predictions and so is less valuable. Where a Newtonian model gives sufficiently accurate results, it’s a lot easier to work with than quantum or relativistic mechanics.

All of these models are used to describe the same universe – the same underlying collection of observations that can systematically be categorised, modelled and predicted.

Physical science (or experimental philosophy) does not work in the same way as computational philosophy. There are physical realisations of computational systems, typically manifested as electronic systems or pencil-and-paper simulations. But the software, the abstract configurations of ideas that run on those systems, exists in an entirely separate space and is merely (though the fact that this is possible is immensely powerful) translated into the electronic or paper medium.

Of course one model for the software system is to abstract the electronic: to consider the movement of electrons as the presence of voltages at terminals; to group terminals as registers or busses; to further abstract this range of voltages as 0 and that range as 1. And indeed that model frequently is useful.

Frequently, that model is not useful. And the great thing is that we get to select from a panoply of other models, at some small or large remove from the physical model. We can use these models separately, or simultaneously. You can think of a software system as a network of messages passed between independent objects, as a flow of data through transformers, as a sequence of state changes, as a graph of single-argument functions, as something else, or as a combination of these things. Each is useful, each is powerful, all are applicable.

Sometimes, I can use these models to make decisions about representing the logical structure of these systems, transforming a concept into a representation that’s valid in the model. If I have a statement in a mathematical formulation of my problem, “for any a drawn from the set of Articles there exists a p drawn from the set of People such that p is the principal author of a” then I can build a function, or a method, or a query, or a predicate, or a procedure, or a subroutine, or a spreadsheet cell, or a process, that given an article will yield exactly one person who is the principal author of that article.
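For instance, a sketch of the same statement as a function and as a predicate. The `Person` and `Article` types and their fields are assumptions made for illustration; making `principal` a required field is one way to guarantee that the function yields exactly one person for every article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Article:
    title: str
    principal: Person  # required, so every article has exactly one
    co_authors: tuple[Person, ...] = ()


# The statement as a function: Articles -> People.
def principal_author(article: Article) -> Person:
    return article.principal


# The same statement as a predicate over (person, article) pairs.
def is_principal_author(person: Person, article: Article) -> bool:
    return principal_author(article) == person


ada = Person("Ada")
grace = Person("Grace")
paper = Article("On Models", principal=ada, co_authors=(grace,))

print(principal_author(paper).name)
print(is_principal_author(grace, paper))
```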

Sometimes, I use the models to avoid the conceptual or logical layers at all and express my problem as if it is a software solution. Object-oriented analysis and design, data flow modelling, and other techniques can be used to represent a logical model, or they can be used to bash the problem straight into a physical model without having thought about the problem in the abstract. “Shut up and code” is an extreme example of this approach, in which the physical model is realised without any attempt to tie it to a logical or conceptual design. I’ll know correct when I see it.

I don’t see a lot of value in collecting programming languages. I can’t count the number of different programming languages I’ve used, and many of them are entirely similar. C and JavaScript both have sequences of expressions that are built into statements that are built into procedures. Both let me build aggregations of data and procedures that either let me organise sequential programs, represent objects, represent functions, or do something else.

But collecting the models, the different representations of systems conceptually that can be expressed as software, sometimes called paradigms: this is very interesting. This is what lets me think about representing problems in different ways, and come up with efficient (conceptually or physically) solutions.

More paradigms, please.

In which new developer tools are dull

Over on lobste.rs I said that I don’t hold out much hope for another “blue plane” style event in developer tools. In one of Alan Kay’s presentations, he referred to the ordinary way of things as the pink plane, and incremental advances in the state of affairs being movements in that plane. Like the square in Edwin Abbott’s Flatland that encounters a sphere, a development could take us out of the pink plane into the (orthogonal) blue plane. These blue plane ideas are rare because like the square, it’s hard to even conceive of life outside the pink plane.

In what may just be a surprising coincidence, Apple engineers used Blue and Pink to refer to features in evolutionary and revolutionary developments of their operating system.

Software engineering tooling is, for the majority of developers, in a phase of conservative retreat

Build UIs on the web and you probably won’t use a graphical builder, you’ll type HTML and JavaScript (and maybe JSX) into a text editor.

Build native apps and even where there is a GUI builder, you’ll find people recommending against its use and wanting to do things “programmatically” (by which they mean “through typing”, even though the GUI builder tools are another way to construct a program).

In the last couple of decades, interest in CASE tooling has shrunk to conservative interest in text editors with some syntax highlighting, like vim or Atom. Gone even is the “build and run” button from IDEs, to be replaced with command-line invocations of grunt tasks (a fancy phrase meaning shell scripts), npm scripts (a fancy phrase meaning shell scripts) or rake tasks (you get the idea).

Where previously there were live development environments embedded in the deployment environment (and the JavaScript VM is almost perfectly designed for that task), there is now console.log and unit tests. The height of advanced interaction with your programming tools is the REPL (an interactive shell) and the Playground/InstaREPL (an interactive shell that echoes stdin and stdout in different places).

For the most part, and I say that to avoid the inevitable commenter who thinks that a counterexample like LabView or Mathematica or that one person they met who uses Expression Blend renders the whole argument broken, developers have doubled down on the ceremony of programming: the typing of arcane text into an 80×24 character display. Now to be fair, text is an efficient and compact graphical representation of a linear sequence of connected concepts. But it is not the only one, nor the most efficient nor most compact, and neither are many software systems linear.

The rewards in making software to make software are scarce.

You can do like IntelliJ do, and make a better version of the 80×24 text entry thing. You can work for a platform vendor, and make their version of the 80×24 thing. You can go and get an engineering grade 6 or above job in Silicon Valley and tell your manager that whatever it is their business does, you’re going to focus on the 80×24 thing (“at scale”) instead.

What you don’t seem to be able to do is to disrupt the 80×24 thing. It’s free (at least as in beer), it’s ubiquitous, and whether or not it’s as good as it could be it certainly seems to be good enough for the people who not only get paid to make bad software, but get paid again to fix it.

Technical debt and jury service

We have the idea that in addition to the product development backlogs for our teams, there’s an engineering backlog where technical debt paydown, process/tooling improvements, and other sitewide engineering concerns get recorded. Working on them is done in time that is, by definition, taken away from the product backlogs (because of Sustainable Pace).

A colleague recently described the time spent on the engineering backlog as a “tax”, which is an interesting analogy. A pejorative interpretation is that, like a tax, centralised engineering work is a cost imposed that takes away from realising more value on my direct projects.

A positive spin is that taxes go toward funding the commons: no one of us can afford to build a road between our house and the office, but having roads connecting all the houses to all the offices has strategic benefit to society as a whole (higher productivity, lower unemployment, more opportunities) so if we all pay in a fraction of the cost of a road we can all have a road. Similarly, one product team might grind to a halt if they spend all of their time on the new CD pipeline infrastructure, but all teams will benefit if they all chip in a bit.

This version of the analogy implies that there might be, like the treasury, a central agency deciding how to spend the common wealth. Somebody needs to decide how much tax everyone should pay, what to do with dissenters (is it OK if your product team focuses on its sprint for a fortnight and doesn’t do any of the engineering backlog?), whether to accept overpayments, and what those tax dollars should go on.

Only it’s not tax dollars, it’s tax hours. In this sense, a better analogy is conscription (I originally thought of the Anglo-Saxon fyrd; maybe jury service or non-military national service is a less aggressive way to consider this). Taxation means that I give all of my work time to Wealth Wizards but give a chunk of my money to the government. Conscription means that I don’t get to give all of my time to my employer: some of it has to go to the commons. Maybe Jonathan and Rebecca can’t give any time to their product teams this week because they’ve been “called up” to the engineering backlog effort.

That seems like a useful analogy for these tasks. I can think about what resources are available for products or “the commons”, because I can think about whether someone is working on “the day job” or has been conscripted. Maybe it doesn’t make sense for everybody to have equal likelihood of being “called up”, in the same way that it’s easier for students to get out of jury service than for full-time employees.

Working Effectively with Legacy Code

I gave a talk to my team at ARM today on Working Effectively with Legacy Code by Michael Feathers. Here are some notes I made in preparation, which are somewhat related to the talk I gave.

This may be the most important book a software developer can
read. Why? Because if you don’t, then you’re part of the problem.

It’s obviously a lot easier and a lot more enjoyable to work on
greenfield projects all the time. You get to choose this week’s
favourite technologies and tools, put things together in the ways that
suit you now, and make progress because, well, anything is progress
when there’s nothing there already. But throwing away an existing
system and starting from scratch makes it easy to throw away the
lessons learned in developing that system. It may be ugly, and patched
up all over the place, but that’s because each of those patches was
needed. They each represent something we learned about the product
after we thought we were done.

The new system is much more likely to look good from the developer’s
perspective, but what about the users’? Do they want to pay again
for development of a new system when they already have one that mostly
works? Do they want to learn again how to use the software? We have
this strange introspective notion that professionalism in software
development means things that make code look good to other coders:
Clean Code, “well-crafted” code. But we should also have some
responsibility to those people who depend on us and who pay our way,
and that might mean taking the decision to fix the mostly-working
thing.

A digression: Lehman’s Laws

Manny Lehman identified three different categories of software system:
those that are exactly specified, those that implement
well-understood procedures, and those that are influenced by the
environment in which they run. Most software (including ours) comes
into that last category, and as the environment changes so must the
software, even if there were no (known) problems with it at an earlier
point in its evolution.

He expressed Laws governing the evolution of software systems, which
describe how the requirements for new development are in conflict
with the forces that slow down maintenance of existing systems. I’ll
not reproduce the full list here, but for example on the one hand the
functionality of the system must grow over time to provide user
satisfaction, while at the same time the complexity will increase and
perceived quality will decline unless it is actively maintained.

Legacy Code

Michael Feathers’s definition of legacy code is code without tests. I’m
going to be a bit picky here: rather than saying that legacy code is
code with no tests, I’m going to say that it’s code with insufficient
tests. If I make a change, can I be confident that I’ll
discover the ramifications of that change?

If not, then it’ll slow me down. I even sometimes discard changes
entirely, because I decide the cost of working out whether my change
has broken anything outweighs the interest I have in seeing the change
make it into the codebase.

Feathers refers to the tests as a “software vice”. They clamp the
software into place, so that you can have more control when you’re
working on it. Tests aren’t the only tools that do this: assertions
(and particularly Design by Contract) also help pin down the software.

How do I test untested code?

The apparent way forward then when dealing with legacy code is to
understand its behaviour and encapsulate that in a collection of unit
tests. Unfortunately, it’s likely to be difficult to write unit tests
for legacy code, because it’s all tightly coupled, has weird and
unexpected dependencies, and is hard to understand. So there’s a
catch-22: I need to make tests before I make changes, but I need to
make changes before I can make tests.

Seams

Almost the entire book is about resolving that dilemma, and contains a
collection of patterns and techniques to help you make low-risk
changes to make the code more testable, so you can introduce the tests
that will help you make the high-risk changes. His algorithm is:

  1. identify the “change points”, the things that need modifying to
    make the change you have to make.
  2. find the “test points”, the places around the change points where
    you need to add tests.
  3. break dependencies.
  4. write the tests.
  5. make the changes.

The overarching model for breaking dependencies is the “seam”. It’s a
place where you can change the behaviour of some code you want to
test, without having to change the code under test itself. Some examples:

  • you could introduce a constructor argument to inject an object
    rather than using a global variable
  • you could add a layer of indirection between a method and a
    framework class it uses, to replace that framework class with a
    test double
  • you could use the C preprocessor to redefine a function call to use
    a different function
  • you can break an uncohesive class into two classes that collaborate
    over an interface, to replace one of the classes in your tests
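Here is a minimal sketch of the first kind of seam, a constructor argument that replaces a global. All of the names (`MAILER`, `SmtpMailer`, `Invoicer`, `RecordingMailer`) are hypothetical, and the "real" mailer just raises so that the sketch never touches a network.

```python
class SmtpMailer:
    """Stands in for a collaborator that is awkward to use in a test."""

    def send(self, to: str, body: str) -> None:
        raise RuntimeError("this sketch never sends real mail")


MAILER = SmtpMailer()  # the global the legacy code reaches for


class Invoicer:
    # The seam: the mailer can now be injected. Existing callers that
    # pass nothing still get the global, so their behaviour is unchanged.
    def __init__(self, mailer=None) -> None:
        self._mailer = mailer if mailer is not None else MAILER

    def bill(self, customer: str, amount: int) -> None:
        self._mailer.send(customer, f"You owe {amount}")


class RecordingMailer:
    """Test double that remembers what it was asked to send."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


fake = RecordingMailer()
Invoicer(mailer=fake).bill("alice@example.com", 42)
print(fake.sent)
```

The change to `Invoicer` is small and low-risk, and it is what makes the later, riskier changes to `bill` testable.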

Understanding the code

The important point is that whatever you, or someone else, think
the behaviour of the code should be, your customers have paid
for the behaviour that’s actually there, and so that (modulo bugs) is
the thing you should preserve.
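Feathers calls tests that pin down the existing behaviour "characterization tests". A minimal sketch follows; the `legacy_price` function is made up for illustration, and the expected values were obtained by running the code rather than by reading a specification.

```python
def legacy_price(total_pence: int, loyalty_years: int) -> int:
    """Hypothetical legacy pricing code, patched several times."""
    discount_pct = 0
    if loyalty_years > 2:
        discount_pct = 10
    if total_pence > 10000:
        discount_pct += 5
    if loyalty_years > 10:
        # Overwrites the large-order bonus. Intentional? Nobody remembers.
        discount_pct = 20
    return total_pence * (100 - discount_pct) // 100


def test_characterize_legacy_price() -> None:
    # These assertions record what the code does today, surprises
    # included, so that any later change in behaviour is a deliberate one.
    assert legacy_price(5000, 0) == 5000
    assert legacy_price(20000, 3) == 17000
    assert legacy_price(20000, 11) == 16000  # not 15000: the bonus is lost


test_characterize_legacy_price()
```

Whether the lost bonus is a bug is a conversation to have with the customer; until then the test keeps it in place.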

The book contains techniques to help you understand the existing code
so that you can get those tests written in the first place, and even
find the change points. Scratch refactoring is one technique: look
at the code, change it, move bits out that you think represent
cohesive functions, delete code that’s probably unused, make notes in
comments…then just discard all of those changes. This is like Fred
Brooks’s recommendation to “plan to throw one away”: you can take what
you learned from those notes and refactorings and go in again with a
more structured approach.

Sketching is another technique recommended in the book. You can draw
diagrams of how different modules or objects collaborate, and
particularly draw networks of what parts of the system will be
affected by changes in the part you’re looking at.
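If the sketch gets too big for paper, the same network can be written down and walked in code. This is only an illustration of the idea, not something from the book: the module names and the `USED_BY` table are invented, and in practice you would have to build that table by reading the code.

```python
from collections import deque

# Hypothetical system: each module maps to the modules that use it.
USED_BY = {
    "database": ["accounts", "reports"],
    "accounts": ["billing"],
    "billing": ["invoices"],
    "reports": [],
    "invoices": [],
}


def affected_by(changed: str, used_by: dict[str, list[str]]) -> set[str]:
    """Walk the network breadth-first from the change point."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        module = queue.popleft()
        for dependant in used_by.get(module, []):
            if dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return affected


print(sorted(affected_by("accounts", USED_BY)))
```

The modules in the result are the candidates for test points around a change to `accounts`.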