On the tyranny of autoincrementing integer primary keys

In designing a relational database schema, many people will automatically create a column id integer primary key for every table, using their database’s automatic increment feature to assign a new value to each row they insert. My assertion is that this choice of primary key should be the last resort, not the first.

A database schema is a design artifact, describing the data we want to store and the relationships between records (rows) in those data. It is also meta-design, because the schema constrains us in designing the queries we use to work with the data. Using the same, minimal-effort primary key type for every table therefore fails to communicate anything about the structure and meaning of the data in that table, and imposes irrelevant features on the queries we design.

The fact that people use the name id for this autoincrementing integer field gives away the fact that the primary key is used to identify a row in a database. The primary key for a table should ideally be the minimal subset of relevant information that uniquely identifies an individual record. If you have a single column, say name, with not null and unique constraints, that’s an indicator (though not a cast-iron guarantee) that this column may be the table’s primary key. Sometimes, the primary key can be a tuple of multiple columns. A glyph can be uniquely identified by the tuple (character, font, swash) for example (it can, regardless of whether this is how your particular favourite text system represents it, or whether you think that this is a weird way to store ligatures). The glyphs “(e, Times New Roman Regular 16pt, normal)” and “(ct, Irvin Demibold 24pt, fancy)” are more readily recognisable than the glyphs “146276” and “793651”, even if both are ways to refer to the same data. A music album is identified by the artist and the album name (he says, side-eyeing Led Zeppelin): “A Night at the Opera” is ambiguous while “(Blind Guardian, A Night at the Opera)” is definitely not “(Queen, A Night at the Opera)”.
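
To make the album example concrete, here’s a sketch of a table keyed on that tuple (the extra column is only illustrative):

create table album (artist varchar not null, name varchar not null, release_year integer, primary key(artist, name));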

Use an integer identifier where there is no other way to uniquely identify rows in a table. Note: sometimes there is another, more meaningful way, even where that just means using somebody else’s unique identifier: different copies of the same book will have unique shelfmarks if they’re part of a library, for example. People in an organisation may have an employee number, or a single sign-on user name; though there may be privacy reasons not to use these.

A side-effect of using useful information to identify rows in a database is that it can simplify your queries, because where your foreign keys would otherwise be meaningless numbers, they now actually carry useful information. Here’s an example, from a real application, in which I’m sad to say I designed both the “before” and “after” schemata.

The app is a risk management tool. There are descriptions of risks (I’d like to believe that they all at least have a distinct description but I can’t be sure, so those will use integer id PKs), and for each risk there are people in certain roles who bring particular skills to bear on mitigating the risk. The same role can be applied to more than one risk, the same skill can be applied by more than one role, and one role may apply multiple skills, so there’s a three-way join to work out, for a given risk, what roles and skills are relevant.

The before schema:

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (id integer primary key, name varchar not null, unique(name)); -- ruh roh
create table skill (id integer primary key, name varchar not null, unique(name)); -- the same anti-pattern twice
create table risk_role_skill (id integer primary key, risk_id integer, role_id integer, skill_id integer, foreign key(risk_id) references risk(id), foreign key(role_id) references role(id), foreign key(skill_id) references skill(id));

In this application, we start by looking at a list of risks then inspect one to see what roles are relevant to mitigating it, and then what skills. So a valid question is: “given a risk, what roles are relevant to it?”

select distinct role.name from role inner join risk_role_skill on role.id = risk_role_skill.role_id where risk_role_skill.risk_id = ?;

But if we notice the names of each role and skill are unique, then we can surmise that they are sufficient to identify a given role or skill. In fact, the only information we have about roles or skills are the names.

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (name varchar primary key); -- uhhh...
create table skill (name varchar primary key); -- this still looks weird...
create table risk_role_skill (id integer primary key, risk_id integer, role_name varchar, skill_name varchar, foreign key(risk_id) references risk(id), foreign key(role_name) references role(name), foreign key(skill_name) references skill(name));

Here’s the new query:

select distinct role_name from risk_role_skill where risk_id = ?;

We’ve removed the join completely!

Two remaining points:

  1. There’s literally no information carried in the role and skill tables now, other than their identifying names. Does that mean we need the tables at all? In this case no, but in general we need to think here. How are the names in the join table going to get populated otherwise? If there is a limited set of valid values to choose from, then keeping a table with the range of values and a foreign key constraint to that table may be a good way to express the intent that the column’s content be drawn from that range. As an example, a particular bookstore may have printed, ebook, and audiobook media, so could restrict the medium field in their stock table to one of those values, as sketched in the example after this list.
  2. Why does the risk_role_skill table have an identifier at all? It is a collection of associations between values, so a row’s content is that row’s identity.
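
To illustrate that first point, here’s a minimal sketch of a range-restricting table for the bookstore (the table and column names are mine, invented for illustration):

create table medium (name varchar primary key); -- the full range of valid values
insert into medium (name) values ('printed'), ('ebook'), ('audiobook');
create table stock (isbn varchar not null, medium varchar not null, quantity integer, foreign key(medium) references medium(name), primary key(isbn, medium)); -- medium must be one of the values in the medium table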

Here’s the after schema:

create table risk (id integer primary key, description varchar, weight integer, severity integer, likelihood integer); -- many fields elided
create table risk_role_skill (risk_id integer, role varchar, skill varchar, foreign key(risk_id) references risk(id), primary key(risk_id, role, skill));

And the after query:

select distinct role from risk_role_skill where risk_id = ?;

Two fewer tables, no joins, altogether a much simpler database to understand and work with.

First, Consider no Harmful.

Yesterday, we observed that the goal of considering the go to statement harmful was so that a programmer could write a correct program and have done with it. We noticed that this is never how computering works: many programs are not even instantaneously correct because they represent an understanding of a domain captured at an earlier time, before the context was altered by both external changes and the introduction of the software itself.

Today, let’s look at the benefits of removing the go to statement. Dijkstra again:

My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

This makes sense! Our source code is a static model of our software system, which is itself (we hope) a model of a problem that somebody has along with tools to help with solving that problem. But our software system is a dynamic actor that absorbs, transforms, and emits data, reacting to and generating events in communication with other human and non-human actors. We need to ensure that the dynamic behaviour is evident in the static model, so that we can “reason about” the development of the system. How does Dijkstra’s removal of go to help us achieve that?

Let us now consider how we can characterize the progress of a process. (You may think about this question in a very concrete manner: suppose that a process, considered as a time succession of actions, is stopped after an arbitrary action, what data do we have to fix in order that we can redo the process until the very same point?) If the program text is a pure concatenation of, say, assignment statements (for the purpose of this discussion regarded as the descriptions of single actions) it is sufficient to point in the program text to a point between two successive action descriptions. (In the absence of go to statements I can permit myself the syntactic ambiguity in the last three words of the previous sentence: if we parse them as “successive (action descriptions)” we mean successive in text space; if we parse as “(successive action) descriptions” we mean successive in time.) Let us call such a pointer to a suitable place in the text a “textual index.”

[…]

Why do we need such independent coordinates? The reason is – and this seems to be inherent to sequential processes – that we can interpret the value of a variable only with respect to the progress of the process. If we wish to count the number, n say, of people in an initially empty room, we can achieve this by increasing n by one whenever we see someone entering the room. In the in-between moment that we have observed someone entering the room but have not yet performed the subsequent increase of n, its value equals the number of people in the room minus one!

So the value of go to-less programming is that I can start at the entry point, and track every change in the system between there and the point of interest. But I only need to do that once (well, once for each different condition, loop, procedure code path, etc.) and then I can write down “at this index, these things have happened”. Conversely, I can start with the known state at a known point, and run the program counter backwards, undoing the changes I observe (obviously encountering problems if any of those are assignments). With go to statements present, I cannot know the history of how the program counter came to be here, so I can’t make confident statements about the dynamic evolution of the system.

This isn’t the only way to ensure that we understand a software system’s dynamic behaviour, which is lucky because it’s not a particularly good one. In today’s parlance, it “doesn’t scale”. Imagine being given the bug report “I clicked in the timeline on a YouTube video during an advert and the comments disappeared”, and trying to build a view of the stateful evolution of the entire YouTube system (or even just the browser, if you like, and if it turns out you don’t need the rest of the information) between main() and the location of the program counter where the bug emerged. Even if we pretend that a browser isn’t multithreaded, you would not have a good time.

Another approach is to encapsulate parts of the program, so that the amount we need to comprehend in one go is smaller. When you do that, you don’t need to worry about where the global program counter is or how it got there. Donald Knuth demonstrated this in Structured Programming with go to Statements, and went on to say that removing all instances of go to is solving the wrong problem:

One thing we haven’t spelled out clearly, however, is what makes some go to’s bad and others acceptable. The reason is that we’ve really been directing our attention to the wrong issue, to the objective question of go to elimination instead of the important subjective question of program structure.

In the words of John Brown [here, Knuth cites an unpublished note], “The act of focusing our mightiest intellectual resources on the elusive goal of go to-less programs has helped us get our minds off all those really tough and possibly unresolvable problems and issues with which today’s professional programmer would otherwise have to grapple.”

Much has been written on structured programming, procedural programming, object-oriented programming, and functional programming, which all have the same goal: separate a program into “a thing which uses this little bit of software, according to its contract” and “the thing that you would like to believe implements this contract”. OOP and FP additionally make explicit the isolation of state changes, so that you don’t need to know the whole value of the computer’s memory to assert conformance to the contract. Instead, you just need to know the value of the little bit of memory in the fake standalone computer that runs this one object, or this one function (or indeed model the behaviour of the object or function without reference to computer details like memory).

Use or otherwise of go to statements in a thoughtfully-designed program (I admit that statement opens a can of worms) is orthogonal to understanding the behaviour of the program. Let me type part of an Array class based on a linked list directly into my blog editor:

Public Class Array(Of ElementType)
  Private entries As LinkedList(Of ElementType)
  Public Function Count() As Integer
    Dim list As LinkedList(Of ElementType) = entries
    Count = 0
  nonempty:
    If list.IsEmpty() Then GoTo empty
    Count = Count + 1
    list = list.Next()
    GoTo nonempty
  empty:
    Exit Function
  End Function

  Public Function At(index As Integer) As ElementType
    Dim cursor As Integer = index
    Dim list As LinkedList(Of ElementType) = entries
  nextNode:
    If list.IsEmpty() Then Err.Raise(9, , "Index Out of Bounds Error") ' 9 is VB's "Subscript out of range" error number
    If cursor = 0 Then Return list.Element()
    list = list.Next()
    cursor = cursor - 1
    GoTo nextNode
  End Function
End Class

While this code sample uses go to statements, I would suggest it’s possible to explore the assertion “objects of class Array satisfy the contract for an array” without too much difficulty. As such, the behaviour of the program anywhere that uses arrays is simplified by the assumption “Array behaves according to the contract”, and the behaviour anywhere else is simplified by ignoring the Array code entirely.

Whatever harm the go to statement caused, it was not as much harm as trying to define a “correct” program by understanding all of the ways in which the program counter arrived at the current instruction.

The unreasonable ineffectiveness of considering things harmful

Dijkstra didn’t claim to consider the go to statement harmful, not in those words. The title of his letter to CACM was provided by the editor, Niklaus Wirth, who did such a great job that the entire industry knows that go to is “Considered Harmful”, and that you can quickly rack up the clicks by considering other things harmful.

A deeper reading of his short (~1400 words) article raises some interesting points that have not yet received as much airing. Here, in the interests of writing an even shorter letter, is just one.

My first remark is that, although the programmer’s activity ends when he has constructed a correct program, the process taking place under control of his program is the true subject matter of his activity, for it is this process that has to accomplish the desired effect; it is this process that in its dynamic behavior has to satisfy the desired specifications. Yet, once the program has been made, the “making” of the corresponding process is delegated to the machine.

There are many difficulties with this statement, including the presumed gender of the programmer. Let us also consider the idea of a “correct” program, which does not exist for the majority of programmers. Eight years after Dijkstra’s letter was published, Belady and Lehman published the first law of program evolution dynamics:

Law of continuing change. A system that is used undergoes continuing change until it is judged more cost effective to freeze and recreate it. Software does not face the physical decay problems that hardware faces. But the power and logical flexibility of computing systems, the extending technology of computer applications, the ever-evolving hardware, and the pressures for the exploitation of new business opportunities all make demands. Manufacturers, therefore, encourage the continuous adaptation of programs to keep in step with increasing skill, insight, ambition, and opportunity. In addition to such external pressures for change, there is the constant need to repair system faults, whether they are errors that stem from faulty implementation or defects that relate to weaknesses in design or behavior. Thus a programming system undergoes continuous maintenance and development, driven by mutually stimulating changes in system capability and environmental usage. In fact, the evolution pattern of a large program is similar to that of any other complex system in that it stems from the closed-loop cyclic adaptation of environment to system changes and vice versa.

This model of programming looks much more familiar to me when I reflect on my experience than the Dijkstra model. If Dijkstra’s programmer stopped programming when they have “constructed a correct program”, then their system would fail as it didn’t adapt to “increasing skill, insight, ambition, and opportunity”.

The programmer who would thrive in this environment is more akin to Ward Cunningham’s opportunistic rewriter, based on his experience of the WyCash Portfolio Management System. That programmer rewrites every module they touch, to ensure that it represents the latest information they have. We recognise the genesis of Ward’s “technical debt” concept in this quote, and also perhaps what we would now call “refactoring”:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

The traditional waterfall development cycle has endeavored to avoid programming catastrophe by working out a program in detail before programming begins. We watch with some interest as the community attempts to apply these techniques to objects. However, using our debt analogy, we recognize this amounts to preserving the concept of payment up-front and in-full. The modularity offered by objects and the practice of consolidation make the alternative, incremental growth, both feasible and desirable in the competitive financial software market.

Ward also doesn’t use go to statements; his programming environment doesn’t supply them. But it is not the ability of his team to avoid incorrect programs by using other control structures that he finds valuable; rather, it is the willingness of his programmers to jettison old code and evolve their system with its context.

Empowered free software

Free and open source software has traditionally been defined as the opposite of something else: proprietary (or commercially-licensed) software. That’s particularly obvious in the name of the GNU project, which calls itself “Not UNIX”, UNIX being a popular AT&T-owned commercial software property of the time. The GNU manifesto goes deeper on the specific ways in which it is Not Unix:

I consider that the Golden Rule requires that if I like a program I must share it with other people who like it. Software sellers want to divide the users and conquer them, making each user agree not to share with others. I refuse to break solidarity with other users in this way. I cannot in good conscience sign a nondisclosure agreement or a software license agreement. For years I worked within the Artificial Intelligence Lab to resist such tendencies and other inhospitalities, but eventually they had gone too far: I could not remain in an institution where such things are done for me against my will.

The focus of this document, and of the various free software licenses, is on the distribution and redistribution rights associated with the software. Thus the focus of the Free Software Foundation, Open Source Initiative, and the other organisations that promote free or open source software is on ensuring appropriate licences are used, and that distributors comply with the licence terms.

Otherwise, free software development is not particularly distinguished from other forms of software development. There are, of course, some outliers, but mostly you’ll see a “core team”, perhaps with a formal structure (component owners, project/release managers, and other roles are common). This team organises its work around proprietary work-allocation tooling like Jira, Github, Slack, Trello, and so on. In many cases, an organisation such as a commercial company or business-interest foundation exists to centralise ownership and decision making, like a good old-fashioned Fordist company. The only visible difference in operations is that it’s possible to read the source code and get changes from outside through the gates and into their repository.

The reason so much work in open source software looks a whole lot like commercial software is that it is work in commercial software. Sometimes the open source project exists exactly to drive adoption of commercially-licensed equivalents, replacements, or upgrades. “Open core” products exist so that you find a need for the commercial features. Developer advocates build open source workflow tools as a sort of loss leading pre-sales activity. And many companies publish open source software “off the money path” as a recruitment activity: do some free work on our pull requests then come here and get paid to reject pull requests from your peers!

Note that I’m not saying all free software or open source software is like this, but that plenty is. Free and open source software development is informed by, and mimics, commercial software engineering to a large extent.

Of course it’s this way, you say, we software engineers need our expensive software engineering lifestyles supported by our high software engineering salaries, so we learn our craft in corporate service and that informs all of our development, including open source. Or that we are only able to contribute to open source when it’s in pursuit of our employers’ goals, because Computering Time is something we do in the office. So it’s no surprise that a lot of open source software development looks like, or is, corporate software development. If only there were proper, sustainable funding for open source software!

To which I say: that is an interesting, and problematic, request, and I wish to put it to one side. Maybe I’ll revisit it in a later post. What I’m more interested in, is what an open source movement that was centered around the Four Freedoms, rather than software-licensing concerns, would look like. One in which you didn’t merely have some abstract legal freedom to do the four activities, but were empowered to do so.

Power Zero: The power to run the program for any purpose.

How much free or open source software cannot directly be used by anyone who isn’t a programmer? Think of all of the things that are only distributed as developer libraries, for developers to incorporate into (commercial or open source) applications. The things that get duplicated as a pod, an egg, a crate, and in whatever new packaging system and programming language comes along. This is the repository namespace as a virtual landgrab, enabling open source as corporate recruitment tool: you should hire me, you’ve heard of me because I wrote the language-your-programmers-use version of library-your-programmers-use.

How much free or open source software is just plain difficult, due to accidental rather than essential complexity? Developers will probably have all sorts of (developer tools) examples here, like GNU autoconf. I find it very hard to make a nice-looking document using open source tools, too. I’ve put enough effort into learning emacs and LaTeX to get somewhat proficient, but still spend a lot of time looking up TeX syntax and getting it wrong. And making (hopefully nice-looking) documents is something I do fairly frequently.

And I’m playing this game on the easy setting. I’ve got years of experience trying to make software work. I am willing (and often paid) to put time into understanding why software doesn’t do what I want. I speak English, the home language of much software, and can use a screen and keyboard without difficulty.

A free software movement in which Freedom 0 is replaced by Empowerment 0 would make getting, trying, and adopting free software trivial. Much simpler than an app store, which has the necessary gatekeeper of payments.

Power One: The power to study how the program works, and change it to make it do what you wish.

Access to the source is a precondition for this. So is the source being in a readily comprehensible format, amenable to experimentation and adaptation. It needs to be possible for a finance person, not a computer person, to look at a finance application like GNUcash, understand its model of finance, and adapt it to their model of finance.

This means elevating automated testing from a thing that developers claim they do sometimes, to the principal mode of hypothesis-driven change in software. Remember when Dan North said that Cucumber would allow business analysts and developers to collaborate on writing the tests? That, but collaborating on writing the software. Current modes of software development, including free and open source software, are predicated on the division of society into three classes: “developers” who make software, “the business” who sponsor software making, and “users” who do whatever it is they do. An enabling free software movement would erase these distinctions, because it would give the ability (not merely the freedom) to study and change the software to anyone who wanted or needed it.

Software developed with this in mind would have – nay, require – a very clear architecture. If you want a finance person to critique the finance model used in a finance application, everything needs to scream finance. It needs to stop screaming model, view, controller, or memory management, or database transaction, and start shouting credit, debit, accounts receivable.

That doesn’t mean that there isn’t space for experts in memory management or database transactions, but it does mean that you can understand a music score pagesetting application to the point where you can improve its representation of Klezmer modes without needing to understand memory management or database transactions.

Power Two: The power to redistribute and make copies so you can help your neighbour.

An enabling free software movement would make it easy for me to package my computer, or a part of it, and send it to someone else: “here, I find this useful, you give it a go”. So I don’t have to grub around (pardon the pun) for whichever guix, apt, yum, pkcon, brew, pkg or other command I had to run to make the software work, then tell that person to run the same command and hope it works. I just give them the thing, and they use it.

Power Three: The power to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.

Currently there is a blessed edifice, called “upstream”, the fount of all that is good in software. Any corporate programmer who wants to juice their proprietary github profile longs for their “requests” to be “pulled” by upstream, so that all of those other programmers who are in thrall to upstream finally get to see the fruits of this individual’s labour.

It feels weird to say this in 2020, when the idea was presented as fait accompli in 1997, but an enabling open source software movement would operate more like a bazaar than a cathedral. There wouldn’t be an “upstream”, there would be different people who all had the version of the software that worked best for them. It would be easy to evaluate, compare, combine and modify versions, so that the version you end up with is the one that works best for you, too.

The Four Powers

The combination of these powers points to an open source software movement that erases power dynamics implicit and explicit in our current modes of producing software. It’s as easy for someone who understands the domain of a software system to acquire, understand, improve and share that system as it is for someone who understands the computer it runs on. The infinite malleability of software is deployed to allow people, teams and communities to produce and share their own versions that work best for them.

The antagonism between “the developers” and “the business” documented in the principles behind the agile manifesto is left to commercial software companies. Users, developers, and sponsors are on the same team, and produce software that works better for them than if they worked in other ways.

Pairing in Github

In the world of free software, it’s good to appropriately credit contributors to your community for the work they do.

git makes this hard when you pair program. I was at a hackathon recently, and while I didn’t make a single commit, I sat next to a lot of other people who made plenty of commits based on conversations we had; I suggested a lot of things to try when debugging problems, and invented solutions that made it into those commits. No highly-nutritious green squares in github for me, no external evidence that I had contributed two days of my time to these free software projects.

When I pair, if I’m committing, I make sure that I acknowledge the contribution my pair makes as equal to my own. In the github UI, it looks like this. You can see that both of us contributed to the commit.

How do I do this? I commit like this:

git commit -m 'We fixed this thing' --author 'Jennifer H. Pair <jenny.pair@example.com>'

Now both accounts are linked in the UI, because I’m the committer and my pair is the author. This isn’t perfect, because github doesn’t acknowledge the author in their contribution graph, only the committer. If there’s a more egalitarian way to acknowledge my pair I’d want to follow that, but for the moment I’m happy to at least demonstrate that they authored the change I typed into a text editor.
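
One candidate for a more egalitarian approach: git supports Co-authored-by trailers in the commit message, and github displays co-authors credited that way on the commit. Reusing the placeholder pair from above, that looks like:

git commit -m 'We fixed this thing' -m 'Co-authored-by: Jennifer H. Pair <jenny.pair@example.com>'

The second -m adds a separate paragraph to the commit message, which puts the trailer at the end of the message where github looks for it.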

Everyone rejecting everyone else

It’s common in our cooler-than-Agile, post-Agile community to say that Agile teams who “didn’t get it” eschewed good existing practices in their rush to adopt new ways of thinking. We don’t need UML, we’re Agile! Working software over comprehensive documentation!

This short post is a reminder that it ran both ways, and that people used to the earlier ways of thinking also eschewed integrating their tools into the Agile methodology. We don’t need Agile, we’re Model-Driven! Here’s an excerpt from 2004’s UML 2 Toolkit:

Certain object-oriented methods on the market today, such as The Unified Process, might be considered processes. However, other lightweight methods, such as Extreme Programming (XP), are not robust enough to be considered processes in the way we use the term here. Although they provide a valuable set of interrelated techniques, they are often referred to as “small m methodologies” rather than as software-engineering processes.

This is in the context of saying that UML supports software-engineering processes in which a sequence of activities produce “documentation…about the system being built” and “a product that solves the initial problems is introduced and delivered”. So XP is not robust enough at doing those things for UML advocates to advocate UML in the context of XP.

So it’s not just that Agilistas abandoned existing practices, but there’s an extent to which existing practitioners abandoned Agilistas too.

The feature constraint

If you’re in a purely software business, your constraining resource is often (not always, not even necessarily in most cases, but often) the rate at which software gets changed. Well, specifically, the rate at which software gets changed in a direction your customers or potential customers are interested in. This means that the limiting factor on growth is likely to be the rate at which you can add features or fixes that attract new customers, or retain old customers.

It’s common to see businesses where this constraint is not understood throughout management, manifesting particularly in sales. In a business-to-business context, symptoms include:

  • sales teams close deals with promises of features that don’t exist, and can’t exist soon.
  • there’s no time to fix bugs or otherwise clean up because of the new feature backlog.
  • new features get added to the backlog based on the size of the requesting customer, not the cost/benefit of the customer.
  • the product roadmap is “what we said we’d have, to whom, by when”, not “what we will have”.

As Eliyahu Goldratt says, you have to subordinate the whole process to the constraint. That means incentivising people to sell something a lot like what you have now, over selling a bigger number of things you don’t have now and won’t have soon.

Immutable changes

The Fixed-Term Parliaments Act was supposed to bring about a culture change in the parliament and politics of the United Kingdom. Moving for the second reading of the bill that became this Act, Nick Clegg (then deputy prime minister, now member for Facebook Central) summarized that culture shift.

The Bill has a single, clear purpose: to introduce fixed-term Parliaments to the United Kingdom to remove the right of a Prime Minister to seek the Dissolution of Parliament for pure political gain. This simple constitutional innovation will none the less have a profound effect because for the first time in our history the timing of general elections will not be a plaything of Governments. There will be no more feverish speculation over the date of the next election, distracting politicians from getting on with running the country. Instead everyone will know how long a Parliament can be expected to last, bringing much greater stability to our political system. Crucially, if, for some reason, there is a need for Parliament to dissolve early, that will be up to the House of Commons to decide. Everyone knows the damage that is done when a Prime Minister dithers and hesitates over the election date, keeping the country guessing. We were subjected to that pantomime in 2007. All that happens is that the political parties end up in perpetual campaign mode, making it very difficult for Parliament to function effectively. The only way to stop that ever happening again is by the reforms contained in the Bill.

As we hammer out the detail of these reforms, I hope that we are all able to keep sight of the considerable consensus that already exists on the introduction of fixed-term Parliaments. They were in my party’s manifesto, they have been in Labour party manifestos since 1992, and although this was not an explicit Conservative election pledge, the Conservative manifesto did include a commitment to making the use of the royal prerogative subject to greater democratic control, ensuring that Parliament is properly involved in all big, national decisions—and there are few as big as the lifetime of Parliament and the frequency of general elections.

When a parliament is convened, the date of the next general election automatically gets scheduled for the first Thursday in May, five years out. The Commons could vote, with a qualified majority, to hold an election earlier, or an election would automatically be triggered if the government lost a no-confidence vote, but the prime minister cannot unilaterally declare an election date to suit their popularity with the franchise.

Observed behaviour shows that the Act has been followed to the letter, up to the current dissolution which required a specific change to the rules. Has the spirit of the Act, the motivation presented above, survived intact? The dates of elections since the Act passed were:

  • 7 May 2015, the first Thursday in May at the end of a five-ish-year Parliament, chosen to bring the existing behaviour into sync with the planned behaviour.
  • 8 June 2017, after a qualified majority vote within the terms of the Act.
  • 12 December 2019, after the aforementioned Early Parliamentary General Election Act.

The reason for the disparity is that the intended goal—a predictable release schedule that makes it easier for everyone involved to prepare—doesn’t match the cultural drivers. The desire to release when we’re ready, and have the features that we want to see, remains immutable, and means that even though we’ve adopted the new rules, we aren’t really playing by them.

I was tempted to hit “publish” at this point and leave the software engineering analogy unspoken. I powered on: here are a few examples I’ve seen where the rule changes have been imposed but the cultural support for the new rules hasn’t been nurtured.

  • Regular releases, but the release is “internal only” or completely unreleased until all of the planned features are ready;
  • Short sprints, where everything that has gone from development into QA is declared “done”;
  • Sprint commitments, where the team also describe “stretch goals” that are expected to be delivered;
  • Sustainable pace, where the “velocity” is expected to increase monotonically;
  • Self-organizing teams, where the manager feeds back on everybody’s status update at the daily stand-up;
  • Continuous integration, where the team can disable or skip tests that fail.

All of these can be achieved without the attached sabotage, but that requires more radical changes than adding a practice to the team’s menu. Radical, because you have to go to the root of why you’re doing what you do. Ask what you’re even trying to achieve by having a software team working on your software, then compare how well your existing practice and your proposed practice support that value. If the proposed practice is better, then adopt it, but there’s going to be a transition period where you continually explain why you’re adopting it, show the results, and (constructively, politely, and firmly) guide people toward acceptance of and commitment to the new practice. Otherwise you end up with a new fixed-term parliament starting whenever people feel like it.

On exploding boilers

Throughout our history, it has always been standardisation of components that has enabled creations of greater complexity.

This quote, from Simon Wardley’s finding a path, reminded me of the software industry’s relationship with interchangeable parts.

Brad Cox, in both Object-Oriented Programming: an Evolutionary Approach and Superdistribution, used physical manufacturing analogies (to integrated circuits, and to rifles) to invoke the concept of a “software industrial revolution” that would allow end users to assemble off-the-shelf parts to solve their problems. His “software ICs” built on ideas expressed at least as early as 1968 by Doug McIlroy. Joe Armstrong talked about a universal function registry, so that if someone writes sin/1 everybody else can use it.

Of course we have a lot of reusable components in software engineering now, and we can thank the Free Software movement at least as much as any paradigm of organising programming instructions. CTAN, CPAN, and later repositories act as the “component catalogues” that Cox discussed. If you want to make a computer do something, you can probably find an npm module or a Ruby gem that does most of the work for you. The vast majority of such components have free licenses; it’s rare to pay for a reusable component.

The extent to which they’re “standard parts”, on the model of interchangeable nuts and bolts or integrated circuits, is debatable. Let’s say that you download a package from npm. We know that you use it by calling require (or maybe import)…but what does that give you? An object? A constructor? A regular function? Does it run anything as a result of calling require? Does it work in your node/ionic/electron/etc. context? Is it even a lump of regular javascript, or does it need to be transpiled, or to have access to a JVM, or to meet some other niche requirement?

Whatever these “standard parts” are and however they’re used, you’re probably still doing a bunch of coding. These parts will do computery stuff, or maybe generic behaviour like authentication, date UIs, left-padding strings and the like. Usually we still have to develop our apps as “engineered” software projects with significant levels of custom coding, to make those “standard parts” actually solve a useful problem. There are still people working for retail companies maintaining online store applications across the four corners of the globe, despite the fact that globes don’t have corners, these things all work the same way, and the risks associated with getting them wrong are significant.

Perhaps this is because software is a distinct thing, and we can never treat it like industrial product manufacturing.

Perhaps this is because our ambition always runs out ahead of our capability. Whatever we can reproducibly build, we’d like to be building something greater.

Perhaps this is because we’re still in the cottage industry stage, where we don’t yet know whether or how to standardise the parts, and occasionally the boilers explode.