The Logical Fallacy

Nary a week goes by without my seeing a post by a programmer, for programmers, on the subject of logical fallacies in arguments. This week’s, courtesy of Hacker News, is not egregious, enlightening, or indeed different in any way from the usual torrent. It is merely the one that prompted me to write this article. The most frequent, and most severe, logical fallacy I encounter among programmers is this one:

  • basing your argument on logic.

Now, obviously, for a fallacy to be recognised it needs to have a Latin name, so I’m going to call this one argumentum ex logica.

Argumentum ex logica is the fallacious reasoning that the best course of action for a group of people is the one that can be arrived at by logical deduction. No need to consider the emotions of the people involved, or the aesthetic properties of any potential solutions. Just treat your workplace like your high school debating club, pick (seemingly arbitrarily) some axioms, and batter your way through to your preferred conclusion.

If people disagree with you on (unreasonable) emotional grounds, just name their logical fallacies so you can overrule their arguments, like you’re in an episode of Ally McBeal and want their comments stricken from the record. If people manage to find a flaw in the logic of your argument, just pick a new axiom you’d never mentioned before and carry on.

The application of the argumentum ex logica fallacy is frequently accompanied by descriptions of the actions of “the brain”, that strange impish character that sits inside each of us and causes us to divert from the true path of Syrran of Vulcan. Post hoc ergo propter hoc, we are told, is an easy mistake to make because “the brain” sees successive events as related.

Here’s the weird thing. We all have a “the brain” inside us, as an important part of our being. By writing off “the brain” as a mistaken and impure lump of wet fat, programmers are saying that they are not building their software for humans; there must be some other kind of machine, one that functions on purely logical grounds, for whom their software is intended. It should not be so.

Why 80?

“80 characters per line is a standard worth sticking to, even today.” OK, why?

Well, back up. Let’s examine the axioms. Is 80 characters per line a standard? Not really; it’s a convention. IBM cards (which weren’t just made by IBM or read by IBM machines) were certainly 80 characters wide, as were DEC video terminals, which Macs etc. emulate. Actually, that’s not even true. The DEC VT-05 could display 72 characters per line; the later VT-50 and its successor models introduced 80 characters. The VT-100 could display 132 characters per line, the same quantity as a line printer (including the ones made by IBM). Other video terminals had 40- or 64-character lines. Teletypewriters typically had shorter lines, around 70 characters.

Typewriters were typically limited to \((\mathrm{width\ of\ page} - 2 \times \mathrm{margin\ width}) \times \mathrm{character\ density}\) characters per line. With wide margins on narrow paper you might get 50 characters; with narrow margins on wide paper, maybe 100.
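
As a worked example (using illustrative figures rather than any particular machine: pica type at 10 characters per inch, on 8.5-inch paper with 1-inch margins), that gives \((8.5 - 2 \times 1) \times 10 = 65\) characters per line.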

IBM were not the only people to make cards, punches, and readers. Other manufacturers did, with other numbers of characters per card. IBM themselves made 40, 45 and 96 column cards. Remington Rand made cards with 45 or 90 columns.

So, axiom one modified, “80 characters per line is a particular convention out of many worth sticking to, even today.” Is it worth sticking to?

Hints are that it isn’t. A study, “The effects of line length on reading online news”, explored screen-reading with different line lengths: 35, 55, 75 and 95 cpl. From the abstract:

Results showed that passages formatted with 95 cpl resulted in faster reading speed. No effects of line length were found for comprehension or satisfaction, however, users indicated a strong preference for either the short or long line lengths.

However, that isn’t a clear slam dunk. Quoting their reference to prior work:

Research investigating line length for online text has been inconclusive. Several studies found that longer line lengths (80 – 100 cpl) were read faster than short line lengths (Duchnicky and Kolers, 1983; Dyson and Kipping, 1998). Contrary to these findings, other research suggests the use of shorter line lengths. Dyson and Haselgrove (2001) found that 55 characters per line were read faster than either 100 cpl or 25 cpl conditions. Similarly, a line length of 45-60 characters was recommended by Grabinger and Osman-Jouchoux (1996) based on user preferences. Bernard, Fernandez, Hull, and Chaparro (2003) found that adults preferred medium line length (76 cpl) and children preferred shorter line lengths (45 cpl) when compared to 132 characters per line.

So, long lines are read faster than short lines, except when they aren’t. They also found that preferences were polarised: the longest and shortest lines were the most preferred by some readers and the least preferred by others.

But is 95 cpl a magic number? What about 105 cpl, or 115 cpl? What about 273 cpl, which is what I get if I leave my Terminal font settings alone and maximise the window on my larger monitor? Does it even make sense for programmers who don’t have to line up the comment markers in Fortran-77 code to be using monospaced fonts, or would we be better off with proportional fonts?

And that article was about online news articles, a particular and terse form of prose, being read by Americans. Does it generalise to code? How about the observation that children and adults prefer different lengths, what causes that change? Does this apply to people from other countries? Well, who knows?

Buse and Weimer found that “average line length” was “strongly negatively correlated” with perceived readability. So maybe we should be aiming for one-character lines! Or we can offset the occasional 1,000-character line by having lots and lots of one-character lines:

}
}
}
}
}
}
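
To put numbers on the gaming opportunity: offsetting one 1,000-character line with 999 one-character closing braces gives an average of \((1000 + 999 \times 1) / 1000 \approx 2\) characters per line, which this metric would rate as eminently readable.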

It sounds like there’s information missing from their analysis. What was the actual shape of the data? What were the maximum and minimum line lengths considered, and what distribution of line lengths was there?

We’re in a good place to rewrite the title from the beginning of the post: “80 characters per line is a particular convention out of many that we know literally nothing about the benefit or cost of, even today.” Maybe our developer environments need a bit of that UX thing we keep imposing on everybody else.

More speed, lower velocity

I frequently meet software teams who describe themselves as “high velocity” (they even have graphs coming from Jira to prove it), and yet their ability to ship great software, to delight their customers, or even to attract their customers, doesn’t meet their expectations. A little bit of sleuthing usually uncovers the underlying problem.

Firstly, let’s take a look at that word, “velocity”. I, like Kevlin Henney, have a background in physics, and therefore I agree with him that velocity is a vector: it has a direction. But “agile” velocity only measures the amount of stuff done to the system over time, not the direction in which it takes the system. That story may be “5 points” when measured in terms of heft, but is that five points of increasing existing customer satisfaction? Five points of new capability that will be demoed at next month’s trade show? Five points of attractiveness to prospects in the sales funnel?

Or is it five points of making it harder for a flagship customer to get their work done? Five points of adding thirty-five points of technical debt work later? Five points of integrating the lead engineer’s pet technology?

All of these things look the same in this model: they all look like five points. And that means that for a “high-velocity” (but really low-velocity, high-speed) team, the natural inclination is to jump on whatever comes next, get it done, and get those five points under their belt and onto the burndown chart. The faster they burn everything down, the better they look.
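
To make the vector analogy concrete, here is a toy sketch (the stories, point values, and the idea of scoring each story’s “direction” are mine, purely illustrative, not a real measurement scheme):

    // Toy model: speed sums story sizes; velocity also weights each size by
    // its direction, i.e. whether it moves the product where you want it to go.
    const sprint = [
      { story: 'delight a flagship customer', points: 5, direction: +1.0 },
      { story: "integrate the lead engineer's pet technology", points: 5, direction: -0.5 },
      { story: 'ship now, pay thirty-five points of debt later', points: 5, direction: -1.0 },
    ];

    const speed = sprint.reduce((sum, s) => sum + s.points, 0);
    const velocity = sprint.reduce((sum, s) => sum + s.points * s.direction, 0);

    console.log({ speed, velocity }); // { speed: 15, velocity: -2.5 }

The burndown chart sees fifteen points either way; the direction term is the part the Jira graph never shows.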

Some of the presenting symptoms of a high-speed, low-velocity team are listed below. If you recognise these in your team, book yourself in for office hours and we’ll see if we can get you unstuck.

  • “The Business”: othering the rest of the company. The team believes that their responsibility is to build the thing that they were asked for, and “the business” needs to tell them what to build, and to sell it.
  • Work to rule: we build exactly what was asked for, no more, no less. If the tech debt is piling up it’s because “the business” (q.v.) doesn’t give us time to fix it. If we built the wrong thing it’s because “the business” put it at the top of the backlog. If we built the thing wrong it’s because the acceptance criteria weren’t made clear before we started.
  • Nearly done == done: look, we know our rolling average velocity is 20 bushels of software, and we only have 14 furlongs and two femtocandela of software to show at this demo. But look over here! These 12 lumens and 4 millitesla of software are in QA, which is nearly done, so we’ve actually been working really hard. The fact that you can’t use any of that stuff is unimportant.
  • Mini-waterfall: related to work to rule (q.v.), this is the requirement that everyone do their bit of the process in order, so that the software team can optimise for requirements in -> software out and get that sweet velocity up. We don’t want to be doing discovery in engineering, because that means uncertainty, uncertainty means rework, and rework means lower velocity.
  • Punitive estimation: we’re going to rename “ambiguity” to “risk”, and then punish our product owner for giving us risky stories by boosting their estimates to account for the “risk”. Such stories will never get scheduled, because we’ll never be asked to do that one risky thing when we can get ten straightforward things done in what we are saying is the same time.
  • Story per dev: as a team, our goal is to shovel as much software onto the runtime furnace as possible. Therefore we are going to fan out the tasks to every individual. We are each capable of wielding our own shovel, and very rarely do we accidentally hit each other in the face while shovelling.

Coming to terms with fewer terms

I was on a “Leadership in Architecture” panel organised by RP International recently, and was asked about problems we face using new techniques like Microservices, serverless and machine learning in the financial technology sector. The biggest blocker I see is the RFP (Request for Proposals), the RFI (Request for Information), the MSA (Master Service Agreement): any document with a three-letter acronym. We would do better if they all disappeared.

I’m fully paid up in my tithe to the church of “customer collaboration over contract negotiation”, and I believe that this needs to extend beyond the company boundary. If we’re going to spend a few months going back and forth over waving our certifications about, deciding who gets to contact whom within what time, and whether the question they asked constitutes a “bug report” or a “feature request”, then I don’t think it matters whether the development team use two-week sprints or not. We’ve already lost.

We’ve lost because we know that the interactions between the people involved are going to be restricted to the terms agreed during that negotiation. No longer are people concerned about whether the thing we’re making is valuable; they’re concerned with making sure their professional indemnity insurance is up to date before sending an email to the DRI (Definitely Responsibility-free Inbox).

We’ve lost because we had a team sitting on its hands during the negotiation, and used that time “productively” by designing the product, putting epics and stories in a backlog, grooming that backlog, making wireframes, and all of those other things that aren’t working software.

We’ve lost because each incompatibility between the expectation and our intention is a chance to put even more contract negotiation in place, instead of getting on with making the working software. When your RFI asks which firewall ports you need to open into your DMZ, and our answer is none because the software runs outside of your network on a cloud platform, we’re not going to get into discussions of continuous delivery and whether we’ve both read The Phoenix Project. We’re going to get into discussions of whether I personally will warrant against Amazon outages. But here’s the thing: we don’t need the software to be 100% up yet; we don’t even know whether it’s useful yet.

Here’s an alternative.

  1. We, collectively, notice that the software we make solves the problem you have.
  2. We, collectively, agree that you can use the software we have now for a couple of weeks.
  3. We, collectively, discuss the things that would make the software better at solving the problem.
  4. We, collectively, get those things done.
  5. We, collectively, GO TO 2.

Notice that you may have to pay for steps 2-4.

What’s better than semver?

Many software libraries are released with version “numbers” that follow a scheme called Semantic Versioning. A semantic version is three numbers separated by dots, of the form x.y.z, where:

  • if x is zero, all bets are off. Otherwise:
  • z increments “if only backwards compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior.”

Problem one: there is no such thing as an “internal change that fixes incorrect behavior” that is “backwards compatible”. If a library has a function f() in its public API, I could be relying on any observable behaviour of f() (potentially but pathologically including its running time or memory use, but here I’ll only consider return values or environment changes for given inputs).

If they “fix” “incorrect” behaviour, the library maintainers may have broken the package for me. I would need a comprehensive collection of contract or integration tests to know that I can still use version x.y.z' if version x.y.z was working for me. This is the worst situation, because the API looks like it hasn’t changed: all of the places where I call functions or create objects still do something, they just might not do the right thing any more.
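
A contrived sketch of the problem (the library function and the consumer code here are hypothetical):

    // Version x.y.z of some library ships this function, complete with a bug:
    function daysInFebruary(year) {
      return 28; // bug: ignores leap years
    }

    // My application relies on the observed (wrong) behaviour and works around it:
    const isLeap = (y) => (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
    const days = daysInFebruary(2020) + (isLeap(2020) ? 1 : 0); // 29: correct

    // Version x.y.(z+1) "fixes" daysInFebruary to handle leap years itself.
    // Nothing in my code stops compiling or starts throwing, but `days`
    // silently becomes 30: the "backwards compatible bug fix" broke my app.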

Problem two: because I relaxed the dependency on running time and memory use, a refactoring could represent a genuinely non-breaking change. Yet semver has nowhere to record truly backwards compatible changes, because bug fixes are erroneously considered backwards compatible.

  • y increments “if new, backwards compatible functionality is introduced to the public API”.

This is fine. I get new stuff that I’m not (currently) using, but you haven’t broken anything I do use.

Problem three: an increment to y “MAY include patch level changes”. So I can’t just quietly take in the new functionality and decide whether I need it in my own time: the library maintainers may have rolled in their supposedly-backwards-compatible-but-not-really changes, and I still don’t know whether this version works for me.

  • x increments “if any backwards incompatible changes are introduced to the public API”.

Problem four: I’m not looking at the same library any more. It has the same name, but it could be completely rewritten, have any number of internal behaviour changes, and any number of external interface changes. It might not do what I want any more, or might do it in a way that doesn’t suit the needs of my application.

On the plus side

The dots are fine. I’m happy with the dots. Please do not feel the need to leave a comment if you are unhappy with the dots or can come up with some contrived reason why “dots are harmful”, as I don’t care.

Better: meaningful versioning

I would prefer to use a version scheme that looks like z.w.y:

  • y has the meaning it does in semver, except that it MUST NOT include patch level changes. If a package maintainer has added new things or deprecated (but not removed) old things, then I can still use the package.
  • z has the meaning it does in semver, except that we stop pretending that bug fixes can be backwards compatible.
  • w is incremented if non-behavioural changes are implemented; for example, if internals are refactored, caches are introduced or removed, or private data structures are changed. These are changes that probably mean I can still use the package, but if I needed particular performance attributes from the library then it is on me to discover whether the new version still meets my needs.

There is no room for x in this scheme. If a maintainer wants to write a new, incompatible library, they can use a new name.
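
As a sketch of how a consumer might mechanise this scheme (the helper names and the advice strings are mine, purely illustrative):

    // Parse a hypothetical z.w.y version string into its components.
    function parseZwy(version) {
      const [z, w, y] = version.split('.').map(Number);
      return { z, w, y };
    }

    // Conservative advice for moving from `current` to `next`.
    function upgradeAdvice(current, next) {
      const from = parseZwy(current);
      const to = parseZwy(next);
      if (to.z > from.z) return 'bug fix: behaviour changed, re-run your contract tests';
      if (to.w > from.w) return 'internal change: re-check any performance assumptions';
      if (to.y > from.y) return 'additive: the new API is opt-in, safe to take';
      return 'nothing relevant to consumers has changed';
    }

    console.log(upgradeAdvice('1.4.2', '2.4.2'));
    // "bug fix: behaviour changed, re-run your contract tests"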

Different: don’t use versions

This is more work for me, but less work for the package maintainer. If they are maintaining a change log (which they are, as they are using version control) and perhaps a medium for announcing important changes including security and bug fixes and new features, then I can pick the commit that I discover does what I need. I can maintain my own tree (and should be anyway, in case the maintainer decides to delete their upstream repo) and can cherry-pick the changes that are useful for me, leaving out the ones that are harmful for me.

This is more work for me than the z.w.y scheme because now I have to understand the impact of each change. It is the same amount of work as the semver x.y.z scheme, because then I had to understand the impact of each change too, as changes to any of the three version components could potentially include supposedly-backwards-compatible-but-not-really changes.

To become a beginner, first become an expert

We have a whole load of practices in programming that only really work well if you’re already good at whatever the practice is supposed to help with.

Scrum is a process improvement framework, but only if you already know how to do process improvement. If you don’t, then Scrum is just the baseline mini-waterfall process with a chance to air your dirty laundry every fortnight.

Agile is good at helping you embrace change, but only if you’re already good enough at managing change to understand which changes should be embraced.

#NoEstimates helps you avoid the overhead of estimates, but only if you’re already good enough at estimates to know that you always write user stories that take 0.5-2 days to implement.

TDD helps you design your APIs, but only if you’re already good enough at API design to understand things like dependency injection and loose coupling.

Microservices help you isolate modules, but only if you’re already good enough at modularity not to get swamped in HTTP calls.

This is all very well for selling consultancy (“if your [agile] isn’t working, then you aren’t [agiling] hard enough, let me [agile] you some more”) but where’s the on-ramp?

Reasoning about reasoning about software

Functional programmers like to claim that you can’t reason about mutable state programs. Some thoughts:

  • the first half of the book A Discipline of Programming by Edsger W. Dijkstra tells you how to do it. That half of the book is approximately 100 pages (the remainder of the book is worked examples).
  • object-oriented programming breaks a software system up into separate systems running miniature, message-driven programs as if on separate computers. Therefore the consideration of “mutable state” can be split in two: the state internal to the object and the state external to the object which sends messages to the object but is ignorant of its internals. If you can’t split the state that way, you have bad encapsulation.
  • The reasoning done about the external and internal behaviours had better match at the interface. Design by contract probably helps here.
  • Given a state S, an operation O can be defined as \(O(args \times S) \rightarrow (R \times S')\), i.e. it returns a result R and updates the state to S'.
  • However, Bertrand Meyer introduced Command-Query Separation in the 1980s, so you only need to consider queries \(O(args \times S) \rightarrow (R \times S)\), which leave the state alone, and commands \(O(args \times S) \rightarrow (\emptyset \times S')\), which return nothing.
  • Various history “traces” can be considered equivalent, and therefore a lot of knowledge about the historical state transitions can be elided, simplifying the reasoning. For example, given a well-designed stack, it is impossible to distinguish the history of stack.push(3); stack.pop(); stack.push(7) from that of stack.push(7) (see the sketch after this list).
  • Various operations on the state are irrelevant to the behaviour of an operation under consideration. In reasoning about the final operation in a = 3; b = 7; c = 9; stack.push(2) you do not need to consider the assignment operations (and indeed their presence may indicate a cohesion problem in your design).
  • The one remaining source of difficulty is aliasing; I do need to know about the elided operations in the sequence x = 7; *y = &x; ...; z = f(x). But that is a problem with aliasing, not with mutable state.
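
A minimal sketch of that stack example (in JavaScript, and with the caveat that pop() as written strictly mixes command and query):

    // A well-encapsulated stack: callers can only observe it via messages.
    class Stack {
      #items = [];                             // internal state, invisible outside
      push(x) { this.#items.push(x); }         // command: mutates, returns nothing
      pop() { return this.#items.pop(); }      // command and query combined
      peek() { return this.#items[this.#items.length - 1]; } // query: no mutation
    }

    const a = new Stack();
    a.push(3); a.pop(); a.push(7);

    const b = new Stack();
    b.push(7);

    // No sequence of messages can distinguish a from b: the two histories are
    // equivalent traces, so reasoning may safely discard the elided past.
    console.log(a.peek() === b.peek()); // true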

The Atoms of Programming

In the world of physics, there are many different models that can be used, though typically each of them has different applicability to different contexts. At the small scale, quantum physics is a very useful model, while Newtonian physics will yield evidently incorrect predictions, so is less valuable. Where a Newtonian model gives sufficiently accurate results, it’s a lot easier to work with than quantum or relativistic mechanics.

All of these models are used to describe the same universe – the same underlying collection of observations that can systematically be categorised, modelled and predicted.

Physical science (or experimental philosophy) does not work in the same way as computational philosophy. There are physical realisations of computational systems, typically manifested as electronic systems or pencil-and-paper simulations. But the software, the abstract configuration of ideas that runs on those systems, exists in an entirely separate space and is merely (though the fact that this is possible is immensely powerful) translated into the electronic or paper medium.

Of course one model for the software system is to abstract the electronic: to consider the movement of electrons as the presence of voltages at terminals; to group terminals as registers or busses; to further abstract this range of voltages as 0 and that range as 1. And indeed that model frequently is useful.

Frequently, that model is not useful. And the great thing is that we get to select from a panoply of other models, at some small or large remove from the physical model. We can use these models separately, or simultaneously. You can think of a software system as a network of messages passed between independent objects, as a flow of data through transformers, as a sequence of state changes, as a graph of single-argument functions, as something else, or as a combination of these things. Each is useful, each is powerful, all are applicable.

Sometimes, I can use these models to make decisions about representing the logical structure of these systems, transforming a concept into a representation that’s valid in the model. If I have a statement in a mathematical formulation of my problem, “for any a drawn from the set of Articles there exists a p drawn from the set of People such that p is the principal author of a” then I can build a function, or a method, or a query, or a predicate, or a procedure, or a subroutine, or a spreadsheet cell, or a process, that given an article will yield exactly one person who is the principal author of that article.
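
For instance, one of those realisations, as a function (the articles and people here are made up for illustration):

    // "For any a in Articles there exists a p in People such that p is the
    // principal author of a", realised as a total function from articles
    // to people.
    const principalAuthors = new Map([
      ['On Computable Numbers', 'A. M. Turing'],
      ['A Mathematical Theory of Communication', 'C. E. Shannon'],
    ]);

    function principalAuthorOf(article) {
      if (!principalAuthors.has(article)) {
        throw new Error(`no principal author recorded for ${article}`);
      }
      return principalAuthors.get(article);
    }

A query, a predicate, a spreadsheet cell, or a stored procedure could encode exactly the same statement; the model, not the language feature, is the interesting part.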

Sometimes, I use the models to avoid the conceptual or logical layers entirely and express my problem as if it is a software solution. Object-oriented analysis and design, data flow modelling, and other techniques can be used to represent a logical model, or they can be used to bash the problem straight into a physical model without having thought about the problem in the abstract. “Shut up and code” is an extreme example of this approach, in which the physical model is realised without any attempt to tie it to a logical or conceptual design. I’ll know “correct” when I see it.

I don’t see a lot of value in collecting programming languages. I can’t count the number of different programming languages I’ve used, and many of them are entirely similar. C and JavaScript both have sequences of expressions that are built into statements that are built into procedures. Both let me build aggregations of data and procedures that either let me organise sequential programs, represent objects, represent functions, or do something else.

But collecting the models, the different representations of systems conceptually that can be expressed as software, sometimes called paradigms: this is very interesting. This is what lets me think about representing problems in different ways, and come up with efficient (conceptually or physically) solutions.

More paradigms, please.

In which new developer tools are dull

Over on lobste.rs I said that I don’t hold out much hope for another “blue plane” style event in developer tools. In one of Alan Kay’s presentations, he referred to the ordinary way of things as the pink plane, and incremental advances in the state of affairs as movements in that plane. Like the square in Edwin Abbott’s Flatland that encounters a sphere, a development could take us out of the pink plane into the (orthogonal) blue plane. These blue plane ideas are rare because, like the square, it’s hard to even conceive of life outside the pink plane.

In what may just be a surprising coincidence, Apple engineers used Blue and Pink to refer to features in evolutionary and revolutionary developments of their operating system.

Software engineering tooling is, for the majority of developers, in a phase of conservative retreat.

Build UIs on the web and you probably won’t use a graphical builder, you’ll type HTML and JavaScript (and maybe JSX) into a text editor.

Build native apps and even where there is a GUI builder, you’ll find people recommending against its use and wanting to do things “programmatically” (by which they mean “through typing”, even though the GUI builder tools are another way to construct a program).

In the last couple of decades, interest in CASE tooling has shrunk to conservative interest in text editors with some syntax highlighting, like vim or Atom. Gone even is the “build and run” button from IDEs, to be replaced with command-line invocations of grunt tasks (a fancy phrase meaning shell scripts), npm scripts (a fancy phrase meaning shell scripts) or rake tasks (you get the idea).

Where previously there were live development environments embedded in the deployment environment (and the JavaScript VM is almost perfectly designed for that task), there is now console.log and unit tests. The height of advanced interaction with your programming tools is the REPL (an interactive shell) or the Playground/InstaREPL (an interactive shell that echoes stdin and stdout in different places).

For the most part, and I say that to avoid the inevitable commenter who thinks that a counterexample like LabView or Mathematica or that one person they met who uses Expression Blend renders the whole argument broken, developers have doubled down on the ceremony of programming: the typing of arcane text into an 80×24 character display. Now to be fair, text is an efficient and compact graphical representation of a linear sequence of connected concepts. But it is not the only one, nor the most efficient nor most compact, and neither are many software systems linear.

The rewards in making software to make software are scarce.

You can do as IntelliJ does, and make a better version of the 80×24 text entry thing. You can work for a platform vendor, and make their version of the 80×24 thing. You can go and get an engineering grade 6 or above job in Silicon Valley and tell your manager that, whatever it is their business does, you’re going to focus on the 80×24 thing (“at scale”) instead.

What you don’t seem to be able to do is to disrupt the 80×24 thing. It’s free (at least as in beer), it’s ubiquitous, and whether or not it’s as good as it could be it certainly seems to be good enough for the people who not only get paid to make bad software, but get paid again to fix it.

Technical debt and jury service

We have the idea that in addition to the product development backlogs for our teams, there’s an engineering backlog where technical debt paydown, process/tooling improvements, and other sitewide engineering concerns get recorded. Working on them is done in time that is, by definition, taken away from the product backlogs (because of Sustainable Pace).

A colleague recently described the time spent on the engineering backlog as a “tax”, which is an interesting analogy. A pejorative interpretation is that, like a tax, centralised engineering work is a cost imposed that takes away from realising more value on my direct projects.

A positive spin is that taxes go toward funding the commons: no one of us can afford to build a road between our house and the office, but having roads connecting all the houses to all the offices has strategic benefit to society as a whole (higher productivity, lower unemployment, more opportunities) so if we all pay in a fraction of the cost of a road we can all have a road. Similarly, one product team might grind to a halt if they spend all of their time on the new CD pipeline infrastructure, but all teams will benefit if they all chip in a bit.

This version of the analogy implies that there might be, like the treasury, a central agency deciding how to spend the common wealth. Somebody needs to decide how much tax everyone should pay, what to do with dissenters (is it OK if your product team focuses on its sprint for a fortnight and doesn’t do any of the engineering backlog?), whether to accept overpayments, and what those tax dollars should go on.

Only it’s not tax dollars, it’s tax hours. In this sense, a better analogy is conscription (I originally thought of the Anglo-Saxon fyrd, but maybe jury service or non-military national service is a less aggressive way to consider it). Taxation means that I give all of my work time to Wealth Wizards but give a chunk of my money to the government. Conscription means that I don’t get to give all of my time to my employer: some of it has to go to the commons. Maybe Jonathan and Rebecca can’t give any time to their product teams this week because they’ve been “called up” to the engineering backlog effort.

That seems like a useful analogy for these tasks. I can think about what resources are available for products or “the commons”, because I can think about whether someone is working on “the day job” or has been conscripted. Maybe it doesn’t make sense for everybody to have equal likelihood of being “called up”, in the same way that it’s easier for students to get out of jury service than for full-time employees.