Impossibility and Uncertainty in AI

About this paper

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function), Peter Eckersley. Submitted to the arXiv on December 31, 2018.

Notes

Ethical considerations in artificial intelligence applications have arguably been present since the birth of the field, if not earlier. Karel Čapek wrote R.U.R., the play that introduced both the word “robot” to our lexicon and the question of whether such creations would want to work. Isaac Asimov and Mary Shelley both asked how a society with “natural” and “artificial” people would function.

Getting Trollied

More recently, as we’ve been able to create things that we can both identify as solving problems and market under the term Artificial Intelligence, the question of AI ethics has reappeared with different specialisations. Should we apply AI techniques to particular fields of endeavour at all, such as autonomous weaponry? Who should be responsible for a decision made, for example, by an autonomous vehicle? How should such a vehicle make a decision when faced by the classic trolley problem?

Eckersley opens this paper with an aside, important to those who consider AI as an engineering discipline (which is not a clear statement in itself): autonomous vehicle research is a long, long way away from even contemplating the trolley problem. Currently, reducing the uncertainty with which obstacles are detected and identified from noisy input signals is a much more pressing issue.

However, human drivers are better at object identification, so may be sufficiently advanced to address the trolley problem should it arise in piloting a vehicle. And this is really an important point in thinking philosophically about how AIs should act: we already have intelligent actors, and already think about how they act.

Utilitarianism

Indeed, we already have artificial intelligences, including society-level AIs. A government bureaucracy; an enterprise; a policy document; all of these are abstractions of human intelligence behind machines, rules, and assumptions of how to work in given contexts. Any time that someone relies on external input to make a decision, whether that input is a checklist, a company handbook, a ready reckoner tool or a complex software-built decision support tool, you could replace that tool with an AI system with no change in the ethics of the situation.

Anywhere you could consider an AI making choices that are suboptimal to the welfare of the affected people, you could imagine those choices being suggested by the checklist or ready reckoner. None of this is to minimise the impact or applicability of AI ethics, quite the contrary: if we are only considering these problems now because we think AI has become useful, we are late to the party. Very late: what was the Soviet Gosplan other than a big decision support system trying to optimise a society along utilitarian lines?

The early sections of this paper summarise earlier proofs that it is impossible to define an objective utility function that will optimise for multiple, conflicting output dimensions. Not because it is hard to discover or compute the function, but because an acceptable trade-off may not exist. The proofs take this form:

  • imagine a guess at a solution, s.
  • there is a rule r1 which, applied to s, yields a better solution, s1.
  • there is a rule r2 which, applied to s1, yields a better solution, s2.
  • and so on, until there is a rule rN which, applied to s(N-1), yields a better solution: the original s.

In other words, the different states that represent solutions cannot be “totally ordered” from “best” to “worst”; a cycle exists such that the question “which is the best” is paradoxical. While that has clear impact on AIs that use objective utility functions to choose solutions to their inputs, the wider implication is that no utilitarian system can objectively choose the “best” arrangement for welfare.
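
The cycle can be made concrete with a small sketch. The three solution names and the “is improved on by” pairs below are invented for illustration and are not taken from Eckersley’s paper; the only point is that when every solution is beaten by some other one, asking for “the best” comes back empty.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PreferenceCycle {
    // Hypothetical relation: each key is improved on by its value.
    // s -> s1 -> s2 -> s is the cycle described in the proofs.
    static final Map<String, String> BEATEN_BY = Map.of(
            "s", "s1",
            "s1", "s2",
            "s2", "s");

    // A solution counts as "best" only if no rule yields a better one.
    static Optional<String> best(List<String> solutions) {
        return solutions.stream()
                .filter(candidate -> !BEATEN_BY.containsKey(candidate))
                .findFirst();
    }

    public static void main(String[] args) {
        // Every candidate is beaten by another, so no best solution exists.
        System.out.println(best(List.of("s", "s1", "s2")).isPresent()); // prints false
    }
}
```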

If you’re building the Multivac from Isaac Asimov’s Franchise, then I’m going to have to disappoint you now: it’s impossible. But so is having a room full of people with slide rules and 383 different folders containing next year’s production targets. It’s not building an AI to optimise for the utility function that cannot be done; it’s finding a maximum on the utility function.

Finding a Workaround

A few techniques for addressing this problem are discussed and rejected: we could ignore the problem on the basis that our AIs are not currently important enough to have huge impact on social welfare. But AI is being used in welfare systems, in criminal justice systems, in financial management. And even where AI is not used, corporations and bureaucracies are already being applied, and these will have their decision support tools, which will eventually include AI.

Similarly, simplifying the problem is possible, but undesirable. You could define some kind of moral exchange rate to reduce the dimensionality of your problem: this much wealth in retirement is worth this much poverty today; this much social equality can be traded for this much value of statistical life. Or you could accept one or more of the undesirable properties of a solution as axiomatic; we can solve for inequality if we accept poverty, perhaps. Neither of these is ethically unambiguous.

Embracing Uncertainty

Ultimately, Eckersley concludes that the only feasible approach is to relax the requirement for objective, total ordering of the possible outcomes. One way is to introduce partial ordering of at least two of the comparison dimensions: to go from saying that a solution is { better than, worse than, the same as } another along some axis to saying it is { better than, worse than, the same as, incomparable to } the other solution. While this breaks the deadlock it’s difficult to work with, and domain-specific interpretations of partial ordering will be needed. And now we’re back at the trolley problem: there are multiple outcomes to choose from, we don’t know (because it’s impossible to say) which is better, and yet a decision must be taken. Which?
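
As a rough illustration of a four-valued comparison, here is a sketch that uses Pareto-style dominance over two axes. That is my own stand-in, not the construction from the paper, and the Outcome type and its “equality” and “prosperity” scores are hypothetical. Two outcomes that each win on a different axis come out as incomparable rather than being forced into an order.

```java
class PartialOrderSketch {
    enum Comparison { BETTER, WORSE, SAME, INCOMPARABLE }

    // Hypothetical outcome scored on two axes; higher is better on each.
    static final class Outcome {
        final int equality;
        final int prosperity;

        Outcome(int equality, int prosperity) {
            this.equality = equality;
            this.prosperity = prosperity;
        }
    }

    // a is BETTER than b only when it is ahead somewhere and behind nowhere.
    // If each outcome is ahead on some axis, we decline to rank them.
    static Comparison compare(Outcome a, Outcome b) {
        boolean aAhead = a.equality > b.equality || a.prosperity > b.prosperity;
        boolean bAhead = b.equality > a.equality || b.prosperity > a.prosperity;
        if (aAhead && bAhead) return Comparison.INCOMPARABLE;
        if (aAhead) return Comparison.BETTER;
        if (bAhead) return Comparison.WORSE;
        return Comparison.SAME;
    }

    public static void main(String[] args) {
        System.out.println(compare(new Outcome(3, 1), new Outcome(1, 3))); // prints INCOMPARABLE
    }
}
```

A system built on such a comparison still has to act when the answer is INCOMPARABLE, which is exactly the domain-specific interpretation the text says will be needed.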

A second model replaces the definite ordering of solutions with probabilistic ordering. Rather than the impossible statement “outcome B is better than outcome A”, the pair of statements “outcome B has probability p of being better than outcome A” and “outcome A has probability (1-p) of being better than outcome B” are given. Now a system can find the outcome with the highest likelihood of being best, or pick from some with sufficient likelihood of being best, even though it is still impossible to find the best. There will always be some chance of violating a constraint, but those should at least be free of systematic bias.
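
A minimal sketch of the two selection strategies mentioned here, assuming that some earlier step has already produced, for each outcome, an estimated probability that it is the best one. The outcome names, the probabilities and the threshold are all made up for illustration; how such probabilities would be obtained is the hard part, and is not shown.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ProbabilisticChoice {
    // Pick the single outcome with the highest estimated chance of being best.
    static String mostLikelyBest(Map<String, Double> probabilityOfBeingBest) {
        return Collections.max(
                probabilityOfBeingBest.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    // Alternatively, keep every outcome whose chance of being best is
    // "sufficient", leaving the final choice to another policy or a human.
    static Set<String> goodEnough(Map<String, Double> probabilityOfBeingBest, double threshold) {
        return probabilityOfBeingBest.entrySet().stream()
                .filter(entry -> entry.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Hypothetical estimates, for illustration only.
        Map<String, Double> estimates = Map.of("A", 0.2, "B", 0.5, "C", 0.3);
        System.out.println(mostLikelyBest(estimates));
        System.out.println(goodEnough(estimates, 0.3));
    }
}
```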

Implications

If your AI strategy is based on the idea that your domain can be “solved numerically”, by finding some utility function or by training an AI to approximate the utility function, then you need to ask yourself whether this is true without introducing some biases or unethical choices into the solutions. Consider whether your system should find what is “best” and do it, or find actions that are “good enough” and either choose between them, or present them for human supervision.

The App that Wasn’t (Yet)

One of the early goals written into the mission statement of the Labrary was an eponymous app for organising research notes. I’ve used Mekentosj Springer Readcube Papers for years, and encountered Mendeley and others, and found that they were all more focussed on the minutiae of reference management, rather than the activity of studying and learning from the material you’re collecting in your library. Clearly those are successful apps that have an audience, but is there space for something more lightweight?

I talked to a few people, and the answer was yes. There were people in software engineering, data science, and physics who identified as “light” consumers of academic literature, people who read the primary literature to learn from and find techniques to apply, but do not need or even want the full cognitive weight of bibliographic reference management. They (well, “we”, I wanted it too) wanted to make notes while they were reading papers, and find those notes again. We wanted to keep tags on interesting references to follow up. We wanted to identify the questions we had, and whether they were answered. And we wanted to have enough information—but not more—to help us find the original article again.

My first prototype was as simple as I could make it. There’s a picture below: it’s a ring binder, with topic dividers, and paper notes (at least one separate sheet for each article) which quickly converged on a pro forma layout as shown.

An early prototype of the Labrary app.

I liked it; in fact, I quickly got to a point where I wouldn’t read an article unless I had access to a pad and pen to add a page to my binder. People I showed it to liked it, too. So this seemed like a good time to crack open the software making tools!

The first software prototype was put together in spare time using GNUstep and Renaissance, and evinced two problems:

  • The UI design led back down the route of “bibliopedantry”, forcing students to put more effort into getting the citation details correct than they wanted to.
  • Renaissance lacked support for some Cocoa controls it would have been helpful to use, so there was a choice to be made to invest more into improving Renaissance or finding a different UI layout tool.
A screenshot of the ill-fated “Library” window in Labrary’s GNUstep prototype.

This experience made me look for other inspiration for ways to organise the user interface so that students get the experience of taking notes, not of fiddling with citation data. I considered writing Labrary as a plugin for the free Calibre e-reader app, so that Labrary could focus on being about study notes and Calibre could focus on being about library management. But ultimately I found the tool that solved the problem best: Apple’s Finder.

The Labrary pro forma note as Finder stationery.

I’ve recreated the pro forma note from the binder as a text file, and set the “Stationery Pad” flag in the Finder. When I open this file, Finder creates a duplicate and opens that instead, in my editor of choice: ready to become a new study note! I put this in a folder with a Zim index file, so I can get the “shoebox” view of all the notes by opening the folder in Zim. It also does full-content searching, so the goal of finding a student’s notes again is achieved.

Zim open on my research notes folder.

I’m glad I created the lo-fi paper prototype. It let me understand what I was trying to achieve, and show very quickly that my software implementation was going in the wrong direction. And I’m always happy to be the person to say “do we need to write this, or can it be built out of other bits?”, as I explored for this project with Zim and Calibre.

Research Watch, and Java by Contract

I introduced Java by Contract, a tool for building design-by-contract style invariants, preconditions and postconditions in Java using annotations. It’s MIT licensed, contributions are welcome, and I hope this helps lots of people to introduce stronger correctness checking into their software. And book office hours if you’d like me to help you with that.

Java by Contract came about as part of Research Watch, a new blog series over at The Labrary where I talk about academic work and how we “practitioners” (i.e. people who computer who aren’t in academia) can make use of the results. The first post considers a report of Teaching Quality Object-Oriented Programming to computer science students.

By the way, I will be speaking at Coventry Tech Meetup on 10th January on the topic “Beyond TDD”, and Java by Contract will make an appearance there.

Long-time SICPers readers will remember Programming Literate, a Tumblr discussing results from empirical software engineering. And if you don’t, you’ll probably remember your feeds exploding on July 15, 2013 when I imported all of the posts from there to here. You can think of Research Watch as a reboot of Programming Literate. There’ll be papers new and vintage, empirical and opinionated, on a range of computing topics. If that sounds interesting, subscribe to the Labrary’s RSS feed.

Teaching Quality Object-Oriented Programming

About this paper

Teaching Quality OOP by Yishai A. Feldman, published March 2005 (see the link for full citation).

Notes

One of the points made in my book Object-Oriented Programming the Easy Way is that objects should be specified by their interfaces through contracts, which say what messages the objects respond to, how you use them, and what happens as a result.

While it is up to any one object to decide how it responds to messages, we need to know whether that object represents a useful addition to our system. In other words, we want to know what the object will do in response to what messages. (Page 61)

The book Structure and Interpretation of Computer Programs says the same thing. An abstract data type has a collection of things that can be done, and stuff that happens when you do it.

In general, we can think of data as defined by some collection of selectors and constructors, together with specified conditions that these procedures must fulfill in order to be a valid representation. (§2.1.3)

This seems evident. Knowing that I have an int count(), a boolean contains(Object o) and a void add(Object o) method is insufficient; I need to know how they interact before I can use them. For an array, given:

int x = a.count();
a.add(anObject);
int y = a.count();

you would always expect y - x == 1. For a set, it would depend on the content of the collection before addition; it could be 1 or 0. Knowing what methods are called is insufficient to know what type I am dealing with.
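
The same experiment can be run against the standard Java collections. This is only a sketch: java.util spells the query size() rather than count(), and the growth helper and the element values are mine, not from the paper.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class CountAfterAdd {
    // Returns y - x from the text: how much the collection grows on add.
    static int growth(Collection<Object> collection, Object item) {
        int x = collection.size();
        collection.add(item);
        int y = collection.size();
        return y - x;
    }

    public static void main(String[] args) {
        Collection<Object> list = new ArrayList<>();
        Collection<Object> set = new HashSet<>();
        list.add("tribble");
        set.add("tribble");

        System.out.println(growth(list, "tribble")); // prints 1: a list always grows
        System.out.println(growth(set, "tribble"));  // prints 0: an equal element is already present
        System.out.println(growth(set, "gagh"));     // prints 1: a new element
    }
}
```

Both objects are used through the same Collection interface, so the method names alone cannot tell you which behaviour you will get.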

Why is it, then, that programming languages give you types for expressing the methods on an object, but not what they do or how they relate? Why does a Java interface, a Swift protocol, or a C++ abstract class only have the part of the contract related to names, not behaviours?

Famously, the Eiffel language addresses this, and it’s here that we come into contact with the paper that is the topic of this post. Eiffel, and its underlying theory, is well-described in the book Object-Oriented Software Construction by Bertrand Meyer, CTO of Eiffel Software and researcher at ETH Zurich. Feldman wanted to teach his students the theory of “Quality Software” based on two principles that are well-described in OOSC:

  • Design by Contract, because it’s more approachable and usable than formal methods, and more useful than testing; and
  • Command-Query Separation, because it isolates state changes, making it easier to draw conclusions about the behaviour of a software system.

But he didn’t want to teach them Eiffel, because:

it might leave the students with the mistaken and harmful impression that quality programming is confined to one language. (§3)

Additionally, OOSC does not include any exercises, so is not useful as a teaching support book. I will note here that Meyer has also written A Touch of Class, which does have supporting teaching material including exercises, but still uses the Eiffel language and therefore only solves half of the author’s problems.

Feldman taught the theory from OOSC using Eiffel notation, and encouraged students to complete exercises that were in variants of the Java programming language. Tools available at the time read the contract out of special additions to the class’s Javadoc comments, and modified the source code to include assertions at the relevant points in execution.

This led me to wonder about modern Java syntax, and whether it’s possible to make a similar tool using Java’s annotation features so that programmers don’t have to worry whether a source conversion tool has introduced errors, or trace changes to the source code when working back from a failure report to the broken source.

The answer is yes, and so now the Labrary can offer Java by Contract as a tool for designing Java types by contract. It encodes the parts of the contract as names of methods to invoke that return boolean, failing if the answer is false. A rewrite of the Map interface from page 13 of the paper is given below.

public interface Map {
  /**
   * Does the key k appear in the map?
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "inMapIffInKeys")
  boolean has(Object k);

  default boolean nonNullK(Object k) {
    return (k != null);
  }
  default boolean inMapIffInKeys(Object k, Boolean result) {
    return (result == this.keys().has(k));
  }

  /**
   * The value of the map at key k, null if undefined.
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "nonNullValueIffHasKey")
  Object item(Object k);

  default boolean nonNullValueIffHasKey(Object k, Object ret) {
    return ((ret != null) == this.has(k));
  }

  /**
   * Associate key k with value v.
   */
  @Precondition(name = "nonNullKeyAndValue")
  @Postcondition(name = "hasK")
  @Postcondition(name = "itemForKIsNowV")
  void put(Object k, Object v);

  default boolean nonNullKeyAndValue(Object k, Object v) {
    return ((k != null) && (v != null));
  }
  default boolean hasK(Object k, Object v, Void result) {
    return this.has(k);
  }
  default boolean itemForKIsNowV(Object k, Object v, Void result) {
    return (this.item(k) == v);
  }

  /**
   * Remove key k and associated value from map.
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "doesNotHaveK")
  void prune(Object k);

  default boolean doesNotHaveK(Object k, Void result) {
    return !this.has(k);
  }

  /**
   * The set of all keys in the map.
   */
  @Postcondition(name = "nonNullReturn")
  ReadOnlySet keys();

  default boolean nonNullReturn(ReadOnlySet result) {
    return (result != null);
  }
}

This approach lets us write contract conditions that have full access to the internal state of the objects they are implemented on, so internal invariants can be verified in addition to the properties explored through the interface (by specifying them on the implementing class, not on the declaring interface). As a few pages of the paper are dedicated to the author’s (and the class’s) experience with various design by contract tools and their shortcomings, it’s valuable to explore this space and come up with better approaches. I hope that Java by Contract is a useful addition.

Feldman additionally notes that the Java class library, including the Collections library, violates the Command-Query Separation principle. He even notes that the language itself does. The construct:

x++;

is both a command (increment x) and a query (what value did x previously have?) in a single expression. The early exercises in his class instruct students to write CQS-satisfying library objects (hence the Map example above), which the subsequent exercises build on.
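
For contrast, here is a minimal sketch of what a CQS-satisfying alternative to x++ might look like. The Counter class and its method names are hypothetical and are not taken from Feldman’s exercises.

```java
class Counter {
    private int value = 0;

    // Command: changes state and returns nothing.
    void increment() {
        value += 1;
    }

    // Query: reports state and changes nothing, so it can be called
    // any number of times without affecting later answers.
    int value() {
        return value;
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        int before = counter.value();
        counter.increment();
        System.out.println(before + " -> " + counter.value()); // prints 0 -> 1
    }
}
```

The caller who wants the old value asks for it before issuing the command, rather than getting both in one expression.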

Feldman left academia the year after this paper was written, and is now at IBM Research. That means the trail of development of this class has run out, and we cannot say how the teaching would have adapted to the subsequent decade of progress in Java.

Cleaner Code

Readers of OOP the easy way will be familiar with the distinction between object-oriented programming and procedural programming. You will have read, in that book, about how what we claim is OOP in the sentence “OOP has failed” is actually procedural programming: imperative code that you could write in Pascal or C, with the word “class” used to introduce modularity.

Here’s an example of procedural-masquerading-as-OOP, from Robert C. Martin’s blog post FP vs. OO List Processing:

void updateHits(World world){
  nextShot:
  for (shot : world.shots) {
    for (klingon : world.klingons) {
      if (distance(shot, klingon) <= type.proximity) {
        world.shots.remove(shot);
        world.explosions.add(new Explosion(shot));
        klingon.hits.add(new Hit(shot));
        break nextShot;
      }
    }
  }
}

The first clue that this is a procedure, not a method, is that it isn’t attached to an object. The first change on the road to object-orientation is to make this a method. Its parameter is an instance of World, so maybe it wants to live there.

public class World {
  //...

  public void updateHits(){
    nextShot:
    for (Shot shot : this.shots) {
      for (Klingon klingon : this.klingons) {
        if (distance(shot, klingon) <= type.getProximity()) {
          this.shots.remove(shot);
          this.explosions.add(new Explosion(shot));
          klingon.hits.add(new Hit(shot));
          break nextShot;
        }
      }
    }
  }
}

The next non-object-oriented feature is this free distance procedure floating about in the global namespace. Let’s give the Shot the responsibility of knowing how its proximity fuze works, and the World the knowledge of where the Klingons are.

public class World {
  //...

  private Set<Klingon> klingonsWithin(Region influence) {
    //...
  }

  public void updateHits(){
    for (Shot shot : this.shots) {
      for (Klingon klingon : this.klingonsWithin(shot.getProximity())) {
        this.shots.remove(shot);
        this.explosions.add(new Explosion(shot));
        klingon.hits.add(new Hit(shot));
      }
    }
  }
}

Cool, we’ve got rid of that spaghetti code label (“That’s the first time I’ve ever been tempted to use one of those” says Martin). Incidentally, we’ve also turned “loop over all shots and all Klingons” into “loop over all shots and nearby Klingons”. If the World maintains an index of the Klingons by location using a k-dimensional tree, then searching for nearby Klingons is typically logarithmic in the number of Klingons, not linear.

By the way, was it weird that a Shot would hit whichever Klingon we found first near it, then disappear, without damaging other Klingons? That’s not how Explosions work, I don’t think. As it stands, we now have a related problem: a Shot will disappear n times if it hits n Klingons. I’ll leave that as it is, carry on tidying up, and make a note to ask someone what should really happen when we’ve discovered the correct abstractions. We may want to make removing a Shot an idempotent operation, so that we can damage multiple Klingons and only end up with a Shot being removed once.

There’s a Law of Demeter violation, in that the World knows how a Klingon copes with being hit. This unreasonably couples the implementations of these two classes, so let’s make it our responsibility to tell the Klingon that it was hit.

public class World {
  //...

  private Set<Klingon> klingonsWithin(Region influence) {
    //...
  }

  public void updateHits(){
    for (Shot shot : this.shots) {
      for (Klingon klingon : this.klingonsWithin(shot.getProximity())) {
        this.shots.remove(shot);
        this.explosions.add(new Explosion(shot));
        klingon.hit(shot);
      }
    }
  }
}

No, better idea! Let’s make the Shot hit the Klingon. Also, make the Shot responsible for knowing whether it disappeared (how many episodes of Star Trek are there where photon torpedoes get stuck in the hull of a ship?), and whether/how it explodes. Now we will be in a position to deal with the question we had earlier, because we can ask it in the domain language: “when a Shot might hit multiple Klingons, what happens?”. But I have a new question: does a Shot hit a Klingon, or does a Shot explode and the Explosion hit the Klingon? I hope this starship has a business analyst among its complement!

We end up with this World:

public class World {
  //...

  public void updateHits(){
    for (Shot shot : this.shots) {
      for (Klingon klingon : this.klingonsWithin(shot.getProximity())) {
        shot.hit(klingon);
      }
    }
  }
}

But didn’t I say that the shot understood the workings of its proximity fuze? Maybe it should search the World for nearby targets.

public class World {
  //...

  public void updateHits(){
    for (Shot shot : this.shots) {
      shot.hitNearbyTargets();
    }
  }
}
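
The post leaves Shot and Klingon unwritten, so here is one possible sketch of the collaborators behind that final updateHits(). Everything in it is an assumption: positions are one-dimensional integers to keep it short, klingonsWithin takes a centre and a radius where the listings above pass a Region, a Shot damages every Klingon in range (only one of the possible answers to the question raised earlier), and the World clears out spent shots after the loop so that the list is not modified while it is being iterated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborators; none of these bodies come from the original post.
class Klingon {
    final int position;
    int hitsTaken = 0;

    Klingon(int position) {
        this.position = position;
    }

    void hitBy(Shot shot) {
        hitsTaken++;
    }
}

class Shot {
    private final World world;
    private final int position;
    private final int proximity;
    private boolean spent = false;

    Shot(World world, int position, int proximity) {
        this.world = world;
        this.position = position;
        this.proximity = proximity;
    }

    // The Shot owns its proximity fuze: it asks the World who is in range,
    // damages each of them, and records that it has gone off.
    void hitNearbyTargets() {
        for (Klingon klingon : world.klingonsWithin(position, proximity)) {
            klingon.hitBy(this);
            spent = true; // idempotent: setting it again for a second Klingon is harmless
        }
    }

    boolean isSpent() {
        return spent;
    }
}

class World {
    final List<Shot> shots = new ArrayList<>();
    final List<Klingon> klingons = new ArrayList<>();

    // Linear scan for brevity; a spatial index could replace it.
    List<Klingon> klingonsWithin(int centre, int radius) {
        List<Klingon> nearby = new ArrayList<>();
        for (Klingon klingon : klingons) {
            if (Math.abs(klingon.position - centre) <= radius) {
                nearby.add(klingon);
            }
        }
        return nearby;
    }

    void updateHits() {
        for (Shot shot : shots) {
            shot.hitNearbyTargets();
        }
        // Remove spent shots only once the loop has finished.
        shots.removeIf(Shot::isSpent);
    }
}
```

With two Klingons near one shot and a second shot far from everything, both nearby Klingons take one hit each and only the first shot is removed.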

As described in the book, OOP is not about adding the word “class” to procedural code. It’s a different way of working, in which you think about the entities you need to model to solve your problem, and give them agency. Obviously the idea of “clean code” is subjective, so I leave it to you to decide whether the end state of this method is “cleaner” than the initial state. I’m happy with one fewer loop, no conditions, and no Demeter-breaking coupling. But I’m also happy that the “OO” example is now object-oriented. It’s now looking a lot less like enterprise software, and a lot more like Enterprise software.

Product teams: our products are not our products

Woah, too many products. Let me explain. No, it will take too long, let me summarise.

Sometimes, people running software organisations call their teams “product teams”, and organise them around particular “products”. I do not believe that this is a good idea. Because we typically aren’t making products, we’re solving problems.

The difference is that a product is “done”. If you have a “product team”, they probably have a “definition of done”, and then release software that has satisfied that definition. Even where that’s iterative and incremental, it leads to there being a “product”. The thing that’s live represents as much of the product as has been done.

The implications of there being a “product” that is partially done include optimising for getting more “done”. Particularly, we will prioritise adding new stuff (getting more “done”) over fixing old stuff (shuffling the deckchairs). We will target productish metrics, like number of daily actives and time spent.

Let me propose an alternative: we are not making products, we are solving problems. And, as much out of honesty as job preservation, let me assure you that the problems are very difficult to solve. They are problems in cybernetics, in other words in communication and control in a complex system. The system is composed of three identifiable, interacting subsystems:

  1. The people who had the problem;
  2. The people who are trying to solve the problem;
  3. The software created to present the current understanding of the solution.

In this formulation, we don’t want “amount of product” to be a goal, we want “sufficiency of solution” to be a goal. We accept that the software does not represent the part of the “product” that has been “done”. The software represents our best effort to date at modelling our understanding of the solution as we comprehend it to date.

We therefore accept that adding more stuff (extending the solution) is one approach we could consider, along with fixing old stuff (reflecting new understanding in our work). We accept that introducing the software can itself change the problem, and that more people using it isn’t necessarily a goal: maybe we’ve helped people to understand that they didn’t actually need that problem solved all along.

Now our goals can be more interesting than bushels of software shovelled onto the runtime furnace: they can be about sufficiency of the solution, empowerment of the people who had the problem, and improvements to their quality of life.

Mapping software engineering tools

Despite the theory that everything can be done in software (and of course, anything that can’t be done could in principle be approximated using numerical methods, or fudged using machine learning), software engineering itself, the business of writing software, seems to be full of tools that are de facto standards but are, nonetheless, only begrudgingly accepted by many teams. What’s going on? Why, if software is eating the world, hasn’t it yet found an appealing taste for the part of the world that makes software?

Let’s take a look at some examples. Jira is very popular among many people. I found a blog post literally called Why I Love Jira. And yet, other people say that Jira is an anti pattern, a sentiment that gets reasonable levels of community support.

Jenkins is almost certainly the (“market”, though it’s free) leader among continuous delivery tools, a position it has occupied since ousting Hudson, from which it was forked. Again, it’s possible to find people extolling the virtues and people hating on it.

Lastly, for some quantitative input, we can find that according to the Stack Overflow 2018 survey, Rust is the most loved language (78.9% of the respondents who use it want to keep using it), but the language most people use is JavaScript (69.8%). From this we draw the interesting conclusion that the most popular tool in the programming language realm is not, actually, the one that wins the popularity contest.

So, weird question, why does everybody do this to themselves? And then more specifically, why is your team doing it to yourselves, and what can you do about it?

My hypothesis is that all of these tools succeed because they are highly configurable. I mean, JavaScript is basically a configuration language for Chromium (don’t @ me) to solve/cause your problem. Jira’s workflows are ridiculously configurable, and if Jenkins doesn’t do what you want then you can find a plugin to do it, write a plugin to do it or make a Groovy script that will do it.

This appeals to the desire among software engineers to find generalisations. “Look,” we say, “Jenkins is popular, it can definitely be made to do what we want, so let’s start there and configure it to our needs”.

Let’s take the opposing view for the moment. I’m going to drop the programming language example of JS/Rust, because all programming languages are, roughly speaking, entirely interchangeable. The detail is in the roughness. The argument below still applies, but requires more exposition which will inevitably lead to dissatisfaction that I didn’t cover some weird case. So, for the moment, let’s look at other tools like Jira and Jenkins.

The exact opposing view is that our project is distinct, because it caters to the needs of our customers and their (or these days, probably our) environment, and is understood and worked on by our people with our processes, which is not true for any other project. So rather than pretend that some other tool fits our needs or can be bent into shape, why don’t we build our own?

And, for our examples, building such a tool doesn’t appear to be a big deal. Using the expansive software engineering term “just”, a CD tool is “just” a way to run each step in the deployment pipeline and tell someone when a step fails. A development-tracking tool is “just” a way to list the things the team is or could be working on.

This is more or less a standard “build or buy” question, with just one level of indirection: both building and buying are actually measured in terms of time. How long would it take the team to write a new CD tool, and to maintain it? How long would it take the team to configure Jenkins, and to maintain it?

The answer should be fairly easy to consider. Let’s look at the map:

We are at x, of course. We are a short way from the Path of Parsimony, the happy path along which the generic tools work out of the box. That distance is marked on the map.

Think about how you would measure that distance for your team. You would consider the expectations of the out-of-the-box tool. You would consider the expectations of your team, and of your project. You would look at how those expectations differ, and try to quantify the result.

This tells you something about the gap between what the tool provides by default and what you need, which will help you quantify the amount of customisation needed (the cost of building a spur out from the Path of Parsimony to x). You can then compare that with the cost of building a tool that supports your position directly (the cost of building your own path, running through x).

But the map also suggests another option: why don’t we move from x closer to the path, and make that distance smaller? Which of our distinct assumptions are incidental and can be abandoned, which are essential and need to be supported, and which are historical and could be revised? Is there a way to change the context so that adopting the popular tool is cheaper?

[Left out of the map but just as important is the related question: has somebody else already charted a different path, and how far are we from that? In other words, is there a different off-the-shelf product which needs less configuration than the one we’ve picked, so the total migration-plus-configuration cost is less than sticking where we are?]

My impression is that these questions tend to get asked once at the start of a project or initiative, then not again until the team is so far away from the Path of Parsimony that they are starting to get tangled and stung by the Weeds of Woe. Teams that change tooling such as their issue trackers or CD pipeline tend to do it once the existing way is already hurting too much, and the route back to the path no longer clear.

More speed, lower velocity

I frequently meet software teams who describe themselves as “high velocity” and even have graphs from Jira to prove it, yet their ability to ship great software, to delight their customers, or even to attract customers doesn’t meet their expectations. A little bit of sleuthing usually discovers the underlying problem.

Firstly, let’s take a look at that word, “velocity”. I, like Kevlin Henney, have a background in physics, and therefore I agree with him that velocity is a vector, and has a direction. But “agile” velocity measures only the amount of stuff done to the system over time, not the direction in which it takes the system. That story may be “5 points” when measured in terms of heft, but is that five points of increasing existing customer satisfaction? Five points of new capability that will be demoed at next month’s trade show? Five points of attractiveness to prospects in the sales funnel?

Or is it five points of making it harder for a flagship customer to get their work done? Five points of adding thirty-five points of technical debt work later? Five points of integrating the lead engineer’s pet technology?

All of these things look the same in this model: they all look like five points. And that means that for a “high-velocity” (but really low-velocity, high-speed) team, the natural inclination is to jump on it, get it done, and get those five points under their belt and onto the burndown chart. The faster they burn everything down, the better they look.
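To make the physics concrete, here is a small sketch of the difference between the two measures. Giving each story a direction in degrees is my own assumption for the sake of the example (0 means straight toward the goal, 180 means straight away from it); no tracker I know of records such a thing.

```python
import math

# (points, direction in degrees): the direction is a made-up stand-in for
# "where this story takes the product". 0 = toward the goal, 180 = away.
stories = [(5, 0), (5, 180), (5, 90), (5, 0)]


def speed(stories):
    """What the burndown chart sees: points done, direction ignored."""
    return sum(points for points, _ in stories)


def velocity(stories):
    """Vector sum of the stories, as (toward-goal, sideways) components."""
    x = sum(p * math.cos(math.radians(d)) for p, d in stories)
    y = sum(p * math.sin(math.radians(d)) for p, d in stories)
    return x, y


def progress_toward_goal(stories):
    """The component of velocity that actually points at the goal."""
    return velocity(stories)[0]
```

Four five-point stories give the same speed whichever way they point, but a story pointing away from the goal cancels one pointing toward it, and a sideways story contributes nothing to progress at all.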

Some of the presenting symptoms of a high-speed, low-velocity team are listed below. If you recognise these in your team, book yourself in for office hours and we’ll see if we can get you unstuck.

  • “The Business”: othering the rest of the company. The team believes that their responsibility is to build the thing that they were asked for, and “the business” needs to tell them what to build, and to sell it.
  • Work to rule: we build exactly what was asked for, no more, no less. If the tech debt is piling up it’s because “the business” (q.v.) doesn’t give us time to fix it. If we built the wrong thing it’s because “the business” put it at the top of the backlog. If we built the thing wrong it’s because the acceptance criteria weren’t made clear before we started.
  • Nearly done == done: look, we know our rolling average velocity is 20 bushels of software, and we only have 14 furlongs and two femtocandela of software to show at this demo. But look over here! These 12 lumens and 4 millitesla of software are in QA, which is nearly done, so we’ve actually been working really hard. The fact that you can’t use any of that stuff is unimportant.
  • Mini-waterfall: related to work to rule (q.v.), this is the requirement that everyone do their bit of the process in order, so that the software team can optimise for requirements in -> software out and get that sweet velocity up. We don’t want to be doing discovery in engineering, because that means uncertainty, uncertainty means rework, and rework means lower velocity.
  • Punitive estimation: we’re going to rename “ambiguity” to “risk”, and then punish our product owner for giving us risky stories by boosting their estimates to account for the “risk”. Such stories will never get scheduled, because we’ll never be asked to do that one risky thing when we can get ten straightforward things done in what we are saying is the same time.
  • Story per dev: as a team, our goal is to shovel as much software onto the runtime furnace as possible. Therefore we are going to fan out the tasks to every individual. We are each capable of wielding our own shovel, and very rarely do we accidentally hit each other in the face while shovelling.

Figurative Programming and Gloom: the [G]raphical [LOOM]

Donald Knuth is pretty cool. One of the books he wrote that I own and have actually read[*] is Literate Programming, in which he describes (among other things) weaving program text and documentation together in a single narrative.

Two of his books that I own and have sort of dipped into here and there are TeX: the Program, and METAFONT: the Program. These are literate programs, created from webs in which human text and computer text are interleaved to tell the story of what the program does.

Human text and computer text, but not images. If you want pictures, you have to carry them around separately. Even though we are highly visual organisms, and many of the programs we produce have significant graphical components, very few programming environments treat images as anything other than external files that can be looked at and maybe previewed. The only programming environment I know of that lets you include images in program source is TempleOS.

I decided to extend the idea of the Literate web to the realm of Figurative Programming. A gloom (graphical loom) web can contain human text, computer text, and image descriptions (e.g. graphviz, plantuml, GLE…) which get included in the human-readable document as figures.
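For a feel of what such a web involves, here is a minimal sketch of splitting one into its three kinds of chunk. It is not gloom’s implementation or syntax: it borrows noweb-style markers (`<<name>>=` opens a chunk, a lone `@` closes it), and treating chunk names ending in `.dot` as figures is a convention I have assumed purely for this example.

```python
import re

# noweb-style chunk header, e.g. "<<hello.py>>="
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")


def split_web(text):
    """Split a web into (kind, name, body) tuples.

    kind is "prose", "code" or "figure". The ".dot means figure" rule is
    an assumption made for this sketch.
    """
    chunks = []
    kind, name, body = "prose", None, []
    for line in text.splitlines():
        start = CHUNK_START.match(line)
        if start or line.strip() == "@":
            # A marker ends the current chunk; keep it if it has content.
            if body:
                chunks.append((kind, name, "\n".join(body)))
            body = []
            if start:
                name = start.group(1)
                kind = "figure" if name.endswith(".dot") else "code"
            else:
                kind, name = "prose", None
        else:
            body.append(line)
    if body:
        chunks.append((kind, name, "\n".join(body)))
    return chunks


WEB = """\
Greeting the reader is the whole job.
<<hello.py>>=
print("hello")
@
The flow is a single step.
<<flow.dot>>=
digraph { start -> greet }
@
"""

chunks = split_web(WEB)
```

From here, a tangle step would write out the code chunks, and a weave step would typeset the prose and render each figure chunk into the human-readable document.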

The result is gloom. It’s written in itself, so the easiest way to get started is with the Xcode project at gloomstrap which can extract the proper gloom sources from the gloom web. Alternatively, you can dive in and read the PDF it made about itself.

Because I built gloomstrap first, gloom is really a retelling of that program in a Figurative Programming web, rather than a program that was designed figuratively. Because of that, I don’t really have experience yet of trying to design a system in gloom. My observation was that the class hierarchy I came up with in building gloomstrap didn’t always lend itself to a linear storytelling for inclusion in a web. I expect that were I to have designed it in noweb rather than Xcode, I would have had a different hierarchy or even no classes at all.

Similarly, I didn’t try test-firsting in gloom, and nor did I port the tests that I did write into the web. Instinct tells me that it would be a faff, but I will try it and find out. I think richer expressions of program intention can only be a good thing, and if Figurative Programming is not the way in which that can be done, then at least we will find out something about what to do instead.

[*] Coming up in January’s De Programmatica Ipsum: The Art of _The Art of Computer Programming_, an article about a book that I have _definitely_ read _quite a few bits here and there_ of.

Two books

A member of a mailing list I’m on recently asked: what two books should be on every engineer’s bookshelf? Here’s my answer.

Many software engineers, the ones described toward the end of Code Complete 2, would benefit most from Donald Knuth’s The Art of Computer Programming and Computers and Typesetting. It is truly astounding that one man has contributed so comprehensively to the art of variable-height monitor configurations.

If, to misquote Bill Hicks, “you’ve got yourself a reader”, then my picks are coloured by the fact that I’ve been trying to rehabilitate Object-Oriented Design for the last few years, by re-introducing a couple of concepts that got put aside over the recent decades:

  1. Object orientation; and
  2. Design.

With that in mind, my two recommendations are the early material from that field that I think shows the biggest divergence in thinking. Readers should be asking themselves “are these two authors really writing about the same topic?”, “where is the user of the software system in this book?”, “who are the users of the software system in this book?”, and “do I really need to choose one or other of these models, why not both or bits of both?”

  1. “Object-Oriented Programming: an evolutionary approach” by Brad Cox (there is another edition with Andrew Novobilski as a co-author). Cox’s model is the npm/CPAN model: programmers make objects (“software ICs”), describe their characteristics in a data sheet, and publish them in a catalogue. Integrators choose likely-looking objects from the catalogue and assemble an application out of them.

  2. “Object-Oriented Software Construction” by Bertrand Meyer. Meyer’s model is the “software engineering” model: work out what the system should do, partition that into “classes” based on where the data should naturally live, and design and build those classes. In designing the classes, pay particular attention to the expectations governing how they communicate: the ma, as Alan Kay called the gaps between the objects.