Mature Optimization

This comment on why NetNewsWire is fast brings up one of the famous tropes of computer science:

The line between [performance considerations pervading software design] and premature optimization isn’t clearly defined.

If only someone had written a whole paper about premature optimization, we’d have a bit more information. …wait, they did! The idea that premature optimization is the root of all evil comes from Donald Knuth’s Structured Programming with go to Statements. Knuth attributes it to C.A.R. Hoare in The Errors of TeX, though Hoare denied that he had coined the phrase.

Anyway, the pithy phrase “premature optimization is the root of all evil”, which has come to be interpreted as “avoid optimizing anything until it is already running too slow”, actually appears in this context:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, [they] will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgements about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.

Indeed this whole subsection on efficiency opens with Knuth explaining that he does put a lot of effort into optimizing the critical parts of his code.

I now look with an extremely jaundiced eye at every operation in a critical inner loop, seeking to modify my program and data structure […] so that some of the operations can be eliminated. The reasons for this approach are that: a) it doesn’t take long, since the inner loop is short; b) the payoff is real; and c) I can then afford to be less efficient in the other parts of my programs, which therefore are more readable and more easily written and debugged. Tools are being developed to make this critical-loop identification job easy (see for example [Dan Ingalls, The execution time profile as a programming tool] and [E. H. Satterthwaite, Debugging tools for high level languages]).

So yes, optimize your code, but optimize the bits that benefit from optimization. NetNewsWire is a Mac application, and Apple’s own documentation on improving your app’s performance describes an iterative approach for finding underperforming characteristics (note: not “what is next to optimize”, but “what are users encountering that needs improvement”), making changes, and verifying that the changes led to an improvement:

Plan and implement performance improvements by approaching the problem scientifically:

  1. Gather information about the problems your users are seeing.
  2. Measure your app’s behavior to find the causes of the problems.
  3. Plan one change to improve the situation.
  4. Implement the change.
  5. Observe whether the app’s performance improves.

I doubt that this post will change the “any optimization is the root of all evil” narrative, because there isn’t a similarly trite aphorism for the “optimize the parts that need it” school of thought, but at least I’ve tried.

SICPers podcast episode 4

We’re back to Amiga-Smalltalk today, as the moment when it runs on a real Amiga inches closer. Listen here.

I think I’ve isolated all extraneous sound except the nearby motorway, which I can’t do much about. I hope the experience is better!

SICPers Podcast Episode 3

The latest episode of SICPers, in which I muse on what programming 1980s microcomputers taught me about reading code, is now live. Here’s the podcast RSS feed.

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.” (Edsger Dijkstra, “How do we tell truths that might hurt?”)

As always, your feedback and suggestions are most welcome.

SICPers podcast episode one

I made a podcast! Full show notes here due to the character limit at podbean.

  • Amiga-Smalltalk project on GitHub

  • Free books on Smalltalk: the three Addison-Wesley books “Smalltalk-80: The Interactive Programming Environment”, “Smalltalk-80: The Language and its Implementation” and “Smalltalk-80: Bits of History, Words of Advice” are mentioned in this podcast, and the second of those is “the blue book” at the centre of the episode.

  • AROS Research Operating System is the Amiga-compatible open source operating system. It can boot natively on i386 or m68k hardware, or run hosted in Linux, FreeBSD, macOS, and Windows.

Please let me know what you think! You can find me on twitter at @iwasleeg, and I gave out my email address in the podcast. You could also comment here!

Errata: I said the Amiga 1000 had 128kB of RAM but it had 256kB, sorry!

Stay on target…

I introduce the kind of customer who needs the Labrary’s advice with the following description:

Your software team was a sight to behold, when it started out. You very quickly got to an MVP, validated its fit with early successes, iterated on the user experience and added the missing features. You hired a few more developers to cover the demand.

Now, things are starting to feel slower. The team insists they’re still continuing apace, but you haven’t kept that initial excitement. Developers are grumbling about technical debt. The backlog keeps growing. Testers aren’t keeping up – despite automation. The initial customers aren’t getting the benefits they first expected, and new customers aren’t being won at the rate you’d like.

The problem is that the approach which hit it out of the park worked well in the early days, when you had a greenfield project and no existing code or customers to support. Now your customers expect new features and surprising, delightful interactions: but they also expect nothing to change, and certainly nothing to break. The desired qualities of your software have changed, and so the quality of your software must change.

Plenty of people, typically CTOs and heads of software development at growth-scale companies, identify with this description, so it’s worth digging deeper into how it comes about.

At the early stage, your company has a small team, a vague idea of what the product is, and no customers. Literally anything your engineering team can do will be valuable: it will either be a product that fits the market, or tell you where the market isn’t. Obviously there’s some hand-waving about being able to market and sell whatever it is that your engineering team build, but by and large anything you come up with is somehow useful. You are either defining a new market, in which case all work is market-leading, or entering an established market, in which case your direction is clear. It’s hard to make a wrong decision at this stage, but very easy to stick to one.

And while we all know the horror stories about shipping your prototype to production, it’s actually not a bad plan at this stage. You don’t know what will or won’t work, so getting something out there quickly is exactly the right thing to do. And your developers probably have a base standard of maturity even for prototype projects, so you’ll have version control, some form of testing infrastructure and CI, external libraries for data storage, it won’t be a complete wild west.

Things go, loosely speaking, in one of two directions here. If you fail to find the right customers and the right product, you’re out of money, thanks for playing. If you find the right product for the right people, then congratulations! You get more money, either through revenue or a funding round, and you grow the company. Maybe the programmer you were paying before becomes the CTO, maybe one or two of the contractors you worked with come on as perms, and a couple of new people come on board in return for an OK salary and the promise of the stock someday being worth something. One of those people is even a QA!

Of course, the cash injection (particularly if it came as a lump through funding that can be drawn down as necessary) gives you the headroom to do things properly. Technologies are chosen, an architecture is designed (usually just by connecting the technologies with arrows), and an attempt is made to build the new thing, support the old thing, and continue adding new features and differentiate in the market. A key customer demographic is sold the promise of the new system (it being exciting and more capable, at least that is what the roadmap says), takes delivery of the old system (it being ready), then takes up time asking for the new features. You either divert resources from the new system to the old to add the features there, or invent some unholy hybrid where your existing thing makes calls to the new thing for the new features with a load of data consistency glue binding the two together. We’ll call these customers “saps” for now. Also, whether you’ve caught up to your competitors or your competitors have caught up to you, you’re now having to maintain an edge.

Let’s take stock here. You have:

  1. Some saps, giving you actual money, on the old platform.
  2. Some hope that things will be easier once everything’s on the new platform.
  3. Pressure to stay ahead of/catch up to the competition in both places.

That’s more work! But it’s OK, you’ve got more people. But where you add to the old system (which pleases your saps) you take away from the new, so you tend to favour unintrusive patches rather than deeper changes there. Which makes it harder to understand, and harder to support, which is more work! OK, so hire more people! But now engineering costs more. OK, so sell the original thing (not the new thing, it isn’t ready yet) to a few more customers! But now there are more customers, demanding more features, and more support. OK, so hire more people!

Run through that cycle a few times and you end up in the place I described in the Labrary blurb. You’ve got a big team, filled with capable engineers, working hard, and delivering…not as much as anyone would like. The problem is that working on the software is pulling them away from working for the company. The other problem is that you’re measuring how much the software gets worked on.

On the tyranny of autoincrementing integer primary keys

In designing a relational database schema, many people will automatically create a column id integer primary key for every table, using their database’s automatic increment feature to assign a new value to each row they insert. My assertion is that this choice of primary key should be the last resort, not the first.

A database schema is a design artifact, describing the data we want to store and the relationships between records (rows) in those data. It is also meta-design, because the schema constrains us in designing the queries we use to work with the data. Using the same minimal-effort primary key type for every table therefore fails to communicate information about the structure and meaning of the data in each table, and imposes irrelevant features on the queries we design.

The name id that people give this autoincrementing integer field gives away its purpose: the primary key identifies a row in the table. The primary key for a table should ideally be the minimal subset of relevant information that uniquely identifies an individual record. If you have a single column, say name, with not null and unique constraints, that’s an indicator (though not a cast-iron guarantee) that this column may be the table’s primary key. Sometimes, the primary key can be a tuple of multiple columns. A glyph can be uniquely identified by the tuple (character, font, swash) for example (it can, regardless of whether this is how your particular favourite text system represents it, or whether you think that this is a weird way to store ligatures). The glyphs “(e, Times New Roman Regular 16pt, normal)” and “(ct, Irvin Demibold 24pt, fancy)” are more readily recognisable than the glyphs “146276” and “793651”, even if both are ways to refer to the same data. A music album is identified by the artist and the album name (he says, side-eyeing Led Zeppelin): “A Night at the Opera” is ambiguous while “(Blind Guardian, A Night at the Opera)” is definitely not “(Queen, A Night at the Opera)”.
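
To make the album example concrete, here’s a minimal sketch of a schema keyed on that tuple. The table and column names are invented for illustration, not taken from any real application:

create table album (artist varchar not null, title varchar not null, release_year integer, primary key(artist, title)); -- the key says an album is identified by who made it and what it is called
create table track (artist varchar not null, album_title varchar not null, position integer not null, name varchar not null, primary key(artist, album_title, position), foreign key(artist, album_title) references album(artist, title)); -- the foreign key carries the same readable information

A row in track now refers to “(Queen, A Night at the Opera)” rather than to an opaque integer.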

Use an integer identifier where there is no other way to uniquely identify rows in a table. Note: sometimes there is another, more meaningful way, even where that just means using somebody else’s unique identifier: different copies of the same book will have unique shelfmarks if they’re part of a library, for example. People in an organisation may have an employee number, or a single sign-on user name; though there may be privacy reasons not to use these.
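
As a sketch of borrowing somebody else’s identifier, assuming hypothetical tables that aren’t from any real system:

create table book_copy (shelfmark varchar primary key, isbn varchar not null, acquired date); -- the library already assigns each physical copy a unique shelfmark
create table employee (staff_number integer primary key, name varchar not null); -- somebody else's identifier, reused only where privacy considerations allow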

A side-effect of using meaningful data to identify rows in a database is that it can simplify your queries: where your foreign keys would otherwise be meaningless numbers, they now actually carry useful information. Here’s an example, from a real application, in which I’m sad to say I designed both the “before” and “after” schemata.

The app is a risk management tool. There are descriptions of risks (I’d like to believe that they all at least have a distinct description but I can’t be sure, so those will use integer id PKs), and for each risk there are people in certain roles who bring particular skills to bear on mitigating the risk. The same role can be applied to more than one risk, the same skill can be applied by more than one role, and one role may apply multiple skills, so there’s a three-way join to work out, for a given risk, what roles and skills are relevant.

The before schema:

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (id integer primary key, name varchar not null, unique(name)); -- ruh roh
create table skill (id integer primary key, name varchar not null, unique(name)); -- the same anti-pattern twice
create table risk_role_skill (id integer primary key, risk_id integer, role_id integer, skill_id integer, foreign key(risk_id) references risk(id), foreign key(role_id) references role(id), foreign key(skill_id) references skill(id));

In this application, we start by looking at a list of risks then inspect one to see what roles are relevant to mitigating it, and then what skills. So a valid question is: “given a risk, what roles are relevant to it?”

select distinct role.name from role inner join risk_role_skill on role.id = risk_role_skill.role_id where risk_role_skill.risk_id = ?;

But if we notice that the names of roles and skills are unique, then we can surmise that they are sufficient to identify a given role or skill. In fact, the only information we have about roles and skills is their names.

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table role (name varchar primary key); -- uhhh...
create table skill (name varchar primary key); -- this still looks weird...
create table risk_role_skill (id integer primary key, risk_id integer, role_name varchar, skill_name varchar, foreign key(risk_id) references risk(id), foreign key(role_name) references role(name), foreign key(skill_name) references skill(name));

Here’s the new query:

select distinct role_name from risk_role_skill where risk_id = ?;

We’ve removed the join completely!

Two remaining points:

  1. There’s literally no information carried in the role and skill tables now, other than their identifying names. Does that mean we need the tables at all? In this case no, but in general we need to think here. How are the names in the join table going to get populated otherwise? If there is a limited set of valid values to choose from, then keeping a table with the range of values and a foreign key constraint to that table may be a good way to express the intent that the column’s content be drawn from that range. As an example, a particular bookstore may have printed, ebook, and audiobook media, so could restrict the medium field in their stock table to one of those values (see the sketch after this list).
  2. Why does the risk_role_skill table have an identifier at all? It is a collection of associations between values, so a row’s content is that row’s identity.
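
Here’s what that bookstore example from point 1 might look like; the table names, column names, and set of media are all made up for illustration:

create table medium (name varchar primary key); -- exists only to enumerate the valid values
insert into medium (name) values ('printed'), ('ebook'), ('audiobook');
create table stock (isbn varchar not null, medium varchar not null, quantity integer not null, primary key(isbn, medium), foreign key(medium) references medium(name)); -- the foreign key expresses "medium must be one of the known media"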

Here’s the after schema:

create table risk (id integer primary key, description varchar not null, weight integer, severity integer, likelihood integer); -- many fields elided
create table risk_role_skill (risk_id integer, role varchar, skill varchar, foreign key(risk_id) references risk(id), primary key(risk_id, role, skill));

And the after query:

select distinct role from risk_role_skill where risk_id = ?;

Two fewer tables, no joins, altogether a much simpler database to understand and work with.
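
To see that in use, here’s a quick sketch with invented data (the risk description, roles, and skills are hypothetical):

insert into risk (id, description, weight, severity, likelihood) values (1, 'Backups left unencrypted', 3, 4, 2);
insert into risk_role_skill (risk_id, role, skill) values (1, 'Database administrator', 'Backup configuration'), (1, 'Security officer', 'Incident response');
select distinct role from risk_role_skill where risk_id = 1; -- the roles read straight off the join table, no joins required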

First, Consider no Harmful.

Yesterday, we observed that the goal of considering the go to statement harmful was so that a programmer could write a correct program and have done with it. We noticed that this is never how computering works: many programs are not even instantaneously correct because they represent an understanding of a domain captured at an earlier time, before the context was altered by both external changes and the introduction of the software itself.

Today, let’s look at the benefits of removing the go to statement. Dijkstra again:

My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

This makes sense! Our source code is a static model of our software system, which is itself (we hope) a model of a problem that somebody has along with tools to help with solving that problem. But our software system is a dynamic actor that absorbs, transforms, and emits data, reacting to and generating events in communication with other human and non-human actors. We need to ensure that the dynamic behaviour is evident in the static model, so that we can “reason about” the development of the system. How does Dijkstra’s removal of go to help us achieve that?

Let us now consider how we can characterize the progress of a process. (You may think about this question in a very concrete manner: suppose that a process, considered as a time succession of actions, is stopped after an arbitrary action, what data do we have to fix in order that we can redo the process until the very same point?) If the program text is a pure concatenation of, say, assignment statements (for the purpose of this discussion regarded as the descriptions of single actions) it is sufficient to point in the program text to a point between two successive action descriptions. (In the absence of go to statements I can permit myself the syntactic ambiguity in the last three words of the previous sentence: if we parse them as “successive (action descriptions)” we mean successive in text space; if we parse as “(successive action) descriptions” we mean successive in time.) Let us call such a pointer to a suitable place in the text a “textual index.”

[…]

Why do we need such independent coordinates? The reason is – and this seems to be inherent to sequential processes – that we can interpret the value of a variable only with respect to the progress of the process. If we wish to count the number, n say, of people in an initially empty room, we can achieve this by increasing n by one whenever we see someone entering the room. In the in-between moment that we have observed someone entering the room but have not yet performed the subsequent increase of n, its value equals the number of people in the room minus one!

So the value of go to-less programming is that I can start at the entry point, and track every change in the system between there and the point of interest. But I only need to do that once (well, once for each different condition, loop, procedure code path, etc.) and then I can write down “at this index, these things have happened”. Conversely, I can start with the known state at a known point, and run the program counter backwards, undoing the changes I observe (obviously encountering problems if any of those are assignments). With go to statements present, I cannot know the history of how the program counter came to be here, so I can’t make confident statements about the dynamic evolution of the system.

This isn’t the only way to ensure that we understand a software system’s dynamic behaviour, which is lucky because it’s not a particularly good one. In today’s parlance, it “doesn’t scale”. Imagine being given the bug report “I clicked in the timeline on a YouTube video during an advert and the comments disappeared”, and trying to build a view of the stateful evolution of the entire YouTube system (or even just the browser, if you like, and if it turns out you don’t need the rest of the information) between main() and the location of the program counter where the bug emerged. Even if we pretend that a browser isn’t multithreaded, you would not have a good time.

Another approach is to encapsulate parts of the program, so that the amount we need to comprehend in one go is smaller. When you do that, you don’t need to worry about where the global program counter is or how it got there. Donald Knuth demonstrated this in Structured Programming with go to Statements, and went on to say that removing all instances of go to is solving the wrong problem:

One thing we haven’t spelled out clearly, however, is what makes some go to’s bad and others acceptable. The reason is that we’ve really been directing our attention to the wrong issue, to the objective question of go to elimination instead of the important subjective question of program structure.

In the words of John Brown [here, Knuth cites an unpublished note], “The act of focusing our mightiest intellectual resources on the elusive goal of go to-less programs has helped us get our minds off all those really tough and possibly unresolvable problems and issues with which today’s professional programmer would otherwise have to grapple.”

Much has been written on structured programming, procedural programming, object-oriented programming, and functional programming, which all have the same goal: separate a program into “a thing which uses this little bit of software, according to its contract” and “the thing that you would like to believe implements this contract”. OOP and FP additionally make explicit the isolation of state changes, so that you don’t need to know the whole value of the computer’s memory to assert conformance to the contract. Instead, you just need to know the value of the little bit of memory in the fake standalone computer that runs this one object, or this one function (or indeed model the behaviour of the object or function without reference to computer details like memory).

Use or otherwise of go to statements in a thoughtfully-designed program (I admit that statement opens a can of worms) is orthogonal to understanding the behaviour of the program. Let me type part of an Array class based on a linked list directly into my blog editor:

Public Class Array(Of ElementType)
  ' Assumes a LinkedList class offering IsEmpty(), Next() and Element().
  Private entries As LinkedList(Of ElementType)

  Public Function Count() As Integer
    Dim list As LinkedList(Of ElementType) = entries
    Count = 0
  nonempty:
    ' Count a node on each pass until the list runs out.
    If list.IsEmpty() Then GoTo empty
    Count = Count + 1
    list = list.Next()
    GoTo nonempty
  empty:
    Exit Function
  End Function

  Public Function At(index As Integer) As ElementType
    Dim cursor As Integer = index
    Dim list As LinkedList(Of ElementType) = entries
  advance:
    ' Walk forward until the cursor reaches zero or the list ends.
    If list.IsEmpty() Then Err.Raise(9, , "Index out of bounds")
    If cursor = 0 Then Return list.Element()
    list = list.Next()
    cursor = cursor - 1
    GoTo advance
  End Function
End Class

While this code sample uses go to statements, I would suggest it’s possible to explore the assertion “objects of class Array satisfy the contract for an array” without too much difficulty. As such, the behaviour of the program anywhere that uses arrays is simplified by the assumption “Array behaves according to the contract”, and the behaviour anywhere else is simplified by ignoring the Array code entirely.

Whatever harm the go to statement caused, it was not as much harm as trying to define a “correct” program by understanding all of the ways in which the program counter arrived at the current instruction.