Episode 59: The NATO Software Engineering conferences, part 3

We’re closing in on the end of the 1968 conference report, in this section discussing software service, maintenance, and other “special topics”, including educating software engineers and whether it’s reasonable to pay for software at all. Along the way, we discover that there’s no silver bullet, 18 years before Fred Brooks told us; decide whether 1968 software needed more blockchain; and find the horrific truth behind beta testing.

This episode is supported by members of the Chiron Codex Patreon (use this gift link for your first month free), so please do join the community or hit the Ko-fi button to make a one-off donation.

Transcript

Hello, and welcome to episode 59 of the Structure and Interpretation of Computer Programmers podcast. I’m Graham Lee, and this episode is the third part of a mini-series discussing the 1968 and 1969 NATO Conferences on Software Engineering. It’s sponsored by the members of my Patreon, which can include you.

We’re up to sections 6 and 7 of the report, which cover software service (that is, the business of satisfying customers by delivering software, particularly maintenance) and special topics, which don’t fit into the subjects of the three workgroups. Service was one of the workgroups; the others were design and production, the topics of the previous episode of this podcast.

We join the service section with a subsection that has the provocative title, The Virtue of Realistic Goals. Essentially, nobody at the conference blames the programmers for failed projects. Either the customers or the users had unrealistic expectations, or the manufacturers made unrealistic claims about their system’s capabilities.

Klaus Samelson is particularly harsh in blaming the users, saying it’s their fault for accepting a system before they’ve satisfied themselves of its correctness. Brian Randell, who edited the conference report, agrees, saying, “the users are as much to blame for premature acceptance of systems as the manufacturers for premature release”. But what choice do customers have?

Indeed, even in this day and age, we only have a partial solution to this problem. There’s free software, where you can inspect the code, or get someone else to do it, and know it’s correct, then choose whether you pay the creators, and of course many people don’t. Or there’s shareware, where you can use the software and check it roughly works the way you want, or at least the trial features that you have access to do, and then pay to unlock the rest, which you haven’t been able to try.

Every other distribution mechanism or purchase model for software is either buy now or regret later, or buy now and hope that every back-end update keeps the bits you need working the way that you need them to work.

This, by the way, also came up in the report, where d’Agapeyeff says it’s generally a problem that we haven’t worked out how to make sure that any release of software is a strict subset of later releases. In other words, that newer versions can do more than previous versions, and do all of the existing things in the same way that the previous versions did.
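d’Agapeyeff’s requirement can be phrased as a check: every capability of an earlier release still exists in the later one and still behaves the same way. The sketch below is my own illustration, not anything from the report; it models a release as a hypothetical dictionary from capability names to functions, and the names `Release` and `is_compatible_successor` are made up. Comparing outputs on sample inputs can find a changed behaviour, but it can never prove that nothing changed.

```python
from typing import Callable, Dict, Iterable

# Hypothetical model: a release maps capability names to behaviours.
Release = Dict[str, Callable[[int], int]]


def is_compatible_successor(old: Release, new: Release, samples: Iterable[int]) -> bool:
    """Return True if `new` keeps every capability of `old` and, on the
    sample inputs, each kept capability gives the same result as before."""
    inputs = list(samples)
    for name, old_behaviour in old.items():
        if name not in new:
            return False  # a capability was removed
        new_behaviour = new[name]
        if any(old_behaviour(x) != new_behaviour(x) for x in inputs):
            return False  # an existing capability changed its behaviour
    return True


# Release 2 adds "square" and reimplements "double" without changing what it does.
release_1: Release = {"double": lambda x: 2 * x}
release_2: Release = {"double": lambda x: x + x, "square": lambda x: x * x}
print(is_compatible_successor(release_1, release_2, range(100)))  # True
print(is_compatible_successor(release_2, release_1, range(100)))  # False: "square" is gone
```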

I wrote a blog post back in 2018, which is linked in the show notes, about the way in which semantic versioning encodes the intention to introduce breaking changes between versions. I proposed meaningful versioning in which the smallest increment number, the z in x.y.z, applies to additional features, the middle increment, the y, applies to behaviour-preserving refactorings, and the largest increment, the x, to bug fixes. There’s no room for backwards incompatible changes in this scheme, unlike in semantic versioning. If you want to do that, you should release a different product.
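Here’s a minimal sketch of those meaningful versioning rules. The function name `bump` and the change labels are my own illustrative choices, and I’ve assumed that bumping one component leaves the others alone, because the description doesn’t say whether they reset. A backwards-incompatible change has no component to bump, so the sketch refuses it.

```python
def bump(version: str, change: str) -> str:
    """Increment an x.y.z version under meaningful versioning:
    z for added features, y for behaviour-preserving refactorings,
    x for bug fixes. Breaking changes are rejected outright."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "feature":
        z += 1
    elif change == "refactoring":
        y += 1
    elif change == "bugfix":
        x += 1
    elif change == "breaking":
        raise ValueError("backwards-incompatible change: release a different product")
    else:
        raise ValueError(f"unknown kind of change: {change!r}")
    return f"{x}.{y}.{z}"


print(bump("1.2.3", "feature"))      # 1.2.4
print(bump("1.2.3", "refactoring"))  # 1.3.3
print(bump("1.2.3", "bugfix"))       # 2.2.3
```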

This idea is related to the discussion of the open-closed principle that we had in episode 58. The whole software system should, according to d’Agapeyeff, be open to extension, adding capabilities in subsequent “issues” (the word he uses for releases of the software), and closed to modification: the bit you’ve already released ought to work well enough that you don’t need to change it. I should insert a placeholder call-out here to a subsequent episode of this podcast, which I haven’t planned or recorded yet, on Bertrand Meyer’s book Object-Oriented Software Construction, because the idea of open and closed is all over this 1968 conference.

Next up are discussions on the state of the initial release of software and the frequency and nature of subsequent releases. The general sentiment is that the initial release should work well, even if it doesn’t have all of the planned capabilities. Growing outward from a high-quality core, preferably by adopting a modular design, is better than releasing a poor-quality version of everything.

However, François Genuys points out that people need access to pre-release versions of the software for training. At first, I thought that this could refer to the interim systems that the conference discussed in the design and production sections, which we talked about in episode 58 of the podcast, where some of the components are real and the others are simulations. In this way, customers or support staff train with working systems, just not completely working ones. In the same way that the initial release is a high-quality subset of the total system behaviour, the pre-release training versions would be high-quality subsets of the initial system behaviour.

Then I remembered that Genuys was at IBM, and that the idea of alpha and beta tests supposedly comes from IBM, so I wondered about the timing. Wikipedia suggests that the terminology came from the 1950s, but it offers two citations, neither of which actually backs up that claim. Jeff Atwood, on his Coding Horror blog, corroborates the IBM origin of these terms, but he does so by citing the Wikipedia article. Everybody else either cites the Wikipedia article, or plagiarizes it, or plagiarizes Jeff’s post, or plagiarizes both of them. So I think we’re stuck here. Unless anyone who’s listening has a contemporary source (in which case please do send it in and let me know in the comments), we don’t actually know where the terms alpha and beta testing come from.

Not that it’s important. The idea that you have pre-release alpha or beta tests doesn’t mean that you let people use the versions that fail those tests, nor necessarily that your testing criteria at those times are any less stringent than those for the initial release or for subsequent system releases. Speaking of the subsequent system releases, there’s a bit of tension over how frequently these should appear. Ascher Opler sums up the tension well: more releases mean more churn, but they also get corrections into the hands of customers sooner.

Generally, people at the conference are in favour of fast corrections and infrequent major upheaval updates. This puts me in mind of my experience managing Debian systems, where it’s easy to accept the in-release updates, that is apt-get update and apt-get upgrade, without worrying that anything will break. And infrequently, you have to cross your fingers and do a dist-upgrade.

Or think of the times when I’ve managed Solaris systems: you take the stream of minor updates forever and never reinstall the major version of the operating environment. It also calls to mind Microsoft’s Patch Tuesday approach, releasing interim updates at predictable times so that administrators can prepare to deal with their installation.
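The predictability is easy to see in code: Patch Tuesday falls on the second Tuesday of each month, so administrators can work out the dates well in advance. A minimal sketch using Python’s standard library; the function name `patch_tuesday` is my own.

```python
import calendar
import datetime


def patch_tuesday(year: int, month: int) -> datetime.date:
    """Return the second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so calendar.TUESDAY is 1.
    days_to_first_tuesday = (calendar.TUESDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)


# The next few update days, known months ahead.
for month in range(1, 4):
    print(patch_tuesday(2024, month))
```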

This section on release frequency ends with an extract from Control Data Corporation’s H.R. Gillette on defining metrics for release quality. Here’s the quote.

“Below, I have written a copy of one of the paragraphs which has been put into a product objectives document. We struggled a great deal to define measurable objectives in the document, and this is an example. The numbers used do have relevance historically, and that is all that need be said about them. Finally, our objectives may not have been high enough in this particular area. We tried to push our luck while at the same time being realistic.

The total number of unique bugs reported for all releases in one year on ECS SCOPE will not be greater than the number given by the following formula: Number of bugs ≤ 500 − 45/(I + 10), where I is the number of installations using ECS SCOPE. 85% of the reported PSRs will be corrected within 30 days, and 50% of these will be corrected within 15 days. All PSRs will be corrected within 60 days.”

In that quote, a PSR is a bug report. What this is trying to get at is laudable. There won’t be many bugs, even if we have lots of different customers, and we’ll fix the bugs quickly. Unfortunately, measuring bugs reported is one of those meaningless metrics that’s easy to parody, and indeed Scott Adams covered this one in a Dilbert comic back in the 1990s, when the software team write themselves a minivan.
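Gillette’s objectives are mechanical enough to check automatically, which is part of why they’re so easy to game. This sketch makes two assumptions of mine: it reads the formula with the usual operator precedence, as 500 − 45/(I + 10), and it reads “50% of these” as half of the PSRs that were corrected within 30 days. The function names are hypothetical.

```python
def bug_budget(installations: int) -> float:
    """Maximum unique bugs per year, read as 500 - 45 / (I + 10)."""
    return 500 - 45 / (installations + 10)


def meets_psr_objectives(days_to_fix: list[int]) -> bool:
    """Check PSR turnaround: 85% fixed within 30 days, half of those within
    15 days, and every PSR within 60 days. Integer arithmetic avoids
    floating-point rounding at the percentage boundaries."""
    within_30 = [d for d in days_to_fix if d <= 30]
    within_15 = [d for d in within_30 if d <= 15]
    return (
        100 * len(within_30) >= 85 * len(days_to_fix)
        and 2 * len(within_15) >= len(within_30)
        and all(d <= 60 for d in days_to_fix)
    )


print(bug_budget(0))  # 495.5
print(meets_psr_objectives([5, 5, 5, 5, 10, 20, 20, 20, 20, 50]))  # True
```

Nothing in either check looks at how serious a bug is, so a stream of quickly fixed typos helps a vendor pass just as much as a fixed crash does.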

You game this metric by choosing quiescent customers who don’t report bugs, or by including obvious defects like typos that are quick to fix, so that you achieve your turnaround time goals, and customers don’t have time to fill out their PSR forms with more meaningful bug reports.

More recently, software engineers have finally read a 2001 article from Dr. Dobb’s Journal, and are shifting tests left, incorporating test design and implementation throughout the development process, and particularly from the beginning. This brings me on to the money quote from this section of the conference report, from Alick Glennie, the inventor of Autocode, an early family of programming languages and compilers, which may even have included the world’s first compiler, though that is a contested claim.

The quote is, “Software manufacturers should desist from using customers as their means of testing systems.”

Okay, me speaking again now. It’s better to see customer bug reports as a feedback mechanism than as a goal. You’ll always get them, as long as you have customers, and you have a reporting channel. But you don’t need to optimise for or control them. If you get a lot of bug reports for some module, that might indicate quality issues, or that it’s really popular. If you get not many bug reports for some module, that might indicate high quality, or that nobody uses it. And perhaps the reason that nobody uses it is because it’s too buggy.
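One way to make that ambiguity visible is to divide report counts by some measure of use before comparing modules. This is my own sketch, not something from the report: the module names and figures are made up, and “uses” stands in for whatever usage telemetry you happen to have. When there’s no recorded use at all, the rate is undefined, which is exactly the “nobody uses it” case.

```python
from typing import Dict, Optional


def reports_per_thousand_uses(
    reports: Dict[str, int], uses: Dict[str, int]
) -> Dict[str, Optional[float]]:
    """Normalise bug-report counts by usage. Returns None for modules with
    no recorded use, where a low report count tells you nothing."""
    rates: Dict[str, Optional[float]] = {}
    for module, count in reports.items():
        used = uses.get(module, 0)
        rates[module] = None if used == 0 else 1000 * count / used
    return rates


# Hypothetical figures: "printing" gets the most reports but is heavily used.
reports = {"printing": 40, "import": 8, "macros": 0}
uses = {"printing": 200_000, "import": 4_000, "macros": 0}
print(reports_per_thousand_uses(reports, uses))
```

Even then, the rate can’t tell you why a module goes unused; perhaps it’s too buggy to use.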

As with the rest of this section of the report, the important thing to worry about is how you get working software delivered to your customers. Back in 2001, some people even said that this should be our highest priority.

I originally skim-read the next sections, which were on replication and distribution of software, because they’re problems that don’t exist anymore. You put your software on the internet and distribute it for somewhere between zero and near zero cost, or you don’t distribute it at all and let people access it on your computer via their browsers. Gone are the days of the holographic Windows XP CD, with its licence key that’s long enough to identify each atom in the universe uniquely.

But even back in 1968, duplicating software was so much cheaper than duplicating hardware that Brian Randell recognised economics as a reason that manufacturers gave software quality less consideration than hardware quality. It’s that much more expensive to fix a hardware problem in the field.

And I have a vague recollection that at some point, Sun Microsystems swapped which drives were addressed by SCSI IDs zero and three between machine architectures, possibly between the Motorola 68000 and the SPARC. And while it was possible for customers to switch some jumpers around to get back to the original behaviour, Sun did offer to send field engineers to customers to make the changes for them. But I can’t find documentary evidence for them doing so, so maybe it was just a scary story told at sysadmin camp when I was a young initiate.

A particular problem that distributing software on physical media had was ensuring correctness. If a tape had even one bit flipped or removed, it was useless, and potentially dangerous if the software still ran even though it was incorrect. In principle, most software that’s distributed electronically now has all sorts of digital signatures and checksums. And in practice, that’s wisely hidden from the customer. So you kind of have to trust the vendor that everything checks out.
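The checksum half of this is simple enough to sketch. Assuming the vendor publishes a SHA-256 digest alongside the download (the function names here are my own), you hash what you received and compare. A single flipped bit produces a completely different digest. Note that this only detects corruption: proving who published the file needs a digital signature, which this sketch doesn’t cover.

```python
import hashlib
from typing import Iterable, Iterator


def file_chunks(path: str, size: int = 65536) -> Iterator[bytes]:
    """Read a file in fixed-size chunks so a large download needn't fit in memory."""
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


def matches_published_digest(chunks: Iterable[bytes], published_hex: str) -> bool:
    """Hash the received data and compare it with the vendor's published SHA-256 digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == published_hex.strip().lower()


# In-memory demonstration; for a real download you'd pass file_chunks(path).
published = hashlib.sha256(b"SCOPE release tape").hexdigest()
print(matches_published_digest([b"SCOPE release ", b"tape"], published))  # True
print(matches_published_digest([b"SCOPE release tapd"], published))  # False: one bit differs
```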

Moving on to maintenance. And again, the blame for getting it wrong falls squarely with the customer. Each maintenance depends upon the proper recording of programming errors by the user and upon the quality of such records, says Mr. H. Köhler of AEG Telefunken. We know this not to be reliable, and so now we use telemetry and automated diagnostics to get the information we need in the form we need. It turns out that each maintenance depends upon the proper recording of programming errors by the programmer, and upon the quality of such records, but also upon the programmers choosing to act upon the records.

And also to a large extent, it depends upon the error reports actually going to the right place. In the 1960s, this would have been a simple problem. The customer blamed IBM for any bug. IBM blamed the customer’s local applications or modifications. Eventually, one or other of them either fixed a problem or worked around it. IBM didn’t unbundle their software from their hardware until 1969. We’ll see this discussion play out in real time later in the episode.

And there were only a handful of ISVs in the 1960s, mostly concentrating on filling in gaps in hardware vendor offerings. For example, Ken Kolence, one of the attendees at the conference, founded a company called Boole & Babbage, which created profiling software.

Once you got more integrated software from more vendors on a computer, it became harder to decide who to blame for a problem. Is it Valve’s fault that Steam crashes on Windows or Microsoft’s? Or is it the fault of some third-party vendor because the customer installed a haxie that loads code into the app? In the 2000s, I worked for an antivirus software company, and we got all of the bug reports from all of the software. Either the customer blamed the antivirus company because they didn’t like our software, or the software vendor blamed the antivirus company because they saw our kernel extension was loaded and used that as an excuse not to investigate their own customers’ problems.

I therefore spent a lot of my time on support demonstrating that other people’s software crashed without the antivirus software installed in the same way that it crashed with it installed, so that I could send the crash report back to the company whose software crashed. In one instance, we had a report that Excel on Mac OS X crashed with our antivirus installed. Eventually, I, along with two people from Apple, a file systems engineer and a technical support professional, showed that Excel crashed on Mac OS X without antivirus installed because a rarely used code path in the spreadsheet application caused it to try to use an unimplemented feature in the HFS+ file system. Now, is it Microsoft’s problem that Excel does something that doesn’t work, or is it Apple’s problem that they exposed an unimplemented API? Thankfully, answering that question became somebody else’s problem about 20 years ago.

Just before we move on to the section on special topics, there are a couple of interesting talking points in the part of the report on acceptance testing. One is a suggestion from James Babcock, who ran a time-sharing services company, that we need software meters analogous to the present hardware meters so that our rental costs can be adjusted to allow for time lost through software errors as well as hardware errors. I would certainly welcome the rebates I’d get if some of the cloud computing services I use multiplied their subscription fee by their uptime ratio.
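Babcock’s software meter and my wished-for rebate come down to the same arithmetic: scale the fee by the fraction of the billing period during which the service was actually up. A minimal, hypothetical sketch; the function name and the figures are mine.

```python
def uptime_adjusted_fee(fee: float, period_minutes: int, downtime_minutes: int) -> float:
    """Scale a subscription fee by the uptime ratio for the billing period."""
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie within the billing period")
    uptime_ratio = (period_minutes - downtime_minutes) / period_minutes
    return round(fee * uptime_ratio, 2)


# A 30-day month has 43,200 minutes; 432 minutes down is 99% uptime.
print(uptime_adjusted_fee(100.00, 43_200, 432))  # 99.0
```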

I’m going to mention Brad Cox’s 1996 book, Superdistribution, here as well. He described a pay-per-use model for software pricing, and it’s a transitive pricing model: I pay for the application software I use as I use it, the application vendors pay for the library calls their software makes whenever it makes them, and so on. This model required special hardware to do the digital rights management in the 1990s, but I think now it could actually be a reasonable application of a smart contracts blockchain like Ethereum.

In the superdistribution model, I would automatically get money off during an outage, because I wouldn’t be able to use the software at all, so I wouldn’t be charged for anything.
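To make the transitive model concrete, here’s a toy in-memory meter. Everything in it is an assumption for illustration: the component names, the per-use prices in cents, and a fixed table of which components call which (assumed to contain no cycles). It isn’t Cox’s design and it isn’t a smart contract, just the bookkeeping: each use charges the caller and then makes the used component pay for the calls it makes in turn, and during an outage nothing runs, so nothing is charged.

```python
from collections import Counter
from typing import Dict, List


class UsageMeter:
    """Toy pay-per-use ledger with transitive charging."""

    def __init__(self, prices: Dict[str, int], calls: Dict[str, List[str]]):
        self.prices = prices  # component -> price per use, in cents
        self.calls = calls  # component -> components it calls on each use
        self.owed: Counter = Counter()  # (payer, payee) -> cents owed
        self.online = True

    def use(self, payer: str, component: str) -> bool:
        if not self.online:
            return False  # outage: the software can't be used, so no charge
        self.owed[(payer, component)] += self.prices[component]
        for dependency in self.calls.get(component, []):
            self.use(component, dependency)  # the vendor pays for its own calls
        return True


meter = UsageMeter(
    prices={"editor": 5, "spellchecker": 2, "dictionary": 1},
    calls={"editor": ["spellchecker"], "spellchecker": ["dictionary", "dictionary"]},
)
meter.use("graham", "editor")
meter.online = False
meter.use("graham", "editor")  # during the outage: no charge
print(dict(meter.owed))
```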

The other point on acceptance testing is a difference of opinion between Edsger Dijkstra, “testing is a very inefficient way of convincing oneself of the correctness of a program”, and Mr A. I. Llewellyn from the British Government’s Ministry of Technology: “Testing is one of the foundations of all scientific enterprise. In fact, it would be good to have independent tests of system function and performance published.”

This is the advert break. It starts now.

This episode is brought to you by me, Graham Lee. But really, by you. Chiron Codex is a community of people who are learning how to become better software engineers by adopting AI augmentation in a thoughtful way. We aren’t outsourcing our understanding to coding assistants like Claude or Codex, but becoming software engineering centaurs by using AI tools to improve our knowledge and the quality of our work. Join the community over on Patreon to find out about interaction patterns that improve your work with AI coding tools, running LLMs for software development locally, discussions of recent research in the field, and more. If you’re a software engineer who’s interested in the promise of AI tools, but sceptical about handing your skills over to the computer, this is the community for you.

Go to patreon.com slash Chiron Codex, that’s C-H-I-R-O-N-C-O-D-E-X, now for more information and to join. Use the gift link in the show notes to get your first month of insider access completely free. Alternatively, you can show your appreciation by donating at Ko-fi, that’s ko-fi.com slash Chiron Codex, K-O-F-I dot com. Direct support by my audience is the only revenue I get for my work as a software engineer and communicator, so your support really means a lot to me and makes it possible for me to produce this podcast. Thank you so much.

That was the advert break. It’s over now.

OK, we’re on to the section on special topics, which opens with software, the state of the art. We’ve actually already encountered most of the discussion points here, particularly the idea that most software works very well, that people are doing what they need to at a much lower cost than ever before, and it’s only at the edges of the field’s capability, both in terms of scale and novelty (for example, time sharing), where the problems arise. These were all in the executive overview at the start of the report that we discussed in episode 57.

This idea that people are doing what they need to do at a lower cost than ever before, though, is hard to square with Robert McClure’s assertion that “it seems almost automatic that software is never produced on time, never meets specification, and always exceeds its estimated cost.” He describes the causes as coming from “the refusal of industry to re-engineer last year’s model, from the inability of industry to allow personnel to accumulate applicable experience, and from emotional management.”

And that certainly all aligns with my own experience, having seen the phrase “rewritten from the ground up” used as if it’s a good thing, having experienced layoffs and limited career development prospects that limit retention, and management fads that come and go like high street clothing collections. However, this position is out of alignment with the rest of the conference report.

I think we see here the division that Thomas Haigh identified between the industry software engineers who think things are going well and would like them to go a bit better, and the academics who think that all of industry software is on a hiding to nothing until it adopts current academic practices.

Whatever the magnitude of the problem (somewhere between 1 and 100% of software projects running into difficulties), possible solutions were discussed. Ascher Opler suggested two approaches: either stealth mode, where the manufacturer doesn’t say anything about the capabilities of the system until they finish developing it, or loose promises mode, where they say what they’re doing but give a really long lead time and are honest about the uncertainty involved.

The third way that modern software engineers subsequently adopted is the lean startup approach, where the manufacturer says what it’s doing and then gets early feedback before it even starts building anything. It avoids both the risk of stealth mode, which is building something that nobody wants, and the risk of loose promises mode, which is getting pre-empted by someone who implements your plans faster than you do.

Going back to the theme of things that were subsequently rediscovered by somebody else but that already existed at the time of the NATO conference, Doug Ross is the only person in the conference report who actually refers to the contemporary state of affairs as a crisis. He warns against people who promise a breakthrough, a mere 18 years before Fred Brooks agreed that there is no silver bullet in software engineering.

A second special topic in section 7 of the report is education, and Alan Perlis sets out criteria to define a curriculum in software engineering education. It’s useful to note his point that this is distinct from computer science education, as, according to him, “most of the computer science programs are producing faculty for other computer science departments”. This is actually a deliberate choice that the curriculum committee at the ACM made, choosing to focus on computer science as an academic and mathematical pursuit, rather than on software as a practical industry. But Perlis is damning in his assessment. “You have to look hard in a computer science department to find anything that is dedicated to utility as a goal”. Ouch.

Dijkstra produces the money quote for section 7 in this discussion on education. “You are right in saying a lot of systems really work, that is our glimmer of hope. But there is a profound difference between observing that apparently some people are able to do something, and being able to teach that ability”.

We could imagine this as being his way of digging in when the crisis narrative got debunked. Yes, everybody can make working software, but maybe they’re doing it wrong anyway because they don’t do it the way that I like.

One of the questions that managed to keep op-ed writers employed for decades after this conference was the extent to which software engineering and computer engineering share commonalities with, well, with engineering. This is a topic I read widely on for my PhD thesis background, so I could go into way too much depth here. But suffice it to say that people are still discussing whether software engineers should be licensed engineers. And in fact, there are some places where they do need to be, and so in those places, people who write software just don’t use the word engineer.

One argument that was made in 2002, and seems particularly weak, is that engineering licensing would cover non-software disciplines, and it would be unfair to stop someone practicing software just because they don’t understand fluid dynamics.

The final topic we’re going to consider in this episode is the question of software pricing, i.e. whether software should be unbundled from hardware and sold as a separate product. We all know how this played out. Software was unbundled from hardware, became a huge economic engine in its own right, and even ended up eating its own tail when cloud computing changed the economic calculus so that hardware needs are factored into the software costs.

It seems like most of the attendees at the conference were in favour of software pricing, but the section in the report is presented neutrally with equal weight given to both sides. Tellingly, this is also the only section in the report that uses the Chatham House rule, where no quotes are attributed to named speakers. So, while we do know that most people at the conference were in favour of separate software pricing, we don’t know who or how many people were making the argument against.

If I had to guess, I would say that IBM representatives were against unbundling software, and everybody else was in favour of it, and that IBM lost the argument very shortly after. This was partly the work of a company called ADR, who have the distinction of being the first company ever to file a software patent. ADR brought an anti-monopoly case against IBM, saying that providing their software for free was stifling the market. This is, of course, an argument that came up again in the 1990s, with Microsoft bundling their browser and media player with Windows, and again very recently with the European Union’s Digital Markets Act and its definition of some services provided by large, typically American companies as gatekeepers.

But let me know what you think. And, as people who rely on software being a commercial commodity, what do you think of the way that software is priced? You can email me at grahamlee at acm.org, or you can comment on the post for this episode over at sicpers.info slash podcast. That’s s-i-c-p-e-r-s dot info slash podcast.

The next episode will conclude the reading of the 1968 conference report by covering the keynote address and the working papers that are included in the report. Only a fraction of the submitted papers actually appear in the report. There’s no full proceedings, so a lot of the information that went into the conference is sadly lost forever, unless some attendee happened to file away their copies of the papers that they received. Until the next time, take care, and I’ll talk to you soon.
