Episode 58: The NATO Software Engineering conferences, part 2

This episode digs into the problems of software design and software production as perceived in the 1968 conference, most urgently: just what are software design and software production? The episode is supported by members of the Chiron Codex Patreon (use this gift link for your first month free), so please do join the community or hit the Ko-Fi button to make a one-off donation.

Transcript

Hello and welcome to episode 58 of the Structure and Interpretation of Computer Programmers podcast. I’m Graham Lee and this episode is the second part of a mini-series discussing the 1968 and 1969 NATO conferences. It’s sponsored by the members of my Patreon, which could include you.

In part one I reviewed the context of the first NATO software engineering conference in Garmisch which is in Bavaria in Germany, and approached the end of section three of the conference report with no clear idea—because the people in the room hadn’t agreed on one—which activities comprise design of software and which comprise the production of software.

Well, as this episode focuses on sections four and five, which are about design and production, it’s time for me to confidently tell you that I still don’t know what those terms mean. There’s a large extract in section 3.2 from a Mr. J. Harr of Bell Labs, which is the place in which a year later Unix would be invented. The extract comes from his paper “The design and production of real-time software for Electronic Switching Systems”.

In this paper the design process covers everything from specifying the overall hardware software system through division of the software into precisely defined blocks with defined interfaces and data structures, the compilation, simulation and testing of those blocks, integration into a software product and final load testing.

Of interest is that about 13% of the effort on the ESS project is on assemblers, compilers and translation. What they call translation is now what we would call a compiler for a high-level language. What was called a compiler in 1968 was a combination of a translator from a lower level language like Fortran to the machine language, maybe with some helpful macros, and also link editing or patching abilities that allow different program blocks to reference each other. So here was a project noteworthy for inclusion in the report (but hopefully that noteworthiness came from the fact that it was an ordinary project) where a significant number of staff and amount of effort were focused on creating the tools that create the product.

So because every activity in that report is a design activity I’ve tried to skip through to one of the working papers in section 9, the Classification of Subject Matter from the software product working group or production working group, because that paper categorizes production activities. These include training, indoctrination in conventions, determining and imposing productivity metrics as improvement, acquiring support staff and facilities, setting a budget, hiring staff, negotiating with customers, and design activities like specification, designing software units, creating test plans.

An amusing point from the Classification of Subject Matter is the inclusion of the entry “control of innovation and reinvention” in the list which, on the one hand, makes me think of the choose boring technology article, but on the other leads me to picture a manager who’s incensed that their staff has blasphemed by using their noodles in making the software.

If you were to press me to produce definitions of software design and software production (which implicitly you are by listening to a podcast in which I claim to discuss those topics) software design as a phrase used in the 1968 conference is the activity of understanding the system requirements and producing a collection of computer instructions that satisfy those requirements. Software production is doing that in a way that the customer wants to pay for, that you can afford, that the customer wants to use the output of, and is capable of using the output of, and preferably that the customer is happy with.

Okay, so pretending now that we know what software design is, let’s look at the section of the report on software design. It’s here that we find what I currently believe to be the earliest reference in the software field to the architect of real world buildings Christopher Alexander, he of the pattern language fame, when Peter Naur describes software design as analogous to civil engineering or architecture in large heterogeneous environments.

Alexander d’Agapeyeff, who we met last time wishing we could do more to teach the design and testing of testable software, argued for designing a machine that was capable of running a high-level intermediate code translatable from high-level languages which is something that we might now recognise as Pascal p-code, the Lisp machine, JVM bytecode and so on.

A historian would probably find fault with my applying such modern ideas to these statements, the retroactive claim that conference attendees were prescient in defining the future with intermediate languages, OOP (as we shall see shortly), and TDD (as I expect to encounter multiple times in this series), and then the implication that the rest of the industry was too ignorant or too stubborn to notice for a number of decades. Certainly the NATO conferences have achieved a near mythical status now that they probably just didn’t have during the 1970s, and the reports are both incomplete and focused on points the editors considered interesting, whether because they were representative or provocative, but without telling us which. So the observations might not have landed with contemporary readers, and in fact people who were in the room at the conference may not have noticed some of these statements at all until they were typed up into the report.

Nonetheless, hardware that runs an intermediate language is a natural extension of the contemporary goals of closing the gap between large system design and implementation, so I feel like I’m on fairly stable ground making the association here. Similarly, two quotes seem to presage Bertrand Meyer’s open-closed principle with some precision. Letellier says a software package must be thought of as open-ended, and Gillette says generality is essential to satisfy the requirement for extensibility, and that the key to production success of any module construct is the rigid specification of the interfaces. In other words, you’re not allowed to modify the interfaces, they’re closed, but you do need to design the modules to be extensible, they’re open.
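For anyone reading along rather than listening, here is a minimal sketch of that open and closed reading, written in Python. Nothing in it comes from the conference report: the `Module` interface, the `run_all` function and the two example modules are all hypothetical names made up for illustration. The idea is only that the interface stays fixed while new modules can be added beside the old ones.

```python
from typing import Protocol, Sequence


class Module(Protocol):
    """Hypothetical rigidly specified interface: closed to modification."""

    def process(self, text: str) -> str: ...


class Upcase:
    """One module that satisfies the interface."""

    def process(self, text: str) -> str:
        return text.upper()


class Exclaim:
    """An extension added later, without editing Upcase or run_all."""

    def process(self, text: str) -> str:
        return text + "!"


def run_all(modules: Sequence[Module], text: str) -> str:
    # The caller depends only on the interface, never on a concrete module.
    for module in modules:
        text = module.process(text)
    return text
```

Calling `run_all([Upcase(), Exclaim()], "open")` passes the text through both modules in order. Adding a third behaviour means writing a third class, not changing the interface.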

Anyway, back to the spooky foreshadowing of bytecode, d’Agapeyeff gives four reasons for intermediate languages to be executed on the computer. They are to increase the runtime checks a computer can make, thereby increasing program safety, provide more development facilities, increase portability, and to allow all communication with the programmer to be in source language. This last point is now also achieved for compiled languages using debugging data, supported by formats like STABS, COFF, and DWARF, which were all invented and introduced in the 1980s.

As projects, and I’m using my scare quote fingers here, as projects “scale”, which didn’t just mean the size of the software went up, it also referred to the expectations of the customers, or the situations in which they used the software, which might grow beyond those foreseen by the designers. As projects scale, application software might grow beyond the expressiveness of the design language used to describe it. Kolence blamed this on a lack of universal notation for software, which would do for programming what George Boole’s notation of logic does for electronic hardware design.

He suggested that Ken Iverson’s notation is the solution. Ken Iverson’s notation is the APL programming language, which actually grew out of a specification language used for a formal specification of parts of IBM’s System/360, among other things. APL was very popular through the 70s and 80s, and still has a hardcore following, but it never displaced the Algol-derived languages as a universal lingua franca for expressing computation. And it’s the context of a specification language in which to view the suggestion here. Not necessarily as an implementation language, not everyone who used APL even had an interpreter that would run on their computer, but as a specification language, in a continuity that includes Z, TLA+, and the Unified Modelling Language, as other examples.

Dijkstra submitted a paper that proposes a hierarchical, or at least a layered design approach, which led to a discussion over the extent to which a specification should be complete. Willem van der Poel says that a complete specification is a working solution, i.e. if you describe your problem in enough detail you end up solving the problem. But Dijkstra says that an incomplete specification allows for useful flexibility. There’s an analogy here with the concept of undefined behaviour in the C programming language, which allows for a portable specification of the language that behaves in whatever way is most efficient on the host hardware. And, despite what detractors claim, a C compiler has never led to demons flying out of a programmer’s nostrils.

But, what does completeness mean? If a design can be complete, we need a definition of a complete design or an incomplete design. And, the idea of a logical closure was suggested by analogy to group theory, where a group is complete if it has a certain collection of operations. So, for example, a system that lets you write files and doesn’t have a facility to read them is clearly incomplete, because you have an operation that doesn’t have a corresponding logical extension operation.
But then, what about one that can read and write files but can’t delete them? Is that complete? So, while the idea of closure was introduced, it wasn’t very deeply pursued, or at least not in the conference report.

A particular problem with software designs is the issue of detecting and handling errors. Indeed, there is sentiment in the report that if you aren’t considering resilience and fault tolerance in your design, then you aren’t actually doing design. What makes errors difficult to design for is that they tend to cut across all of your nice layers and modules, so that a failure of the storage drum to be ready means that you can’t complete a tax calculation.

To consider a more modern example, think about the Java null pointer exception. Java doesn’t even have pointers, and yet here we are, dealing with an error that is caused by one.

So, a lot of discussion took place on the directionality of design, whether that be top-down, meaning to start with the interface and requirements and work towards implementation on the computer, or bottom-up, meaning to start with reusable modules and combine them until you satisfy the requirements, or whether to do something else, because both top-down and bottom-up design have risks.

Your top-down design might paint you into a corner where you need to implement a module that you can’t actually build. Your bottom-up design might create a lot of reusable modules that don’t actually have any use at all in your system.

Naur describes a concept called design trees, where you build dependency graphs of the decisions that influence other decisions, so that you know which problems you need to solve first. Ed David, another employee at Bell Labs, suggested a skeletal coding approach, in which you actually build the whole system first, admittedly using stubs, simulations, and other doubles for modules that aren’t yet complete. Then you explore the aptness of that skeleton to your needs, tweak it, and progressively fill in the details.
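To make the skeletal coding idea concrete for readers of the transcript, here is a rough sketch in Python. It is an illustration under my own assumptions, not anything from Ed David or the report: the function names, the taxpayer identifier and the made-up flat 20% rate are all hypothetical. The point is only that the whole system runs end to end while some modules are still stubs.

```python
def read_income(taxpayer_id: str) -> float:
    # Stub: stands in for a storage module that has not been built yet.
    return 30_000.0


def tax_due(income: float) -> float:
    # The one "real" module so far, using a made-up flat rate of 20%.
    return income * 0.2


def format_bill(taxpayer_id: str, amount: float) -> str:
    # Stub: a real report formatter would replace this later.
    return f"{taxpayer_id}: {amount:.2f}"


def run_system(taxpayer_id: str) -> str:
    # The skeleton: every stage is wired together from the first day,
    # so the overall shape can be tried out before the details exist.
    return format_bill(taxpayer_id, tax_due(read_income(taxpayer_id)))
```

Each stub can later be swapped for a real implementation with the same signature, and `run_system` does not need to change when that happens.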

Going back to our retroactive futurology, this is a process that eventually became popular as Boehm’s spiral model, which we mentioned in the previous episode, and the Rapid Application Development movement of the 1980s and 1990s. This iterative approach also addresses one of the big design drawbacks discussed in the report, which is that users and customers can’t clearly express what they want, but they can tell you when you’ve got it wrong.

Speaking of communication, Conway’s Law, which was brand new, having been published in April ’68, makes several special guest appearances, as does the obvious corollary: if your organisation is going to make software that models its org chart, set up your org chart so that it models the software that you want to build.

This is the advert break. It starts now.

This episode is brought to you by me, Graham Lee.
But really, by you.

Chiron Codex is a community of people who are learning how to become better software engineers by adopting AI augmentation in a thoughtful way. We aren’t outsourcing our understanding to coding assistants like Claude or Codex, but becoming software engineering centaurs by using AI tools to improve our knowledge and the quality of our work.

Join the community over on Patreon to find out about interaction patterns that improve your work with AI coding tools, running LLMs for software development locally, discussions of recent research in the field, and more.

If you’re a software engineer who’s interested in the promise of AI tools, but sceptical about handing your skills over to the computer, this is the community for you. Go to patreon.com slash chironcodex, that’s C-H-I-R-O-N-C-O-D-E-X, now for more information and to join. Use the gift link in the show notes to get your first month of insider access completely free.

Alternatively, you can show your appreciation by donating at Ko-fi, that’s ko-fi.com slash chironcodex, K-O-F-I dot com. Direct support by my audience is the only revenue I get for my work as a software engineer and communicator, so your support really means a lot to me and makes it possible for me to produce this podcast.
Thank you so much.

That was the advert break. It’s over now.

From design then to production, and the big problem facing 1968 software people was being able to deliver large systems, both in terms of the amount of software and the amount of novelty introduced. The need to always chase the latest advances made, or should I say still makes, every project into part research, part development, and part implementation, even though it’s costed, presented to the customer, and charged for as a pure implementation project.

A Fortran compiler team will, by the time it writes its third Fortran compiler, be pretty good at writing Fortran compilers and at estimating how long it takes and how many resources they need to write a Fortran compiler. But most teams aren’t doing the same thing three times, they’re doing whatever it is for the first time, or for their first time anyway.

Your second Fortran compiler isn’t a Fortran compiler. It’s a Fortran compiler that works at an online terminal on a time-sharing computer, or in the cloud, or with blockchain, or AI assistance, or whatever’s new this week in the Datamation magazine.

The problem of scaling software production is so acute that there’s an argument over whether to just use a small team of people who know each other well for all software projects, or whether that limit would actually be the end of the software game altogether. Given the current Bot Farm amplified memes about a real-world Butlerian jihad, the event in Frank Herbert’s Dune chronology where humankind turned against artificial intelligence, it’s kind of fun to imagine an alternate reality where the greatest computer scientists and electronic engineers in the world came together in 1968 and went, “no, this doesn’t actually work. Let’s just shut it all down.”

Digression. I said greatest in the world there, even though this podcast episode is about a NATO conference. Much as I’m not convinced the Western hegemony is the best way to organise society that one could invent, the truth is that communist bloc computing was on the back foot in 1968.

Under Stalin, cybernetics had been declared unsocialist as a tool for managerial control of the workers, so research into computing wasn’t easy to undertake, promote or secure resources for. This changed after Khrushchev’s Thaw, but it wasn’t until the beginning of the 1960s that the Soviet government started sponsoring computing factories.

Competing interests and misaligned incentives meant that the dream of a centralised computer-controlled economy, a dream that Salvador Allende rediscovered for Chile in the 1970s, never came to fruition. At the beginning of the 1970s, Soviet computing policy turned to duplicating successful Western designs to the extent that the most popular microcomputer in the Eastern Bloc was actually a PDP-11 compatible.

The USSR undoubtedly had some very capable computing experts. Think of Ekaterina Shkabara or Lev Dashevskii, Viktor Glushkov or Sergei Lebedev. And 1968 saw the release of the BESM-6, a machine with comparable capabilities to common American hardware. But the fact is that the Soviet Union was late to seeing value in computers and was relegated to copying Western innovations in both software (Algol, Fortran and Pascal were all popular compilers on BESM series computers) and in hardware. They typically designed integrated circuits by duplicating old designs from Texas Instruments.

Anyway, back to scaling software projects. And the conference perceived one of the biggest problems to be estimation in terms of both time and costs. If you could tell someone what they’d spend and how long they’d wait to get a working system and actually be correct about it, then you’d immediately make your endeavour more professional-seeming. Getting faster or cheaper at doing it or getting better at doing it could take a backseat to being reliable about doing the things that you claim you’re capable of doing.

The problem was nobody knew what they should be measuring which meant that they all ended up measuring the one thing that was actually countable: the number of instruction words produced. Everyone agreed that this was wrong but everyone agreed that there was no other game in town.

A couple of speakers suggested what would, a decade later, become the “function point”: a measure of the amount of software requirements that you delivered. This was even presented in the context of measuring burndown in terms of test coverage. The amount of system you have done is the amount of software that actually does what was requested. This still suffers from the problem that we described in the previous episode, where the requirements describe what the customers thought they wanted, not what they actually need.

Harr listed 10 reasons that projects fail. Eight of these are the inability to estimate. They’re just the inability to estimate different things. One is a change management issue, which is not keeping the project documentation in sync with the reality, and the tenth is the one that Fred Brooks would seven years later name the mythical man month problem: trying to bring a project under control by throwing more people at it. The “human wave” approach was broadly derided at the conference even among those who didn’t think it realistic to keep software teams small.

Notice that modern software methodologies “solve” (again I’ve used my scare quote fingers) they “solve” these problems by backing away from them. We advocate for two pizza teams so that we don’t need to deal with solving communications problems. We heap scorn on people who try to solve those problems with ideas like SAFe or scrum of scrums. We advocate for short iterations so that we don’t have to do any estimation, beyond answering the question “do you think this will be ready within the next fortnight?”

We advocate for cross-functional on-site teams so that we can let gossip take the place of formal communication, or we drown remote workers under Slack messages, emails, and wiki updates. In this sense, modern software engineering is more of a coping strategy than an answer to the challenges identified in 1968.

A section on performance monitoring in software production is mostly about testing, both of performance and of logical correctness. The section mentions automated suites of tests at both the unit and integration level, written in the same language as the implementation, and checked in to the same configuration management system.

This brings me to what I consider to be the money quote from the report for this episode, and it’s from Alan Perlis:

“A software system can be best designed if the testing is interlaced with the designing instead of being used after the design.”

It turns out that there have been people advocating for test driven development in software longer than there have been people walking on the moon.

I want to end by coming back to this question of what constitutes design and what production, because there’s a section in the part of the report on production called Concepts which is about software paradigms.

Doug Ross advocates for “plexes”. Those are modules that combine data structure and algorithm very much like objects, in fact he references Simula as a good system for modelling these plexes. And Perlis observes that all of those abstractions exist in the Lisp programming language and they each have the name “function”.
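As a loose modern illustration of that exchange, here is a sketch in Python rather than Simula or Lisp. It is not Ross’s definition of a plex and it does not come from the report: `Counter` and `make_counter` are hypothetical names chosen for the example. The first bundles data and algorithm as an object, and the second gets the same behaviour from nothing but a function that closes over its data.

```python
class Counter:
    """Data (count) and algorithm (increment) bundled together, object style."""

    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> int:
        self.count += 1
        return self.count


def make_counter():
    """The same bundle expressed as a function that closes over its data."""
    count = 0

    def increment() -> int:
        nonlocal count  # the enclosed variable plays the role of the field
        count += 1
        return count

    return increment
```

Both versions keep their own private state between calls, which is the sense in which an object and a function can stand in for each other here.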

It might seem like creating objects or functions is a design issue, but it influences so much of how you talk about and make software that it’s correct to consider it a management thing, a budgetary thing, and generally a production issue.

I’d love to hear your thoughts on this episode, or your reflections on the NATO conference report. You can comment on the blog post for this episode or you can email me at grahamlee at acm.org. Next time we’ll take a look at software support and see what the luminaries of 1968 made of helping their customers use their software.

Thank you very much for listening and we’ll talk later.
