Episode 57: The NATO Software Engineering conferences, part 1

This episode contextualises the 1968 NATO Science Committee conference on Software Engineering, and explains what we learn through the executive summary, preface, and first three sections of the conference report. Upcoming episodes will cover the rest of the 1968 conference, the change in attitude shortly thereafter, and the entirely different report from the 1969 conference.

The episode is supported by members of the Chiron Codex Patreon, so please do join the community or hit the Ko-Fi button to make a one-off donation.

Transcript

Welcome to episode 57 of the Structure and Interpretation of Computer
Programmers podcast. I’m Graham Lee, and this episode is the first
part of a mini-series discussing the 1968 and 1969 NATO conferences on
software engineering. It’s sponsored by the members of my Patreon,
which can include you.

Software engineering is commonly thought to have had its genesis at
the NATO Science Committee Conference on Software Engineering, held in
Garmisch, Germany, in the week of October 7th to 11th, 1968.

Certainly, the phrase software engineering was coined for the
title of that conference, and NATO didn’t already do software
engineering. The conference was initiated by a Science Committee
working group on computer science.

    Computer science itself was a new idea, having been named by an
    independent consultant, Louis Fein, in 1959. The ACM first put
    together a preliminary CS curriculum in 1962 to 1965, and eventually
    ratified it as Curriculum 68, the same year as the first of the NATO
    conferences on software engineering.

    This conference report has mostly gone down in history as a broadly
    cited starting point for the so-called software crisis. But what does
    it actually say? Before answering that, we need to contextualise the
    conference, beginning, I suppose, by addressing the elephant in the
    room. Why NATO? The answer is simply that NATO represented the largest
    customer and a good chunk of the supply chain of computers and their
    applications at the time. Electronic computers had been invented a
    little more than 20 years earlier, and had found their first
    applications in the military. In Britain, the Colossus system provided
    brute-force cryptanalysis to the Government Code and Cypher School,
    and in the United States, the ENIAC was funded to calculate artillery
    tables, and applied by John von Neumann to thermonuclear reaction
    calculations used to design the hydrogen bomb.

    By the end of the 1950s, the United States’ Semi-Automatic Ground
    Environment (SAGE) defence system employed 800 to 900 programmers, more
    than half of all the programmers in the country. The project would grow to
    about 2,000 programmers over its lifetime. Many of the ideas of
    division of labour between hardware and software people, and between
    different software people, came from military projects. Computers are
    one of the ideal examples of military technology that becomes dual use
    through serendipity. NATO had the most to gain if people found better
    ways to make software more efficiently and quickly.

    Software engineering, the phrase, was according to the conference
    report’s preface, and to the reminiscences of one of its editors,
    Brian Randell, a name chosen provocatively to suggest that software
    needed to be constructed with the same rigour as found in established
    engineering disciplines. This conference brought together people from
    academia and industry: about half were academics, nearly half came from
    computing companies or consultancies, and a few were government employees
    from computing-using departments. They came from North America and
    Europe, but mostly Europe. I count 37 European attendees or observers,
    and 24 from the United States and Canada.

    The conference was organised into three work groups, software
    design, software production, and software service, in which they would
    discuss this notion of software engineering. Now software engineering
    as a field almost implies the absence of hardware, or at least the
    absence of hardware as an important constraint on the design of
    software. This move is certainly a political one in a field of
    professional boundaries, in which programmers and analysts try to
    assert their importance in the computing world as peers or even
    superiors to the electrical and electronics engineers by describing
    their own work as an engineering discipline in its own right.

    This move mirrors the slightly earlier development of academic
    computer science by minimising the contribution of the computer. The
    argument goes that as hardware gets more capable and flexible, the
    specific limitations of any one device become unimportant, and
    software designers can concentrate wholly on the problem domain. At
    the outset of the integrated circuit era, this might have seemed a
    reasonable bet, but in practice, there are few domains where it’s
    true even now.

    Bob Barton made the opposite argument. He said, In design, we should
    start by designing hardware and software together. This will require a
    kind of general purpose person, a computer engineer. It’s unclear to
    what extent the Software Engineering Conference, at which Barton made
    that comment, actually served to widen the professional gap between
    hardware and software, or whether the existing Taylorist fad for
    subdividing knowledge work in the mid-20th century had already made
    that split absolute. What we do know is that other than some
    hobbyists and brief flurries at the beginning of the microcomputing
    and Internet of Things eras, computer engineers haven’t existed, and
    most organisations have separated their hardware and their software
    divisions, assuming they even designed both at all.

    From the very start of the report, in the highlights section that serves
    as an executive summary, Randell’s recollection diverges immensely from
    the conference as presented in the report that he edited, including the
    executive summary that, as editor, he would presumably have co-written.

    Randell recalls the conference as being the place where the software
    crisis was named and acknowledged, and a field of software engineering
    bent towards its resolution. In fact, it seems that the word crisis
    hardly appears in the report at all, that conference attendee Edsger
    Dijkstra popularised the software crisis myth in the 1970s, and that
    the editors of the report were aware that it, quote, did not attempt
    to provide a balanced review of the total state of software, and tends
    to under-stress the achievements of the field, end quote.

    Indeed, in another direct quote in the report from John Buxton, we
    find that 99% of computers work tolerably satisfactorily, and Ken
    Kolence says, “there are many areas where there is no such thing as a
    crisis”, although the wording here implies that the idea of a crisis
    was being discussed at the conference, at least.

    So what are the problems that the conference addressed?
    Interestingly, the highlights describe the problem crucial to the use
    of computers as being the “so-called software or programs developed to
    control their action”.

    I wonder what this means. I initially interpreted it as suggesting
    that the idea of software as a distinct entity was not yet settled.
    Perhaps some people thought of a computer as a general-purpose device
    that you add software to for a particular application, while others
    thought of a computer as a component of a system that needs to be
    programmed to fulfil its role in that system. Subsequently, I changed
    my mind, and I think the editors might just mean to say that software
    is a technical term that the broader reaches of their audience won’t
    know the meaning of in 1968. But I’m interested to hear how you
    interpret the idea of so-called software.

    The highlights describe specific problems as being relevant to a broader
    audience (academics, policy makers, civil servants, people who
    market computers) beyond the realm of people who directly work on
    software engineering. These are direct quotes from the highlights
    section of the report.

    Firstly, the problems of achieving sufficient reliability in the data
    systems which are becoming increasingly integrated into the central
    activities of modern society. I interpret this problem as one of the
    earliest examples of the idea that software is eating the world.
    Second, the difficulties of meeting schedules and specifications on
    large software projects. Third, the education of software or data
    systems engineers. And lastly, the highly controversial question of
    whether software should be priced separately from hardware.

    This is the advert break. It starts now.

    This episode is brought to you by me, Graham Lee. But really, by you.
    Chiron Codex is a community of people who are learning how to become
    better software engineers by adopting AI augmentation in a thoughtful
    way. We aren’t outsourcing our understanding to coding assistants
    like Claude or Codex, but becoming software engineering centaurs by
    using AI tools to improve our knowledge and the quality of our work.
    Join the community over on Patreon to find out about interaction
    patterns that improve your work with AI coding tools, running LLMs for
    software development locally, discussions of recent research in the
    field, and more.

    If you’re a software engineer who’s interested in the promise of AI
    tools, but sceptical about handing your skills over to the computer,
    this is the community for you. Go to https://patreon.com/chironcodex,
    that’s C-H-I-R-O-N-C-O-D-E-X, now for more information and to join.
    Use the gift link in the show notes to get your first month of insider
    access completely free. Alternatively, you can show your appreciation
    by donating at ko-fi, that’s https://ko-fi.com/chironcodex, K-O-F-I
    dot com.

    Direct support by my audience is the only revenue I get for my work as
    a software engineer and communicator, so your support really means a
    lot to me, and makes it possible for me to produce this podcast.
    Thank you so much.

    That was the advert break. It’s over now.

    Remember that in 1968, a lot of software programs were batch jobs that
    ran on a whole machine, with no timesharing. There were already a
    total of two computers at MIT that ran the CTSS timesharing system.
    Development of Multics, the predecessor of Unix, was underway, and
    Dijkstra’s team had been working on the THE multiprogramming system for
    a while.

    But, for the most part, while a computer ran your program, it did
    nothing else. That also meant that it wasn’t running your compiler or
    your assembler. Programmers had to wait in line for computer time,
    just like everybody else. So, programs were written by hand, often
    with flowcharts as design aids, and a lot of debugging occurred in
    vivo, with programmers emulating the computer state in their head, and
    checking that algorithms yielded the expected results. As we’ll see
    in later parts of the conference, automated testing did exist, both at
    the unit and system level.

    Computer hardware had already adopted transistors, and even some early
    integrated circuits. But, in 1968, there wasn’t the aggressive
    upgrade cycle that we see today, and it’s likely that almost every
    computer that had ever been built by the time of the conference was
    either still in use, or had had its parts cannibalised for another
    computer that was still in use. This includes computers based on
    thermionic valves, among them valve-based computers that used
    non-binary storage, with valves that stored octal or decimal
    digits.

    Many early computers were one-offs, designed to support the
    applications they were commissioned for, but there were some standard
    designs, and even one example of a family of compatible computers that
    could all, well, almost all, run the same software, while offering
    different specifications or capabilities. This was the IBM System/360.

    Its operating system, OS/360, was released in 1965, and it
    required 44 kilobytes of memory, when the System/360 family offered
    between 8 kilobytes and 4 megabytes. The conference report makes a
    note of this as a massive, expensive, staff-heavy project, as
    expensive to IBM as the project to develop the System/360 hardware that
    it ran on. But the world would have to wait until 1975 for Fred
    Brooks’ detailed post-mortem in The Mythical Man-Month.

      To give some idea of the scale of software production at the time of
      the conference, co-chair Dr. H.J. Helms estimates that there were
      10,000 installed computers in Europe, a number that grew by 25% to 50%
      per year, with more than a quarter of a million analysts and
      programmers affected by the quality of software that manufacturers
      distributed for those computers.

      Alexander d’Agapeyeff reports that a decade earlier, in 1958, a
      European general-purpose computer manufacturer often had less than 50
      software programmers. Now, 1968, they probably number 1,000 to 2,000
      people. What would be needed in 1978? he asked.

      Well, fast-forwarding further than that, there are now big tech
      companies with tens of thousands of software programmers who don’t
      manufacture any computers at all.

      As noted in the highlights, it’s large systems, where ambition
      outstrips capability, in which the attendees saw a problem. Two
      attendees, Ascher Opler and Stanley Gill, the latter being one of the
      co-inventors of the subroutine, questioned whether customers should
      even be allowed to request computer systems whose complexity outstrips
      the capabilities of software creators.

      As the complexity of a system grows, the number of errors introduced
      grows even faster. Doug McIlroy and Kolence both noted that the
      process by which software is created uses backward techniques and has
      a deservedly poor reputation. But why?

      The report proposes two underlying causes in the section on Software
      Engineering and Society, which was written for a more general
      policy-making audience than the technical sections later. The first
      cause is, according to Cambridge University’s Sandy Fraser, that
      software production isn’t a linear path in which every activity takes
      a step towards working software, and that managers don’t know what to
      measure or how to measure it.

      This is still a problem in 2026, as we saw with managers leaping on
      the tokens-consumed metric without connecting that to working software
      produced by their organisations.

      The second cause, expressed by Robert Graham of MIT’s Project MAC,
      which spawned the MIT AI lab, is that projects go on for years using
      their initial poor understanding of the system, then deliver something
      that doesn’t work as needed. Then they have to go back and start
      again.

      So even in 1968, it was seen that software construction needed more
      feedback than projects were accepting from customers. And indeed,
      that’s a core topic in Section 3 of the report, a discussion on the
      nature of software engineering.

      Two papers, one by a Mr. J. Nash of IBM UK and the other by
      Dr. F. Selig of oil company Mobil, give schematic outlines of the
      software engineering process, moving linearly from analysis to design
      to implementation to deployment to maintenance. Both show activities
      occurring in parallel, unlike the phased approach that became popular
      among people who misread the Royce paper, with Nash’s diagram in
      particular showing that technical support, documentation, test
      development and control and administration, i.e. project management,
      occur throughout the project lifetime.

      Multiple attendees noted the lack of feedback in both diagrams and the
      necessity to get feedback throughout the project. Bernard Galler,
      then president of the ACM, recounted stories of projects delivering
      poor quality results because of the lack of user feedback into the
      designs and asked the question, why do these things happen? Why
      indeed?

      Selig himself points to feedback within the project with external
      requirements informing software design and internal design constraints
      informing the requirements. Sandy Fraser’s own description of the
      progress of a software activity presages iterative and incremental
      approaches like Barry Boehm’s 1988 Spiral model, in which, to quote
      Sandy Fraser, each stage produced a usable product and the period
      between the end of one stage and the start of the next provided the
      operational experience upon which the next design was based.

      With the benefit of hindsight, this sounds a lot like proceeding in
      short iterations with time for retrospection in between them. In
      practice, without access to the whole paper (the conference report
      is composed of working papers that were discussed at the conference but
      never published as proceedings as such), we don’t know if these
      iterations were weeks or months long or who found the products to be
      usable. It could be that the output of
      an early iteration was a system requirements specification that was
      usable by a software designer, for example.

      d’Agapeyeff described an inverted pyramid model in which a large
      number of application programs depend on a smaller number of service
      routines that sit on an even smaller base of control programs
      buttressed by compilers and assemblers. Due to the lack of feedback
      between applications programmers and hardware vendors who wrote the
      control programs and the service routines, there was a necessary
      middleware layer that adapted the service routines to the
      applications’ needs but which couldn’t do anything to address
      performance issues.

      He described programming as still too much of an artistic endeavour
      and suggested that more teaching was needed in structuring programs,
      designing and testing modules and simulating runtime conditions. In
      other words, in designing testable software and in testing it.

      Assuming you listened from the start of the podcast and didn’t just
      skip to here on the basis that I tend to take a long time getting
      warmed up to a topic, you will remember that there were three
      workgroups at the conference, design, production and service. At the
      actual conference, attendees disagreed that design and production of
      software were distinct activities.

      Report editor Peter Naur says that the distinction is arbitrary and
      only exists to support the division of labour in software projects.
      Dijkstra says that we can’t separate the two if we are going to do a
      decent job. And a consultant by the name of Kinslow says that design
      is necessarily iterative. He describes the failure on large projects
      as rushing to get the specification done, so skipping bits which you
      expect to be able to fill in later, but which are then incorrectly
      coded by 200 people. And then it’s too late to correct the damage
      that’s been done to the project.

      I’ve seen that failure mode on software projects in my career, which
      started in 2004, but probably, at least hopefully, much less
      frequently than the people in 1968 saw it.

      The money quote from the first part of the 1968 report is, to my mind,
      this from Doug Ross of MIT, who went on to invent the Structured
      Analysis and Design Technique:

      “The most deadly thing in software is the concept, which almost
      universally seems to be followed, that you are going to specify what
      you are going to do and then do it. And that is where most of our
      troubles come from. The projects that are called successful have met
      their specifications, but those specifications were based upon the
      designer’s ignorance before they started the job.”

      Think about this quote the next time you read a LinkedIn post on the
      benefits of spec-driven development.

      In this episode, we’ve covered the first 33 pages of a 226-page
      report, one of two reports from the NATO conferences on software
      engineering, and found that even then, software design was understood
      to need iterative feedback from users, integrators, and producers, and
      that everybody involved in the project had to share their knowledge
      and build the software based on the latest knowledge integrated from
      everybody, not on the designer’s initial feels.

      Good news about the rest of this series is that the next page, page
      34, is blank. But next time, we’ll start to look at the output of
      some of the working groups and dig into the state of the design and
      production of software in 1968.

      Until then, remember that you can contact me with your feedback on
      this episode. You can go to the page on the Structure and
      Interpretation of Computer Programmers podcast where the post for this
      episode is hosted. That’s at https://sicpers.info/podcast.

      You can email me grahamlee at acm.org or you can join the Patreon to
      support my work and join in the chat there. That’s at
      https://patreon.com/chironcodex. I’ll talk to you again soon.

      About Graham

      I make it faster and easier for you to create high-quality code.