Type safety, undefined behaviour, and us

There appears to be a shift towards programming languages that improve safety by providing an expressive type system, automatic memory management, and no gaps in the specification that lead to “undefined behaviour”. If your program is consistent with the logic of the programming language specification, then it compiles and executes the behaviour you would understand from the source code and the language documentation. If any of that isn’t true, then your program doesn’t compile: there are no gaps through which not-quite-consistent programs can fall, only to be detected later while the program is running.

When I express this set of language design constraints, you may think of Swift, of Rust, of F#. You may try to point to Java, and say that it is very precisely defined and specified (it may be, but it still has undefined behaviour: consider whether a fail-fast iterator will throw ConcurrentModificationException, which its documentation describes as happening only on a “best-effort basis”). But when did this trend start? When did people start to think about how these programming language properties might affect their programs?
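A small Java sketch of the ConcurrentModificationException point, assuming OpenJDK’s ArrayList; the class FailFastDemo and the helper removeDuringIteration are made up for illustration. Both calls make the same mistake (removing an element from inside a for-each loop over the same list), but only one of them is reported. When the removed element is the second-to-last one, hasNext() compares the iterator’s cursor with the list’s new, smaller size and ends the loop before any modification check runs.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {
    // Copies `items` into an ArrayList, then removes `target` from inside a
    // for-each loop over that same list. Returns true if the iterator
    // detected the modification and threw ConcurrentModificationException.
    static boolean removeDuringIteration(List<String> items, String target) {
        List<String> list = new ArrayList<>(items);
        try {
            for (String s : list) {
                if (s.equals(target)) {
                    list.remove(s); // structural modification mid-iteration
                }
            }
            return false; // loop finished without the iterator noticing
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        System.out.println(removeDuringIteration(items, "a")); // true
        System.out.println(removeDuringIteration(items, "b")); // false
    }
}
```

The ArrayList documentation permits both outcomes: it promises only best-effort detection, so a program whose correctness depends on seeing that exception has behaviour the specification does not pin down.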

Here’s a 1974 description of the SIMULA 67 programming language: “SIMULA as a tool for extensible program products”, by Jacob Palme, published in ACM SIGPLAN Notices. I have presented the quotes out of order, for expressivity.

Many programming language definitions contain the word “undefined” in many places. This is bad for security, since no one knows for sure what will happen if a programmer by mistake makes something “undefined”. In the definition of SIMULA, there is almost no occurence [sic] of the word “undefined”. This means that you always can know what is going to happen if you just have a program and the language definition. All errors can be discovered as soon as a rule of the language is broken. In other languages, errors are some times not discovered until strange consequences appear, and in these languages you then have to look at mysterious dumps to try to find the error. This can never happen in a fully working SIMULA system. All errors against the language are discovered as soon as they logically can be discovered, and the error message is always phrased in SIMULA terms, telling you what SIMULA error you have done. (Never any mysterious message like “Illegal memory reference”). The type checking also means that those programming errors which are not language errors in other languages will very often be language errors in SIMULA, so that they can be discovered by the compiler. Compared to other languages I know of, no language has such a carefully prepared, fully covering security system as SIMULA, and this is very important for the production of commercial program products.

Important for security is also that there is no kind of explicit statement for freeing the memory of a record no longer needed. Such statements are very dangerous, since the program can by mistake later on try to process a record whose memory has been freed. In SIMULA, records no longer needed are removed automatically by a so called garbage collector.

Important for security is also that the data fields of new records are initialized at allocation time. Otherwise, some garbage left from previous data could give very difficult errors.
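Java makes the same two choices Palme describes: it has no statement for explicitly freeing an object, and the language specification gives every field and array element a default value at allocation. A minimal sketch, where the class DefaultInit and its nested Record type are hypothetical names chosen for illustration:

```java
import java.util.Arrays;

public class DefaultInit {
    // A hypothetical "record" with no constructor: Java still gives each
    // field a defined starting value when the object is allocated.
    static class Record {
        int count;       // starts at 0
        double balance;  // starts at 0.0
        String name;     // starts at null
        Record next;     // starts at null
    }

    public static void main(String[] args) {
        Record r = new Record();
        System.out.println(r.count + " " + r.balance + " " + r.name + " " + r.next); // 0 0.0 null null

        // Array components are default-initialized too, so no "garbage left
        // from previous data" can be read back.
        int[] buffer = new int[4];
        System.out.println(Arrays.toString(buffer)); // [0, 0, 0, 0]

        // There is no statement to free r. Once nothing refers to it, the
        // garbage collector may reclaim its memory, so no live reference can
        // ever point at a freed record.
        r = null;
    }
}
```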

When Palme says “security”, he doesn’t mean the kind of thing we might call “software security” or “information security” in 2023: the protection of assets despite possibly-intentional misuse of the program. He really means “safety”: the idea that the programmer can know the behaviour of the program, even when the program has been deployed to the customer’s computer and is out of the programmer’s direct control.

Now, what do we learn from this? Firstly, that the problems evident in C were already known and solved when C was being developed. C’s first wave was between roughly 1972 (its inception) and 1978 (the first edition of K&R); this 1974 review identifies important qualities evinced by a 1967 programming language, qualities lacking in C.

Also, that these problems perhaps aren’t such showstoppers, given how much software in successful use today is written in C, or in a related language, or runs on a runtime that’s written in C. And, probably, that software engineering is a fashion discipline, and that safe programming languages are fashionable now in a way that they weren’t in the 1960s, 1970s, 1980s, 1990s, 2000s, and 2010s: we had access to them, and we didn’t care to use them.

But also we see support for André Gide’s position: Toutes choses sont dites déjà; mais comme personne n’écoute, il faut toujours recommencer. That is: everything has already been said, but because nobody listens, it’s necessary to say it again.

That nobody listens isn’t a criticism of everyday software engineers. There was a tiny number of people reading ACM SIGPLAN Notices in 1974, and most people who’ve entered computing in the intervening 49 years don’t know any of those readers. There’s no way they could reasonably be expected to have encountered this particular review of SIMULA, let alone to have taken the time to reflect on its position with respect to everything else they know about programming languages so that it can influence their choices.

Software engineering is big enough, and up until recently growing fast enough, that to a first approximation anything that’s worth reading in the field hasn’t been read by any practitioner. If an idea is important, it bears repeating. The people who didn’t hear it the first time around aren’t ignorant; they’re busy.
