Structure and Interpretation of Computer Programmers

I make it easier and faster for you to write high-quality software.

Thursday, January 3, 2019

Impossibility and Uncertainty in AI

About this paper

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function), Peter Eckersley. Submitted to the ArXiV on December 31, 2018.


Ethical considerations in artificial intelligence applications have arguably been present since the birth of the field, if not earlier. Karel Čapek wrote R.U.R., the play that introduced both the word “robot” to our lexicon, and the question of whether such creations would want to work. Isaac Asimov and Mary Shelley both asked how a society with “natural” and “artificial” people would function.

Getting Trollied

More recently, as we’ve been able to create things that we can both identify as solving problems and market under the term Artificial Intelligence, the question of AI ethics has reappeared with different specialisations. Should we apply AI techniques to particular fields of endeavour at all, such as autonomous weaponry? Who should be responsible for a decision made, for an example, by an autonomous vehicle? How should such a vehicle make a decision when faced by the classic trolley problem?

Eckersley opens this paper with an aside, important to those who consider AI as an engineering discipline (which is not a clear statement in itself): autonomous vehicle research is a long, long way away from even contemplating the trolley problem. Currently, reducing the uncertainty with which obstacles are detected and identified from noisy input signals is a much more pressing issue.

However, human drivers are better at object identification, so may be sufficiently advanced to address the trolley problem should it arise in piloting a vehicle. And this is really an important point in thinking philosophically about how AIs should act: we already have intelligent actors, and already think about how they act.


Indeed, we already have artificial intelligences, including society-level AIs. A government bureaucracy; an enterprise; a policy document; all of these are abstractions of human intelligence behind machines, rules, and assumptions of how to work in given contexts. Any time that someone relies on external input to make a decision, whether that input is a checklist, a company handbook, a ready reckoner tool or a complex software-built decision support tool, you could replace that tool with an AI system with no change in the ethics of the situation.

Anywhere you could consider an AI making choices that are suboptimal to the welfare of the affected people, you could imagine those choices being suggested by the checklist or ready reckoner. None of this is to minimise the impact or applicability of AI ethics, quite the contrary: if we are only considering these problems now because we think AI has become useful, we are late to the party. Very late: what was the Soviet Gosplan other than a big decision support system trying to optimise a society along utilitarian lines?

The early sections of this paper summarise earlier proofs that it is impossible to define an objective utility function that will optimise for multiple, conflicting output dimensions. Not because it is hard to discover or compute the function, but because an acceptable trade-off may not exist. The proofs take this form:

  • imagine a guess at a solution, s.
  • there is a rule r1 which, applied to s, yields a better solution, s1.
  • there is a rule r2 which, applied to s1, yields a better solution, s2.
  • there is a rule rN which, applied to s(n-1), yields a better solution, s.

In other words, the different states that represent solutions cannot be “totally ordered” from “best” to “worst”; a cycle exists such that the question “which is the best” is paradoxical. While that has clear impact on AIs that use objective utility functions to choose solutions to their inputs, the wider implication is that no utilitarian system can objectively choose the “best” arrangement for welfare.

If you’re building the multivac from Isaac Asimov’s Franchise, then I’m going to have to disappoint you now: it’s impossible. But so is having a room full of people with slide rules and 383 different folders containing next year’s production targets. It’s not building an AI to optimise for the utility function that cannot be done; it’s finding a maximum on the utility function.

Finding a Workaround

A few techniques for addressing this problem are discussed and rejected: we could ignore the problem on the basis that our AIs are not currently important enough to have huge impact on social welfare. But AI is being used in welfare systems, in criminal justice systems, in financial management. And even where AI is not used, corporations and bureaucracies are already being applied, and these will have their decision support tools, which will eventually include AI.

Similarly, simplifying the problem is possible, but undesirable. You could define some kind of moral exchange rate to reduce the dimensionality of your problem: this much wealth in retirement is worth this much poverty today; this much social equality can be traded for this much value of statistical life. Or you could accept one or more of the undesirable properties of a solution as axiomatic; we can solve for inequality if we accept poverty, perhaps. Neither of these are ethically unambiguous.

Embracing Uncertainty

Ultimately, Eckersley concludes that the only feasible approach is to relax the requirement for objective, total ordering of the possible outcomes. One way is to introduce partial ordering of at least two of the comparison dimensions: to go from saying that a solution is { better than, worse than, the same as } another along some axis to saying it is { better than, worse than, the same as, incomparable to } the other solution. While this breaks the deadlock it’s difficult to work with, and domain-specific interpretations of partial ordering will be needed. And now we’re back at the trolley problem: there are multiple outcomes to choose from, we don’t know (because it’s impossible to say) which is better, and yet a decision must be taken. Which?

A second model replaces the definite ordering of solutions with probabilistic ordering. Rather than the impossible statement “outcome B is better than outcome A”, the pair of statements “outcome B has probability p of being better than outcome A” and “outcome A has probability (1-p) of being better than outcome B” are given. Now a system can find the outcome with the highest likelihood of being best, or pick from some with sufficient likelihood of being best, even though it is still impossible to find the best. There will always be some chance of violating a constraint, but those should at least be free of systematic bias.


If your AI strategy is based on the idea that your domain can be “solved numerically”, by finding some utility function or by training an AI to approximate the utility function, then you need to ask yourself whether this is true without introducing some biases or unethical choices into the solutions. Consider whether your system should find what is “best” and do it, or find actions that are “good enough” and either choose between them, or present them for human supervision.

posted by Graham at 14:30  

Thursday, December 20, 2018

Research Watch, and Java by Contract

I introduced Java by Contract, a tool for building design-by-contract style invariants, preconditions and postconditions in Java using annotations. It’s MIT licensed, contributions are welcome, and I hope this helps lots of people to introduce stronger correctness checking into your software. And book office hours if you’d like me to help you with that.

Java by Contract came about as part of Research Watch, a new blog series over at The Labrary where I talk about academic work and how us “practitioners” (i.e. people who computer who aren’t in academia) can make use of the results. The first post considers a report of Teaching Quality Object-Oriented Programming to computer science students.

By the way, I will be speaking at Coventry Tech Meetup on 10th January on the topic “Beyond TDD”, and Java by Contract will make an appearance there.

Long-time SICPers readers will remember Programming Literate, a Tumblr discussing results from empirical software engineering. And if you don’t, you’ll probably remember your feeds exploding on July 15, 2013 when I imported all of the posts from there to here. You can think of Research Watch as a reboot of Programming Literate. There’ll be papers new and vintage, empirical and opinionated, on a range of computing topics. If that sounds interesting, subscribe to the Labrary’s RSS feed.

posted by Graham at 10:30  

Wednesday, December 19, 2018

Teaching Quality Object-Oriented Programming

About this paper

Teaching Quality OOP by Yishai A. Feldman, published March 2005 (see the link for full citation).


One of the points made in my book Object-Oriented Programming the Easy Way is that objects should be specified by their interfaces through contracts, which say what messages the objects respond to, how you use them, and what happens as a result.

While it is up to any one object to decide how it responds to messages, we need to know whether that object represents a useful addition to our system. In other words, we want to know what the object will do in response to what messages. (Page 61)

The book Structure and Interpretation of Computer Programs says the same thing. An abstract data type has a collection of things that can be done, and stuff that happens when you do it.

In general, we can think of data as defined by some collection of selectors and constructors, together with specified conditions that these procedures must fulfill in order to be a valid representation. (§2.1.3)

This seems evident. Knowing that I have a int count(), a boolean contains(Object o) and a void add(Object o) method is insufficient, I need to know how they interact before I can use them. For an array, given:

int x = a.count();
int y = a.count();

you would always expect y - x == 1. For a set, it would depend on the content of the collection before addition; it could be 1 or 0. Knowing what methods are called is insufficient to know what type I am dealing with.

Why is it, then, that programming languages give you types for expressing the methods on an object, but not what they do or how they relate? Why does a Java interface, a Swift protocol, or a C++ abstract class only have the part of the contract related to names, not behaviours?

Famously, the Eiffel language addresses this, and it’s here that we come into contact with the paper that is the topic of this post. Eiffel, and its underlying theory, is well-described in the book Object-Oriented Software Construction by Bertrand Meyer, CTO of Eiffel Software and researcher at ETH Zurich. Feldman wanted to teach his students the theory of “Quality Software” based on two principles that are well-described in OOSC:

  • Design by Contract, because it’s more approachable and usable than formal methods, and more useful than testing; and
  • Command-Query Separation, because it isolates state changes, making it easier to draw conclusions about the behaviour of a software system.

But he didn’t want to teach them Eiffel, because:

it might leave the students with the mistaken and harmful impression that quality programming is confined to one language. (§3)

Additionally, OOSC does not include any exercises, so is not useful as a teaching support book. I will note here that Meyer has also written A Touch of Class, which does have supporting teaching material including exercises, but still uses the Eiffel language and therefore only solves half of the author’s problems.

Feldman taught the theory from OOSC using Eiffel notation, and encouraged students to complete exercises that were in variants of the Java programming language. Tools available at the time read the contract out of special additions to the class’s Javadoc comments, and modified the source code to include assertions at the relevant points in execution.

This led me to wonder about modern Java syntax, and whether it’s possible to make a similar tool using Java’s annotation features so that programmers don’t have to worry whether a source conversion tool has introduced errors, or trace changes to the source code when working back from a failure report to the broken source.

The answer is yes, and so now the Labrary can offer Java by Contract as a tool for designing Java types by contract. It encodes the parts of the contract as names of methods to invoke that return boolean, failing if the answer is false. A rewrite of the Map interface from page 13 of the paper is given below.

public interface Map {
   * Does the key k appear in the map?
  @Precondition(name = "nonNullK")
  @Postcondition(name = "inMapIffInKeys")
  boolean has(Object k);

  default boolean nonNullK(Object k) {
    return (k != null);
  default boolean inMapIffInKeys(Object k, Boolean result) {
    return (result == this.keys().has(k));

   * The value of the map at key k, null if undefined.
  @Precondition(name = "nonNullK")
  @Postcondition(name = "nonNullValueIffHasKey")
  Object item(Object k);

  default boolean nonNullValueIffHasKey(Object k, Object ret) {
    return ((ret != null) == this.has(k));

   * Associate key k with value v.
  @Precondition(name = "nonNullKeyAndValue")
  @Postcondition(name = "hasK")
  @Postcondition(name = "itemForKIsNowV")
  void put(Object k, Object v);

  default boolean nonNullKeyAndValue(Object k, Object v) {
    return ((k != null) && (v != null));
  default boolean hasK(Object k, Object v, Void result) {
    return this.has(k);
  default boolean itemForKIsNowV(Object k, Object v, Void result) {
    return (this.item(k) == v)

   * Remove key k and associated value from map.
  @Precondition(name = "nonNullK")
  @Postcondition(name = "doesNotHaveK")
  void prune(Object k);

  default boolean doesNotHaveK(Object k, Void result) {
    return !this.has(k);

   * The set of all keys in the map.
  @Postcondition(name = "nonNullReturn")
  ReadOnlySet keys();

  default boolean nonNullReturn(ReadOnlySet result) {
    return (result != null);

This approach lets us write contract conditions that have full access to the internal state of the objects they are implemented on, so internal invariants can be verified in addition to the properties explored through the interface (by specifying them on the implementing class, not on the declaring interface). As a few pages of the paper are dedicated to the author’s (and the class’s) experience with various design by contract tools and their shortcomings, it’s valuable to explore this space and come up with better approaches. I hope that Java by Contract is a useful addition.

Feldman additionally notes that the Java class library, including the Collections library, violates the Command-Query Separation principle. He even notes that the language does: the construct:


is both a command (increment x) and a query (what value did x previously have?) in a single expression. The early exercises in his class instruct students to write CQS-satisfying library objects (hence the Map example above), which the subsequent exercises build on.

Feldman left academia the year after this paper was written, and is now at IBM Research. That means the trail of development of this class has run out, and we cannot say how the teaching would have adapted to the subsequent decade of progress in Java.

posted by Graham at 16:35  

Sunday, October 22, 2017

Bottom-up teaching

We’re told that the core idea in computer programming is problem-solving. That one of the benefits of learning about computer programming (one that is not universally accepted) is gaining the skill of problem decomposition.

If you look at real teaching of computing, it seems to have more to do with solution composition than problem decomposition. The latter seems to be background noise: here are the things you can build solutions with, presumably at some point you’ll come across a solution that’s the same size and shape as one of your problem components though how is left up to you.

I have many books on programming languages. Each lists the features of the language, and gives minimally complex examples of the use of those features. In that sense, Kernighan and Ritchie’s “The C Programming Language” (section 1.3, the for statement) is as little an instructional in solving problems using a computer as Eric Nikitin’s “Into the Realm of Oberon” (section 7.1, the FOR loop) or Dave Thomas’s “Programming Elixir” (section 7.2, Using Head and Tail to Process a List).

A course textbook on bitcoin and blockchain (Narayanan, Bonneau, Felten, Miller and Goldfeder, “Bitcoin and Cryptocurrency Technologies”) starts with Section 1.1, “Cryptographic hash functions”, and builds a cryptocurrency out of them, leaving motivational questions about politics and regulation to Chapter 7.

This strategy is by no means universal: Liskov and Guttag’s “Program Development in Java” starts out by describing abstraction, then looks at techniques for designing abstractions in Java. Adele Goldberg and Alan Kay described teaching Smalltalk by proposing exploratory projects, designing the objects that model the problem under consideration and the way in which they will communicate, then incrementally filling in by designing classes and methods that have the desired properties. C.J. Date’s “An Introduction to Database Systems” answers the question “why databases?” before introducing the relational model, and doesn’t introduce SQL until it can be situated in the context of the relational model.

Both of these approaches, and their associated techniques (the bottom-up approach and solution construction; the top-down approach and problem decomposition) are useful; the former leads to progress and the latter leads to understanding. But both must be taken in concert, because understanding without progress leads to the frustration of an unsolved problem and progress without understanding is merely the illusion of progress.

My guess is that more programmers – indeed whole movements, when we consider the collective state of things like OOP, functional programming, BDD, or agile practices – are in the “bottom-up only” group than in the “top-down only” or “a bit of both” groups. That plenty more copies of Introduction to Programming in [This Week’s Hot Language] have been sold than Techniques for Making Your Problem Amenable to Computation. That the majority of software really does comprise of solutions looking for problems.

posted by Graham at 16:06  

Tuesday, April 25, 2017

Acne cream

I just want to point out that even the best of us aren’t doing what we expect the makers of acne creams to do.

What we actually know about software development, and why we believe it’s true by Greg Wilson.

posted by Graham at 15:53  

Thursday, March 24, 2016

On immutable data structures…?

If you write a scholarly publication and cite another one, what you say about it depends on its mutability. An article or a book can be cited by saying “this publication I’m identifying here says this”. Maybe you have to version your claim: “the second edition of this publication says this”. They’re immutable. Even if the third edition doesn’t say the thing you relied on in constructing your argument, the second edition still did. Someone who can get access to that second edition can look at it and see how you built your synthesis.

You can’t do that with a website. Websites change. Instead, you have to say that “this website identified by this URL, on the date that I read it, said this”. Someone who comes along later has to sort-of trust that, because if the website no longer says that, it might not be possible to tell whether it ever did say that, or whether you’re telling porky pies about your research.

Dependencies in software systems are usually given as if they work like book citations:

gem 'rack', '1.0'

…looks like it says “the thesis that’s constructed by my software is a synthesis in which version 1.0 of rack is axiomatic”, but it doesn’t. It’s really saying “at the time that I want you to think that I actually tested this stuff, it was true that the thing identified by being version 1.0 of rack was…”. It’s really a poorly-constructed website citation.

It’s fun to think, particularly in light of the npm shenanigans, just how long that dependency you didn’t bother downloading will still be around. You can presumably forget about relying on commercial software, as the licence agreement is the legal equivalent of Vader saying “I have altered the deal. Pray I do not alter it any further.” And indeed you can forget most open sores licences, which don’t put any requirements on your supplier. But what about the GPL? Version 3 (retrieved from this URL on 24th March 2016) says that anybody who distributes licensed software as object code may, as one possible way to provide access to the corresponding source code, provide that object:

accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge

What if the person you got the object code from dies within that three year period, do you have the right to ask the executor of their estate for the source code?

posted by Graham at 18:51  

Friday, July 4, 2014

Intra-curricular activities

I’m apparently fascinated by the idea of defining curricula for learning programming. I’ve written about how we need to be careful what we try to pay forward from the way we learned in the past, and I’ve talked about how we do need to pay it forward so that the second hundred years see faster progress than the first hundred years.

I’m a fan (with reservations, as seen below) of the book series as a form of curriculum. Take something like Kent Beck’s signature series, which covers a decent subset of both technical and social approaches in software development in breadth and in depth. You could probably imagine developers who would benefit from reading some or all of the books in the series. In fact, you may be one.

Coping with people approaching the curriculum from different skill levels and areas of experience is hard. Not just for the book series, it’s hard in general. Universities take the simplifying approach of assuming that everybody wants to learn the same stuff, and teaching that stuff. And to some extent that’s easy for them, because the backgrounds of prospective students is relatively uniform. Even so, my University course organised incoming students into two groups; those who had studied complex numbers at A-level and those who had not. The difference was simply that the group who had not were given a couple of lectures on complex numbers, then it was assumed that they also knew the topic from the fourth week.

Now consider selling a programming book to the public. Part of the proposal process with all of the publishers I’ve worked with has been describing the target audience. Is this a book for people who have never programmed before? For people who have programmed a little, but never used this particular tool or technique? People who have programmed a lot but never used this tool? Is this thing similar to what they have used before, or very different? For people who are somewhat familiar with the tool? For experts (and how is that defined)? Is it for readers comfortable with maths? For readers with no maths background?

Every “no” in answer to one of those questions is an opportunity to improve the experience for a subset of the potential audience by tailoring it to that subset. It’s also an opportunity to exclude a subset of the audience by making the content less relevant to them.

[I’ll digress here to explain how I worked that out for my books: whether it’s selfishness or a failure of empathy, I wrote books that I wanted to read but that didn’t exist. Therefore the expected experience is something similar to mine, back when I filled in the proposal form.]

Clearly no single publication will cover the whole phase space of potential readers and be any good. The interesting question is how much it’s worth covering with multiple publications; whether the idea of series-as-curriculum pulls in the general direction as much as scope-limiting each book pulls in the specific. Should the curriculum take readers on a straight line from novice to master? Should it “fan in” from multiple introductions? Should it “fan out” in multiple directions of interest and enquiry? Would a non-linear curriculum be inclusive or offputtingly confusing? Should the questions really be answered by substituting the different question “how many people would buy that”?

posted by Graham at 18:15  

Tuesday, April 15, 2014

Preparing for Computing’s Big One-Oh-Oh

However you slice the pie, we’re between two and three decades away from the centenary celebration for applied computing (which is of course significantly after theoretical or hypothetical advances made by the likes of Lovelace, Turing and others). You might count the anniversary of Colossus in 2043, the ENIAC in 2046, or maybe something earlier (and arguably not actually applied) like the Z3 or ABC (both 2041). Whichever one you pick, it’s not far off.

That means that the time to start organising the handover from the first century’s programmers to the second is now, or perhaps a little earlier. You can see the period from the 1940s to around 1980 as a time of discovery, when people invented new ways of building and applying computers because they could, and because there were no old ways yet. The next three and a half decades—a period longer than my life—has been a period of rediscovery, in which a small number of practices have become entrenched and people occasionally find existing, but forgotten, tools and techniques to add to their arsenal, and incrementally advance the entrenched ones.

My suggestion is that the next few decades be a period of uncovery, in which we purposefully seek out those things that have been tried, and tell the stories of how they are:

  • successful because they work;
  • successful because they are well-marketed;
  • successful because they were already deployed before the problems were understood;
  • abandoned because they don’t work;
  • abandoned because they are hard;
  • abandoned because they are misunderstood;
  • abandoned because something else failed while we were trying them.

I imagine a multi-volume book✽, one that is to the art of computer programming as The Art Of Computer Programming is to the mechanics of executing algorithms on a machine. Such a book✽ would be mostly a guide, partly a history, with some, all or more of the following properties:

  • not tied to any platform, technology or other fleeting artefact, though with examples where appropriate (perhaps in a platform invented for the purpose, as MIX, Smalltalk, BBC BASIC and Oberon all were)
  • informed both by academic inquiry and practical experience
  • more accessible than the Software Engineering Body of Knowledge
  • as accepting of multiple dissenting views as Ward’s Wiki
  • at least as honest about our failures as The Mythical Man-Month
  • at least as proud of our successes as The Clean Coder
  • more popular than The Celestial Homecare Omnibus

As TAOCP is a survey of algorithms, so this book✽ would be a survey of techniques, practices and modes of thought. As this century’s programmer can go to TAOCP to compare algorithms and data structures for solving small-scale problems then use selected algorithms and data structures in their own work, so next century’s applier of computing could go to this book✽ to compare techniques and ways of reasoning about problems in computing then use selected techniques and reasons in their own work. Few people would read such a thing from cover to cover. But many would have it to hand, and would be able to get on with the work of invention without having to rewrite all of Doug Engelbart’s work before they could get to the new stuff.

It's dangerous to go alone! Take this.

✽: don’t get hung up on the idea that a book is a collection of quires of some pigmented flat organic matter bound into a codex, though.

posted by Graham at 12:04  

Monday, July 15, 2013

Did that restructuring work actually help?

Before getting into the meat of this post, I’d like to get into the meta of this post. This essay, and I imagine many in this blog [Ed: by which I meant the blog this has been imported from], will be treading a fine line. The intended aim is to question accepted industry practice, and find results consistent or inconsistent with the practice as a beneficial task to perform. I’m more likely to select papers that appear to refute the practice, as that’s more interesting and makes us introspect the way we work more than does affirmation. The danger is that this skates too close to iconoclasm, as expressed in the Goto Copenhagen talk title Is it just me or is everything shit?. My intention isn’t to say that whatever we’re doing is wrong, just to provide some healthy inspection and analysis of our industry.

Legacy Software Restructuring: Analyzing a Concrete Case

The thread in this paper is that metrics that have long been used to measure the quality of source code—metrics related to coupling and cohesion—may not actually be relevant to the problems developers have to solve. Firstly, the jargon:

  • coupling refers to the connections between the part of the software (module, class, function, whatever) under consideration and the rest of the software system. Received wisdom is that lower coupling (i.e. fewer connections that are less-tightly intertwined) is better.
  • cohesion refers to the relatedness of the tasks performed by the (module, class, function, whatever) under consideration. The more different responsibilities a component provides or uses, the lower its cohesion. Received wisdom is that higher cohesion (i.e. fewer responsibilities per module) is better.

We’re told that striving for low coupling and high cohesion will make the parts of our software reusable and replaceable, and will reduce the number of code sites we need to change when we want to fix bugs in the future. The focus of this paper is on whether the metrics we use as proxies for these properties actually represent enhancements to the code; in other words, whether we have a systematic way to decide whether a change is an improvement or not.


The way in which the authors test their metrics is necessarily problematic. There is no objective standard against which they can be prepared—if there were, we’d have an objective standard and we could all go home. They hypothesise that any restructuring effort by a development team must represent an improvement in the codebase: if you didn’t think a change was better, why would you make that change?

Necessity is one such reason. Consider the following thought process:

I need to add this feature to my product, this change was unforeseen at design time, so the architecture doesn’t really support it. I’m not very happy about the this, but shoehorning it in here is the simplest way to support what I need.

To understand other problems with this methodology, a one-paragraph introduction to the postmodern philosophy of software engineering is required. Software, it says, supports not some absolute set of requirements that were derived from studying the universe, but the ad-hoc set of interactions between the various people who engage with the software system. Indeed, the software system itself modifies these interactions, creating a feedback loop that in fact modifies the requirements of the software that was created. Some of the results of this philosophy[*] are expressed in “Manny” Lehman’s Laws of Software Engineering, which are also cited by the paper I’m talking about here. The authors offer one of Lehman’s Laws as:

[*] I don’t consider Lehman’s laws to be objectively true of software artefacts, but to be hypotheses that arise from a particular philosophy of software. I also think that philosophy has value.

Considering Lehman’s law of software evolution, such systems would already have suffered a decrease in their quality due to the maintenance. This would increase the probability that the restructuring has a better modular quality.

This statement is inconsistent. On the one hand, this change improves the quality of some software. On the other hand, the result of a collection of such changes is to decrease the quality. Now there’s nothing to say that a particular change won’t be an improvement; but there’s also nothing to say that the observed change has this property.[*] The postmodern philosophy adds an additional wrinkle: even if this change is better, it’s only better _as perceived by the people currently working with the system_. Others may have different ideas. We saw, in discussing the teaching of programming, that even experienced programmers can have difficulty reading somebody else’s code. I wouldn’t find it a big stretch to posit that different people have different ideas of what constitutes “good” modular decomposition, and that therefore a different set of programmers would think this change to be worse.

[*]Actually I think the sentence in the paper might just be broken; remember that I found this on the preprint server so it might not have been reviewed yet. One of Lehman’s laws says that, for “E-type” software (by which he means systems that evolve with their environment—in other words, systems where a postmodern appraisal is applicable), the software will gradually be perceived as _reducing_ in quality if no maintenance work goes into it. That’s because the system is evolving while the software isn’t; the requirements change without the software catching up.

Results and Discussion

The authors found that, for three particular revisions of Eclipse, the common metrics for coupling and cohesion did not monotonically “improve” with successive restructuring efforts. In some cases, both coupling and cohesion decreased in the same effort. In addition, they found that the number and extent of cyclic dependencies between Java packages increased with every successive version of the platform.

It’s not really possible to choose a conclusion to draw from these results:

  • maybe the dogma of increasing cohesion and decreasing coupling is misleading.
  • maybe the metrics used to measure those properties were poorly chosen (though they are commonly-chosen).
  • maybe the Eclipse developers use some other measurement of quality that the authors didn’t ask about.
  • maybe some of the Eclipse engineers do take these properties into account, and some others don’t, and we [can’t – added on import] even draw general conclusions about Eclipse.

So this paper doesn’t demonstrate that cohesion and coupling metrics are wrong. But it does raise the important question: might they be not right? If you’re relying on some code metrics derived from received wisdom or dogma, it’s time to question whether they really apply to what you do.

posted by Graham at 10:59  

Monday, July 15, 2013

Teaching Programming to People. It’s easy, right?

I was doing a literature search for a different subject (which will appear soon), and found a couple of articles related to teaching programming. I don’t know if you remember when you learnt programming, but you probably found it hard. I’ve had some experience of teaching programming: specifically, teaching C to undergraduates. Said undergraduates, as it happens, weren’t on a computing course (they studied Physics), and only turned up to the few classes they had a year because attendance was mandatory. The lectures, which weren’t compulsory, had fewer students showing up.

Teaching Python to Undergraduates

When I took the course, we were taught a Pascal variant on NeXTSTEP. I have some evidence that Algol had been the first programming language taught on the course; probably on an HLH Orion minicomputer. While Pascal was developed in part as a vehicle to teach structured programming concepts, the academics in the computing course at my department were already starting to see it as a toy language with no practical utility. Such justification was used to look for a different language to teach. As you can infer from the previous paragraph, we settled on C: but not before a test where interested students (myself included) who had, for the most part, already taken the Pascal course. The experiences with Python were written up in a Masters’ thesis by Michael Williams, the student who had converted the (Pascal-based, of course) teaching materials to Python.

Like Wirth, when Guido van Rossum designed Python he had teaching in mind; though knowing the criticisms of Pascal he also made it extensible so that it could be used as a “real” language. This extensibility was put to use in the Python experiment at Oxford, giving students the numpy module which they used mainly for its matrix datatype (an important facility in Physics).

What the report shows is that it’s possible to teach someone enough Python to get onto problems with numerical computation in a day; although clearly this is also true of Pascal and C. One interesting observation is the benefit of enforced layout (Python’s meaningful indentation) to both the students, who reported that they did not find it difficult to indent a program correctly; and to the teachers, who found that because students were coerced into laying out their code consistently, it was easier to read and understand the intention of code.

An interesting open question is whether that means enforced indentation leads to more efficient code reviews in general, not just in an expert/neophyte relationship. Many developers using languages that don’t enforce layout choose to add the enforcement themselves. Whether this is an issue at all when modern IDEs can lay out code automatically (assuming developers with enough experience of the IDE to use that feature) also needs answering.

The conclusion of this study was that Python is appropriate as a teaching language for Oxford’s Physics course, though clearly it was not adopted and C was favoured. Why was this? As this decision was made after the report was produced, it doesn’t say, and my own recollection is hazy. I recall the “not for real world use” lobby was involved, that it was also possible to teach C in the time involved, and that while many people wanted to teach Java this was overruled due to a desire to avoid OO. The spurned Java crowd preferred C for its Java-like syntax.

Wait, C?

The next part of this story wasn’t published, but I’ll cover it anyway just for completeness. The year after this Python study, the teaching course did an A/B test where half of the first year course was taught Python, and half C. Whatever conclusions were drawn from this test, C won out so either there was no significant difference in that type of course or the “real worldness” of C was thought to be greater than that of Python. I remember both being given as justifications, but don’t know whether either or both were retrofitted.

Whatever the cause, teaching C was sufficiently not bad that the course is still based on the language.

Going back to that Java decision. How good is Java as a teaching language?

Analyses of Student Programming Errors In Java Programming Courses

I’m going to use the results of this paper to argue that Java is not good as a teaching language.

Programming errors can be categorized as syntax, semantic and logic. A syntax error is an error due to incorrect grammar. Syntax errors are often detected by a program called a compiler, if the language is a compiled language such as Java. A semantic error is an error due to misuse of a programming concept, despite correct syntactic structure. Semantic errors are caught when the program code is compiled. A logic error occurs when the program does not solve the problem that the programmer meant for it to solve.

[Notice that the study is only investigating errors: it’s not completely obvious but “bugs” aren’t included. The author’s only reporting on things that are either compiler or runtime errors in Java-land, like typos and indices out of bounds.]

Categorization of errors of the present study into syntax, semantic, runtime and logic revealed that syntax errors made up 94.1%, semantic errors 4.7% and logic
errors 1.2%.

In the ideal world, a programming course teaches students the principles of programming and how to combine these to solve some computational problem. In learning these things, you expect people to make semantic and logic errors: they don’t yet know how these things work. Syntax errors, on the other hand, are the compiler’s way of saying “meh, you know what you meant but I couldn’t be bothered to work it out”, or “I require you to jump through some hoops and you didn’t”.

You don’t want syntax errors when you’re teaching programming. You want people to struggle with the problems, not the environment in which those problems are presented. Imagine failing a student because they pushed the door to the exam room when it was supposed to be pulled: that’s a syntax error. One of the roles of a demonstrator in a computing course is to be the magic compiler pixie, fixing syntax errors so the students can get back on track.

OK, so not Java. What else is out there?


The Oxford Physics investigation didn’t publish any results on the difficulty of teaching C. Thankfully, the Other Place is more forthcoming. Tim Love, author of Tim Love’s Cricket for the Dragon 32 and teacher of C++ to Engineering undergraduates blogged about difficulties encountered defining functions in C++. There’s no frequency information here, just representative problems. While most of the problems are semantic rather than syntactic, with some logic problems too, we can’t really compare these with the Java analysis above anyway.

The sorts of problems described in this blog post are largely the kind of problems you want, or at least don’t mind, people experiencing while you’re teaching them programming: as long as they get past them, and understand the difference between what they were trying and what eventually worked.

In that context, comments like this are worrying:

I left one such student to read the documentation for a few minutes, but when I returned to him he was none the wiser. The Arrays section of the doc might be sub-optimal, but it can’t be that bad – it’s much the same as last year’s.

So the teacher knows that the course might have problems, but not what they are or how to correct them despite seeing the failure modes in first person. This is not isolated: the Oxford course linked above has not changed substantially since the version I wrote (which added all the Finder & Xcode stuff).

So, we know something about the problems encountered by students of programming. Do we know anything about how they try to solve those problems?

An analysis of patterns of debugging among novice computer science students

We’ve already seen that students didn’t think to use the interactive interpreter feature of Python when the course handbook stopped telling them about it. In this paper, Ahmadzadeh et al modified the Java compiler to collect analytics about errors encountered by students on their course (the methodology looks very similar to the other Java paper, above). An interesting statistic noticed in §3.1, a statistical analysis of compiler errors:

It can be seen from this table that the error that is most common amongst all the subjects is failing to define a variable before it is used. This was almost always the highest frequency error when teaching a range of different concepts.

It’s possible that you could avoid 30-50% of the problems discovered in this study by using a language that doesn’t require explicit declaration of variables. Would the errors then be replaced by something else? Maybe.

In section 4, the authors note that there’s a distinction between being able to debug effectively and being able to program well: most people who are good at debugging are also good programmers, but a minority of good programmers are good at debugging. Of course this is measuring a class of neophytes so it’s possible that this gap eventually closes, but more work would need to be done to demonstrate or disprove that.

I notice that the students in this test are (at least, initially) fixing problems introduced into a program by someone else. Is that skill related to fixing problems in your own code? Might you be more frustrated if you think you’ve finished an assignment only to find there’s a problem you don’t understand in it? Does debugging someone else’s program support the educational goal? This paper suggests that the skills are in fact different, which is why “good” programmers can be “bad” debuggers: they understand programming, but not the problem solved by someone else’s code. They also suggest that “bad” programmers who do well at fixing bugs do it because they understand the aim of the program, and can reason about what the software should be doing. Perhaps being good at fixing bugs means more understanding of specifications than of code—traditionally the outlook of the tester (who is called on to find bugs but rarely to fix them).

Conclusions and Questions

There’s a surprising amount of data out there on the problems faced by students being taught programming—some of it leads directly to actionable conclusions, or at least testable hypotheses. Despite that, some courses look no different from courses I taught in 2004-2006 nor indeed any different from a course I took in 2000-2001.

Judicious selection of language could help students avoid some of the “syntactic” problems in programming, by choosing languages with less ceremony. Whether such a change would lead to students learning faster, enjoying the topic more, or just bumping up against a different set of syntax errors needs to be tested. But can we extrapolate from this? Are environments that are good for student programmers good for novices in general, including inexperienced professionals? Can this be taken further? Could we conclude that some languages waste time for all programmers, or that becoming expert in programming just means learning to cope with your environment’s idiosyncrasies?

And what should we make of this result that being good at programming and debugging do not go together? Should a programming course aim to develop both skills, or should specialisation be noticed and encouraged early? [Is there indeed a degree in software testing offered at any university?]

But, perhaps most urgently, why are so many different groups approaching this problem independently? Physics and Engineering academics are not experts at teaching computing, and as we’ve seen science code is not necessarily the best code. Could someone aggregate these results from various courses and produce the killer course in undergraduate computing?

posted by Graham at 10:57  
« Previous PageNext Page »

Powered by WordPress