Fuck. This. Shit.

Enough with the subtle allusions of the previous posts. What’s going on here is not right.

It’s not right that I get to pass as a member of the group of people who can work in technology, while others have to justify their very presence in the field.

It’s not right that “looking like me” is a pass to being considered for the best-paid jobs, while “not looking like me” is not.

[that last one took me a long time to understand. To me, it seems like I worked hard to get where I am, but I needed to understand that I was given the opportunity to work at the things I worked at. That all I needed to do was to work at it, not to work at it and convince everyone else that I was eligible to work at it.]

It’s not right that while I get a free pass into various fields of endeavour, others are told that they either slept their way into the field or are groupies or are unfuckable.

Previously, I avoided writing about this stuff because I thought I might get it wrong. Fuck. This. Shit. I’ve got social capital to burn; it’s much easier for me to get another job around this sector than it is for plenty of people who are as good as or better than me at doing the work. I might be worried about treading the line between being supportive and getting into trouble, but that’s not as bad as the line that women, trans people, non-white people, non-straight people, and disabled people have to tread between asking to be considered equally and keeping their jobs. I have one job: doing my job. I do not have two jobs: doing my job, and convincing people that someone like me should be allowed to do my job. If the cost of equality is giving up my free ride, then I give up my free ride.

The pipeline is not the problem: it leads to a vat of acid. No-one wants to lean in to a vat of acid. (Thanks to Cate Huston for that metaphor; it made a lot of sense to me.)

Our industry is exclusive, and needs to be inclusive. What should you do about this? I don’t know; I’m far from knowledgeable. If your position is “I agree with the straight white guy that the world is broken, I should ask the straight white guy how to fix it” then perhaps you are the problem, just as I have been and am the problem.

What should I do about this? First step for me is to listen. To not tell people who are describing their experiences what my experiences are. To avoid thinking about my reply to people, and to think about what they’ve said. To stop looking for holes in arguments and to listen for opportunities to grow. Not just to grow me, but to grow others.

Sitting on the Sidelines

Thank you, James Hague, for your article You Can’t Sit on the Sidelines and Become a Philosopher. I got a lot out of reading it, because I identified myself in it. Specifically in this paragraph:

There’s another option, too: you could give up. You can stop making things and become a commentator, letting everyone know how messed-up software development is. You can become a philosopher and talk about abstract, big picture views of perfection without ever shipping a product based on those ideals. You can become an advocate for the good and a harsh critic of the bad. But though you might think you’re providing a beacon of sanity and hope, you’re slowly losing touch with concrete thought processes and skills you need to be a developer.

I recognise in myself a lot of the above, writing long, rambling, tedious histories; describing how others are doing it wrong; and identifying inconsistencies without attempting to resolve them. Here’s what I said in that last post:

I feel like I ought to do something about some of that. I haven’t, and perhaps that makes me the guy who comes up to a bunch of developers, says “I’ve got a great idea” and expects them to make it.

Yup, I’m definitely the person James was talking about. But he gave me a way out, and some other statements that I can hope to identify with:

You have to ignore some things, because while they’re driving you mad, not everyone sees them that way; you’ve built up a sensitivity. […] You can fix things, especially specific problems you have a solid understanding of, and probably not the world of technology as a whole.

The difficulty is one of choice paralysis. Yes, all of those broken things are (perhaps literally) driving me mad, but there’s always the knowledge that trying to fix any one of them means ignoring all of the others. Like the out of control trolley, it’s easier to do nothing and pretend I’m not part of the problem than to deliberately engage with choosing some apparently small part of it. It’s easier to read Alan Kay, to watch Bret Victor and Doug Engelbart and to imagine some utopia where they had greater influence. A sort of programmerpunk fictional universe.

As long as you eventually get going again you’ll be fine.

Hopefully.

Intra-curricular activities

I’m apparently fascinated by the idea of defining curricula for learning programming. I’ve written about how we need to be careful what we try to pay forward from the way we learned in the past, and I’ve talked about how we do need to pay it forward so that the second hundred years see faster progress than the first hundred years.

I’m a fan (with reservations, as seen below) of the book series as a form of curriculum. Take something like Kent Beck’s signature series, which covers a decent subset of both technical and social approaches in software development in breadth and in depth. You could probably imagine developers who would benefit from reading some or all of the books in the series. In fact, you may be one.

Coping with people approaching the curriculum from different skill levels and areas of experience is hard. Not just for the book series; it’s hard in general. Universities take the simplifying approach of assuming that everybody wants to learn the same stuff, and teaching that stuff. And to some extent that’s easy for them, because the backgrounds of prospective students are relatively uniform. Even so, my University course organised incoming students into two groups: those who had studied complex numbers at A-level and those who had not. The difference was simply that the group who had not were given a couple of lectures on complex numbers, after which, from the fourth week, it was assumed that they too knew the topic.

Now consider selling a programming book to the public. Part of the proposal process with all of the publishers I’ve worked with has been describing the target audience. Is this a book for people who have never programmed before? For people who have programmed a little, but never used this particular tool or technique? People who have programmed a lot but never used this tool? Is this thing similar to what they have used before, or very different? For people who are somewhat familiar with the tool? For experts (and how is that defined)? Is it for readers comfortable with maths? For readers with no maths background?

Every “no” in answer to one of those questions is an opportunity to improve the experience for a subset of the potential audience by tailoring it to that subset. It’s also an opportunity to exclude a subset of the audience by making the content less relevant to them.

[I’ll digress here to explain how I worked that out for my books: whether it’s selfishness or a failure of empathy, I wrote books that I wanted to read but that didn’t exist. Therefore the expected experience is something similar to mine, back when I filled in the proposal form.]

Clearly no single publication will cover the whole phase space of potential readers and be any good. The interesting question is how much it’s worth covering with multiple publications; whether the idea of series-as-curriculum pulls in the general direction as much as scope-limiting each book pulls in the specific. Should the curriculum take readers on a straight line from novice to master? Should it “fan in” from multiple introductions? Should it “fan out” in multiple directions of interest and enquiry? Would a non-linear curriculum be inclusive or offputtingly confusing? Should the questions really be answered by substituting the different question “how many people would buy that”?

Things I believe

The task of producing software is one of choosing and creating constraints, rules and abstractions inside a system which provides very few a priori. Typically we select a large collection of pre-existing constraints, rules and abstractions upon which to base our own: models of computation, programming and deployment environments, and so on. Everything from the size of the register file to the way in which text is represented on the display is theoretically up for grabs, but we impose limits on that freedom ourselves when we create a new product.

None of these limitations is essential. Many are conventional, and have become so embedded in the cultural practice of making software that it would be expensive or impractical to choose alternative options. Still others have so much rhetoric surrounding them that the emotional cost of change is too great to bear.

So what are these restrictions? Here’s a list of mine. I accept that they don’t all apply to you. I accept that many of them have alternatives. Indeed I believe that all of them have alternatives, and that enumerating them is the first thing that lets me treat them as assumptions to be challenged.

  1. Computers use the same memory for programs and data (there’s a brief sketch of this one after the list). I know the alternatives exist but wouldn’t know how to start using them.
  2. Memory is a big blob of uniform storage. Like the above, except I know this one isn’t true; I just ignore that detail.
  3. Memory and bus wires can be in one of two states.
  4. There probably is a free-form hierarchical database available.
  5. There is a thing called a stack and a thing called a heap, and the difference between the two is important.
  6. There is no point trying to do a better job at multiprocessing than the operating system.
  7. There is an operating system.
  8. The operating system, file system, indeed any first system on which my thing is a second system: those first systems are basically interchangeable.
  9. I can buy a faster thing (except in mobile, where I can’t).
  10. Whatever processor you gave me behaves correctly.
  11. Whatever compiler you gave me behaves correctly.
  12. Whatever library you gave me probably behaves correctly.
  13. Text is a poor way to represent a computer program but is the best we have.
  14. The way to write a computer program is to tell the computer what to do.
  15. The goal of the industry for the last few decades has been the DynaBook.
  16. I still do not need a degree in computer science.
  17. I should know what my software does before I give it to the people who need to use it.
  18. The universal runtime environment is the C system.
  19. Processors today are basically like faster versions of the MC68000.
  20. Platform vendors no longer see lock-in as a goal, but do see it as a convenient side-effect.
  21. You will look after drawing pictures, playing videos, and making sounds for me.
  22. Types are optional.
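Here’s a minimal sketch of the first assumption in that list, as it shows up at the language level rather than the hardware level. It’s only an analogue of the von Neumann point, but it makes it concrete: in Python, a compiled function body can be read as plain bytes, and a string of data can be turned into a new program at run time.

    # A sketch of assumption 1 at the language level: a program is just data.
    def double(x):
        return x * 2

    print(double.__code__.co_code)  # the compiled body of double, viewed as bytes

    source = "def triple(x):\n    return x * 3\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)  # data turned into a program
    print(namespace["triple"](14))  # prints 42

The alternatives the item alludes to are presumably things like Harvard architectures, in which instruction memory and data memory are physically separate.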

Preparing for Computing’s Big One-Oh-Oh

However you slice the pie, we’re between two and three decades away from the centenary celebration for applied computing (which is of course significantly after theoretical or hypothetical advances made by the likes of Lovelace, Turing and others). You might count the anniversary of Colossus in 2043, the ENIAC in 2046, or maybe something earlier (and arguably not actually applied) like the Z3 or ABC (both 2041). Whichever one you pick, it’s not far off.

That means that the time to start organising the handover from the first century’s programmers to the second is now, or perhaps a little earlier. You can see the period from the 1940s to around 1980 as a time of discovery, when people invented new ways of building and applying computers because they could, and because there were no old ways yet. The next three and a half decades (a period longer than my life) have been a period of rediscovery, in which a small number of practices have become entrenched and people occasionally find existing, but forgotten, tools and techniques to add to their arsenal, and incrementally advance the entrenched ones.

My suggestion is that the next few decades be a period of uncovery, in which we purposefully seek out those things that have been tried, and tell the stories of how they are:

  • successful because they work;
  • successful because they are well-marketed;
  • successful because they were already deployed before the problems were understood;
  • abandoned because they don’t work;
  • abandoned because they are hard;
  • abandoned because they are misunderstood;
  • abandoned because something else failed while we were trying them.

I imagine a multi-volume book✽, one that is to the art of computer programming as The Art Of Computer Programming is to the mechanics of executing algorithms on a machine. Such a book✽ would be mostly a guide, partly a history, with some, all or more of the following properties:

  • not tied to any platform, technology or other fleeting artefact, though with examples where appropriate (perhaps in a platform invented for the purpose, as MIX, Smalltalk, BBC BASIC and Oberon all were)
  • informed both by academic inquiry and practical experience
  • more accessible than the Software Engineering Body of Knowledge
  • as accepting of multiple dissenting views as Ward’s Wiki
  • at least as honest about our failures as The Mythical Man-Month
  • at least as proud of our successes as The Clean Coder
  • more popular than The Celestial Homecare Omnibus

As TAOCP is a survey of algorithms, so this book✽ would be a survey of techniques, practices and modes of thought. As this century’s programmer can go to TAOCP to compare algorithms and data structures for solving small-scale problems then use selected algorithms and data structures in their own work, so next century’s applier of computing could go to this book✽ to compare techniques and ways of reasoning about problems in computing then use selected techniques and reasons in their own work. Few people would read such a thing from cover to cover. But many would have it to hand, and would be able to get on with the work of invention without having to rewrite all of Doug Engelbart’s work before they could get to the new stuff.

It's dangerous to go alone! Take this.

✽: don’t get hung up on the idea that a book is a collection of quires of some pigmented flat organic matter bound into a codex, though.

On too much and too little

In the following text, remember that words like me or I are to be construed in the broadest possible terms.

It’s easy to be comfortable with my current level of knowledge. Or perhaps it’s not the value, but the derivative of the value: the amount of investment I’m putting into learning a thing. Anyway, it’s easy to tell stories about why the way I’m doing it is the right, or at least a good, way to do it.

Take, for example, object-oriented design. We have words to describe insufficient object-oriented design. Spaghetti Code, or a Big Ball of Mud. Obviously these are things that I never succumb to, but other people do. So clearly (actually, not clearly at all, but that’s beside the point) there is some threshold level of design or analysis practice that represents an acceptable minimum. Whatever that value is, it’s less than the amount that I do.

Interestingly there are also words to describe the over-application of object-oriented design. Architecture Astronauts, for example, are clearly people who do too much architecture (in the same way that NASA astronauts got carried away with flying and overdid it, I suppose). It’s so cold up in space that you’ll catch a fever, resulting in Death by UML Fever. Clearly I am only ever responsible for tropospheric architecture, thus we conclude that there is some acceptable maximum threshold for analysis and design too.

The really convenient thing is that my current work lies between these two limits. In fact, I’m comfortable in saying that it always has.

But wait. I also know that I’m supposed to hate the code that I wrote six months ago, probably because I wasn’t doing enough of whatever it is that I’m doing enough of now. But I don’t remember thinking six months ago that I was below the threshold for doing acceptable amounts of the stuff that I’m supposed to be doing. Could it be, perhaps, that the goalposts have conveniently moved in that time?

Of course they have. What’s acceptable to me now may not be in the future, either because I’ve learned to do more of it or because I’ve learned that I was overdoing it. The trick is not so much in recognising that, but in recognising that others who are doing more or less than me are not wrong, they could in fact be me at a different point on my timeline but with the benefit that they exist now so I can share my experiences with them and work things out together. Or they could be someone with a completely different set of experiences, which is even more exciting as I’ll have more stories to swap.

When it comes to techniques and devices for writing software, I tend to prefer overdoing things and then finding out which bits I don’t really need after all, rather than under-application. That’s obviously a much larger cognitive and conceptual burden, but it stems from the fact that I don’t think we really have any clear ideas on what works and what doesn’t. Not much in making software is ever shown to be wrong, but plenty of it is shown to be out of fashion.

Let me conclude by telling my own story of object-oriented design. It took me ages to learn object-oriented thinking. I learned the technology alright, and could make tools that used the Objective-C language and Foundation and AppKit, but didn’t really work out how to split my stuff up into objects. Not just for a while, but for years. A little while after that Death by UML Fever article was written, my employer sent me to Sun to attend their Object-Oriented Analysis and Design Using UML course.

That course in itself was a huge turning point. But just as beneficial was the few months afterward in which I would architecturamalise all the things, and my then-manager wisely left me to it. The office furniture was all covered with whiteboard material, and there soon wasn’t a bookshelf or cupboard in my area of the office that wasn’t covered with sequence diagrams, package diagrams, class diagrams, or whatever other diagrams. I probably would’ve covered the external walls, too, if it wasn’t for Enterprise Architect. You probably have opinions(TM) of both of the words in that product’s name. In fact I also used OmniGraffle, and dia (my laptop at the time was an iBook G4 running some flavour of Linux).

That period of UMLphoria gave me the first few hundred hours of deliberate practice. It let me see things that had been useful, and that had either helped me understand the problem or communicate about it with my peers. It also let me see the things that hadn’t been useful, that I’d constructed but then had no further purpose for. It let me not only dial back, but work out which things to dial back on.

I can’t imagine being able to replace that experience with reading web articles and Stack Overflow questions. Sure, there are plenty of opinions on things like OOA/D and UML on the web. Some of those opinions are even by people who have tried it. But going through that volume of material and sifting the experience-led advice from the iconoclasm or marketing fluff, deciding which viewpoints were relevant to my position: that’s all really hard. Harder, perhaps, than diving in and working slowly for a few months while I over-practice a skill.

Some so-called expert

There’s a comedy sketch, frequently tweeted, called The Expert. Now, all programmers will be aware that there is nothing funnier than interpreting a joke literally and telling everyone the many ways in which it’s wrong, and that there is no way to be seen as a more intelligent and empathetic person than to do this. So here we go: what are all the inexpert things this “expert” does?

Firstly, having been told how important the strategic initiative is, he makes no attempt to actually find out what it is, and how his task is connected to the objectives described. This means that he doesn’t know anything about the context of his work, which is just setting himself up for all sorts of trouble. It’s like a programmer going “yeah sure, I can add a second copy of that goto line” without checking whether they’re working on some sort of security-sensitive module.

He refuses to accept any form of creative solution to the problem, and his project manager is correct to try to tactfully defer his immediate refusal to do the work asked. Immediately saying “no, I can’t do that” is identical to saying “I have never done that, and I cannot imagine any novelty entering my life”. This is not symptomatic of expertise, but of narrow-mindedness.

A pause, and a gathering of resources, leads us to conclude that some of the tasks set are eminently achievable, making this alleged expert look like the comfort-zone-hogging risk-averse luddite that perhaps he is. Of course you can draw a red line with inks of other colours, for example. You simply rely on the relativistic Doppler effect, or on fluorescent properties of the materials. Of course you can draw seven lines all perpendicular, if your diagram can extend into seven dimensions. And that is of course assuming a Euclidean geometry for the diagram; an assumption that our “I know best” expert doesn’t even think to question. Alternatively, you can find out what the time-dependent evolution of the diagram is, as it may be that a total of seven lines that are each instantaneously perpendicular to the other lines present but that do not all simultaneously exist is a sufficient solution. Again, our unimaginative expert doesn’t think about that. In fact, he never really explores whether the perpendicularity requirement means mutually perpendicular, he just proceeds to mansplain to the client representative why he is right and she is wrong.

Assured of his expertise, he then injects sarcasm into his voice in a condescending fashion. “I’m sure your target audience doesn’t exist solely of those people.” Again, this is indicative of a lack of empathy and an unwillingness to consider other viewpoints than his own.

Although, having said that, he’s pretty quick to defer to authority, and on the few occasions that he does want to enquire about the requirements, does not pursue the matter if someone else interrupts.

This is an “expert” who is going to go away with an incomplete understanding of the problem, and will likely fail to give a satisfactory solution. Often such people will then seek to externalise any responsibility for the failure, complaining that the requirements weren’t clear or that the clients had unrealistic expectations. Maybe they weren’t and they did, but as an expert it’s his responsibility to understand those and apply his skills to solving the problem at hand, not to find ways to throw other people under the proverbial bus.

The manager in this video is clearly the sanest voice, and also manages to keep his frustration at his own mistake somewhat bottled. The extent of that mistake? He has contracted an “expert in a narrow field”, who “doesn’t see the overall picture”, and put him in a meeting with their client for which he was totally unprepared. So it’s a shame that the expert’s grandest commitment—to inflate a balloon of unknown quality and structure into the shape of a kitten—is made without the manager around to intermediate. He might have been able to intervene before the physical contact between the “expert” and the designer, which should be considered wholly inappropriate for a business meeting.

Maybe it was a mistake to put someone so junior in front of the client without some coaching. Hopefully, with appropriate mentoring and support, our “expert” can grow to be a mature, empathetic and positive contributor to his team.

Where am I going with this?

I recently asked how people would describe this Secure Mac Programming blog were they trying to tell someone else they should read it. Of all the answers, the one that most succinctly sums up the trouble with the old name is from Alan:

@secboffin Not Just Secure, Not Just Mac, Not Just Programming.

I’m probably in the midst of some existential crisis, having spent a couple of years thinking and writing about philosophy, ethics, and the social responsibility of my work and its context. It’s clear that I’m dealing with some conflict, and it doesn’t look like reconciliation is an option.

Often I write about ideas that are still knocking around my head, such that I never come to any conclusion. I’ve used multiple choice conclusions, conclusions that appear to be from a different argument, and have concluded that my entire argument may or may not be useful.

This is just something I need to work out: what do I think I do, what do other people think I do, what parts of that do I like and dislike, are there other things I would like, can I replace the disliked parts with the liked parts, and so on. I write it here as you may have related ideas, or you may be thinking about the same things yourself and benefit from knowing that other people are, too.

What I know includes a list of things that currently interest me:

With all that in mind, I’m happy to introduce the beginning of a slow rebranding of this blog. It is now called the Structure and Interpretation of Computer Programmers, and can be found at http://www.sicpers.info/ in addition to its previous home at http://blog.securemacprogramming.com.

I do not intend to remove the old domain or break existing feed subscriptions. Over time (basically, as I work out how to do it) I’ll migrate links, feed entries and so on to reference the new domain, and the age-old updated mission of the blog.

Software, Science?

Is there any science in software making? Does it make sense to think of software making as scientific? Would it help if we could?

Hold on, just what is science anyway?

Good question. The medieval French philosopher-monk Buridan said that the source of all knowledge is experience, and Richard Feynman paraphrased this as “the test of all knowledge is experiment”.

If we accept that science involves some experimental test of knowledge, then some of our friends who think of themselves as scientists will find themselves excluded. Consider the astrophysicists and cosmologists. It’s all very well having a theory about how stars form, but if you’re not willing to swirl a big cloud of hydrogen, helium, lithium, carbon etc. about and watch what happens to it for a few billion years then you’re not doing the experiment. If you’re not doing the experiment then you’re not a scientist.

Our friends in what are called the biological and medical sciences also have some difficulty now. A lot of what they do is tested by experiment, but some of the experiments are not permitted on ethical grounds. If you’re not allowed to do the experiment, maybe you’re not a real scientist.

Another formulation (OK, I got this from the Wikipedia entry on Science) sees science as a sort of systematic storytelling: the making of “testable explanations and predictions about the universe”.

Under this definition, there’s no problem with calling astronomy a science: you think this is how things work, then you sit, and watch, and see whether that happens.

Of course a lot of other work fits into the category now, too. There’s no problem with seeing the “social sciences” as branches of science: if you can explain how people work, and make predictions that can (in principle, even if not in practice) be tested, then you’re doing science. Psychology, sociology, economics: these are all sciences now.

Speaking of the social sciences, we must remember that science itself is a social activity, and that the way it’s performed is really defined as the explicit and implicit rules and boundaries created by all the people who are doing it. As an example, consider falsificationism. This is the idea that a good scientific hypothesis is one that can be rejected, rather than confirmed, by an appropriately-designed experiment.

Sounds pretty good, right? Here’s the interesting thing: it’s also pretty new. It was mostly popularised by Karl Popper in the 20th Century. So if falsificationism is the hallmark of good science, then Einstein didn’t do good science, nor did Marie Curie, nor Galileo, or a whole load of other people who didn’t share the philosophy. Just like Dante’s Virgil was not permitted into heaven because he’d been born before Christ and therefore could not be a Christian, so all of the good souls of classical science are not permitted to be scientists because they did not benefit from Popper’s good message.

So what is science today is not necessarily science tomorrow, and there’s a sort of self-perpetuation of a culture of science that defines what it is. And of course that culture can be critiqued. Why is peer review appropriate? Why do the benefits outweigh the politics, the gazumping, the gender bias? Why should it be that if falsification is important, negative results are less likely to be published?

Let’s talk about Physics

Around a decade ago I was studying particle physics pretty hard. Now there are plenty of interesting aspects to particle physics. Firstly, it’s a statistics-heavy discipline, and results in statistics are defined by how happy you are with them, not by some binary right/wrong criterion.

It turns out that particle physicists are a pretty conservative bunch. They’ll only accept a particle as “discovered” if the signal indicating its existence is measured at a five-sigma confidence: i.e. if there’s under a one-in-a-million chance that the signal arose randomly in the absence of the particle’s existence. Why five sigma? Why not three (a 99.7% confidence) or six (to keep Motorola happy)? Why not repeat it three times and call it good, like we did in middle school science classes?
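For a sense of scale, here’s a rough sketch that converts an n-sigma threshold into a one-sided Gaussian tail probability, which is the usual way of putting a number on “the chance that the signal arose randomly”:

    # Illustration only: the probability that pure background noise fluctuates
    # up to an n-sigma excess, under the one-sided Gaussian tail convention.
    from math import erfc, sqrt

    def tail_probability(n_sigma):
        """One-sided Gaussian tail probability for an n-sigma excess."""
        return 0.5 * erfc(n_sigma / sqrt(2))

    for n in (3, 5, 6):
        p = tail_probability(n)
        print(f"{n} sigma: p = {p:.2e} (about 1 in {1 / p:,.0f})")

    # Roughly: 3 sigma is 1 in 740, 5 sigma is 1 in 3.5 million,
    # and 6 sigma is 1 in a billion.

Under that convention, moving the threshold from three sigma to five is the jump from about one fluke in 740 to about one in 3.5 million, which gives some idea of just how conservative the convention is.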

Also, it’s quite a specialised discipline, with a clear split between theory and practice and other divisions inside those fields. It’s been a long time since you could be a general particle physicist, and even longer since you could be simply a “physicist”. The split leads to some interesting questions about the definition of science again: if you make a prediction which you know can’t be verified during your lifetime due to the lag between theory and experimental capability, are you still doing science? Does it matter whether you are or not? Is the science in the theory (the verifiable, or falsifiable, prediction) or in the experiment? Or in both?

And how about Psychology, too

Physicists are among the most rational people I’ve worked with, but psychologists up the game by bringing their own feature to the mix: hypercriticality. And I mean that in the technical sense of criticism, not in the programmer “you’re grammar sucks” sense.

You see, psychology is hard, because people are messy. Physics is easy: the apple either fell to earth or it didn’t. Granted, quantum gets a bit weird, but it generally (probably) does its repeatable thing. We saw above that particle physics is based on statistics (as is semiconductor physics, as it happens); but you can still say that you expect some particular outcome or distribution of outcomes with some level of confidence. People aren’t so friendly. I mean, they’re friendly, but not in a scientific sense. You can do a nice repeatable psychology experiment in the lab, but only by removing so many confounding variables that it’s doubtful the results would carry over into the real world. And the experiment only really told you anything about local first year psychology undergraduates, because they’re the only people who:

  1. walked past the sign in the psychology department advertising the need for participants;
  2. need the ten dollars on offer for participation desperately enough to turn up.

In fact, you only really know about how local first year psychology undergraduates who know they’re participating in a psychology experiment behave. The ethics rules require informed consent which is a good thing because otherwise it’s hard to tell the difference between a psychology lab and a Channel 4 game show. But it means you have to say things like “hey this is totally an experiment and there’ll be counselling afterward if you’re disturbed by not really electrocuting the fake person behind the wall” which might affect how people react, except we don’t really know because we’re not allowed to do that experiment.

On the other hand, you can make observations about the real world, and draw conclusions from them, but it’s hard to know what caused what you saw because there are so many things going on. It’s pretty hard to re-run the whole of a society with just one thing changed, like “maybe if we just made Hitler an inch taller then Americans would like him, or perhaps try the exact same thing as prohibition again but in Indonesia” and other ideas that belong in Philip K. Dick novels.

So there’s this tension: repeatable results that might not apply to the real world (a lack of “ecological validity”), and real-world phenomena that might not be possible to explain (a lack of “internal validity”). And then there are all sorts of other problems too, so that psychologists know that for a study to hold water they need to surround what they say with caveats and limitations. Thus is born the “threats to validity” section on any paper, where the authors themselves describe the generality (or otherwise) of their results, knowing that such threats will be a hot topic of discussion.

But all of this—the physics, the psychology, and the other sciences—is basically a systematised story-telling exercise, in which the story is “this is why the universe is as it is” and the system is the collection of (time-and-space-dependent) rules that govern what stories may be told. It’s like religion, but with more maths (unless your religion is one of those ones that assigns numbers to each letter in a really long book then notices that your phone number appears about as many times as a Poisson distribution would suggest).

Wait, I think you were talking about software

Oh yeah, thanks. So, what science, if any, is there in making software? Does there exist a systematic approach to storytelling? First, let’s look at the kinds of stories we need to tell.

The first are the stories about the social system in which the software finds itself: the story of the users, their need (or otherwise) for a software system, their reactions (or otherwise) to the system introduced, how their interactions with each other change as a result of introducing the system, and so on. Call this requirements engineering, or human-computer interaction, or user experience; it’s one collection of stories.

You can see these kinds of stories emerging from the work of Manny Lehman. He identifies three types of software:

  • an S-system is exactly specified.
  • a P-system executes some known procedure.
  • an E-system must evolve to meet the needs of its environment.

It may seem that E-type software is the type in which our stories about society are relevant, but think again: why do we need software to match a specification, or to follow a procedure? Because automating that specification or procedure is of value to someone. Why, or to what end? Why that procedure? What is the impact of automating it? We’re back to telling stories about society. All of these software systems, according to Lehman, arise from discovery of a problem in the universe of discourse, and provide a solution that is of possible interest in the universe of discourse.

The second kind are the stories about how we worked together to build the software we thought was needed in the first stories. The practices we use to design, build and test our software are all tools for facilitating the interaction between the people who work together to make the things that come out. The things we learned about our own society, and that we hope we can repeat (or avoid) in the future, become our design, architecture, development, testing, deployment, maintenance and support practices. We even create our own tools—software for software’s sake—to automate, ease or disrupt our own interactions with each other.

You’ll mostly hear the second kind of story at most developer conferences. I believe that’s because the people who have most time and inclination to speak at most developer conferences are consultants, and they understand the second stories to a greater extent than the first because they don’t spend too long solving any particular problem. It’s also because most developer conferences are generally about making software, not about whatever problem it is that each of the attendees is trying to solve out in the world.

I’m going to borrow a convention that Rob Rix told me in an email, of labelling the first type of story as being about “external quality” and the second type about “internal quality”. I went through a few stages of acceptance of this taxonomy:

  1. Sounds like a great idea! There really are two different things we have to worry about.
  2. Hold on, this doesn’t sound like such a good thing. Are we really dividing our work into things we do for “us” and things we do for “them”? Labelling the non-technical identity? That sounds like a recipe for the outgroup homogeneity effect.
  3. No, wait, I’m thinking about it wrong. The people who make software are not the in-group. They are the mediators: it’s just the computers and the tools on one side of the boundary, and all of society on the other side. We have, in effect, the Janus Thinker: looking on the one hand toward the external stories, on the other toward the internal stories, and providing a portal allowing flow between the two.

[Image: JANUS (from Vatican collection) by Flickr user jinnrouge]

So, um, science?

What we’re actually looking at is a potential social science: there are internal stories about our interactions with each other and external stories about our interactions with society and of society’s interactions with the things we create, and those stories could potentially be systematised and we’d have a social science of sorts.

Particularly, I want to make the point that we don’t have a clinical science, an analogy drawn by those who investigate evidence-based software engineering (which has included me, in my armchair way, in the past). You can usefully give half of your patients a medicine and half a placebo, then measure survival or recovery rates after that intervention. You cannot meaningfully treat a software practice, like TDD as an example, as a clinical intervention. How do you give half of your participants a placebo TDD? How much training will you give your ‘treatment’ group, and how will you organise placebo training for the ‘control’ group? [Actually I think I’ve been on some placebo training courses.]

In constructing our own scientific stories about the world of making software, we would run into the same problems that social scientists do in finding useful compromises between internal and ecological validity. For example, the oft-cited “Exploratory experimental studies comparing online and offline programming performance” (Sackman et al., 1968) is frequently used to support the notion that there are “10x programmers”, that some people who write software just do it ten times faster than others.

However, this study does not have much ecological validity. It measures debugging performance, using either an offline process (submitting jobs to a batch system) or an online debugger called TSS, which probably isn’t a lot like the tools used in debugging today. The problems were well-specified, thus removing many of the real problems programmers face in designing software. Participants were expected to code a complete solution with no compiler errors, then debug it: not all programmers work like that. And where did they get their participants from? Did they have a diverse range of backgrounds, cultures, education, experience? It does not seem that any results from that study could necessarily apply to modern software development situated in a modern environment, nor could the claim of “10x programmers” necessarily generalise as we don’t know who is 10x better than whom, even at this one restricted task.

In fact, I’m also not convinced of its internal validity. There were four conditions (two programming problems and two debugging setups), each of which was assigned to six participants. The variance is so large that the dependent variables (the amount of person-time and CPU-time) appear almost independent of the independent variables (the programming problem and the debugging mode), unless the authors correlate them with “programming skill”. How is this skill defined? How is it measured? Why, when the individual scores are compared, is “programming skill” not again taken into consideration? What confounding variables might also affect the wide variation in the scores reported? Is it possible that the fastest programmers had simply seen the problem and solved it before? We don’t know. What we do know is that the reported 28:1 ratio between best and worst performers is across both online and offline conditions (as pointed out in, e.g., The Leprechauns of Software Engineering), so that’s definitely a confounding factor. If we just looked at two programmers using the same environment, what difference would be found?
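To make that pooling problem concrete, here’s a toy simulation with entirely invented numbers (nothing below comes from the Sackman et al. data): if the two debugging conditions have systematically different typical times, the best-to-worst ratio taken over the pooled data will usually come out larger than the ratio within either condition on its own.

    # Toy simulation, not real data: six "participants" per condition, with
    # debugging times drawn from lognormal distributions whose medians differ
    # between a hypothetical online condition and a hypothetical offline one.
    import math
    import random

    random.seed(1)

    def sample_times(median_hours, spread=0.6, n=6):
        """Draw n debugging times from a lognormal with the given median."""
        return [random.lognormvariate(math.log(median_hours), spread) for _ in range(n)]

    def best_to_worst(times):
        return max(times) / min(times)

    trials = 10_000
    within, pooled = [], []
    for _ in range(trials):
        online = sample_times(4)     # invented median: 4 hours with the online tool
        offline = sample_times(12)   # invented median: 12 hours with batch turnaround
        within.append((best_to_worst(online) + best_to_worst(offline)) / 2)
        pooled.append(best_to_worst(online + offline))

    print(f"mean within-condition best:worst ratio: {sum(within) / trials:.1f}")
    print(f"mean pooled best:worst ratio:           {sum(pooled) / trials:.1f}")

The pooled ratio folds the difference between the conditions into what gets reported as individual differences, which is exactly the confounding factor described above.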

We had the problem that “programming skill” is not well-defined when examining the Sackman et al. study, and we’ll find that this problem is one we need to overcome more generally before we can make the “testable explanations and predictions” that we seek. Let’s revisit the TDD example from earlier: my hypothesis is that a team that adopts the test-driven development practice will be more productive some time later (we’ll defer a discussion of how long) than the null condition.

OK, so what do we mean by “productive”? Lines of code delivered? Probably not, their amount varies with representation. OK, number of machine instructions delivered? Not all of those would be considered useful. Amount of ‘customer value’? What does the customer consider valuable, and how do we ensure a fair measurement of that across the two conditions? Do bugs count as a reduction in value, or a need to do additional work? Or both? How long do we wait for a bug to not be found before we declare that it doesn’t exist? How is that discovery done? Does the expense related to finding bugs stay the same in both cases, or is that a confounding variable? Is the cost associated with finding bugs counted against the value delivered? And so on.

Software dogma

Because our stories are not currently very testable, many of them arise from a dogmatic belief that some tool, or process, or mode of thought, is superior to the alternatives, and that there can be no critical debate. For example, from the Clean Coder:

The bottom line is that TDD works, and everybody needs to get over it.

No room for alternatives or improvement, just get over it. If you’re having trouble defending it, apply a liberal sprinkle of argumentum ab auctoritate and tell everyone: Robert C. Martin says you should get over it!

You’ll also find any number of applications of the thought-terminating cliché, a rhetorical technique used to stop cognitive dissonance by allowing one side of the issue to go unchallenged. Some examples:

  • “I just use the right tool for the job”—OK, I’m done defending this tool in the face of evidence. It’s just clearly the correct one. You may go about your business. Move along.
  • “My approach is pragmatic”—It may look like I’m doing the opposite of what I told you earlier, but that’s because I always do the best thing to do, so I don’t need to explain the gap.
  • “I’m passionate about [X]”—yeah, I mean your argument might look valid, I suppose, if you’re the kind of person who doesn’t love this stuff as much as I do. Real craftsmen would get what I’m saying.
  • and more.

The good news is that out of such religious foundations spring the shoots of scientific thought, as people seek to find a solid justification for their dogma. So just as physics has strong spiritual connections, with Stephen Hawking concluding in A Brief History of Time:

However, if we discover a complete theory, it should in time be understandable by everyone, not just by a few scientists. Then we shall all, philosophers, scientists and just ordinary people, be able to take part in the discussion of the question of why it is that we and the universe exist. If we find the answer to that, it would be the ultimate triumph of human reason — for then we should know the mind of God.

and Einstein debating whether quantum physics represented a kind of deific Dungeons and Dragons:

[…] an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the “old one.” I, at any rate, am convinced that He does not throw dice.

so a (social) science of software could arise as an analogous form of experimental theology. I think the analogy could be drawn too far: the context is not similar enough to the ages of Islamic Science or of the Enlightenment to claim that similar shifts to those would occur. You already need a fairly stable base of rational science (and its application via engineering) to even have a computer at all upon which to run software, so there’s a larger base of scientific practice and philosophy to draw on.

It’s useful, though, when talking to a programmer who presents themselves as hyper-rational, to remember to dig in and to see just where the emotions, fallacious arguments and dogmatic reasoning are presenting themselves, and to wonder about what would have to change to turn any such discussion into a verifiable prediction. And, of course, to think about whether that would necessarily be a beneficial change. Just as we can critique scientific culture, so should we critique software culture.

Principled Lizards

Sixty-five million years ago, there were many huge lizards. Most of them were really happy being lizards, and would spend all of the time they could doing lizardy things. Some wanted to be the biggest lizards, and grew so large and so heavy that it would sound like peals of thunder if you could hear them walking about on their lizardy way. Others wanted to be the most terrible lizards, and they developed big scary teeth and sharp, shiny talons. The most terrible lizards were feared by many of the other lizards, but it was a fear that sprang from awe: they were all happy that each was, in their own way, the most lizardy of the lizards. And they were all happy that each of the other lizards they met was trying to be, in their own way, the most lizardy of lizards.

For the lizards met often. They would have their big get-togethers where the big lizards and the small lizards and the terrible lizards and the scaly lizards would each talk about how they handle being so big, or so small, or so terrible, or so scaly. And the other lizards would listen to these talks, and they would applaud the speakers for being so big, or so small, or so terrible, or so scaly. Having seen these examples of lizardly apotheosis, they would try to emulate them. So it was that the lizard world became bigger, but also smaller, and more terrible, and more scaly.

But it seems that not all of the lizards shared these goals of ever-increasing lizardhood. Some would try different things. A group of lizards found that they could regulate their own blood temperature, so they would no longer need to sit in the sun all morning like the other lizards. One group of lizards turned their feathery covering to the task of improved aerodynamics. Another group turned it to a sort of coat, which stopped them getting so cold.

The big meetings of lizardy lizards did not really pay these developments much notice, as they were not very lizard-like changes. They knew that they were lizards! They should do the lizardy things, like getting bigger or smaller or more terrible or more scaly! They put over eighty hours a week into it; they were passionate about it. The world was, for them, all about being more lizardly every day.

Some of the things that the decidedly non-lizardlike groups were coming up with did take a sort of root among those who called themselves the “lizard community”, but only to the extent that they could be seen as lizardy things. So ideas from the feather aerodynamics group became diluted, and were called “flight-oriented lizarding”. At the big gatherings of all the lizards, the FOL evangelists would show how they had made things that looked a bit like the feathers used for aerodynamics, but which were more lizardy. They had some benefit to lizards in that they slowed them down slightly as they fell out of trees. And, of course, as this was something that you had to be able to demonstrate expert lizardly competence in, they invented the idea of the master flight-oriented lizard.

All sorts of rules were invented to demonstrate competency and master-lizardliness in the flight-oriented world. This feather and that feather must each have a single responsibility: this for slowing the fall, that for turning. Feathers must be open for falling but closed for impact. Specific types of feathers could be invented, but only where they could be used in place of the more generic feathers. Feathers had to be designed so that they never got into the area around a lizard’s eyes (the in-the-face segregation principle). Despite the fact that flight-oriented lizards only used their feathers for falling out of trees, feathers had to be designed to work when travelling upwards too (the descendency inversion principle).

But to the expert lizards—the biggest, smallest, scaliest and most terrible lizards—something felt uncomfortable. It felt like people were saying that there was something else to do than being an expert lizard, as if lizardness wasn’t enough. So, of course, they arranged another meeting of all the lizards. Expert lizards and novice lizards and improving lizards all came together, that one day sixty-five million years ago, and they met in the town of Chicxulub. And the most expert of the expert lizards got up in front of all the lizards, and said this:

If you want to carry on at lizarding you have to really love it. You’ve got to want to put every waking moment into becoming a better lizard. You’ve got to look up after practising your lizarding, and be shocked at how much time has gone past. If that isn’t you, if you don’t absolutely love everything about lizarding, perhaps it’s time to move on and do something else.

Many of the expert lizards agreed with this idea, and were pleased with themselves. But many that had been trying other things, the fur or the flying or the warm blood, were confused: did they want to be lizards forever, and strive toward the best of lizardliness, or not? Did they perhaps want to explore the opportunities presented by warm blood, or flying, or fur?

And so it was that at Chicxulub, as a rock from outer space danced through the upper atmosphere, pushing and heating and ionising the air in front of it, people chose between the many paths open to them.