The package management paradox

There has been no need to build a new package management system since CPAN, and yet npm is the best.
Wait, what?

Every time a new programming language or framework is released, people seem to decide that:

  1. It needs its own package manager.

  2. Simple algorithms need to be rewritten from scratch in “pure” $language/framework and distributed as packages in this package manager.

This is not actually true. Many programming languages – particularly many of the trendy ones – have a way to call C functions, and a way to expose their own routines as C functions. Even C++ has this feature. This means that you don’t need any new packaging system: if you can deploy packages that expose C functions (whatever the implementation language), then you can use existing code and you don’t need to rewrite everything.
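
As a sketch of what that means in practice (the routine and its name are invented for illustration), a library only needs to publish a C-callable interface once, and then Perl, Python, Ruby or any other language with a C foreign-function interface can consume it without a rewrite:

    /* distance.h - a made-up routine published once, with C linkage. */
    #ifndef DISTANCE_H
    #define DISTANCE_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* Count the positions at which two NUL-terminated strings differ. */
    unsigned int string_mismatch_count(const char *a, const char *b);

    #ifdef __cplusplus
    }
    #endif

    #endif

    /* distance.c - the implementation, in whatever language can present
       itself as C; here it happens to be C. */
    #include "distance.h"

    unsigned int string_mismatch_count(const char *a, const char *b)
    {
        unsigned int mismatches = 0;
        while (*a || *b) {
            if (*a != *b) mismatches++;
            if (*a) a++;
            if (*b) b++;
        }
        return mismatches;
    }

Shipping that header and the compiled library is a job that existing packaging systems were already doing.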

So there hasn’t been a need for a packaging system since at least CPAN, maybe earlier.

On the other hand, npm is the best packaging system ever because people actually consume existing code with it. It’s huge, there are tons of libraries, and so people actually think about whether this thing they’re doing needs new code or the adoption of existing code. It’s the realisation of the OO dream, in which folks like Brad Cox said we’d have data sheets of available components and we’d pull the components we need and bind them together in our applications.

Developers who use npm are just gluing components together into applications, and that’s great for software.

Dogmatic paradigmatism

First, you put all of your faith in structured programming, and you got burned. You found it hard to associate the operations in your software with the data upon which they act, and to make sure that the expectations made on the data in one place are satisfied when that data has been modified in that other place, or over there in yet another place. Clearly structured programming is broken.

Then, you put all of your faith in object-oriented programming, and you got burned. You found it hard to follow the flow of a program when it jumps in and out of different classes, and to see which parts were coupled to what. Clearly object-oriented programming is broken.

Then, you put all of your faith in functional programming, and you got burned. You found it hard to represent real business processes in terms of immutable data structures and pure functions, and to express changes to the operating environment without using side effects. Clearly functional programming is broken.

Or maybe it’s you. Maybe, rather than relying on faith to make these conceptual thought frameworks do what you need from them, you could have thought about the concepts.

Layers of Distraction

A discussion I was involved in over on Facebook reminded me of some other issues I’d already drafted for this blog, so I stuck the two together and here we are.

Software systems can often be seen as aggregations of strata, with higher layers making use of the services in the lower layers. You’ll often see a layered architecture diagram looking like a flat and well-organised collection of boiled sweets.

As usual, it’s the interstices rather than the objects themselves that are of interest. Where two layers come together, there’s usually one of a very small number of different transformations taking place. The first is that components above the boundary can express instructions that any computer could run, and they are transformed into instructions suitable for this computer. That’s what the C compiler does; it’s what the x86 processor does (it takes IA-32 instructions, which any computer could run, and turns them into the microcode that it can run); it’s what device drivers do.

The second is that it turns one set of instructions any computer could run into another set that any computer could run. If you promise not to look too closely, the Smalltalk virtual machine does this, turning instructions in Smalltalk bytecode into instructions in the host machine language.

The third is that it turns a set of computer instructions in a specific domain into the general-purpose instructions that can run on the computer (sometimes this computer, sometimes any computer). A function library turns requests to do particular things into the machine instructions that will do them. A GUI toolkit takes requests to draw buttons and widgets and turns them into requests to draw lines and rectangles. The UNIX shell turns an ordered sequence of suggestions to run programs into the collection of C library calls and machine instructions implied by the sequence.
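
As a concrete (and entirely invented) miniature of that third kind of boundary, here is a toolkit-style request being translated into the general-purpose drawing requests beneath it; the primitives are stand-ins that merely print what they would draw:

    #include <stdio.h>

    /* The general-purpose layer: primitives any drawing surface offers. */
    static void draw_line(int x1, int y1, int x2, int y2) {
        printf("line (%d,%d)-(%d,%d)\n", x1, y1, x2, y2);
    }
    static void fill_rect(int x, int y, int w, int h) {
        printf("rect (%d,%d) %dx%d\n", x, y, w, h);
    }
    static void draw_text(int x, int y, const char *s) {
        printf("text (%d,%d) \"%s\"\n", x, y, s);
    }

    /* The toolkit layer: a domain-specific request ("draw a button")
       expressed in terms of the general-purpose requests below it. */
    static void draw_button(int x, int y, int w, int h, const char *label) {
        fill_rect(x, y, w, h);                 /* the face */
        draw_line(x, y, x + w, y);             /* the border */
        draw_line(x, y + h, x + w, y + h);
        draw_line(x, y, x, y + h);
        draw_line(x + w, y, x + w, y + h);
        draw_text(x + 8, y + h / 2, label);    /* the label */
    }

    int main(void) {
        draw_button(10, 10, 120, 32, "OK");
        return 0;
    }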

The fourth is turning a model of a problem I might want solving into a collection of instructions in various computer domains. Domain-specific languages sit here, but usually this transition is handled by expensive humans.

Many transitions of the second and third kinds can be found stacked on top of one another, so that we can turn this computer into any computer, and then build libraries on any computer, then build a virtual machine atop those libraries, then build libraries for the virtual machine, then build again in that virtual machine, then finally put the DOM and JavaScript on top of that creaking mess. Whether we can solve anybody’s problems from the top of the house of cards is a problem to be dealt with later.

You’d hope that from the outside of one boundary, you don’t need to know anything about the inside: you can use the networking library without needing to know what device is doing the networking, you can draw a button without needing to know how the lines get onto the screen, you can use your stock-trading language without needing to know what Java byte codes are generated. In other words, neither abstractions nor refinements leak.

As I’ve gone through my computing career, I’ve cared to different extents about different levels of abstraction and refinement. That’s where the Facebook discussion came in: there are many different ways that a Unix system can start up. But when I’m on a desktop computer, I not only don’t care which way the desktop starts up, I don’t want to have to deal with it. Whatever the relative merits of SMF, launchd, SysV init, /etc/rc, SystemStarter, systemd or some other system, the moment I need to even know which is in play is the moment that I no longer want to use this desktop system.

I have books here on processor instruction sets, but the most recent (and indeed numerous) are for the Motorola 68k family. Later than that and I’ll get away with mostly not knowing, looking up the bits I do need ad hoc, and cursing your eyes if your debugger drops me into a disassembly.

So death to the trope that you can’t understand one level of abstraction (or refinement) without understanding the layers below it. That’s only true when the lower layers are broken, though I accept that that is probably the case.

The reasonable effectiveness of developer tools

In goals upon goals upon goals, I suggested that a fixation on developer tools is misplaced. This is not to say that developer tools are unhelpful, nor that they can’t have a significant impact on our work.

Consider the following, over-restricted, definition of what a programmer does:

A programmer’s responsibility is to turn a computer into a solution to somebody’s problem.

We have plenty of tools designed to stop you having to consider the details of this computer when doing that: assemblers, compilers, device drivers, hardware abstraction layers, virtual machines, memory managers and so on. Then we have tools to speed up aspects of working in those abstractions: build systems, IDEs and the like. And tools that help make sure you moved in the correct direction: testing tools, analysers and the like.

Whether we have tools that help you move from an abstract view of your computer to even an abstract view of your problem depends strongly on your problem domain, and the social norms of programmers in that space. Science is fairly well-supplied, for example, with both commercial and open source tools.

But many developers will be less lucky, or less aware of the tools at their disposal. Having been taken from “your computer…” to “any computer…” by any of a near-infinite collection of generic developer tools, they will then get to “…can solve this problem” by building their own representations of the aspects of the problem. In this sense, programming is still done the way we did it in the 1970s, by deciding what our problem is and how we can model bits of it in a computer.

It’s here, in the bit where we try to work out whether we’re building a useful thing that really solves the problems real people really have, that there are still difficulties, unnecessary costs and incidental complexity. Therefore it’s here where judicious selection and use of tools can be of benefit, as their goals support our goals of supporting our users’ goals.

And that’s why I think that developer tools are great, even while warning against fixating upon them. Fixate on the things that need to be done, then discover (or create) tools to make them faster, better and redundant.

Code longevity

I recently wrote about the impending centenary of applied computing; a time when we could reflect on the first hundred years to make it easier for people to progress beyond our position into the second hundred years. This necessitates looking at the things we’ve tried, the things that succeeded and the things that failed. It involves recalling and describing the good ideas and the bad ideas.

So, did the bad ideas fail and the good ideas succeed? Can we declare that because something worked, it must have been a success? Is length of service a great proxy for quality of principle?

Let’s start by looking at the lifetime of some of the trappings of applied computing. I’m writing this on the smartphone shown in the picture below. It is, among the many things I own that claim to be computers and could reasonably be described as modern, one of only two that are not running a recent variant of a minicomputer game-loading system.

Surface RT and Lumia 925

Now is that a fair assessment? Certainly all the Macs, iOSes, Androids (and even routers and television streamy box things) in the house are based on Unix, and Unix is the thing of the 1970s minicomputer. I’ve even used that idea to explain why we still have to deal with PDP-8 problems in iPhones. But is it fair to assume that because the name has lasted, then the idea has been preserved? Did Unix succeed, or has it been replaced by different things with the same name? That happens a lot; is today’s ethernet really the same ethernet that Bob Metcalfe and colleagues at PARC invented? Conversely, just because the name changed is everything new? Does Windows NT really represent a clean break in 1993?

There’s certainly some core, a kernel (f’nar) of the modern Unix that, whether in code or philosophy, can be traced back to the original system (and indeed beyond). But is that there because it’s still a good idea, or because there’s no impetus to remove it? Or even because it’s a bad idea, but removing it would be expensive?

As we’re already talking about Unix, let’s talk about C. In his talk Null References: The Billion-Dollar Mistake, Tony Hoare describes his own mistake as being the introduction of the null reference. He then says that C’s mistake (C follows Algol in having null references, but it also lacks subscript bounds checking) is an order of magnitude worse. In fact, Hoare also identified a third problem: he says that it’s a good idea to permit a program failure to be diagnosed just from the error message and the high-level program source text. However, runtime failures in C usually end up with a core dump and/or a stack trace through the instructions of the target machine environment.
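
Both mistakes are cheap to commit. A minimal illustration (my own, not taken from Hoare’s talk) that a C compiler will typically accept without complaint:

    #include <string.h>

    int main(void)
    {
        char name[8];
        /* No subscript or bounds checking: this writes past the end of
           name, and depending on your luck it silently corrupts a
           neighbouring variable, crashes, or opens a security hole. */
        strcpy(name, "a string that is clearly longer than eight bytes");

        /* Null references: nothing stops this pointer being used before
           it points at anything. The diagnosis is usually a core dump
           and a machine-level stack trace, not a line of source text. */
        char *greeting = 0;
        return greeting[0];
    }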

We can easily wonder just how much (expensive) programmer time has been lost disassembling stack traces, matching up debugger symbols and interpreting core dumps, but without figures for that I’ll generously assume that it’s an order of magnitude smaller than the losses due to buffer overflows. Now that’s only a tens-of-billions-of-dollars value of mistake, and C is the substrate for trillions of dollars of value of industry. So do we say that on balance, C is 99% a Good Thing™? Is it a bad idea that nonetheless enabled plenty of good ones?

[Incidentally, and without wanting to derail the central thesis of this post, I disagree with Hoare’s numbers. Symantec is merely one of the largest companies in the information security sector, with annual revenue in their most recent report of $6.9B. That’s a small part of the total value sunk into that sector, which I’ll guess has an annual magnitude of multiple tens of billions. A large fraction of the problems addressed by infosec can be attributed to C’s lack of bounds checking, so there’s probably an annual spend of around ten billion dollars on fixing the problem. Assuming those businesses have sustainable revenues over multiple years, the integrated cost is well into the hundreds of billions. That only revises the estimated impact on the C software industry from ‘fractions of a per cent’ to ‘a per cent’, though.]

Perhaps it’s fair to say that C was a good idea when it arose, and that it’s since been found to have deficiencies that haven’t yet become expensive enough to warrant decommissioning it. There’s an assumption of rational action in there that I think it’s fair to question, though: am I assuming that C is not worth replacing just because it has not been replaced? Might there actually be other factors involved?

Yes, there might. It’s possible that there are organisations out there for whom C is more expensive than it’s worth, but where the sunk cost fallacy stops them from moving on. Or organisations who stick with C because their platform vendor gives them a C toolset, even where free or paid alternatives would be cheaper [in fact that would point to a difficulty with any holistic evaluation: the cost to the people who provide development environments and the cost to the people who consume development environments depend on different factors, and the power in the market is biased towards a few large providers. Welcome to economics]. Or organisations who stick with C because of a perception of a large community of users, which is (perceived to be) more useful than striking out alone with better tools.

It’s also possible that moves in the other direction are based on non-rational factors: organisations that seek novelty rather than improvement, or who move away from C because a vendor convinces them that their alternative is better regardless of objective truth.

It turns out that the simple question we wanted to ask about applied computing, “What works?”, leads to such a complex and maybe even chaotic system of forces acting in multiple dimensions that answering it will be very difficult. This doesn’t mean that an answer should not be sought, but that finding the answer will require combining expertise from many different fields. Particularly, something that survives for a long time doesn’t necessarily work: it could just be that people are afraid of the alternatives, or haven’t really considered them.

Preparing for Computing’s Big One-Oh-Oh

However you slice the pie, we’re between two and three decades away from the centenary celebration for applied computing (which is of course significantly after theoretical or hypothetical advances made by the likes of Lovelace, Turing and others). You might count the anniversary of Colossus in 2043, the ENIAC in 2046, or maybe something earlier (and arguably not actually applied) like the Z3 or ABC (both 2041). Whichever one you pick, it’s not far off.

That means that the time to start organising the handover from the first century’s programmers to the second is now, or perhaps a little earlier. You can see the period from the 1940s to around 1980 as a time of discovery, when people invented new ways of building and applying computers because they could, and because there were no old ways yet. The three and a half decades since then (a period longer than my life) have been a period of rediscovery, in which a small number of practices have become entrenched and people occasionally find existing, but forgotten, tools and techniques to add to their arsenal, and incrementally advance the entrenched ones.

My suggestion is that the next few decades be a period of uncovery, in which we purposefully seek out those things that have been tried, and tell the stories of how they are:

  • successful because they work;
  • successful because they are well-marketed;
  • successful because they were already deployed before the problems were understood;
  • abandoned because they don’t work;
  • abandoned because they are hard;
  • abandoned because they are misunderstood;
  • abandoned because something else failed while we were trying them.

I imagine a multi-volume book✽, one that is to the art of computer programming as The Art Of Computer Programming is to the mechanics of executing algorithms on a machine. Such a book✽ would be mostly a guide, partly a history, with some, all or more of the following properties:

  • not tied to any platform, technology or other fleeting artefact, though with examples where appropriate (perhaps in a platform invented for the purpose, as MIX, Smalltalk, BBC BASIC and Oberon all were)
  • informed both by academic inquiry and practical experience
  • more accessible than the Software Engineering Body of Knowledge
  • as accepting of multiple dissenting views as Ward’s Wiki
  • at least as honest about our failures as The Mythical Man-Month
  • at least as proud of our successes as The Clean Coder
  • more popular than The Celestial Homecare Omnibus

As TAOCP is a survey of algorithms, so this book✽ would be a survey of techniques, practices and modes of thought. As this century’s programmer can go to TAOCP to compare algorithms and data structures for solving small-scale problems then use selected algorithms and data structures in their own work, so next century’s applier of computing could go to this book✽ to compare techniques and ways of reasoning about problems in computing then use selected techniques and reasons in their own work. Few people would read such a thing from cover to cover. But many would have it to hand, and would be able to get on with the work of invention without having to rewrite all of Doug Engelbart’s work before they could get to the new stuff.

It's dangerous to go alone! Take this.

✽: don’t get hung up on the idea that a book is a collection of quires of some pigmented flat organic matter bound into a codex, though.

On too much and too little

In the following text, remember that words like me or I are to be construed in the broadest possible terms.

It’s easy to be comfortable with my current level of knowledge. Or perhaps it’s not the value, but the derivative of the value: the amount of investment I’m putting into learning a thing. Anyway, it’s easy to tell stories about why the way I’m doing it is the right, or at least a good, way to do it.

Take, for example, object-oriented design. We have words to describe insufficient object-oriented design. Spaghetti Code, or a Big Ball of Mud. Obviously these are things that I never succumb to, but other people do. So clearly (actually, not clearly at all, but that’s beside the point) there is some threshold level of design or analysis practice that represents an acceptable minimum. Whatever that value is, it’s less than the amount that I do.

Interestingly there are also words to describe the over-application of object-oriented design. Architecture Astronauts, for example, are clearly people who do too much architecture (in the same way that NASA astronauts got carried away with flying and overdid it, I suppose). It’s so cold up in space that you’ll catch a fever, resulting in Death by UML Fever. Clearly I am only ever responsible for tropospheric architecture, thus we conclude that there is some acceptable maximum threshold for analysis and design too.

The really convenient thing is that my current work lies between these two limits. In fact, I’m comfortable in saying that it always has.

But wait. I also know that I’m supposed to hate the code that I wrote six months ago, probably because I wasn’t doing enough of whatever it is that I’m doing enough of now. But I don’t remember thinking six months ago that I was below the threshold for doing acceptable amounts of the stuff that I’m supposed to be doing. Could it be, perhaps, that the goalposts have conveniently moved in that time?

Of course they have. What’s acceptable to me now may not be in the future, either because I’ve learned to do more of it or because I’ve learned that I was overdoing it. The trick is not so much in recognising that, but in recognising that others who are doing more or less than me are not wrong, they could in fact be me at a different point on my timeline but with the benefit that they exist now so I can share my experiences with them and work things out together. Or they could be someone with a completely different set of experiences, which is even more exciting as I’ll have more stories to swap.

When it comes to techniques and devices for writing software, I tend to prefer overdoing things and then finding out which bits I don’t really need after all, rather than under-application. That’s obviously a much larger cognitive and conceptual burden, but it stems from the fact that I don’t think we really have any clear ideas on what works and what doesn’t. Not much in making software is ever shown to be wrong, but plenty of it is shown to be out of fashion.

Let me conclude by telling my own story of object-oriented design. It took me ages to learn object-oriented thinking. I learned the technology alright, and could make tools that used the Objective-C language and Foundation and AppKit, but didn’t really work out how to split my stuff up into objects. Not just for a while, but for years. A little while after that Death by UML Fever article was written, my employer sent me to Sun to attend their Object-Oriented Analysis and Design Using UML course.

That course in itself was a huge turning point. But just as beneficial were the few months afterward in which I would architecturamalise all the things, and my then-manager wisely left me to it. The office furniture was all covered with whiteboard material, and there soon wasn’t a bookshelf or cupboard in my area of the office that wasn’t covered with sequence diagrams, package diagrams, class diagrams, or whatever other diagrams. I probably would’ve covered the external walls, too, if it hadn’t been for Enterprise Architect. You probably have opinions(TM) about both of the words in that product’s name. In fact I also used OmniGraffle, and dia (my laptop at the time was an iBook G4 running some flavour of Linux).

That period of UMLphoria gave me the first few hundred hours of deliberate practice. It let me see things that had been useful, and that had either helped me understand the problem or communicate about it with my peers. It also let me see the things that hadn’t been useful, that I’d constructed but then had no further purpose for. It let me not only dial back, but work out which things to dial back on.

I can’t imagine being able to replace that experience with reading web articles and Stack Overflow questions. Sure, there are plenty of opinions on things like OOA/D and UML on the web. Some of those opinions are even by people who have tried it. But going through that volume of material and sifting the experience-led advice from the iconoclasm or marketing fluff, deciding which viewpoints were relevant to my position: that’s all really hard. Harder, perhaps, than diving in and working slowly for a few months while I over-practice a skill.

Software, Science?

Is there any science in software making? Does it make sense to think of software making as scientific? Would it help if we could?

Hold on, just what is science anyway?

Good question. The medieval French philosopher-monk Buridan said that the source of all knowledge is experience, and Richard Feynman paraphrased this as “the test of all knowledge is experiment”.

If we accept that science involves some experimental test of knowledge, then some of our friends who think of themselves as scientists will find themselves excluded. Consider the astrophysicists and cosmologists. It’s all very well having a theory about how stars form, but if you’re not willing to swirl a big cloud of hydrogen, helium, lithium, carbon etc. about and watch what happens to it for a few billion years then you’re not doing the experiment. If you’re not doing the experiment then you’re not a scientist.

Our friends in what are called the biological and medical sciences also have some difficulty now. A lot of what they do is tested by experiment, but some of the experiments are not permitted on ethical grounds. If you’re not allowed to do the experiment, maybe you’re not a real scientist.

Another formulation (OK, I got this from the Wikipedia entry on Science) sees science as a sort of systematic storytelling: the making of “testable explanations and predictions about the universe”.

Under this definition, there’s no problem with calling astronomy a science: you think this is how things work, then you sit, and watch, and see whether that happens.

Of course a lot of other work fits into the category now, too. There’s no problem with seeing the “social sciences” as branches of science: if you can explain how people work, and make predictions that can (in principle, even if not in practice) be tested, then you’re doing science. Psychology, sociology, economics: these are all sciences now.

Speaking of the social sciences, we must remember that science itself is a social activity, and that the way it’s performed is really defined as the explicit and implicit rules and boundaries created by all the people who are doing it. As an example, consider falsificationism. This is the idea that a good scientific hypothesis is one that can be rejected, rather than confirmed, by an appropriately-designed experiment.

Sounds pretty good, right? Here’s the interesting thing: it’s also pretty new. It was mostly popularised by Karl Popper in the 20th Century. So if falsificationism is the hallmark of good science, then Einstein didn’t do good science, nor did Marie Curie, nor Galileo, or a whole load of other people who didn’t share the philosophy. Just like Dante’s Virgil was not permitted into heaven because he’d been born before Christ and therefore could not be a Christian, so all of the good souls of classical science are not permitted to be scientists because they did not benefit from Popper’s good message.

So what is science today is not necessarily science tomorrow, and there’s a sort of self-perpetuation of a culture of science that defines what it is. And of course that culture can be critiqued. Why is peer review appropriate? Why do the benefits outweigh the politics, the gazumping, the gender bias? Why should it be that if falsification is important, negative results are less likely to be published?

Let’s talk about Physics

Around a decade ago I was studying particle physics pretty hard. Now there are plenty of interesting aspects to particle physics. The first is that it’s a statistics-heavy discipline, and statistical results are defined by how happy you are with them, not by some binary right/wrong criterion.

It turns out that particle physicists are a pretty conservative bunch. They’ll only accept a particle as “discovered” if the signal indicating its existence is measured at a five-sigma confidence: i.e. if there’s under a one-in-a-million chance that the signal arose randomly in the absence of the particle’s existence. Why five sigma? Why not three (a 99.7% confidence) or six (to keep Motorola happy)? Why not repeat it three times and call it good, like we did in middle school science classes?

Also, it’s quite a specialised discipline, with a clear split between theory and practice and other divisions inside those fields. It’s been a long time since you could be a general particle physicist, and even longer since you could be simply a “physicist”. The split leads to some interesting questions about the definition of science again: if you make a prediction which you know can’t be verified during your lifetime due to the lag between theory and experimental capability, are you still doing science? Does it matter whether you are or not? Is the science in the theory (the verifiable, or falsifiable, prediction) or in the experiment? Or in both?

And how about Psychology, too

Physicists are among the most rational people I’ve worked with, but psychologists up the game by bringing their own feature to the mix: hypercriticality. And I mean that in the technical sense of criticism, not in the programmer “you’re grammar sucks” sense.

You see, psychology is hard, because people are messy. Physics is easy: the apple either fell to earth or it didn’t. Granted, quantum gets a bit weird, but it generally (probably) does its repeatable thing. We saw above that particle physics is based on statistics (as is semiconductor physics, as it happens); but you can still say that you expect some particular outcome or distribution of outcomes with some level of confidence. People aren’t so friendly. I mean, they’re friendly, but not in a scientific sense. You can do a nice repeatable psychology experiment in the lab, but only by removing so many confounding variables that it’s doubtful the results would carry over into the real world. And the experiment only really told you anything about local first year psychology undergraduates, because they’re the only people who:

  1. walked past the sign in the psychology department advertising the need for participants;
  2. need the ten dollars on offer for participation desperately enough to turn up.

In fact, you only really know about how local first year psychology undergraduates who know they’re participating in a psychology experiment behave. The ethics rules require informed consent, which is a good thing because otherwise it’s hard to tell the difference between a psychology lab and a Channel 4 game show. But it means you have to say things like “hey this is totally an experiment and there’ll be counselling afterward if you’re disturbed by not really electrocuting the fake person behind the wall”, which might affect how people react, except we don’t really know because we’re not allowed to do that experiment.

On the other hand, you can make observations about the real world, and draw conclusions from them, but it’s hard to know what caused what you saw because there are so many things going on. It’s pretty hard to re-run the entirety of a society with just one thing changed, like “maybe if we just made Hitler an inch taller then Americans would like him, or perhaps try the exact same thing as prohibition again but in Indonesia” and other ideas that belong in Philip K. Dick novels.

So there’s this tension: repeatable results that might not apply to the real world (a lack of “ecological validity”), and real-world phenomena that might not be possible to explain (a lack of “internal validity”). And then there are all sorts of other problems too, so that psychologists know that for a study to hold water they need to surround what they say with caveats and limitations. Thus is born the “threats to validity” section in any paper, where the authors themselves describe the generality (or otherwise) of their results, knowing that such threats will be a hot topic of discussion.

But all of this—the physics, the psychology, and the other sciences—is basically a systematised story-telling exercise, in which the story is “this is why the universe is as it is” and the system is the collection of (time-and-space-dependent) rules that govern what stories may be told. It’s like religion, but with more maths (unless your religion is one of those ones that assigns numbers to each letter in a really long book then notices that your phone number appears about as many times as a Poisson distribution would suggest).

Wait, I think you were talking about software

Oh yeah, thanks. So, what science, if any, is there in making software? Does there exist a systematic approach to storytelling? First, let’s look at the kinds of stories we need to tell.

The first are the stories about the social system in which the software finds itself: the story of the users, their need (or otherwise) for a software system, their reactions (or otherwise) to the system introduced, how their interactions with each other change as a result of introducing the system, and so on. Call this requirements engineering, or human-computer interaction, or user experience; it’s one collection of stories.

You can see these kinds of stories emerging from the work of Manny Lehman. He identifies three types of software:

  • an S-system is exactly specified.
  • a P-system executes some known procedure.
  • an E-system must evolve to meet the needs of its environment.

It may seem that E-type software is the type in which our stories about society are relevant, but think again: why do we need software to match a specification, or to follow a procedure? Because automating that specification or procedure is of value to someone. Why, or to what end? Why that procedure? What is the impact of automating it? We’re back to telling stories about society. All of these software systems, according to Lehman, arise from discovery of a problem in the universe of discourse, and provide a solution that is of possible interest in the universe of discourse.

The second kind are the stories about how we worked together to build the software we thought was needed in the first stories. The practices we use to design, build and test our software are all tools for facilitating the interaction between the people who work together to make the things that come out. The things we learned about our own society, and that we hope we can repeat (or avoid) in the future, become our design, architecture, development, testing, deployment, maintenance and support practices. We even create our own tools—software for software’s sake—to automate, ease or disrupt our own interactions with each other.

You’ll mostly hear the second kind of story at most developer conferences. I believe that’s because the people who have most time and inclination to speak at most developer conferences are consultants, and they understand the second stories to a greater extent than the first because they don’t spend too long solving any particular problem. It’s also because most developer conferences are generally about making software, not about whatever problem it is that each of the attendees is trying to solve out in the world.

I’m going to borrow a convention that Rob Rix told me in an email, of labelling the first type of story as being about “external quality” and the second type about “internal quality”. I went through a few stages of acceptance of this taxonomy:

  1. Sounds like a great idea! There really are two different things we have to worry about.
  2. Hold on, this doesn’t sound like such a good thing. Are we really dividing our work into things we do for “us” and things we do for “them”? Labelling the non-technical identity? That sounds like a recipe for the outgroup homogeneity effect.
  3. No, wait, I’m thinking about it wrong. The people who make software are not the in-group. They are the mediators: it’s just the computers and the tools on one side of the boundary, and all of society on the other side. We have, in effect, the Janus Thinker: looking on the one hand toward the external stories, on the other toward the internal stories, and providing a portal allowing flow between the two.

JANUS (from Vatican collection) by Flickr user jinnrouge

So, um, science?

What we’re actually looking at is a potential social science: there are internal stories about our interactions with each other and external stories about our interactions with society and of society’s interactions with the things we create, and those stories could potentially be systematised and we’d have a social science of sorts.

Particularly, I want to make the point that we don’t have a clinical science, an analogy drawn by those who investigate evidence-based software engineering (which has included me, in my armchair way, in the past). You can usefully give half of your patients a medicine and half a placebo, then measure survival or recovery rates after that intervention. You cannot meaningfully treat a software practice, like TDD as an example, as a clinical intervention. How do you give half of your participants a placebo TDD? How much training will you give your ‘treatment’ group, and how will you organise placebo training for the ‘control’ group? [Actually I think I’ve been on some placebo training courses.]

In constructing our own scientific stories about the world of making software, we would run into the same problems that social scientists do in finding useful compromises between internal and ecological validity. For example, the oft-cited Exploratory experimental studies comparing online and offline programming performance (by Sackman et al., 1968) is frequently used to support the notion that there are “10x programmers”, that some people who write software just do it ten times faster than others.

However, this study does not have much ecological validity. It measures debugging performance, using either an offline process (submitting jobs to a batch system) or an online debugger called TSS, which probably isn’t a lot like the tools used in debugging today. The problems were well-specified, thus removing many of the real problems programmers face in designing software. Participants were expected to code a complete solution with no compiler errors, then debug it: not all programmers work like that. And where did they get their participants from? Did they have a diverse range of backgrounds, cultures, education, experience? It does not seem that any results from that study could necessarily apply to modern software development situated in a modern environment, nor could the claim of “10x programmers” necessarily generalise as we don’t know who is 10x better than whom, even at this one restricted task.

In fact, I’m also not convinced of its internal validity. There were four conditions (two programming problems and two debugging setups), each of which was assigned to six participants. The variance is so large that the dependent variables (the amount of person-time and of CPU-time) appear unrelated to the independent variables (the programming problem and the debugging mode), unless the authors correlate them with “programming skill”. How is this skill defined? How is it measured? Why, when the individual scores are compared, is “programming skill” not again taken into consideration? What confounding variables might also affect the wide variation in the scores reported? Is it possible that the fastest programmers had simply seen the problem and solved it before? We don’t know. What we do know is that the reported 28:1 ratio between best and worst performers is across both online and offline conditions (as pointed out in, e.g., The Leprechauns of Software Engineering), so that’s definitely a confounding factor. If we just looked at two programmers using the same environment, what difference would be found?

We had the problem that “programming skill” is not well-defined when examining the Sackman et al. study, and we’ll find that this problem is one we need to overcome more generally before we can make the “testable explanations and predictions” that we seek. Let’s revisit the TDD example from earlier: my hypothesis is that a team that adopts the test-driven development practice will be more productive some time later (we’ll defer a discussion of how long) than the null condition.

OK, so what do we mean by “productive”? Lines of code delivered? Probably not, since the number of lines varies with representation. OK, number of machine instructions delivered? Not all of those would be considered useful. Amount of ‘customer value’? What does the customer consider valuable, and how do we ensure a fair measurement of that across the two conditions? Do bugs count as a reduction in value, or a need to do additional work? Or both? How long do we wait for a bug to not be found before we declare that it doesn’t exist? How is that discovery done? Does the expense related to finding bugs stay the same in both cases, or is that a confounding variable? Is the cost associated with finding bugs counted against the value delivered? And so on.

Software dogma

Because our stories are not currently very testable, many of them arise from a dogmatic belief that some tool, or process, or mode of thought, is superior to the alternatives, and that there can be no critical debate. For example, from the Clean Coder:

The bottom line is that TDD works, and everybody needs to get over it.

No room for alternatives or improvement, just get over it. If you’re having trouble defending it, apply a liberal sprinkle of argumentum ab auctoritate and tell everyone: Robert C. Martin says you should get over it!

You’ll also find any number of applications of the thought-terminating cliché, a rhetorical technique used to stop cognitive dissonance by allowing one side of the issue to go unchallenged. Some examples:

  • “I just use the right tool for the job”—OK, I’m done defending this tool in the face of evidence. It’s just clearly the correct one. You may go about your business. Move along.
  • “My approach is pragmatic”—It may look like I’m doing the opposite of what I told you earlier, but that’s because I always do the best thing to do, so I don’t need to explain the gap.
  • “I’m passionate about [X]”—yeah, I mean your argument might look valid, I suppose, if you’re the kind of person who doesn’t love this stuff as much as I do. Real craftsmen would get what I’m saying.
  • and more.

The good news is that out of such religious foundations spring the shoots of scientific thought, as people seek to find a solid justification for their dogma. So just as physics has strong spiritual connections, with Stephen Hawking concluding in A Brief History of Time:

However, if we discover a complete theory, it should in time be understandable by everyone, not just by a few scientists. Then we shall all, philosophers, scientists and just ordinary people, be able to take part in the discussion of the question of why it is that we and the universe exist. If we find the answer to that, it would be the ultimate triumph of human reason — for then we should know the mind of God.

and Einstein debating whether quantum physics represented a kind of deific Dungeons and Dragons:

[…] an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the “old one.” I, at any rate, am convinced that He does not throw dice.

so a (social) science of software could arise as an analogous form of experimental theology. I think the analogy could be drawn too far: the context is not similar enough to the ages of Islamic Science or of the Enlightenment to claim that similar shifts to those would occur. You already need a fairly stable base of rational science (and its application via engineering) to even have a computer at all upon which to run software, so there’s a larger base of scientific practice and philosophy to draw on.

It’s useful, though, when talking to a programmer who presents themselves as hyper-rational, to remember to dig in and to see just where the emotions, fallacious arguments and dogmatic reasoning are presenting themselves, and to wonder about what would have to change to turn any such discussion into a verifiable prediction. And, of course, to think about whether that would necessarily be a beneficial change. Just as we can critique scientific culture, so should we critique software culture.

Inside-Out Apps

This article is based on a talk I gave at mdevcon 2014. The talk also included a specific example to demonstrate the approach, but was otherwise a presentation of the following argument.

You probably read this blog because you write apps. Which is kind of cool, because I have been known to use apps. I’d be interested to hear what yours is about.

Not so fast! Let me make this clear first: I only buy page-based apps, they must use Core Data, and I automatically give one star to any app that uses storyboards.

OK, that didn’t sound like it made any sense. Nobody actually chooses which apps to buy based on what technologies or frameworks are used, they choose which apps to buy based on what problems those apps solve. On the experience they derive from using the software.

When we build our apps, the problem we’re solving and the experience we’re providing need to be uppermost in our thoughts. You’re probably already used to doing this in designing an application: Apple’s Human Interface Guidelines describe the creation of an App Definition Statement to guide thinking about what goes into an app and how people will use it:

An app definition statement is a concise, concrete declaration of an app’s main purpose and its intended audience.

Create an app definition statement early in your development effort to help you turn an idea and a list of features into a coherent product that people want to own. Throughout development, use the definition statement to decide if potential features and behaviors make sense.

My suggestion is that you should use this idea to guide your app’s architecture and your class design too. Start from the problem, then work through solving that problem to building your application. I have two reasons: the first will help you, and the second will help me to help you.

The first reason is to promote a decoupling between the problem you’re trying to solve, the design you present for interacting with that solution, and the technologies you choose to implement the solution and its design. Your problem is not “I need a Master-Detail application”, which means that your solution may not be that. In fact, your problem is not that, and it may not make sense to present it that way. Or if it does now, it might not next week.

You see, designers are fickle beasts, and for all their feel-good bloviation about psychology and user experience, most are actually just operating on a combination of trend and whimsy. Last week’s refresh button is this week’s pull gesture is next week’s interaction-free event. Yesterday’s tab bar is today’s hamburger menu is tomorrow’s swipe-in drawer. Last decade’s mouse is this decade’s finger is next decade’s eye motion. Unless your problem is Corinthian leather, that’ll be gone soon. Whatever you’re doing for iOS 7 will change for iOS 8.

So it’s best to decouple your solution from your design, and the easiest way to do that is to solve the problem first and then design a presentation for it. Think about it. If you try to design a train timetable, then you’ll end up with a timetable that happens to contain train details. If you try to solve the problem “how do I know at what time to be on which platform to catch the train to London?”, then you might end up with a timetable, but you might not. And however the design of the solution changes, the problem itself will not: just as the problem of knowing where to catch a train has not changed in over a century.
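
To make that concrete, here is a minimal Objective-C sketch (the class and method names are invented for illustration, not taken from any real app) in which the interface answers the question the problem actually asks and says nothing about timetables, table views or any other presentation:

    #import <Foundation/Foundation.h>

    /* A value object answering "which train?" in the problem's own terms. */
    @interface TrainDeparture : NSObject
    @property (readonly, copy) NSString *destination;
    @property (readonly, copy) NSString *platform;
    @property (readonly) NSDate *departureTime;
    - (instancetype)initWithDestination:(NSString *)destination
                               platform:(NSString *)platform
                          departureTime:(NSDate *)departureTime;
    @end

    @implementation TrainDeparture
    - (instancetype)initWithDestination:(NSString *)destination
                               platform:(NSString *)platform
                          departureTime:(NSDate *)departureTime
    {
        if ((self = [super init])) {
            _destination = [destination copy];
            _platform = [platform copy];
            _departureTime = departureTime;
        }
        return self;
    }
    @end

    /* The problem: at what time must I be on which platform for the train
       to London? Nothing here knows how the answer will be presented. */
    @interface DepartureBoard : NSObject
    - (instancetype)initWithDepartures:(NSArray *)departures;
    - (TrainDeparture *)nextDepartureTo:(NSString *)destination
                                  after:(NSDate *)date;
    @end

    @implementation DepartureBoard
    {
        NSArray *_departures;
    }

    - (instancetype)initWithDepartures:(NSArray *)departures
    {
        if ((self = [super init])) {
            /* Keep departures in time order so "next" means "earliest". */
            _departures = [departures sortedArrayUsingComparator:
                ^NSComparisonResult(TrainDeparture *a, TrainDeparture *b) {
                    return [a.departureTime compare:b.departureTime];
                }];
        }
        return self;
    }

    - (TrainDeparture *)nextDepartureTo:(NSString *)destination
                                  after:(NSDate *)date
    {
        for (TrainDeparture *departure in _departures) {
            if ([departure.destination isEqualToString:destination] &&
                [departure.departureTime compare:date] == NSOrderedDescending) {
                return departure;
            }
        }
        return nil;
    }
    @end

Whether the answer ends up on a printed timetable, in a push notification or spoken aloud is then a separate decision, taken later and revised cheaply.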

The same problem that affects design-driven development also affects technology-driven development. This month you want to use Core Data. Then next month, you wish you hadn’t. The following month, you kind of want to again, then later you realise you needed a document database after all and go down that route. Solve the problem without depending on particular libraries, and then changing libraries is no big deal; neither is changing how you deal with those libraries.

It’s starting with the technology that leads to Massive View Controller. If you start by knowing that you need to glue some data to some view via a View Controller, then that’s what you end up with.

This problem is exacerbated, I believe, by a religious adherence to Model-View-Controller. My job here is not to destroy MVC: I am neither an iconoclast nor a sacrificer of sacred cattle. But when you get overly attached to MVC, then you look at every class you create and ask the question “is this a model, a view, or a controller?”. Because this question makes no sense, the answer doesn’t either: anything that isn’t evidently data or evidently graphics gets put into the amorphous “controller” collection, which eventually sucks your entire codebase into its innards like a black hole collapsing under its own weight.

Let’s stick with this “everything is MVC” difficulty for another paragraph, and possibly a bulleted list thereafter. Here are some helpful answers to the “which layer does this go in” questions:

  • does my Core Data stack belong in the model, the view, or the controller? No. Core Data is a persistence service, which your app can call on to save or retrieve data. Often the data will come from the model, but saving and retrieving that data is not itself part of your model (there is a sketch of this separation after the list).
  • does my networking code belong in the model, the view, or the controller? No. Networking is a communications service, which your app can call on to send or retrieve data. Often the data will come from the model, but sending and retrieving that data is not itself part of your model.
  • is Core Graphics part of the model, the view, or the controller? No. Core Graphics is a display primitive that helps objects represent themselves on the display. Often those objects will be views, but the means by which they represent themselves are part of an external service.
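
Here is what that separation might look like in code, as a sketch with invented names that builds on the hypothetical departures model above: persistence is a service the app calls on, so Core Data, a JSON file or a web service become interchangeable adapters rather than a place to file half of your classes.

    #import <Foundation/Foundation.h>

    /* The service is described in the problem's terms, not the framework's. */
    @protocol DepartureStore <NSObject>
    - (void)saveDepartures:(NSArray *)departures;
    - (NSArray *)departuresForDestination:(NSString *)destination;
    @end

    /* One adapter; a Core Data-backed or network-backed one could replace
       it without the model, or anything using the model, noticing. */
    @interface InMemoryDepartureStore : NSObject <DepartureStore>
    @end

    @implementation InMemoryDepartureStore
    {
        NSArray *_departures;
    }

    - (void)saveDepartures:(NSArray *)departures
    {
        _departures = [departures copy];
    }

    - (NSArray *)departuresForDestination:(NSString *)destination
    {
        /* Assumes stored objects expose a destination, as the hypothetical
           TrainDeparture sketched earlier does. */
        NSIndexSet *indexes = [_departures indexesOfObjectsPassingTest:
            ^BOOL(id departure, NSUInteger idx, BOOL *stop) {
                return [[departure valueForKey:@"destination"]
                           isEqual:destination];
            }];
        return [_departures objectsAtIndexes:indexes];
    }
    @end

The model owns the protocol; the adapters live at the edge of the app, next to whichever framework they happen to wrap.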

So building an app in a solution-first approach can help focus on what the app does, removing any unfortunate coupling between that and what the app looks like or what the app uses. That’s the bit that helps you. Now, about the other reason for doing this, the reason that makes it easier for me to help you.

When I come to look at your code, and this happens fairly often, I need to work out quickly what it does and what it should do, so that I can work out why there’s a difference between those two things and what I need to do about it. If your app is organised in such a way that I can see how each class contributes to the problem being solved, then I can readily tell where I go for everything I need. If, on the other hand, your project looks like this:

MVC organisation

Then the only thing I can tell is that your app is entirely interchangeable with every other app that claims to be nothing more than MVC. This includes every Rails app, ever. Here’s the thing. I know what MVC is, and how it works. I know what UIKit is, and why Apple thinks everything is a view controller. I get those things, your app doesn’t need to tell me those things again. It needs to reflect not the similarities, which I’ve seen every time I’ve launched Project Builder since around 2000, but the differences, which are the things that make your app special.

OK, so that’s the theory. We should start from the problem, and move to the solution, then adapt the solution onto the presentation and other technology we need to use to get a product we can sell this week. When the technologies and the presentations change, we can adapt onto those new things, to get the product we can sell next week, without having to worry about whether we broke solving the problem. But what’s the practice? How do we do that?

Start with the model.

Remember that, in Apple’s words:

model objects represent knowledge and expertise related to a specific problem domain

so solving the problem first means modelling the problem first. Now you can do this without regard to any particular libraries or technology, although it helps to pick a programming language so that you can actually write something down. In fact, you can start here:

Command line tool

A Foundation command-line tool has everything you need to solve your problem (in fact it contains a few more things than that, to make up for erstwhile deficiencies in the link editor, but we’ll ignore those things). It lets you make objects, and it lets you use all those boring things that were solved by computer scientists back in the 1760s like strings, collections and memory allocation.

So with a combination of the subset of Objective-C, the bits of Foundation that should really be in Foundation, and unit tests to drive the design of the solution, we can solve whatever problem it is that the app needs to solve. There’s just one difficulty, and that is that the problem is only solved for people who know how to send messages to objects. Now we can worry about those fast-moving targets of presentation and technology choice, knowing that the core of our app is a stable, well-tested collection of objects that solve our customers’ problem. We expose aspects of the solution by adapting them onto our choice of user interface, and similarly any other technology dependencies we need to introduce are held at arm’s length. We test that we integrate with them correctly, but not that using them ends up solving the problem.
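
As a sketch of what driving the design with unit tests looks like here (again using the invented departures example, and XCTest simply because it ships with the developer tools), notice that nothing in the test knows about view controllers, storyboards, Core Data or networking:

    #import <XCTest/XCTest.h>
    #import "DepartureBoard.h"  /* the hypothetical model sketched earlier */

    @interface DepartureBoardTests : XCTestCase
    @end

    @implementation DepartureBoardTests

    - (void)testNextLondonTrainIsTheEarliestOneAfterTheTimeAsked
    {
        NSDate *now = [NSDate date];
        TrainDeparture *gone = [[TrainDeparture alloc]
            initWithDestination:@"London" platform:@"2"
                  departureTime:[now dateByAddingTimeInterval:-600]];
        TrainDeparture *next = [[TrainDeparture alloc]
            initWithDestination:@"London" platform:@"4"
                  departureTime:[now dateByAddingTimeInterval:900]];
        DepartureBoard *board = [[DepartureBoard alloc]
            initWithDepartures:@[gone, next]];

        TrainDeparture *found = [board nextDepartureTo:@"London" after:now];

        XCTAssertEqualObjects(found.platform, @"4",
                              @"the already-departed train should be ignored");
    }

    @end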

If something must go, then we drop it, without worrying whether we’ve broken our ability to solve the problem. The libraries and frameworks are just services that we can switch between as we see fit. They help us solve our problem, which is to help everyone else to solve their problem.

And yes, when you come to build the user interface, then model-view-controller will be important. But only in adapting the solution onto the user interface, not as a strategy for stuffing an infinite number of coats onto three coat hooks.

References

None of the above is new; it’s just how Object-Oriented Programming is supposed to work. In the first part of my MVC series, I investigated Thing-Model-View-Editor and the progress from the “Thing” (a problem in the real world that must be solved) to a “Model” (a representation of that problem and its solution in the computer). That article relied on sources from Trygve Reenskaug, who described (in 1979) the process by which he moved from a Thing to a Model and then to a user interface.

In his 1992 book Object-Oriented Software Engineering: A Use-Case Driven Approach, Ivar Jacobson describes a formalised version of the same motion, based on documented use cases. Some teams replaced use cases with user stories, which look a lot like Reenskaug’s user goals:

An example: To get better control over my finances, I would need to set up a budget; to keep account of all income and expenditure; and to keep a running comparison between budget and accounts.

Alistair Cockburn described the Ports and Adapters Architecture (earlier known as the hexagonal architecture), in which the application’s use cases (i.e. the ways in which it solves problems) are at the core, and everything else is kept at a distance through adapters which can easily be replaced.
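
In the terms of the earlier sketches (all names invented), the use case is the port and each delivery mechanism is an adapter; a minimal reading of Cockburn’s idea might look like this:

    #import <Foundation/Foundation.h>
    #include <stdio.h>

    @class TrainDeparture;   /* the hypothetical model object sketched earlier */

    /* The port: the use case, stated in the application's own terms. */
    @protocol FindsNextDeparture <NSObject>
    - (TrainDeparture *)nextDepartureTo:(NSString *)destination
                                  after:(NSDate *)date;
    @end

    /* One adapter drives the port from the command line; a view controller,
       a push notification handler or a test case could drive the same port
       without the core of the app changing at all. */
    @interface ConsoleDepartureAdapter : NSObject
    - (instancetype)initWithUseCase:(id <FindsNextDeparture>)useCase;
    - (void)printNextDepartureTo:(NSString *)destination;
    @end

    @implementation ConsoleDepartureAdapter
    {
        id <FindsNextDeparture> _useCase;
    }

    - (instancetype)initWithUseCase:(id <FindsNextDeparture>)useCase
    {
        if ((self = [super init])) {
            _useCase = useCase;
        }
        return self;
    }

    - (void)printNextDepartureTo:(NSString *)destination
    {
        id departure = [_useCase nextDepartureTo:destination after:[NSDate date]];
        if (departure) {
            printf("%s\n", [[departure description] UTF8String]);
        } else {
            printf("No further departures to %s today\n", [destination UTF8String]);
        }
    }
    @end

The DepartureBoard sketched earlier could adopt the protocol as it stands; swapping the console adapter for a UIKit one is then a change at the edge of the application, not at its core.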