Representativeness in Software Engineering Research

The first paragraph describes the context of this post in relation to the blog on which it originally appeared, not blog.securemacprogramming.com.

For this post, I wanted to go a little bit meta. One focus of this blog will be on whether results from academic software engineering are applicable to the work I do as a commercial software developer, so it was hard to pass up this Microsoft Research paper on representativeness of research.

In a nutshell, the problem is this: imagine that you read some study that shows, for example, that schedule slippage on a software project is significantly lessened if developers are given two digestive biscuits and a cup of tea at 4pm on working days. The study examined the breaktime habits of developers on 500 open source projects. This sounds quite convincing. If this thing with the tea and biscuits is true across five hundred projects, it must be applicable to my project, right?

That doesn’t follow. There are many reasons why it might not follow: the study may be biased. The analysis may be wrong. The biscuit thing may be significant but not the root cause of success. The authors may have selected projects that would demonstrate the biscuit outcome. Projects that had initially signed up but got delayed might have dropped out. This paper evaluates one cause of bias: the projects used in the study aren’t representative of the project you’re doing.

It’s a fallacy to assume that just because a study has a large sample size, its results can be generalised to the population. This only applies in the case that the sample represents an even slice of the population. Imagine a world in which all software projects are either written in Java or LISP. Now it doesn’t matter whether I select 10 projects or 10,000 projects: a sample of LISP practices will not necessarily tell us anything about how to conduct a Java project.

Conversely a study that investigates both Java and LISP projects can—in this restricted universe, and with the usual “all other things being equal” caveat—tell us something generally about software projects independent of language. However, choice of language is only one dimension in which software can be measured: the size, activity, number of developers, licence and other factors can all be relevant. Therefore the possible phase space of important factors can be multidimensional.

In this paper the authors develop a framework, based on work in medicine and other fields, for measuring and maximising representativeness of a sample by appropriate selection of projects along the dimensions of the problem space. They apply the framework to recent research.

What they discovered, tabulated in Table II of the paper, is that while a very small, carefully-selected sample can be surprisingly representative (50 out of ~20k projects represented ~15% of their problem space), the ~200 projects they could find analysed in recent research only scored around 9% on their representativeness metric. However in certain dimensions the studies were highly representative, many covering 100% of the phase space in specific dimensions.

Conclusions

A fact that jumped out at me, because of the field I work in, is that there are 245 Objective-C projects in the universe studied by this paper (the projects indexed on Ohloh) and that not one of these is covered by any of the studies they analysed. That could mean that my own back yard is ripe for the picking: that there are interesting results to be determined by analysing Objective-C projects and comparing those results with received wisdom.

In the discussion section, the authors point out that just because a study is not general, does not mean it is not useful. You may not be able to generalise a result gleaned from analysing (say) Java developer tools to all software development, but if what you’re interested in is Java developer tools then that is not a problem.

What this paper gives us, then, is not necessarily a tool that commercial developers can run out and use. It gives us some important quantitative context on evaluating the research that we do read. And, should we want to analyse our own work and investigate hypotheses about factors affecting our projects, it gives us a framework to understand just how representative those analyses would be.

Garbage-collected Objective-C

When was a garbage collector added to Objective-C? If you follow Apple’s work with the language, you might be inclined to believe that it was in 2008 when AutoZone was added as part of Objective-C 2.0 (the AutoZone collector has since been deprecated by Apple, and I’m not sure whether anyone else ever adopted it).

With a slightly wider knowledge of the language’s history, you can push this date back a bit. The GNUstep project—a Free Software reimplementation of Apple’s (formerly NeXT’s) APIs—has been using the Boehm–Demers–Weiser collector for a while. How long? I can’t tell exactly, but a keyword search in the project’s version control logs makes me think that most of the work to support it was done by one person in mid-2002:

r13976 | nico | 2002-06-26 15:34:16 +0100 (Wed, 26 Jun 2002) | 3 lines

Do not add -lobjc_gc -lgc flags when compiling with gc=yes – should now
be added automatically by gnustep-make


r13971 | nico | 2002-06-25 18:28:56 +0100 (Tue, 25 Jun 2002) | 2 lines

Tidyup for gc=yes with old compilers


r13970 | nico | 2002-06-25 18:23:05 +0100 (Tue, 25 Jun 2002) | 2 lines

Tidied code to compile with gc=yes and older compilers


r13969 | nico | 2002-06-25 13:36:11 +0100 (Tue, 25 Jun 2002) | 3 lines

Tidied some indentation; a couple of insignificant changes to have it compile
under gc


r13968 | nico | 2002-06-25 13:15:04 +0100 (Tue, 25 Jun 2002) | 2 lines

Tidied code which wouldn’t compile with gc=yes and gcc < 3.x


r13967 | nico | 2002-06-25 13:13:19 +0100 (Tue, 25 Jun 2002) | 2 lines

Tidied code which was not compiling with the garbage collector


r13966 | nico | 2002-06-25 13:12:17 +0100 (Tue, 25 Jun 2002) | 2 lines

Tidied code which was not compiling with gc=yes

That was, until fairly recently, the earliest example I knew about. Then I discovered a conference talk by Paulo Ferreira:

Reclaiming storage in an object oriented platform supporting extended C++ and Objective-C applications

This is a paper presented at “1991 International Workshop on Object Orientation in Operating Systems”. 1991. That is—obviously—11 years before GNUstep’s GC work and 17 years before Apple released AutoZone.

Comandos

The context in which this work was being done is a platform called Comandos. I’d never heard of that before—and I thought I knew Objective-C!

Judging from the report linked above, Comandos is a platform for distributed and parallel object-oriented software, based on UNIX but supporting multiple variants. The fact that it was created in 1986 means that both the languages supported—Objective-C and C++—were new at the time. Indeed the project was contemporary with the development of NeXTSTEP, which was publicly released to developers in 1988.

The 1994 summary report doesn’t mention Objective-C: just C++, Eiffel and a bespoke language called Guide. It’s possible that the platform supported ObjC simply because they used gcc which picked up ObjC support during the life of Comandos; however this seems unlikely as there would be significant work in making Objective-C objects work with their platform’s distributed messaging interface and persistence subsystem.

Why ObjC should be one of two languages mentioned (along with C++) in the 1991 paper on garbage collection, but zero of three mentioned (C++, Eiffel, Guide) in 1994 will have to remain a mystery for now. Looking into the references for Ferreira’s paper, I can find one mention of Objective-C as the inspiration for their own, custom C-based message dispatch system, but no indication that they actually used Objective-C.

The Garbage Collector

I’m not really an expert at garbage collectors. In fact, I have no idea what I’m doing. I appreciate them when they’re around, and leak or crash things occasionally when they’re not.

To my uneducated eye, the description of the Ferreira 1991 garbage collector and Apple’s description of their collector (no link I’m afraid, it was session 940 at WWDC 2008) look quite different. AutoZone is conservative (like B-W-D) and only works on Objective-C objects. Ferreira’s collector operates, like B-W-D, on any memory block including new C++ instances and C heap allocations. Apple’s collector is supposed to avoid blocking wherever it can, a constraint not mentioned in the Ferreira paper.

All of Comandos, GNUstep and Cocoa (Apple’s Objective-C framework) have systems for distributed objects that complicate collection: does some remote process have a handle on memory in my address space? The proxy system used by Cocoa and GNUstep make it easy to answer this question. Comandos used a different technique, where objects local to a process were “volatile” and objects shared between processes were “persistent”. Persistent objects were subject to a different lifecycle management process, so the Ferreira GC didn’t interact with them.

As an aside, Apple’s garbage collector also needed to provide a “mixed mode”—support for code that could be loaded into either a garbage-collected or manually managed process.

Conclusions

Memory management is hard. Making programmers do it themselves leads to all sorts of problems. Doing it automatically is also hard, and many different approaches have been tried over the last few decades. Interestingly, Apple has (for the moment) settled on a “none of the above” approach, using a compiler-inserted reference counting system based on the manual ownership tracking previously implemented by the frameworks.

What interests me most about this paper on Objective-C garbage collection is not so much its technical content (which it’s actually rather light on, containing only conversational overviews of the algorithms and no information about results), but the fact that it existed at all and I, as someone who considers himself an experienced Objective-C programmer, did not know anything about it or its project.

That’s why I started this blog [Ed: referring to the blog these posts are imported from] by discussing it. A necessary prerequisite to deciding whether the literature has something useful to tell us is knowing about its existence. I’m really surprised that it took so long for me to find out about something that’s almost directly related to my everyday work. Mind you, maybe I shouldn’t feel too bad: the author of AutoZone told me he hadn’t heard of it, either.

Programming Literate Manifesto

Late last year, I decided to set up a second blog, focusing on exploring the world of academic literature relevant to our work as people who make software. The tone and content was very different to what I usually write here. I’ve now decided that while it’s interesting to explore this material, it was a mistake to try creating a second identity for this. I want to write about it, this is where I write, it belongs here. There are currently only a few posts at the other blog, so I’m going to import them all. If you’ve already read them or this content doesn’t interest you, filter the “academia” category.

This first one set the stall for the remaining posts. One thing made clear in the manifesto was that I wanted to encourage discussion: I’m not convinced blog comments are the place for that so comments remain off for the time being.

Programming Literate Manifesto

This blog is written by what you might call a “practising software engineer”, working in the field of mobile software. I’m hoping for a few things from the articles here, which fall into three main categories:

  • introduce more of “the primary literature” to people at the software coal face. Explore the applicability of research material to what we’re doing. Bring some more critical appraisal to the field. Invite discussion from working programmers about the relevance of the articles discussed.
  • get input from academics about related work, whether the analyses here are balanced, and how the researchers see the applicability of the work covered here to current practice. Welcome academics to the discussions on the articles – in other words to make this blog part of the interface between research and practice.
  • find out about some interesting work and have fun writing about it.

Sources

Papers and articles in this blog have to come from where I can find them, obviously. Largely that’s going to mean using the following resources:

Where the articles I cover are available online I’ll be linking to them, preferring free downloads over paywalled sites. Yes, IEEE, I’m looking at you.

Sandbox

I don’t know how easy it is to be truly dispassionate when writing, so it makes sense to lay out my stall. Hopefully that means intrinsic biases can be uncovered.

In my opinion, “software engineering” is the social science that describes how people involved in making software work and communicate with each other—and to some extent how they communicate to the computers too, at least so far as the created software and source code are tools that enable collaboration and are also the result of such.

That makes it quite a wide discipline (perhaps actually an interface discipline, or multiple disciplines looking for an interface). There’s some sociology and ethnography involved in identifying and describing the culture or cultures of software teams and communities. There’s the management science side, attempting to define “success” at various activities related to making software, trying to discover how to measure for and maximise such success. Questions exist over whether programming is best taught as an academic or vocational discipline, so education science is also on-topic. Then there’s the usability/HCI side related to the tools we use, and the mathematics and computer science that go into the building of those tools.

Just because a field is not a “hard” science, does not mean that useful results cannot be derived. It means that you have to be quite analytical about how someone else’s results apply to your circumstances, perhaps. That’s not to dismiss evidence-based software engineering out of hand, but to say that any result that is said to apply to all of software engineering in general needs to be peered at quite closely.

About the name

It’s quite simply a pun on Donald Knuth’s Literate Programming.