A two-dimensional dictionary

What?

A thing I made has just been open-sourced by my employers at Agant: the AGTTwoDimensionalDictionary works a bit like a normal dictionary, except that the keys are CGPoints, meaning we can find all the objects within a given rectangle.
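
In use it looks something like this. This is a hedged sketch: the selectors and variables here are illustrative stand-ins, not necessarily the library’s actual interface, so check the header for the real API.

AGTTwoDimensionalDictionary *buildingIndex = [[AGTTwoDimensionalDictionary alloc] init];
NSString *building = @"The Mended Drum"; // stands in for one of the map’s image objects
[buildingIndex setObject:building forKey:CGPointMake(310.0, 422.0)]; // illustrative selector

CGRect visibleRect = CGRectMake(300.0, 400.0, 64.0, 48.0); // the on-screen region
NSSet *visible = [buildingIndex objectsInRegion:visibleRect]; // everything whose key falls inside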

Why?

A lot of the time developing Discworld: The Ankh-Morpork Map was spent on performance optimisation: there’s a lot of stuff to get moving around a whole city. As described by Dave Addey, the buildings on the map were traced and rendered into separate images so that we could let characters go behind them. This means that there are a few thousand of those little images, and whenever you’re panning the map around, the app has to decide which images are visible, put them in the correct place (in three dimensions; remember, people can be in front of or behind the buildings) and draw everything.

A first pass involved creating a set containing all of the objects, then looping over them to find out which were within the screen region. This was too slow. Implementing this 2-d index instead made it take about 20% of the original time for only a few tens of kilobytes more memory, so that’s where we are now. It’s also why the data type doesn’t currently do any rebalancing of its tree; it had already become fast enough for the app it was built for. This is a key part of performance work: know which battles are worth fighting. About one month of full-time development went into optimising this app, and it would’ve been more if we hadn’t been measuring where the most benefit could be gained. By the time we started releasing betas, every code change was measured in Instruments before being accepted.

Anyway, we’ve open-sourced it so it can be fast enough for your app, too.

How?

The dictionary is backed by a data structure called the multidimensional binary search tree, or k-d tree. I couldn’t find an implementation of that structure that I could use in an iOS app, so I cracked open the Objective-C++ and built this one.
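
For anyone who hasn’t met it: a k-d tree alternates the axis it splits on at each level, so a rectangle query can skip any subtree whose half-plane lies entirely outside the rectangle. Here’s a minimal Objective-C++ sketch of the idea; it isn’t the code in AGTTwoDimensionalDictionary, just the shape of the algorithm:

#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

struct KDNode {
    CGPoint point;        // the key
    id value;             // the stored object
    KDNode *left;         // keys less than point on this level’s axis
    KDNode *right;        // keys greater than or equal
};

// Even depths split on x, odd depths on y.
static KDNode *insert(KDNode *node, CGPoint point, id value, unsigned depth)
{
    if (node == NULL) {
        KDNode *fresh = new KDNode();
        fresh->point = point;
        fresh->value = value;
        fresh->left = fresh->right = NULL;
        return fresh;
    }
    bool splitOnX = (depth % 2 == 0);
    CGFloat key = splitOnX ? point.x : point.y;
    CGFloat nodeKey = splitOnX ? node->point.x : node->point.y;
    if (key < nodeKey)
        node->left = insert(node->left, point, value, depth + 1);
    else
        node->right = insert(node->right, point, value, depth + 1);
    return node;
}

// Collect every value whose key lies inside rect, skipping subtrees
// whose half-plane cannot intersect the rectangle.
static void search(KDNode *node, CGRect rect, unsigned depth, NSMutableSet *results)
{
    if (node == NULL) return;
    if (CGRectContainsPoint(rect, node->point)) [results addObject:node->value];
    bool splitOnX = (depth % 2 == 0);
    CGFloat nodeKey = splitOnX ? node->point.x : node->point.y;
    CGFloat rectMin = splitOnX ? CGRectGetMinX(rect) : CGRectGetMinY(rect);
    CGFloat rectMax = splitOnX ? CGRectGetMaxX(rect) : CGRectGetMaxY(rect);
    if (rectMin < nodeKey) search(node->left, rect, depth + 1, results);
    if (rectMax >= nodeKey) search(node->right, rect, depth + 1, results);
}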

Objective-C++? Yes. There are two reasons for using C++ in this context: one is that the structure actually does get accessed often enough in the Discworld app that dynamic dispatch all the way down adds a significant time penalty. The other is that the structure contains enough objects that having a spare isa pointer per node adds a significant memory penalty.

But then there’s also a good reason for using Objective-C: it’s an Objective-C app. My controller objects shouldn’t have to be written in a different language just to use some data structure. Therefore I reach for the only application of ObjC++ that should even be permitted to compile: an implementation in one language that exposes an interface in the other. Even the unit tests are written in straight Objective-C, because that’s how the class is supposed to be used.
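
The shape of that arrangement, sketched under the same caveat (this is the pattern, not the library’s actual source): the header is plain Objective-C that any .m file can import, and the C++ only ever appears inside the .mm implementation file.

// AGTTwoDimensionalDictionary.h: pure Objective-C, no C++ leaks out
@interface AGTTwoDimensionalDictionary : NSObject
- (void)setObject:(id)object forKey:(CGPoint)key;
- (NSSet *)objectsInRegion:(CGRect)region;
@end

// AGTTwoDimensionalDictionary.mm: Objective-C++, where the tree lives
@implementation AGTTwoDimensionalDictionary
{
    KDNode *_root; // the C++ node type from the sketch above
}

- (void)setObject:(id)object forKey:(CGPoint)key
{
    _root = insert(_root, key, object, 0); // a plain function call: no objc_msgSend
}

- (NSSet *)objectsInRegion:(CGRect)region
{
    NSMutableSet *results = [NSMutableSet set];
    search(_root, region, 0, results);
    return results;
}
@end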


“You could simply do X” costs more

Someone always says it. “Could you just add this?” or “I don’t think it would be too hard to…” or if somebody else “changes these two simple things”, someone might create a completely bug-compatible, scale-compatible implementation of this other, undocumented service…wait, what?

Many of us are naturally optimistic people. We believe that the problems that befall others, or that we’ve experienced before, will not happen this time. That despite the last project suffering from “the code getting messier and messier”, we’ll do it right this time.

Optimism’s great. It tricks us into trying to solve difficult problems. It convinces us that the solution is “just around the corner”, so we should persevere. The problems start to arise when we realise that everyone else is optimistic, too—and that optimism is contagious. If you’re asked to give a drive-by estimate on how hard something is, or how long it takes, you’ll give an answer that probably doesn’t take into account all the problems that might arise. But now two of you believe in this optimistic estimate: after all, you’re a smart person, you’re trusted to give good estimates.

We need to be careful when talking to people who aren’t developers to make it clear that there’s no such thing as “simply” in most software systems. That “simply” adding a field brings with it all sorts of baggage: placing the field in an aesthetically pleasing fashion across multiple localised user interfaces, localising the field, building the user experience of interacting with the field and so on. That using the value from the field could turn it from a complicated problem into a complex problem, particularly if the field is just selecting between multiple implementations of what may even be multiple interfaces. That just adding this field brings not only work, but additional risk. That these are just the problems we could think of up front; there are often more that get uncovered as we begin to shave the yak.

But clearly we also need to bear in mind the problems we’ve faced and continue to face when talking to each other. We should remember that the last thing we tried to simply do ended up chasing a rabbit down a hole. If I don’t think that I can “simply” do something without unexpected complexity and risk, I should not expect that others can “simply” do it either.


The Liskov Citation Principle

In her keynote speech at QCon London 2013 on The Power of Abstraction, Barbara Liskov referred to several papers contemporary with her work on abstract data types. I’ve collected these references and found links to free copies of the articles where available.

  • Dijkstra (1968), “Go To statement considered harmful”
  • Wirth (1971), “Program development by stepwise refinement”
  • Parnas (1971), “Information distribution aspects of design methodology”
  • Liskov (1972), “A design methodology for reliable software systems”
  • Schuman and Jorrand (1970), “Definition mechanisms in extensible programming languages” (not apparently available online for free)
  • Balzer (1967), “Dataless programming”
  • Dahl and Hoare (1972), “Hierarchical program structures” (not apparently available online for free)
  • Morris (1973), “Protection in programming languages”
  • Liskov and Zilles (1974), “Programming with abstract data types”
  • Liskov (1987), “Data abstraction and hierarchy”


When all you have is a NailFactory…

…every problem looks like it can be solved by configuring a different nail.

We have an obsession with tools in the software industry. We’ve built tools for building software, tools for testing software, tools for recording how the software is broken, tools for recording when we fixed software. Tools for migrating data from the no-longer-cool tools into the cool tools. Tools for measuring how much other tools have been used.

Let’s call this Tool-Driven Development, and let’s give Tool-Driven Development the following manifesto (a real manifesto that outlines intended behaviour, not a green paper):

Given an observed lack of consideration toward attribute x, we Tool-Driven Developers commit to supplying a tool that automates the application of attribute x.

So, if your developers aren’t thinking about testing, we’ll make a tool to make the tests they don’t write run quicker! If your developers aren’t doing performance analysis, we’ve got all sorts of tools for getting the very reports they don’t know that they need!

This fascination with creating tools is a natural consequence of assuming that everyone[*] is like me. I’ve found this problem that I need to solve; surely everyone needs to solve this problem, so I’ll write a tool. Then I can tell people how to use this tool, and the problem will be solved for everyone!

[*]Obviously not everyone, just everyone who gets it. Those clueless [dinosaurs clinging to the old tools|hipsters jumping on the new bandwagons] don’t get it, and I’m not talking about them.

No. Well, not yet. We’ve skipped two important steps out of a three-step enlightenment scheme:

  1. Awareness. Tell me what the unknown that I don’t know is.
  2. Education. Tell me why this thing that I now know about is a big deal, what I’m missing out on, what the drawbacks are, and why solving it would be beneficial.
  3. Training. Now that I know this thing exists, and that I should do something about it, and what that something is, now is the time to show me the tools and how I can use them to solve my new problem.

One of the talks at QCon London was by Damian Conway on dead languages. It covered these three steps almost in reverse, to make the point that the tools we use constrain our mental models of the problems we’re trying to solve. Training: here’s a language, this is how it works, this is a code problem solved in that language. Education: the language has these features, which let us write our code in this way with these limitations. Awareness: there are ways to write code, and hence to solve problems in software, that aren’t the way you’re currently doing it.

A lot of what I’ve worked on has covered awareness without going further. The App Makers’ Privacy Pledge raises awareness that privacy in mobile apps is a problem, without discussing the details of the problem or the mechanics of a solution. APPropriate Behaviour argues that programmers should be aware of the social scope in which their programming activities sit.

While I appreciate and even accept the charge of intellectual foreplay, I think a problem looking for a solution is more useful than a solution looking for a problem. Still, with some of us doing the awareness thing and others doing the training thing, a scheme by which we can empower ourselves and future developers is clear: let’s team up and meet in the middle.


A note on notes

I’ve always had a way to take notes, but have never settled into a particular scheme. This post, more for my benefit than for yours, is an attempt to dig through this history and decide what I want to do about it.

At the high level, the relevant questions are what I want to do with the contents now and how I intend to work with them in the future. Most of the notes I take don’t have a long-term future; my work from my first degree has long been destroyed. I referred to the notes during the degree, which gives an upper bound on their lifetime of four years; realistically it was more like two, from creation to the exam where I needed them.

Said notes were taken on A4 ruled paper with a cartridge pen and a propelling pencil. Being able to use text (including maths symbols and the like) and diagrams interchangeably is a supremely useful capability. It even helps with code, especially where UI or geometry is involved.

I no longer do this, but my strategy then was to take rapid notes in lectures and classes, and produce fair copies later. This meant absorbing more from the notes as I re-read them and put them into different words, and it let me add cross-references to textbooks or other materials.

I’ve used pen-and-paper note-taking at other times, too. Particularly in classrooms or conferences, it’s much faster than typing. At various phases of my career I’ve also kept log books, either for my own benefit or for other people’s. That’s not something I do currently. The weapons of choice in this sphere are now a fountain pen, a propelling pencil and a Moleskine.

Evernote is my note shoebox of choice, and my destination for typed notes (in fact this draft was built up in Evernote on an iPhone, rather than in a blog editor). I don’t just use Macs and iOS, so an iCloud-based note shoebox wouldn’t work for me.

I sometimes put notes handwritten in books or on whiteboards in there too, but don’t really worry about tagging because I usually search chronologically. My handwriting is so poor that Evernote’s transcription doesn’t work at all, which is probably part of what keeps me away from search. When it comes to symbols and equations, I’m more likely to put LaTeX markup in the text than draw equation images or use the extended-characters palette.

When I was at O2 I had a dalliance with the Bamboo stylus and Penultimate. I still use those for drawing, but never for writing, as the poor sensitivity makes my narrow handwriting look even worse. I haven’t tried anything with a dedicated stylus sensor like the Jot stylus or the Galaxy S Pen. Again, these get dumped into Evernote. I don’t tend to change colours or pens; I tried Paper by 53 but don’t use it much in practice.

Mind maps or outlines: sometimes. I only ever do these in software, never on paper.

I think the summary is that handwritten notes are fastest and allow the biggest variation in formatting and content. Sticking the resulting notes in Evernote makes it easier to go back through them, but I should try to recover the discipline of writing up a fair copy. It helps cement the content in my mind and gives me a chance to add external references and citations that I would otherwise miss.

The trick with paper-based notes is to always have a notebook and pen to hand; I don’t often carry things around with me, so I’d either have to get into the habit of wearing a manbag or leave notebooks around wherever I’m likely to want to write something.


How to version a Mach-O library

Yes, it’s the next instalment of “cross-platform programming for people who don’t use Macs very much”. You want to give your dynamic library a version number, probably of the format major.minor.patchlevel. Regardless of marketing concerns, this helps with dependency management if you choose a version convention such that binary-compatible revisions of the libraries can be easily discovered. What could possibly go wrong?

The linker will treat your version number in the following way (from the APSL-licensed ld64/ld/Options.cpp) if you’re building a 32-bit library:

//
// Parses number of form X[.Y[.Z]] into a uint32_t where the nibbles are xxxx.yy.zz
//
uint32_t Options::parseVersionNumber32(const char* versionString)
{
	uint32_t x = 0;
	uint32_t y = 0;
	uint32_t z = 0;
	char* end;
	x = strtoul(versionString, &end, 10);
	if ( *end == '.' ) {
		y = strtoul(&end[1], &end, 10);
		if ( *end == '.' ) {
			z = strtoul(&end[1], &end, 10);
		}
	}
	if ( (*end != '\0') || (x > 0xffff) || (y > 0xff) || (z > 0xff) )
		throwf("malformed 32-bit x.y.z version number: %s", versionString);

	return (x << 16) | ( y << 8 ) | z;
}

and like this if you’re building a 64-bit library (I’ve corrected an obvious typo in the comment here):

//
// Parses number of form A[.B[.C[.D[.E]]]] into a uint64_t where the bits are a24.b10.c10.d10.e10
//
uint64_t Options::parseVersionNumber64(const char* versionString)
{
	uint64_t a = 0;
	uint64_t b = 0;
	uint64_t c = 0;
	uint64_t d = 0;
	uint64_t e = 0;
	char* end;
	a = strtoul(versionString, &end, 10);
	if ( *end == '.' ) {
		b = strtoul(&end[1], &end, 10);
		if ( *end == '.' ) {
			c = strtoul(&end[1], &end, 10);
			if ( *end == '.' ) {
				d = strtoul(&end[1], &end, 10);
				if ( *end == '.' ) {
					e = strtoul(&end[1], &end, 10);
				}
			}
		}
	}
	if ( (*end != '\0') || (a > 0xFFFFFF) || (b > 0x3FF) || (c > 0x3FF) || (d > 0x3FF)  || (e > 0x3FF) )
		throwf("malformed 64-bit a.b.c.d.e version number: %s", versionString);

	return (a << 40) | ( b << 30 ) | ( c << 20 ) | ( d << 10 ) | e;
}

The specific choice of bit widths in both variants is weird (why would you have more major versions than patchlevel versions?) and the move from 32-bit to 64-bit makes no sense to me at all. Nonetheless, there’s a general rule:

Don’t use your SCM revision number in your version numbering scheme.

The rule of thumb is that the major version can always be less than 65536, the minor versions can always be less than 256 and you can always have up to two minor version numbers. Trying to supply a version number that doesn’t fit in the bitfields defined here will be a linker error, and you will not go into (address) space today.
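
To make that concrete: 1.2.3 packs into (1 << 16) | (2 << 8) | 3 = 0x00010203, and overflowing a field trips the throwf above. Here’s roughly how the 32-bit case plays out at the command line, given some trivial thing.c to compile; the output is abridged, and the exact formatting may vary by Xcode version, but the error string is the one in the source quoted above.

$ clang -dynamiclib -current_version 1.2.3 -o libthing.dylib thing.c
$ otool -L libthing.dylib
libthing.dylib:
	libthing.dylib (compatibility version 0.0.0, current version 1.2.3)
$ clang -dynamiclib -current_version 65536.0.0 -o libthing.dylib thing.c
ld: malformed 32-bit x.y.z version number: 65536.0.0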


How to handle Xcode in your meta-build system’s iOS or Mac app target

OK, I’ve said before in APPropriate Behaviour that I dislike build systems that build other build systems:

Some build procedures get so complicated that they spawn another build system that configures the build environment for the target system before building. An archetypal example is GNU autotools – which actually has a three-level build system. Typically the developers will run `autoconf`, a tool that examines the project to find out what questions the subsequent step should ask and generates a script called `configure`. The user downloads the source package and runs `configure`, which inspects the compilation environment and uses a collection of macros to create a Makefile. The Makefile can then compile the source code to (finally!) create the product.

As argued by Poul-Henning Kamp, this is a bad architecture that adds layers of cruft to work around code that has not been written to be portable to the environments where it will be used. Software written to be built with tools like these is hard to read, because you must read multiple languages just to understand how one line of code works.

One problem that arises in any cross-platform development is that assumptions about “the other platforms” (being the ones you didn’t originally write the software on) are sometimes made based on one of the following sources of information:

  • none
  • a superficial inspection of the other platform
  • analogy to the “primary” platform

An example of the third case: I used to work on the Mac version of a multi-platform product, certain core parts of which were implemented by cross-platform libraries. One of these libraries just needed a little configuration for each platform: tell it what file extension to use for shared libraries, and give it the path to the Registry.

What cost me a morning today was an example of the second case: assuming that all Macs are like the one you tried. Let me show you what I mean. Here’s the contents of /Developer on my Mac:

$ ls /Developer/
WebObjects

Wait, where’s Xcode? Oh right, they moved it for the App Store builds, didn’t they?

$ ls /Applications/Xcode.app
ls: /Applications/Xcode.app: No such file or directory

WHAAAAA?

OMFG!

Since Xcode 2.5, Xcode has been relocatable and can live anywhere on the filesystem. Even if it is in one of the usual places, that might not be the version a developer wants to use. I keep a few different Xcodes around: usually the current one, the last one I knew everything worked on, and a developer preview release when there is one. I then also tend to forget to throw old Xcodes away, so I’ve got 4 different versions at the moment.

But surely this is all evil chaos from those crazy precious Mac-using weirdos! How can you possibly cope with all of that confusion? Enter xcode-select:

$ xcode-select -print-path
/Applications/Xcode4.6.app/Contents/Developer

Xcode-select is in /usr/bin, so you don’t have the bootstrapping problem of trying to find the tool that lets you find the thing. That means that you can always rely on it being in one place for your scripts or other build tools. You can use it in a shell script:

XCODE_DEVELOPER_DIR=`/usr/bin/xcode-select -print-path`

or in a CMake file:

exec_program(/usr/bin/xcode-select ARGS -print-path OUTPUT_VARIABLE XCODE_DEVELOPER_DIR)

or in whatever other tool you’re using. The path is manually chosen by the developer (using the -switch option), so if for some reason it doesn’t work out (like the developer has deleted that version of Xcode without updating xcode-select), then you can fall back to looking in default locations.
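
For example, a more defensive version of the shell fragment above might look like the following; the fallback list is illustrative, so season to taste:

XCODE_DEVELOPER_DIR=`/usr/bin/xcode-select -print-path`
if [ ! -d "${XCODE_DEVELOPER_DIR}" ]; then
    # The selected folder has gone away; try the usual suspects.
    for candidate in "/Applications/Xcode.app/Contents/Developer" "/Developer"; do
        if [ -d "${candidate}" ]; then
            XCODE_DEVELOPER_DIR="${candidate}"
            break
        fi
    done
fi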

Please do use xcode-select as a first choice for finding Xcode or the developer folder on any Mac system, particularly if your project uses a build generator. It’s more robust to changes—either from Apple or from the users of that Mac—than relying on the developer tools being installed to their default location.


I just updated Appropriate Behaviour

The new release of Appropriate Behaviour—the book about things programmers should do that aren’t programming—is now up. The most obvious, and most awesome, change in this update is a fabulous new cover, designed by Sebastian Hermida of leanpubcovers.com. Should you be in the market for a cover page, I’d strongly recommend him.

Other changes in this release include additions to the (ends of the) chapters on coding practices and learning, and I’ve added part of a new chapter on requirements engineering. As ever, discussion of the book is welcome in its glassboard, details of which are in the introduction.

I’ve found it really interesting in researching this book that I can go back decades and find information that has either been forgotten, or was seemingly ignored even at the time of publication. I think it’s quite clear that there’s a gulf between software as practiced by people who make software, and software as researched by academics; it’s therefore not surprising to see journal articles that apparently never got read by commercial sector developers.

What is more interesting is the extent to which “mainstream” programming books, including ones that apparently made a big splash at the time of their publication, no longer seem relevant. They’ve either been completely dropped from our consciousness (hands up everyone who’s read Peopleware in the last five years), or have been adapted into a one-sentence précis that’s become part of the mythology of programming. A thought experiment by way of an example of this mythologising: quote any sentence from The Mythical Man Month except the one about adding people to a late project. What was the rest of the book about? Is anything else in it relevant to what we do today? Do we know that even that sentence is relevant, or does it just sound plausible?

I’ve been having lots of fun discovering these forgotten entries in our history and bringing some of them into a modern story about programming. But Appropriate Behaviour is not a history book; if anything, it’s a book on social anthropology. The lesson to learn from this post is that it’s not the first anthropological study of programmers; I’d argue that 1971’s The Psychology of Computer Programming is more anthropology than it is psychology. It’s very different from Appropriate Behaviour but they both tread the same ground, analysing the problems faced by a programmer that aren’t directly related to telling a computer what to do.

I imagine the history book would be fun to write, though for the moment I present this, which I hope is also fun to read.


Happy Birthday, Objective-C!

OK, I have to admit that I actually missed the party. Brad Cox first described his “Object-Oriented pre-compiler”, OOPC, in the January 1983 issue of ACM SIGPLAN Notices.

This describes the Object Oriented Pre-Compiler, OOPC, a language and a run-time library for producing C programs that operate by the run-time conventions of Smalltalk 80 in a UNIX environment. These languages offer Object Oriented Programming in which data, and the programs which may access it, are designed, built and maintained as inseparable units called objects.

Notice that the abstract has to explain what OOP is: these were early days at least as far as the commercial software industry viewed objects. Reading the OOPC paper, you can tell that this is the start of what became known as Objective-C. It has a special syntax for sending Smalltalk-style messages to objects identified by pointers to structures, though not the syntax you’ll be used to:

someObject = {|Object, "new"|};
{|myArray, "addObject:", someObject|};

The infix notation [myArray addObject:someObject]; came later, but by 1986 Cox had published the first edition of Object-Oriented Programming: An Evolutionary Approach and co-founded Productivity Products International (later Stepstone) to capitalise on the Objective-C language. I’ve talked about the version of ObjC described in this book in this post, and the business context of this in Software ICs and a component marketplace.

It’s this version of Objective-C, not OOPC, that NeXT licensed from PPI as the basis of the Nextstep API (as distinct from the NEXTSTEP operating system: UNIX is case sensitive, you know). They built the language into a fork of the GNU Compiler Collection, and due to the nature of copyleft this meant they had to make their adaptations available, so GCC on other platforms gained Objective-C too.

Along the way, NeXT added some features to the language: compiler-generated static instances of string classes, for example. They added protocols: I recorded an episode of NSBrief with Saul Mora discussing how protocols were originally used to support distributed objects, but became important design tools. This transformation was particularly accelerated by Java’s adoption of protocols as interfaces. At some (as far as I can tell, not well documented) point in its life, Stepstone sold the rights to ObjC to NeXT, then licensed it back so they could continue supporting their own compiler.

There isn’t a great deal of change to Objective-C from 1994 for about a decade, despite or perhaps due to the change of stewardship in 1996/1997 as NeXT was purchased by Apple. Then, in about 2003, Apple introduced language-level support for exceptions and critical sections. In 2007, “Objective-C 2.0” was released, adding a collection enumeration syntax, properties, garbage collection and some changes to the runtime library. Blocks—a system for supporting closures that had been present in Smalltalk but missing from Objective-C—were added in a later release that briefly enjoyed the name “Objective-C 2.1”, though I don’t think that survived into the public release. To my knowledge 2.0 is the only version designation any Apple release of Objective-C has had.

Eventually, Apple observed that the AutoZone garbage collector was inappropriate for the kind of software they wanted Objective-C programmers to be making, and incorporated reference-counted memory management from their (NeXT’s, initially) object libraries into the language to enable Automatic Reference Counting.

And that’s where we are now! But what about Dr. Cox? Stepstone’s business was not the Objective-C language itself, but software components, including ICPak101, ICPak201 and the TaskMaster environment for building applications out of objects. It turned out that the way they wanted to sell object frameworks (viz. in a profitable way) was not the way people wanted to buy object frameworks (viz. not at all). Cox turned his attention to Digital Rights Management, and warming up the marketplace to accept pay-per-use licensing of digital artefacts. He’s since worked on teaching object-oriented programming, enterprise architecture and other things; his blog is still active.

So, Objective-C, I belatedly raise my glass to you. You’re nearly as old as I am, and that’s never likely to change. But we’ve both grown over that time, and it’s been fun growing up with you.


How to find me

It came to my attention this week that people are finding me via Google, which (unsurprisingly) links to here. I’ve been blogging for a couple of years at Secure Mac Programming, and I’m on twitter as @iwasleeg. I’m +Graham Lee on Google Plus, too. My email is graham at iamleeg dot com.
