
Why your app is not massively parallel software

That trash can Mac Pro that hasn’t been updated in years? It’s too hard to write software for.

Now, let’s be clear, there are any number of abstractions that have been created to help programmers parallelise their thing, from the process onward. If you’ve got a loop and can add the words #pragma omp parallel for to your code, then your loop can be run in parallel over as many threads as you like. It’s not hard.
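For instance, here’s a minimal sketch of that pragma in action (the function and array are hypothetical); each iteration writes only to its own element, so the iterations are independent and OpenMP can divide them among threads:

    /* Scale each element of an array, with iterations split across
       threads by OpenMP. Compile with the compiler's OpenMP flag,
       e.g. -fopenmp. */
    void scale(double *data, int n, double factor)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            data[i] *= factor;
        }
    }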

Making sure that the loop body can run concurrently with itself is hard, but there are some rules to follow that either make it easy or tell you when to avoid trying. But you’re still only using the CPU, and there’s that whole dedicated GPU to look after as well.

Even with interfaces like OpenCL, it’s difficult to get this business right. If you’ve been thinking about your problem as objects, then each object has its own little part of the data – but now you need to get that information into a layout that’ll be efficient for doing the GPU work, then actually do the copy, then copy the results back from the GPU memory…is doing all of that worth it?
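To make the dance concrete, here’s a heavily condensed sketch of it in OpenCL’s C API, with all error handling omitted and a deliberately trivial, hypothetical kernel:

    #include <CL/cl.h>

    static const char *src =
        "__kernel void scale(__global float *data, float factor) {\n"
        "    data[get_global_id(0)] *= factor;\n"
        "}\n";

    void scale_on_gpu(float *host_data, size_t n, float factor)
    {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

        /* Copy the data out of your objects into a GPU-friendly buffer... */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    n * sizeof(float), host_data, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel kernel = clCreateKernel(prog, "scale", NULL);
        clSetKernelArg(kernel, 0, sizeof(buf), &buf);
        clSetKernelArg(kernel, 1, sizeof(factor), &factor);

        /* ...do the GPU work... */
        clEnqueueNDRangeKernel(q, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);

        /* ...then copy the results back from GPU memory. */
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float),
                            host_data, 0, NULL, NULL);

        clReleaseMemObject(buf);
        clReleaseKernel(kernel);
        clReleaseProgram(prog);
        clReleaseCommandQueue(q);
        clReleaseContext(ctx);
    }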

For almost all applications, the answer is no. For almost no applications, the answer is occasionally. For a tiny number of applications, the answer is most of the time, but if you’re writing one of those then you’re a scientist or a data “scientist” and probably not going to get much value out of a deskside workstation anyway.

What’s needed for that middle tier of applications is the tools – by which I mostly mean the libraries – to deal with this problem when it makes sense. You don’t need visualisations that say “hey, if you learned a different programming language and technique and then applied it to this little inner loop you could get a little speed boost for the couple of seconds that one percent of users will use this feature every week” – you need implementations that notice that and get on with it anyway.

The Mac Pro is, in that sense, the exact opposite of the Macintosh. Back in the 1980s, the Smalltalk software was ready well before there was any hardware that could run it well, and the Macintosh was a thing that took this environment that could be seen to have value, and made it kind of work on real hardware. Conversely, the Mac Pro was ready well before there was any software that could make use of it, and that’s a harder sell. The fact that, four years later, this is still true, makes it evident that it’s either difficult or not worth the effort to try to push the kind of tools and techniques necessary to efficiently use Mac Pro-style hardware into “the developer ecosystem”. Yes, there are niches that make very good use of them, but everybody else doesn’t and probably can’t.

Working Effectively with Legacy Code

I gave a talk to my team at ARM today on Working Effectively with Legacy Code by Michael Feathers. Here are some notes I made in preparation, which are somewhat related to the talk I gave.

This may be the most important book a software developer can
read. Why? Because if you don’t, then you’re part of the problem.

It’s obviously a lot easier and a lot more enjoyable to work on
greenfield projects all the time. You get to choose this week’s
favourite technologies and tools, put things together in the ways that
suit you now, and make progress because, well anything is progress
when there’s nothing there already. But throwing away an existing
system and starting from scratch makes it easy to throw away the
lessons learned in developing that system. It may be ugly, and patched
up all over the place, but that’s because each of those patches was
needed. They each represent something we learned about the product
after we thought we were done.

The new system is much more likely to look good from the developer’s
perspective, but what about the users’? Do they want to pay again
for development of a new system when they already have one that mostly
works? Do they want to learn again how to use the software? We have
this strange introspective notion that professionalism in software
development means things that make code look good to other coders:
Clean Code, “well-crafted” code. But we should also have some
responsibility to those people who depend on us and who pay our way,
and that might mean taking the decision to fix the mostly-working
thing.

A digression: Lehman’s Laws

Manny Lehman identified three different categories of software system:
those that are exactly specified, those that implement
well-understood procedures, and those that are influenced by the
environment in which they run. Most software (including ours) comes
into that last category, and as the environment changes so must the
software, even if there were no (known) problems with it at an earlier
point in its evolution.

He expressed Laws governing the evolution of software systems,
which describe how the requirements for new development are in conflict
with the forces that slow down maintenance of existing systems. I’ll
not reproduce the full list here, but for example on the one hand the
functionality of the system must grow over time to provide user
satisfaction, while at the same time the complexity will increase and
perceived quality will decline unless it is actively maintained.

Legacy Code

Michael Feathers’s definition of legacy code is code without tests. I’m
going to be a bit picky here: rather than saying that legacy code is
code with no tests, I’m going to say that it’s code with insufficient
tests. If I make a change, can I be confident that I’ll
discover the ramifications of that change?

If not, then it’ll slow me down. I even sometimes discard changes
entirely, because I decide the cost of working out whether my change
has broken anything outweighs the interest I have in seeing the change
make it into the codebase.

Feathers refers to the tests as a “software vice”. They clamp the
software into place, so that you can have more control when you’re
working on it. Tests aren’t the only tools that do this: assertions
(and particularly Design by Contract) also help pin down the software.

How do I test untested code?

The apparent way forward then when dealing with legacy code is to
understand its behaviour and encapsulate that in a collection of unit
tests. Unfortunately, it’s likely to be difficult to write unit tests
for legacy code, because it’s all tightly coupled, has weird and
unexpected dependencies, and is hard to understand. So there’s a
catch-22: I need to make tests before I make changes, but I need to
make changes before I can make tests.

Seams

Almost the entire book is about resolving that dilemma, and contains a
collection of patterns and techniques to help you make low-risk
changes to make the code more testable, so you can introduce the tests
that will help you make the high-risk changes. His algorithm is:

  1. identify the “change points”, the things that need modifying to
    make the change you have to make.
  2. find the “test points”, the places around the change points where
    you need to add tests.
  3. break dependencies.
  4. write the tests.
  5. make the changes.

The overarching model for breaking dependencies is the “seam”. It’s a
place where you can change the behaviour of some code you want to
test, without having to change the code under test itself. Some examples:

  • you could introduce a constructor argument to inject an object
    rather than using a global variable
  • you could add a layer of indirection between a method and a
    framework class it uses, to replace that framework class with a
    test double
  • you could use the C preprocessor to redefine a function call to use
    a different function (see the sketch after this list)
  • you can break an uncohesive class into two classes that collaborate
    over an interface, to replace one of the classes in your tests
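As a sketch of the preprocessor seam mentioned above (all the file and function names are hypothetical), a test can redefine a function before textually including the code under test:

    /* network.c — production code that calls send() directly. */
    #include <sys/socket.h>

    ssize_t send_message(int sock, const char *msg, size_t len)
    {
        return send(sock, msg, len, 0);
    }

    /* test_network.c — redefine send() before including the code under
       test, so the test controls the dependency. */
    #include <stdio.h>
    #include <sys/types.h>

    static size_t bytes_sent;

    static ssize_t fake_send(int sock, const void *buf, size_t len, int flags)
    {
        (void)sock; (void)buf; (void)flags;
        bytes_sent += len;
        return (ssize_t)len;
    }

    #define send fake_send
    #include "network.c"   /* the seam: send() now means fake_send() */
    #undef send

    int main(void)
    {
        send_message(0, "hello", 5);
        printf("%s\n", bytes_sent == 5 ? "PASS" : "FAIL");
        return bytes_sent != 5;
    }

The point of the seam is that network.c itself is unchanged; only the test build sees the substitution.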

Understanding the code

The important point is that whatever you, or someone else, thinks
the behaviour of the code should be, your customers have paid for the
behaviour that’s actually there, and so that (modulo bugs) is the thing
you should preserve.

The book contains techniques to help you understand the existing code
so that you can get those tests written in the first place, and even
find the change points. Scratch refactoring is one technique: look
at the code, change it, move bits out that you think represent
cohesive functions, delete code that’s probably unused, make notes in
comments…then just discard all of those changes. This is like Fred
Brooks’s recommendation to “plan to throw one away”: you can take what
you learned from those notes and refactorings and go in again with a
more structured approach.

Sketching is another technique recommended in the book. You can draw
diagrams of how different modules or objects collaborate, and
particularly draw networks of what parts of the system will be
affected by changes in the part you’re looking at.

Build systems are a huge annoyance

Take Smalltalk. Do I have an object in my image? Yes? Well I can use it. Does it need to do some compilation or something? I have no idea, it just runs my Smalltalk.

Take Python. Do I have the Python code? Yes? Well I can use it. Does it need to do some compilation or something? I have no idea, it just runs my Python.

Take C.

Oh my God.

C is portable, and there are portable operating system interface specifications for the system behaviour accessible from C, but in practice you still need C sources that are specific to the platform you’re building for. So you have a tool like autoconf or cmake that tests how to edit your sources to make them actually work on this platform, and performs those changes. The outputs from them are then fed into a thing that takes C sources and constructs the software.

What you want is the ability to take some C and use it on a computer. What C programmers think you want is a graph of the actions that must be taken to get from something that’s nearly C source to a program you can use on a computer. What they’re likely to give you is a collection of things, each of which encapsulates part of the graph, and not necessarily all that well. Like autoconf and cmake, mentioned above, which do some of the transformations, but not all of them, and leave it to some other tool (in the case of cmake, your choice of some other tool) to do the rest.

Or look at make, which is actually entirely capable of doing the graph thing well, but frequently not used as such, so that make all works but making any particular target depends on whether you’ve already done other things.
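For the record, make used properly just states each target’s real prerequisites, so building any individual target works on its own. A hypothetical two-file program (recipe lines must be indented with tabs):

    CC     = cc
    CFLAGS = -Wall -O2

    # Each target lists everything it actually depends on, so make can
    # rebuild exactly what an edit invalidates and nothing else.
    program: main.o util.o
    	$(CC) $(CFLAGS) -o program main.o util.o

    main.o: main.c util.h
    	$(CC) $(CFLAGS) -c main.c

    util.o: util.c util.h
    	$(CC) $(CFLAGS) -c util.c

    .PHONY: clean
    clean:
    	rm -f program main.o util.o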

Now take every other programming language. Thanks to the ubiquity of the C run time and the command shell, every programming language needs its own build system named [a-z]+ake that is written in that language, and supplies a subset of make’s capabilities but makes it easier to do whatever it is that needs to be done by that language’s tools.

When all you want is to use the software.

Tsundoku

I only have the word of the internet to tell me that Tsundoku is the condition of acquiring new books without reading them. My metric for this condition is my list of books I own but have yet to read:

  • the last three parts of Christopher Tolkien’s The History of Middle-earth
  • Strategic Information Management: Challenges and Strategies in Managing Information Systems
  • Hume’s Enquiries Concerning the Human Understanding
  • Europe in the Central Middle Ages, 962-1154
  • England in the Later Middle Ages
  • Bertrand Russell’s The Problems of Philosophy
  • John Stuart Mill’s Utilitarianism and On Liberty (two copies, different editions, because I buy and read books at different rates)
  • A Song of Stone by Iain Banks
  • Digital Typography by Knuth
  • Merchant and Craft Guilds: A History of the Aberdeen Incorporated Trades
  • The Indisputable Existence of Santa Claus
  • Margaret Atwood’s The Handmaid’s Tale

And those are only the ones I want to read and own (and I think that list is incomplete – I bought a book on online communities a few weeks ago and currently can’t find it). Never mind the ones I don’t own.

And this is only about books. What about those side projects, businesses, hobbies, blog posts and other interests I “would do if I got around to it” and never do? Thinking clearly about what to do next and keeping expectations consistent with what I can do is an important skill, and one I seem to lack.

Full Stack

A full-stack software engineer is someone who is comfortable working at any layer, from code and systems through team members to customers.

FOSDEM

My current record of FOSDEM attendance sees me there once per decade: my first visit was in 2007 and I’m having breakfast in my hotel at the end of my second trip. I should probably get here more often.

Unlike a lot of the corporate conferences I’ve been to in other fields, FOSDEM is completely free and completely organised by its community. An interesting effect of this is that while there’s no explicit corporate presence, you’ll see companies represented if they actually support free and open source software as much as they claim. Red Hat doesn’t have a stand, but you can pick up business cards from the folks at CentOS, Fedora, GNOME, ManageIQ…

When it comes to free software, I’m a jack of many trades and a master of none. I have drive-by commits in a few different projects including FreeBSD and clang, and recently launched the GNUstep developer guide to add some necessary documentation, but am an expert nowhere.

That makes FOSDEM an exciting selection box of new things to learn, many of which I know little or nothing about. That’s a great situation to be in; it’s also unsurprising that I know so little, as I’ve only been working with free software (indeed, any software) for a little over a decade.

Rust project organisation

Coercion over configuration.

The package management paradox

There has been no need to build a new package management system since CPAN, and yet npm is the best.
Wait, what?

Every time a new programming language or framework is released, people seem to decide that:

  1. It needs its own package manager.

  2. Simple algorithms need to be rewritten from scratch in “pure” $language/framework and distributed as packages in this package manager.

This is not actually true. Many programming languages – particularly many of the trendy ones – have a way to call C functions, and a way to expose their own routines as C functions. Even C++ has this feature. This means that you don’t need any new packaging system: if you can deploy packages that expose C functions (whatever the implementation language) then you can use existing code, and you don’t need to rewrite everything.
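As a sketch of what that looks like from the C side (the library and function are hypothetical), a header wrapped in the usual extern "C" guard exposes an interface that any language with a C FFI can bind to:

    /* editdist.h — a hypothetical routine exposed as a C function,
       callable from C, C++, and anything with a C foreign-function
       interface. */
    #ifndef EDITDIST_H
    #define EDITDIST_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* Levenshtein distance: exactly the kind of simple algorithm that
       keeps getting rewritten "pure" in each new language. */
    unsigned int edit_distance(const char *a, const char *b);

    #ifdef __cplusplus
    }
    #endif

    #endif /* EDITDIST_H */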

So there hasn’t been a need for a packaging system since at least CPAN, maybe earlier.

On the other hand, npm is the best packaging system ever because people actually consume existing code with it. It’s huge, there are tons of libraries, and so people actually think about whether this thing they’re doing needs new code or the adoption of existing code. It’s the realisation of the OO dream, in which folks like Brad Cox said we’d have data sheets of available components and we’d pull the components we need and bind them together in our applications.

Developers who use npm are just gluing components together into applications, and that’s great for software.

On the fitness for purpose of a software model

In which the quantity 1/"booleans per module" is proposed as a software quality metric, and readers are left hanging.

New project: the GNUstep developer guide

I discovered by searching the interwebs that a significant number of people who try out GNUstep get stuck at the “I wanted to do Objective-C on my Linux so I installed GNUstep…now what?” stage. There are some tutorials for GNUstep around, but they’re not necessarily easy to find, and not necessarily pitched at beginners. Otherwise, you’re told to look at the Cocoa documentation, and as Xcode’s user interface turned into a combine harvester, Apple moved to Swift, and other changes happened, the relevance of Apple’s documentation to GNUstep has been on the wane for years.

Therefore today I’m launching the GNUstep Developer Guide. It’s not yet pretty, it’s not yet complete, but it is a place to look for GNUstep documentation written for GNUstep programmers. The first guide is up: the introduction to ProjectCenter and GORM.

Let me know if you find it useful!