Two books

A member of a mailing list I’m on recently asked: what two books should be on every engineer’s bookshelf? Here’s my answer.

Many software engineers, the ones described toward the end of Code Complete 2, would benefit most from Donald Knuth’s The Art of Computer Programming and Computers and Typesetting. It is truly astounding that one man has contributed so comprehensively to the art of variable-height monitor configurations.

If, to misquote Bill Hicks, “you’ve got yourself a reader”, then my picks are coloured by the fact that I’ve been trying to rehabilitate Object-Oriented Design for the last few years, by re-introducing a couple of concepts that got put aside over the recent decades:

  1. Object orientation; and
  2. Design.

With that in mind, my two recommendations are the early material from that field that I think shows the biggest divergence in thinking. Readers should be asking themselves “are these two authors really writing about the same topic?”, “where is the user of the software system in this book?”, “who are the users of the software system in this book?”, and “do I really need to choose one or other of these models, why not both or bits of both?”

  1. “Object-Oriented Programming: an evolutionary approach” by Brad Cox (there is another edition with Andrew Novobilski as a co-author). Cox’s model is the npm/CPAN model: programmers make objects (“software ICs”), describe their characteristics in a data sheet, and publish them in a catalogue. Integrators choose likely-looking objects from the catalogue and assemble an application out of them.

  2. “Object-Oriented Software Construction” by Bertrand Meyer. Meyer’s model is the “software engineering” model: work out what the system should do, partition that into “classes” based on where the data should naturally live, and design and build those classes. In designing the classes, pay particular attention to the expectations governing how they communicate: the ma as Alan Kay called the gaps between the objects.

Concurrent objects and SCOOP

Representing concurrency in an object-oriented system has been a long-standing problem. Encapsulating the concurrency primitives via objects and methods is easy enough, but doesn’t get us anywhere. We still end up composing our programs out of threads and mutexes and semaphores, which is still hard.

Prior Art

It’s worth skimming the things that I’ve written about here: I’ve put quite a lot of time into this concurrency problem and have written different models as my understanding changed.

Clearly, this is a broad problem. It’s one I’ve spent a lot of time on, and have a few satisfactory answers to if no definitive answer. I’m not the only one. Over at codeotaku they’ve concluded that Fibers are the right solution, though apparently on the basis of performance.

HPC programs are often based on concurrent execution through message passing, though common patterns keep it to a minimum: the batch processor starts all of the processes, each process finds its node number, node 0 divvies up the work to all of the nodes (a message send), then they each run through their part of the work on their own. Eventually they get their answer and send a message back to node 0, and when it has gathered all of the results everything is done. So really, the HPC people solve this problem by avoiding it.

You’re braining it wrong, Graham

Many of these designs try to solve for concurrency in quite a general way. Serial execution is a special case, where you only have one object or you don’t submit commands to the bus. The problem with this design approach, as described by Bertrand Meyer in his webinar on concurrent Object-Oriented Programming, is that serial execution is the only version we really understand. So designing for the general, and hard-to-understand, case means that generally we won’t understand what’s going on.

The reason he says this is so is that we’re better at understanding static relationships between things than the dynamic evolution of a system. As soon as you have mutexes and condition locks and so on, you are forced to understand the dynamic behaviour of the system (is this lock available? Is this condition met?). Worse: you have to understand it holistically (can anything that’s going on at the moment have changed this value?).

Enter SCOOP

Meyer’s proposal is that as serial programs are much easier to understand (solved, one might say, if one has read Dijkstra’s A Discipline of Programming) we should make our model as close to serial programming as possible. Anything that adds concurrency should be unsurprising, and not violate any expectations we had if we tried to understand our program as a sequential process.

He introduced SCOOP (Simple Concurrent Object-Oriented Programming) developed by the Concurrency Made Easy group at ETH Zürich and part of Eiffel. Some of the design decisions he presented:

  • a processor is an abstraction representing sequential execution
  • there is a many-to-one mapping of objects to processors (this means that an object’s execution is always serial, and that all objects are effectively mutexes)
  • where an object messages another on a different processor, commands will be asynchronous (but executed in order) and queries will be synchronous
  • processors are created dynamically and opportunistically (i.e. whenever you create an object in a “separate” and as-yet unpopulated domain)

An implementation of this concurrency model in Objective-C is really easy. A proxy object representing the domain separation intercepts messages, determines whether they are commands or queries and arranges for them to be run on the processor. It inspects the objects returned from methods, introducing proxies to tie them to the relevant processor. In this implementation a “processor” is a serial operation queue, but it could equivalently be a dedicated thread, a thread pulled from a pool, a dedicated CPU, or anything else that can run one thing at a time.

This implementation does not yield all of the stated benefits of SCOOP. Two in particular:

  1. The interaction of SCOOP with the Eiffel type system is such that while a local (to this processor) object can be referred to through a “separate” variable (i.e. one that potentially could be on a different processor), it is an error to try to use a “separate” object directly as if it were local. I do not see a way, in either Swift’s type system or ObjC’s, to maintain that property. It looks like this proposal, were it to cover generic or associated types, would address that deficiency.

  2. SCOOP turns Eiffel’s correctness preconditions into wait conditions. A serial program will fail if it tries to send a message without satisfying preconditions. When the message is sent to a “separate” object, this instead turns into a requirement to wait for the precondition to be true before execution.

Conclusions

Meyer is right: concurrent programming is difficult, because we are bad at considering all of the different combinations of states that a concurrent system can be in. A concurrent design can best be understood if it is constrained to be mostly like a serial one, and not require lots of scary non-local comprehension to understand the program’s behaviour. SCOOP is a really nice tool for realising such designs.

This is something I can help your team with! As you can see, I’ve spent actual years understanding and thinking about software concurrency, and while I’m not arrogant enough to claim I have solved it I can certainly provide a fresh perspective to your team’s architects and developers. Book an office hours appointment and let’s take a (free!) hour to look at your concurrency problems.

Microservices for the Desktop

In OOP the Easy Way, I make the argument that microservices are a rare instance of OOP done well:

Microservice adopters are able to implement different services in different technologies, to think about changes to a given service only in terms of how they satisfy the message contract, and to independently replace individual services without disrupting the whole system. This […] sounds a lot like OOP.

Microservices are an idea from service-oriented architecture (SOA) in which each application—each microservice—represents a distinct bounded context in the problem domain. If you’re a movie theatre complex, then selling tickets to people is a very different thing from showing movies in theatres, that are coupled loosely at the point that a ticket represents the right to a given seat in a given theatre at a given showing. So you might have a microservice that can tell people what showings there are at what times and where, and another microservice that can sell people tickets.

People who want to write scalable systems like microservices, because they can scale different parts of their application separately. Maybe each franchisee in the theatre chain needs one instance of one service, but another should scale as demand grows, sharing a central resource pool.

Never mind all of that. The real benefit of microservices is that they make boundary-crossing more obvious, maybe even more costly, and as a result developers think about where the boundaries should be. The “problem” with monolithic (single-process) applications was never, really, that the deployment cost too much: one corollary of scale is that you have more customers. It was that there was no real enforcement of separate parts of the problem domain. If you’ve got a thing over here that needs that data over there, it’s easy to just change its visibility modifier and grab it. Now this thing and that thing are coupled, whoops!

When this thing and that thing are in separate services, you’re going to have to expose a new endpoint to get that data out. That’s going to make it part of that thing’s public commitment: a slightly stronger signal that you’re going down a complex path.

It’s possible to take the microservices idea and use it in other contexts than “the backend”. In one Cocoa app I’m working on, I’ve taken the model (the representation in objects of the problem I’m solving) and put it into an XPC Plugin. XPC is a lot like old-style Distributed Objects or even CORBA or DCOM, with the exception that there are more safety checks, and everything is asynchronous. In my case, the model is in Objective-C in the plugin, and the application is in Swift in the host process.

“Everything is asynchronous” is a great reminder that the application and the model are communicating in an arm’s-reach fashion. My model is a program that represents the domain problem, as mentioned before. All it can do is react to events in the problem domain and represent the changes in that domain. My application is a reification of the Cocoa framework to expose a user interface. All it can do is draw stuff to the screen, and react to events in the user interface. The app and the model have to collaborate, because the stuff that gets drawn should be related to the problem, and the UI events should be related to desired changes in the domain. But they are restricted to collaborating over the published interface of the XPC service: a single protocol.

XPC was designed for factoring applications, separating the security contexts of different components and giving the host application the chance to stay alive when parts of the system fail. Those are valid and valuable benefits: the XPC service hosting the model only needs to do computation and allocate memory. Drawing (i.e. messaging the window server) is done elsewhere. So is saving and loading. And that helps enforce the contract, because if I ever find myself wanting to put drawing in the model I’m going to cross a service boundary, and I’m going to need to think long and hard about whether that is correct.

If you want to talk more about microservices, XPC services, and how they’re different or the same, and how I can help your team get the most out of them, you’re in luck! I’ve recently launched the Labrary—the intersection of the library and the laboratory—for exactly that purpose.

OOP the Easy Way: now 100% complete

Hello readers, part 3, the final part of the “OOP the Easy Way” journey, has now been published at Leanpub! Thanks for joining me along the way! As ever, corrections, questions, and comments are welcome (you can comment here if you like), and as ever, readers who buy the book now will receive free updates for the lifetime of the book. While there’s nothing new to add, this means that corrections and expansions will be free to all readers.

If you enjoy OOP the Easy Way or found it informative (or maybe even both), please recommend it to your friends, colleagues and followers. It’d be great if they could enjoy it, be informed by it, or both, too!

Book update: OOP the Easy Way

Obejct-Oriented Programming the Easy Way gets ever closer, as the first part (of three) is now substantively complete. If you have been holding off from buying the book, now would be a great opportunity to jump in, as a whole part of the book’s argument is now laid out. As ever, your feedback is welcome, and readers who buy now will get free updates throughout the development of the book.

OOP the Easy Way

It’s still very much a work in progress, but OOP the Easy Way is now available to purchase from Leanpub (a free sample is also available from the book’s Leanpub page). Following the theme of my conference talks and blog posts over the last few years, OOP the Easy Way starts with an Antithesis, examining the accidental complexity that has been accumulating under the banner of Object-Oriented Programming for nearly four decades. This will be followed by a Thesis, constructing a model of software objects from the essential core, then a Synthesis, investigating problems that remain unsolved and directions that remain unexplored.

At this early stage, your feedback on the book is very much welcome and will help yourself and fellow readers to get more from the book. You will automatically get updates for free as they are published through Leanpub.

I hope you enjoy OOP the Easy Way!

Or maybe, because we want to

How (and Why) Developers Use the Dynamic Features of Programming Languages: The Case of Smalltalk is an interesting analysis of the reality of dynamic programming in Smalltalk (Squeak and Pharo, really). Taking the 1,000 largest projects on SqueakSource, the authors quantitatively examine the use of dynamic features in projects and qualitatively consider why they were adopted.

The quantitative analysis is interesting: unsurprisingly a small number (under 1.8%) of methods use dynamic features, but they are spread across a large number of projects. Applications make up a large majority of the projects analysed, but only a small majority of the uses of dynamic features. The kinds of dynamic features most commonly used are those that are also supplied in “static” languages like Java (although one of the most common is live compilation).

The qualitative analysis comes from a position of extreme bias: the poor people who use dynamic features of Smalltalk are forced to do so through lack of alternatives, and pity the even poorer toolsmiths and implementors whose static analysis, optimisation and refactoring tools are broken by dynamic program behaviour! Maybe we should forgot that the HotSpot optimisation tools in Java come from the Smalltalk-ish Self environment, or that the very idea of a “refactoring browser” was first explored in Smalltalk.

This quote exemplifies the authors’ distaste for dynamic coding:

Even if Smalltalk is a language where these features are comparitively easier to access than most programming languages, developers should only use them when they have no viable alternatives, as they significantly obfuscate the control flow of the program, and add implicit dependencies between program entities that are hard to track.

One of the features of using Object-Oriented design is that you don’t have to consider the control flow of a program holistically; you have objects that do particular things, and interesting emergent behaviour coming from the network of collaboration and messages passed between the objects. Putting “comprehensible control flow” at the top of the priority list is the concern of the structured programmer, and in that situation it is indeed convenient to avoid dynamic rewriting of the program flow.

I have indeed used dynamic features in software I’ve written, and rather than bewailing the obfuscation of the control flow I’ve welcomed the simplicity of the solution. Looking at a project I currently have open, I have a table data source that uses the column identifier to find or set a property on the model object at a particular row. I have a menu validation method that builds a validation selector from the menu item’s action selector. No, a static analysis tool can’t work out easily where the program counter is going, but I can, and I’m more likely to need to know.

Inheritance still doesn’t make any sense

Some ideas based on feedback to the Why inheritance never made any sense:

Feedback: Subtypes are necessary

The only one of these that is practically workable is behaviour inheritance <=> subtype inheritance: I’m sorry that you were exposed to Java at such an impressionable age. The compilers of languages like Java enable subclass = subtype, by automatically assuming that a subclass is is a valid value for variable binding, for example. However they do nothing to ensure subclass = subtype. This is valid C#, a language very like Java for this discussion:

namespace QuickTestThing
{
    class Class1
    {
        override public string ToString()
        {
            return "Class1";
        }
    }
    class Class2 : Class1
    {
        public override string ToString()
        {
            throw new Exception();
        }
    }
}

Now is Class2 a subtype of Class1? Does the compiler let you pretend that it is?

You don’t even need inheritance

As discussed in the original post, the whole “favour composition over inheritance” movement gets by fine with no inheritance. Composition and delegation (I don’t know about this message, I’ll forward it to someone who does) let you get the same behaviour.

Feedback: build it yourself

Can I demonstrate a language that has all three of subtype inheritance, behaviour inheritance, and categorical inheritance as distinct language features? Yes, but I would need to learn Racket first. I’m on it.

But in the meantime, re-read the “you don’t even need inheritance” paragraph and think about how you would build each of those three ideas out of delegation.