The balloon goes up

To this day, many Smalltalk projects have a hot air balloon in their logo. These reference the cover of the August 1981 issue of Byte Magazine, in which Smalltalk-80 was shared with the wider programming community.

A hot air balloon bearing the word "Smalltalk" sails over a castle on a small island.

Modern Smalltalks all have a lot in common with Smalltalk-80. Why? If you compare Smalltalk-72 with Smalltalk-80 there’s a huge amount of evolution. So why does Cincom Smalltalk or Amber Smalltalk or Squeak or even Pharo still look quite a lot like Smalltalk-80?

My answer is: because they are used. It’s Alan’s answer, too:

Basically what happened is this vehicle became more and more a programmer’s vehicle and less and less a children’s vehicle—the version that got put out, Smalltalk ’80, I don’t think it was ever programmed by a child. I don’t think it could have been programmed by a child because it had lost some of its amenities, even as it gained pragmatic power.

So the death of Smalltalk in a way came as soon as it got recognized by real programmers as being something useful; they made it into more of their own image, and it started losing its nice end-user features.

I think there are two different things you want from a programming language (well, programming environment, but let’s not split tree trunks). Referencing the ivory tower on the Byte cover, let’s call the two schools “academic” and “industrial”.

The industrial ones are out there, being used to solve problems. They need to be stable (some of these problems haven’t changed much in decades) and easy to understand (the people have changed). They don’t need to be exciting; they just need to work. Cobol and Fortran are great in this regard, as are C and, to some extent, C++: you take code written a bajillion years ago, build it, and it works.

The academic ones are where the new ideas get tried out. They should enable experimentation and excitement first, and maybe be easy to understand (but if you need to be an expert in the idea you’re trying out, that’s not so bad).

So the industrial and academic languages have conflicting goals. Trying to achieve both goals in one place sets up a bad feedback loop:

  • the people who have used the language as a tool to solve problems won’t appreciate it if new ideas come along that mean they have to work to get their solution building or running correctly again.
  • the people who have used the language as a tool to explore new ideas won’t appreciate it if backwards compatibility hamstrings the ability to extend in new directions.

Unfortunately at the moment a lot of languages are used for both, which leads to them being mediocre at both. The new “we’ve done C but betterer” languages like Go and Rust have both people who want to add new features and people who want existing stuff not to break. JavaScript is a mess of transpilation, shims, polyfills, and other words that mean “try to use a language, bearing in mind that nobody agrees how it’s supposed to work”.

Here are some patterns for managing the distinction that have been tried in the past:

  • metaprogramming. Lisp in particular is great at having a little language that you can use to solve your problems, and that you can also use to make new languages or to make the world work differently, to see what that would be like. Of course, if you can change the world then you can break the world, and Lisp isn’t great at making it clear where the line is between following the rules and writing new ones.
  • pragmas. Haskell in particular is great at having a core language that people understand and use to write software, and a zillion flags that enable different features that one person pursued in their PhD that one time. Not all of the flag combinations may be that great, and it might be hard to know which things work well and which worked well enough to get a dissertation out of. But these are basically the “enable academic mode” settings, anyway.
  • versions. Perl and Python both ran for years with a version x that was the safe, stable, industrial language, and a version y (it’s not x+1: Python’s parallel versions were 2 and 3000) in which people could explore extensions, removals, or other changes in potentially breaking ways. At some point, each project was happy with the choices, and declared the new version “ready” and available for industrial use. This involved some translation from version x, which wasn’t necessarily straightforward (though in the case of Python the difficulty was commonly overblown, so people avoided going from 2 to 3 even when it was easy). People being what they are, they set a lot of store by version numbers, so some didn’t like that folks were recommending x when a clearly newer y was available.
  • FFIs. You can call industrial C89 code (which still works after three decades) from pretty much any academic language you care to invent; there’s a sketch of this after the list. If you build a JVM language, it can do what it wants, and still call Java code.

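As a minimal sketch of that FFI point (my example, not one from the list above): Swift, a language younger than much of the C code it can call, reaches the C standard library’s qsort(3) directly through its C FFI.

    import Foundation  // brings in the C standard library (use Glibc on Linux)

    // Sort an array of Int32 with C's qsort(3), called through Swift's FFI.
    var numbers: [Int32] = [42, 7, 19, 3]

    numbers.withUnsafeMutableBufferPointer { buffer in
        qsort(buffer.baseAddress, buffer.count, MemoryLayout<Int32>.stride) { a, b in
            // A C-convention comparator: negative, zero, or positive, like strcmp(3).
            let x = a!.load(as: Int32.self)
            let y = b!.load(as: Int32.self)
            return x < y ? -1 : (x == y ? 0 : 1)
        }
    }

    print(numbers)  // [3, 7, 19, 42]

The thirty-year-old industrial code doesn’t care which academic language is on the other end of the call.
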
Anyway, I wonder whether that distinction between academic and industrial might be a good one to strengthen. If you make a new programming language project and try to get “users” too soon, you could lose the ability to take the language where you want it to go. And based on the experience of Smalltalk, too soon might be within the first decade.

Image

I love my Testsphere deck, from Ministry of Testing. I’ve twice seen Riskstorming in action, and the first time that I took part I bought a deck of these cards as soon as I got back to my desk.

I’m not really a tester, though I have really been a tester in the past. I still fall into the trap of thinking that I set out to make this thing do a thing, I have made it do a thing, therefore I am done. I’m painfully aware when metacognating that I am definitely not done at that point, but back “in the zone” I get carried away by success.

One of the reasons I got interested in Design by Contract was the false sense of “done” I feel when TDDing. I thought of a test for whether this thing works. I made it pass the test. Therefore this thing works? Well, no: how can I keep the same workflow and speed of progress, but improve my confidence in the statement?

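Contracts are one answer, and they fit into the same red-green workflow. A minimal sketch in Swift (my example; Swift has preconditions built in but no postcondition syntax, so a hand-rolled assert before returning stands in for one):

    // A test asserts "this works" for the examples I thought of; a contract
    // asserts it for every input that satisfies the precondition.
    func squareRoot(_ x: Double) -> Double {
        precondition(x >= 0, "squareRoot requires a non-negative argument")

        let result = x.squareRoot()

        // Postcondition, by hand: squaring the result recovers x (within tolerance).
        assert(abs(result * result - x) <= 1e-9 * max(x, 1), "postcondition violated")
        return result
    }

The TDD loop stays the same; the contract just turns “passes the tests I wrote” into a claim that gets checked on every call during development.
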
The Testsphere cards are like a collection of mnemonics for testers, and for people who otherwise find themselves wondering whether this software really works. Sometimes I cut the deck, look at the card I’ve found, and think about what it means for my software. It might make me think about new ways to test the code. It might make me think about criticising the design. It might make me question the whole approach I’m taking. This is all good: I need these cues.

I just cut the deck and found the “Image” card, which is in the Heuristics section of the deck. It says that it’s a consistency heuristic:

Is your product true to the image and reputation you or your app’s company wishes to project?

That’s really interesting. How would I test for that? OK, I need to know what success is, which means I need to know “the image and reputation [we wish] to project”. That sounds very much like a marketing thing. Back when I ran the mobile track at QCon London, Jaimee Newberry gave a great talk about finding the voice for your product. She suggested identifying a celebrity whose personality embodies the values you want to project, then thinking about your interactions with your customers as if that personality were speaking to them.

It also sounds like there’s a significant user or customer experience part to this definition. Maybe marketing can tell me what voice, tone, or image we want to suggest to our customers, but what does it mean to say that a touchscreen interface works like Lady Gaga? Is that swipe gesture the correct amount of quirky, unexpected, subversive, yet still accessible? Do the features we have built shout “Poker Face”?

We’ll be looking at user interface design, too. Graphic design. Sound design. Copyediting. The frequency of posts on the email list, and the level of engagement needed. Pricing, too: it’s no good the brochure projecting Fortnum & Mason if the menu says Five Guys.

This doesn’t seem like something I’m going to get from red to green in a few minutes in Emacs. And it’s one of a hundred cards.

Mapping software engineering tools

Despite the theory that everything can be done in software (and of course, anything that can’t be done could in principle be approximated using numerical methods, or fudged using machine learning), software engineering itself, the business of writing software, seems to be full of tools that are accepted as de facto standards yet begrudged by many of the teams using them. What’s going on? Why, if software is eating the world, hasn’t it yet found an appealing taste for the part of the world that makes software?

Let’s take a look at some examples. Jira is very popular. I found a blog post literally called Why I Love Jira. And yet other people say that Jira is an anti-pattern, a sentiment that gets reasonable levels of community support.

Jenkins is almost certainly the (“market”, though it’s free) leader among continuous delivery tools, a position it has occupied since ousting Hudson, from which it was forked. Again, it’s possible to find people extolling its virtues and people hating on it.

Lastly, for some quantitative input, the Stack Overflow 2018 survey found that Rust is the most loved language (78.9% of the developers using it want to keep doing so), while JavaScript is the most used (69.8% of respondents). From this we draw the interesting conclusion that the most popular tool in the programming language realm is not, actually, the one that wins the popularity contest.

So, weird question: why does everybody do this to themselves? And then, more specifically, why is your team doing it to itself, and what can you do about it?

My hypothesis is that all of these tools succeed because they are highly configurable. I mean, JavaScript is basically a configuration language for Chromium (don’t @ me) to solve/cause your problem. Jira’s workflows are ridiculously configurable, and if Jenkins doesn’t do what you want then you can find a plugin to do it, write a plugin to do it, or make a Groovy script that will do it.

This appeals to the desire among software engineers to find generalisations. “Look,” we say, “Jenkins is popular, it can definitely be made to do what we want, so let’s start there and configure it to our needs”.

Let’s take the opposing view for the moment. I’m going to drop the programming language example of JS/Rust, because all programming languages are, roughly speaking, entirely interchangeable. The detail is in the roughness. The argument below still applies, but requires more exposition which will inevitably lead to dissatisfaction that I didn’t cover some weird case. So, for the moment, let’s look at other tools like Jira and Jenkins.

The exact opposing view is that our project is distinct, because it caters to the needs of our customers and their (or these days, probably our) environment, and is understood and worked on by our people with our processes, which is not true for any other project. So rather than pretend that some other tool fits our needs or can be bent into shape, why don’t we build our own?

And, for our examples, building such a tool doesn’t appear to be a big deal. Using the expansive software engineering term “just”, a CD tool is “just” a way to run each step in the deployment pipeline and tell someone when a step fails. A development-tracking tool is “just” a way to list the things the team is or could be working on.

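To show how expansive that “just” is, here is roughly the whole of a CD tool under that definition, sketched in Swift (the step commands are hypothetical; everything Jenkins actually offers, from distributed workers to credentials to the plugin ecosystem, is precisely what this sketch leaves out):

    import Foundation

    // "Just" run each step in the deployment pipeline,
    // and tell someone when a step fails.
    let pipeline = ["make build", "make test", "make package", "make deploy"]

    for step in pipeline {
        let process = Process()
        process.executableURL = URL(fileURLWithPath: "/bin/sh")
        process.arguments = ["-c", step]
        do {
            try process.run()
            process.waitUntilExit()
        } catch {
            print("Could not start step '\(step)': \(error)")
            exit(1)
        }
        guard process.terminationStatus == 0 else {
            print("Step '\(step)' failed with status \(process.terminationStatus)")
            exit(1)
        }
    }
    print("Pipeline succeeded")
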
This is more or less a standard “build or buy” question, with just one level of indirection: both building and buying are actually measured in terms of time. How long would it take the team to write a new CD tool, and to maintain it? How long would it take the team to configure Jenkins, and to maintain it?

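As a rough sketch in symbols (mine, and deliberately simplistic): build your own tool when

    \[ T_{\mathrm{write}} + T_{\mathrm{maintain\,own}} \;<\; T_{\mathrm{configure}} + T_{\mathrm{maintain\,config}} \]

where each term is team time over the expected life of the project. The two right-hand terms are the interesting ones: they grow with how far your needs sit from what the tool assumes, which is exactly the distance the map below makes visible.
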
The answer should be fairly easy to consider. Let’s look at the map:

[The map: the Path of Parsimony runs straight ahead; our position x sits a short distance, labelled d, off the path.]

We are at x, of course. We are a short way from the Path of Parsimony, the happy path along which the generic tools work out of the box. That distance is marked on the map as d.

Think about how you would measure d for your team. You would consider the expectations of the out-of-the-box tool. You would consider the expectations of your team, and of your project. You would look at how those expectations differ, and try to quantify the result.

This tells you something about the gap between what the tool provides by default and what you need, which will help you quantify the amount of customisation needed (the cost of building a spur out from the Path of Parsimony to x). You can then compare that with the cost of building a tool that supports your position directly (the cost of building your own path, running through x).

But the map also suggests another option: why don’t we move from x closer to the path, and make d smaller? Which of our distinct assumptions are incidental and can be abandoned, which are essential and need to be supported, and which are historical and could be revised? Is there a way to change the context so that adopting the popular tool is cheaper?

[Left out of the map but just as important is the related question: has somebody else already charted a different path, and how far are we from that? In other words, is there a different off-the-shelf product which needs less configuration than the one we’ve picked, so the total migration-plus-configuration cost is less than sticking where we are?]

My impression is that these questions tend to get asked once at the start of a project or initiative, then not again until the team is so far away from the Path of Parsimony that they are starting to get tangled and stung by the Weeds of Woe. Teams that change tooling such as their issue trackers or CD pipeline tend to do it once the existing way is already hurting too much, and the route back to the path is no longer clear.

Figurative Programming and Gloom: the [G]raphical [LOOM]

Donald Knuth is pretty cool. One of the books he wrote that I own and have actually read[*] is Literate Programming, in which he describes (among other things) weaving program text and documentation together in a single narrative.

Two of his books that I own and have sort of dipped into here and there are TeX: the Program, and METAFONT: the Program. These are literate programs, created from webs in which human text and computer text are interleaved to tell the story of what the program does.

Human text and computer text, but not images. If you want pictures, you have to carry them around separately. Even though we are highly visual organisms, and many of the programs we produce have significant graphical components, very few programming environments treat images as anything other than external files that can be looked at and maybe previewed. The only programming environment I know of that lets you include images in program source is TempleOS.

I decided to extend the idea of the Literate web to the realm of Figurative Programming. A gloom (graphical loom) web can contain human text, computer text, and image descriptions (e.g. graphviz, plantuml, GLE…) which get included in the human-readable document as figures.

The result is gloom. It’s written in itself, so the easiest way to get started is with the Xcode project at gloomstrap, which can extract the proper gloom sources from the gloom web. Alternatively, you can dive in and read the PDF it made about itself.

Because I built gloomstrap first, gloom is really a retelling of that program in a Figurative Programming web, rather than a program that was designed figuratively. Because of that, I don’t really have experience yet of trying to design a system in gloom. My observation was that the class hierarchy I came up with in building gloomstrap didn’t always lend itself to linear storytelling for inclusion in a web. I expect that had I designed it in noweb rather than Xcode, I would have had a different hierarchy, or even no classes at all.

Similarly, I didn’t try test-firsting in gloom, nor did I port the tests I did write into the web. Instinct tells me that it would be a faff, but I will try it and find out. I think richer expressions of program intention can only be a good thing, and if Figurative Programming is not the way in which that can be done, then at least we will find out something about what to do instead.

[*] Coming up in January’s De Programmatica Ipsum: The Art of _The Art of Computer Programming_, an article about a book that I have _definitely_ read _quite a few bits here and there_ of.

Packaging software

I’ve been learning about Debian Packaging. I’ve built OS X packages, RPMs, Dockerfiles, JARs, and others, but never dpkgs, so I thought I’d give it a go.

My goal is to make a suite of GNUstep packages for Debian. There are already some in the base distribution, and while they’re pretty up to date they are based on a lot of “default” choices. So they use gcc and the GNU Objective-C runtime, which means no blocks and no modern ObjC features. The packages I’m making are, mostly thanks to overriding some choices in building gnustep-make, built using clang, the next-generation ObjC runtime, libdispatch, and so on.

The Debian packaging tools are very powerful, very well documented, and somewhat easy to use. But what really impressed me along this journey was CPack. I’ve used CMake before, on a team building a pretty big C++/Qt project. It’s great. It’s easier to understand than autoconf, makes it easier to write correct build rules over a large tree than make does, and can generate fast builds using ninja or IDE-compatible projects for Xcode, IntelliJ, and (to some extent) Eclipse.

What CPack adds is the ability to generate packages of various flavours (Darwin bundles, disk images, RPMs, DEBs, NullSoft installers, and more) from the information about the build targets. That’s really powerful.

Packaging software is a really important part of the customer experience: what is an “App Store” other than a package selection and distribution mechanism? It irks me that packaging systems are frequently either coupled to the target environment (Debian packages and OpenBSD ports are great, but only work in those systems), or baroque (indeed autoconf may have gone full-on rococo). Package builder-builders give distributors a useful respite, using a single tool to build packages that work for any of their customers.

It’s important that a CD pipeline produces the same artefacts that your customers use, and also that it consumes them: you can’t make a binary, test it, see that it works, then bundle it and ship it. You have to make a binary, bundle it, ship it, install it, then test it and see that it works. (Obviously the tests I’m talking about here are the “end-to-end”, or “customer environment” tests. You don’t wait until your thing is built to see whether your micro-tests pass, you wait until your micro-tests pass to decide whether it’s worth building the thing.)

I know that there are other build tools that also include packaging capabilities. The point is, using one makes things easier for you and for your customers. And, it turns out, CMake is quite a good choice for one.

More on UIAutomation tests

Update

The information below is mostly redundant. After I filed a bug report with Apple, their engineers determined that the Xcode-recorded set of macro actions (find a text field, double-click, enter text) wasn’t working because the double-click action wasn’t putting the text field into editing mode. It is possible to use UI Automation tests; you just have to carefully review the UI actions and check that they have the effect expected, particularly after letting Xcode record UI macros.

Original Post

Unfortunately my work to organise UIAutomation tests has hit the stumbling block that the UI Automation runner doesn’t use the main thread for main-thread-only APIs.

In Xcode 9 and High Sierra, the authors of that post I just linked found that it was possible to turn off the main thread checker in the Test configuration of the build scheme and get working tests anyway. Unfortunately that doesn’t work for me in Xcode 10 and Mojave: the main thread checker isn’t killing the app; the TSM subsystem is just refusing to do its thing. So my tests can’t do straightforward things like write text into a text field. This is an “it’s not me, it’s you” moment, and I don’t think I can carry on using Xcode’s UI tests for my goals.

However, I still want to be able to write “end-to-end” level tests to drive my development. I have (at least) three ways to proceed:

  • I could find a third party library and discover whether it has the main thread problem. Calabash doesn’t support Mac apps, and the other examples I can find (Cucumberish and TABTestKit) both rely on UI Automation, so presumably don’t address the main thread problem.
  • I could write the tests in AppleScript. That would be a good way to build up the AppleScript UI for the app, but it doesn’t represent an end-to-end test of the GUI.
  • I could write the tests using NSApplication.sendEvent(_:) to simulate clicks, scrolls and text entry, and use the unit test runner to host them. That could work, but I have my doubts (I would guess that the runner is synchronous and stalls the main thread).

I discovered that it is possible to write the test “at the UI level” but in the unit runner, using a combination of key events and AppKit API like sendAction(_:to:from:); there’s a sketch after the trade-offs below. The trade-offs of this approach:

  • it takes longer, as the abstractions needed to easily find and use AppKit controls don’t (currently) exist
  • it doesn’t use the Accessibility interface so isn’t an accessibility audit at the same time as a correctness test
  • you don’t hit the same problems as with the UI Automation runner
  • it’s much faster

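For concreteness, here’s the shape of such a test, running in the unit runner and hosted by the app (a sketch: the app, its single text field, and its action wiring are hypothetical, and finding the control by rummaging through subviews is exactly the missing abstraction from the first bullet):

    import XCTest
    import AppKit

    final class GreetingUITests: XCTestCase {
        func testTypingANameReachesTheField() {
            // Hosted in the app, so we are on the main thread with a real window.
            let window = NSApp.windows.first!
            let field = window.contentView!.subviews
                .compactMap { $0 as? NSTextField }.first!

            // Drive the UI with synthesised key events, not by setting the value.
            window.makeFirstResponder(field)
            for character in "Amy" {
                let event = NSEvent.keyEvent(with: .keyDown, location: .zero,
                    modifierFlags: [], timestamp: 0,
                    windowNumber: window.windowNumber, context: nil,
                    characters: String(character),
                    charactersIgnoringModifiers: String(character),
                    isARepeat: false, keyCode: 0)!
                NSApp.sendEvent(event)
            }

            // Fire the field's action the way pressing Return would.
            _ = NSApp.sendAction(field.action!, to: field.target, from: field)

            XCTAssertEqual(field.stringValue, "Amy")
        }
    }
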
This may be the best approach for now, though I’d welcome other views.

To become a beginner, first become an expert

We have a whole load of practices in programming that only really work well if you’re already good at whatever the practice is supposed to help with.

Scrum is a process improvement framework, but only if you already know how to do process improvement. If you don’t, then Scrum is just the baseline mini-waterfall process with a chance to air your dirty laundry every fortnight.

Agile is good at helping you embrace change, but only if you’re already good enough at managing change to understand which changes should be embraced.

#NoEstimates helps you avoid the overhead of estimates, but only if you’re already good enough at estimates to know that you always write user stories that take 0.5-2 days to implement.

TDD helps you design your APIs, but only if you’re already good enough at API design to understand things like dependency injection and loose coupling.

Microservices help you isolate modules, but only if you’re already good enough at modularity not to get swamped in HTTP calls.

This is all very well for selling consultancy (“if your [agile] isn’t working, then you aren’t [agiling] hard enough, let me [agile] you some more”) but where’s the on-ramp?

In which new developer tools are dull

Over on lobste.rs I said that I don’t hold out much hope for another “blue plane” style event in developer tools. In one of Alan Kay’s presentations, he referred to the ordinary way of things as the pink plane, with incremental advances in the state of affairs being movements in that plane. Like the square in Edwin Abbott’s Flatland that encounters a sphere, a development could take us out of the pink plane into the (orthogonal) blue plane. These blue plane ideas are rare because, like the square, it’s hard to even conceive of life outside the pink plane.

In what may just be a surprising coincidence, Apple engineers used Blue and Pink to refer to features in evolutionary and revolutionary developments of their operating system.

Software engineering tooling is, for the majority of developers, in a phase of conservative retreat.

Build UIs on the web and you probably won’t use a graphical builder, you’ll type HTML and JavaScript (and maybe JSX) into a text editor.

Build native apps and even where there is a GUI builder, you’ll find people recommending against its use and wanting to do things “programmatically” (by which they mean “through typing”, even though the GUI builder tools are another way to construct a program).

In the last couple of decades, interest in CASE tooling has shrunk to a conservative preference for text editors with some syntax highlighting, like vim or Atom. Gone even is the “build and run” button from IDEs, replaced with command-line invocations of grunt tasks (a fancy phrase meaning shell scripts), npm scripts (a fancy phrase meaning shell scripts), or rake tasks (you get the idea).

Where previously there were live development environments embedded in the deployment environment (and the JavaScript VM is almost perfectly designed for that task), there are now console.log and unit tests. The height of advanced interaction with your programming tools is the REPL (an interactive shell) or the Playground/InstaREPL (an interactive shell that echoes stdin and stdout in different places).

For the most part, and I say that to avoid the inevitable commenter who thinks that a counterexample like LabVIEW or Mathematica or that one person they met who uses Expression Blend renders the whole argument broken, developers have doubled down on the ceremony of programming: the typing of arcane text into an 80×24 character display. Now, to be fair, text is an efficient and compact graphical representation of a linear sequence of connected concepts. But it is not the only one, nor the most efficient, nor the most compact; and neither are many software systems linear.

The rewards in making software to make software are scarce.

You can do as IntelliJ does, and make a better version of the 80×24 text entry thing. You can work for a platform vendor, and make their version of the 80×24 thing. You can go and get an engineering grade 6 or above job in Silicon Valley and tell your manager that whatever it is their business does, you’re going to focus on the 80×24 thing (“at scale”) instead.

What you don’t seem to be able to do is to disrupt the 80×24 thing. It’s free (at least as in beer), it’s ubiquitous, and whether or not it’s as good as it could be it certainly seems to be good enough for the people who not only get paid to make bad software, but get paid again to fix it.

Why I don’t have a favourite programming language

This is my take on Ilya Sher’s similar post, though from a different context. He is mainly interested in systems programming; I have mostly written user apps and backend services, plus some developer tools.

I originally thought that I would write a list of the languages and the difficulties I have with them, but I realised there’s an underlying theme to be extracted. The programming languages I have used either have too much vendor dependence (I love writing ObjC, but can’t rely on GNUstep when I’m not on an Apple platform), too little interaction with the rest of the software world (I love writing Pharo, but don’t love going through its FFI to use anything else) or, and this is the biggest kicker, development environments I don’t like.

When I work on JavaScript, my environment is a text editor (something like VSCode or Emacs) that has syntax highlighting, maybe has auto-completion…and that’s about it. When I work in something like Java, ObjC or C++, I have a build button, an integrated debugger, and the ability to run tests. And, if I’m lucky, a form designer. When I work in something like Swift or Clojure, I have insta-REPLs. When I work in Pharo, I have all the live browsers and things you hear about from smug people, but I still have to type code for things you might expect to be ‘live’ in such an environment. I get confused by the version control tools, but that might be because I’m not familiar with image-based development.

It feels like, details of the languages aside, there’s a synthesis of programming language with environment, where the programming language is a tool integrated into the environment just like the compiler and debugger, and the tools are integrated into the programming language, like the Lisp macro system. It feels like environments like Oberon, Lisp machines and Smalltalks all have some of this integration, and that popular programming environments for other languages all have less of it.

I’m not entirely sure what the ideal state is, and whether that’s an ideal just for me or would benefit others. I wrote my MSc thesis on an exploration of this problem, and still have more research to do.