Episode 52: Software Freedom is a Civil Liberties Issue

Software freedom is a free speech issue. This has important consequences

Leave a comment

Why are we like this?

The recent post on addressing “technical debt” did the rounds of the usual technology forums, where it raised a reasonable question: why are people basing these decisions on balancing engineering-led with customer-led tasks on opinion? Why don’t engineers take an evidence-based approach to such choices?

The answer is complex but let’s start at the top: there’s too much money in software. There have been numerous crises in the global economy since software engineering has existed, but really the only one with any material effect on the software sector was the dot-com crash. The lesson there was “have a business plan”: plenty of companies had raised billions in speculative funding on the basis that they were on the internet but once the first couple started to fold, the investors pulled out en masse and the money was sucked from the room. This is the time that gave us Agile (constantly demonstrate that you’re delivering value to your customer), Lean Startup (demonstrate that you’re adding value with as little expenditure as possible), and Lean Software Development (eliminate all of the waste in your process).

Nobody ever demonstrated that Agile, Lean or whatever were better in some objective metric, what they did was tell convincing stories. Would you like to find out that you’ve built the wrong thing two years from now, or two weeks from now? Would you prefer to read an interim draft functional specification, or use working software? Let’s be clear though, nobody ever showed that what we were doing before that was better in any objective way either: software was written by defence contractors and electronics hardware companies, and they grandfathered in the processes used to procure and develop hardware. You can count the number of industry pundits advocating for a genuinely evidence-led approach to software cost control on two fingers (Barry Boehm and Watts Humphries) and you can still raise valid questions about the validities of either of their systems.

Since then, software teams have become less fragile to economic shock. This was already happening in the 2007 credit crunch (the downturn at the beginning of the 2007-2008 global financial crisis). The CFO where I worked explained that bookings of their subscription-based software would go up during a recession. Why? Because people were not confident enough to buy outright or to enter relatively cheap, long-term arrangements like three year contracts. They would instead take the more expensive but more flexible shorter-term contracts so that they could cancel or move if their belts needed tightening. After the crisis, the adoption of subscription-based pricing models has only increased in software, and extended to adjacent fields like media and hardware.

All of this means that there is relative stability in software businesses, and there is still growing demand for software engineers. That has meant that there isn’t the need for systematic approaches to cost-reduction hawked by every single thinker in the “software crisis” era: note that there hasn’t been significant movement beyond Agile, Lean or whatever in the subsequent two decades. They’re good enough, and there is no impetus to find out what’s better. In fact both Agile with its short increments and Lean Startup with its pivoting are optimised for the “get out quickly at any cost” flexibility that also leads customers to choose short-term subscription pricing: when the customers for your VR pet grooming business dry up you can quickly pivot to online fraud detection.

With no need to find or implement better approaches there’s also no need to particularly require software engineers to have a systematic approach or a detailed understanding of the knowledge of their industry. Thus software engineering—particularly programming—remains a craft-based discipline where anyone with an interest can start out at the bottom, learn on the job through mentoring and self-study, and use a process of survivor bias to get along. Did anyone demonstrate in 2002 that there’s objective benefit to a single-page application? Did anyone demonstrate in 2008 that there’s objective benefit to a native mobile app? Did anyone demonstrate in 2016 that there’s objective benefit to a Dapp? Has anyone crunched the numbers to resolve whether DevOps or Site Reliability Engineering is the one true way to do operations? No, but it doesn’t matter: there’s more than enough money to put into these things. And indeed most of those choices listed above are immaterial to where the money comes from or goes, but would be the sorts of “technical debt” transitions that engineering teams struggle to pay for.

You might ask why I’m at all interested in taking a systematic approach to our work when I also think it’s not necessary. Even if it isn’t necessary for survival, it’s definitely professional, and justifiable. When the clients do come to reduce their expenditure, or even when they don’t but are deciding who to go with, the people who can demonstrate that they systematically maximise the output of their work will be the preferred choice.

Posted in software-engineering | Leave a comment

When to “address” “technical debt”?

The phrase “technical debt” appears in scare quotes here because, as observed in The Unreasonable Ineffectiveness of Considering Things Harmful, technical debt has quite a specific meaning and I’m talking about something broader here. Quoting Ward Cunningham:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

Ward Cunningham, the Wycash Portfolio Management System

It’s not old code that’s technical debt, it’s lightly-designed code. That thing that seemed like the solution when you first thought of it. Yes, ship it, by all means, but be ready to very quickly rewrite it when you learn more.

Some of what people mean when they say “we need to bring our technical debt under control” is that kind of technical debt, struggling under the compound interest of if statement accrual as multiple developers have added behaviour without adding design. But there are other things. Cutting corners is not technical debt, it’s technical gambling. Updating external dependencies is not technical debt repayment, but it does still need to be done. Removing deprecated symbols is paying for somebody else’s technical debt, not yours: again you still have to do it. Replacing last month’s favoured npm modules with this month’s is not technical debt, it’s buying yourself a new toy.

But all of these things get done, and all of these things need to get done. It’s the cost of deploying a system into an evolving context (and as I’ve said before, even that act of deployment itself triggers evolution). So the question is when, how often, how much?

Some teams put their “engineering requirements”, their name for the evolution-coping tasks, onto the same backlog as the product requirements, then try to advocate for prioritising them alongside feature requests and bug fixes. Unfortunately this rarely works: the perceived benefit of the engineering activity is zero retained customers plus zero acquired customers = zero revenue, and yet it costs the same as fixing a handful of customer-reported bugs.

So, other groups just try to carve out time. Maybe it’s “20% of developer effort on the sprint is not for product tasks”. Maybe it’s “there is a week between delivering one iteration and starting the next”. Maybe it’s “whoever is on support rotation can pick up engineering tasks when there’s no fire to put out”. And the most anti- of all the patterns is the “hardening sprint”: once per quarter we’ll spend two weeks fixing the problems we’ve been making for ourselves in the intervening time. All of these have the benefit of giving a predictable cadence, though they still suffer a bit from that product envy problem: why are we paying for these engineers to do non-useful work when they could be steadily adding value?

The key point is that part about steadily adding value. We know the reason we need to do this: it’s to avoid being brought to Ward’s stand-still. We need to consolidate what we’ve learned, we need to evolve the system to adapt to evolutionary changes in its context, we need to fix past mistakes. And we need to do it constantly. Remember the quote: “Every minute spent on not-quite-right code counts as interest on that debt”.

Ultimately, these attempts to carve out time are requests to do our jobs properly, directed at people who don’t have the same motivations that we do. That’s not to say that their motivations are wrong. Like us, they only have a partial view of the overall picture. Unlike us, that view does not extend to an understanding of how expensive a liability our source code is.

When we ask for time between this iteration and the next to “service technical debt”, we are saying “I know that I’m doing a bad job, I know what I need to do to be doing a good job, and I would like to do a good job for four hours in a fortnight’s time on Friday afternoon, if that’s alright with you”. Ironically we do not end up doing a better job, we normalise doing a bad job for the next couple of weeks (and undoubtedly finding that some delivery/support/operations problem gets in the way for those four hours anyway).

I recommend to my mentees, reports, and any engineer who will listen to avoid advocating for time-boxed good work. I propose building the trust relationship where the people who need the code written are happy that the code is being written, and being written well, without feeling the need to check over our shoulders to see how the sausage is made. Then we don’t need to justify doing a good job, and certainly don’t need to ask permission: we just do it while we’re going. When someone asks how long it’ll take to do something, the answer is how long it’ll take to do properly, with all the rewriting, testing, and everything else it takes to do it properly. What they get out of the end is something worth having, that doesn’t need hardening, or 20% of the effort dedicated to patching it up.

And of course, what they get is something that will almost immediately need a rewrite.

Posted in process | 3 Comments

So that’s how it works

Back in Apple Silicon, Xeon Phi, and Amigas I asked how Apple would scale the memory up in a hypothetical Mac Pro based on the M1. We still don’t know because there still isn’t one, although now we sort of do know.

The M1 Ultra uses a dedicated interconnect allowing two (maybe more, but definitely two) M1 Max to act as a single SoC. So in an M1 Ultra-powered Mac Studio, there’ll be two M1 packages connected together, acting as if the memory is unified.

It remains to be seen whether the interconnect is fast enough that the memory appears unified, or whether we’ll start to need thread affinity APIs to say “this memory is on die 0, so please run this thread on one of the cores in die 0”. But, as predicted, they’ve gone for the simplest approach that could possibly work.

BTW here’s my unpopular opinion on the Mac Studio: it’s exactly the same as the 2013 Mac Pro (the cylinder one). Speeds, particularly for external peripherals on USB and Thunderbolt, are much faster, so people are ready to accept that their peripherals should all be outside the box. But really the power was all in using the word Studio instead of Pro, so that people don’t think this is the post-cheesegrater Mac.

Posted in AAPL, arm, Mac | Leave a comment

Episode 51: Responding to Change

Sometimes it just seems like our customers are fickle flibbertigibbets who change their minds at the drop of a hat, right? Let’s look at what might be going on, and how to work with that.

Don’t forget that you can subscribe to the newsletter to keep up to date with everything Graham Lee and software engineering on the internet!

Leave a comment

Having the right data

In the beginning there was the relational database, and it was…OK, I guess. It was based on the relational model, and allowed operations that were within the relational algebra.

I mean it actually didn’t. The usual standard for relational databases is ISO 9075, or SQL. It doesn’t really implement the relational model, but something very similar to it. Still, there is a standard way for dealing with relational data, using a standard syntax to construct queries and statements that are mathematically provable.

I mean there actually isn’t. None of the “SQL databases” you can get hold of actually implement the SQL standard accurately or in its entirety. But it’s close enough.

At some point people realised that you couldn’t wake up the morning of your TechCrunch demo and code up your seed-round-winning prototype before your company logo hit the big screen, because it involved designing your data model. So the schemaless database became popular. These let you iterate quickly by storing any data of any shape in the database. If you realise you’re missing a field, you add the field. If you realise you need the data to be in a different form, you change its form. No pesky schemata to migrate, no validation.

I mean actually there is. It’s just that the schema and the validation are the responsibility of the application code: if you add a field, you need to know what to do when you see records without the field (equivalent to the field being null in a relational database). If you realise the data need to be in a different form, you need to validate whether the data are in that form and migrate the old data. And because everyone needs to do that and the database doesn’t offer those facilities, you end up with lots of wasteful, repeated, buggy code that sort of does it.

So the pendulum swings back, and we look for ways to get all of that safety back in an automatic way. Enter JSON schema. Here’s a sample of the schema (not the complete thing) for Covid-19 cases in Global.health:

{
  bsonType: 'object',
  additionalProperties: false,
  properties: {
    location: {
      bsonType: 'object',
      additionalProperties: false,
      properties: {
        country: { bsonType: 'string', maxLength: 2, minLength: 2 },
        administrativeAreaLevel1: { bsonType: 'string' },
        administrativeAreaLevel2: { bsonType: 'string' },
        administrativeAreaLevel3: { bsonType: 'string' },
        place: { bsonType: 'string' },
        name: { bsonType: 'string' },
        geoResolution: { bsonType: 'string' },
        query: { bsonType: 'string' },
        geometry: {
          bsonType: 'object',
          additionalProperties: false,
          required: [ 'latitude', 'longitude' ],
          properties: {
            latitude: { bsonType: 'number', minimum: -90, maximum: 90 },
            longitude: { bsonType: 'number', minimum: -180, maximum: 180 }
          }
        }
      }
    }
  }
}

This is just the bit that describes geographic locations, relevant to the falsehoods we believed about countries in an earlier post. This schema is stored as a validator in the database (you know, the database that’s easier to work with because it doesn’t have validators). But you can also validate objects in the application if you want. (Actually we currently have two shadow schemas: a Mongoose document description and an OpenAPI specification, in the application. It would be a good idea to normalise those: pull requests welcome!)

Posted in software-engineering | Leave a comment

Episode 50: Organisation and Community

I look at the historical basis of the white collar/blue collar divide in defining occupations, and the problems this distinction has with comprehending modern roles like engineering and various technician occupations. I then have difficulty fitting software roles into any of those categories. This is important because it helps to understand the various commitments practitioners make to different organisations: professional societies, their employers, trade unions, technology user groups, craft guilds and so on.

Finally, there’s a call for you: what have you seen of various modes of organisation in your experience in software? Please do let me know by commenting below or emailing grahamlee@acm.org.

Leave a comment

Aphorism Considered Harmful

Recently Dan North asked the origin of the software design aphorism “make it work, make it right, make it fast”. Before delving into that story, it’s important to note that I had already heard this phrase. I don’t know where, it’s one of those things that’s absorbed into the psyche of some software engineers like “goto considered harmful”, “adding people to a late project makes it later” or “premature optimisation is the root of all evil”.

My understanding of the quote was something akin to Greg Detre’s description: we want to build software to do the right thing, then make sure it does it correctly, then optimise.

Make it work. First of all, get it to compile, get it to run, make sure it spits out roughly the right kind of output.

Make it right. It’s time to test it, and make sure it behaves 100% correctly.

Make it fast. Now you can worry about speeding it up (if you need to). […]

When you write it down like this, everyone agrees it’s obvious.

Greg Detre, “Make it work, make it right, make it fast

That isn’t what everybody thinks though, as Greg points out. For example, Henrique Bastos laments that some teams never give themselves the opportunity to “make it fast”. He interprets making it right as being about design, not about correctness.

Just after that, you’d probably discovered what kind of libraries you will use, how the rest of your code base interacts with this new behavior, etc. That’s when you go for refactoring and Make it Right. Now you dry things out and organize your code properly to have a good design and be easily maintainable.

Henrique Bastos, “The make it work, make it right, make it fast misconception

We already see the problem with these little pithy aphorisms: the truth that they convey is interpreted by the reader. Software engineering is all about turning needs into instructions precise enough that a computer can accurately and reliably perform them, and yet our knowledge is communicated in soundbites that we can’t even agree on at the human level.

It wasn’t hard to find the source of that quote. There was a special issue of Byte magazine on the C programming language in August 1983. In it, Stephen C. Johnson and Brian W. Kernighan describe modelling systems processing tasks in C.

But the strategy is definitely: first make it work, then make it right, and, finally, make it fast.

Johnson and Kernighan, “The C Language and Models for Systems Programming

This sentence comes at the end of a section on efficiency, which follows a section on “Higher-Level Models” in which the design of programs that use C structures to operate on problem models, rather than bits and words, are described. The efficiency section tells us that higher-level models can make a program less efficient, but that C gives people the tools to get close to the metal to speed up the 5% of the code that’s performance critical. That’s where they lead into this idea that making it fast comes last.

Within context, the “right” that they want us to make appears to be the design/model type of “right”, not the correctness kind of right. This seems to make sense: if the thing is not correct, in what sense are you suggesting that you have already “made it work”?

A second source, contemporary with that Byte article, seems to seal the deal. Butler Lampson’s hints deal with ideas from various different systems, including Unix but also the Xerox PARC systems, Control Data Corporation mainframes, and others. He doesn’t use the phrase we’re looking for but his Figure 1 does have “Does it work?” as a functionality problem, from which follow “Get it right” and “Make it fast” as interface design concerns (with making it fast following on from getting it right). Indeed “Get it right” is a bullet point and cautionary tale at the end of the section on designing simple interfaces and abstractions. Only after that do we get to making it fast, which is contextualised:

Make it fast, rather than general or powerful. If it’s fast, the client can program the function it wants, and another client can program some other function. It is much better to have basic operations executed quickly than more powerful ones that are slower (of course, a fast, powerful operation is best, if you know how to get it). The trouble with slow, powerful operations is that the client who doesn’t want the power pays more for the basic function. Usually it turns out that the powerful operation is not the right one.

Butler W. Lampson, Hints for Computer System Design

So actually it looks like I had the wrong idea all this time: you don’t somehow make working software then correct software then fast software, you make working software and some inputs into that are the abstractions in the interfaces you design and the performance they permit in use. And this isn’t the only aphorism of software engineering that leads us down dark paths. I’ve also already gone into why the “premature optimisation” quote is used in misguided ways, in mature optimisation. Note that the context is that 97% of code doesn’t need optimisation: very similar to the 95% in Johnson and Kernighan!

What about some others? How about the ones that don’t say anything at all? It used to be common in Perl and Cocoa communities to say “simple things simple; complex things possible”. Now the Cocoa folks think that the best way to distinguish value types from identity types is the words struct and class (not, say, value and identity) so maybe it’s no longer a goal. Anyway, what’s simple to you may well be complex to me, and what’s complex to you may well be simplifiable but if you stop at making it possible, nobody will get that benefit.

Or the ones where meanings shifted over time? I did a podcast episode on “working software over comprehensive documentation”. It used to be that the comprehensive documentation meant the project collateral: focus on building the thing for your customer, not appeasing the project office with TPS reports. Now it seems to mean any documentation: we don’t need comments, the code works!

The value in aphorisms is similar to the value in a pattern language: you can quickly communicate ideas and intents. The cost of aphorisms is similar to the cost in a pattern language: if people don’t understand the same meaning behind the terms, then the idea spoken is not the same as the idea received. It’s best with a lot of the aphorisms in software that are older than a lot of the people using them to assume that we don’t all have the same interpretation, and to share ideas more precisely.

(I try to do this for software engineers and for programmers who want to become software engineers, and I gather all of that work into my fortnightly newsletter. Please do consider signing up!)

Posted in whatevs | 3 Comments

What is software engineering?

I suppose if I’m going to have a tagline like “from programming to software engineering”, we ought to have some kind of shared understanding of what that journey entails. It would be particularly useful to agree on the destination.

The question “what is software engineering?” doesn’t have a single answer. Plenty of people have the job title “software engineer”, or work in the “engineering” department of a software company, so we could say that it’s whatever they do. But that’s not very constructive. It’s too broad: anyone who calls themselves a software engineer could be doing anything, and it would become software engineering. It’s also too narrow: the people with the job title “software engineer” are often the programmers, and there’s more to software engineering than programming. I wrote a whole book, APPropriate Behaviour, about the things programmers need to know beyond the programming; that only scratches the surface of what goes into software engineering.

Sometimes the definition of software engineering is a negative-space definition, telling us what it isn’t so that something else can fill that gap. In Software Craftsmanship: The New Imperative, Pete McBreen described that “the software engineering approach” to building software is something that doesn’t work for many organisations, because it isn’t necessary. He comes close to telling us what that approach is when he says that it’s what Watts Humphries is lamenting in Why Don’t They Practice What We Preach? But again, there are reasons not to like this definition. Firstly, it’s a No True Scotsman definition. Software engineering is whatever people who don’t do it properly, the way software craftsmen do it, do. Secondly it’s just not particularly fruitful: two decades after his book was published, most software organisations aren’t using the craftsmanship model. Why don’t they practice what he preaches?

I want to use a values-oriented definition of software engineering: software engineering is not what you do, it’s why you do what you do, and how you go about doing it. No particular practice is or isn’t software engineering, but the way you evaluate those practices and whether or not to adopt them can adopt an engineering perspective. Similarly, this isn’t methodology consultancy: the problems your team has with Agile aren’t because you aren’t Agiling hard enough and need to hire more Agile trainers. But the ways in which you reflect on and adapt your processes can be informed by engineering.

I like Shari Lawrence Pfleeger’s definition, in her book Software Engineering: The Production of Quality Software:

There may be many ways to perform a particular task on a particular system, but some are better than others. One way may be more efficient, more precise, easier to modify, easier to use, or easier to understand. Consequently, Software Engineering is about designing and developing high-quality software.

There’s a bit of shorthand, or some missing steps here, that we could fill in. We understand that of the many ways to build a software system, we can call some of them “better”. We declare some attributes that contribute to this “betterness”: efficiency, precision, ease of adaptation, ease of use, ease of comprehension. This suggests that we know what the properties of the software are, which ones are relevant, and what values a desirable system would have for those properties. We understand what would be seen as a high-quality product, and we choose to build the software to optimise for that view of quality.

The Software Engineering degree course I teach on offers a similar definition:

Software Engineering is the application of scientific and engineering principles to the development of software systems—principles of design, analysis, and management—with the aim of:

  • developing software that meets its requirements, even when these requirements change;
  • completing the development on time, and within budget;
  • producing something of lasting value—easy to maintain, re-use, and re-deploy.

So again we have this idea that there are desirable qualities (the requirements, the lasting value, ease of maintenance, re-use, and re-deployment; and also the project-level qualities of understanding and controlling the schedule and the cost), and the idea that we are going to take a principled approach to understanding how our work supports these properties.

Let me summarise: software engineering is understanding the desired qualities of the software we build, and taking a systematic approach to our work that maximises those qualities.

Posted in software-engineering | Leave a comment

Diagnosing a Docker image build problem

My Python script for Global.health was not running in production, because it couldn’t find some imports. Now the real solution is to package up the imports with setuptools and install them at runtime (we manage the environments with poetry), but the quick solution is to fix up the path so that they get imported anyway. Or so I thought.

The deployment lifecycle of this script is that it gets packaged into a Docker image and published to Amazon Elastic Container Repository. An EventBridge event triggers a Batch job definition using that image to be queued. So to understand why the imports aren’t working, we need to understand the Docker image.

docker create --name broken_script sha256:blah gives me a container based on the image. Previously I would have started that image and launched an interactive shell to poke around, but this time I decided to try something else: docker export broken_cleanup | tar tf - gives me the filesystem listing (and all I would’ve done with the running shell was various ls incantations, so that’s sufficient).

Indeed my image had various library files alongside the main script:

/app/clean_old_ingestion_source_files.py
/app/EventBridgeClient.py
/app/S3Client.py
/app/__init__.py

Those supporting files should be in a subfolder. My copy command was wrong in the Dockerfile:

COPY clean_old_ingestion_source_files.py aws_access ./

This copies the content of aws_access into the current folder, I wanted to copy the folder (and, recursively, its content). Simple fix: break that line into two, putting the files in their correct destinations. Now rebuild the image, and verify that it is fixed. This time I didn’t export the whole filesystem from a container, I exported the layers from the image.

docker image save sha256:blah | tar xf -

This gives me a manifest.json showing each layer, and a tar file with the content of that layer. Using this I could just get the table of content for the layer containing my Python files, and confirm that they are now organised correctly.

Posted in whatevs | Leave a comment