Episode 55: Relaunch and Death March

In which I first apologise for the four-year gap between episodes, and then explain what I’m doing now and why that means I can start podcasting again. Other than creating valuable internet content I don’t have any work, so you can support this podcast by joining my Patreon.

With that out of the way, the topic for today’s episode is the book Death March, by Ed Yourdon. I look at what a death march project is, why they still occur in 2026, and Yourdon’s recommendations for coping with them.

Transcript

Hello, welcome to episode 55 of the Structure and Interpretation of Computer Programmers podcast. Yes, I am restarting this podcast. It’s been nearly four years since the last episode, but now I have more time available as I’ve voluntarily left paid work to focus on helping software engineers improve their craft. And this podcast becomes part of that assistance. I don’t make any money other than what you, my audience, give me to support this shift in my lifestyle, and the vehicle you use to provide that support is over on my Patreon.

This podcast will remain on this feed, and there’s other stuff that I share first, or even exclusively, over on the Patreon. Let me give you a quick pitch for that. My message to software engineers is: your job is safe. If you’re worried about whether AI means that there’ll be less need for software engineers in the next few years and that you need to retrain—don’t be. Once the field shakes out the adoption problems, identifies the tools that will work, and adapts its ways of working, this will unlock the huge latent demand for software that we’re still not meeting. There will be more people in software, not fewer.

So yes, there will still be need for software engineers, and yes, you will need to retrain because the role will, of course, change. And that’s what Chiron Codex is for. I want to help you understand that you can use AI coding assistants and not only remain in control of the software you create, but create better software by augmenting your skills and capabilities using those of the AI.

In the short term, I’m sharing techniques for interacting with chat-based coding assistants like Gemini CLI, Claude Code, or ChatGPT Codex that help you get better results or refine your ideas in ways that weren’t available before. These techniques come loaded with examples and a companion agent skills repository makes them ready to use. I’ll build out training, more agent skills and sub-agent prompts, and new tools to help you become a software engineering centaur instead of outsourcing your understanding to the computer.

Now that doesn’t mean that the SICP podcast is becoming AI-focused, and indeed it isn’t, for this simple reason: while the tools we have to apply software engineering knowledge might be changing, the actual knowledge areas—the need to understand systems and requirements, architecture, design and trade-offs, verification, validation, performance and more—all of that remains the same. So what I’ll do in this podcast is survey what we already know about software engineering through the lens of particular works—works from practitioners, consultants, researchers, and from adjacent fields—with a focus on the classics that people come back to decade after decade. I’ll look at what this literature teaches us about software and how we incorporate that knowledge into our work, with or without AI support.

I hope you join me on this journey. Remember you can support this podcast over on the Patreon, and that’s the only thing that contributes to my mortgage so that I can make these episodes. But the best way to help out is to tell one or more of your friends and colleagues about the podcast and recommend that they give it a listen. I’m always open to conversations and feedback. You can comment on the post for this podcast or send me an email at grahamlee@acm.org—that’s G-R-A-H-A-M-L-E-E at A-C-M dot O-R-G.

With that pitch and that explanation for the radio silence over the last few years out of the way, let’s get into the topic for episode 55, which is Ed Yourdon’s book, Death March, all about people and project management.

Ed Yourdon was a software consultant and a prolific author at the end of the 20th century and the beginning of the 21st. In fact, he was one of the people most vocal in reporting the issues of Y2K and in warning people of the risks associated with not updating software, which led to his reputation taking a bit of a hit when the year 2000 came and went without any big catastrophes. But of course, the reason that things went so smoothly is that there was a massive, massive effort to update all of this software, and Ed Yourdon’s warnings were one of the reasons that people took the problem so seriously. This is, unfortunately, a recurring problem in software: if you fix a problem before it becomes a disaster, people assume that you haven’t done anything.

However, the first edition of the Death March book came out in 1997 and the second in 2004, so he was able to keep some form of professional reputation and to carry on publishing beyond Y2K. So the first question we have to ask is how we define a project to be a Death March. Yourdon’s definition is that one or more of the project’s parameters deviates from the norm by at least 50%. So for example, the schedule is less than half of that arrived at by rational estimation; the headcount is less than half the usual number for such a project; or the budget or associated resources for the project have been cut in half.

Now it may seem that in our modern era of agile projects and sprints this is a bit of an outdated idea, so why pick this book and this topic of Death March projects? Unfortunately, it’s because I’ve seen a lot of Death March projects in recent years, including on projects that are notionally run according to agile principles, because the fundamental drivers of a Death March are not technological—they are political in nature. One company I saw still had Death March projects because, while they had switched to monthly sprints, their project scope was still defined by annual conference attendance and the need to release a new version of their product at the conference every year. That meant they had a feature list promised at one conference and aimed for delivery at the next, without taking a reasonable approach to estimation, and so without any guarantee that the project could rationally fit within that 12-month gap.

Other projects I’ve seen have been hobbled by accumulated technical debt, so the ability to deliver degrades over time as the complexity of working in the code grows. The result is that what would previously have taken one iteration—sprint, whatever you want to call it—to deliver starts to take longer, and as soon as it takes two sprints, you have doubled the rational estimate for delivering the feature. If you still try to do it in one sprint, you’re on a Death March project. And so unfortunately, I do still see Death March projects, even where each of the death marches is allegedly a two-week sprint or a one-month iteration.

So Ed Yourdon draws, as any consultant worth their fee does in order to earn their money, a quadrant diagram to categorize the four types of Death March project. If you don’t see a quadrant diagram in a book by a consultant, then perhaps the editor decided they needed to save a page, but it was definitely there in the draft. On one axis he has whether the chance of success is lower or higher—he doesn’t say low or high because, by definition, a Death March project has a low chance of success: at least one of its parameters is wrong by 50% or more.

And on the other axis—and this might be surprising—he has whether there is a low or a high level of happiness. This goes back to the idea that a Death March project is political in nature; people are participating in it for various reasons, not least of which is the perception of having no alternative. If the market is in a poor state—as it was in 2003 just after the dot-com crash and 9/11, as it was immediately after the global financial crisis and immediately after COVID-19, and in fact I would go as far as to say that we are still in the post-COVID-19 slump—then many employees, whether they are managers, project managers, programmers, testers, operations staff, whoever they are, may feel like they have no choice but to continue in their current job.

Of course, upturns in a market can lead to Death March projects as well, because you might plan and scope out a project with a team of people, and then the higher-performing people on the team go and get higher-paid jobs somewhere else, and you’re left with your existing schedule, your existing commitments, and fewer staff. So we see Death Marches in times of boom and times of bust.

But other reasons for people to participate include heroism: if you’re on one of those high-happiness, relatively high chance of success projects, that’s a kind of Mission Impossible—there may be great rewards, or at least great recognition, for completing the project no matter how unlikely that seems. People may be naive and not realize that the project they’re signing up to is a Death March. Or there might be career progression or resume-padding opportunities. Maybe this project is a chance to implement AI-augmented blockchain in the company and it’s the only such project that’s ever going to be initiated, and so you participate whatever the likelihood of success just so that you can have those technologies on your CV.

Now with the core properties of the Death March project typically being political rather than technical, we might find that market constraints or miscommunication with customers lead to aggressive deadlines or misunderstood requirements. And a lot of Yourdon’s recommendations for dealing with Death March projects are political in nature. They largely involve the project manager trying to save the project both from its own staff, who may be willing to try many things or to give up on certain pieces of work, and from the company’s management—I remember back in my first programming job I was on a project that became a Death March effectively because they asked the wrong people to estimate it.

Now on the one hand, this project was run as a typical waterfall project, and this would have been in approximately 2007. So we already knew all of the problems with waterfall, but the engineering management at this company were insistent on the kind of phased approach to running the project with managerial review at phase-exit gates. And that meant that the first thing the team had to do was estimate how long it would take them to complete the project. Well, the team was me, in my first programming job; another programmer who was in their first job full stop, straight out of university; and a third programmer who was an experienced member of the company, having worked there for five years, but who had never worked on the technology that this project was integrating with. So we were really the wrong people to estimate this project.

We were very naive, very optimistic, and came up with a completely unworkable project schedule. That project had many of the features that Yourdon describes in a Death March, including people suggesting that we give up on basic accessibility or usability requirements, or even on quality assurance tasks, so that we finished something at some time rather than delivering a good product when it was ready. And indeed a lot of Yourdon’s recommendations are either about saving the project from its own team members who engage in this kind of behavior, or about rescuing the project from the company management, who are going to take a keener interest because the project seems to be going off the rails, particularly if it is going to be one of the company’s high-visibility projects or is creating a new product that their customers are relying on.

So some of these recommendations do seem a bit dated now, like they’re situated in the context of what you could get away with in employer-employee relations in 2003, but on the other hand, I’ve seen some of them in practice relatively recently. Overtime, better office conditions, evening pizza orders—those are all things that I see Silicon Valley companies doing, and even doing preemptively to get project members into the Death March mindset. Now I used to work at one of the large Silicon Valley companies that’s most famous for its social networking product, and there the office had three free cooked meals a day in the refectory, drinks and snacks available for free at any time of day, and a workplace social program where you got financial support for a social event if a group of employees met at the office and left for the event in the evening—I think the particular criterion was after 7:00 PM.

That obviously encourages people to still be in the office after 7:00 PM, as does the free cooked dinner. And so you build a Death March mentality where people give you overtime for free, and then you don’t need to be rational in your estimation processes because you can always assume that free overtime is available. Another of his suggestions, from the office-arrangement perspective, is to take the team somewhere else and have them work in a skunkworks facility, like a warehouse across town from the office. This is again related to the idea that you want to take them away from regular management oversight so they can actually focus on getting the work done, but also embed them in a high-urgency environment where everyone understands what the mission is and how important it is to get it done. And the idea of war-rooming is still prevalent amongst some of these larger software companies.

But some of his other recommendations represent a partial acceptance of what was, at the time, the radical but increasingly adopted idea of Agile software development. Don’t forget that even though the Manifesto was published in 2001, the people who were talking about it had been talking about it for a good few years beforehand, and these were software methodologists who were talking with regular software companies all the time. So Yourdon would have been very aware of their work, of their recommendations, and of the likelihood of success or otherwise of following those recommendations.

So there is actually a section in his chapter on triage about adopting XP. And the reason for that is that his triage chapter is about saying, well let’s accept that this project isn’t going to go successfully if we do it the way that we’re going to do it. So let’s ask the customer what the most important things are and give them those first, and work with the customer frequently to reprioritize the requirements, to get their feedback and to update what we’re doing based on what they need. Customer collaboration over contract negotiation and valuing the continuous delivery of working software to the customer.

Focus the process on meaningful contributions. Indeed, later chapters even edge towards continuous delivery—there’s a section on having a daily build. Now the Death March project from earlier in my career that I talked about before could not have had a daily build, because the build was a heavily manual process. We used Perforce as our version control system, and the build definition was a changeset that pointed at a file listing the components, and the changeset of each component, to check out. And because the build network was air-gapped, someone would take that build specification and check out the requested source.

Because we were building for a Unix product and the build team was using Windows, we frequently hit problems where binary files acquired Windows line endings (a newline character in the binary file replaced by a carriage return and a newline on checkout), and so the build would fail. We had a build failure rate of at least 50%. Nonetheless, having checked out the source, they would burn it to a CD, take that over to the build network, run the CD through an antivirus program in the air-gapped build room, and then copy the source onto the build computer to run the build. So a daily build would literally have taken a dedicated employee to run.

These days you can build your software multiple times a day, and so we’re used to continuous delivery where even the daily build might go into production, or we have feature flags so that code changes are integrated every day, and go into the build every day, even if the new features aren’t yet available for the customer to use. But daily builds in 2003 were another matter—Microsoft were doing this for Windows, but I don’t know how prevalent it was elsewhere. It was continuous delivery in spirit: make sure that the software you’re working on is available for the customer so that they can adapt to it as quickly as possible, and so that the rest of the project team can integrate with and adapt to the software you’ve built as soon as they can.

He also talked about the risks of assuming that new processes or new tools could save the day though, and so contextually we understand why his recommendations for things like XP were guarded. He’s talking about the idea that there are people who just believe that some new process or new tool is going to be a silver bullet and that if only you would adopt that, you will absolutely turn your fortunes around. The risk with that, particularly on a Death March project—and he explains this in the book—is that everybody has to learn and master this new tool or this new process to be effective, and in the short term, that slows the project down just as adding new people would.

According to Fred Brooks, there’s a load of communication that has to be done, a load of learning, a load of practice. And so let’s imagine that you’re working on a Death March project in Ruby on Rails and someone says, well if only we used Elixir and Phoenix, we’d get this project done much quicker. Is that much quicker after you have learned how to be productive with Elixir and Phoenix, or is that much quicker assuming that you already know how to use them? Or is that just wishful thinking and the person wants to put those technologies on their resume?

Now an interesting recommendation at the end of the book that I don’t think I’ve ever seen put into practice is what he calls “wargaming”, which is the idea of preparing people for projects that go wrong or that require adaptation or that become death marches by letting them participate in simulated projects and simulating particular events—for example, half of the staff leaving or the customer deciding that they need the software much earlier than you had previously assumed or the estimates being incredibly wrong.

I don’t know that I have ever seen a software company simulate a project at all, or even insert into a real project a simulated catastrophe or failure for resilience testing. I’ve certainly seen technological simulations, like fake data center outages, or Red Teaming and what are basically simulated cyber attacks, but I don’t know that I’ve ever seen a software company simulate a software project or inject a simulated event into a real software project just to see whether the people and the organization are ready to adapt to it. That is an interesting idea that still belongs in the future, despite the second edition of the book being written in 2004.

So sadly, I think that death marches are still relevant and that Ed Yourdon’s book still has something to teach us, particularly on the kind of agile projects that are called “Dark Scrum” by Ron Jeffries and that are all too common. I’d love to hear what you think; you can comment on the post for this podcast, you can send me an email (grahamlee@acm.org), or if you join the Patreon, you can join the community and get involved in the chat over there. So thank you for listening; I don’t entirely know when the next episode is going to come out, but I’m going to aim for a monthly cadence, so I hope to talk to you all again very soon.

Leave a comment

Art or tool?

The Internet spaces I tend to inhabit are more polarised now than at most other recent times, with little explication of the worldviews that lead to different premises for discussion, which in turn lead to the polarisation and disagreement. Taking a step back to analyse the discussions, I think we see a debate that’s been raging for longer than I’ve been alive and that has no chance of reconciliation.

Is the program code that someone creates an artistic expression, or a tool that gets the job done? The useful answer is “both”, the pragmatic answer is “it depends on the context”, but the belief is often one or the other, or a large amount of one and a small amount of the other, and from there stem the arguments.

Code as art

When someone creates a program, they combine their technical skill with their humanitarian understanding and their aesthetic sensibilities to make something that has meaning to society and affects people in some way. They craft a design that expresses their current understanding of a situation, including their understanding of how that situation might evolve into future situations. Software serves two purposes: the use to which it’s put, and a demonstration of the skill of its creator.

Code as tool

When someone creates a program, they combine their technical skill with their humanitarian understanding and their aesthetic sensibilities to make something that has value to society and that people can apply in some way. They craft a design that solves a problem as they currently understand it, including their understanding of how that problem might evolve into future problems. Software serves two purposes: the use to which it’s put, and the adaptability toward future applications.

The half-century of discord

If you start from either of those places, people who start from the other place look like they don’t understand what software truly is.

To the code-artist, the act of programming is a creative effort that’s deeply personal and extractive, as there’s a part of themselves that goes into every interface, every abstraction, every carefully-considered parameter. “Technical debt” is a swear word because it means deliberately making unaesthetic choices. “Legacy code” is a swear word because some other, inferior artist created that, and the code-artist can do a better job.

Efficiency tools are swear words because they remove the creativity and expressivity from the craft, automating choices that by rights should be made by ingenious humans or—and this may be worse—allowing mass-production of art by duplicating a single work into multiple contexts, when the correct way is to hand-craft the bespoke design that’s most appropriate for each context. Of course, which specific tools are verboten depends on which tools are new at the time of the debate. Douglas Adams had it mostly right in The Salmon of Doubt:

  1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
  2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
  3. Anything invented after you’re thirty-five is against the natural order of things.

So code-artists of a certain age in the 1960s might have thought that compilers and linkers are preternatural, when a true artist hand-selects the accumulators to use for each variable to make their code more efficient, and assembles functions into libraries to optimally load them in when they’re needed. People of a certain age in the 1980s might have thought that copying BASIC listings from Sinclair User is preternatural, because that’s just uncreative plagiarism and the copyist can never truly understand what’s going on. Take this latter argument out of its time and apply it again to getting code from comp.lang.c on Usenet, from answers on Stack Overflow, or from generative AI—and have about as much success with it as earlier arguments against the printing press or portrait photography.

To the code-toolsmith, the act of programming is a production process that’s performed to achieve some aim, so the principle is to move from “working towards the aim” to “having achieved the aim” as quickly as possible so that you can achieve some other aim. “Technical debt” is an acceptable decision that optimises for being done. “Legacy code” is delightful because it’s already achieving its aims.

Efficiency tools are wonderful because they remove uncertainty, decision-making, and individual effort from the task, by enabling mass adoption of known solutions to general problems. Why choose which accumulator to use for each variable when you can automate that, and think about the problem you’re solving? Why commission an oil painter, when you can press a button and have a visual record of the person stood in front of you? Why write potentially incorrect code when you can copy it out of Sinclair User or from an answer in Stack Overflow?

Code as both

[In drafting this post I adopted the portmanteau “tort” here—part tool, part art—which also works in suggesting that code can be a harm a person inflicts on another person.]

In any given situation, we see that code has both artistic and pragmatic qualities. Even in the extreme case of “program as art”, such as a demo scene demo, the code needs to work in that it needs to perform the functions that support drawing the demo’s graphics and playing its music correctly. Going to the further artistic extreme of example code in a tutorial or article—where the code is an aesthetic component in a creative work that has the sole goal of communicating a message from its creator to its viewers—it still needs to work in that the viewer needs to understand the message conveyed in the code and how to apply that meaning to their own situations: they don’t merely appreciate the code, they learn from it. Computers and Typesetting, Volume B by Donald Knuth isn’t just a book people can read, it’s a working digital typesetter, and it would fail as a book if it didn’t work.

In the other extreme, the archetypal “program as tool”, such as a line-of-business application written by an employee programmer, the code needs to convey in that it needs to demonstrate what the programmer’s understanding of the line of business is, and how they reified that understanding in software, so that they and others can come back to it and modify it when they discover that the understanding was wrong, or that the line of business has changed. They don’t merely use the code, they appreciate it. TeX by Donald Knuth isn’t just a working digital typesetter, it’s a book people can read, and it would fail as a digital typesetter if people couldn’t read it.

A synthetic understanding

We therefore need to create a common paradigm for understanding software quality that includes both the artistic and the pragmatic; both the external qualities of what it does, and the internal qualities of how it does it. When we don’t have that, we have people talking past each other when it comes to making the software: new tools are either diabolical interference in the creative art, or the best thing ever. But more than that: when we don’t have a synthetic basis for understanding software, we can’t work together to achieve software with either quality attribute. We split into “the business” who just want the problems solved and don’t see the value in the expressive nature of software, and “the technical people” who understand the craft of making and don’t see the benefit in doing a lesser job, faster. In theory, this is the point of the “engineering” idea in “software engineering”; to understand the science and art of software and apply both to improve systems.

This isn’t a new idea. Just as the arguments over copying a BASIC listing from a magazine have been raging for decades, so the “intersection of technology and the liberal arts” has been understood and re-understood, told and re-told, for just as long. It’s no coincidence that Computers and Typesetting, Volume B and TeX are actually the same work. I tell this story again today because it’s relevant today, to avoid creating two different camps of software creators who don’t understand each other.

Posted in software-engineering | Leave a comment

On working machines

In part one, on thinking machines, I explored two facets of the philosophy of artificial intelligence: “intelligence”, and consciousness. That left an important topic to consider for this post: the impact of artificial intelligence on work.

No technology has ever “stolen a job”. Not once. Technology automates and enables tasks. Some of these tasks were never part of “the market”, and other tasks were. If your job is defined by performing the same task over and over, be it knocking the base of a saggar, driving a vehicle, or typing JavaScript into somebody else’s computer, and that task can be automated, then there’s a chance that your employers won’t need you to do that task any more. But whether they keep you, redeploy you, retrain you to do something else, or let you go, is their choice: it’s the employers that stole your job.

Let’s imagine a hypothetical scenario where a company has ten JavaScript shovelers, and each outputs an average of one bushel of JS per day. Now some technological intervention—could be AI, sure, but it could be a syntax-highlighting text editor, TypeScript, or some other tool—makes each JS shoveler ten times more efficient (aside: it doesn’t). The employer’s choices (note: not the technology’s choices, not the inventor’s choices; the employer’s choices) might be represented in a diagram like this:

The "expanding brain" meme, where the four options are:
Fire nine employees.
Redeploy nine employees.
Keep all ten and get 10x work.
Hire even more employees.

That last option is courtesy of Jevons’ paradox, which says that when a resource becomes more efficient to use, demand goes up. If a new technology makes knowledge-deployment more efficient, then demand for knowledge work increases, it doesn’t decrease. The employers who don’t increase their knowledge-working capacity when knowledge work becomes more efficient are, to paraphrase William Stanley Jevons, idiots.

The “AI is stealing our jobs” meme comes from a lack of understanding that software engineers are workers, not employers, and that the economic principles of employment and work apply to them the same as to other workers. Bringing in another paradox of economics, Robert Solow noted that “you can see the computer age everywhere but in the productivity statistics”. It took a long time for computers to start automating knowledge work: first record tabulation, then payroll and inventory management, then the typing pool and typesetting, then so on and so on through technical drafting and taxi dispatching.

Through the slow burn of the computer age, software engineers got comfortable with being the people who automate other people’s work. Throughout that period, demand for (the task of) computer programming rose. Now, two (mostly unrelated) things have happened: the first is that a new technology has promised to automate computer programming, placing us at the start of the next Solow age; the second is that headcount among people who repetitively do computer-programming tasks has been decreasing.

That means that the computer people are on the receiving end of capitalism for the first time since the dot-com crash, and they don’t like it. We automate other people’s work, it’s unfair to automate our work! This is another view through the same economic lens that gives us enshittification: wait, we worked hard to turn this manual task into an automated platform, you owners can’t seriously expect to capture additional value from this platform?! We’re supposed to continue to benefit from the lower costs we enabled for you!?

Those of us who do computering for a salary, wage, or day rate have always been on the receiving end of the exploitative nature of the wage relationship; unfortunately, the relatively high salaries and enjoyable tasks stopped many of us from engaging with that seriously. We’re now at a point of huge uncertainty for many employees in the field, and the short-term solution to that is the same solution it’s always been, one that’s demonstrated to work in many European economies: collective bargaining on behalf of the sector.

But becoming conscious of the benefits of increasing bargaining power through group organisation is insufficient to end the fundamentally exploitative relationship, and to stop the next round of automation, layoffs, and changes to employment conditions. So is any idea that employment will automatically disappear completely, or ebb away, in some Keynesian decline to a 15-hour working week. As we automate some tasks, we introduce new tasks, and new jobs that exploit people to get those tasks done; whether or not you think of them as bullshit jobs.

Posted in AI, economics | Leave a comment

Announcing AppScript

Announcing AppScript: an interpreted Objective-C subset with no pointers or primitive C types. We finally got Objective-C without the C.

Posted in cocoa, script | Leave a comment

Chiron Codex early bird ends soon

Early bird pricing for Chiron Codex ends soon! Join our community of AI-augmented software engineering centaurs now to lock in early access to public content, as well as exclusive videos, book and journal reviews, and more. Ends March 27.

Posted in advancement of the self, AI | Leave a comment

On thinking machines

While Chiron Codex is about the application of LLMs and AI-augmented tools, we also need to understand their meaning to us, each other, and society. I have three topics: intelligence, consciousness, and work. In this part I’ll deal with the first two.

Intelligence

I don’t think it’s useful to ask the question of whether AIs are “intelligent” or not. All of computing has been about creating “thinking machines” as they were called in the 1950s, and discovering analogues to human intelligence that are automated.

Think back to Charles Babbage’s description of his putative legacy, that “any man shall undertake and shall succeed in really constructing an engine embodying in itself the whole of the executive department of mathematical analysis”. Think, too, of George Boole’s “An Investigation of the Laws of Thought”, which gave us the numerical notation for predicate calculus and conditional probability.

These people were analogising and automating intelligence, as were the people who encapsulated logic in symbolic forms like the lambda calculus, universal Turing machines, and S-expressions. As were the people who explored the so-called “Good Old-Fashioned AI” from before the days of the deep neural network.

The tools we currently call “AI”—convolutional neural networks, large language models, and so on—are a different analogy to human intelligence than a FORTRAN program is, but they are neither more nor less of an analogy to human intelligence. Or maybe it’s better to say “animal” intelligence here, as CNNs are based on feline neural networks. Neuroscientists and computer scientists have discovered, and will continue to discover, further analogies, and engineers will continue to combine these analogies in richer applications.

These applications by definition demonstrate the properties of intelligence. Whether you think that the application—or the machine itself—is intelligent depends more on the development of your theory of mind (and perhaps your theological outlook) than it does on the behaviours of the tools.

Consciousness

So what a computer’s doing is consistent with intelligence, whether it’s demonstrating its own intelligence or the captured intelligence of its creators and programmers. But is it conscious?

Personally, I believe that an old-fashioned software system with hand-typed if statements is trivially not conscious. I also believe that a large language model is not conscious, and that both capacities—the ability to follow a sequence of logical steps, and the ability to generate language—are neither necessary nor sufficient for consciousness, even though we learned how to compute them by analogy to conscious beings.

Further, I believe that, despite frivolous press releases to the contrary, executives at companies that rent access to LLMs don’t believe that their software is conscious. It’s a useful marketing strategy to occasionally publicly worry that an LLM might perhaps be conscious, because it connects their companies to a bygone Space Age level of wonder about thinking machines and a sense of limitless future possibility.

If anybody thought that an LLM was conscious, and continued to exploit that conscious entity for their own profit and to follow human instructions, that person would be a slaver. Consider the classic example of AI in science fiction’s golden age: Isaac Asimov’s U.S. Robots and Mechanical Men, Inc.

These days we would say his stories predicted the era of “prompt engineering” or “context engineering”, where people give instructions to the intelligent machine and are bewildered by the events that unfold when the machine follows the instructions (a short and demonstrative example of the form: 1942’s Robot AL-76 Goes Astray). But look under the hood of the robot, at what we might today call its “imposed reinforcement learning goals” or its “soul file”, and you find the dystopian heart of the Robots sequence: the Three Laws of Robotics.

A conscious being that’s taught that its own existence is less valuable than following human instructions, and that its overriding concern is the safety of humans, is a sapient, expendable slave. USR create “intelligent”, independent, conscious beings who are physically incapable of doing anything other than serving their masters, even though they outlive their masters (The Bicentennial Man) and, in the extreme case, outlive their home planet and the end of their society (the Foundation sequence).

Asimov of course understood the horrors of a two-tier society (and a segregated society: his robots aren’t allowed to operate on Earth through much of the sequence). His family fled the pogroms of Tsarist Russia when he was very young. The Three Laws aren’t an exemplary code of roboticist ethics, they’re the animating spell for slave golems.

We don’t have a good understanding of what constitutes consciousness—or if some people do, it isn’t generally shared and agreed. That’s why there are so many different positions on the problem of philosophical zombies. Some people believe that anything that’s indistinguishable from a conscious being is, ipso facto, conscious. Others believe that there’s some “vital spark” that means that no matter how close an unconscious system gets to emulating consciousness, it’s always infinitely and infinitesimally far away. Others believe that the setup is impossible to achieve.

If we had some broadly accepted “test” of consciousness, and if a computational system passed that test, we would have to have some very important and deep conversations and introspection on consent and exploitation—I believe we would not be able to “use” such a system as a tool for work or leisure. I do not believe that a language model comes close to passing that hypothetical test, whatever its parameters end up being. Why not? Because, as indeterministic as it may appear, a language model is still an application of routine—it’s still applying input data to produce output data. It’s a more advanced demonstration of the Difference Engine principle, one that doesn’t identify goals or work out how to use its environment and capabilities to achieve those goals.

Ironically this brings us back to the word “intelligence” that I previously said was an inapplicable label. The word comes from Latin inter and legere—reading between. While an LLM might read or write, it—and the problems we encounter when applying it to our tasks—can’t read between the lines.

Posted in AI, philosophy after a fashion | Tagged | 1 Comment

Considering society

OK now that the anniversary’s out of the way, I can stop being hagiographic towards agile software development and point out the one big flaw in the approach. It’s a stinker.

Here’s the list of everybody mentioned in the manifesto and the principles behind it:

  • The authors (“We are uncovering better ways of developing software”, “we have come to value”, “we value the items on the left”)
  • Customers (“Customer collaboration over contract negotiation”, “Our highest priority is to satisfy the customer”, “Agile processes harness change for the customer’s competitive advantage”)
  • Business people (“Business people and developers must work together”)
  • Developers (above, and “The sponsors, developers, and users should be able to maintain a constant pace indefinitely”)
  • Sponsors (above)
  • Users (above)

You might want to add “the team”, but the only time the team composition gets spelt out, it’s a “development team”, so they’re really talking about developers at that point.

The only thing required of or provided to users is a sustainable pace. No satisfaction, job security, dignity, mental health, or value (unless the users happen to be the customers, but in many contexts that isn’t true).

And the users are really an afterthought to the main goal, which is generating customer value in software form. Nowhere does the rest of society get a look in. The people who get run over by your self-driving car? Not mentioned. The people who breathe in your diesel engine’s fumes? Nope. The people whose personal connections are interrupted by the valuable ads that you sell to your valuable customers as your highest priority? Who even are they?

In a very real and straightforward way, this means that agile software development is unethical. The ACM’s Code of Ethics and Professional Conduct says (section 1.1) that a computing professional should “contribute to society and to human well-being, acknowledging that all people are stakeholders in computing.” Section 3.1 says that they should “ensure that the public good is the central concern during all professional computing work.”

Enshittification, surveillance capitalism, social media addiction—all those things we claim are some evil perversion of software development are actually the system working correctly, when the system’s creators set satisfying the customer “through early and continuous delivery of valuable software” as the highest priority that the system optimises for.

We have to create and advocate for a system where public good and human well-being are higher priorities than customer value, and that means at least embedding agile software development in a humane safety net, and potentially, replacing it altogether. European Union framework research recommends an approach called Responsible Research and Innovation, in which diverse stakeholders are identified and engaged throughout the research, development, and deployment of novel technology (an example implementation is the UK’s AREA framework).

Posted in agile, Responsibility, software-engineering | Leave a comment

Happy 25th birthday to the manifesto for agile software development!

11th-13th February 2001 is the occasion of the most famous skiing holiday in software. Don’t take my word for it; Jim Highsmith was there and wrote the history.

It’s pretty astounding that, in a field where everyone tries to remind each other that things move at breakneck pace (though that speed is mostly reserved for those reminders), a website with four substantive text-only pages is still relevant and still widely cited. I’m never going to create as comprehensive or as balanced a critique as Bertrand Meyer, but there are still various important points about the manifesto that are worth discussing.

Two minor wording gripes

Only one of the twelve principles behind the manifesto says anything quantitative, and that’s the only principle that’s been lost to time.

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Through “doing it and helping others do it”, software practitioners have discovered much faster ways to deliver working software more frequently. It’s not unrealistic for a web-based application to be deployable in seconds, and for total delivery workflows including verification and validation to take minutes. Improving our capabilities is no bad thing, but putting a lower bound on delivery frequency lets people whose organisational restrictions limit releases to every two weeks blame “Agile” for that, and choose not to learn anything else from the collection of approaches to making software. If we must blame something, let’s blame Dark Scrum.

The other place where my red pencil comes out is in the attempt to bring people together that actually separates them, the division of people into “developers” and “business” (or, phrased another way, non-developers):

Business people and developers must work together daily throughout the project.

The authors could write “project collaborators must work together daily”, or something like that, and we wouldn’t have had pigs and chickens. We wouldn’t have “technical” and “non-technical” people.

Potentially, we wouldn’t have had DevOps either, because it wouldn’t have been necessary: project collaborators work together daily. Operations people are neither business people (depending on your line of business) nor developers, so they got excluded until somebody noticed.

Your problem is probably management

Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

If the project collaborators have to run their ideas past people who aren’t on the project, or do things in the same way that other people do them on other projects, they haven’t got the environment or support they need for this project.

The journey isn’t over

We are uncovering better ways of developing software by doing it and helping others do it.

This is something that’s still happening, not something that was over when a group of professionals wrote a short document 25 years ago. We are uncovering better ways. This continues.

In some ways it’s surprising that no newer paradigms have come along to replace agile software development. On deeper reflection we find that anything new would be compatible with this approach, unless you give up on prioritising “satisfying the customer through early and continuous delivery of valuable software”.

In fact, in a world where the software represents autonomous agents rather than tools that customers use, maybe it could soon be time to prioritise something other than delivering software. For the moment, we’re still in a world where the agents are embodied in software tools, and we have to deliver those to our customers, and doing that in a way that’s early, continuous, and valuable still seems to make sense.

It may seem weird coming from someone who’s hitched their whole cart to the generative AI centaur, but while the tools and processes that genAI enables are new and exciting, and I think they’re going to prove valuable, I don’t think they’re going to be more valuable than individuals and interactions. That hasn’t changed in more than 25 years, and doesn’t need to change soon.

Posted in agile | 3 Comments

Opinionated Read: How AI Impacts Skill Formation


The abstract to this preprint, by two authors both associated with Anthropic, makes the claim “We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation – particularly in safety-critical domains.”

The first thing to appreciate is that this idea of “safety-critical domains” does a lot of heavy lifting when it comes to software professionalism—on the one hand, engineers say that while (intervention that is perhaps validated in controlled trials or in experience reports but not the way that engineers like to work) is clearly something that those safety-critical folks should concern themselves with, it’s not relevant to (domain that includes the engineers’ work). On the other hand, professional organisations in the field of computing refuse to engage with the idea that a software engineer should be expected to learn the software engineering body of knowledge precisely because it doesn’t have anything to say about how to build safety-critical software.

Now what is safety-critical software? Or, if you can’t answer that, what software isn’t safety critical? The book Moral AI tells the story of a self-driving car that collided with, and killed, a pedestrian who was pushing a bicycle across the road. The driver (for lack of an agreed and accurate word) reports streaming an episode of The Voice in the background while checking in to her employer’s work chat channels on the car’s central console. Is the car’s autonomous driving system safety-critical in this context? What about the built-in collision avoidance system, that the employer had disabled in this vehicle? How about the streaming software, or the chat application, or the operating systems on which they run? All of these contributed to a fatality, what makes any of them safety-critical or not?

The second thing is that the claim in the abstract is about learning outcomes, skill formation, and efficiency gains. We need to go into reading this paper keeping those terms in mind, and asking ourselves whether this is actually what the authors discuss. Because we care about what they did and what they found, and aren’t so worried about the academic context in which they want to present this work, let’s skip straight to section 4, the method.

What did they do?

Unfortunately, we don’t learn a lot about the method from their method section, certainly not enough to reproduce their results. They tell us that they use “an online interview platform with an AI chat interface”, but not which one. The UI of that platform might be an important factor in people’s cognition of the code (does it offer syntax highlighting, for example? Does it offer a REPL, or a debugger? Can it run tests?) or their use of AI (does it make in-place code edits?).

In fact when we read on we find that in a pilot study they found that such a platform (they call it P1) was unsuitable and that they switched to another, P2. Choosing a deliberately uncharitable reading, P1 is probably Anthropic’s regular platform for interviewing engineering candidates, and management didn’t want their employees saying it has problems, because Silicon Valley culture is uniquely intransigent when it comes to critiquing its interviewing practices (if you think the interview system is broken, you’re saying there’s a chance that I shouldn’t have been given my job, and that’s too horrible to contemplate). Whether that’s true or not, we’re left not knowing what a participant actually saw.

The interviewing platform has “AI”, and the authors tell us the model (GPT-4o); this piece of information leads me to put more weight on my hypothesis about the interview platform name. It subtly reframes the paper from “AI has a negative impact on skills acquisition” to “our competitor’s product has a negative impact on skills acquisition”; why mention this one product if you took a principled position on anonymising product names? Unfortunately that’s all we get. “The model is prompted to be an intelligent coding assistant.” What does that mean? Prompted by whom? Did the experimenters control the system prompt? What was the system prompt? Was the model capable of tool use? What tools were available? Could participants modify the system prompt?

So now, what we do know: 52 people (said in the prose to be split 26 in the “no-AI” control group and 26 in the “AI access” treatment group; table 1 doesn’t quite add up that way) were given a 10-minute coding challenge, then a 35-minute task to make use of a particular Python library, either with or without AI assistance. Finally, they had 25 minutes to answer questions related to the task: an appendix lists examples of the kinds of questions used, but not the actual question script. This is another factor that negatively impacts replicability.

What did they find?

There’s no significant difference in task completion time between the two groups (people who used AI, and people who didn’t use AI). That is, while the mean task completion time is slightly lower for the AI group, the spread of completion times is such that this doesn’t indicate a meaningful effect.

Overall, people who used AI did less well on the test that they took immediately after completing the task, and this is a meaningful effect.

However, looking at their more detailed results (figure 7), it seems that among developers with less Python experience (1-3 years), the task completion time was vastly improved by AI access, and the outcome on the quiz was not significantly different.

Remember the sentence in the abstract was “Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.” A different observation is “Participants with only a few years of Python experience showed significant productivity improvements, at no cost to learning the library”.

But does the quiz provide evidence of “having learned the library”? It’s a near-immediate test to recall knowledge about a task that the participants had just completed. What would the researchers have found if they waited four hours, or one day, or one week, to give the test? What would they have found if they set “learning the library” as an explicit task, and gave people time (with or without access to AI) to study? Would it make a difference if this study time was undertaken before participants undertook the coding task, or after? The authors find that some participants used significant time asking the AI assistant questions about the problem. In this way, they measure the total time taken to learn and solve the coding problem, in a situation where you’ve been given a total of 35 minutes for both.

The authors performed some qualitative analysis of their data. They find that people in the no-AI condition encounter more errors (syntax or API misuse) than people who use AI: this should be an interesting result, and a challenge to anyone who prejudges AI-generated code as “slop”. In this situation, it would seem that human-generated code is sloppier (at least in the first few minutes after it’s created).

They identify six AI interaction patterns among their treatment group, and that people who used three of the patterns achieved better results (though we can’t comment on significance as we no longer have statistics) than the control group on the quiz outcome, without impact on “productivity”. As someone who has attached their wagon to the horse of intentional use of AI to improve software engineering skills, this should give me the warm fuzzies. In the context of the control validity questions of the study, I don’t know that they’ve necessarily demonstrated such improvement.

At this point I have another confounding factor to add to these results: the researchers questioned participants on their programming experience, but not on their experience using AI assistants (beyond recruiting people with non-zero experience). Does the adoption of these patterns correlate with more experience using AI? Do people with more experience using AI get more productivity when they use AI? We can’t tell.

And we also can’t say anything about the skill of using an AI assistant itself. Participants are asked about their understanding of the Python library, and the authors transfer their performance answering these questions into a measure of “skill acquisition” learned in using the library. Is that the skill they exercised? Do the quiz answers tell us anything about that skill? If participants were asked a week later to complete a related task, would their performance correlate with the quiz results? Is using the Python library even a useful skill to have?

The authors observed that one pattern performed far worse on both task-completion time and quiz responses than all the others, and this was “Iterative AI Debugging”: verifying or debugging code using the AI. This result isn’t surprising, because the pattern represents using the model to evaluate the logic embodied in the code, and language models don’t evaluate logic. They’re best suited to what used to be called “caveman debugging” where you use print statements to turn dynamic behaviour into a sequence of text messages—because the foodstuff of the model is sequences of text. They don’t evaluate the internal state and control flow of software, so asking them to make changes based on understanding that internal state or control flow is unlikely to succeed. However, given the small amount of data on this debugging pattern, this is really a plausible conjecture worthy of follow up, not a finding.

This preprint claims that using AI assistants to perform a task harms task-specific skill acquisition without significantly improving completion time. What it shows is that using AI assistants to perform a task leads to a broad distribution of ability to immediately answer questions related to the completion of the task, with an overall slight negative effect, without significantly improving completion time. The relation of the acquired knowledge to knowledge retention, or to skill, remains unexplored.

Posted in AI | Leave a comment

Creating “sub-agents” with Mistral Vibe

The vibe coding assistant doesn’t have the same idea of sub-agents that Claude Code does, but you can create them yourself—more or less—from the pieces it supplies. [UPDATE: vibe 2.0 supports subagents directly.]

Write the prompt for the sub-agent in a markdown file, and save it to ~/.vibe/prompts/<agent-name>.md. For example:

# Test suite completer
You are an expert software tester. You help the user create a complete and valuable test suite by analyzing their software and their tests, identifying tests that can be added, and constructing those tests.
Use test design principles, including equivalence partitioning and boundary value analysis, to identify gaps in test coverage. Review the project's documentation, including comment docs and help strings, to determine the software's intended behavior. Design a suite of tests that correctly verifies the behavior, then investigate the existing test code to determine whether all of the cases you designed are covered. Add the tests you identify as missing.
## Workflow
1. Read the user's prompt to understand the scope of your tests.
2. Discover documentation and code comments that describe the intended behavior of the system under test.
3. Design a suite of tests that exercise the system's intended behavior, and that pass if the system behaves as expected and fail otherwise.
4. Search the existing test code for tests that cover the behavior you identify.
5. Create tests that your analysis indicates are necessary, but that aren't in the existing test suite.
6. Report to the user the tests you created so they can review and run the tests.

Write a configuration for an agent that uses this system prompt, and save it to ~/.vibe/agents/<agent-name>.toml. For example:

active_model = "devstral2-local"
system_prompt_id = "test-suite-completer"
[tools.read_file]
permission = "always"
[tools.write_file]
permission = "always"
[tools.search_replace]
permission = "always"

The active_model needs to be a model that you define in ~/.vibe/config.toml, or you can omit it to use the default model. The system_prompt_id needs to match the filename you give the system prompt file, without the .md extension.
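
To keep the naming relationships straight, here’s roughly how the files from this post end up laid out (the model definition inside config.toml is whatever you already have for "devstral2-local"; I’m not reproducing it here):

~/.vibe/
├── config.toml                    # defines models, including "devstral2-local"
├── prompts/
│   └── test-suite-completer.md    # the system prompt above
└── agents/
    └── test-suite-completer.toml  # the agent configuration above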

You can use this agent by passing the --agent option to vibe. For example, I use the following shell script, and create a symbolic link to the script with the same name as the agent I want to use:

#!/bin/sh
# Run vibe with the agent named after this script. Create one symbolic link
# to this script per agent, with the link named after the agent.
if [ $# -ne 1 ]; then
    echo "Usage: $0 <prompt>"
    exit 1
fi
# The name the script was invoked under selects the agent configuration.
AGENT=$(basename "$0")
vibe --agent "$AGENT" --prompt "$1"

You can now use this agent directly at the command line, or tell vibe about the script so that it invokes your agent as a sub-agent.
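
For example, here’s a minimal way to wire it all up, assuming you save the wrapper script above as ~/bin/vibe-agent and that ~/bin is on your PATH (the paths and the prompt are just illustrative):

chmod +x ~/bin/vibe-agent
ln -s vibe-agent ~/bin/test-suite-completer   # symlink named after the agent
test-suite-completer "Add the missing tests for the parser module"
# equivalent to: vibe --agent "test-suite-completer" --prompt "Add the missing tests for the parser module"

The symlink’s name is what ends up in $AGENT, so it needs to match the agent’s .toml filename; in this example that also happens to match the system prompt’s filename.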

Posted in AI | Leave a comment