On working machines

In part one, on thinking machines, I explored two facets of the philosophy of artificial intelligence: “intelligence” and consciousness. That left an important topic to consider for this post: the impact of artificial intelligence on work.

No technology has ever “stolen a job”. Not once. Technology automates and enables tasks. Some of these tasks were never part of “the market”, and other tasks were. If your job is defined by performing the same task over and over, be it knocking the base of a saggar, driving a vehicle, or typing JavaScript into somebody else’s computer, and that task can be automated, then there’s a chance that your employers won’t need you to do that task any more. But whether they keep you, redeploy you, retrain you to do something else, or let you go, is their choice: it’s the employers that stole your job.

Let’s imagine a hypothetical scenario where a company has ten JavaScript shovelers, and each outputs an average of one bushel of JS per day. Now some technological intervention—could be AI, sure, but it could be a syntax-highlighting text editor, TypeScript, or some other tool—makes each JS shoveler ten times more efficient (aside: it doesn’t). The employer’s choices (note: not the technology’s choices, not the inventor’s choices; the employer’s choices) might be represented in a diagram like this:

The “expanding brain” meme, where the four options are:

  • Fire nine employees.
  • Redeploy nine employees.
  • Keep all ten and get 10× the work.
  • Hire even more employees.

That last option is courtesy of Jevons’ paradox, which says that when a resource becomes more efficient to use, demand goes up. If a new technology makes knowledge-deployment more efficient, then demand for knowledge work increases, it doesn’t decrease. The employers who don’t increase their knowledge-working capacity when knowledge work becomes more efficient are, to paraphrase William Stanley Jevons, idiots.

The “AI is stealing our jobs” meme comes from a lack of understanding that software engineers are workers, not employers, and that the economic principles of employment and work apply to them the same as to other workers. Bringing in another paradox of economics, Robert Solow noted that “you can see the computer age everywhere but in the productivity statistics”. It took a long time for computers to start automating knowledge work: first record tabulation, then payroll and inventory management, then the typing pool and typesetting, then so on and so on through technical drafting and taxi dispatching.

Through the slow burn of the computer age, software engineers got comfortable with being the people who automate other people’s work. Throughout that period, demand for (the task of) computer programming rose. Now, two (mostly unrelated) things have happened: the first is that a new technology has promised to automate computer programming, placing us at the start of the next Solow age; the second is that headcount among people who repetitively do computer-programming tasks has been decreasing.

That means that the computer people are on the receiving end of capitalism for the first time since the dot-com crash, and they don’t like it. We automate other people’s work, it’s unfair to automate our work! This is another view through the same economic lens that gives us enshittification: wait, we worked hard to turn this manual task into an automated platform, you owners can’t seriously expect to capture additional value from this platform?! We’re supposed to continue to benefit from the lower costs we enabled for you!?

Those of us who do computering for a salary, wage, or day rate have always been on the receiving end of the exploitative nature of the wage relationship; unfortunately, the relatively high salaries and enjoyable tasks stopped many of us from engaging with that seriously. Many employees in the field now face huge uncertainty, and the short-term solution is the same solution it’s always been, one demonstrated to work in many European economies: collective bargaining on behalf of the sector.

But becoming conscious of the benefits of increasing bargaining power through group organisation is insufficient to end the fundamentally exploitative relationship, or to stop the next round of automation, layoffs, and changes to employment conditions. So is any idea that employment will automatically disappear completely, or ebb away in some Keynesian decline to a 15-hour working week. As we automate some tasks, we introduce new tasks, and new jobs that exploit people to get those tasks done, whether or not you think of them as bullshit jobs.

Posted in AI, economics | Leave a comment

Announcing AppScript

Announcing AppScript: an interpreted Objective-C subset with no pointers or primitive C types. We finally got Objective-C without the C.

Posted in cocoa, script | Leave a comment

Chiron Codex early bird ends soon

Early bird pricing for Chiron Codex ends soon! Join our community of AI-augmented software engineering centaurs now to lock in early access to public content, as well as exclusive videos, book and journal reviews, and more. Ends March 27.

Posted in advancement of the self, AI | Leave a comment

On thinking machines

While Chiron Codex is about the application of LLMs and AI-augmented tools, we also need to understand what they mean to us, to each other, and to society. I have three topics: intelligence, consciousness, and work; in this part I’ll deal with the first two.

Intelligence

I don’t think it’s useful to ask whether AIs are “intelligent” or not. All of computing has been about creating “thinking machines”, as they were called in the 1950s, and discovering automated analogues to human intelligence.

Think back to Charles Babbage’s description of his putative legacy, that “any man shall undertake and shall succeed in really constructing an engine embodying in itself the whole of the executive department of mathematical analysis”. Think, too, of George Boole’s “An Investigation of the Laws of Thought”, which gave us the algebraic notation for propositional logic and conditional probability.

These people were analogising and automating intelligence, as were the people who encapsulated logic in symbolic forms like the lambda calculus, universal Turing machines, and S-expressions. As were the people who explored the so-called “Good Old-Fashioned AI” from before the days of the deep neural network.

The tools we currently call “AI”—convolutional neural networks, large language models, and so on—are a different analogy to human intelligence than a FORTRAN program is, but they are neither more nor less of an analogy to human intelligence. Or maybe it’s better to say “animal” intelligence here, as CNNs are modelled on the feline visual cortex. Neuroscientists and computer scientists have discovered, and will continue to discover, further analogies, and engineers will continue to combine these analogies in richer applications.

These applications by definition demonstrate the properties of intelligence. Whether you think that the application—or the machine itself—is intelligent depends more on the development of your theory of mind (and perhaps your theological outlook) than it does on the behaviours of the tools.

Consciousness

So what a computer’s doing is consistent with intelligence, whether it’s demonstrating its own intelligence or the captured intelligence of its creators and programmers. But is it conscious?

Personally, I believe that an old-fashioned software system with hand-typed if statements is trivially not conscious. I also believe that a large language model is not conscious, and that both capacities—the ability to follow a sequence of logical steps, and the ability to generate language—are unnecessary and insufficient for consciousness, even though we learned how to compute them by analogy to conscious beings.

Further, I believe that, despite frivolous press releases to the contrary, executives at companies that rent access to LLMs don’t believe that their software is conscious. It’s a useful marketing strategy to occasionally publicly worry that an LLM might, perhaps, possibly be conscious, because it connects their companies to a bygone Space Age wonder about thinking machines and limitless future possibility.

If anybody thought that an LLM was conscious, and continued to exploit that conscious entity for their own profit and to follow human instructions, that person would be a slaver. Consider the classic example of AI in science fiction’s golden age: Isaac Asimov’s U.S. Robots and Mechanical Men, Inc.

These days we would say his stories predicted the era of “prompt engineering” or “context engineering”, where people give instructions to the intelligent machine and are bewildered by the events that unfold when the machine follows those instructions (a short and demonstrative example of the form: 1942’s Robot AL-76 Goes Astray). But look under the hood of the robot, at what we might today call its “imposed reinforcement learning goals” or its “soul file”, and you find the dystopian heart of the Robots sequence: the Three Laws of Robotics.

A conscious being that’s taught that its own existence is less valuable than following human instructions, and that its overriding concern is the safety of humans, is a sapient, expendable slave. USR create “intelligent”, independent, conscious beings who are physically incapable of doing anything other than serving their masters, even though they outlive their masters (The Bicentennial Man) and, in the extreme case, outlive their home planet and the end of their society (the Foundation sequence).

Asimov of course understood the horrors of a two-tier society (and a segregated society: his robots aren’t allowed to operate on Earth through much of the sequence). His family fled Russia, against a background of pogroms, when he was very young. The Three Laws aren’t an exemplary code of roboticist ethics; they’re the animating spell for slave golems.

We don’t have a good understanding of what constitutes consciousness—or if some people do, it isn’t generally shared and agreed. That’s why there are so many different positions on the problem of philosophical zombies. Some people believe that anything that’s indistinguishable from a conscious being is, ipso facto, conscious. Others believe that there’s some “vital spark”, so that no matter how close an unconscious system gets to emulating consciousness, it always falls short, and the gap—however infinitesimal—can never be closed. Others believe that the setup is impossible to achieve.

If we had some broadly accepted “test” of consciousness, and if a computational system passed that test, we would have to have some very important and deep conversations and introspection on consent and exploitation—I believe we would not be able to “use” such a system as a tool for work or leisure. I do not believe that a language model comes close to passing that hypothetical test, whatever its parameters end up being. Why not? Because, as indeterministic as it may appear, a language model is still an application of routine—it’s still applying input data to produce output data. It’s a more advanced demonstration of the Difference Engine principle, one that doesn’t identify goals or work out how to use its environment and capabilities to achieve those goals.

Ironically this brings us back to the word “intelligence” that I previously said was an inapplicable label. The word comes from Latin inter and legere—reading between. While an LLM might read or write, it—and the problems we encounter when applying it to our tasks—can’t read between the lines.

Posted in AI, philosophy after a fashion | Tagged | 1 Comment

Considering society

OK now that the anniversary’s out of the way, I can stop being hagiographic towards agile software development and point out the one big flaw in the approach. It’s a stinker.

Here’s the list of everybody mentioned in the manifesto and the principles behind it:

  • The authors (“We are uncovering better ways of developing software”, “we have come to value”, “we value the items on the left”)
  • Customers (“Customer collaboration over contract negotiation”, “Our highest priority is to satisfy the customer”, “Agile processes harness change for the customer’s competitive advantage”)
  • Business people (“Business people and developers must work together”)
  • Developers (above, and “The sponsors, developers, and users should be able to maintain a constant pace indefinitely”)
  • Sponsors (above)
  • Users (above)

You might want to add “the team”, but the only time the team composition gets spelt out it’s a “development team”, so they’re really talking about developers at that point.

The only thing required of or provided to users is a sustainable pace. No satisfaction, job security, dignity, mental health, or value (unless the users happen to be the customers, but in many contexts that isn’t true).

And the users are really an afterthought to the main goal, which is generating customer value in software form. Nowhere does the rest of society get a look in. The people who get run over by your self-driving car? Not mentioned. The people who breathe in your diesel engine’s fumes? Nope. The people whose personal connections are interrupted by the valuable ads that you sell to your valuable customers as your highest priority? Who even are they?

In a very real and straightforward way, this means that agile software development is unethical. The ACM’s Code of Ethics and Professional Conduct says (section 1.1) that a computing professional should “contribute to society and to human well-being, acknowledging that all people are stakeholders in computing.” Section 3.1 says that they should “ensure that the public good is the central concern during all professional computing work.”

Enshittification, surveillance capitalism, social media addiction—all those things we claim are some evil perversion of software development are actually the system working correctly, when the system’s creators set satisfying the customer “through early and continuous delivery of valuable software” as the highest priority that the system optimises for.

We have to create and advocate for a system where public good and human well-being are higher priorities than customer value, and that means at least embedding agile software development in a humane safety net, and potentially, replacing it altogether. European Union framework research recommends an approach called Responsible Research and Innovation, in which diverse stakeholders are identified and engaged throughout the research, development, and deployment of novel technology (an example implementation is the UK’s AREA framework).

Posted in agile, Responsibility, software-engineering | Leave a comment

Happy 25th birthday to the manifesto for agile software development!

11th–13th February 2001 was the occasion of the most famous skiing holiday in software. Don’t take my word for it; Jim Highsmith was there and wrote the history.

It’s pretty astounding that, in a field where everyone tries to remind each other that things move at breakneck pace (though that speed is mostly reserved for those reminders), a website with four substantive text-only pages is still relevant and still widely cited. I’m never going to create as comprehensive or as balanced a critique as Bertrand Meyer, but there are still various important points about the manifesto that are worth discussing.

Two minor wording gripes

Only one of the twelve principles behind the manifesto says anything quantitative, and that’s the only principle that’s been lost to time.

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Through “doing it and helping others do it”, software practitioners have discovered much faster ways to deliver working software more frequently. It’s not unrealistic for a web-based application to be deployable in seconds, and for total delivery workflows including verification and validation to take minutes. Improving our capabilities is no bad thing, but putting a floor of a couple of weeks under the delivery interval lets people whose organisational restrictions limit releases to every two weeks blame “Agile” for that, and choose not to learn anything else from the collection of approaches to making software. If we must blame something, let’s blame Dark Scrum.

The other place where my red pencil comes out is in the attempt to bring people together that actually separates them, the division of people into “developers” and “business” (or, phrased another way, non-developers):

Business people and developers must work together daily throughout the project.

The authors could have written “project collaborators must work together daily”, or something like that, and we wouldn’t have had pigs and chickens. We wouldn’t have “technical” and “non-technical” people.

Potentially, we wouldn’t have had DevOps either, because it wouldn’t have been necessary: project collaborators work together daily. Operations people are neither business people (depending on your line of business) nor developers, so they got excluded until somebody noticed.

Your problem is probably management

Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

If the project collaborators have to run their ideas past people who aren’t on the project, or do things in the same way that other people do them on other projects, they haven’t got the environment or support they need for this project.

The journey isn’t over

We are uncovering better ways of developing software by doing it and helping others do it.

This is something that’s still happening, not something that was over when a group of professionals wrote a short document 25 years ago. We are uncovering better ways. This continues.

In some ways it’s surprising that no newer paradigms have come along to replace agile software development. On deeper reflection we find that anything new would be compatible with this approach, unless you give up on prioritising “satisfying the customer through early and continuous delivery of valuable software”.

In fact, in a world where the software represents autonomous agents rather than tools that customers use, maybe it could soon be time to prioritise something other than delivering software. For the moment, we’re still in a world where the agents are embodied in software tools, and we have to deliver those to our customers, and doing that in a way that’s early, continuous, and valuable still seems to make sense.

It may seem weird coming from someone who’s hitched their whole cart to the generative AI centaur, but while the tools and processes that genAI enables are new and exciting, and I think they’re going to prove valuable, I don’t think they’re going to be more valuable than individuals and interactions. That hasn’t changed in more than 25 years, and doesn’t need to change soon.

Posted in agile | 3 Comments

Opinionated Read: How AI Impacts Skill Formation

The abstract to this preprint, by two authors both associated with Anthropic, makes the claim “We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation – particularly in safety-critical domains.”

The first thing to appreciate is that this idea of “safety-critical domains” does a lot of heavy lifting when it comes to software professionalism—on the one hand, engineers say that while (intervention that is perhaps validated in controlled trials or in experience reports but not the way that engineers like to work) is clearly something that those safety-critical folks should concern themselves with, it’s not relevant to (domain that includes the engineers’ work). On the other hand, professional organisations in the field of computing refuse to engage with the idea that a software engineer should be expected to learn the software engineering body of knowledge precisely because it doesn’t have anything to say about how to build safety-critical software.

Now what is safety-critical software? Or, if you can’t answer that, what software isn’t safety-critical? The book Moral AI tells the story of a self-driving car that collided with, and killed, a pedestrian who was pushing a bicycle across the road. The driver (for lack of an agreed and accurate word) reports streaming an episode of The Voice in the background while checking in to her employer’s work chat channels on the car’s central console. Is the car’s autonomous driving system safety-critical in this context? What about the built-in collision avoidance system, which the employer had disabled in this vehicle? How about the streaming software, or the chat application, or the operating systems on which they run? All of these contributed to a fatality: what makes any of them safety-critical or not?

The second thing is that the claim in the abstract is about learning outcomes, skill formation, and efficiency gains. We need to read this paper with those terms in mind, asking ourselves whether they are actually what the authors discuss. Because we care about what they did and what they found, and aren’t so worried about the academic context in which they want to present this work, let’s skip straight to section 4, the method.

What did they do?

Unfortunately, we don’t learn a lot about the method from their method section, certainly not enough to reproduce their results. They tell us that they use “an online interview platform with an AI chat interface”, but not which one. The UI of that platform might be an important factor in people’s cognition of the code (does it offer syntax highlighting, for example? Does it offer a REPL, or a debugger? Can it run tests?) or their use of AI (does it make in-place code edits?).

In fact when we read on we find that in a pilot study they found that such a platform (they call it P1) was unsuitable and that they switched to another, P2. Choosing a deliberately uncharitable reading, P1 is probably Anthropic’s regular platform for interviewing engineering candidates, and management didn’t want their employees saying it has problems because Silicon Valley culture is uniquely intransigent when it comes to critiquing their interviewing practices (if you think the interview system is broken, you’re saying there’s a chance that I shouldn’t have been given my job, and that’s too horrible to contemplate). Whether that’s true or not, we’re left not knowing what a participant actually saw.

The interviewing platform has “AI”, and the authors tell us the model (GPT-4o); this piece of information leads me to put more weight on my hypothesis about the interview platform name. It subtly reframes the paper from “AI has a negative impact on skills acquisition” to “our competitor’s product has a negative impact on skills acquisition”; why mention this one product if you took a principled position on anonymising product names? Unfortunately that’s all we get. “The model is prompted to be an intelligent coding assistant.” What does that mean? Prompted by whom? Did the experimenters control the system prompt? What was the system prompt? Was the model capable of tool use? What tools were available? Could participants modify the system prompt?

So now, what we do know: 52 people (said in the prose to be split 26 in the “no-AI” control group and 26 in the “AI access” treatment group; table 1 doesn’t quite add up that way) were given a 10-minute coding challenge, then a 35-minute task to make use of a particular Python library, either with or without AI assistance. Finally, they had 25 minutes to answer questions related to the task: an appendix lists examples of the kinds of questions used, but not the actual question script. This is another factor that negatively impacts replicability.

What did they find?

There’s no significant difference in task completion time between the two groups (people who used AI, and people who didn’t use AI). That is, while the mean task completion time is slightly lower for the AI group, the spread of completion times is such that this doesn’t indicate a meaningful effect.

Overall, people who used AI did less well on the test that they took immediately after completing the task, and this is a meaningful effect. However, looking at their more detailed results (figure 7), it seems that among developers with less Python experience (1–3 years), the task completion time was vastly improved by AI access, and the outcome on the quiz was not significantly different.

Remember the sentence in the abstract was “Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.” A different observation is “Participants with only a few years of Python experience showed significant productivity improvements, at no cost to learning the library”.

But does the quiz provide evidence of “having learned the library”? It’s a near-immediate test to recall knowledge about a task that the participants had just completed. What would the researchers have found if they waited four hours, or one day, or one week, to give the test? What would they have found if they set “learning the library” as an explicit task, and gave people time (with or without access to AI) to study? Would it make a difference if this study time was undertaken before participants undertook the coding task, or after? The authors find that some participants used significant time asking the AI assistant questions about the problem. In this way, they measure the total time taken to learn and solve the coding problem, in a situation where you’ve been given a total of 35 minutes for both.

The authors performed some qualitative analysis of their data. They find that people in the no-AI condition encounter more errors (syntax or API misuse) than people who use AI: this should be an interesting result, and a challenge to anyone who prejudges AI-generated code as “slop”. In this situation, it would seem that human-generated code is sloppier (at least in the first few minutes after it’s created).

They identify six AI interaction patterns among their treatment group, and find that people who used three of the patterns achieved better results than the control group on the quiz outcome (though we can’t comment on significance, as no statistics are reported at this level), without impact on “productivity”. As someone who has attached their wagon to the horse of intentional use of AI to improve software engineering skills, this should give me the warm fuzzies. In the context of the control validity questions of the study, I don’t know that they’ve necessarily demonstrated such improvement.

At this point I have another confounding factor to add to these results: the researchers questioned participants on their programming experience, but not on their experience using AI assistants (beyond recruiting people with non-zero experience). Does the adoption of these patterns correlate with more experience using AI? Do people with more experience using AI get more productivity when they use AI? We can’t tell.

And we also can’t say anything about the skill of using an AI assistant itself. Participants are asked about their understanding of the Python library, and the authors convert their performance answering these questions into a measure of the “skill acquisition” involved in using the library. Is that the skill they exercised? Do the quiz answers tell us anything about that skill? If participants were asked a week later to complete a related task, would their performance correlate with the quiz results? Is using the Python library even a useful skill to have?

The authors observed that one pattern performed far worse on both task-completion time and quiz responses than all the others, and this was “Iterative AI Debugging”: verifying or debugging code using the AI. This result isn’t surprising, because the pattern represents using the model to evaluate the logic embodied in the code, and language models don’t evaluate logic. They’re best suited to what used to be called “caveman debugging”, where you use print statements to turn dynamic behaviour into a sequence of text messages—because the foodstuff of the model is sequences of text. They don’t evaluate the internal state and control flow of software, so asking them to make changes based on understanding that internal state or control flow is unlikely to succeed. However, given the small amount of data on this debugging pattern, this is really a plausible conjecture worthy of follow-up, not a finding.

This preprint claims that using AI assistants to perform a task harms task-specific skill acquisition without significantly improving completion time. What it shows is that using AI assistants to perform a task leads to a broad distribution of ability to immediately answer questions related to the completion of the task, with an overall slight negative effect, without significantly improving completion time. The relation of the acquired knowledge to knowledge retention, or to skill, remains unexplored.

Posted in AI | Leave a comment

Creating “sub-agents” with Mistral Vibe

The vibe coding assistant doesn’t have the same idea of sub-agents that Claude Code does, but you can create them yourself—more or less—from the pieces it supplies. [UPDATE: vibe 2.0 supports subagents directly.]

Write the prompt for the sub-agent in a markdown file, and save it to ~/.vibe/prompts/<agent-name>.md. For example:

# Test suite completer
You are an expert software tester. You help the user create a complete and valuable test suite by analyzing their software and their tests, identifying tests that can be added, and constructing those tests.
Use test design principles, including equivalence partitioning and boundary value analysis, to identify gaps in test coverage. Review the project's documentation, including comment docs and help strings, to determine the software's intended behavior. Design a suite of tests that correctly verifies the behavior, then investigate the existing test code to determine whether all of the cases you designed are covered. Add the tests you identify as missing.
## Workflow
1. Read the user's prompt to understand the scope of your tests.
2. Discover documentation and code comments that describe the intended behavior of the system under test.
3. Design a suite of tests that exercise the system's intended behavior, and that pass if the system behaves as expected and fail otherwise.
4. Search the existing test code for tests that cover the behavior you identify.
5. Create tests that your analysis indicates are necessary, but that aren't in the existing test suite.
6. Report to the user the tests you created so they can review and run the tests.

Write a configuration for an agent that uses this system prompt, and save it to ~/.vibe/agents/<agent-name>.toml. For example:

active_model = "devstral2-local"
system_prompt_id = "test-suite-completer"
[tools.read_file]
permission = "always"
[tools.write_file]
permission = "always"
[tools.search_replace]
permission = "always"

The active_model needs to be a model that you define in ~/.vibe/config.toml, or you can omit it to use the default model. The system_prompt_id needs to match the filename you give the system prompt file, without the .md extension.
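As a sketch, a matching model entry in ~/.vibe/config.toml might look like this (the model name and provider here are illustrative; a full config.toml appears in my earlier post on configuring local inference):

[[models]]
# Model identifier as served by your provider (illustrative).
name = "mistralai/devstral-2"
provider = "lmstudio"
# The alias matches active_model in the agent's TOML above.
alias = "devstral2-local"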

You can use this agent by passing the --agent option to vibe. For example, I use the following shell script, and create a symbolic link to the script with the same name as the agent I want to use:

#!/bin/sh
# Invoke a vibe agent named after this script (or the symlink used to call it).
if [ $# -ne 1 ]; then
    echo "Usage: $0 <prompt>"
    exit 1
fi
# basename strips the path, so a symlink named test-suite-completer
# selects the test-suite-completer agent.
AGENT=$(basename "$0")
vibe --agent "$AGENT" --prompt "$1"

You can now use this agent directly at the command line, or tell vibe about the script so that it invokes your agent as a sub-agent.
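For example, assuming the script above is saved as run-agent.sh and made executable (the names here are illustrative):

ln -s run-agent.sh test-suite-completer
./test-suite-completer "Fill the gaps in the test suite for the parser module"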

Posted in AI | Leave a comment

Announcing Chiron Codex, a community of software centaurs

Software engineers don’t need to outsource our agency to coding agents. We don’t need to give up reading the code, or understanding the problems. We can use AI tools to augment our own capabilities, to improve our engineering knowledge and skills. To become software centaurs.

Chiron Codex is an initiative to do just that. In the short term, I’m creating a pattern language of AI-augmented software engineering, and a community of people who want to use AI to become better software engineers at Patreon and at YouTube. Longer term, we’ll explore ways to improve at all aspects of the software engineering lifecycle; becoming software generalists who use AI to complement our expertise, and our expertise to direct the AI tools.

Join us, and please consider supporting Chiron Codex by subscribing to the Archaeopteryx (super-early bird; so early that birds haven’t even evolved) tier on Patreon! Here’s a video explaining the benefits.

Posted in AI | Leave a comment

Configuring your computer for local inference with a generative AI coding assistant

You can use multiple tools to download, host, and interact with large language models (LLMs) for generative tasks, including coding assistants. This post describes the one that I tried that has been the most successful. Even if you follow the approach below and it works well for you, I recommend trying different combinations of LLM and coding assistant so that you can find the setup that’s most ergonomic.

Choose hardware

You need a computer with either sufficient GPU capacity or dedicated neural-processing hardware to run an LLM, and enough RAM to hold gigabytes of parameters in memory while also running your IDE, the software under development, and other applications. As an approximate rule, allow 1GB for every billion parameters in the model; by that rule, a 24-billion-parameter model needs roughly 24GB.

I chose a Mac Studio with M3 Ultra and 256GB RAM. This computer uses roughly half of its memory to host the 123-billion-parameter Devstral 2 model. A computer with 32GB RAM can run a capable small model, such as Devstral Small 2; in this walkthrough I’ll show how to set up that model with Mistral Vibe as the coding assistant.

Note that once you have the model working locally, you can share it on your local network (or, using a VPN or other secure channel, over the internet) and access it from your other computers. You only need one computer on your network capable of hosting the LLM you choose; you can then use local inference from any computer on that network.
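As a sketch of the client side, using the config.toml format shown later in this post: on the other computers, point the provider’s api_base at the hosting machine rather than localhost (the hostname here is illustrative, and LM Studio’s server settings must be set to serve on the local network rather than localhost only):

[[providers]]
name = "lmstudio-remote"
# Replace studio.local with the hostname or IP address of the machine running LM Studio.
api_base = "http://studio.local:1234/v1"
api_key = "LM_STUDIO_API_KEY" # LM Studio doesn't use this value
api_style = "openai"
backend = "generic"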

Install LM Studio

Visit LM Studio and click the download button. Follow the installation process for your operating system; in macOS, you download a DMG that you open, and drag the app it contains into your Applications folder.

Download the model

Open LM Studio, and open the Model Search view by clicking on the magnifying glass. In macOS, check the MLX box to use the more efficient MLX format, and leave GGUF unchecked. Search for “Devstral Small 2 2512”, and click Download to download its weights and other configuration data. The second number (2512) refers to the release date of the model—in this case, December 2025.

Load the model and test it

When your model is downloaded, switch to the Chats view in LM Studio. In the window toolbar, click “Select a model to load” and choose the model you just downloaded. Optionally, toggle “Manually choose model load parameters” and configure settings. I typically alter the context size, as the default is 4096 tokens, which optimises for inference speed and a small memory footprint over a large “working set”. Click “Load Model” to tell LM Studio to serve the model. You can also tell LM Studio, in the app’s settings, to use the model’s maximum context size whenever it loads a model.

When LM Studio loads your chosen model, it opens a new chat with the model. Type a prompt into this chat to validate that the model is working, and has enough resources for inference tasks.

Download, configure, and test a coding assistant

Coding assistants typically expect to take an API key, and communicate with a model hosted in the cloud. To use a local LLM, you need to configure the assistant.

Follow the instructions in the Vibe studio README to install the tool. In Terminal, run mkdir ~/.vibe. Use your text editor to save the following content in a file called ~/.vibe/config.toml:

active_model = "devstral2-small-local"

[[providers]]
name = "lmstudio"
api_base = "http://localhost:1234/v1"
api_key = "LM_STUDIO_API_KEY" # LM Studio doesn't use this value
api_style = "openai"
backend = "generic"

[[models]]
name = "mistralai/devstral-small-2-2512"
provider = "lmstudio"
alias = "devstral2-small-local"
temperature = 0.2
input_price = 0.0
output_price = 0.0

Now test the assistant by running vibe in Terminal, and typing a prompt into the assistant.
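For a quick one-shot smoke test you can also pass a prompt directly, using the --prompt option (the prompt itself is only an example):

vibe --prompt "Summarise what this project does based on its README."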

Further learning

I’ve recently started Chiron Codex, an initiative to create software engineering centaurs by augmenting human knowledge of the software craft with AI assistance. You can find out more, and support the project, over on Patreon. Thank you very much for your support!

Posted in whatevs | Leave a comment