Software is Entropy

Mircha Emanuel D'Angelo

Note: This is an AI-generated translation from my original Italian article: Il software è entropia

Writing code has never been so easy. Maintaining it has never been so hard.


One evening in front of the monitor

It's an ordinary evening. The cup of barley coffee has been cold for an hour, but I only notice now that I've been staring at the screen too long. A while ago I decided to drink only one coffee a day, and I already used up today's quota this morning. So barley, and soon probably an herbal infusion. I'm doing a code review on a project. It's not my project: it's one of those projects that today are called vibe-coded, written almost entirely by letting the AI take the lead, through a series of prompts and chained acceptances.

On my monitor, two windows side by side. On the left the terminal, with Claude Code on standby, ready to respond to the next input. On the right Neovim, where I'm flipping through the code line by line. I open one file, then another, then another. Each file, on its own, looks reasonable. Functions with sensible names, a few comments, even a few green tests.

And yet, as I scroll, I feel that thing anyone who's been programming for years knows well: a dull discomfort. Not a syntax error, not a specific bug. A structural unease. I decide to stop going by gut and to bring in a tool. I set PHPStan to level 7, launch the analysis, and as the output scrolls by I realize the feeling was well-founded: 495 errors. Four hundred ninety-five. Okay. There's definitely work to be done.

I keep scrolling. I find functions that assume things that aren't guaranteed anywhere else. The same piece of logic replicated in three places, each time with a small variant. Calls to three different libraries that do the same thing, because in three different prompts the AI suggested three different solutions and nobody bothered to unify them. Data models that contradict each other from one module to another. Missing type hints, ambiguous return values, nullables treated as if they were guaranteed.
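To make that concrete, here is a hypothetical sketch of the pattern behind many of those 495 errors. The names are invented, but the shape is the one the analyzer keeps reporting: a repository that may return null, a caller that pretends it never does, and a method with no declared return type.

```php
<?php

final class Order
{
    public function __construct(private float $total) {}

    public function total(): float
    {
        return $this->total;
    }
}

interface OrderRepository
{
    // The contract says "maybe null": every caller has to handle that case.
    public function find(int $id): ?Order;
}

final class OrderService
{
    public function __construct(private OrderRepository $orders) {}

    // No return type declared: the analyzer reports the missing type hint.
    public function applyDiscount(int $orderId, float $percent)
    {
        // find() returns ?Order, but the null case exists only in the type
        // system, not in anyone's head. At its stricter levels PHPStan
        // reports calling total() on Order|null.
        $order = $this->orders->find($orderId);

        return $order->total() * (1 - $percent / 100);
    }
}
```

One of these is a five-minute fix. A few hundred of them, scattered across modules whose data models contradict each other, are a different kind of problem.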

It works. Everything works. The features are there, the average user wouldn't notice anything. But I know, because I've seen plenty of these, that the code is a ticking time bomb. In six months, when someone asks to add a feature that cuts across all those modules, the whole thing will blow up.

I close Neovim. I close the terminal. I get up to grab another barley, or maybe I'll go straight for a chicory infusion. I need to think. I already know that the decision, in the end, will be to redo it from scratch.

And a slide comes to mind.

The 2024 talk

In 2024 I gave a talk at MOCA, the Metro Olografix Camp, on rapid web application development. It was a very practical talk, on Laravel and Filament, two tools that let me spin up a complete, authenticated application with a decent admin interface in a few hours. I had finished the slides that very morning, in pure "test in production" style. One of those slides was black, with white text, and said only one thing.

"Software is like entropy. It is hard to grasp, weighs nothing, and obeys the second law of thermodynamics: it always increases."

Norman R. Augustine, Augustine's seventeenth law.

Augustine was a former US Air Force pilot, then for a long time an executive in the aerospace industry. He put on paper a series of half-ironic, half-tragic laws about how complexity really works in large projects. The seventeenth is my favorite, because it's the most honest.

Entropy, in the thermodynamic sense, is the measure of disorder. The second law of thermodynamics says that, in an isolated system, entropy can only increase. Things tend to fall apart, to disorganize, to lose structure. To restore order, you have to do work: spend energy, from outside, to put the system back together. Software, says Augustine, behaves exactly like that. It tends toward disorder. Every new feature added in a hurry, every dependency updated without a careful look, every quick fix left in production "for now," increases the disorder. If nobody spends work to reduce it, disorder wins. Always.

There you go: in over twenty years on the job, I've seen this happen everywhere. And it's not an AI problem. It's a structural problem of software.

Software is a complex system

Think about it. Any web application is not a block of code. It's an ecosystem. There's the code we write, yes, but underneath there are dozens of third-party libraries, each with its own logic, its own versions, its own bugs. There's a database, with its rules about what can and cannot happen to data. There's an operating system, there are containers, there's a network with its unpredictable latencies. There's a browser on the other side — actually, there are a thousand different browsers on a thousand different devices. There are users doing things the designer never imagined. There are integrations with external systems that can change APIs without notice. There are business requirements that change every quarter.

Each component, on its own, is manageable. It's in the interactions that real complexity is born. And interactions grow, combinatorially, much faster than the number of components.

To this you add Brooks's law, fifty years old and still valid: adding people to a late project will make it later still. Because a bigger team means more communication channels (with n people there are n(n-1)/2 of them: five people have ten, twenty people have one hundred and ninety), more decisions to align, more context to keep coherent. Complexity isn't only in the code: it's in the team that writes it.

When people talk about "producing software," most of them think about writing code. But the truth, and here's one of the most underestimated facts of the entire industry, is that initial writing is the smallest slice of total cost.

The real cost is maintenance

There's a chart I used in that talk, in turn taken from Enrico Zimuel, that shows the relative cost of fixing a bug depending on the phase in which it's discovered. The model is simple. Let's take the cost of fixing a bug during requirements analysis as the unit: 1. The same bug discovered during development costs about 5 times as much. During integration testing, around 10. In acceptance testing, around 15. In production, after release, up to 30 times as much.

[Slide from my MOCA 2024 talk: chart of the relative cost of fixing a bug, growing non-linearly with the phase of the project in which it's discovered.]

Thirty times. And I'm not talking about the bug where the button is two pixels misaligned. I'm talking about real functional bugs: a transaction that gets duplicated, a permission granted to someone who shouldn't have it, a calculation that returns the wrong result in an edge case. That kind of bug, in production, is never just a bug. It's a series of costs that add up: the service downtime, the team dropping everything to handle the emergency, the communications to customers, any data to recover, any trust lost, the legal implications if there's a regulation involved.

The exact numbers in the chart should be taken for what they are, an average estimate that varies a lot from context to context. But the underlying message is solid and confirmed by decades of studies: the later you discover a problem, the more it costs to fix. Exponentially, not linearly.

And here's the trap of the historical moment we're in. Tools like AI let us write code much faster. Everyone, from the junior in their first week to the senior with twenty years of experience, produces more. It would seem to be a net gain. But if writing speed grows and care in the early phases drops, something very specific happens: more and more work gets pushed into later phases, where each unit of work costs much, much more.

In other words: we are accelerating the injection of problems in the early phases, where fixing them would cost little, only to discover them in the later ones, where they cost ten, twenty, thirty times as much.

"But TDD is useless now, right?"

This sentence, in various forms, has been said to me more than once in recent months. The stated logic is always the same: the AI writes the code anyway, writing tests first is a waste, just ask it to generate them at the end, and off we go.

For a while, I admit, I half-believed it too. It's one of those ideas that, on the surface, sounds logical. Then I stopped to think about it, and I realized the sentence comes from a wrong idea of what TDD is.

Test Driven Development is not "writing tests before code" as a bureaucratic compliance task. It's a way to force yourself to think about the system's invariants before writing them. It obliges you, during analysis, to ask yourself what your piece of code must guarantee, under what conditions, with what allowed inputs, with what expected outputs, and what should happen when things go wrong. The test is a side effect of this thinking. The real value is the thinking.
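To show what that thinking looks like on the page, here is a minimal sketch with PHPUnit. Payment, its refund rule, and the exception are invented for the example; the point is that the test is written before the class it exercises exists, and it states the guarantee rather than the implementation.

```php
<?php

use PHPUnit\Framework\TestCase;

// Invented domain for the sketch: the exception names the invariant.
final class RefundExceedsPaymentException extends \DomainException {}

// The test comes first: it fixes the invariant in writing.
final class PaymentTest extends TestCase
{
    public function testRefundNeverExceedsTheAmountActuallyPaid(): void
    {
        $payment = new Payment(amountInCents: 10_000);

        $this->expectException(RefundExceedsPaymentException::class);

        // Invariant: you can never give back more than was paid,
        // no matter which feature asks for it.
        $payment->refund(amountInCents: 12_000);
    }

    public function testPartialRefundsAccumulateUpToTheTotal(): void
    {
        $payment = new Payment(amountInCents: 10_000);

        $payment->refund(amountInCents: 4_000);
        $payment->refund(amountInCents: 6_000);

        $this->assertSame(10_000, $payment->refundedInCents());
    }
}

// The implementation the test then drives you to write.
final class Payment
{
    private int $refundedInCents = 0;

    public function __construct(private readonly int $amountInCents) {}

    public function refund(int $amountInCents): void
    {
        if ($this->refundedInCents + $amountInCents > $this->amountInCents) {
            throw new RefundExceedsPaymentException();
        }

        $this->refundedInCents += $amountInCents;
    }

    public function refundedInCents(): int
    {
        return $this->refundedInCents;
    }
}
```

The assertions come straight from the analysis: what must be guaranteed, which inputs are allowed, what has to happen when things go wrong. The class underneath is just the consequence.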

When you delegate both the code and the tests to the AI, with the tests generated afterwards, you get something very subtle: a system where code and tests are coherent with each other, but neither is anchored to a real analysis of the problem. The tests verify that the code does what the code does. A well-written tautology.

And here I want to tell you something that actually happened to me, a few months ago. I'd been handed responsibility for a piece of software produced by a serious company, made of competent people. One of the first things I noticed, going through the CI pipeline, was a rule carved in there like a dogma: no deploy without 100% test coverage. None. If coverage dropped by even a single percentage point, the deploy was blocked, the build turned red, and someone had to fix it. A round one hundred percent, always. Every single line of code executed by at least one test. I should have relaxed. No bugs, right?

In a few weeks I found and fixed dozens of bugs. Real bugs, of logic, of edge case handling, of assumptions the code was implicitly making and that reality didn't honor.

How is that possible? It happens, and it happens all the time, because coverage doesn't measure what people think it measures. It measures how many lines of code are executed by at least one test. It doesn't measure whether those tests verify anything meaningful. You can comfortably reach 100% coverage by writing tests that traverse all the code while making almost no assertions, or by making trivial assertions, like "this method does not throw exceptions," that rule out no wrong behavior. You can reach 100% coverage without ever having asked yourself, even for a moment, what this system was not supposed to do. And without ever having thought of a single edge case.
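Here is a hypothetical example of what that looks like in practice: a test that executes every line of the method, pushes coverage to 100%, and still rules out essentially nothing.

```php
<?php

use PHPUnit\Framework\TestCase;

final class ShippingCalculatorTest extends TestCase
{
    public function testCalculateShippingDoesNotThrow(): void
    {
        $calculator = new ShippingCalculator();

        // Both branches get executed, so the coverage tool is satisfied...
        $domestic = $calculator->calculateShipping(weightKg: 2.0, international: false);
        $abroad   = $calculator->calculateShipping(weightKg: 2.0, international: true);

        // ...but these assertions would still pass if the prices were
        // negative, swapped between the two branches, or off by a factor
        // of a thousand.
        $this->assertIsFloat($domestic);
        $this->assertIsFloat($abroad);
    }
}

// Invented class for the sketch: 100% covered, essentially unverified.
final class ShippingCalculator
{
    public function calculateShipping(float $weightKg, bool $international): float
    {
        $base = $weightKg * 1.5;

        // A wrong multiplier here would never turn the build red.
        return $international ? $base * 3 : $base;
    }
}
```

The dashboard shows a perfect number. The behavior of the system is, to a first approximation, untested.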

100% coverage is a reassuring number. It sits on the dashboard, it sits in the badges, it sits in management slides, it lets you sleep at night. And that's exactly why it's dangerous: it gives the illusion of a safety net that in reality has meshes wide enough to let almost everything through. The number is high, so everyone stops looking.

AI, today, is a wonderful generator of tests that drive coverage up. They are plausible, readable, well-structured tests. Almost none of those tests, generated automatically, will tell you something you didn't already know. The numbers will go up, the real quality of the checks will go down. And that's how systems with perfect metrics and disastrous behavior get produced.

TDD, today, is not useless. It's more useful than before, because it's one of the few practices that forces you to stop and think in an era where everything invites you to accelerate. And because it brings you back to what counts: not how many lines are covered, but which invariants you've decided, by writing them, to protect.

Software engineering is not disappearing

One of the most widespread narratives of the moment is that the traditional software engineer is on the road to extinction. The truth, from where I stand, is exactly the opposite. Software engineering is not disappearing. It's shifting.

It's shifting from the fingers to the head. It's shifting from "knowing how to write the most elegant for loop" to "knowing how to decide whether that for loop should exist there at all." It's shifting from producing code to caring for the structure, designing the invariants, defining the system's boundaries, keeping the domain model coherent. It's shifting toward what, in another article a few months ago, I called the famous 30% of the work: architecture, security, edge cases, critical thinking.

AI is excellent at producing the 70%. That 70%, however, on its own, is a commodity. It's worth little. In fact, it's dangerously cheap, because it leads people to think that's enough. The value is in the remaining 30%, which is what AI doesn't know how to do, because it requires a holistic view, a memory of what has already happened in the system, an understanding of the business context, and that pinch of healthy paranoia that only develops after you've seen things explode in production.

When I use AI today, and I use it every day, intensely, I don't do it to write less code. I do it to focus my hours on the decisions that count. The decisions about architecture, about data model, about responsibility boundaries between modules, about test strategy, about failure handling. On those I delegate nothing. There I stay awake, there I apply exactly the same software engineering practices I learned in twenty years.

And the same goes for the right framework, for the consolidated libraries, for the mature conventions. Tools like Laravel are not "shortcuts": they are accumulators of tens of thousands of hours of work by engineers who have already thought about the problems you are trying to solve today. Relying on a solid framework is not laziness. It's recognizing that there is such a thing as the state of the art, that it's worth knowing, and that your added value is somewhere else.

Everyone can write software. Almost nobody knows how to maintain it.

This is the sentence that's been spinning in my head for weeks. Everyone can write software. It's true, finally it's true, and it's excellent news. The kid who today wants to build their own little utility has tools that I, at that age, could only dream of. The barrier to entry has crashed. That's a good thing.

But writing a small piece of software, with three features, one user, and zero responsibilities, is a very different exercise from keeping alive a system with hundreds of features, thousands of users, integrations with the outside world, and data someone is responsible for. The distance between the two is enormous. It's the same distance there is between building a hut in the garden and designing a skyscraper. The hut stands up even if you build it badly. The skyscraper does not.

Something happened to me a few weeks ago that left me bewildered. A friend asks me to take a look at a piece of software he developed, and which he's already put into production, complete with an enthusiastic announcement on X. I go to look. The first surprise, easy enough to digest, is finding that the software has been entirely vibe-coded by people who are not software engineers. I get it, today it's possible, it's even an achievement. What I didn't expect is what I find in the first five minutes of analysis.

The API keys for external services, in plain text in the client code. The database tables, hosted on Supabase, completely exposed to the client because nobody had enabled the Row Level Security policies: anyone, from the browser, with the project's public key, could read and write any table. The S3 storage where anyone, with the same key, could upload arbitrary files. I stopped there. I didn't go any further, I didn't need to, and frankly I didn't want to.

That software was online. It had real users, real data, real trust from people who had no way of knowing that their information was sitting behind a door left wide open. None of those who built it, I'm sure, wanted to harm those users. But that's exactly the point: good faith is not enough, and never has been.

And we, as an industry, are about to produce an enormous amount of code written the way you'd build a hut, but destined to hold up like a skyscraper. That code will reach production. It will serve real people. It will handle real money, real data, real lives. And when entropy does its work, and it will, because that's its work, someone will have to clean up.

That someone, today, is still a human being. A software engineer with the patience to read the disaster, the lucidity to understand where it cracked, and the honesty to say: this needs to be redone.

Back to the monitor

I go back to the desk with the steaming infusion. I reopen Neovim, I reopen the terminal. I look at the screen for a few minutes in silence.

I could tell my colleague that yes, there are things to fix, but with two weeks of spot interventions we can make it presentable. It would be the easy answer. It would also, almost certainly, be a dishonest answer, because I know well the story I'd be in for. Patches on top of patches. Bugs popping up in production in places in the code that, on the surface, have nothing to do with the change just made. Every new feature requiring three days just to figure out how to wedge it into that structure. A team that, instead of building value, spends its time doing defensive maintenance, writing release notes full of "temporary fix to avoid regression on X". And every temporary fix, we all know, has a different name from the one it claims: it's called technical debt, and it accrues interest.

And here I get back to the point I started from. Entropy, in that codebase, has already picked up too much speed. Every spot intervention doesn't slow it down: it accelerates it. The cost of maintenance, which in any software is already the largest line item in its life cycle, in a system like this grows non-linearly. The further you go patching, the more expensive it becomes to keep going.

I start over. I open a new, empty repository. But I don't write code. I start thinking. I open my notebook and I draw the domain model before even touching the keyboard. I decide the boundaries of the modules, the responsibilities of each, the contracts between the parts, the invariants I want to protect. I decide what the system must do and, above all, what it must not do. It's in this phase that almost everything is played out. And it's the phase in which AI, on its own, doesn't know how to move: it doesn't have my memory of the domain, it doesn't know what has already exploded in the past, it doesn't know the unwritten constraints of the business.

Yes, I use it. I use it a lot, and in this whole article I've never said otherwise. I use it intensively, every day, as a structural part of my work. I'm not here playing the old nostalgic guy who defends the mechanical keyboard. I'm here to say that it makes a huge difference how you use it.

When the mental model starts to be clear, I open Claude Code and use it as an exploration companion, not a stenographer. I work in spikes. The spike, in the world of software development, is an idea by Kent Beck, one of the fathers of Extreme Programming. It's a small code experiment, written quickly and with the explicit intention of throwing it away, that serves to answer a precise technical question. Does this library actually handle case Y the way I think? Does this data model hold up if I load it with a million records? Does this architectural choice have the latency profile I need? The spike is not the product. It's a probe in the ground. You write it, you have it answer the question, and you throw it away. What remains is the information, not the code.
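To give the idea a shape, here is a minimal sketch of what a spike can look like, assuming plain PHP with the pdo_sqlite extension. The question it answers is invented; what matters is the form: one precise question, one disposable script, one number as the result.

```php
<?php

// Spike (to be thrown away): does a single indexed table still answer
// our per-user lookup in a few milliseconds with a million rows in it?

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)');
$db->exec('CREATE INDEX idx_events_user ON events (user_id)');

// Load a million synthetic rows spread over ten thousand users.
$db->beginTransaction();
$insert = $db->prepare('INSERT INTO events (user_id, payload) VALUES (?, ?)');
for ($i = 0; $i < 1_000_000; $i++) {
    $insert->execute([$i % 10_000, 'payload-' . $i]);
}
$db->commit();

// Measure the one thing the spike exists to measure.
$start = hrtime(true);
$query = $db->prepare('SELECT COUNT(*) FROM events WHERE user_id = ?');
$query->execute([42]);
$count = $query->fetchColumn();
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("user 42 has %d events, lookup took %.2f ms\n", $count, $elapsedMs);
```

Half an hour later this file no longer exists in the repository. What remains is the answer, and the decision it supports.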

With an agent like Claude Code, spikes have become much faster. I can explore in half an hour architectural hypotheses that would have taken me a day before. I can compare two approaches in the same time it would have taken me to do one. But it's me deciding which spikes to run, me reading the results with a critical eye, me deciding which hypothesis to promote and which to discard. The AI doesn't bring the right questions: I bring them, the AI helps me answer them quickly.

Only afterwards, when the architecture has held up under the spikes test, do I write the first real tests, the ones that crystallize the system's invariants (rather, it's me telling Claude Code which tests to write and how to write them, case by case, assertion by assertion). Not because a 2010 consultant would tell me to, but because I need to force myself to fix in writing what this system must guarantee, before bothering to make it run. It's only at that point that I ask Claude Code to help me write the production code, under the guidance of a structure I've already decided, checking every piece it produces and rejecting what doesn't follow the rules I've set.

The thinking remains mine. The structure remains mine. The responsibility remains mine. AI is a very powerful accelerator, but the direction is set by me. It's in this direction, not in the fingers on the keyboard, that the software engineering I learned in twenty years lives. And no agent, today, can take it on instead of me.

Is it a waste of time? Maybe. Maybe, at first glance, thinking before coding looks like one.

Entropy, however, can't be delegated.

It never could be delegated. Not with the low-code platforms of twenty years ago, not with the full-stack frameworks of ten years ago, not with today's AI. The tools change, the speed changes, who can start changes. What doesn't change is who, in the end, has to put things in order. And this, for quite a while still, will be human work.

In fact, it will be more and more human work. Because the faster we produce disorder, the more someone will need the skills, the patience, and the head to put things back in order.

And that someone, whatever their age and whatever their degree, I will keep calling a software engineer.

#AI #Programming #Software-Engineering #Reflections #Entropy #TDD #Vibe-Coding