The Next Turn of the Spiral: Fixing Vibe Coding Without Reinventing Software Engineering
We've been here before. We know what to do.
The Problem Everyone Is Getting Wrong
Here is an apparent paradox: everything about LLM-based programming is genuinely new, and none of it is new at all. Both halves of that statement are true simultaneously, and understanding why is the point of this essay.
The enthusiasts are right. Natural language is genuinely a new kind of programming language. The accessibility is genuinely broader. The experience of building things has genuinely changed in ways that would have seemed remarkable even a decade ago. When someone describes what they want in plain English and working code appears, something real has happened.
The skeptics are also right. The underlying principles of specification, complexity, abstraction, and certification are unchanged and inescapable. Complexity doesn’t disappear because you express it in English. Systems built on underspecified foundations fail, regardless of what language was used to specify them. The old rules apply with exactly the same force they always have.
Both are right because they are describing different levels of the same phenomenon. The enthusiasts are observing the surface correctly. The skeptics are observing the structure correctly. The apparent contradiction dissolves once you recognize that we have been exactly here before — not metaphorically, but precisely. Every previous turn of the abstraction spiral in programming history presented the same paradox: a genuinely new language or mode of expression — machine code to assembler, procedural to object-oriented, formal syntax to natural language — that nonetheless had to obey all the old rules, because those rules admit no exceptions regardless of the language in which the work is expressed. And every time, the resolution was the same. The ecosystem eventually reconstructed, at the new level of abstraction, the same disciplines and institutions that had proven necessary at every prior level. Not because anyone planned it that way, but because the underlying principles left no other choice. This essay argues that we are at that moment again — that the apparent conflict between “everything changes” and “nothing changes” is not a paradox to be resolved but a pattern to be recognized, and that recognizing it tells us precisely what work lies ahead.
The Principle That Doesn’t Care What Language You Use
In 1956, the cyberneticist Ross Ashby formulated what became known as the Law of Requisite Variety: a system that regulates another system must have complexity at least equal to the system it regulates.1 Applied to software specification, the implication is straightforward and unforgiving. A specification that is less complex than the program it describes cannot uniquely determine that program. The missing complexity has to come from somewhere.
In traditional programming, missing complexity surfaces as compiler errors, failed tests, or runtime crashes. The system fails loudly and points at the gap. In LLM-based programming, missing complexity is filled silently by the model’s statistical priors — by whatever the training data suggested was most likely in this context. The program doesn’t fail to compile. It compiles beautifully into something that may or may not be what you wanted, and you may not be able to tell the difference until much later.
This is the actual problem with vibe coding. Not that natural language is imprecise — English can express statements as precise as anything written in C++ or Python. The problem is that natural language, unlike formal programming languages, permits imprecision. It does not force the author to supply the requisite complexity. The failure mode is invisible rather than loud, and invisible failures in software are the dangerous kind.
When a vibe coder writes “encrypt the user’s password,” the LLM doesn’t leave a gap where the encryption specification should be. It makes choices — about algorithm, key management, padding scheme, error handling — based on statistical inference from training data. If the training data contains a high proportion of incorrect, insecure, or naïve implementations — and the internet is generously supplied with exactly those, in tutorials, Stack Overflow answers, and code samples written to illustrate something other than security — then the LLM’s output will carry a roughly similar probability of being wrong. The statistical process is not biased toward correctness; it is biased toward resemblance to what it has seen. This is not a flaw in the system. It is the system working exactly as designed.

What is commonly called an LLM “hallucination” — a confident wrong answer — is often the architecture functioning correctly: producing output that faithfully resembles the distribution of its training data, errors included. The hallucination framing implies a departure from normal behavior. The more precise framing is that an LLM trained on the internet will replicate the internet, including its confident mistakes, its cargo-culted patterns, and its widespread misunderstandings — at scale, without fatigue, and with complete syntactic fluency.

What is missing is any mechanism that introduces correctness as an explicit constraint on a process that has no such constraint by default. The result looks complete. It may even work, in the sense that it runs without errors. But those choices were never made by the developer. They were made by a probability distribution, without the developer’s knowledge or review.
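The gap is easy to demonstrate. The sketch below (Python; the parameter choices are illustrative assumptions, not recommendations) contrasts an implementation sampled from the statistical neighborhood of tutorial code with one pinned down by an explicit specification. Both run without error; only the spec makes the difference visible.

```python
import hashlib
import os

def store_password_naive(password: str) -> str:
    # Plausible output of an unconstrained prompt: a bare, unsalted
    # hash -- syntactically fluent, runs cleanly, and wrong for the task.
    return hashlib.sha256(password.encode()).hexdigest()

def store_password_specified(password: str) -> bytes:
    # What an explicit spec might pin down (salt length and iteration
    # count here are illustrative assumptions): a random per-user salt
    # plus a deliberately slow key-derivation function.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt + digest
```

Both functions satisfy the prompt “encrypt the user’s password.” Only the second satisfies a specification, and nothing in the prompt, or in the code’s surface appearance, distinguishes them.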
The rest of this essay is about what that mechanism looks like and why we have built versions of it before.
We Have Been Here Before
This is not the first time a new programming language created this problem. It is, in fact, the same problem we have encountered at every previous turn of the abstraction spiral.
Grace Hopper understood it in the 1950s. When she worked to make COBOL as English-like as possible, she was trying to collapse the gap between specification and program entirely — to create what she called an executable specification.2 The dream was that a business manager could write something close to plain English and have that be both the description of what was wanted and the artifact that executed it. COBOL was a partial success. The gap narrowed but never closed.
The first solution to the problem of specification complexity was the subroutine. A programmer who needed to search a string did not have to implement string search, and did not have to specify it either. A single terse call stood in for both the implementation and its specification simultaneously — the first complexity voucher. The subroutine was small at first: a few instructions for computing a square root, converting a number to a string, sorting a short list. But the principle it established was unlimited in scope. The complexity encapsulated inside a subroutine call could grow indefinitely while the call itself remained a single terse instruction.
Over decades the subroutine grew. Small utilities became library functions. Library functions became operating system services. Operating system services became entire subsystems. At each step the same principle held: the person calling the function did not need to understand it, only to trust it. The voucher remained valid regardless of how much complexity it encapsulated. By the time this trajectory reached its logical conclusion, the voucher could represent not a few instructions but millions of lines of carefully engineered code — an entire product, callable as though it were a simple function.
There was once a time when a database system, a text editor, a calendar, and a communications tool were entirely separate artifacts. Each stood alone. A programmer who wanted to combine their capabilities faced an impossible task: the products had no interfaces designed for composition, no way to be called rather than simply used. Integration was not merely difficult; it was architecturally excluded. This changed with the recognition that a product could expose its function while keeping its form separate — that the million-line complexity of an entire product could be made available through a defined interface that any other system could call.
In the 1980s at Digital Equipment Corporation, the same problem appeared in a different form. Products were being built as monolithic systems that bundled their interfaces tightly with their underlying functions. A business manager who needed to pull figures from a database, check a calendar, do some arithmetic, and write a memo had to run four separate programs, each with its own commands, its own conventions, its own mental model. There was no way to call the function of one product from within another.
The solution was callable interfaces — DTR$DTR for Datatrieve, TPU$TPU for the Text Processing Utility, and similar interfaces across Digital’s product line. The underlying function was separated from the form through which it was accessed. Anyone could now compose a new interface from existing functional components, calling certified, well-understood capabilities without reimplementing them. The complexity of the underlying function was encapsulated, accessible by reference, no longer requiring full understanding by everyone who used it.
The payoff was demonstrated concretely in Digital’s ALL-IN-1 integrated office system. ALL-IN-1 was able to provide a remarkable breadth of capability — document creation, database access, electronic mail, calendar management, and more — precisely because it was built as an integration of Digital’s other products as callable components into a consistent whole with a common user interface. The ALL-IN-1 engineers did not reimplement what Datatrieve or TPU already did. They called certified, well-understood components and concentrated their effort on the integration itself — the novel interface, the consistent user experience, the coherent composition of capabilities that none of the underlying components provided alone. The result was greater than the sum of its parts, and it required far less code than a ground-up implementation would have demanded. This is what the callable interface architecture made possible: integrated solutions whose complexity was managed through composition rather than reimplementation.
At Microsoft in the early 1990s, the same architectural vision found its fullest expression in the OLE and COM family of technologies. OLE — Object Linking and Embedding — provided the framework for compound documents and inter-application communication; COM Automation extended that foundation to make it accessible to solution builders. The goal was explicit: allow solution developers using Visual Basic or VBA to call components written in C++ without understanding what C++ was doing. A solution builder could interact with a database, an editor, a spreadsheet engine — not by understanding how these things worked, but by calling certified interfaces that guaranteed specific behaviors.
The Microsoft Solutions Development Framework that emerged from this work made a distinction that proved durable: solution developers, who understood user needs and worked in high-level languages, and component builders, who understood machine behavior and worked in lower-level languages. Each group had tools appropriate to their domain. You couldn’t write a device driver in Visual Basic, and writing a user interface in C++ was laborious where VB made it trivial. The distinction wasn’t about intelligence or status. It was about which complexity each role was responsible for managing.
The architectural decision that made this separation durable was the immutability of COM interfaces. COM relied on composition rather than inheritance, which meant that component builders were free to refactor, optimize, or completely reimagine their implementations — but the interface contract could not change without explicit versioning. This was not merely a technical constraint. It was a guarantee to solution builders that the complexity of the component world, however much it evolved, would never silently reach across the boundary and become their problem. The interface was a promise as much as a specification: changes below the line stayed below the line.
The Voucher That Cashes Out
These historical examples share a common structure that goes to the heart of the Ashby problem.
When a VBA programmer wrote database.Query(), they were writing a terse statement that nonetheless carried enormous specification complexity — not because the programmer understood that complexity, but because someone else had already fully specified and implemented it, and the interface was a certified reference to that work. Call it a complexity voucher: a short, accessible token that points to a fully elaborated specification.
The alternative — and the failure mode of vibe coding — is what might be called a fake voucher. When a vibe coder writes “query the database,” they are superficially doing something similar: using a terse phrase to gesture at a complex domain. But database.Query() is a pointer to a determinate specification. “Query the database” is a pointer to a distribution over possible specifications, weighted by the LLM’s training data. The compression is illusory. There is no agreed decompression key.
Experienced developers use LLMs more effectively than novices for exactly this reason. They know which terse phrases function as genuine complexity vouchers — because they know the underlying specifications those phrases reference — and which phrases are merely gestures toward a vague neighborhood of solutions. They know the difference between “use RSA encryption” and “use OpenSSL’s RSA_PKCS1_OAEP_PADDING with 2048-bit keys generated from a hardware entropy source.” Each additional constraint narrows the distribution of possible implementations the LLM samples from. Full specification collapses the distribution to something approaching determinism — but at that point, you’ve essentially written the code yourself and are using the LLM as a syntax formatter.
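The narrowing effect can be made concrete with a toy model: a hand-built candidate table standing in for the distribution an ambiguous prompt leaves open. This is an illustration of the idea, not a claim about how an LLM works internally.

```python
# Illustrative candidates an underspecified "encrypt this" prompt leaves
# open; the values are invented for the example, not exhaustive.
CANDIDATES = [
    {"algo": "RSA", "padding": "PKCS1v1.5", "bits": 1024},
    {"algo": "RSA", "padding": "OAEP", "bits": 2048},
    {"algo": "RSA", "padding": "OAEP", "bits": 4096},
    {"algo": "AES", "padding": None, "bits": 256},
]

def narrow(candidates, **constraints):
    """Keep only the candidates consistent with every stated constraint."""
    return [c for c in candidates
            if all(c.get(key) == value for key, value in constraints.items())]

vague = narrow(CANDIDATES)                # "use encryption": all 4 remain
better = narrow(CANDIDATES, algo="RSA")   # "use RSA": 3 remain
exact = narrow(CANDIDATES, algo="RSA", padding="OAEP", bits=2048)  # 1 remains
```

Each added constraint shrinks the live set; the fully specified prompt leaves exactly one candidate, which is the point at which the author has effectively written the specification themselves.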
The leverage of LLM-based programming exists in the gap between specification and implementation. That gap is also where the probabilistic uncertainty lives. The challenge is to capture the leverage while managing the uncertainty — and the answer, as it was at every previous turn of the spiral, is certified abstraction.
A Specification Library
The solution is straightforward to state, though not trivial to build: a library of specifications for common computational functions, design patterns, and system-level invariants, maintained as versioned, human-readable, publicly accessible documents — and a development discipline that requires LLMs to consult that library before generating any code.
This is less novel than it sounds. It is precisely what the software development community already does with code — through open source repositories, package managers, standards bodies, and platform-bundled libraries. npm, PyPI, CTAN, GitHub: each is a versioned, distributed, reputation-governed ecosystem of reusable components. The difference is that the artifacts being shared, versioned, and governed would be specifications rather than implementations. The infrastructure, the trust mechanisms, the contribution patterns, and the governance models are all already proven at scale. What is new is applying them one level up.
The existence of these prior traditions is not a limitation the essay needs to manage. It is confirmation of the essay’s central argument. Design by Contract, developed by Bertrand Meyer in the Eiffel language, showed that software elements should carry specifications governing their interaction. TLA+ and related formal tools showed that system behavior can be specified precisely and machine-checked. Alloy showed that constraint exploration could surface design flaws before implementation. OpenAPI and schema-first API design showed that a human- and machine-readable specification can serve as the source of truth from which validation, documentation, and code artifacts are generated. W3C and OASIS conformance methodology showed that reference implementations and conformance test suites are achievable at scale across diverse implementors. These traditions exist. They work. They are underused precisely because they were developed for a previous turn of the spiral, at a level of formality and tooling investment that the mainstream of software development was unwilling to sustain.

The LLM context changes the calculus. A specification that once required a programmer to learn TLA+ before it could do any work now requires only that it be readable — by a human and by an LLM. The prior art does not need to be reinvented. It needs to be recognized, connected, and applied to the generation loop that is already in use. We have been here before. We know what to do. The question is whether we will do it.
There is a more fundamental distinction worth drawing, one that also clarifies why this approach differs from its predecessors more than it might first appear. Spec-first development — write the specification, then implement against it — is a familiar methodology with a mixed track record. It leaves two artifacts that can drift apart: the spec describes intent, the code is reality, and synchronizing them is a perpetual maintenance burden. Literate Programming tried to collapse that gap by interleaving document and code in a single artifact, but the practitioner still authored both, and two things that could diverge still existed.
The vibe coding context is categorically different, and the spec library proposal is designed for that context. The practitioner does not write code. The natural language prompt is not a precursor to code — it is the artifact the practitioner authors and controls. The LLM output is not an implementation of a separately maintained specification; it is the direct execution of intent, constrained by the spec library. This is not spec-first. It is spec-as-code. The specification is not documentation of what the code should do; it is a constraint on what the execution produces. The analogy is not a comment preceding a function — it is a type annotation enforced by a compiler. The synchronization problem of spec-first development disappears because there is only one artifact: the specification and its execution are the same act.
The discipline would work roughly as follows. When given a programming task, the LLM first decomposes the problem into its functional components. It then queries the specification library to find whether certified specifications exist for any of those components. Where specs exist, the LLM’s code generation is constrained by those specs rather than by unconstrained statistical inference. Where no spec exists, the LLM flags the gap explicitly rather than filling it silently.
This changes the character of LLM output fundamentally. Instead of a single artifact — here is your code — the output becomes a structured report: components handled by certified specs, with version numbers and references; components where specs exist but organizational policy decisions are required; components where no spec was found and probabilistic generation was unavoidable; platform and context dependencies where the LLM’s knowledge is asymmetric; and ambiguities in the original specification that decomposition revealed.
The unresolved components list is particularly valuable. It converts invisible uncertainty into a visible engineering artifact — a prioritized research agenda that tells a development team exactly where to focus human attention. The LLM is no longer hiding its uncertainty inside apparently complete code. It is surfacing uncertainty where a human can act on it.
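A minimal sketch of what such a report might look like as a data structure (every name here is hypothetical, invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class ComponentReport:
    name: str
    status: str         # "certified" | "policy-decision" | "unspecified"
    spec_ref: str = ""  # spec id and version, when a certified spec applied
    note: str = ""

@dataclass
class GenerationReport:
    components: list = field(default_factory=list)

    def unresolved(self):
        # The prioritized research agenda: every component generated from
        # statistical inference rather than a certified specification.
        return [c for c in self.components if c.status == "unspecified"]

report = GenerationReport([
    ComponentReport("password-hashing", "certified",
                    spec_ref="speclib:password-hashing@3"),  # hypothetical id
    ComponentReport("audit-logging", "policy-decision",
                    note="two approved spec versions; organization must choose"),
    ComponentReport("session-cache", "unspecified", note="no spec found"),
])
```

The `unresolved()` list is the artifact the essay describes: the components where human attention is required, surfaced explicitly rather than buried in apparently complete code.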
What Goes In the Library
The specification library would contain three broad categories of entries.
Behavioral specifications describe functions whose outputs are deterministic given their inputs. Cryptographic algorithms, network protocols, sort algorithms, encoding schemes. These are the computational equivalents of mathematical definitions — there is a ground truth, and a sufficiently detailed spec can converge on it. The gold standard here already exists: IETF RFCs are precisely this kind of artifact. RFC 8017 is the RSA specification.3 The framework doesn’t need to reinvent these — it needs infrastructure that connects them to the code generation process.
A well-written spec also transcends its original domain. The abstract problem that the Knuth–Morris–Pratt string-search algorithm (KMP) solves — find a pattern within a longer sequence efficiently — is not specific to text. Bioinformatics researchers recognized this early: the Smith–Waterman algorithm (1981)4 and the tools built on it, including BLAST and FASTA, apply the same foundational insight to DNA and protein sequences, adding a domain-specific scoring layer to handle biological realities such as tolerated substitutions and insertion events. An LLM presented with a DNA sequence matching problem and a sufficiently abstract string search spec could reason by analogy, recognizing the structural similarity and adapting the specification to the new domain, rather than either failing to make the connection or inventing an approach from scratch that silently violates known requirements. The generality of a spec is a design choice: a spec written at the right level of abstraction becomes applicable far beyond the problem that motivated it.
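As an illustration of a spec written at that abstract level, here is a generic Knuth–Morris–Pratt search that works identically on text and on DNA bases, because nothing in it assumes the items are characters:

```python
def find_pattern(pattern, sequence):
    """Knuth-Morris-Pratt search over any sequence of comparable items.

    Returns the index of the first occurrence of `pattern` in
    `sequence`, or -1 if absent. Nothing here is text-specific, so the
    same code searches strings, lists of DNA bases, or token streams.
    """
    if not pattern:
        return 0
    # Failure function: for each prefix of the pattern, the length of
    # the longest proper prefix that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the sequence without ever re-examining a matched item.
    k = 0
    for i, item in enumerate(sequence):
        while k and item != pattern[k]:
            k = fail[k - 1]
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

The same call searches English text and a genome fragment; a domain such as bioinformatics would then layer its own scoring model on top, exactly as Smith–Waterman layers scoring on top of the abstract matching problem.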
Constraint specifications describe components that must satisfy a set of conditions while retaining legitimate degrees of freedom. User interface components, accessibility requirements, layout systems, style guides. No single reference implementation exists because many valid implementations exist simultaneously. These specs define what is outside the valid space — no red for colorblind accessibility, minimum font size, right border alignment — while a curated suite of example implementations populates the interior of the valid space. The spec is the statute; the example suite is the case law.
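A constraint spec of this kind is naturally executable. The sketch below, with invented and deliberately simplified constraints, shows the shape: the spec rules designs out, and many distinct designs remain valid inside the boundary.

```python
# Invented, simplified constraints standing in for a real accessibility
# and style spec; a real spec would carry many more such rules.
FORBIDDEN_FOREGROUNDS = {"red"}   # e.g. a colorblind-accessibility rule
MIN_FONT_SIZE_PT = 12

def satisfies_constraint_spec(design):
    """True when a candidate design violates none of the constraints."""
    return (design.get("foreground") not in FORBIDDEN_FOREGROUNDS
            and design.get("font_size_pt", 0) >= MIN_FONT_SIZE_PT)

# Many distinct designs conform; the spec is a boundary, not a template.
conforming = [
    {"foreground": "blue", "font_size_pt": 14},
    {"foreground": "black", "font_size_pt": 12},
]
```

The curated example suite the essay describes would live alongside such a checker: the checker is the statute, the examples are the case law.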
System-level invariants are cross-cutting constraints that apply regardless of which components are used or how they are assembled. Security requirements, privacy constraints, regulatory compliance obligations. A private key must never be stored in plaintext. Personal data must never be logged without consent. These are constraints that currently live in the heads of experienced developers, in post-mortem reports after failures, or in compliance documents that developers may or may not have read. Encoding them as active constraints on code generation converts known requirements from passive reference material into enforced properties of every generated system.
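Encoded as active checks, two of the invariants above might look roughly like this. The regexes are purely illustrative; real enforcement would rely on static analysis or policy engines rather than pattern matching.

```python
import re

# Hypothetical encodings of two system-level invariants as checks run
# over generated code before it is accepted.
INVARIANTS = [
    ("private key written to plaintext file",
     re.compile(r"open\(.*key.*[\"'],\s*[\"']w", re.IGNORECASE)),
    ("personal data passed to a logger",
     re.compile(r"\blog\S*\(.*\b(email|ssn|password)\b", re.IGNORECASE)),
]

def check_invariants(generated_code: str) -> list:
    """Return the names of every invariant the generated artifact violates."""
    return [name for name, pattern in INVARIANTS
            if pattern.search(generated_code)]
```

A generation pipeline that refuses to emit code until `check_invariants` returns an empty list has converted the compliance document into an enforced property.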
The organizational policy layer sits above all three categories. A company specifies which spec versions are approved for its projects, which system-level invariants apply to all generated code, and what confidence thresholds require human review. Solution builders are generally relieved of these decisions until a breaking change requires organizational action — exactly as COM Automation shielded VBA programmers from C++ implementation changes until an interface version required updating.
Reference Implementations and Conformance Tests
Every specification that can be implemented should be accompanied by a reference implementation and a conformance test suite. This is not a novel idea — the best standards bodies have always done this, and the practice reveals why it matters.
A reference implementation is an executable proof that the specification is internally consistent and complete enough to produce a working system. Any specification that cannot generate a reference implementation has failed its own standard. It contains degrees of freedom the spec author didn’t resolve — places where an LLM generating code would be sampling from a distribution rather than converging on a determinate result. The failure is diagnostic: it tells the spec author exactly what work remains.
A conformance test suite converts the qualitative question “does this implementation conform to the spec” into a quantitative and automatable one.5 An LLM-generated implementation can be compared against the reference implementation’s behavior across the conformance test suite, producing a conformance score rather than a binary judgment. Organizational policy can specify minimum conformance thresholds for different risk levels — 98% might be acceptable for a logging component, 100% required for authentication.
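The scoring mechanism is simple to sketch. In this illustration the reference is Python's built-in sort, and the hypothetical generated candidate carries a plausible silent bug (it drops duplicates) that the conformance suite converts into a number.

```python
def conformance_score(candidate, reference, suite):
    """Fraction of conformance-suite inputs on which the candidate's
    behavior agrees with the reference implementation's."""
    agreed = sum(candidate(case) == reference(case) for case in suite)
    return agreed / len(suite)

# Illustration: the reference is Python's built-in sort; the "generated"
# candidate has a plausible silent bug -- it drops duplicate elements.
reference = sorted
candidate = lambda items: sorted(set(items))

suite = [[3, 1, 2], [1, 1, 2], [], [5], [2, 2, 2]]
score = conformance_score(candidate, reference, suite)
```

Here the score is 0.6: the candidate passes every duplicate-free case and fails both cases with repeated elements. An organizational policy then compares that number against the threshold appropriate to the component's risk level.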
Together the spec, the reference implementation, and the conformance test suite constitute a complexity voucher of the highest order — simultaneously human-readable, machine-executable, and LLM-interpretable. The spec is what the solution builder references. The reference implementation is what the component builder produces. The conformance tests are what the CI/CD pipeline checks. Each serves a different consumer without contradiction.
There is an important quality gate implicit in this structure, applicable to behavioral specifications: if a behavioral spec cannot be used to generate a reference implementation and a conformance test suite, the specification is broken. The failure is not in the tooling — it is in the spec itself, which contains unresolved ambiguity or internal contradiction. This gate requires no external authority to apply. A spec that fails to bootstrap its own verification has failed by the very standard the framework is built on. Constraint specifications and system-level invariants require a different quality gate: their constraints must be testable, non-contradictory, and precise enough that a generated system can be checked against them. A constraint spec that any implementation trivially satisfies, or that no implementation could satisfy, has failed its own standard — differently but just as completely.
Design Patterns and the Shape of Interaction
Functional component specs handle the what-each-piece-does problem. But a complete system also requires specifications for how pieces relate to each other — and this is a categorically different kind of specification complexity.
The design patterns literature, beginning with the Gang of Four’s 1994 catalog, addressed exactly this.6 Observer, Factory, Command, Mediator, MVC — these are not functions but templates for component interaction. A vibe coder might successfully invoke certified encryption and certified database queries while assembling them in a pattern that has subtle timing or state assumptions violated, producing failures that are extremely difficult to diagnose precisely because each component in isolation appears correct.
The specification library should include pattern specifications alongside functional specs. These would specify constraints on interaction rather than deterministic behaviors — more like constraint specs than behavioral specs. A pattern spec for Observer would define what properties must hold between subject and observer regardless of how either is implemented, what failure modes arise from common violations, and what the observable behavioral differences are between correct and incorrect implementations.
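A pattern spec's interaction constraints can themselves be executable. This sketch, with hypothetical names throughout, checks two properties an Observer spec might require: every attached observer sees every state change exactly once, and in the order the changes occurred.

```python
class Subject:
    """Minimal subject side of the Observer pattern (illustrative)."""
    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, observer):
        self._observers.append(observer)

    def set_state(self, value):
        self._state = value
        for observer in list(self._observers):
            observer(value)   # the interaction the pattern spec constrains

def satisfies_observer_spec(subject_factory):
    """Check two properties a pattern spec might require of any
    implementation: exactly-once delivery per change, in change order."""
    seen = []
    subject = subject_factory()
    subject.attach(seen.append)
    for value in (1, 2, 3):
        subject.set_state(value)
    return seen == [1, 2, 3]
```

The check is implementation-agnostic: any subject, however it is built internally, either preserves the interaction properties or does not.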
Joel Spolsky, writing in 2002, formulated what he called the Law of Leaky Abstractions: all non-trivial abstractions leak.7 The complexity that an abstraction was designed to hide will surface at the worst possible moment, and when it does, the person using the abstraction must understand what is beneath it to handle the failure. This law doesn’t invalidate abstraction — it sets an honest ceiling on what abstraction can achieve and identifies a requirement for spec authorship. Every spec should document its known leakage points: where is this abstraction thin, what underlying complexity is most likely to surface, what should a solution builder know to diagnose failures at this level. The spec library thus doubles as a leakage diagnostic resource — not preventing leaks but ensuring that when they occur the knowledge needed to understand them is findable, legible, and attributed to people who can be consulted.

Spolsky’s law also sets an honest bound on the solution builder / component builder distinction. Abstraction can reduce how often low-level knowledge is required, but it cannot reduce that requirement to zero. The component builder’s role becomes rarer and more specialized at each turn of the spiral; it never disappears.
An Ecosystem, Not a System
The specification library is not a single system maintained by a single authority. It is an ecosystem — diverse, distributed, variably authoritative, and governed by reputation rather than by fiat.
This is how the software library ecosystem works, and for good reason. Some specifications will be formally certified through years of work in IETF, W3C, IEEE, or ISO. Others will be maintained by open source communities with strong reputations and transparent processes. Others will be organizational or product-specific, maintained privately. Others will be developed by individuals for specific purposes. The trust hierarchy is emergent — determined by track record, adoption, transparency of process, and response to discovered errors — rather than assigned by a central authority.
The interesting governance question is not who certifies but how you decide which certifications to trust. And we already know how to solve that problem because we solve it everywhere else. An IETF RFC carries more weight than a university research group’s spec, which carries more weight than an individual’s GitHub repository — not because a central authority enforces this hierarchy but because of accumulated reputation and transparent process. The same mechanisms that make you trust a software package with ten million downloads over one with twelve operate on specifications. The trust signal is emergent and self-correcting.
Reputation requires attribution, and the spec library should record it at a fine grain. A contribution to an existing spec — the insight that forced a revision — is as citable and as permanent as the original authorship. Your two lines might appear not in the initial spec but in the update that corrected it. That attribution belongs to you no less than the original author’s belongs to them.
The attribution system also creates a reputation economy for specification knowledge that does not currently exist anywhere in software development. A developer whose track record shows accurate, widely-adopted updates to specs across multiple domains — cryptographic protocols, accessibility constraints, network error handling — has demonstrated something rare and currently invisible: the ability to reason precisely about behavioral requirements across different problem spaces. There is no credential for this today, no reputation system that captures it, no way for an organization to find such a person or for such a person to demonstrate their value.
Attribution also expands who can earn deserved reputation. The current software ecosystem is deeply winner-take-all: the people who wrote the Linux kernel, created Python, or built React occupy a permanent reputation aristocracy that later contributors cannot challenge regardless of the quality of their subsequent work. The foundational components are taken; the opportunity to build them passed before most current developers were born. The skills required to conceive a foundational component are not the same skills required to find the subtle flaw in a widely-deployed specification that has been in production for a decade. Both are valuable. The current ecosystem rewards only the first. The spec library’s attribution model rewards both — and creates a path to deserved reputation for the analyst, the edge-case finder, and the careful reader who arrives after the foundations are already laid.
The more open a specification is, the more eyes see it, and the more quickly an error in one implementation triggers a spec update that benefits everyone. This is not a complete answer to the failure mode problem — a widely adopted incorrect spec can propagate errors at scale before the error is detected — but it is the same mitigation that open source software relies on, and it has proven reasonably effective there.
New Affordances
Every previous turn of the abstraction spiral reconstructed the same disciplines at a new level. But this turn has properties that previous turns lacked, and they produce genuinely new capabilities.
Specifications are human-readable in a way that code is not. A VBA programmer who encountered a bug in a COM component could not read the C++ source and understand it. A solution builder who encounters unexpected behavior in an LLM-generated system can read the spec that governed the generation and understand what was intended, what constraints applied, and where the leakage occurred. The abstraction is transparent to the people who use it in a way that compiled code never was.
Knowledge can be active rather than passive. Documentation has always been inert — it informed humans who then wrote code. Specifications in this framework are active inputs to the generation process. The distance between knowing a constraint and enforcing it collapses. A security requirement encoded in a system-level invariant spec is enforced in every generated system that touches its domain, automatically, without requiring any developer to remember it.
Tacit knowledge can be externalized at scale. The most dangerous knowledge in software development is the kind that lives only in experienced developers’ heads — the constraints they apply automatically without thinking about them. The framework creates institutional pressure to make that knowledge explicit and encoded, because an unapplied constraint is a visible gap in the spec rather than an invisible absence in someone’s mental model.
Intellectual contribution becomes attributable and durable. Code obscures authorship almost immediately — it gets refactored, dependencies change, languages evolve, systems are rewritten. A specification, being human-readable and implementation-independent, has a much longer half-life. The insight outlasts any particular artifact that embodies it. The people who contribute foundational insights to widely used specs will have their contributions embedded in the infrastructure of software development for generations — legible, citable, and attributed in a form that even non-engineers can find and read.
Specifications can represent legal and regulatory obligations as first-class constraints. Laws and regulations are already a form of specification — they define required behaviors, prohibited actions, and conditions of compliance. GDPR is a privacy constraint spec. HIPAA is a system-level invariant for healthcare data. PCI-DSS is a security behavioral specification for payment systems. Currently these exist as documents that developers may or may not have read, encoding requirements that LLMs have no systematic way to apply. Representing them as spec library entries — with the same versioning, conformance testing, and organizational policy machinery as any other spec — converts regulatory compliance from a manual audit process into an enforced property of generated systems. The longer-range possibility, which the human-readability of specs makes conceivable for the first time, is that legislatures might eventually adopt specification formats for original drafting: law written to be simultaneously human-readable and machine-applicable, without the translation layer that currently separates legal text from enforcement. That remains speculative, but the affordance that makes it possible — a rigorous, human-readable format that machines can also consume — is precisely what the spec library provides.
The feedback loop between practice and specification closes. Currently when a production system fails, the lesson might reach a blog post or a conference talk. Under this framework, the lesson has a natural home — an update to the relevant spec, reviewed and certified, automatically applied to all future generation using that spec. The distance between learned lesson and enforced constraint collapses dramatically.
The Spiral
In 1988, Barry Boehm published his spiral model of software development.6 But there is another spiral in the history of programming itself — one that turns more slowly and whose implications are still unfolding.
Each turn of this spiral introduced a new language that allowed problems to be expressed more naturally and at higher abstraction than what came before. Machine code to assembler. Assembler to high-level languages. High-level languages to object orientation. And now to natural language. At each turn, the same pattern repeated: initial euphoria, the same problems resurfacing at the new level of abstraction, and eventually the reconstruction of the same disciplines — libraries, tools, methodologies, role distinctions, institutional structures — at the new level.
And at each turn, the previous level didn’t disappear. It became the component layer for the new level, managed by specialists when needed. Assembly language didn’t vanish when C appeared. C didn’t vanish when Python appeared. Python will not vanish because natural language prompts can generate it. It will become what natural language compiles to, managed by component builders whose specialized skills become more critical precisely as they become less common.
The natural language turn is the one that was always the destination. Every previous turn was pointing here — toward the point where the specification and the program are the same artifact, where human intent and machine execution are separated by the minimum possible distance. We have not quite arrived. The gap is narrower than it has ever been, and it is still there.
What fills that gap — what has always filled it — is certified abstraction. Complexity vouchers that allow the person working at the new level to reference the fully elaborated work of the people working at the level below, without having to understand it, without having to reimplement it, and without having to trust that the LLM made good choices in the dark.
It is worth pausing to note that both Ashby and Boehm, whose work underlies this essay’s central argument, have faded from active reference in recent years in ways that are difficult to justify given the continued force of their ideas. Ashby’s Law of Requisite Variety is routinely rediscovered independently in adjacent fields — under different names, without citation, without the accumulated debt being acknowledged. Boehm’s spiral model was absorbed into software engineering curricula as a historical artifact rather than a living analytical tool, then largely displaced by Agile methodologies that inverted his intentions. Boehm was explicit that iterative development was meaningful only when driven by rigorous risk analysis at each cycle — the hard thinking had to happen, iteration was how you managed its consequences, not how you avoided it. In common practice, some Agile implementations devolved into precisely what Boehm warned against: iteration used as a substitute for understanding rather than a discipline for managing it. The result is what it must be under Ashby’s law — a random walk through implementation space that gives the appearance of forward motion while systematically avoiding the specification complexity the system actually requires. That the essay arguing for the return of specification discipline must also argue for the return of the thinkers who made that discipline rigorous is itself an instance of the problem it describes. The knowledge exists. It simply stopped being cited.
The First Step
It is worth being honest about where this framework will not help much. Exploratory work — prototyping a product idea, discovering requirements through iteration, building glue code between systems, refactoring existing code, writing tests for known behavior — does not require a spec library and would gain little from one. The framework is most valuable where the problem domain is already stable enough to admit specification: security primitives, network protocols, data serialization, accessibility constraints, regulatory compliance. In domains where the user does not yet know what they want, or where the value comes precisely from rapid informal iteration, certified abstraction is not the bottleneck. Informal iteration is not always evasion of thought; often it is how requirements are discovered. The spec library addresses one class of LLM coding failure — silent underdetermination in well-understood domains — not all of them.
None of this requires waiting for institutions to act, standards bodies to convene, or platform vendors to build new tooling.
There is in fact an existing infrastructure that demonstrates the problem and the solution simultaneously. The Model Context Protocol — MCP — allows LLMs to discover and invoke tools at runtime. An LLM’s ability to match a tool to a need is entirely dependent on the quality of the tool’s description. A poorly described tool is invisible to the LLM even when it is exactly what the task requires. A well-described tool is reliably found and correctly applied. This is the fake voucher problem in miniature and in real time: a terse description that does not cash out to a determinate specification of what the tool does, under what conditions, and with what constraints, produces the same silent probabilistic failure as an underspecified natural language prompt. MCP tool descriptions are proto-specs — they exist, they are structured, they have a recognised quality problem, and the spec library discipline applies to them directly. Improving the description of a tool you already use, applying the spec format to make its behavior, constraints, and failure modes explicit, and observing that the LLM uses it more reliably as a result: that is perhaps the most immediately actionable first step available, requiring no new infrastructure and producing measurable results within a single session.
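To make the contrast concrete, here is a hedged sketch of the same hypothetical tool described twice — once tersely, once as a proto-spec. The tool name, description text, and schema are invented for illustration; only the general shape (a name, a description, and a JSON Schema for inputs) follows the MCP tool-listing convention.

```python
# Two descriptions of the same hypothetical MCP tool. All names and
# fields here are illustrative, not taken from any real server.

# Terse description: the LLM must guess behavior, constraints, and
# failure modes -- the fake voucher problem in miniature.
terse = {
    "name": "search_text",
    "description": "Searches text.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string"},
            "text": {"type": "string"},
        },
        "required": ["pattern", "text"],
    },
}

# Proto-spec description: observable behavior, constraints, and failure
# modes made explicit, so the description cashes out to something
# determinate that an LLM (or a human) can rely on.
proto_spec = {
    "name": "search_text",
    "description": (
        "Returns the zero-based index of the first occurrence of `pattern` "
        "in `text`, or -1 if absent. Matching is exact and case-sensitive; "
        "no regex metacharacters are interpreted. An empty `pattern` "
        "matches at index 0. Inputs larger than 10 MB are rejected with "
        "an error rather than truncated."
    ),
    "inputSchema": terse["inputSchema"],
}

# Same interface, same schema -- the entire difference is specification.
assert terse["name"] == proto_spec["name"]
```

The point of the sketch is that nothing structural changes between the two versions: the spec library discipline, applied at this scale, is purely a matter of what the description commits to.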
The minimal viable version of this framework exists today. A markdown file containing a careful specification for a common function. A GitHub repository making it publicly accessible. A prompt that instructs an LLM to decompose a problem, check whether components match attached specifications, use those specs to constrain generation, and report what it couldn’t match.
That is step one. It is embarrassingly simple, which is either a sign that something important is being missed or a sign that the idea is correct. The history of consequential open source projects suggests that embarrassingly simple first steps are exactly how durable ecosystems begin. The first demonstration is simple; the ecosystem is not. Building a living library of trusted, versioned, composable specifications that practitioners actually use is a collective action problem requiring sustained authoring effort, governance, tooling, and incentive structures that do not yet exist. That work is harder than writing a markdown file and pushing it to GitHub. The point is that it starts there.
The first public spec could be written today. The first library could be created this week. The first demonstration — showing that a spec-constrained LLM prompt produces more deterministic, auditable, and trustworthy output than an unconstrained one — could be running before the end of the month.
And then, as it has always been, more would follow.
Notes and Open Questions
Notes on threads from the conversation that generated this essay, for possible future development:
[1] String search as a candidate first spec library entry: Knuth, Morris, and Pratt’s linear-time string matching algorithm provides a clean example of a commonly needed function with a precisely specifiable behavior, a reference implementation, and a well-defined conformance criterion.7 Word boundary detection (the “what is a word” problem) is a natural companion spec that would depend on it.
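A sketch of what the reference implementation attached to such an entry might look like — a standard KMP matcher, with the conformance criterion (agreement with a trusted oracle, here Python’s built-in `str.find`) expressed as an executable check. The spec decision that an empty pattern matches at index 0 is one of exactly the kind of behavioral choices the entry would have to pin down.

```python
def kmp_search(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Runs in O(n + m): the failure table is built once over the pattern,
    and the scan over the text never moves backwards.
    """
    if not pattern:
        return 0  # spec decision: empty pattern matches at index 0
    # Failure table: fail[i] = length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing the table to avoid re-examining characters.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - len(pattern) + 1
    return -1

# Conformance criterion: agree with Python's str.find on every input.
for p, t in [("aba", "ababcabab"), ("aaa", "aa"), ("", "xyz"), ("cab", "ababcabab")]:
    assert kmp_search(p, t) == t.find(p)
```

The conformance loop is the part that belongs to the spec entry, not the implementation: any candidate implementation, in any language, would be certified against the same oracle.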
[2] The formal verification community and its relationship to this framework. The framework is complementary rather than competitive — more specs and less code to formally verify is likely welcome — but the relationship deserves explicit treatment.
[3] The spec format question: should there be a standard structure for spec library entries? Metadata (title, author, date, version, review status), behavioral description, constraint layer, conformance tests, reference implementation, leakage documentation, known limitations. Something like IETF RFC format adapted for this purpose.
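One possible shape for such an entry, sketched as a data structure so the fields are explicit. Every field name here is a placeholder for discussion, not a proposed standard.

```python
from dataclasses import dataclass, field

@dataclass
class SpecEntry:
    """Illustrative structure for a spec library entry.

    Mirrors the sections listed in note [3]; all field names are
    placeholders, not a settled format.
    """
    # Metadata
    title: str
    author: str
    date: str               # ISO 8601
    version: str            # e.g. "1.0.0"
    review_status: str      # e.g. "draft", "reviewed", "certified"
    # Substance
    behavioral_description: str    # what the component must observably do
    constraints: list[str]         # invariants bounding any implementation
    conformance_tests: list[str]   # references to executable test cases
    reference_implementation: str  # pointer to one known-good implementation
    leakage_notes: list[str] = field(default_factory=list)      # where the abstraction leaks
    known_limitations: list[str] = field(default_factory=list)

# A hypothetical entry for the string-search spec of note [1].
entry = SpecEntry(
    title="String search (first occurrence)",
    author="example@example.org",
    date="2026-03-01",
    version="1.0.0",
    review_status="draft",
    behavioral_description="Return the index of the first occurrence of a pattern, or -1.",
    constraints=["O(n + m) time", "no regex interpretation", "empty pattern matches at 0"],
    conformance_tests=["tests/string_search_conformance.py"],
    reference_implementation="impl/kmp.py",
)
```

Whether the format is a dataclass, a YAML schema, or an RFC-style document matters less than that the field set is fixed, so that a missing constraint layer is a visible gap rather than an invisible absence.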
[4] Spec identity and naming requires a three-tier architecture, analogous to Zooko’s triangle extended with a fourth edge for persistence. A cryptographic hash identifies one specific version of one specific document — this exact text, these exact bytes — verifiable without trusting any authority. A GUID or IPFS address identifies a spec or family of related specs sharing the same interface and observable effects, persisting across editorial improvements until a breaking interface change requires a new identifier. A tag URI (RFC 4151) provides persistent, human-readable, authority-derived identity for the entire lineage across all versions and interface revisions — the name a human cites, an organisation governs by, and a grandchild finds decades later. The three tiers map to a directory structure: the tag URI is the root directory, GUID subdirectories represent interface versions, hashed files within each subdirectory are specific editorial versions. One tag URI contains a history of GUIDs; each GUID contains a history of hashes. A complete reference carries all three: tag URI for human citation and governance, GUID for compatibility checking, hash for verification.
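The three tiers can be sketched concretely. The tag authority, date, and directory layout below are invented for illustration; only the mechanisms (SHA-256 for content identity, a GUID for interface identity, an RFC 4151 tag URI for lineage identity) are assumed from the note above.

```python
import hashlib
import uuid

spec_text = b"String search spec, v1.0.0: return first occurrence or -1."

# Tier 3: cryptographic hash -- identifies these exact bytes, verifiable
# by anyone without trusting any authority.
content_hash = hashlib.sha256(spec_text).hexdigest()

# Tier 2: GUID -- identifies the interface generation. Stable across
# editorial revisions; reminted only on a breaking interface change.
interface_id = str(uuid.uuid4())

# Tier 1: tag URI (RFC 4151) -- persistent, human-readable,
# authority-derived name for the whole lineage. The authority and
# date here are hypothetical.
tag_uri = "tag:specs.example.org,2026:string-search"

# The directory mapping described in note [4]: tag URI as root, GUIDs
# as interface-version subdirectories, hashes as files within them.
path = f"{tag_uri}/{interface_id}/{content_hash}"

# A complete reference carries all three tiers.
reference = {"tag": tag_uri, "interface": interface_id, "hash": content_hash}
```

Verification then runs bottom-up: the hash proves the bytes, the GUID tells a tool whether two editorial versions are interchangeable, and the tag URI is what humans and governance policies actually name.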
Colophon
This essay was developed through an extended dialogue between the author and Claude Sonnet 4.6 (Anthropic), conducted in March 2026. The author provided the conceptual framework, the historical account, the architectural insights drawn from direct experience, and the intellectual direction throughout. Claude contributed elaboration, structural organisation, connection-making across sources, and occasional extension of ideas. The text was reviewed, revised, and approved by the author.
Model: Claude Sonnet 4.6 (claude-sonnet-4-6). Interface: claude.ai. Session date: March 2026. The dialogue that generated this essay is available as part of the conversation history in the author’s Claude account.
A draft was subsequently reviewed by Google Gemini and OpenAI ChatGPT, whose critical responses sharpened several arguments. The author remains responsible for all claims.
There is a pleasant irony in the colophon itself. This essay argues that intellectual contributions should be attributed explicitly, that the provenance of ideas matters, and that knowledge which is made visible and citable outlasts knowledge that is not. The colophon is an instance of that argument — a record of how this particular artifact came to exist, made available to whoever cares to read it.
Footnotes
1 W. Ross Ashby, An Introduction to Cybernetics, Chapman & Hall, London, 1956. The Law of Requisite Variety is developed in Part Three, which Ashby identifies as the “central theme” of cybernetics. The law is stated formally in Chapter 11. A digitized edition is freely available via the Internet Archive: archive.org/details/introductiontocy00ashb
2 Grace Hopper’s work on COBOL and her concept of the executable specification are documented in: Jean E. Sammet, “The Early History of COBOL,” ACM SIGPLAN Notices, Vol. 13, No. 8, August 1978, pp. 121–161. Hopper’s own account of her motivations is given in numerous interviews; see also: Kathleen Broome Williams, Grace Hopper: Admiral of the Cyber Sea, Naval Institute Press, 2004.
3 The RSA cryptographic standard is specified in: “PKCS #1: RSA Cryptography Specifications Version 2.2,” RFC 8017, Internet Engineering Task Force, November 2016. Available at: tools.ietf.org/html/rfc8017. RFC 8017 obsoletes RFC 3447 (2003) and RFC 2437 (1998), illustrating precisely the versioned, immutable specification model described in this essay.
4 Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1994. The four authors became known collectively as the “Gang of Four,” and the book’s catalog of 23 patterns remains the foundational reference for object-oriented design patterns.
5 Joel Spolsky, “The Law of Leaky Abstractions,” Joel on Software, November 11, 2002. Available at: joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/. Spolsky states the law as: “All non-trivial abstractions, to some degree, are leaky.” The essay remains one of the most widely cited in software engineering practice.
6 Barry W. Boehm, “A Spiral Model of Software Development and Enhancement,” IEEE Computer, Vol. 21, No. 5, May 1988, pp. 61–72. DOI: 10.1109/2.59. The spiral described in this essay is distinct from Boehm’s: his spiral addresses iterative risk management within a single development project; ours describes the historical succession of abstraction levels across the entire history of programming. The resonance of the metaphor is intentional. It is worth noting that Boehm himself warned explicitly against what he called “hazardous spiral look-alikes” — processes that adopted the iterative form of the spiral while discarding the risk analysis that gave it meaning. Common Agile practice, in which iterative delivery substitutes for upfront specification rather than managing the consequences of careful analysis, falls squarely within Boehm’s category of dangerous impostors. The random walk through implementation space that results is not a failure of iteration as a technique; it is a failure to apply the specification discipline that makes iteration converge rather than merely explore.
7 Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt, “Fast Pattern Matching in Strings,” SIAM Journal on Computing, Vol. 6, No. 2, June 1977, pp. 323–350. The KMP algorithm achieves O(n+m) string matching versus the O(nm) naive approach — a difference that becomes significant at scale and that a vibe-coded implementation would silently get wrong. Knuth’s planned Volume 5 of The Art of Computer Programming (Syntactic Algorithms) will cover lexical scanning and word boundary problems more generally; it remains in preparation as of 2026.
8 The ‘tag’ URI scheme is specified in: Tim Kindberg and Sandro Hawke, “The ‘tag’ URI Scheme,” RFC 4151, Internet Engineering Task Force, October 2005. Available at: tools.ietf.org/html/rfc4151. A tag URI mints a globally unique, persistent identifier from a domain name or email address controlled by the minting authority at a specific date, plus a locally assigned string. Unlike URLs, tag URIs make no claim about the location of the identified resource, providing persistence independent of hosting infrastructure.
9 Temple F. Smith and Michael S. Waterman, “Identification of Common Molecular Subsequences,” Journal of Molecular Biology, Vol. 147, No. 1, 1981, pp. 195–197. The Smith–Waterman algorithm performs local sequence alignment across nucleic acid and protein sequences using dynamic programming, achieving guaranteed optimal local alignment. It was subsequently extended into the BLAST family of tools (Altschul et al., “Basic Local Alignment Search Tool,” Journal of Molecular Biology, Vol. 215, 1990, pp. 403–410), which trades exhaustive optimality for the speed required to search large sequence databases. The lineage from KMP to Smith–Waterman to BLAST is a demonstration in the biological sciences of exactly the domain-transcendence property the essay attributes to well-abstracted specifications.

