What Should a Spec Actually Contain? My Exploration of Spec-Driven Development
2026-04-04
After writing about monorepos and AI-first development, I kept coming back to one topic: specs. In that post I mentioned that specs benefit from living alongside the code in a monorepo. But the harder question was staring at me: what should a spec actually contain? And does anyone agree on the answer?
I spent time exploring spec-driven development, looking at real-world specs, and examining what standards exist. This post is what I found. It's not a definitive guide - it's a snapshot of my understanding as I worked through the problem.
The idea is straightforward. Before you write code, you write a spec that describes what the system should do. The spec becomes the source of truth. Code gets written to satisfy it, tests verify it, and any deviation between spec and implementation is a bug.
This is not new. OpenAPI lets you define a REST API contract before writing a single handler. GraphQL has its Schema Definition Language. JSON Schema describes data shapes. These are all forms of spec-driven development at the technical layer.
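To make that concrete, here is a minimal sketch of what a JSON-Schema-style data contract looks like. The `order` shape is hypothetical, and rather than pulling in a real validator library, the check is hand-rolled for just the handful of keywords the example uses:

```python
# A hypothetical JSON Schema fragment describing an order payload.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["order_id", "items", "total_cents"],
    "properties": {
        "order_id": {"type": "string"},
        "items": {"type": "array", "minItems": 1},
        "total_cents": {"type": "integer", "minimum": 0},
    },
}

def conforms(payload) -> bool:
    """Hand-rolled check for the subset of JSON Schema keywords used above."""
    type_map = {"string": str, "array": list, "integer": int}
    if not isinstance(payload, dict):
        return False
    if any(key not in payload for key in ORDER_SCHEMA["required"]):
        return False
    for name, rule in ORDER_SCHEMA["properties"].items():
        if name not in payload:
            continue
        value = payload[name]
        if not isinstance(value, type_map[rule["type"]]):
            return False
        if "minItems" in rule and len(value) < rule["minItems"]:
            return False
        if "minimum" in rule and value < rule["minimum"]:
            return False
    return True

print(conforms({"order_id": "A-1", "items": ["book"], "total_cents": 1999}))  # True
print(conforms({"order_id": "A-1", "items": [], "total_cents": 1999}))        # False: empty cart
```

The point is that the contract exists before any handler does: an implementation either satisfies `ORDER_SCHEMA` or it's wrong.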
What makes the approach more interesting now is AI. When you give an agent a well-written spec, it has a clear target. It can generate the implementation, produce tests that verify conformance, and flag when its output drifts from what you described. Without a spec, the agent is guessing at your intent based on a brief prompt, and you are left reading code to figure out whether it guessed right.
So far I was sold on the principle. Then I looked at what specs actually look like in practice.
A common pattern I found: a spec for a UI component runs 200+ lines with detailed requirements, data models, acceptance criteria, and implementation instructions including specific CSS techniques and code patterns to follow.
These specs are solid work. But examining them raised a question I hadn't considered before: who is this for?
The behavioral part - what the component does, how it looks on hover, what the tooltips contain, what the data model looks like - is useful for any stakeholder. A product owner could read it and say "yes, that's what I want." A developer or agent could use it as a clear target.
The implementation part - which CSS techniques to use for specific visual effects, which existing component to use as a reference, which formatting library to call - is useful only for whoever is building it right now. A product owner doesn't care about CSS clip-paths. And six months from now, those implementation hints might not even be accurate anymore.
This made me realize that many specs mix two things that serve different audiences: the behavioral contract (what) and the implementation guide (how). The first should be lasting and stakeholder-readable. The second should be treated as disposable context for the current task.
At the technical contract level, the standards are solid. OpenAPI for REST, AsyncAPI for event-driven systems, GraphQL SDL for query APIs, JSON Schema for data validation. If you need to describe a single technical layer, you have good options.
But when I think about what it takes to spec out a full feature - "users need to be able to place an order" - none of these standards cover the whole picture. That feature touches the UI, the API, payment processing, inventory checks, email notifications, authentication, error handling, and storage. Each of those might have its own standard or contract format, but nothing ties them together into one coherent spec for the feature as a whole.
In my experience, teams fill this gap with whatever works for them: user stories with acceptance criteria, design docs, ADRs, sequence diagrams, or some combination of all of these. I'm not claiming I've seen everything the industry has to offer here, but I haven't come across a widely adopted standard that answers the question "how do I spec a complete user-facing feature from end to end?" If it exists, it hasn't reached mainstream adoption yet.
Based on this exploration, I landed on three things a good feature-level spec needs.
First, the behavioral contract. Given these preconditions, when this action happens, then these outcomes occur. This is the core and it should be technology-agnostic. It shouldn't mention React or PostgreSQL. It should describe what the system does from the user's and the business's perspective. "When a user submits an order with valid payment, the order is created, inventory is reserved, and a confirmation email is sent within 30 seconds." That's a behavioral contract.
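A behavioral contract like that can be expressed directly as an executable check. This is a sketch against a hypothetical in-memory stand-in for the real services; the class and method names are illustrative, not from any real system:

```python
from dataclasses import dataclass, field

@dataclass
class FakeShop:
    """Hypothetical in-memory stand-in for the order, inventory, and email systems."""
    inventory: dict = field(default_factory=lambda: {"book": 3})
    orders: list = field(default_factory=list)
    emails: list = field(default_factory=list)

    def submit_order(self, sku: str, payment_ok: bool) -> bool:
        # Given valid payment and available stock, when an order is submitted...
        if not payment_ok or self.inventory.get(sku, 0) < 1:
            return False
        self.inventory[sku] -= 1               # ...then inventory is reserved,
        self.orders.append(sku)                # the order is created,
        self.emails.append(f"confirm:{sku}")   # and a confirmation email is queued.
        return True

shop = FakeShop()
assert shop.submit_order("book", payment_ok=True)
assert shop.orders == ["book"]
assert shop.inventory["book"] == 2
assert shop.emails == ["confirm:book"]
```

Note that nothing here mentions React or PostgreSQL: the contract constrains observable outcomes, not technology.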
Second, the boundary definitions. Which systems are involved, what each one is responsible for, and what the contracts between them look like. This is where you reference your OpenAPI spec for the API layer, your event schema for async communication, and your auth model. The feature spec references these technical contracts but doesn't duplicate them.
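To illustrate referencing rather than duplicating, here is a hypothetical boundary map for the "place an order" feature. All system names and contract paths are made up:

```python
# Hypothetical boundary map: each system names its responsibility and the
# technical contract that governs it. The feature spec points at these
# contracts; it never restates their schemas.
BOUNDARIES = {
    "web_checkout": {"owns": "cart UI and submit flow",
                     "contract": "specs/openapi/orders.yaml"},
    "payments":     {"owns": "charge authorization",
                     "contract": "specs/openapi/payments.yaml"},
    "fulfillment":  {"owns": "inventory reservation",
                     "contract": "specs/events/inventory-reserved.json"},
}

for system, info in BOUNDARIES.items():
    print(f"{system}: {info['owns']} -> {info['contract']}")
```

If the orders API changes, only `orders.yaml` changes; every feature spec that references it stays valid.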
Third, the error and edge case catalog. What can go wrong and what should happen when it does. What happens when payment fails? When inventory changed between cart and checkout? When the session expires mid-flow? When the user double-clicks submit? This is the part teams skip most often, and it's the part that matters most. Models will happily generate a perfect happy path and invent wildly inconsistent error handling if you don't specify it.
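An edge-case catalog can be as simple as a table pairing each failure condition with its required outcome. This sketch is illustrative; the condition names and outcomes are hypothetical:

```python
# Hypothetical edge-case catalog for the checkout flow: each known failure
# mode maps to the outcome the spec requires.
EDGE_CASES = {
    "payment_declined":  "show retry prompt; do not create order",
    "inventory_changed": "re-price cart; ask user to confirm",
    "session_expired":   "preserve cart; redirect to login",
    "duplicate_submit":  "deduplicate by idempotency key; one order only",
}

def required_outcome(condition: str) -> str:
    # Falling through to the catch-all is itself a spec smell: every known
    # failure mode should have an explicit row in the catalog.
    return EDGE_CASES.get(condition, "UNSPECIFIED - add to catalog")

print(required_outcome("duplicate_submit"))
print(required_outcome("warehouse_on_fire"))  # surfaces the gap explicitly
```

The value is less in the data structure than in the discipline: an agent handed this table cannot invent its own error handling without contradicting a named row.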
What the spec should not define is how to build it. File structure, framework choice, specific libraries, database table layouts: those are implementation decisions that belong somewhere else.
This is where my exploration took an unexpected turn. If the implementation guidance doesn't belong in the spec, where does it go?
The answer I landed on: it belongs in the agent's own configuration. Project-level files like CLAUDE.md carry the conventions. Rules files carry the constraints. Skills carry task-specific patterns. The codebase itself carries reference implementations. The agent doesn't need a spec to tell it "use library X for charts" - that should be in the project's CLAUDE.md. The agent doesn't need a spec to tell it "follow the existing chart component pattern" - it should discover that by reading the existing code, guided by a skill that says "when building a new component, look at existing components of the same type first."
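As a concrete (and entirely hypothetical) illustration, the kind of guidance I would pull out of a per-feature spec and into a project-level CLAUDE.md might look like this; the paths and rules are made up:

```markdown
<!-- Hypothetical CLAUDE.md excerpt; library and path names are illustrative. -->
## Conventions
- Charts: use the project's charting wrapper in `src/components/charts/`,
  not a new library.
- New components: read an existing component of the same type first and
  follow its structure.
- Formatting: dates and currency go through the shared helpers in
  `src/lib/format.ts`.
```

Written once here, these rules apply to every feature, instead of being restated in every spec.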
This realization shifted my entire perspective. If most of the "how" knowledge lives in agent configuration and the codebase, and the behavioral spec should be lightweight (intent, boundaries, edge cases), then the heavy spec documents that spec-driven development (SDD) tools encourage might be solving a problem that doesn't need to exist.
The more I looked at SDD workflows, the more familiar the shape felt. Kiro walks you through requirements -> design -> tasks -> implementation. spec-kit follows specify -> plan -> tasks. OpenSpec goes propose -> apply -> archive. Each phase produces documents that feed the next phase.
That sequence should ring a bell. It's the same shape as waterfall: gather requirements -> design -> implement -> test -> deploy. The tools are softer about it than classic waterfall - you can go back and edit artifacts, iteration is allowed. But the default workflow still pushes you through a linear pipeline of document production before any code is written.
Martin Fowler raised this concern directly: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. The industry spent two decades moving from waterfall toward iterative, feedback-driven approaches. SDD is reintroducing a document-heavy, plan-everything-upfront workflow, just with AI generating the code instead of developers. The medium changed but the process shape didn't.
To be fair, the waterfall comparison has limits. SDD tools do allow iteration, and writing requirements before coding is not waterfall - it's just planning. Agile teams write user stories before sprinting. TDD practitioners write tests before code. The question is whether the planning artifact is rigid or adaptable, and whether the feedback loop between planning and implementation is tight or broken. But when a tool generates 4 user stories with 16 acceptance criteria for a bug fix - as Birgitta Boeckeler from Thoughtworks experienced with Kiro - it is hard not to see the same process bloat that made waterfall collapse under its own weight.
There is another angle that made me uneasy. AI removed the production cost of documentation. Writing specs, tests, and design docs used to be manual, slow, and often deprioritized under delivery pressure. Now an agent can generate all of this in seconds. That's genuinely useful. But when production becomes free, people tend to overproduce.
This is a known pattern. When storage became cheap, people stopped deleting files. When bandwidth became cheap, websites became bloated. When spec production became cheap via AI, developers generate more spec artifacts than anyone can meaningfully review. The cost signal that used to constrain documentation volume - the effort of writing it - has been removed, and nothing has replaced it. Previously, writing a spec by hand was a natural filter: you only did it when the feature was complex enough to justify the effort. Now that the effort is nearly zero, the filter is gone.
There are developers for whom SDD is genuinely filling a gap - people who always knew they should plan more carefully but couldn't justify the time. For them, the current tools are a real step forward. But there's a difference between "I can now produce specs when they're needed" and "I should produce specs for everything." The risk is when the former becomes the latter, and maximum ceremony becomes the default regardless of task size.
I came away from this exploration with a few conclusions.
The principle of thinking before building is valuable. Writing down behavioral intent and edge cases before coding genuinely helps, especially for complex features. This isn't controversial and I don't think anyone disagrees.
The right scope for a spec is a user-facing capability, not a technical layer. "Place an order" is a spec. "The orders API" is not a spec, it's a technical contract that serves multiple specs.
The spec should stay lightweight. Behavioral intent, boundary definitions, and edge cases. Not implementation instructions. Not 200 lines of markdown.
The implementation knowledge that makes agents effective should live where agents naturally look: project memory, rules, skills, and the codebase. Not in per-feature spec files that create a maintenance burden.
And the real skill we need to develop as an industry is not "how to write more specs" but "how to know when a spec adds value and when it's overhead." That calibration - matching the level of ceremony to the complexity of the task - is what current SDD tools get wrong. They default to maximum ceremony regardless of context, and I think that will age the same way the early microservices enthusiasm did. I'll dig into that in a follow-up post.