Spec-Driven Development Is Solving the Wrong Problem
2026-04-14
What Should a Spec Actually Contain? My Exploration of Spec-Driven Development
2026-04-04
Monorepos, Microservices, and the Architecture Pendulum
2026-04-03
Spec-driven development is having its moment. Tools like Kiro, GitHub's spec-kit, and OpenSpec all promise the same thing: write structured specs before coding, let AI agents implement against them, and finally bring order to the chaos of vibe coding. The pitch is compelling. The tooling is growing fast. OpenSpec has nearly 40k GitHub stars.
But the more I look at these tools, the more I think they're repeating a pattern our industry keeps falling into: reaching for more structure and more artifacts when the real problem is putting knowledge in the right place.
Birgitta Böckeler from Thoughtworks tested Kiro on a small bug fix. The tool generated 4 user stories with 16 acceptance criteria for something that should have been a quick fix. She described it as using a sledgehammer to crack a nut. With spec-kit she found the opposite problem: too many verbose markdown files to review, repetitive with each other and with existing code. Her conclusion was direct - she'd rather review code than review all those markdown files.
This is the core tension. SDD tools create a parallel documentation layer that needs to stay in sync with the code. We have seen this fail before with Javadoc that drifted from implementations, Swagger specs that diverged from actual API behavior, and architecture diagrams that became snapshots of how things were designed rather than how they work. Adding more markdown files does not solve the "documentation goes stale" problem. It multiplies it.
Martin Fowler himself raised a pointed concern: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. Requirements change. Understanding deepens. What seemed right in the spec often needs adjustment once you see the code running.
My issue with SDD is not the principle of thinking before building. That part is valuable and I practice it myself. My issue is with where SDD tools put the knowledge.
AI agents already have natural places to get project context. In my experience, the question is not "should I write specs?" but "where should each piece of knowledge live so the agent finds it without extra ceremony?"
Project-level agent files like CLAUDE.md or AGENTS.md are always loaded at the start of every session. This is where project-wide conventions belong: what libraries you use, how components are structured, what patterns to follow, how tests are organized. The agent reads this once and applies it to every task. Zero per-task overhead.
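As a concrete illustration, such a project-level file might look like this. This is a hypothetical sketch: the chart-library convention echoes an example used later in these posts, but the specific paths and the state-management entry are invented for illustration.

```markdown
# Project conventions (CLAUDE.md)

## Stack
- Charts: MUI X Charts Pro; do not add other chart libraries
- State: Zustand for client state; avoid introducing Redux in new code

## Structure
- Components live in src/components/<Feature>/, one folder per feature
- Shared types live in packages/shared/types

## Testing
- Every component gets a *.test.tsx file next to it
- Run the suite with `npm test` before finishing a task
```

The agent loads this once per session, so none of it has to be repeated in a per-task spec.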
Rules files shape how the agent works. Always write tests. Never use the any type in TypeScript. Use European number formatting. These are behavioral constraints, lighter than project context but always active.
Skills are task-specific playbooks that get loaded only when relevant. A skill for "building a new chart component" or "writing an API endpoint" encodes your team's patterns for that specific type of work. The agent reads the skill when it needs it, not on every session. This is an underexplored pattern that I think will grow.
The codebase itself is the most honest source of truth. Existing components, tests, naming conventions, and patterns. A good agent can infer how things are built by reading what already exists. Your best chart component is the spec for how chart components are built in your project.
Issue trackers and design tools hold the per-task intent. A Jira ticket describes what needs to happen. A Figma file shows what it should look like. Both are accessible to agents via MCP without duplicating their content into markdown.
Technical contracts like OpenAPI specs, GraphQL schemas, and TypeScript interfaces are already machine-readable and already live in the codebase. They don't need a prose summary in a spec file.
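To make that concrete, here is a minimal sketch of a TypeScript interface acting as a contract. The type names and fields are illustrative assumptions, not from any real project: the point is that the type itself is machine-readable, and a runtime guard can enforce it at trust boundaries without any prose duplicate.

```typescript
// Hypothetical order contract: the interface itself is the spec.
// Any consumer that compiles against it conforms by construction.
interface OrderResponse {
  id: string;
  total: number;
  status: "created" | "paid" | "failed";
}

// A runtime guard for data crossing a trust boundary (e.g. a fetched API response),
// where the compiler can no longer vouch for the shape.
function isOrderResponse(value: unknown): value is OrderResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.total === "number" &&
    (v.status === "created" || v.status === "paid" || v.status === "failed")
  );
}
```

An agent reading this interface knows everything a spec paragraph about the response shape would have told it, and the knowledge cannot drift from the code.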
When you add all of this up, the agent has: project context from CLAUDE.md, behavioral constraints from rules, task-specific patterns from skills, reference implementations from the codebase, intent from the issue tracker, visuals from Figma, and contracts from typed schemas. A spec markdown file is only needed when none of these sources adequately capture the complexity of what you're building.
I am not arguing against specs entirely. There are situations where writing a dedicated spec document is the right call:
Large, cross-cutting features where multiple stakeholders need to align on behavior and edge cases before anyone writes code. A "place an order" workflow that touches UI, API, payments, inventory, and notifications genuinely benefits from a written behavioral contract.
Features with non-obvious edge cases that the agent would otherwise invent inconsistently. What happens when payment fails mid-checkout? When inventory changed between cart and submission? When the user double-clicks submit? These details need to be written down somewhere, and a spec is a reasonable place.
New team patterns that don't have a reference implementation yet. If you're building your first chart component, a spec helps the agent understand what you want. Once that first component exists, it becomes the reference and future components need less specification.
But bug fixes? Small features? Refactors where the existing code tells the story? A spec is overhead that slows you down without adding value. The agent has your rules, your patterns, and your codebase. Let it work.
My vision is that specs should be opt-in, not the default workflow. Most daily development work should be covered by a well-maintained set of agent rules, skills, and a clean codebase with good patterns. You reach for a spec document when the complexity of a feature exceeds what those sources can communicate.
The decision boundary is simple: if you can describe the change in a ticket and the agent can implement it correctly using project rules and existing patterns, you don't need a spec. If the change involves multiple systems, stakeholder alignment, or non-obvious edge cases, write one. But keep it lightweight: behavioral intent and edge cases, not implementation instructions.
This also means the effort should go into maintaining your agent configuration (CLAUDE.md, rules, skills) rather than maintaining per-feature spec files. A well-maintained CLAUDE.md with clear conventions and reference implementations will produce better results across hundreds of tasks than a detailed spec will produce for one.
Software engineering has a recurring habit of reaching for new artifacts when things feel chaotic. The instinct is understandable, but the answer is rarely "more documents." It's usually "better-placed knowledge." SDD tools formalize that instinct into a workflow, and for most daily work the overhead is not worth it.
The teams that will navigate this well are the ones who figure out early that agent context should live where agents naturally look: in project memory, rules, skills, and the codebase itself. Not in a spec folder that requires its own maintenance lifecycle.
After writing about monorepos and AI-first development, I kept coming back to one topic: specs. In that post I mentioned that specs benefit from living alongside the code in a monorepo. But the harder question was staring at me: what should a spec actually contain? And does anyone agree on the answer?
I spent time exploring spec-driven development, looking at real-world specs, and examining what standards exist. This post is what I found. It's not a definitive guide - it's a snapshot of my understanding as I worked through the problem.
The idea is straightforward. Before you write code, you write a spec that describes what the system should do. The spec becomes the source of truth. Code gets written to satisfy it, tests verify it, and any deviation between spec and implementation is a bug.
This is not new. OpenAPI lets you define a REST API contract before writing a single handler. GraphQL has its Schema Definition Language. JSON Schema describes data shapes. These are all forms of spec-driven development at the technical layer.
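For example, a JSON Schema describing a data shape is a spec in exactly this sense. The schema below is an illustrative sketch for a hypothetical order payload, not taken from any real API:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Order",
  "type": "object",
  "required": ["id", "items"],
  "properties": {
    "id": { "type": "string" },
    "items": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["sku", "quantity"],
        "properties": {
          "sku": { "type": "string" },
          "quantity": { "type": "integer", "minimum": 1 }
        }
      }
    }
  }
}
```

Code that produces or consumes an order can be validated against this, and any deviation is mechanically detectable.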
What makes the approach more interesting now is AI. When you give an agent a well-written spec, it has a clear target. It can generate the implementation, produce tests that verify conformance, and flag when its output drifts from what you described. Without a spec, the agent is guessing at your intent based on a brief prompt, and you are left reading code to figure out whether it guessed right.
So far I was sold on the principle. Then I started looking at how specs actually look in practice.
A common pattern I found: a spec for a UI component runs 200+ lines with detailed requirements, data models, acceptance criteria, and implementation instructions including specific CSS techniques and code patterns to follow.
These specs are solid work. But examining them raised a question I hadn't considered before: who is this for?
The behavioral part - what the component does, how it looks on hover, what the tooltips contain, what the data model looks like - is useful for any stakeholder. A product owner could read it and say "yes, that's what I want." A developer or agent could use it as a clear target.
The implementation part - which CSS techniques to use for specific visual effects, which existing component to use as a reference, which formatting library to call - is useful only for whoever is building it right now. A product owner doesn't care about CSS clip-paths. And six months from now, those implementation hints might not even be accurate anymore.
This made me realize that many specs mix two things that serve different audiences: the behavioral contract (what) and the implementation guide (how). The first should be lasting and stakeholder-readable. The second should be treated as disposable context for the current task.
At the technical contract level, the standards are solid. OpenAPI for REST, AsyncAPI for event-driven systems, GraphQL SDL for query APIs, JSON Schema for data validation. If you need to describe a single technical layer, you have good options.
But when I think about what it takes to spec out a full feature - "users need to be able to place an order" - none of these standards cover the whole picture. That feature touches the UI, the API, payment processing, inventory checks, email notifications, authentication, error handling, and storage. Each of those might have its own standard or contract format, but nothing ties them together into one coherent spec for the feature as a whole.
In my experience, teams fill this gap with whatever works for them: user stories with acceptance criteria, design docs, ADRs, sequence diagrams, or some combination of all of these. I'm not claiming I've seen everything the industry has to offer here, but I haven't come across a widely adopted standard that answers the question "how do I spec a complete user-facing feature from end to end?" If it exists, it hasn't reached mainstream adoption yet.
Based on this exploration, I landed on three things a good feature-level spec needs.
First, the behavioral contract. Given these preconditions, when this action happens, then these outcomes occur. This is the core and it should be technology-agnostic. It shouldn't mention React or PostgreSQL. It should describe what the system does from the user's and the business's perspective. "When a user submits an order with valid payment, the order is created, inventory is reserved, and a confirmation email is sent within 30 seconds." That's a behavioral contract.
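A contract like that translates naturally into Given/When/Then form. Here is a sketch in Gherkin, one common notation for this, though nothing about the approach requires it:

```gherkin
Feature: Place an order

  Scenario: Successful order with valid payment
    Given a signed-in user with a non-empty cart
    And the payment method on file is valid
    When the user submits the order
    Then an order is created with status "created"
    And the cart items are reserved in inventory
    And a confirmation email is sent within 30 seconds
```

Note that nothing in it names a framework, a database, or a library: it stays technology-agnostic.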
Second, the boundary definitions. Which systems are involved, what each one is responsible for, and what the contracts between them look like. This is where you reference your OpenAPI spec for the API layer, your event schema for async communication, and your auth model. The feature spec references these technical contracts but doesn't duplicate them.
Third, the error and edge case catalog. What can go wrong and what should happen when it does. What happens when payment fails? When inventory changed between cart and checkout? When the session expires mid-flow? When the user double-clicks submit? This is the part teams skip most often, and it's the part that matters most. Models will happily generate a perfect happy path and invent wildly inconsistent error handling if you don't specify it.
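The catalog can be written in the same Given/When/Then style as the rest of the spec. A hypothetical sketch follows; the specific recovery behaviors here are my assumptions, chosen to show the shape, not prescriptions from any real system:

```gherkin
Feature: Place an order - failure handling

  Scenario: Payment fails mid-checkout
    Given a submitted order awaiting payment capture
    When the payment provider returns a failure
    Then the order status becomes "failed"
    And no inventory is reserved
    And the user sees a retry option with the cart intact

  Scenario: Duplicate submission
    Given an order was just submitted
    When the user submits the same cart again within a few seconds
    Then no second order is created
    And the original order confirmation is shown
```

Writing these down is exactly what prevents the agent from inventing its own answers to each question.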
What the spec should not define is how to build it. File structure, framework choice, specific libraries, database table layouts: those are implementation decisions that belong somewhere else.
This is where my exploration took an unexpected turn. If the implementation guidance doesn't belong in the spec, where does it go?
The answer I landed on: it belongs in the agent's own configuration. Project-level files like CLAUDE.md carry the conventions. Rules files carry the constraints. Skills carry task-specific patterns. The codebase itself carries reference implementations. The agent doesn't need a spec to tell it "use library X for charts" - that should be in the project's CLAUDE.md. The agent doesn't need a spec to tell it "follow the existing chart component pattern" - it should discover that by reading the existing code, guided by a skill that says "when building a new component, look at existing components of the same type first."
This realization shifted my entire perspective. If most of the "how" knowledge lives in agent configuration and the codebase, and the behavioral spec should be lightweight (intent, boundaries, edge cases), then the heavy spec documents that SDD tools encourage might be solving a problem that doesn't need to exist.
The more I looked at SDD workflows, the more familiar the shape felt. Kiro walks you through requirements -> design -> tasks -> implementation. spec-kit follows specify -> plan -> tasks. OpenSpec goes propose -> apply -> archive. Each phase produces documents that feed the next phase.
That sequence should ring a bell. It's the same shape as waterfall: gather requirements -> design -> implement -> test -> deploy. The tools are softer about it than classic waterfall - you can go back and edit artifacts, iteration is allowed. But the default workflow still pushes you through a linear pipeline of document production before any code is written.
Martin Fowler raised this concern directly: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. The industry spent two decades moving from waterfall toward iterative, feedback-driven approaches. SDD is reintroducing a document-heavy, plan-everything-upfront workflow, just with AI generating the code instead of developers. The medium changed but the process shape didn't.
To be fair, the waterfall comparison has limits. SDD tools do allow iteration, and writing requirements before coding is not waterfall - it's just planning. Agile teams write user stories before sprinting. TDD practitioners write tests before code. The question is whether the planning artifact is rigid or adaptable, and whether the feedback loop between planning and implementation is tight or broken. But when a tool generates 4 user stories with 16 acceptance criteria for a bug fix - as Birgitta Böckeler from Thoughtworks experienced with Kiro - it is hard not to see the same process bloat that made waterfall collapse under its own weight.
There is another angle that made me uneasy. AI removed the production cost of documentation. Writing specs, tests, and design docs used to be manual, slow, and often deprioritized under delivery pressure. Now an agent can generate all of this in seconds. That's genuinely useful. But when production becomes free, people tend to overproduce.
This is a known pattern. When storage became cheap, people stopped deleting files. When bandwidth became cheap, websites became bloated. When spec production became cheap via AI, developers generate more spec artifacts than anyone can meaningfully review. The cost signal that used to constrain documentation volume - the effort of writing it - has been removed, and nothing has replaced it. Previously, writing a spec by hand was a natural filter: you only did it when the feature was complex enough to justify the effort. Now that the effort is nearly zero, the filter is gone.
There are developers for whom SDD is genuinely filling a gap - people who always knew they should plan more carefully but couldn't justify the time. For them, the current tools are a real step forward. But there's a difference between "I can now produce specs when they're needed" and "I should produce specs for everything." The risk is when the former becomes the latter, and maximum ceremony becomes the default regardless of task size.
I came away from this exploration with a few conclusions.
The principle of thinking before building is valuable. Writing down behavioral intent and edge cases before coding genuinely helps, especially for complex features. This isn't controversial and I don't think anyone disagrees.
The right scope for a spec is a user-facing capability, not a technical layer. "Place an order" is a spec. "The orders API" is not a spec, it's a technical contract that serves multiple specs.
The spec should stay lightweight. Behavioral intent, boundary definitions, and edge cases. Not implementation instructions. Not 200 lines of markdown.
The implementation knowledge that makes agents effective should live where agents naturally look: project memory, rules, skills, and the codebase. Not in per-feature spec files that create a maintenance burden.
And the real skill we need to develop as an industry is not "how to write more specs" but "how to know when a spec adds value and when it's overhead." That calibration - matching the level of ceremony to the complexity of the task - is what current SDD tools get wrong. They default to maximum ceremony regardless of context, and I think that will age the same way the early microservices enthusiasm did. I'll dig into that in a follow-up post.
Around 2014-2017, microservices were the only architecture anyone wanted to talk about. Netflix, Spotify, and Amazon were the poster children, and the message the industry absorbed was clear: monoliths are legacy, microservices are modern. Conference talks, blog posts, and hiring trends all reinforced that framing. If you were building a monolith, you were doing it wrong.
Then reality caught up.
Teams discovered the hidden costs of distributed systems. Distributed tracing became a nightmare, network latency between services added up, data consistency across service boundaries was painful, and the operational overhead of managing dozens of repos with independent CI pipelines burned through engineering time. People like Kelsey Hightower and even Sam Newman (who wrote the book on microservices) started cautioning against premature decomposition. Martin Fowler's team put it simply: don't start with microservices, earn them.
The pattern became common enough to get its own narrative: monolith to microservices and back to monolith. In 2023, Amazon Prime Video published a case study where they moved back from microservices to a monolithic architecture for a video monitoring workload and cut costs by 90%. That made waves precisely because it came from Amazon itself.
The industry didn't swing all the way back though. The current consensus is more nuanced: start with a well-structured modular monolith, extract services only when you have a clear operational or scaling reason, and keep your architecture decisions tied to your actual problems rather than to conference hype.
One source of confusion is the word "monorepo" getting treated as a synonym for monolith. They are fundamentally different concepts.
A monolith is an architectural and deployment pattern. One deployable unit, one runtime, tightly coupled code.
A monorepo is a source code management strategy. One repository containing multiple projects, libraries, or even independent services.
These are orthogonal decisions. You can mix and match them freely, and all four combinations are valid: one repo with one deployable unit is the classic monolith setup most small teams start with. One repo with many deployable services is the Google-style monorepo with microservices. Many repos with many services is the "pure" microservices approach that was popular around 2015. And many repos with a monolithic deployment is possible too, though it is rare and usually painful.
The confusion happens because "mono-" appears in both words and because the back-to-simplicity energy applies to both dimensions at the same time. Teams burned by 50 repos with 50 CI pipelines are consolidating into monorepos while teams burned by distributed complexity are consolidating toward monolithic architectures. The motivations overlap (reducing accidental complexity) but the decisions are independent.
Here is where things get interesting. AI coding agents are fundamentally context-dependent, and monorepos maximize the context surface available without friction.
When an agent like Claude Code needs to implement a feature that touches both your API and your frontend, having both in the same repo means it can trace the contract end-to-end: the API endpoint definition, the shared types, the frontend call, the error handling. It does not need you to manually explain what the other repo looks like.
A few specific areas where this matters:
Type and contract coherence. In a monorepo with shared types (a /packages/shared folder, for example), the agent can see that changing an API response shape requires updating the frontend consumer. Across repos, that connection is invisible unless you explicitly describe it.
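A minimal sketch of what that coherence looks like in practice. The file paths and names below are illustrative assumptions; the point is that both sides import one definition, so a change to it surfaces in both places at once:

```typescript
// packages/shared/types.ts -- the single definition both sides import.
// Renaming displayName here breaks the API and the frontend builds together,
// which is exactly the coupling an agent in a monorepo can see and fix atomically.
interface UserSummary {
  id: string;
  displayName: string;
}

// apps/api/handlers.ts -- the backend maps its storage row to the shared shape.
function toUserSummary(row: { id: string; display_name: string }): UserSummary {
  return { id: row.id, displayName: row.display_name };
}

// apps/web/render.ts -- the frontend consumes the same shape.
function renderUser(user: UserSummary): string {
  return `${user.displayName} (${user.id})`;
}
```

In a multi-repo setup the same rename compiles cleanly in the API repo and fails silently at runtime in the frontend, with you as the only link between the two.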
Refactoring scope. Renaming a field, deprecating an endpoint, or changing an auth flow are cross-cutting concerns. An agent in a monorepo can search, understand, and modify everything atomically. Across repos it becomes two separate sessions with you acting as the bridge.
Testing context. Integration tests that verify frontend-backend interaction live naturally in a monorepo. The agent can run them, see what breaks, and fix both sides in one pass.
Monorepos are not free. CI complexity scales with repo size. A change in a shared utility triggers builds and tests across every project in the repo. Google solved this with Bazel and massive infrastructure investment, but most teams are not Google. The tooling tax (Nx, Turborepo, build caching, affected-project detection) is real and adds its own layer of complexity. AI agents do not make your CI pipeline faster.
Context windows are finite too. A massive monorepo with 500k lines is going to hit token limits regardless of how well-structured it is. What matters more than repo structure is organization within the repo: clear module boundaries, good naming, and well-organized directories. A chaotic monorepo can actually be worse for an agent than two clean, well-documented repos with a shared API spec.
Conway's Law has not been repealed. Architecture follows organizational structure, not tooling. If two teams own two services with different deployment lifecycles, forcing them into one repo for AI convenience creates human coordination problems. AI optimizes the coding phase, but software delivery is still a people problem.
Cross-repo tooling is catching up too. MCP servers can index your API schema, your frontend types, and your deployment config across repos and give an agent functionally equivalent context to a monorepo. In some cases the context is arguably better because it is curated rather than "here is everything, figure it out." Think of it like a database index versus a full table scan.
The likely future is that repo structure becomes less important as tooling abstracts it away. The agent of 2028 probably will not care whether your code lives in one repo or five. It will have indexed access to all of it, understand the dependency graph, and operate across boundaries seamlessly. The monorepo advantage we feel today is real, but it might be a symptom of tooling immaturity rather than a fundamental architectural truth.
There is another monorepo win that gets less attention: the agent's own configuration lives alongside the code it operates on.
In an AI-first workflow, your project carries more than just source code. It has agent rules (CLAUDE.md, .cursorrules, AGENTS.md) that define project conventions and constraints. It has skills that encode task-specific playbooks, like how to build a chart component or how to write a migration in your project. It has architecture decision records that capture why the project uses certain libraries or patterns. And it has the codebase itself, which serves as a living reference for how things are actually built.
In a monorepo, all of this is available to the agent in a single context. The agent reads the rules, understands the conventions, looks at existing components for reference patterns, and implements the new feature following the established approach. There is no gap between "what the agent knows about the project" and "what the project actually looks like." The rules file says "use MUI X Charts Pro for all chart components," and three directories over the agent can see exactly how the team has used that library before.
In a multi-repo setup, this knowledge gets fragmented. Each repo might have its own rules file, but the project-wide conventions live... where? A shared wiki? A Confluence page the agent cannot read? The senior developer's head? The monorepo keeps the entire knowledge surface in one place: code, conventions, patterns, and agent configuration. A new agent session (or a new team member) can orient themselves from a single starting point.
This matters more than it sounds. The quality of an agent's output is directly proportional to the quality of the context it has. A well-maintained CLAUDE.md in a monorepo with clear conventions and reference implementations will produce better results than a detailed prompt in a repo where the agent has no project knowledge to draw from.
If you are using spec-driven development, having your specs live in the same repo as the implementation is another natural win. The agent can validate conformance in real time, see drift between the spec and the code, and keep both in sync without you acting as the bridge.
That said, there is a growing conversation about whether the current wave of SDD tooling is worth the overhead it introduces. Tools like Kiro, spec-kit, and OpenSpec all propose structured spec workflows, but some practitioners are finding that the maintenance burden of all those markdown files creates its own problems. I have been exploring this topic and I am not entirely convinced the hype around SDD will age well. It reminds me of the early microservices enthusiasm: a good idea applied universally without enough regard for when it actually helps and when it just adds overhead. I will dig into that in a future post.