The Problem With Splitting Human and Agent Docs

2026-04-20

The short version. Sort docs by content type, not by audience. One README per meaningful module, read by both humans and agents. Automated checks go in tooling. Judgment calls go in a small conventions file. Contracts live in code. Per-task intent lives in the ticket. Agent-only rules exist but stay rare. Specs exist, but only for features that cross modules and stakeholders. That's the whole playbook. Most of it is standard advice; the wrinkle is how the agent-specific pieces fit in without taking over. This is the follow-up to a previous post where I argued that spec-driven development is solving the wrong problem and that agent context should live where agents naturally look. Given that an AI agent is now one of the readers, what should the documentation that does exist actually look like? The rest of this post is the answer, piece by piece, and what goes wrong when you split by audience instead.

Sort by content type, not by audience

The common instinct is to split docs by audience. Architecture docs for humans, CLAUDE.md or AGENTS.md for agents, keep them in their lanes. I've tried this and it falls apart fast: the two docs drift, they contradict each other, and I end up maintaining two versions of the same content.

The better cut is by content type. The same system gets documented in multiple ways depending on what you're communicating. Imagine an orders module in some backend app:

  • The explanation of what the orders module does and why.
  • The constraint that only the auth module can verify tokens.
  • The CreateOrderDto that defines what a valid order looks like.

Three kinds of content, three different natural homes:

  • a README
  • a conventions file or a lint rule
  • the code itself

Once content types are sorted, the audience question mostly disappears. A module README that describes what the module does doesn't need a twin anywhere. Both readers want the same information.

Here's the split:

What each module does and why goes in a README, co-located with the code. The README also covers dependencies and what patterns to follow inside the module. I'll use "module context" as shorthand for this in the tree below. Both humans and agents read the same file.
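
For a concrete picture, here's roughly what the orders module's README might cover. A minimal sketch, with the module and class names invented for illustration:

apps/api/src/orders/README.md (sketch)

Orders: creates and manages orders, then hands off to payments and notifications.

Dependencies: auth (token verification only, via the guard), payments (always through PaymentsService, never the provider SDK directly).

Patterns: controllers stay thin and delegate to orders.service.ts; request and response shapes live in dto/, not in prose.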

Constraints split into two kinds. The ones a linter or type checker can enforce belong in tooling: ESLint rules, ruff configs, strict TypeScript, pre-commit hooks, CI checks. These run automatically and fail the build, which is stronger than any prose could be. The ones that can't be automated, usually judgment calls and architectural patterns, go in a short conventions file. An example of each: ESLint can enforce "no any," but it can't enforce "business logic lives in services, not in controllers." The first belongs in .eslintrc. The second belongs in docs/conventions.md.
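
To make that concrete, the automatable half is a few lines of lint config. This sketch uses the standard @typescript-eslint rule; adjust to whatever config format your repo already has:

// .eslintrc.js (excerpt): the automated constraint fails the build on its own
module.exports = {
  parser: '@typescript-eslint/parser',
  plugins: ['@typescript-eslint'],
  rules: {
    // "no any" is enforced by the linter, not by a doc someone has to remember
    '@typescript-eslint/no-explicit-any': 'error',
  },
};

The judgment call stays as a single line of prose in docs/conventions.md, something like: "Business logic lives in services; controllers only translate HTTP to service calls."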

Navigation goes in a small index like CLAUDE.md. Its job is to point at the READMEs and the conventions doc, not to re-describe content that lives elsewhere. If the repo structure is clear enough, this file can be thin or skipped entirely.
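
Kept to its navigation job, the whole file can stay a handful of lines. A sketch, assuming the layout from the tree below:

CLAUDE.md (sketch)

Start here:
  • docs/architecture.md for the system overview
  • docs/conventions.md for the constraints tooling can't enforce
  • every meaningful module has a README next to its code; read it before editing that module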

Code contracts already live in the code. TypeScript interfaces, Django models, OpenAPI schemas, typed function signatures. These are machine-readable and accurate by construction. Writing a prose version of them in a README creates two sources of truth and one of them will drift.
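
The CreateOrderDto from earlier is exactly this kind of contract. A sketch using NestJS-style class-validator decorators; the fields are invented, the point is that shape and validation live in one typed place:

// apps/api/src/orders/dto/create-order.dto.ts
import { IsInt, IsUUID, Min, ValidateNested } from 'class-validator';
import { Type } from 'class-transformer';

class OrderItemDto {
  @IsUUID()
  productId: string;

  @IsInt()
  @Min(1)
  quantity: number;
}

export class CreateOrderDto {
  // validated at the boundary (e.g. by NestJS's ValidationPipe), so no prose twin is needed
  @ValidateNested({ each: true })
  @Type(() => OrderItemDto)
  items: OrderItemDto[];

  @IsUUID()
  shippingAddressId: string;
}

A module README can then point at dto/ instead of restating the fields.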

Per-task intent goes in the ticket, not in the repo. A GitHub issue or Jira ticket is where "what are we trying to do right now" lives. Copying it into a markdown file just adds something else to keep in sync.

What this looks like in practice

A monorepo with a NestJS API and a Next.js web app, structured around these categories:

my-monorepo/
│
├── README.md                          # Navigation: what this is, how to run it, where to go next
├── CLAUDE.md                          # Navigation: index pointing at READMEs and conventions
│
├── .eslintrc.js                       # Tooling: automated checks (no-any, import rules)
├── .prettierrc                        # Tooling: formatting checks
├── .husky/                            # Tooling: pre-commit hooks
│   └── pre-commit
├── .github/
│   └── workflows/
│       └── ci.yml                     # Tooling: lint + type-check + test gates
│
├── .claude/                           # Agent rules (only when they earn their place)
│   └── rules/
│       ├── no-secrets-in-code.md      # Rule: scoped to the whole repo, high-stakes
│       └── auth-boundary.md           # Rule: enforces service boundary the linter can't catch
│
├── docs/
│   ├── architecture.md                # System overview: diagram, boundaries, how modules fit together
│   ├── conventions.md                 # Constraint (judgment-based): architectural patterns, testing philosophy
│   └── specs/                         # Exception: one spec per cross-cutting feature
│       └── place-an-order.md          # Example: feature touching UI, API, payments, inventory
│
├── apps/
│   │
│   ├── api/                           # NestJS app
│   │   ├── README.md                  # Module context: what this app does, how it's shaped
│   │   ├── ...
│   │   │
│   │   └── src/
│   │       ├── main.ts
│   │       │
│   │       ├── orders/
│   │       │   ├── README.md          # Module context: orders module purpose, deps, patterns
│   │       │   ├── orders.controller.ts
│   │       │   ├── orders.service.ts
│   │       │   ├── orders.service.spec.ts
│   │       │   └── dto/
│   │       │       ├── create-order.dto.ts     # Contract: input shape lives in the code
│   │       │       └── order-response.dto.ts   # Contract: output shape lives in the code
│   │       │
│   │       ├── payments/
│   │       │   ├── README.md          # Module context: payments module, Stripe integration notes
│   │       │   ├── payments.service.ts
│   │       │   └── ...
│   │       │
│   │       └── auth/
│   │           ├── README.md          # Module context: auth module, token flow, guard usage
│   │           └── ...
│   │
│   └── web/                           # Next.js app
│       ├── README.md                  # Module context: what this app does, routing notes
│       ├── ...
│       │
│       └── app/
│           ├── checkout/
│           │   ├── README.md          # Module context: checkout flow, states, key components
│           │   ├── page.tsx
│           │   └── ...
│           │
│           └── account/
│               ├── README.md          # Module context: account section purpose and structure
│               └── ...
│
└── packages/
    │
    ├── shared-types/
    │   ├── README.md                  # Module context: what types live here and why
    │   └── src/
    │       ├── order.ts               # Contract: shared type definitions
    │       └── ...
    │
    ├── ui/
    │   ├── README.md                  # Module context: component library purpose, patterns
    │   └── ...
    │
    └── eslint-config/
        ├── README.md                  # Module context: what this config enforces
        └── index.js                   # Tooling: shared lint rules for the monorepo

A few things worth pointing out. Not every folder gets a README; only meaningful modules and packages do. CLAUDE.md is pure navigation; it doesn't re-describe the architecture. DTOs and shared types are contracts in code, not prose. .claude/rules/ has exactly two files, not a sprawl, and each spec in docs/specs/ is there because it can't be scoped to a module or ticket. None of that is accidental.

If a tool can enforce it, let it

The best constraint is one that fails the build. ESLint catches bad imports. TypeScript's strict mode catches implicit any. ruff and Black handle formatting. Pre-commit hooks catch common mistakes before they get committed. When tooling can enforce a constraint, writing it in prose is strictly worse: the feedback is slower, it relies on someone reading the doc, and it drifts away from the code over time.
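
For instance, the strictness half of that list is a couple of config lines rather than a paragraph of conventions. A sketch; your exact flags will differ:

// tsconfig.json (excerpt): "strict" turns on noImplicitAny, strictNullChecks, and friends
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true
  }
}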

This has a second benefit for working with agents. A constraint the build enforces applies every time, including when the agent writes code. A constraint that lives only in prose applies when the agent reads and remembers it, which is less reliable. If it matters, making it automated makes it real.

The conventions file ends up smaller than you'd think when tooling handles the automatable checks. What's left is the stuff that genuinely needs judgment: architectural patterns, testing philosophy, workflows that span multiple tools. Those still benefit from being written once and read by both humans and agents, but the list is short.

Why prose isn't the default answer

Even for content that stays in prose, the instinct to write more because "the agent needs full context" pushes in the wrong direction. Prose has real costs for agents that I didn't appreciate at first.

Token cost is real and compounds. A 500-line README burns tokens on every agent session that loads it. Across a team running many sessions a day, that adds up to a cost that shows up on the bill.

More context isn't always better context. When a constraint is buried in narrative prose, the agent has more surface area to get distracted by. From what I've seen, shorter and more structured tends to produce more reliable behavior than thorough and narrative.

Over-specification invites over-production. If the README lists fifteen edge cases because I wanted to be thorough, the agent may write code for all fifteen even when the task only needs three. That's slop, caused by the doc, not the agent.

There's no feedback loop for overexplaining. A doc that's too vague shows up quickly: the agent produces wrong output, tests fail, or the reviewer pushes back. A doc that's too long has no equivalent signal. The agent still produces working output, just after chewing through more context than it needed. Nothing fails, so nothing tells you the doc got bloated. The feedback is asymmetric, and the natural drift is toward more.

The practical implication is that writing for both humans and agents doesn't mean writing more. It means writing clearly and keeping each doc as short as it can be while still being explicit.

When agent-only rules earn their place

Rules in .claude/rules/ are the one place where agent-only content is genuinely the right answer. But they're easy to overuse, and when they grow unchecked they create hidden behavior: the agent follows a directive from a rule file the human never opened, and when the output surprises you, the reason is scattered somewhere you don't think to look.

This reminds me of Django signals. A signal can fire from code you didn't write, triggered by an action you took somewhere else. Useful, but it surprises you when something goes wrong, because the behavior doesn't live where you're looking.

So rules earn their place when three things are true:

  • tooling can't enforce the same constraint
  • a README wouldn't reach it because the rule applies across modules
  • the constraint is specific and stable

If any of those fails, the content belongs in tooling, in a README, or in conventions.md instead. That's why the example tree has two rule files, not twenty.
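
For illustration, the auth-boundary rule from the tree above could be as short as this; the wording is invented, what matters is that it's specific, stable, and repo-wide:

.claude/rules/auth-boundary.md (sketch)

Only the auth module verifies tokens. Everywhere else, use the auth module's
guard or its verify service instead of decoding JWTs directly. If a change
seems to need token logic outside auth/, stop and flag it in the PR instead.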

When a spec earns its place

Specs in docs/specs/ are the home for features that don't fit the other homes. A module README is scoped to one module. A ticket is scoped to one unit of work. But some features don't sit inside a single module or a single ticket, and for those a spec is the right place.

The shape that usually needs one is a feature that crosses several modules and several stakeholders at the same time. "Place an order" is the example I keep coming back to. It touches the UI, the API, the payments module, inventory, and notifications. A Jira ticket can describe what the user wants at a high level, but it can't capture cross-module behavior, edge cases, and the alignment between product, design, and engineering that has to happen before anyone writes code. A single spec document gives everyone one place to converge.

The real work with docs/specs/ is figuring out the scope of each file. A spec earns its place when three things are true:

  • the feature spans enough modules that no single README can describe it
  • stakeholders need to align on behavior and edge cases before implementation
  • the behavior is stable enough to be worth writing down, not a quick experiment

If any of those fails, the content belongs in the ticket, in a module README, or in the code. Getting that scoping call right is what keeps the specs folder useful instead of turning it into a second documentation layer that drifts from the code.

Closing

If I am starting a repo, the playbook above is my default. If I am working in a repo where CLAUDE.md and the human docs have already drifted apart, the migration is less scary than it looks: pick one module, merge its agent-facing and human-facing docs into a single README, delete what the tooling already enforces, move cross-module constraints to docs/conventions.md, and see what's left. It's probably a lot less than you started with.

Write less, put each thing where it belongs, and most of the drift and duplication just goes away.

Bonus: ai-first docs skill

I've packaged this as a Claude skill if you want to try it on an existing repo: ai-first-docs.

Spec-Driven Development Is Solving the Wrong Problem

2026-04-14

Spec-driven development is having its moment. Tools like Kiro, GitHub's spec-kit, and OpenSpec all promise the same thing: write structured specs before coding, let AI agents implement against them, and finally bring order to the chaos of vibe coding. The pitch is compelling. The tooling is growing fast. OpenSpec has nearly 40k GitHub stars.

But the more I look at these tools, the more I think they're repeating a pattern our industry keeps falling into: reaching for more structure and more artifacts when the real problem is putting knowledge in the right place.

The overhead problem

Birgitta Böckeler from Thoughtworks tested Kiro on a small bug fix. The tool generated 4 user stories with 16 acceptance criteria for something that should have been a quick fix. She described it as using a sledgehammer to crack a nut. With spec-kit she found the opposite problem: too many verbose markdown files to review, repetitive with each other and with existing code. Her conclusion was direct - she'd rather review code than review all those markdown files.

This is the core tension. SDD tools create a parallel documentation layer that needs to stay in sync with the code. We have seen this fail before with Javadoc that drifted from implementations, Swagger specs that diverged from actual API behavior, and architecture diagrams that became snapshots of how things were designed rather than how they work. Adding more markdown files does not solve the "documentation goes stale" problem. It multiplies it.

Martin Fowler himself raised a pointed concern: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. Requirements change. Understanding deepens. What seemed right in the spec often needs adjustment once you see the code running.

Where the knowledge actually belongs

My issue with SDD is not the principle of thinking before building. That part is valuable and I practice it myself. My issue is with where SDD tools put the knowledge.

AI agents already have natural places to get project context. In my experience, the question is not "should I write specs?" but "where should each piece of knowledge live so the agent finds it without extra ceremony?"

Project-level agent files like CLAUDE.md or AGENTS.md are always loaded at the start of every session. This is where project-wide conventions belong: what libraries you use, how components are structured, what patterns to follow, how tests are organized. The agent reads this once and applies it to every task. Zero per-task overhead.

Rules files shape how the agent works. Always write tests. Never use any in TypeScript. Use European number formatting. These are behavioral constraints, lighter than project context but always active.

Skills are task-specific playbooks that get loaded only when relevant. A skill for "building a new chart component" or "writing an API endpoint" encodes your team's patterns for that specific type of work. The agent reads the skill when it needs it, not on every session. This is an underexplored pattern that I think will grow.
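
A skill is just a small instruction file the agent pulls in when the task matches. A rough sketch; the exact file layout depends on your agent tooling, and the steps here are invented:

skills/building-a-chart-component.md (sketch)

When asked to build a new chart component:
  1. Find the closest existing chart component and use it as the reference implementation.
  2. Reuse its props shape, theming approach, and test structure.
  3. Add a test next to the component; don't introduce a new charting library without asking.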

The codebase itself is the most honest source of truth. Existing components, tests, naming conventions, and patterns. A good agent can infer how things are built by reading what already exists. Your best chart component is the spec for how chart components are built in your project.

Issue trackers and design tools hold the per-task intent. A Jira ticket describes what needs to happen. A Figma file shows what it should look like. Both are accessible to agents via MCP without duplicating their content into markdown.

Technical contracts like OpenAPI specs, GraphQL schemas, and TypeScript interfaces are already machine-readable and already live in the codebase. They don't need a prose summary in a spec file.

When you add all of this up, the agent has: project context from CLAUDE.md, behavioral constraints from rules, task-specific patterns from skills, reference implementations from the codebase, intent from the issue tracker, visuals from Figma, and contracts from typed schemas. A spec markdown file is only needed when none of these sources adequately capture the complexity of what you're building.

When specs still earn their keep

I am not arguing against specs entirely. There are situations where writing a dedicated spec document is the right call:

Large, cross-cutting features where multiple stakeholders need to align on behavior and edge cases before anyone writes code. A "place an order" workflow that touches UI, API, payments, inventory, and notifications genuinely benefits from a written behavioral contract.

Features with non-obvious edge cases that the agent would otherwise invent inconsistently. What happens when payment fails mid-checkout? When inventory changed between cart and submission? When the user double-clicks submit? These details need to be written down somewhere, and a spec is a reasonable place.

New team patterns that don't have a reference implementation yet. If you're building your first chart component, a spec helps the agent understand what you want. Once that first component exists, it becomes the reference and future components need less specification.

But bug fixes? Small features? Refactors where the existing code tells the story? A spec is overhead that slows you down without adding value. The agent has your rules, your patterns, and your codebase. Let it work.

The selective approach

My vision is that specs should be opt-in, not the default workflow. Most daily development work should be covered by a well-maintained set of agent rules, skills, and a clean codebase with good patterns. You reach for a spec document when the complexity of a feature exceeds what those sources can communicate.

The decision boundary is simple: if you can describe the change in a ticket and the agent can implement it correctly using project rules and existing patterns, you don't need a spec. If the change involves multiple systems, stakeholder alignment, or non-obvious edge cases, write one. But keep it lightweight: behavioral intent and edge cases, not implementation instructions.

This also means the effort should go into maintaining your agent configuration (CLAUDE.md, rules, skills) rather than maintaining per-feature spec files. A well-maintained CLAUDE.md with clear conventions and reference implementations will produce better results across hundreds of tasks than a detailed spec will produce for one.

Structure over specs

Software engineering has a recurring habit of reaching for new artifacts when things feel chaotic. The instinct is understandable, but the answer is rarely "more documents." It's usually "better-placed knowledge." SDD tools formalize that instinct into a workflow, and for most daily work the overhead is not worth it.

The teams that will navigate this well are the ones who figure out early that agent context should live where agents naturally look: in project memory, rules, skills, and the codebase itself. Not in a spec folder that requires its own maintenance lifecycle.

What Should a Spec Actually Contain? My Exploration of Spec-Driven Development

2026-04-04

After writing about monorepos and AI-first development, I kept coming back to one topic: specs. In that post I mentioned that specs benefit from living alongside the code in a monorepo. But the harder question was staring at me: what should a spec actually contain? And does anyone agree on the answer?

I spent time exploring spec-driven development, looking at real-world specs, and examining what standards exist. This post is what I found. It's not a definitive guide - it's a snapshot of my understanding as I worked through the problem.

What is spec-driven development?

The idea is straightforward. Before you write code, you write a spec that describes what the system should do. The spec becomes the source of truth. Code gets written to satisfy it, tests verify it, and any deviation between spec and implementation is a bug.

This is not new. OpenAPI lets you define a REST API contract before writing a single handler. GraphQL has its Schema Definition Language. JSON Schema describes data shapes. These are all forms of spec-driven development at the technical layer.

What makes the approach more interesting now is AI. When you give an agent a well-written spec, it has a clear target. It can generate the implementation, produce tests that verify conformance, and flag when its output drifts from what you described. Without a spec, the agent is guessing at your intent based on a brief prompt, and you are left reading code to figure out whether it guessed right.

So far I was sold on the principle. Then I started looking at how specs actually look in practice.

When I looked at real specs, something felt off

A common pattern I found: a spec for a UI component runs 200+ lines with detailed requirements, data models, acceptance criteria, and implementation instructions including specific CSS techniques and code patterns to follow.

These specs are solid work. But examining them raised a question I hadn't considered before: who is this for?

The behavioral part - what the component does, how it looks on hover, what the tooltips contain, what the data model looks like - is useful for any stakeholder. A product owner could read it and say "yes, that's what I want." A developer or agent could use it as a clear target.

The implementation part - which CSS techniques to use for specific visual effects, which existing component to use as a reference, which formatting library to call - is useful only for whoever is building it right now. A product owner doesn't care about CSS clip-paths. And six months from now, those implementation hints might not even be accurate anymore.

This made me realize that many specs mix two things that serve different audiences: the behavioral contract (what) and the implementation guide (how). The first should be lasting and stakeholder-readable. The second should be treated as disposable context for the current task.

The standards I found (and the gap I didn't expect)

At the technical contract level, the standards are solid. OpenAPI for REST, AsyncAPI for event-driven systems, GraphQL SDL for query APIs, JSON Schema for data validation. If you need to describe a single technical layer, you have good options.

But when I think about what it takes to spec out a full feature - "users need to be able to place an order" - none of these standards cover the whole picture. That feature touches the UI, the API, payment processing, inventory checks, email notifications, authentication, error handling, and storage. Each of those might have its own standard or contract format, but nothing ties them together into one coherent spec for the feature as a whole.

In my experience, teams fill this gap with whatever works for them: user stories with acceptance criteria, design docs, ADRs, sequence diagrams, or some combination of all of these.

What I think a spec should contain

Based on this exploration, I landed on three things a good feature-level spec needs.

First, the behavioral contract. Given these preconditions, when this action happens, then these outcomes occur. This is the core and it should be technology-agnostic. It shouldn't mention React or PostgreSQL. It should describe what the system does from the user's and the business's perspective. "When a user submits an order with valid payment, the order is created, inventory is reserved, and a confirmation email is sent within 30 seconds." That's a behavioral contract.

Second, the boundary definitions. Which systems are involved, what each one is responsible for, and what the contracts between them look like. This is where you reference your OpenAPI spec for the API layer, your event schema for async communication, and your auth model. The feature spec references these technical contracts but doesn't duplicate them.

Third, the error and edge case catalog. What can go wrong and what should happen when it does. What happens when payment fails? When inventory changed between cart and checkout? When the session expires mid-flow? When the user double-clicks submit? This is the part teams skip most often, and it's the part that matters most. Models will happily generate a perfect happy path and invent wildly inconsistent error handling if you don't specify it.

What the spec should not define is how to build it. File structure, framework choice, specific libraries, database table layouts: those are implementation decisions that belong somewhere else.
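
Put together, a feature-level spec can stay short. A sketch of what a place-an-order spec might look like, with the details invented to show the shape rather than to be complete:

place-an-order.md (sketch)

Behavior
  • Given a signed-in user with a non-empty cart and valid payment details, when they submit the order, then the order is created, inventory is reserved, and a confirmation email is sent within 30 seconds.

Boundaries
  • Checkout UI -> Orders API: request and response shapes live in the API's contract (OpenAPI, DTOs), not here.
  • Orders -> Payments: the payments module owns all provider calls.
  • Orders -> Notifications: an order-confirmed event, schema owned by the shared types package.

Edge cases
  • Payment fails mid-checkout: the order stays pending, nothing is reserved, the user can retry.
  • Inventory changed between cart and submission: reject with a specific error and keep the cart intact.
  • Double-click on submit: the second request is idempotent and returns the first order.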

Where the "how" should live instead

This is where my exploration took an unexpected turn. If the implementation guidance doesn't belong in the spec, where does it go?

The answer I landed on: it belongs in the agent's own configuration. Project-level files like CLAUDE.md carry the conventions. Rules files carry the constraints. Skills carry task-specific patterns. The codebase itself carries reference implementations. The agent doesn't need a spec to tell it "use library X for charts" - that should be in the project's CLAUDE.md. The agent doesn't need a spec to tell it "follow the existing chart component pattern" - it should discover that by reading the existing code, guided by a skill that says "when building a new component, look at existing components of the same type first."

This realization shifted my entire perspective. If most of the "how" knowledge lives in agent configuration and the codebase, and the behavioral spec should be lightweight (intent, boundaries, edge cases), then the heavy spec documents that SDD tools encourage might be solving a problem that doesn't need to exist.

This reminds me of something

The more I looked at SDD workflows, the more familiar the shape felt. Kiro walks you through requirements -> design -> tasks -> implementation. spec-kit follows specify -> plan -> tasks. OpenSpec goes propose -> apply -> archive. Each phase produces documents that feed the next phase.

That sequence should ring a bell. It's the same shape as waterfall: gather requirements -> design -> implement -> test -> deploy. The tools are softer about it than classic waterfall - you can go back and edit artifacts, iteration is allowed. But the default workflow still pushes you through a linear pipeline of document production before any code is written.

Martin Fowler raised this concern directly: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. The industry spent two decades moving from waterfall toward iterative, feedback-driven approaches. SDD is reintroducing a document-heavy, plan-everything-upfront workflow, just with AI generating the code instead of developers. The medium changed but the process shape didn't.

To be fair, the waterfall comparison has limits. SDD tools do allow iteration, and writing requirements before coding is not waterfall - it's just planning. Agile teams write user stories before sprinting. TDD practitioners write tests before code. The question is whether the planning artifact is rigid or adaptable, and whether the feedback loop between planning and implementation is tight or broken. But when a tool generates 4 user stories with 16 acceptance criteria for a bug fix - as Birgitta Böckeler from Thoughtworks experienced with Kiro - it is hard not to see the same process bloat that made waterfall collapse under its own weight.

Why the enthusiasm might be misleading

There is another angle that made me uneasy. AI removed the production cost of documentation. Writing specs, tests, and design docs used to be manual, slow, and often deprioritized under delivery pressure. Now an agent can generate all of this in seconds. That's genuinely useful. But when production becomes free, people tend to overproduce.

This is a known pattern. When storage became cheap, people stopped deleting files. When bandwidth became cheap, websites became bloated. When spec production became cheap via AI, developers generate more spec artifacts than anyone can meaningfully review. The cost signal that used to constrain documentation volume - the effort of writing it - has been removed, and nothing has replaced it. Previously, writing a spec by hand was a natural filter: you only did it when the feature was complex enough to justify the effort. Now that the effort is nearly zero, the filter is gone.

There are developers for whom SDD is genuinely filling a gap - people who always knew they should plan more carefully but couldn't justify the time. For them, the current tools are a real step forward. But there's a difference between "I can now produce specs when they're needed" and "I should produce specs for everything." The risk is when the former becomes the latter, and maximum ceremony becomes the default regardless of task size.

Where I landed

I came away from this exploration with a few conclusions.

The principle of thinking before building is valuable. Writing down behavioral intent and edge cases before coding genuinely helps, especially for complex features. This isn't controversial and I don't think anyone disagrees.

The right scope for a spec is a user-facing capability, not a technical layer. "Place an order" is a spec. "The orders API" is not a spec, it's a technical contract that serves multiple specs.

The spec should stay lightweight. Behavioral intent, boundary definitions, and edge cases. Not implementation instructions. Not 200 lines of markdown.

The implementation knowledge that makes agents effective should live where agents naturally look: project memory, rules, skills, and the codebase. Not in per-feature spec files that create a maintenance burden.

And the real skill we need to develop is not "how to write more specs" but "how to know when a spec adds value and when it's overhead." That calibration - matching the level of ceremony to the complexity of the task - is what current SDD tools get wrong. They default to maximum ceremony regardless of context. I'll dig into that in a follow-up post.