abelcastro.dev

Customer Journeys Belong Next to Your Code

2026-05-13

software-engineering, documentation, customer-journeys

The Problem With Splitting Human and Agent Docs

2026-04-20

software-engineering, documentation, developer-experience

Spec-Driven Development Is Solving the Wrong Problem

2026-04-14

spec-driven-development, software-engineering, architecture

Abel Castro 2026

Customer Journeys Belong Next to Your Code

I am building TheBest.Ink alone. The whole product vision lives in my head. Even with that, I was getting overwhelmed holding all the scenarios in mind. Who claims what. Which path triggers moderation. When an artist gets marked as listed.

So I opened docs/customer-journeys.md and started writing user journeys as prose, with a Mermaid flowchart for each one. The first time I rendered them, I had a small, sharp moment: "oh, that is what this project does." The charts gave the project a shape I could not see from the code alone. It felt like seeing the project's soul. Not in a mystical way, but as the actual sequence of promises the product makes to users.

This post is about that moment, and the artifact it produced. It follows two earlier posts of mine, one arguing that spec-driven development solves the wrong problem, the other arguing that human and agent docs should be the same docs. This is the doc category neither of those named.

I do not think every customer journey in every company must live next to the code. That would be too broad. Some journeys belong in product discovery, design tools, analytics dashboards, or support processes. But for journeys that define how the system should behave end-to-end, especially in a small product team or an AI-assisted codebase, I think there is a strong case for keeping them in the repo.

What the artifact looks like

The file sits at docs/customer-journeys.md, one section per journey. Each section is a few paragraphs of prose, a Mermaid flowchart, and status markers (NYI for "not yet implemented", NI for "needs improvement", CTA for "call to action", NC for "needs clarification").

The artist discovery and booking flow looks like this:

Five journeys live in that file today: artist discovery, studio discovery, review, claiming, and new artist or studio creation. Each one starts with "what is the user trying to do here, end-to-end" and ends with the path through the system that gets them there.

A simplified journey section could look like this:

### J-ARTIST-CLAIM-01: Artist claims an existing profile

Status: NI

Intent:
An artist should be able to prove ownership of an existing profile without blocking legitimate claims too early.

Key guarantees:
- Unclaimed profiles show a claim CTA.
- Suspicious claims are not hard-blocked immediately.
- Claims with weak evidence enter moderation.
- Verified claims unlock profile editing.

Coverage:
- E2E: artist-claim.spec.ts
- Integration: claim-moderation.service.spec.ts

That format is still experimental, but the idea is important: the journey should not only describe a flow. It should also describe the product intent and the behavioral guarantees that matter.
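
To make the layout above concrete, here is a minimal sketch of how such a section could be read into structured data, assuming the headings and labels stay exactly as shown. The `parseJourneySection` function and the `JourneySection` shape are hypothetical and not part of any existing tool; the `Intent` paragraph is ignored to keep the sketch short.

```typescript
// Status markers as listed in the post: NYI, NI, CTA, NC.
type JourneyStatus = "NYI" | "NI" | "CTA" | "NC";

interface JourneySection {
  id: string;
  title: string;
  status: JourneyStatus | null;
  guarantees: string[];
  coverage: { kind: string; ref: string }[];
}

// Hypothetical parser for the section layout shown above.
function parseJourneySection(markdown: string): JourneySection {
  const section: JourneySection = { id: "", title: "", status: null, guarantees: [], coverage: [] };
  let currentBlock = "";
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const heading = line.match(/^###\s+(J-[A-Z0-9-]+):\s*(.+)$/);
    const statusLine = line.match(/^Status:\s*(NYI|NI|CTA|NC)$/);
    if (heading) {
      section.id = heading[1];
      section.title = heading[2];
    } else if (statusLine) {
      section.status = statusLine[1] as JourneyStatus;
    } else if (line.endsWith(":") && !line.startsWith("-")) {
      currentBlock = line.slice(0, -1); // "Key guarantees", "Coverage", ...
    } else if (line.startsWith("- ") && currentBlock === "Key guarantees") {
      section.guarantees.push(line.slice(2));
    } else if (line.startsWith("- ") && currentBlock === "Coverage") {
      const item = line.slice(2);
      const separator = item.indexOf(":");
      section.coverage.push({
        kind: item.slice(0, separator).trim(),
        ref: item.slice(separator + 1).trim(),
      });
    }
  }
  return section;
}

// Usage with a shortened copy of the example section.
const claimSectionText = [
  "### J-ARTIST-CLAIM-01: Artist claims an existing profile",
  "",
  "Status: NI",
  "",
  "Key guarantees:",
  "- Unclaimed profiles show a claim CTA.",
  "- Verified claims unlock profile editing.",
  "",
  "Coverage:",
  "- E2E: artist-claim.spec.ts",
].join("\n");

const parsedClaim = parseJourneySection(claimSectionText);
```

Once the section is data rather than free text, a script can list journeys by status or collect every referenced test file without anyone re-reading the prose.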

What this actually is

The doc types most repos have do not cover this.

  • ADRs say why a decision was made.
  • PRDs say what was going to be built, before it was built. They are pre-build, stakeholder-facing, and they go stale fast after launch.
  • READMEs say how a module works, in local scope.

None of them answer "what is the user actually trying to accomplish, end-to-end, across modules?" That is the gap.

I want to be honest about the prior art, because there is a lot of it. User journey mapping has been a UX practice for years. Use cases go back to Ivar Jacobson. User story mapping comes from Jeff Patton. BDD and specification by example come from Dan North and Gojko Adzic. Domain storytelling is from Stefan Hofer and Henning Schwentner. Living documentation is from Cyrille Martraire. The building blocks are old.

What I am doing sits closest to domain storytelling plus living documentation plus BDD without the Gherkin syntax. The difference is small but real. None of the prior art was written with AI agents as a first-class reader. The shape of the artifact changes when the reader is not only a human PM or a developer on the team, but also an agent that will read three or five files in a cold-start context and try to do something useful.

That is the only thing I claim is new here. The format, the in-repo location, and the freshness mechanism follow from that one shift in audience.

Why agents specifically benefit

When I think about what agents struggle with on my repo, three things come up over and over.

Cold-start orientation. An agent opens three to five files at the start of a task and needs to know what the project does. A top-level README gives it the elevator pitch. A module README gives it local scope. Neither tells it what the user is actually trying to accomplish across modules. The journey doc does. On TheBest.Ink, the difference between "search for artists" and "search for artists, view a card, fall back to an external channel, or follow the booking path" is the difference between a code tour and a usable orientation.

Cross-module shape. Agents follow code paths inside a file well. They lose the thread when a flow crosses three modules and a background job. A flowchart in prose closes that gap. The agent sees the whole flow before it reads any of the code, so it knows which file fits where. The artist claim flow on TheBest.Ink touches the artist module, the studio module, the email module, the Instagram OAuth callback, and an LLM moderation step. No single file shows that. The journey does.

Intent disambiguation. "Should this case block the user or just warn them?" is a question I cannot always answer by reading the code. The journey can answer it, because intent is the whole point of the journey doc. When an artist tries to claim a profile and the email domain does not match the studio website, should we hard-block, throttle, or fall back to Instagram OAuth? The code could take any of those branches, and each is technically reasonable. The journey tells the agent which branch matches the product intent.

That last point is the most important one. Code tells the agent what exists. The journey tells the agent what should happen.

There is a real caveat here, and I want to keep it visible. When the journey drifts from the code, the agent believes the wrong thing more confidently than if no doc had existed at all. A wrong map is worse than no map. That is exactly why the freshness story matters.

The ownership problem

There is another caveat that matters once this leaves a solo project: ownership.

In my case, the answer is easy. I own the code, the product, the docs, and the tests. In a team, that is not always true. If the journey lives in Git, engineering becomes the default gatekeeper. That can be good because the doc sits close to the implementation, but it can also exclude the people who understand parts of the journey best: product, design, support, QA, or customer success.

So the question is not only "where should this document live?" It is also "who is responsible for keeping it true?"

A repo-based journey doc needs an owner. Maybe that is the engineer changing the flow. Maybe it is a product engineer. Maybe it is product and engineering together in PR review. But it cannot be an orphaned artifact. If nobody owns the journey, putting it next to the code only makes it stale in a more official-looking place.

The freshness problem

Documents that describe behavior go stale. That is the oldest problem in technical writing. PRDs go stale because they are pre-build, stakeholder-facing, and nobody reads them after launch. Module READMEs survive because they sit next to the code, so a mismatch gets caught in code review when someone changes the public API.

Customer journeys need the same kind of forcing function. My first idea was a 1:1 mapping between each journey and an e2e test. If the artist discovery journey says "unclaimed profile shows the claim dialog, no booking section", there should be a Playwright test with that exact assertion. If the test breaks, either the code is wrong or the journey is wrong. Either way, someone has to look at both.

But the 1:1 idea is probably too simple.

A journey and an e2e test are not naturally the same size. One journey can include branches, emails, background jobs, external services, moderation, retries, and admin decisions. One Playwright test should not necessarily cover all of that. Otherwise the tests become slow, brittle, and hard to debug.

A better version is this: each journey should contain behavioral claims, and the important claims should point to automated coverage where practical.

That coverage can be different depending on the claim:

  • Some claims belong in Playwright e2e tests.
  • Some belong in integration tests.
  • Some belong in unit tests around domain rules.
  • Some are manual review points until the product stabilizes.

The journey is not the test. The journey is the spine that connects intent, implementation, and coverage.
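
The idea of claims pointing at coverage can be sketched as a small data model, under a few assumptions: the claim IDs, the `important` flag, and the pairing of each claim with a particular test file are all made up for illustration. The check reports important claims that have only manual coverage, or none at all.

```typescript
type CoverageKind = "e2e" | "integration" | "unit" | "manual";

interface BehavioralClaim {
  id: string; // hypothetical claim ID
  text: string;
  important: boolean; // hypothetical flag: should this claim have automated coverage?
  coverage: { kind: CoverageKind; ref: string }[];
}

// Important claims with no automated coverage. Manual review points
// are valid coverage entries, but they do not count as automated.
function claimsMissingAutomation(claims: BehavioralClaim[]): BehavioralClaim[] {
  return claims.filter(
    (claim) => claim.important && !claim.coverage.some((entry) => entry.kind !== "manual"),
  );
}

// Illustrative data; the claim-to-test pairing is an assumption.
const artistClaimJourney: BehavioralClaim[] = [
  {
    id: "C1",
    text: "Unclaimed profiles show a claim CTA.",
    important: true,
    coverage: [{ kind: "e2e", ref: "artist-claim.spec.ts" }],
  },
  {
    id: "C2",
    text: "Claims with weak evidence enter moderation.",
    important: true,
    coverage: [{ kind: "integration", ref: "claim-moderation.service.spec.ts" }],
  },
  {
    id: "C3",
    text: "Suspicious claims are not hard-blocked immediately.",
    important: true,
    coverage: [{ kind: "manual", ref: "PR review" }],
  },
  { id: "C4", text: "Verified claims unlock profile editing.", important: false, coverage: [] },
];

const claimsNeedingAttention = claimsMissingAutomation(artistClaimJourney).map((claim) => claim.id);
// claimsNeedingAttention is ["C3"]
```

The report is a to-do list, not a gate: a manual review point is a legitimate state while the product is still moving.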

This is the part where BDD deserves an honest comparison. A Gherkin .feature file IS the test. Cucumber parses the prose, runs the step definitions, and the build breaks the moment the scenario drifts from the code. The freshness mechanism is automatic and free.

I dropped Gherkin because prose plus a flowchart reads better to both humans and agents than Given / When / Then does, and a flowchart shows branching that linear Gherkin cannot. That trade is deliberate, and I want to be clear about what it cost. I kept the BDD goal of keeping prose and code in sync. I gave up the BDD mechanism that did it for free.

Free prose and Mermaid are more expressive, but they are also more dangerous. They invite narrative drift. Someone can write a beautiful journey that no system actually follows.

So what I am really choosing is readability and system shape over executability. That gives me a better orientation artifact, but a weaker verification mechanism.

What I am left with is a manual mapping between journey claims and automated tests, plus a CI check I have not built yet.

What journeys should not replace

Customer journeys are useful, but they are not enough by themselves.

They do not replace ADRs. A journey can show that a claim goes through moderation, but an ADR can explain why moderation exists and why it was designed that way.

They do not replace domain rules. A journey can show that a suspicious claim is not hard-blocked, but a domain document or test should define what "suspicious" means.

They do not replace analytics, support knowledge, permission models, legal constraints, or operational runbooks.

The journey is the spine. ADRs explain major decisions. Tests verify behavior. Domain docs define rules. The journey links them together.

That is the role I want this document to play.

What I am still figuring out

A few things are not solved, and I think it is more useful to name them than to pretend they are.

File structure at scale. One customer-journeys.md works for five journeys. It will not work for fifty. Do I split by domain (docs/journeys/artists/, docs/journeys/studios/), or keep one file and use anchor links? I do not know what breaks first.

Stable IDs. If a test references journey 3.2 step 4 and I reorder the journeys, the reference rots. I probably need stable IDs per journey and per step, like J-ARTIST-DISCOVERY-04. I have not committed to a scheme yet.
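
As a sketch of what a stable-ID scheme would buy, the snippet below extracts IDs shaped like `J-ARTIST-DISCOVERY-04` from text and reports IDs that tests mention but the journey doc no longer defines. The pattern and both function names are assumptions, since no scheme has been committed to.

```typescript
// Matches IDs such as J-ARTIST-DISCOVERY-04 or J-ARTIST-CLAIM-01.
// The exact shape is an assumption, not a committed format.
const JOURNEY_ID_PATTERN = /\bJ-[A-Z]+(?:-[A-Z]+)*-\d{2}\b/g;

// Unique journey IDs found in a piece of text, in order of first appearance.
function extractJourneyIds(text: string): string[] {
  return Array.from(new Set(text.match(JOURNEY_ID_PATTERN) ?? []));
}

// IDs referenced in test source that are not defined in the journey doc.
function danglingReferences(journeyDoc: string, testSource: string): string[] {
  const defined = new Set(extractJourneyIds(journeyDoc));
  return extractJourneyIds(testSource).filter((id) => !defined.has(id));
}

// Usage with made-up document and test text.
const journeyDocText = [
  "### J-ARTIST-CLAIM-01: Artist claims an existing profile",
  "### J-ARTIST-DISCOVERY-04: Unclaimed profile shows the claim dialog",
].join("\n");

const testSourceText = [
  'test("J-ARTIST-CLAIM-01 unclaimed profile shows claim CTA", async () => {});',
  'test("J-STUDIO-DISCOVERY-02 studio card links to artists", async () => {});',
].join("\n");

const staleIds = danglingReferences(journeyDocText, testSourceText);
// staleIds is ["J-STUDIO-DISCOVERY-02"]
```

Because the ID does not encode position, reordering journeys in the file leaves every reference intact.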

Ownership in teams. In a solo project, I can update the journey when I change the feature. In a team, that responsibility needs to be explicit. Otherwise the file becomes one more abandoned doc with better syntax highlighting.

The limit of automatic checks. A CI check can verify that a journey file and a test file change in the same PR. That is coupling, and a machine can enforce it. It cannot verify that the new Mermaid arrow actually corresponds to the new test assertion. That is semantic alignment, and it needs a human who understands intent. There is no way to close that gap with tooling, at least not reliably. Every doc that is not executable has the same gap. ADRs have it. Module READMEs have it. Conventions files have it. I accept human review as the alignment mechanism for those. Journeys are the same.
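
The coupling half of that check is small enough to sketch. The version below takes the list of files changed in a PR and fails only when the journey doc changed without any test file changing. Treating every `*.spec.ts` file as "a test file" is an assumption, and so are the function name and the example paths other than `docs/customer-journeys.md`.

```typescript
interface CouplingResult {
  ok: boolean;
  reason: string;
}

// Coupling check only: it says nothing about whether the changed
// test actually matches the changed journey text.
function checkJourneyTestCoupling(
  changedFiles: string[],
  journeyDocPath: string = "docs/customer-journeys.md",
): CouplingResult {
  const journeyChanged = changedFiles.some((file) => file === journeyDocPath);
  const testsChanged = changedFiles.some((file) => file.endsWith(".spec.ts"));

  if (journeyChanged && !testsChanged) {
    return { ok: false, reason: "Journey doc changed but no test file changed." };
  }
  if (journeyChanged && testsChanged) {
    return { ok: true, reason: "Journey doc and tests changed together." };
  }
  return { ok: true, reason: "Journey doc untouched." };
}

// Usage with hypothetical PR file lists.
const journeyOnlyChange = checkJourneyTestCoupling([
  "docs/customer-journeys.md",
  "src/claims/claim.service.ts",
]);
const coupledChange = checkJourneyTestCoupling([
  "docs/customer-journeys.md",
  "e2e/artist-claim.spec.ts",
]);
// journeyOnlyChange.ok is false, coupledChange.ok is true
```

The check is deliberately one-directional: tests change for many reasons that do not touch a journey, so flagging test-only PRs would mostly produce noise.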

So the honest version of the question is not "can I close the semantic gap?" It is "is the gap small enough that code review can hold it?" For five journeys, yes. For fifty, I am not sure.

For now I am running with one file, five journeys, prose plus Mermaid, and a loose intent to map each important journey claim to automated coverage where practical. The cognitive payoff alone, after almost a year of typing the code, has already paid for the writing. Whether the artifact stays useful at fifty journeys, or quietly drifts into another wrong map, is the part I do not yet know.

Some product knowledge is too important to live only in people's heads, tickets, or stale PRDs. For flows that define how the system should behave end-to-end, a repo-based customer journey document can act as shared context for humans, tests, and AI agents.

But it only works if it has ownership, stable IDs, and a review mechanism that keeps it close to reality.

The Problem With Splitting Human and Agent Docs

The short version. Sort docs by content type, not by audience. One README per meaningful module, read by both humans and agents. Automated checks go in tooling. Judgment calls go in a small conventions file. Contracts live in code. Per-task intent lives in the ticket. Agent-only rules exist but stay rare. Specs exist but only for features that cross modules and stakeholders. That's the whole playbook. Most of it is standard advice; the wrinkle is how the agent-specific pieces fit in without taking over.

This is the follow-up to a previous post where I argued that spec-driven development is solving the wrong problem and that agent context should live where agents naturally look. Given that an AI agent is now one of the readers, what should the documentation that does exist actually look like? The rest of this post is the answer, piece by piece, along with what goes wrong when you split by audience instead.

Sort by content type, not by audience

The common instinct is to split docs by audience. Architecture docs for humans, CLAUDE.md or AGENTS.md for agents, keep them in their lanes. I've tried this and it falls apart fast: the two docs drift, they contradict each other, and I end up maintaining two versions of the same content.

The better cut is by content type. The same system gets documented in multiple ways depending on what you're communicating. Imagine an orders module in some backend app:

  • The explanation of what the orders module does and why.
  • The constraint that only the auth module can verify tokens.
  • The CreateOrderDto that defines what a valid order looks like.

Three kinds of content, three different natural homes:

  • a README
  • a conventions file or a lint rule
  • the code itself

Once content types are sorted, the audience question mostly disappears. A module README that describes what the module does doesn't need a twin anywhere. Both readers want the same information.

Here's the split:

What each module does and why goes in a README, co-located with the code. The README also covers dependencies and what patterns to follow inside the module. I'll use "module context" as shorthand for this in the tree below. Both humans and agents read the same file.

Constraints split into two kinds. The ones a linter or type checker can enforce belong in tooling: ESLint rules, ruff configs, strict TypeScript, pre-commit hooks, CI checks. These run automatically and fail the build, which is stronger than any prose could be. The ones that can't be automated, usually judgment calls and architectural patterns, go in a short conventions file. An example of each: ESLint can enforce "no any," but it can't enforce "business logic lives in services, not in controllers." The first belongs in .eslintrc. The second belongs in docs/conventions.md.

Navigation goes in a small index like CLAUDE.md. Its job is to point at the READMEs and the conventions doc, not to re-describe content that lives elsewhere. If the repo structure is clear enough, this file can be thin or skipped entirely.

Code contracts already live in the code. TypeScript interfaces, Django models, OpenAPI schemas, typed function signatures. These are machine-readable and accurate by construction. Writing a prose version of them in a README creates two sources of truth and one of them will drift.
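
A minimal sketch of what "the contract lives in the code" can mean for the orders example. Only the name `CreateOrderDto` comes from the text; the fields, the `OrderItemDto` type, and the guard functions are hypothetical.

```typescript
// Hypothetical fields: the type itself is the documentation of a valid order.
interface OrderItemDto {
  sku: string;
  quantity: number; // positive integer
}

interface CreateOrderDto {
  customerId: string;
  items: OrderItemDto[]; // at least one item
}

function isOrderItemDto(value: unknown): value is OrderItemDto {
  if (typeof value !== "object" || value === null) return false;
  const { sku, quantity } = value as Record<string, unknown>;
  return (
    typeof sku === "string" &&
    typeof quantity === "number" &&
    Number.isInteger(quantity) &&
    quantity > 0
  );
}

// Runtime guard that mirrors the compile-time contract.
function isCreateOrderDto(value: unknown): value is CreateOrderDto {
  if (typeof value !== "object" || value === null) return false;
  const { customerId, items } = value as Record<string, unknown>;
  return (
    typeof customerId === "string" &&
    Array.isArray(items) &&
    items.length > 0 &&
    items.every(isOrderItemDto)
  );
}

// Usage with made-up payloads.
const validOrderPayload = isCreateOrderDto({
  customerId: "c-1",
  items: [{ sku: "sku-42", quantity: 2 }],
});
const invalidOrderPayload = isCreateOrderDto({ customerId: "c-1", items: [] });
// validOrderPayload is true, invalidOrderPayload is false
```

A README sentence like "an order needs a customer and at least one item" would only restate this, and it would be the copy that drifts.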

Per-task intent goes in the ticket, not in the repo. A GitHub issue or Jira ticket is where "what are we trying to do right now" lives. Copying it into a markdown file just adds something else to keep in sync.

What this looks like in practice

A monorepo with a NestJS API and a Next.js web app, structured around these categories:

my-monorepo/
│
├── README.md                          # Navigation: what this is, how to run it, where to go next
├── CLAUDE.md                          # Navigation: index pointing at READMEs and conventions
│
├── .eslintrc.js                       # Tooling: automated checks (no-any, import rules)
├── .prettierrc                        # Tooling: formatting checks
├── .husky/                            # Tooling: pre-commit hooks
│   └── pre-commit
├── .github/
│   └── workflows/
│       └── ci.yml                     # Tooling: lint + type-check + test gates
│
├── .claude/                           # Agent rules (only when they earn their place)
│   └── rules/
│       ├── no-secrets-in-code.md      # Rule: scoped to the whole repo, high-stakes
│       └── auth-boundary.md           # Rule: enforces service boundary the linter can't catch
│
├── docs/
│   ├── architecture.md                # System overview: diagram, boundaries, how modules fit together
│   ├── conventions.md                 # Constraint (judgment-based): architectural patterns, testing philosophy
│   └── specs/                         # Exception: one spec per cross-cutting feature
│       └── place-an-order.md          # Example: feature touching UI, API, payments, inventory
│
├── apps/
│   │
│   ├── api/                           # NestJS app
│   │   ├── README.md                  # Module context: what this app does, how it's shaped
│   │   ├── ...
│   │   │
│   │   └── src/
│   │       ├── main.ts
│   │       │
│   │       ├── orders/
│   │       │   ├── README.md          # Module context: orders module purpose, deps, patterns
│   │       │   ├── orders.controller.ts
│   │       │   ├── orders.service.ts
│   │       │   ├── orders.service.spec.ts
│   │       │   └── dto/
│   │       │       ├── create-order.dto.ts     # Contract: input shape lives in the code
│   │       │       └── order-response.dto.ts   # Contract: output shape lives in the code
│   │       │
│   │       ├── payments/
│   │       │   ├── README.md          # Module context: payments module, Stripe integration notes
│   │       │   ├── payments.service.ts
│   │       │   └── ...
│   │       │
│   │       └── auth/
│   │           ├── README.md          # Module context: auth module, token flow, guard usage
│   │           └── ...
│   │
│   └── web/                           # Next.js app
│       ├── README.md                  # Module context: what this app does, routing notes
│       ├── ...
│       │
│       └── app/
│           ├── checkout/
│           │   ├── README.md          # Module context: checkout flow, states, key components
│           │   ├── page.tsx
│           │   └── ...
│           │
│           └── account/
│               ├── README.md          # Module context: account section purpose and structure
│               └── ...
│
└── packages/
    │
    ├── shared-types/
    │   ├── README.md                  # Module context: what types live here and why
    │   └── src/
    │       ├── order.ts               # Contract: shared type definitions
    │       └── ...
    │
    ├── ui/
    │   ├── README.md                  # Module context: component library purpose, patterns
    │   └── ...
    │
    └── eslint-config/
        ├── README.md                  # Module context: what this config enforces
        └── index.js                   # Tooling: shared lint rules for the monorepo

A few things worth pointing out. Not every folder gets a README, only meaningful modules and packages. CLAUDE.md is pure navigation; it doesn't re-describe the architecture. DTOs and shared types are contracts in code, not prose. .claude/rules/ has exactly two files, not a sprawl, and each spec in docs/specs/ is there because it can't be scoped to a module or ticket. That's not accidental.
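
The "one README per meaningful module" rule is itself checkable once the list of meaningful modules is written down. A minimal sketch, assuming that list is maintained by hand; the function name is hypothetical, and the file list deliberately leaves out the payments README to show a failing case.

```typescript
// Which directories count as "meaningful modules" is a judgment call,
// so this sketch takes them as an explicit list instead of guessing.
function modulesMissingReadme(moduleDirs: string[], repoFiles: string[]): string[] {
  const files = new Set(repoFiles);
  return moduleDirs.filter((dir) => !files.has(`${dir}/README.md`));
}

// Usage: paths follow the tree above, minus the payments README.
const repoFileList = [
  "apps/api/README.md",
  "apps/api/src/orders/README.md",
  "apps/api/src/orders/orders.service.ts",
  "apps/api/src/payments/payments.service.ts",
  "packages/ui/README.md",
];

const modulesWithoutReadme = modulesMissingReadme(
  ["apps/api", "apps/api/src/orders", "apps/api/src/payments", "packages/ui"],
  repoFileList,
);
// modulesWithoutReadme is ["apps/api/src/payments"]
```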

If a tool can enforce it, let it

The best constraint is one that fails the build. ESLint catches bad imports. TypeScript's strict mode catches implicit any. ruff and Black handle formatting. pre-commit hooks catch common mistakes before they get committed. When tooling can enforce a constraint, writing it in prose is strictly worse: the feedback is slower, it relies on someone reading the doc, and it drifts away from the code over time.

This has a second benefit for working with agents. A constraint the build enforces applies every time, including when the agent writes code. A constraint that lives only in prose applies when the agent reads and remembers it, which is less reliable. If it matters, making it automated makes it real.

The conventions file ends up smaller than you'd think when tooling handles the automatable checks. What's left is the stuff that genuinely needs judgment: architectural patterns, testing philosophy, workflows that span multiple tools. Those still benefit from being written once and read by both humans and agents, but the list is short.

Why prose isn't the default answer

Even for content that stays in prose, the instinct to write more because "the agent needs full context" pushes in the wrong direction. Prose has real costs for agents that I didn't appreciate at first.

Token cost is real and compounds. A 500-line README burns tokens on every agent session that loads it. Across a team running many sessions a day, that adds up to a cost that shows up on the bill.
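
A back-of-the-envelope sketch of how that compounds. Every number here is an illustrative assumption, not a measurement: roughly 12 tokens per line and 200 agent sessions a day across the team.

```typescript
// Tokens spent per day just on loading one README into agent sessions.
function readmeTokensPerDay(
  readmeLines: number,
  tokensPerLine: number,
  sessionsPerDay: number,
): number {
  return readmeLines * tokensPerLine * sessionsPerDay;
}

// Assumed inputs: 12 tokens per line, 200 sessions per day.
const longReadmeTokens = readmeTokensPerDay(500, 12, 200); // 1,200,000 tokens per day
const shortReadmeTokens = readmeTokensPerDay(80, 12, 200); // 192,000 tokens per day
const dailyTokenSavings = longReadmeTokens - shortReadmeTokens; // 1,008,000 tokens per day
```

The absolute figures matter less than the shape: the cost scales with sessions, so every line that does not earn its place is paid for again each time an agent starts a task.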

More context isn't always better context. When a constraint is buried in narrative prose, the agent has more surface area to get distracted by. From what I've seen, shorter and more structured tends to produce more reliable behavior than thorough and narrative.

Over-specification invites over-production. If the README lists fifteen edge cases because I wanted to be thorough, the agent may write code for all fifteen even when the task only needs three. That's slop, caused by the doc, not the agent.

There's no feedback loop for overexplaining. A doc that's too vague shows up quickly: the agent produces wrong output, tests fail, or the reviewer pushes back. A doc that's too long has no equivalent signal. The agent still produces working output, just after chewing through more context than it needed. Nothing fails, so nothing tells you the doc got bloated. The feedback is asymmetric, and the natural drift is toward more.

The practical implication is that writing for both humans and agents doesn't mean writing more. It means writing clearly and keeping each doc as short as it can be while still being explicit.

When agent-only rules earn their place

Rules in .claude/rules/ are the one place where agent-only content is genuinely the right answer. But they're easy to overuse, and when they grow unchecked they create hidden behavior: the agent follows a directive from a rule file the human never opened, and when the output surprises you, the reason is scattered somewhere you don't think to look.

This reminds me of Django signals. A signal can fire from code you didn't write, triggered by an action you took somewhere else. Useful, but it surprises you when something goes wrong, because the behavior doesn't live where you're looking.

So rules earn their place when three things are true:

  • tooling can't enforce the same constraint
  • a README wouldn't reach it because the rule applies across modules
  • the constraint is specific and stable

If any of those fails, the content belongs in tooling, in a README, or in conventions.md instead. That's why the example tree has two rule files, not twenty.

When a spec earns its place

Specs in docs/specs/ are the home for features that don't fit the other homes. A module README is scoped to one module. A ticket is scoped to one unit of work. But some features don't sit inside a single module or a single ticket, and for those a spec is the right place.

The shape that usually needs one is a feature that crosses several modules and several stakeholders at the same time. "Place an order" is the example I keep coming back to. It touches the UI, the API, the payments module, inventory, and notifications. A Jira ticket can describe what the user wants at a high level, but it can't capture cross-module behavior, edge cases, and the alignment between product, design, and engineering that has to happen before anyone writes code. A single spec document gives everyone one place to converge.

The real work with docs/specs/ is figuring out the scope of each file. A spec earns its place when three things are true:

  • the feature spans enough modules that no single README can describe it
  • stakeholders need to align on behavior and edge cases before implementation
  • the behavior is stable enough to be worth writing down, not a quick experiment

If any of those fails, the content belongs in the ticket, in a module README, or in the code. Getting that scoping call right is what keeps the specs folder useful instead of turning it into a second documentation layer that drifts from the code.

Closing

If I am starting a repo, the playbook above is my default. If I am working in a repo where CLAUDE.md and the human docs have already drifted apart, the migration is less scary than it looks. Pick one module, merge its agent-facing and human-facing docs into a single README, delete what the tooling already enforces, move cross-module constraints to docs/conventions.md, and see what's left. It's probably a lot less than I started with.

Write less, put each thing where it belongs, and most of the drift and duplication just goes away.

Bonus: ai-first docs skill

I've packaged this as a Claude skill if you want to try it on an existing repo: ai-first-docs.

Spec-Driven Development Is Solving the Wrong Problem

Spec-driven development is having its moment. Tools like Kiro, GitHub's spec-kit, and OpenSpec all promise the same thing: write structured specs before coding, let AI agents implement against them, and finally bring order to the chaos of vibe coding. The pitch is compelling. The tooling is growing fast. OpenSpec has nearly 40k GitHub stars.

But the more I look at these tools, the more I think they're repeating a pattern our industry keeps falling into: reaching for more structure and more artifacts when the real problem is putting knowledge in the right place.

The overhead problem

Birgitta Böckeler from Thoughtworks tested Kiro on a small bug fix. The tool generated 4 user stories with 16 acceptance criteria for something that should have been a quick fix. She described it as using a sledgehammer to crack a nut. With spec-kit she found the opposite problem: too many verbose markdown files to review, repetitive with each other and with existing code. Her conclusion was direct: she'd rather review code than review all those markdown files.

This is the core tension. SDD tools create a parallel documentation layer that needs to stay in sync with the code. We have seen this fail before with Javadoc that drifted from implementations, Swagger specs that diverged from actual API behavior, and architecture diagrams that became snapshots of how things were designed rather than how they work. Adding more markdown files does not solve the "documentation goes stale" problem. It multiplies it.

Martin Fowler himself raised a pointed concern: SDD encodes the assumption that you won't learn anything during implementation that would change the specification. That assumption has been proven wrong by decades of software engineering. Requirements change. Understanding deepens. What seemed right in the spec often needs adjustment once you see the code running.

Where the knowledge actually belongs

My issue with SDD is not the principle of thinking before building. That part is valuable and I practice it myself. My issue is with where SDD tools put the knowledge.

AI agents already have natural places to get project context. In my experience, the question is not "should I write specs?" but "where should each piece of knowledge live so the agent finds it without extra ceremony?"

Project-level agent files like CLAUDE.md or AGENTS.md are always loaded at the start of every session. This is where project-wide conventions belong: what libraries you use, how components are structured, what patterns to follow, how tests are organized. The agent reads this once and applies it to every task. Zero per-task overhead.

Rules files shape how the agent works. Always write tests. Never use any in TypeScript. Use European number formatting. These are behavioral constraints, lighter than project context but always active.

Skills are task-specific playbooks that get loaded only when relevant. A skill for "building a new chart component" or "writing an API endpoint" encodes your team's patterns for that specific type of work. The agent reads the skill when it needs it, not on every session. This is an underexplored pattern that I think will grow.

The codebase itself is the most honest source of truth. Existing components, tests, naming conventions, and patterns. A good agent can infer how things are built by reading what already exists. Your best chart component is the spec for how chart components are built in your project.

Issue trackers and design tools hold the per-task intent. A Jira ticket describes what needs to happen. A Figma file shows what it should look like. Both are accessible to agents via MCP without duplicating their content into markdown.

Technical contracts like OpenAPI specs, GraphQL schemas, and TypeScript interfaces are already machine-readable and already live in the codebase. They don't need a prose summary in a spec file.

When you add all of this up, the agent has: project context from CLAUDE.md, behavioral constraints from rules, task-specific patterns from skills, reference implementations from the codebase, intent from the issue tracker, visuals from Figma, and contracts from typed schemas. A spec markdown file is only needed when none of these sources adequately capture the complexity of what you're building.

When specs still earn their keep

I am not arguing against specs entirely. There are situations where writing a dedicated spec document is the right call:

Large, cross-cutting features where multiple stakeholders need to align on behavior and edge cases before anyone writes code. A "place an order" workflow that touches UI, API, payments, inventory, and notifications genuinely benefits from a written behavioral contract.

Features with non-obvious edge cases that the agent would otherwise invent inconsistently. What happens when payment fails mid-checkout? When inventory changed between cart and submission? When the user double-clicks submit? These details need to be written down somewhere, and a spec is a reasonable place.

New team patterns that don't have a reference implementation yet. If you're building your first chart component, a spec helps the agent understand what you want. Once that first component exists, it becomes the reference and future components need less specification.

But bug fixes? Small features? Refactors where the existing code tells the story? A spec is overhead that slows you down without adding value. The agent has your rules, your patterns, and your codebase. Let it work.

The selective approach

My vision is that specs should be opt-in, not the default workflow. Most daily development work should be covered by a well-maintained set of agent rules, skills, and a clean codebase with good patterns. You reach for a spec document when the complexity of a feature exceeds what those sources can communicate.

The decision boundary is simple: if you can describe the change in a ticket and the agent can implement it correctly using project rules and existing patterns, you don't need a spec. If the change involves multiple systems, stakeholder alignment, or non-obvious edge cases, write one. But keep it lightweight: behavioral intent and edge cases, not implementation instructions.
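
The boundary can be written down as a tiny predicate. This is a sketch of the rule as stated, with hypothetical field names; the judgment is in filling in the three answers honestly, not in the function.

```typescript
interface ChangeProfile {
  touchesMultipleSystems: boolean;
  needsStakeholderAlignment: boolean;
  hasNonObviousEdgeCases: boolean;
}

// A spec is warranted when any one of the three conditions holds.
function needsSpec(change: ChangeProfile): boolean {
  return (
    change.touchesMultipleSystems ||
    change.needsStakeholderAlignment ||
    change.hasNonObviousEdgeCases
  );
}

// Usage: a small bug fix versus a "place an order" workflow.
const bugFixNeedsSpec = needsSpec({
  touchesMultipleSystems: false,
  needsStakeholderAlignment: false,
  hasNonObviousEdgeCases: false,
});
const placeAnOrderNeedsSpec = needsSpec({
  touchesMultipleSystems: true,
  needsStakeholderAlignment: true,
  hasNonObviousEdgeCases: true,
});
// bugFixNeedsSpec is false, placeAnOrderNeedsSpec is true
```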

This also means the effort should go into maintaining your agent configuration (CLAUDE.md, rules, skills) rather than maintaining per-feature spec files. A well-maintained CLAUDE.md with clear conventions and reference implementations will produce better results across hundreds of tasks than a detailed spec will produce for one.

Structure over specs

Software engineering has a recurring habit of reaching for new artifacts when things feel chaotic. The instinct is understandable, but the answer is rarely "more documents." It's usually "better-placed knowledge." SDD tools formalize that instinct into a workflow, and for most daily work the overhead is not worth it.

The teams that will navigate this well are the ones who figure out early that agent context should live where agents naturally look: in project memory, rules, skills, and the codebase itself. Not in a spec folder that requires its own maintenance lifecycle.