The agent landscape — what each one is for

"AI coding agent" is now a category, not a product. The tools differ in where they sit, what they can touch, and how autonomous they are.

Tool | Surface | Sweet spot
Claude Code | Terminal / CLI | End-to-end agentic coding, file ops, running shells. The most powerful for real codebases.
Cursor | IDE (VS Code fork) | You stay in editor; agent has full project context. Inline edits, agent mode for bigger tasks.
Windsurf | IDE (VS Code fork) | Similar surface to Cursor. Both are fine; pick one and learn it.
GitHub Copilot | IDE plugin | Inline completions and chat. Lightest touch — least autonomous, often that's right.
v0 (Vercel) | Web — generates UI | "I want a settings page that looks like X." Outputs React + Tailwind components you copy.
Lovable | Web — generates apps | Whole-app generation from a prompt. Good for rapid prototypes.
Bolt | Web — generates apps | Browser-based full-stack generator with live preview.
Replit Agent | Web — cloud IDE | Replit's environment + agent. Good for "I have an idea and no laptop."

The categories that matter:

Terminal / CLI agents
Claude Code leads this category. These agents live in your shell, can run commands, and can edit files anywhere. Best for working in a real codebase you already have.
IDE-resident agents
Cursor, Windsurf, Copilot. The agent has your project's full context — open files, related code, conventions. Best for steady, day-in-day-out development.
Generators (browser-based)
v0, Lovable, Bolt, Replit Agent. Best for greenfield projects, prototypes, or specific UI you can copy out. Less ideal once you have a serious codebase to integrate with.

Context is the work

Here's the most important sentence in this level: the quality of an agent's output is almost entirely a function of the quality of the context you give it. Models are smart. They are not psychic.

"Context" means everything the agent can see when it answers — the files in your project, the prompt you wrote, the conversation so far, the README, the schema, the conventions. The agent's whole understanding is what's in its window. Anything outside it might as well not exist.

Three categories of context you control:

Project context
The codebase itself. Structure, naming, framework choices, design system. CLI agents read files freely; IDE agents see open tabs and project-wide search. The cleaner your project is organized, the better the agent's reads.
Conversational context
What you've said in this session. The agent remembers, but only up to a budget — past a certain length it'll start losing earlier turns.
Out-of-band context
Things the agent can't see: the PRD in Notion, the discussion in Slack, the bug report in Linear, the user's screenshot. You have to bring these into the conversation explicitly.
⨯ Bad prompt
"add user profiles"
✓ Good prompt
"Add a user profile page at /u/[username]. It should show: avatar, display name, bio (max 160 chars), join date, and the user's last 10 public tasks. Match the styling of the existing project page (app/p/[slug]/page.tsx). Use the User model already in prisma/schema.prisma; add a bio field if it's missing. Don't add new dependencies."
Why it works: Specifies the route, the data shown, the existing pattern to match, the data model, and what NOT to do. The agent now has a target small enough to hit on the first try.
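For reference, the bio addition that prompt asks for might look like this in prisma/schema.prisma (every field except bio is an assumed placeholder, not the project's actual model):

```prisma
// Hypothetical sketch of the User model with the new bio field.
model User {
  id          String   @id @default(cuid())
  username    String   @unique
  displayName String
  bio         String?  // new; the 160-char limit is enforced at the form layer
  createdAt   DateTime @default(now())
}
```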

Scoping a task — small enough to land cleanly

The unit of work for an agent is a scoped task — small enough that you can describe its end state in two or three sentences, evaluate it in five minutes, and revert it cleanly if it's wrong.

Tasks that are too big:

  • "Build the auth system." (Multi-day. Many sub-decisions. Will go off the rails.)
  • "Refactor the codebase to use a state library." (Touches everything. High blast radius if anything's wrong.)
  • "Make the app feel snappy." (Subjective. No success criterion.)

The same work, scoped:

  • "Add a magic-link send endpoint at POST /api/auth/send-link. Use Resend. Rate-limit to 3 per email per hour."
  • "Replace the manual useState in app/board/page.tsx with a useTasksStore Zustand store. Don't touch other files."
  • "The board page takes 800ms to first paint on slow 3G. Suspect: the tasks query loads 10× more data than it shows. Audit the query, suggest a leaner one."

Writing a spec the agent can act on

For any task beyond the trivial, write a tiny inline spec. The structure that works (assembled into a full example after this list):

  1. Goal

    One sentence. What, not how. "Users can mark a task as complete from the board view."

  2. Constraints

    What the agent can't change. "Don't touch the API. Use the existing updateTask hook. Match the pattern in TaskCard.tsx."

  3. Acceptance

    How you'll know it's done. "Clicking the checkbox toggles the status. The toggle is optimistic; if the API errors, it reverts. The card has a strikethrough when done."

  4. Out of scope

    What you don't want — pre-empts the agent's tendency to overshoot. "Don't add animations. Don't refactor anything else."
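Assembled, the four parts make a spec small enough to paste straight into the agent. This example just reuses the fragments above:

```
Goal: Users can mark a task as complete from the board view.
Constraints: Don't touch the API. Use the existing updateTask hook.
Match the pattern in TaskCard.tsx.
Acceptance: Clicking the checkbox toggles the status. The toggle is
optimistic; if the API errors, it reverts. The card has a
strikethrough when done.
Out of scope: No animations. No refactoring anything else.
```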

Reading the diff — the actual job

The agent produced a change. Now your job — and it is a job, not a glance — is to read it. Reading a diff is the single most important skill in agent-driven development. If you don't read what the agent wrote, you don't understand your own software.

What to look for, in order:

  1. Did it touch only the files you expected? If your prompt was about the auth flow and it edited tailwind.config.ts, ask why before you accept.
  2. Did it use existing code where possible? Or did it duplicate a helper that already exists in /lib? Agents have a tendency to write fresh code instead of reaching for what's there.
  3. Are the names consistent with the project? If everything in your project is camelCase and the new file uses snake_case, that's a smell.
  4. Are there assumptions you don't agree with? The agent had to make tiny choices — naming, error-handling style, where state lives. Disagree out loud.
  5. Are there things that are subtly wrong but compile? A function that returns the wrong thing on edge cases. A check that's slightly off. A missing await.
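Point 5 is the hardest to catch. A hypothetical TypeScript sketch of the "missing await" flavor: it compiles cleanly, works on the happy path, and silently swallows failures.

```ts
// updateTask stands in for an existing helper; the signature is assumed.
declare function updateTask(id: string, patch: { done: boolean }): Promise<void>;

async function completeTask(taskId: string) {
  updateTask(taskId, { done: true }); // BUG: missing await, so a failed write is never surfaced
  return { ok: true };                // reports success before the write has finished
}
```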
⨯ Bad prompt
"looks good, ship it"
✓ Good prompt
"In app/api/tasks/route.ts line 23, you wrote where: { id: taskId } — but multiple workspaces could have tasks with the same ID. Should this also filter by workspaceId? Confirm before I merge."
Why it works: Reviewing means asking pointed questions about specific lines. Generic approval is how broken code makes it to production with an agent's fingerprints on it and yours, too.
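The fix that question is pushing toward might look like this (the model and field names are assumptions carried over from the prompt):

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Scope the lookup to the caller's workspace so a task ID from another
// workspace can never be read or mutated. Names are illustrative.
async function getTask(taskId: string, workspaceId: string) {
  return prisma.task.findFirst({
    where: { id: taskId, workspaceId },
  });
}
```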

The debug loop with an agent

The code doesn't work. What now?

  1. Reproduce it

    Make the bug happen consistently. "It sometimes breaks" is unfixable. "Click create-task with no title; the page goes blank" is fixable.

  2. Capture the evidence

    The exact error, the network response, the console output, the screenshot. Paste these into the agent — don't paraphrase.

  3. State the expected vs. actual behavior

    "Expected: form shows a 'title required' error inline. Actual: page goes blank. Console shows TypeError: undefined is not a function."

  4. Tell the agent what you've already tried

    Stops it from suggesting the same fix you just ruled out.

  5. Ask for the smallest fix that addresses the root cause

    Not "refactor this whole component." A surgical change.

  6. Verify before celebrating

    Re-run the steps that produced the bug. Confirm it's actually fixed, not just changed.

⨯ Bad prompt
"it doesn't work, fix it"
✓ Good prompt
"Bug: clicking Create with an empty title crashes the page. The browser console shows: TypeError: Cannot read properties of undefined (reading 'trim') at TaskForm.tsx:47. Expected: an inline validation error. I've already tried wrapping line 47 in a null check; that just hides the symptom. Find the root cause and fix it cleanly."
Why it works: Bug + concrete error + line number + expected behavior + what was already attempted = an agent that lands the fix on the first try. Without those, you're asking it to debug from a description of the smell.
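What a root-cause fix might look like, as opposed to the symptom-hiding null check (the helper names here are hypothetical):

```ts
// setError and createTask stand in for the form's real helpers.
declare function setError(message: string): void;
declare function createTask(input: { title: string }): void;

// Validate the input up front instead of letting an undefined title
// reach .trim() deeper in the submit path.
function handleSubmit(values: { title?: string }) {
  const title = values.title?.trim() ?? "";
  if (title.length === 0) {
    setError("Title is required"); // the inline error the user expects
    return;
  }
  createTask({ title });
}
```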

Managing long sessions

Agents have a context budget. Past some length, the conversation starts to lose its earliest turns. The PRD you pasted in turn 3 is forgotten by turn 30. The convention you established at the start has drifted.

Tactics for keeping long sessions coherent:

  • Persist the spec to disk. Put your PRD, architecture decisions, and conventions in /docs in the repo. CLI agents will read them on demand. You won't have to re-paste.
  • Use a "memory file." A CLAUDE.md, .cursorrules, or AGENTS.md at the project root that the agent reads at session start. Put your conventions there: naming, framework versions, "always use X library, never Y."
  • Compact and restart. When a session gets long, summarize it ("here's what we built, here's what's left") and start fresh. The agent gets more lucid.
  • Commit frequently. If a session goes wrong, you can roll back to a known-good point — only possible if you've been committing small chunks.
  • Branch per feature. Same reason — keeps each session's changes isolated and revertable.
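What a memory file might contain. Every line here is an example convention, not a requirement; the versions echo ones used elsewhere in this level:

```
# CLAUDE.md (or AGENTS.md / .cursorrules): example contents
- Stack: Next.js 14.2 (app router), Prisma 5.8, Tailwind, Zustand.
- Naming: camelCase for variables and files; PascalCase for components.
- State: Zustand only; never add Redux or another state library.
- Never add a dependency without asking first.
- Run the type checker after every edit before declaring a task done.
```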

When to trust the agent — and when to step in

The agent is a brilliant junior engineer with no memory and infinite confidence. Trust accordingly.

Trust agents with:

  • Boilerplate (forms, CRUD endpoints, repetitive UI).
  • Unfamiliar APIs (agents read docs faster than you can).
  • Translations of clear specs into clean code.
  • Refactors with strong tests covering the affected code.
  • Glue code, scaffolding, naming conventions, formatting.

Step in for:

  • Architectural decisions. Where does state live? What's the API shape? These compound; a bad call here is expensive later.
  • Anything security-sensitive. Auth flows, permission checks, secret handling. Agents will write code that compiles and is dangerous.
  • Performance-sensitive paths. The agent doesn't know which endpoint is hit a million times per day.
  • Domain logic. Pricing rules, billing, weird edge cases specific to your business.
  • Anything you don't understand the output of. If you can't read the diff, don't merge it.

Handling hallucinated APIs

Agents sometimes invent functions, libraries, or API endpoints that don't exist. Confidently. The output looks right; the code won't run; the documentation it cites is fictional.

How to catch and prevent it:

  • Verify imports exist. If the agent imports @radix-ui/react-magic, check that package is real before npm install fails on it. (A quick check is shown after this list.)
  • Cross-check API method names against docs. Especially for libraries with multiple major versions — agents conflate v1 and v3 syntax routinely.
  • Pin versions in the prompt. "We're on Prisma 5.8 and Next.js 14.2" cuts down on mismatched syntax.
  • Run the code. The fastest hallucination filter is the type checker and the dev server. If it doesn't run, something is wrong.
  • When the agent insists, ask for a citation. "Show me the docs page where this method is documented." If it can't, the method is suspect.
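One quick existence check, using the document's deliberately fake package name:

```
npm view @radix-ui/react-magic version
# A 404 from the registry means the package does not exist;
# the import was hallucinated.
```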

PRD → tickets → prompts

The full pipeline for building a feature with an agent, from idea to merged code:

PRD (one-page doc) → Tickets (small + acceptance) → Prompt (+ context) → Diff (read it) → PR → Merge
Each arrow is a place where it can go wrong. The PRD is fuzzy → the tickets are wrong. The prompt has no context → the diff is junk. Your judgment is at every arrow.

The artifacts at each stage:

PRD (one page)
Problem · Approach · Acceptance · Non-goals. Lives in /docs in the repo. Anyone — including the agent — can read it.
Tickets (small)
Each one a deliverable: title, description, acceptance criteria, files likely to touch. The agent helps you generate these from the PRD. (An example ticket follows this list.)
Prompts (per ticket)
The actual ask. Goal + constraints + acceptance + out-of-scope. Often a sentence or two with the ticket as the source of truth.
Diff (the output)
The agent's change. You read it.
PR (the package)
The diff with description, screenshots, and self-review. You (or a teammate) approves and merges.
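An example ticket, reusing the complete-task spec from earlier (file paths are illustrative):

```
Title: Complete-task toggle on the board view
Description: Users can mark a task as complete from the board view.
Acceptance:
  - Clicking the checkbox toggles the status, optimistically.
  - If the API errors, the toggle reverts.
  - Completed cards show a strikethrough.
Files likely to touch: app/board/page.tsx, components/TaskCard.tsx
Out of scope: animations, unrelated refactors.
```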

See the Prompt Library for paired bad-vs-good examples at each stage of this pipeline.

Wrap-up

Jargon recap

Agent
An LLM with tools — can read files, run shells, edit code.
Context
Everything the agent sees when it answers. Project + conversation + what you paste.
Spec
Mini-document for a task: goal, constraints, acceptance, out-of-scope.
Diff
The change the agent produced. Always read.
Hallucination
Agent invents API/method/library that doesn't exist.
Memory file
AGENTS.md / CLAUDE.md / .cursorrules — agent reads at session start.
PRD → tickets → prompts
Pipeline from idea to shipped code.
LGTM
"Looks good to me." Approving a PR — say it only after you've actually read it.

You should now be able to

  • Pick the right surface for the job: CLI agent, IDE agent, or browser generator.
  • Scope a task small enough to describe, evaluate in minutes, and revert cleanly.
  • Write a spec with a goal, constraints, acceptance criteria, and an out-of-scope list.
  • Read an agent's diff and ask pointed questions about specific lines before merging.
  • Run the debug loop: reproduce, capture evidence, state expected vs. actual behavior.
  • Keep long sessions coherent with memory files, frequent commits, and fresh restarts.
  • Catch hallucinated packages and APIs before they reach your codebase.

Mini-exercise

Take a feature you want and write the full pipeline for it on paper: a 200-word PRD, three tickets that decompose it, and a prompt for the first ticket. Don't run any of it through an agent — just notice how much sharper the eventual prompts will be when the upstream artifacts exist.