Simon Wang

Originally published at itnext.io

I Made AI Study My Codebase Before Writing a Single Line

Cover photo by Vitaly Gariev on Unsplash

4 practices for building context that survives between sessions


Adding "because" to corrections helps AI apply principles within a session. But sessions end. Tomorrow, AI starts fresh.

What if AI already knew your project's patterns at the start of every session?

That's what this article is about: building context that persists.

Bootstrap: Have AI Read Your Code First

Before you write any instructions manually, let AI do the initial work.

An example prompt I might use, for illustration:

Read through this codebase. What patterns do you see that I'd probably correct you on if you got them wrong? Focus on:

  • Custom types we use instead of standard library types
  • Architecture boundaries (what shouldn't call what)
  • External service constraints (rate limits, costs, timeouts)
  • Conventions that appear consistently across files

Format each pattern as: "[Do X] because [Y]"

AI scans your code and surfaces patterns. Not everything it finds will be right. But it gives you a starting point, faster than writing from scratch.

A note on AI-generated "because" statements: The bootstrap prompt asks AI to hypothesize reasons for patterns it observes in your actual code files, not guess from generic training data. But these are still inferences, and some will be wrong. Treat the output as a first draft. When a "because" doesn't match reality, correct it. The exercise of reviewing and fixing these explanations often surfaces conventions you hadn't explicitly articulated.

Where this works best: Private codebases with conventions AI hasn't seen in training. For popular open-source projects, AI might already "know" the patterns from training data, making the bootstrap less revealing.

Set expectations: This isn't "set and forget." The instruction file is a living document. New patterns emerge, old ones become obsolete. Budget 10 minutes monthly to review and prune. The payoff is fewer repeated corrections, not zero corrections.

What comes back might look like (examples are illustrative, not from any real codebase):

```
- Use `process[Entity]Data` naming for transformation functions because the codebase follows this pattern consistently
- Keep domain logic in /src/domain because it's separated from infrastructure for testing
- Add retry with backoff on auth service calls because comments mention cold-start latency issues
- Use repository interfaces because the codebase follows dependency injection patterns
```

Review this. Keep what's accurate. Discard what's wrong or outdated. Edit what's almost right.

This is your initial instruction file. Save it somewhere your AI tool can access:

  • Cursor/Claude Code: Add to .cursorrules, CLAUDE.md, or project instructions (loads automatically each session)
  • ChatGPT: Save to Custom GPT instructions (loads automatically) or paste at session start (manual)
  • Claude Projects: Add to project knowledge (loads automatically)
  • Other tools: Keep in a file you can reference when starting new sessions

The trade-off: A detailed instruction file eats context window every session. Start lean (key patterns only) and expand as you identify what actually reduces corrections. If it grows past ~1000 words, consider splitting by module.
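
If you split, align the split with how your tool discovers rules. As one illustrative example, Cursor (at the time of writing) reads project rules from a `.cursor/rules/` directory, so a per-module layout might look like this (file names invented):

```
.cursor/rules/
  core.mdc        # always applied: naming, architecture boundaries
  payments.mdc    # scoped to the payments module
  mobile.mdc      # scoped to the offline-first mobile app
```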

The Instruction File

Every AI tool has somewhere to put persistent context:

  • Cursor: Project rules, .cursorrules file, or AGENTS.md
  • Claude: Projects with custom instructions
  • Copilot: Custom instructions in settings
  • ChatGPT: Custom instructions or memory

The specific mechanism matters less than having ONE place where project context lives and loads automatically.

What goes in this file:

Naming Conventions

```
- Use `process[Entity]Data` for transformation functions because this is
  the established pattern across all data processing modules
- Prefix internal API routes with `/internal/` because our gateway uses
  this to block external access
```
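
For illustration, here's what code following both conventions might look like. Everything concrete (Express, the `RawOrder` shape, the route path) is a hypothetical stand-in, not from any real project:

```
import express from "express";

// Transformation function following the process[Entity]Data convention.
interface RawOrder {
  id: string;
  total_cents: number;
}

interface Order {
  id: string;
  totalCents: number;
}

function processOrderData(raw: RawOrder): Order {
  return { id: raw.id, totalCents: raw.total_cents };
}

const app = express();

// Internal route carries the /internal/ prefix so the gateway can
// block external traffic to it.
app.get("/internal/orders/:id/raw", (req, res) => {
  res.json(processOrderData({ id: req.params.id, total_cents: 0 }));
});
```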

External Service Constraints

```
- Add retry with exponential backoff on auth service calls because the
  service has 2-3 second cold-start latency after idle periods
- Cache geocoding responses for 24 hours because the upstream API charges
  $0.005 per call and has 200ms latency
- Set 5-second timeouts on inventory checks because the warehouse API
  occasionally hangs
```
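
A sketch of what the retry and timeout constraints imply in code, assuming Node 18+ where `fetch` and `AbortSignal.timeout` are built in; the URL and retry budget are invented:

```
async function callAuthWithRetry(token: string): Promise<Response> {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // AbortSignal.timeout caps each attempt; the same mechanism works
      // for the 5-second inventory-check timeout.
      return await fetch("https://auth.internal/verify", {
        method: "POST",
        headers: { Authorization: `Bearer ${token}` },
        signal: AbortSignal.timeout(5000),
      });
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // Exponential backoff: 1s, 2s, ... gives the service room to
      // recover from its 2-3 second cold start.
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
  throw new Error("unreachable");
}
```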

Architecture Boundaries

```
- Don't add repository calls in domain functions because this layer gets
  reused in the offline-first mobile app where there's no database
- Keep the pricing calculator stateless because this service runs as a
  Lambda and state doesn't persist between invocations
```
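
A minimal sketch of what that boundary looks like in practice. The types and numbers are invented; the point is that the domain function receives plain data instead of reaching for a repository:

```
// Domain logic receives plain data, so the same function runs in the
// offline-first mobile app where there's no database to call.
interface LineItem {
  unitCents: number;
  quantity: number;
}

// Stateless by construction: everything arrives as arguments, nothing
// is kept between invocations (safe for Lambda).
export function calculateTotal(items: LineItem[], discountPct: number): number {
  const subtotal = items.reduce((sum, i) => sum + i.unitCents * i.quantity, 0);
  return Math.round(subtotal * (1 - discountPct / 100));
}

// The infrastructure layer does the fetching, then calls in:
//   const items = await orderRepo.lineItems(orderId);
//   const total = calculateTotal(items, customer.discountPct);
```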

Learned from Incidents

```
- Validate phone numbers with libphonenumber because we support international
  formats and need carrier data for SMS routing
- Log the full request before calling the payment gateway because we've
  lost debugging context when their API times out
```
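
A sketch of both rules, using the `libphonenumber-js` port of libphonenumber. Note that carrier-level data for SMS routing needs more than this validation step, and the gateway URL is a placeholder:

```
import { parsePhoneNumberFromString } from "libphonenumber-js";

// libphonenumber handles international formats; a homegrown regex won't.
// (Carrier data for SMS routing requires more than this validation.)
function isValidPhone(input: string): boolean {
  return parsePhoneNumberFromString(input)?.isValid() ?? false;
}

// Log the full request *before* the gateway call, so a timeout on
// their side doesn't erase the debugging context.
async function charge(payload: Record<string, unknown>): Promise<Response> {
  console.log("payment request:", JSON.stringify(payload));
  return fetch("https://gateway.example/charge", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```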

Notice: every pattern has "because." The reasoning is what makes these transferable, not just rules to follow blindly.

What This Applies To

This approach works with AI tools that support persistent instructions: Cursor rules, Claude Projects, Copilot custom instructions, and similar. You need somewhere to store context that loads automatically at session start.

For pure autocomplete without instruction file support, the bootstrap and capture practices still help you think clearly about patterns, even if you can't feed them back to the tool directly.

Capturing New Patterns

Your instruction file will grow over time. The source: corrections you make during work.

If you've adopted the "because" habit from the companion article, you're already generating good candidates. The signal that something belongs in the instruction file: you've corrected the same thing twice.

First correction: maybe a one-off. Second correction: it's a pattern. Save it.

The workflow:

  1. You correct AI with "because" during a session
  2. The correction helps within that session
  3. If you make the same correction again (same session or different), copy it to the instruction file
  4. Now it loads automatically in future sessions

Don't try to anticipate everything. Let the file grow from actual corrections. What you actually correct is more valuable than what you think you might correct.
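
For example, a correction made twice might land in the file as a single line in the same "[Do X] because [Y]" format (the rule itself is hypothetical):

```
- Return errors as values from queue workers instead of throwing because
  exceptions thrown inside the worker pool are swallowed by the queue runtime
```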

Per-Task Review: Ask Before Generating

Bootstrap creates your initial file. Capturing grows it over time. But there's a third practice that catches misunderstandings before they become code.

Before significant generation, ask AI what it found.

The prompt:

Before you generate this, tell me: what patterns in the relevant code would you follow? What constraints would you respect?

AI shows its working. You review before it generates, not after.

This catches:

  • Patterns AI noticed that you didn't intend to follow
  • Constraints AI missed that you expected it to catch
  • Conflicts between patterns that need resolution

Example:

You ask AI to add a new payment method. Before it generates, you ask what patterns it would follow.

AI responds: "I'd use the PaymentError wrapper, batch API calls, and follow the existing repository pattern in PaymentRepository."

You notice: "Actually, this new provider doesn't have rate limits. Don't batch. And we're trying to move away from the repository pattern for new code; use the port/adapter pattern instead."

You've prevented two wrong guesses before they became code to review and correct.

This is deliberate discovery: actively surfacing AI's assumptions before acting on them.
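
If the port/adapter distinction in that exchange is unfamiliar: the domain defines an interface (the port) named for what it needs, and each provider gets its own implementation (the adapter). A minimal sketch, with an invented provider and URL:

```
// Port: an interface the domain defines, named for what the domain needs.
interface PaymentPort {
  authorize(amountCents: number, token: string): Promise<string>;
}

// Adapter: one concrete provider's implementation of the port. Swapping
// providers means writing a new adapter, not touching domain code.
class ExamplePayAdapter implements PaymentPort {
  async authorize(amountCents: number, token: string): Promise<string> {
    const res = await fetch("https://pay.example/authorize", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ amount: amountCents, token }),
    });
    const body = (await res.json()) as { authorizationId: string };
    return body.authorizationId;
  }
}

// Domain code depends only on the port:
async function checkout(port: PaymentPort, totalCents: number, token: string) {
  return port.authorize(totalCents, token);
}
```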

Maintenance

The instruction file isn't precious. It evolves.

I review mine roughly monthly. Not on a rigid schedule, just when I notice it's been a while or when corrections start repeating.

What I look for:

  • Outdated patterns: We migrated off Stripe. Remove those constraints.
  • Conflicts: Two patterns that contradict each other. Resolve or clarify scope.
  • Bloat: Is the file getting long enough that AI might ignore parts? Split or prioritize.
  • Missing context: Patterns I've corrected repeatedly that aren't in the file yet.

The goal isn't a perfect document. It's a living reference that makes AI more useful over time.

Example: My Instruction File Structure

Here's a simplified version of how I organize mine:

```
# Project Context for AI

## About This Project
Brief description: what it does, main technologies, team conventions.

## Naming Conventions
- Use `process[Entity]Data` for transformation functions because...
- Prefix internal routes with `/internal/` because...

## External Services
- Auth service: retry with backoff, cold-start latency
- Geocoding API: cache 24h, $0.005/call
- Warehouse API: 5s timeout, occasionally hangs

## Architecture Rules
- Domain layer: no infrastructure dependencies
- Services: stateless (Lambda deployment)
- New code: port/adapter pattern, not repository

## Domain Requirements
- Validate phones with libphonenumber (international + carrier)
- Log before external calls (debugging context)

## Current Migrations (Temporary)
- Moving from repository pattern to port/adapter
- Old code uses X, new code should use Y

## Still Figuring Out
- Best way to handle cross-service transactions
- Whether to split this into multiple services
```

The "Still Figuring Out" section is important. It tells AI where you don't have answers yet, so it doesn't confidently apply a pattern that you're still questioning.

The System

Four practices that build on each other:

  1. Bootstrap: AI reads codebase, creates initial instruction file
  2. Capture: Save "because" corrections that repeat
  3. Review: Ask AI what patterns it found before generating
  4. Maintain: Monthly light-touch review

You don't need all four to start. Bootstrap once, then capture as you work. Add per-task review for complex generations. Maintain when the file feels stale.

This is my system. Yours will look different. The principles transfer: give AI the context it needs, and keep that context current.
