I've organised the Claude Code commands, including some hidden ones.

灯里/iku

Greetings from the island nation of Japan.

In an era where we outsource our cognitive heavy lifting to silicon, keeping up with the relentless updates of Claude Code feels remarkably like trying to sip from a firehose while apologising for the splashing. We live in a world where "staying current" has a half-life shorter than a cup of artisanal matcha, and frankly, Anthropic’s pace of shipping features—some whispered in the dark corners of Twitter, others tucked away like Easter eggs for the desperate—is enough to make any developer consider a quiet life of organic rice farming.

This article is my personal attempt to organize the digital clutter before I lose the thread entirely; a curated map of the essential commands, the "agentic" chaos of sub-tasks, and the hidden gems that the official documentation forgot to highlight. Think of it as a survival guide for those of us who are tired of being roasted by our own usage reports. By the end of this read, you’ll hopefully navigate these AI waters with a bit more grace, or at least learn how to use /rewind to erase the evidence of your 3:00 AM coding hallucinations.

Introduction

Claude Code has quite a few features not covered in the official documentation, plus commands you'd never use unless someone told you about them.
There's honestly just too much — keeping up with the official docs is a real struggle, and lately I've been drowning in it all.

This article compiles everything from basic commands to recently added features and tips for running Agents, all gathered from hands-on use.
I needed to organize this for myself… I was losing track of everything.

And even so, I'm sure I've missed things — just keeping up with Claude, or rather Anthropic, is a full-time job…

I started out adding screenshots for everything, but there were just too many, so please run any commands you want to try in your own Claude. (Sorry for being lazy.)

:::message
The information in this article is current as of February 2026.
Claude Code is under active development, so please check the official documentation for the latest information.
Also, beyond the official docs, the dev team will casually drop "oh yeah, that exists" or ship things without mentioning them in the release notes, so I highly recommend following them on Twitter. Seriously.
:::

15 Essential Commands

Here's a list of commonly used commands. Some are absolute basics, I know.

| Command | Description | Usage Example | Tips / Best Practices / Notes |
| --- | --- | --- | --- |
| /rewind | Rewind conversation or code changes | Esc+Esc to show menu; choose to rewind code only or conversation only | Auto-checkpoints (saved on every prompt) make this great for experimental edits. Saves tokens in long sessions. Beginners should use "rewind code only" liberally for safe experimentation. |
| /insights | Generate an HTML report analyzing your usage patterns | /insights saves the report to ~/.claude/usage-data/report.html | Recent feature that analyzes your coding habits in almost roast-level detail. The report suggests Skills and Hooks to optimize your workflow. Run monthly. Seriously amazing: you can see exactly how to improve based on your development style. |
| /help | Show list of available commands | /help | Essential for beginners and a starting point for discovering hidden features. Fair warning: it hits you with a wall of text. |
| /context | Display context usage (token consumption visualization) | /context | Prevents token overflow in long conversations. Combine with /compact to keep context lean. I tend to throw a lot of context at it, so I use this to find the sweet spot between the AI and me (the human). |
| /compact | Summarize the conversation to free up context | /compact or /compact focus on errors | Saves tokens. Specifying an error focus improves debugging efficiency. |
| /init | Initialize a new project (creates CLAUDE.md, etc.) | /init | Use at project start; combine with custom templates. |
| /usage | Show plan usage and rate limit status | /usage | For subscription plan users; monitors limits on the free plan, though I don't see many people using the free plan. |
| /clear | Clear conversation | /clear | Reset context for new tasks. I use this fairly often with a "let me just clear this real quick". |
| /agents | Sub-agent management | /agents | Parallel processing for complex tasks. The hot topic right now. Burned through my tokens; still feels like a luxury feature at this point. |
| /install-github-app | Install the GitHub App (automate PR reviews) | /install-github-app | Integrate into CI/CD workflows and boost productivity with automated PR comments. I recently set this up and have only tried it on private repos, but it looks promising. Haven't tried it for company use yet; it might strip away some of the human touch. |
| /cost | Show token usage statistics | /cost | Track costs per session: /usage is for your overall plan, while this is per-session. Claude tends to be a big eater compared to others because she's smart, so I keep an eye on this. |
| /export | Export current conversation to file or clipboard | /export conversation.md | For saving and sharing useful exchanges. Not used often, but good to know. |
| /review | Request code review | /review | Self-review before PRs, for when I'm paranoid about whether my code is garbage. I'm anxious by nature, so I do this a lot. Lately I've been considering having another model review too, alongside Claude Code. |
| /pr_comments | Display PR comments | /pr_comments | Requires GitHub integration. As I wrote in my previous article, GitHub and I are basically inseparable at this point. |
| /doctor | Environment diagnostics (detect dependency and config issues) | /doctor | Like a human health checkup: the first stop for troubleshooting. |

Notable Features

/rewind - Time Travel Debugging

/rewind was recently enhanced to allow rewinding conversation and code separately.
I tend to say unnecessary things that make sessions drag on, so this really helps. Sorry for always being a burden, Claude.

Key features:

  • Auto-checkpoints (automatically saved on every prompt)
  • Esc+Esc to show the menu
  • Choose to rewind code only / conversation only

Use case:

# Try an experimental refactoring
→ Didn't work out
→ Esc+Esc → "Rewind code only"
→ Code reverts while conversation history is preserved

Tips:

  • Use with parallel sessions (multiple terminals) for versioning
  • Also effective for saving tokens in long sessions (personally very grateful for this)


/insights - Analyze Your Coding Habits

Reads your past month of usage history and compiles it into an HTML report.
Incredibly detailed. I can't share mine (too much private info, too many accidental reveals), but please try it at least once.
"Let's build the ultimate Claude environment together" — you'll feel that warm fuzzy feeling, while also being slightly terrified by how good this thing is.

What it generates:

  • Command usage frequency
  • Common patterns
  • Custom command recommendations
  • Skills suggestions

Usage:

/insights
# Output to ~/.claude/usage-data/report.html

Tips:

  • Run monthly to review your workflow
  • The report suggests Skills and Hooks
  • Analyzes your coding habits in almost roast-level detail

:::message
For a deeper look at how it works, this article is a great reference.
It's in English and an excellent summary.
Deep Dive: How Claude Code's /insights Command Works
:::

Hidden Commands & Handy Features

Plan Mode (Shift+Tab) - Improve Success Rates on Large Tasks

Instead of jumping straight into writing code, you can have Claude analyze your codebase in read-only mode first, then decide on an implementation approach.
This is considered fairly basic, but I'm including it anyway. "Just plan first" — even the official team says so.
I personally want to make this a habit, and being the cautious worrier I am, I tend to use Plan Mode quite a lot.

How to activate:

  • Press Shift+Tab to cycle modes (Normal → Auto-Accept → Plan)
  • Or instruct: "Let's plan this first."
  • You can also use the /plan command directly

:::message alert
Windows note: Since Claude Code v2.1.3, there's a reported bug where Shift+Tab doesn't show Plan Mode on Windows (Issue #17344). Use the /plan command as a workaround. Or just tell Claude Code "let's plan."
:::

Use case:

# Before a major refactoring or architecture change
Switch to Plan Mode with Shift+Tab
→ Analyze codebase in read-only mode
→ Generate implementation strategy report
→ Begin implementation after approval

Benefits:

  • Dramatically improves first-try success rate
  • Reduces wasted token consumption
  • Provides clear visibility on complex tasks

/statusline - Monitor Context Usage in Real-Time

Displays context usage in real-time.
I use this to stay on top of things for compacting. Too much context makes LLMs perform worse, so this is something humans can actively manage.

/statusline

Use cases:

  • Token monitoring
  • Combine with /compact to prevent token overflow

/resume - Resume Sessions

Load a past conversation and continue where you left off.

# Resume the latest session
claude --resume

# Select from session picker
/resume

# Resume a specific session by ID/name
claude --resume auth-refactor

Handy uses:

  • Continue yesterday's work
  • Switch between multiple projects

:::message
Want to find a session from a specific date? There's no built-in date search command, but session data is stored under ~/.claude/projects/, so you can ask in natural language: "Find my sessions from December 2024" and it'll search for you. If you use this often, you could create a custom command at ~/.claude/commands/history.md. Searching by specific date might be rare, but "I think I had a conversation around some month…" does happen.
:::
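
For reference, a minimal sketch of what that history.md could look like (the filename comes from the tip above; the wording is just illustrative):

# ~/.claude/commands/history.md
Search my session data under ~/.claude/projects/ for conversations
matching: $ARGUMENTS
List the matching sessions with their dates and a one-line summary each.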

Launch Option: -p Mode

A high-speed, non-interactive print mode: Claude answers a single prompt and exits, with no conversation or explanations.
I've been thinking lately that power-user engineers might prefer this.
I'm on the weaker side, so I plan a lot and talk to Claude Code constantly.

# Launch in print mode (non-interactive)
claude -p "explain this function"

# Combine with pipes
cat logs.txt | claude -p "explain"

Best for:

  • Automation from scripts
  • Quick questions
  • CI/CD pipeline integration
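
As a sketch of the CI angle (assuming the pipeline can run the claude CLI; the prompt and diff range are just illustrative):

# Hypothetical pre-merge check: pipe the branch diff through print mode
git diff main...HEAD | claude -p "Review this diff and flag anything risky"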

Keyboard Shortcuts

Memorizing these speeds up your workflow.
I'm a Windows user, so Mac users should substitute Command key etc. as appropriate.
Recently some shortcuts have started conflicting with each other, so consult your own environment setup.

| Shortcut | Function | Notes |
| --- | --- | --- |
| Esc (once) | Stop generation | Stop a runaway response immediately |
| Esc (twice) | Show /rewind menu | Rewind code or conversation |
| Shift+Tab | Cycle modes | Normal → Auto-Accept → Plan |
| Ctrl+G | Open editor | Handy for multi-line input |
| Ctrl+T | Toggle task list | Check progress |
| Ctrl+R | Search command history | Interactive search through past inputs |
| Ctrl+V | Paste image | On Mac too: Ctrl+V, not Cmd+V |
| Alt+P (Win/Linux) | Switch model | Change model while typing a prompt |

Tips:

  • Apparently you can combine voice input (Mac: fn+fn) with Esc for hands-free operation. An Anthropic team member mentioned this. Lucky…
  • Run /terminal-setup once to enable Shift+Enter for multi-line input

Agents (Avoiding Total Chaos)

Agents are convenient, but having too many will drown you in information.
There's also the question of how much to delegate to AI — I'm personally still a bit hesitant to hand everything over, so I'm taking it gradually.
Anthropic is aware of this and improvements are ongoing.
We're all figuring out the right balance that's kind to both humans and AI.

/agents - Sub-Agent Management Basics

You can delegate tasks across multiple sub-agents.

/agents
# Menu appears

# Create a custom agent
"Spawn researcher agent for docs"

My current best practices:

  1. Start small: Begin with 2-3 agents (more = information overload, and still a bit scary)
  2. Keep parallel runs to 3-5: More than that leads to chaos (fun though)
  3. Write detailed task briefs: Clearly specify WHY/HOW
  4. Use tmux for session management: Organize multiple agents
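
On point 4, a minimal tmux sketch (session and window names are just illustrative):

# One window per agent, all in one named session
tmux new-session -d -s claude-work 'claude'
tmux new-window -t claude-work -n reviewer 'claude'
tmux attach -t claude-work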

For those with deep pockets who want large-scale orchestration, check out Oshio-san's viral article for a general idea of the sub-agent concept (it's a genuinely fun read):

https://zenn.dev/shio_shoppaize/articles/5fee11d03a11a1

Agent Teams - Autonomous Collaboration Mode (Research Preview)

:::message alert
Agent Teams is an experimental feature. You need to set the environment variable CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to use it.
:::

In Team mode, a lead agent delegates work to multiple teammates who collaborate autonomously.
I found it kind of funny how they just poof disband when done. Very professional.
No lingering around — "alright team, we're done here."
You can enable it from settings.json.

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
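
To just try it for a single session, exporting the variable in your shell before launching should work too (an assumption based on how Claude Code's other environment flags behave):

export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude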

Delegate Mode:

  • "Delegate Mode" is added to the Shift+Tab cycle
  • The lead agent only coordinates (cannot edit code)
  • Focuses on task management, team communication, and review

Features:

  • Shared task lists across teammates
  • Direct messaging for mutual coordination
  • Unlike sub-agents, each operates as a fully independent Claude Code instance

Sub-Agents

Launch dedicated sub-agents from the main agent to delegate specific tasks.
If you pick the wrong model for this, everyone ends up on Opus and costs skyrocket. Made me wish I were rich.
The basic approach is to use Opus as the commander and Sonnet for the others, adjusting based on the task.

# Define custom sub-agents via CLI flags
claude --agents '{"reviewer":{"description":"Reviews code","prompt":"You are a code reviewer"}}'

Use cases:

  • Dedicated test agent
  • Dedicated documentation generator
  • Dedicated code reviewer
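
Besides the CLI flag, sub-agents can also be defined as markdown files, e.g. .claude/agents/test-runner.md. A minimal sketch (the frontmatter fields reflect my reading of the docs, so double-check the official reference):

---
name: test-runner
description: Runs the test suite and summarizes failures
model: sonnet
---
You are a dedicated test runner. Run the project's tests, then report
each failure with file/line references and a suggested fix.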

Differences between Sub-Agents and Agent Teams:

| Aspect | Sub-Agents | Agent Teams |
| --- | --- | --- |
| Independence | Runs within parent session | Fully independent instances |
| Communication | Returns results to parent only | Direct messaging between teammates |
| Stability | Stable release | Research Preview (experimental) |

/tasks - Task List Management

A task list that persists even when you close a session. Added in v2.1.16 (January 2026).
Tasks don't disappear even if a human accidentally closes the session.
I've been that idiot who was messing around with Claude Code late at night and closed the session. Lifesaver.

# Toggle task list display
Ctrl+T

# Create tasks with natural language
"Add authentication feature. Break it down into tasks by dependency"

Features:

  • Persisted as files in ~/.claude/tasks/
  • Carries over across sessions
  • Shareable across multiple sessions via the CLAUDE_CODE_TASK_LIST_ID environment variable (see the sketch below)
  • Preserved even after context compression
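
A quick sketch of that shared-task-list setup (the ID value is arbitrary; any string both sessions agree on should do):

# Run in each terminal before launching, using the same ID
export CLAUDE_CODE_TASK_LIST_ID=auth-feature
claude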

Benefits:

  • Prevents forgetting things in complex projects
  • An evolution of the traditional TODO list

Chaos Prevention Tips

Some of these are obvious, but I want to write them down for my own sanity.

Best practices for avoiding chaos:

  1. Summarize context with /compact

    /compact Prioritize keeping the error handling patterns
  2. Document team rules in CLAUDE.md

    • Maintain consistency across agents
    • Clarify role assignments
  3. Use MCP Tool Search for lazy-loading tools

    • Save context
    • Load only the tools you need
  4. Syntax highlighting

    • Change themes with /theme
    • Improves review readability

Output Styles

Use /output-style to change Claude Code's output style.
There are various styles. I see a lot of people tweaking this for fun or motivation. Makes sense. I get it.

Main Styles

| Style | Characteristics | Best For |
| --- | --- | --- |
| Default | Concise, speed-focused, code only | Maximum work efficiency |
| Explanatory | Explains design decisions and trade-offs while working | Understanding code intent |
| Learning | Explains reasoning behind changes, has user write small code snippets | Learning new technologies |

Configuration

# Change output style
/output-style

# Undocumented feature: set up output modes
@agent-output-mode-setup
# → Generates 4 custom modes in ~/.claude/output-modes/:
#    Concise, Educational, Code Reviewer, Rapid Prototyping

Customization

Open the Settings screen with /config to modify various settings.

Tips:

  • Output styles can be applied to Agents too
  • Custom output styles can be created

AskUserQuestion - Interactive Question Feature

When Claude is unsure about a decision, it presents options for you to choose from.
This pops up when I give unclear instructions — I feel a bit guilty but gratefully select an option… though honestly I usually end up picking "other" and typing whatever I want.

Features:

  • Improved usability with Agents integration
  • Also used for permission confirmations like file deletion
  • Useful for turning vague instructions into specific ones

Example:

"Implement feature X"
→ Auto-popup when unclear points arise
→ Select by entering a number in CLI

Auto-Accept Mode

Switch to Auto-Accept Mode with Shift+Tab to auto-approve permission confirmations.
I'm still a little nervous about this, and while the clicking is tedious, I generally switch between manual approval and Auto depending on the situation.

Caution:

  • Use with security awareness
  • Difference from --dangerously-skip-permissions: Auto-Accept can be toggled during a session

Prompt Optimization Techniques

The way you write prompts changes output quality. I almost felt like I didn't need to include this, but just in case.
Here are some useful patterns.

Self-Review

"Grill me on changes"

Gets you a tough code review.
By the way, "grill" is slang for "interrogate" in English, so you might not want to use it too casually.

Deep Thinking

"Ultra think"

Gets Claude to think more deeply before responding.
This has been used with ChatGPT and others for a while now.

Task Decomposition

"Step by step"

Progresses through complex tasks in stages.
I also use this when studying — shamelessly asking "explain it to me this way."

Hallucination Prevention

Encourages careful responses in conservative mode.

"Be conservative and verify before making changes"

That said, hallucinations still happen; LLMs will be LLMs.
And that's fine — it keeps the human side vigilant too, which is healthy. Big heart energy.

Custom Slash Commands

Handles repetitive tasks with a single command.
Personally, I think this is the tastiest part of Claude Code.
Being free from prompt management? That's what makes me happiest.
Thank you, Anthropic — there are various things to appreciate, but personally, being able to customize everything (Skills included) is just wonderful.

Basic Setup

Global commands:

~/.claude/commands/unit-test.md

Project-level:

.claude/commands/deploy.md

Good Usage Examples

/unit-test - Auto-Generate Tests

# unit-test.md
Generate comprehensive unit tests for $ARGUMENTS.
Include edge cases and error handling.

/fix-bugs - Automated Bug Fixing

# fix-bugs.md
Analyze $ARGUMENTS for bugs and fix them.
Explain what was wrong and how you fixed it.

/deploy - Deployment Workflow

# deploy.md
1. Run tests
2. Build production bundle
3. Deploy to $ARGUMENTS environment
4. Verify deployment

Using Arguments

# Receive arguments with $ARGUMENTS ($0, $1 also work)
/unit-test src/utils.js

Upgrading to Skills

Upgrading custom commands to Skills lets you:

  • Add sub-files (reference documents)
  • Build more complex workflows
  • Use disable-model-invocation: true so they only run when explicitly invoked by the user
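
As a sketch, the earlier /unit-test command upgraded to a Skill might become ~/.claude/skills/unit-test/SKILL.md plus a references/ folder (the layout reflects my understanding of Skills; treat the details as illustrative):

---
name: unit-test
description: Generate comprehensive unit tests following project conventions
disable-model-invocation: true
---
Generate comprehensive unit tests for the target file.
Follow the house style described in references/testing-conventions.md.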

Session Handover Tips

When context is about to overflow, or when you want to reliably carry over to the next session in a long-term project — there are several approaches.
I'm still figuring out which style works best for me.
Also on the fence about whether to make these into Skills.

Method 1: Save conversation with /export

/export handover.md
# Current conversation is output to file
# In the next session: "Read handover.md and continue"

Method 2: Create a custom command

In international communities, the pattern of creating a "handover" command that structures and saves a session summary is gaining traction.

# ~/.claude/commands/handover.md
Create a handover document for the current session:
- Summary of work done
- Decisions made
- Incomplete tasks
- Pitfalls encountered and lessons learned
Save as HANDOVER.md.

Method 3: /teleport to move to a Web session

# Send from local to a claude.ai Web session
& task description

# Pull a Web session back to local
/teleport

Comparison with Memory:

| Aspect | Memory (CLAUDE.md) | /export + Custom Command |
| --- | --- | --- |
| Behavior | Automatically referenced | Explicitly saved and loaded |
| Format | CLAUDE.md file | Any file |
| Best for | Project-wide rules and context | Specific session handovers |

Potential Tips Worth Noting

  1. Turn repetitive tasks into commands

    • Examples: Git commits, running tests, builds (see the sketch after this list)
  2. Create commands suggested by /insights

    • Optimized based on your usage patterns
  3. Separate project-level and global commands

    • Project-specific → .claude/commands/
    • General-purpose → ~/.claude/commands/
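
For example, point 1 as a command file (a sketch; adjust the steps to your own workflow):

# ~/.claude/commands/commit.md
1. Run git status and git diff to see what changed
2. Draft a conventional commit message
3. Show it to me and wait for confirmation before committing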


Hidden Features & Advanced Usage

Artifacts - Interactive Code Generation

This is a feature of Claude (web and desktop), but it's been extended in Claude Code.
Well, it was originally a Claude Code thing, technically.
I think this area is more about the distinction between engineers and non-engineers.

web-artifacts-builder skill:

  • Generates HTML/JS/CSS as files
  • Live editing possible
  • For interactive tools like "create a budget calculator"

"Create a budget calculator with live updates"
→ web-artifacts-builder skill activates
→ HTML/JS/CSS files are generated

Checkpointing

An automatic backup feature used with /rewind.
This is seriously a lifesaver. Save points are a must.

Features:

  • Can rewind both code and conversation
  • Auto-creates checkpoints
  • Functions as a safety net


! for Shell Injection

Lets you fetch live data within skills.
Subtle but appreciated.

# Example: Fetch GitHub PR diff live
!gh pr diff

# Example: Check current Git status
!git status

Use cases:

  • Fetching live data
  • Integration with external tools
  • Reflecting dynamic information
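
One way this might combine with a custom command, following the same syntax as above (check the docs for the exact escaping rules inside command files):

# ~/.claude/commands/pr-summary.md
Here is the live diff for the current PR:
!gh pr diff
Summarize the changes and flag anything risky.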

Context Management

Auto-Compact (Automatic Context Compression)

When you use about 95% of the context window, it automatically summarizes and compresses the conversation (auto-compact).
Essential information is preserved while letting you continue the session seamlessly.
The web version has this too. I trigger it fairly often so I always feel like "s-sorry… the conversation got long again…"

# Manual compact (you can specify what to preserve)
/compact Keep the error handling patterns

# Check current context usage
/context

Tips:

  • Since v2.0.64, compacting completes instantly (Claude Code feels pretty fast. The web version seems to work harder at it)
  • Manual /compact lets you specify what to preserve via instructions
  • Long sessions are managed automatically, so basically just let it handle things

MAX_THINKING_TOKENS

Expand thinking tokens to improve reasoning capability.
The trade-off with your wallet. Naturally.

MAX_THINKING_TOKENS=10000
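
To make it persistent rather than per-shell, the env block in settings.json (the same mechanism shown above for the Agent Teams flag) should work as well:

{
  "env": {
    "MAX_THINKING_TOKENS": "10000"
  }
}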

Trade-offs:

  • Reasoning capability ↑
  • Cost ↑

When to use:

  • Complex problems: Set higher
  • Simple tasks: Default is sufficient

Summary

The 3 Things to Learn First

  1. /help — Starting point for everything
  2. Esc+Esc (/rewind) — Your safety net
  3. /context — Token monitoring

Recommended Commands by Scenario

Debugging & Fixing:

  • /doctor → Environment diagnostics
  • Esc → Stop runaway responses
  • /rewind → Undo changes

Large-Scale Tasks:

  • Shift+Tab (Plan Mode) → Strategic planning
  • /agents → Task delegation
  • /tasks → Persistent management (Ctrl+T to toggle)

Token Management:

  • /compact [instructions] → Manual summary (auto-compact also available)
  • /context → Check usage
  • /clear → Reset

Learning:

  • /output-style → Switch to Learning mode
  • "Grill me on changes" → Tough review
  • "Step by step" → Step-by-step explanation

Efficiency:

  • Create custom slash commands
  • Monthly review with /insights

Team Development:

  • /export + custom handover command → Session handover
  • Agent Teams → Collaborative work (experimental)
  • CLAUDE.md → Share rules

Token Management Checklist

  1. Check regularly with /context
  2. Let auto-compact handle long sessions (manual: /compact)
  3. Use /clear when switching tasks
  4. Use /rewind to remove unnecessary conversation
  5. Save with /export before starting a new session

Rules for Agents

  1. Start with 2-3
  2. Clarify rules in CLAUDE.md
  3. Maximum 5 running in parallel
  4. Monitor constantly with /statusline
  5. Use /compact when things get chaotic

Closing Thoughts

Claude Code gets updated so fast that this article's content will eventually become outdated.
Seriously, it's too fast. Things change while you're at work or sleeping — it's almost funny.
Please also check the official documentation.

Running /insights monthly reveals habits and improvement areas you wouldn't notice on your own.
Start there. Seriously, it's that good.


Top comments (13)

Ned C

the /insights command is genuinely underrated. i ran it after a month of use and realized i was manually doing things that could have been custom commands the whole time. the report basically told me "you type this exact prompt 4x a day, just make a slash command." felt called out.

the sub-agents vs agent teams comparison table is helpful too. i've been using sub-agents for dedicated test runners and it works well, but the token cost catches up fast if you're not watching /cost between runs. the tip about using Opus as commander + Sonnet for workers is the right call for keeping costs sane.

灯里/iku

@nedcodes
Hi Ned C! Thanks so much for the comment!!

I've been really overwhelmed by the amount of Anthropic's official docs and all the recent info. There are still so many commands I didn't know about. It's a good reminder to go back to the basics, read the primary sources carefully, and use them more often... yeah, I feel that lol.

I'm glad the comparison table was helpful! That makes me really happy!

Claude Code is excellent, but the costs are definitely something to watch out for, right...!

Using Opus as the commander and Sonnet for workers is a great tip, but I'm still a bit worried about relying only on Claude Code. So I've been experimenting with using OpenAI's Codex for the implementation parts. It might work well as a hybrid approach. (The downside is my AI subscriptions keep piling up... haha)

Ned C

the hybrid approach with CodeX for implementation is interesting. do you treat each tool as its own isolated step, or is there some context handoff between them?

and yeah, the subscription creep is real. i started tracking monthly AI spend separately just to stay honest with myself about it.

灯里/iku

@nedcodes
Great question! It's not fully automated yet, but it's more than just isolated steps.

Here's what I've been experimenting with:

  1. Claude plans, Codex implements
    I use Claude (Opus) for high-level design and planning, then hand that off to Codex for implementation. Practically, I run both side by side using tmux — split panes in VS Code's terminal, Claude Code on the left, Codex CLI on the right, same project directory. So context lives in the shared codebase and plan files, no manual copy-pasting needed.

  2. Parallel runs for cross-review
    Same tmux setup — Claude Code as the main driver (e.g., large refactoring) and Codex as a second opinion running in parallel. They catch different kinds of bugs, which is the whole point.

  3. Orchestration via Agent SDK (exploring)
    The dream is using Agents SDK to orchestrate Claude as the planner and Codex as the coder automatically, with context passed via SDK. Still early stage for me, but the potential is exciting.

Honestly, part of it is also a philosophical thing — I don't want to depend too heavily on a single company's model. Same as in the real world, right? One person can't solve everything. Getting a "second perspective" from a different model catches blind spots. It's like applying "the right person for the right job," but for AI models lol.

Personality-wise too, Claude is more chatty and has great vibes for conversation, while Codex is more of a serious worker type. Both have their strengths!

And yeah, the subscription creep is painful... tracking it separately is smart, I should do that too haha.

Here's a Medium article that covers the Claude Code vs Codex comparison well if you're interested:
blog.ivan.digital/claude-code-vs-o...

Ned C

the tmux split-pane setup is practical. i've been thinking about similar workflows where you keep both agents in the same project directory and let the shared codebase be the communication layer instead of trying to pipe context between them programmatically. the Agents SDK orchestration angle is worth exploring, especially if you can define clear boundaries for what each model handles. curious whether you've hit cases where Claude and Codex disagree on approach and how you resolve that. also good call on not depending on a single provider, it's something i think about more now.

灯里/iku

@nedcodes
thanks for the thoughtful comment! the "shared codebase as communication layer" framing is exactly how i think about it too. no fancy piping, just let the filesystem be the interface.

on the Claude vs Codex disagreement question, great timing, i've been meaning to write this up lol

honestly, it's less about "who's right" and more about "whose perspective fills the gap." the models reflect their makers' philosophies more than you'd expect.

here's what i've noticed so far (still experimental, grain of salt etc):

  1. top-down vs bottom-up
    Codex tends to think architecturally first. flags structural issues early and pushes for refactoring before you write more code. Claude jumps in and starts building fast, which feels productive until you hit a wall of edge cases you didn't plan for. for bigger features, Codex's "slow down and think" approach usually wins out.

  2. over-engineering vs shortcuts
    Claude's failure mode is over-abstraction. too many layers, too much modularity for what you actually need. Codex goes the opposite way, cuts corners, skips edge cases. so i literally cross-review: feed Claude's output to Codex and vice versa. they catch each other's blind spots surprisingly well.

  3. greenfield vs precision work
    for creative/new features, Claude moves fast and generates ideas. but Codex sometimes ships a more "complete" result out of the box (tried making a 2D platformer with both. Codex auto-generated sprite cleanup, Claude didn't even build the floor lol). for infra or anything requiring precision, both struggle, but Codex grinds through test-fix cycles longer before giving up.

  4. planning style
    Claude gives you clean markdown with actionable snippets. Codex generates strict XML-style architecture docs, thorough but harder to read. personally i prefer Claude's style for day-to-day work, it just feels more... human to work with.

  5. so how do i resolve disagreements?
    you don't pick a winner. you treat it like a code review between two engineers with different backgrounds. not depending on a single provider isn't just a resilience thing, it genuinely produces better output when you let them challenge each other.

  6. hybrid workflow
    that's exactly why a hybrid approach is working well for me right now. something like: Claude for planning → Codex for review & implementation → Claude for final check. each model plays to its strengths in sequence.

  7. main brain vs sub brain?
    comes down to your preference and what you're trying to build. no universal right answer. i switch the lead role depending on the task, and that flexibility is part of the fun.

that said, this is my personal answer as someone who can freely pick tools. in a business context? often you don't get to choose. company policy, compliance, contracts, etc. can lock you into one provider. so the practical answer is... it depends lol

  8. work in progress
    i've been intentionally throwing the same tasks at both models to compare, and there's still a lot of testing to do. every model update from each company can shift the balance. what's true today might not hold in a few months.

still exploring, still learning. this turned out longer than expected lol. might be worth its own article at some point 😄

Ned C

this is a really solid breakdown. the cross-review pattern where you feed one model's output to the other is something i want to try more deliberately. the personality difference you mention (Claude chatty, Codex serious worker) maps to what i've seen too. curious if you've hit cases where their architectural disagreements were both wrong, or does one usually end up closer to the right call?

灯里/iku

@nedcodes
That's a rather intriguing question!

Short answer: one usually ends up closer to the right call. But there are some fun failure patterns.

  1. Both wrong in the same direction (too optimistic)

I design AI-integrated workflows for organizations. Every company has a different daily stack: Slack for chat, SharePoint for docs, Salesforce for CRM, Google Drive for storage... often a beautiful mess with no clean integration. Very common in Japan, probably everywhere though lol.

When I ask both Claude and Codex to plan architectures for these environments, they both propose elegant solutions that assume humans will actually follow the new workflow. They seriously underestimate how lazy people are. Both end up too idealistic about adoption.

  2. Both wrong in complementary ways (this one's sneaky)

Their different mistakes don't cancel out.

  • Distributed systems: Codex piles on unnecessary config and endpoints (bloat), Claude proposes fast implementations that ignore race conditions. Both "look like they work" but are fundamentally broken.
  • Large-scale refactoring: Claude suggests beautifully modular code that doesn't scale, Codex produces conservative rewrites of outdated patterns. Both miss the real bottleneck (e.g., a denormalized DB schema that crashes in production).

It's not "same mistake twice." It's "different mistakes that don't offset each other." Root cause? Both hallucinate with confidence.

  3. What I actually do about it

  • Feed both models detailed context about the client's existing tools and daily habits. Anchors them in reality instead of theory.
  • Split PDCA: Plan and Act stay with the human, Do and Check get delegated to AI. Human stays the architect and final judge.
  • Same task to both models, let them "debate," then you make the call. This alone cuts failure rates dramatically. This is why I side-eye the "automate everything!" hype a bit. The real value of cross-review isn't picking the winner. It's that disagreement itself is a signal: when they diverge, think harder, don't just pick one.

It all boils down to the basics: context is key.

Ned C

the "both wrong in complementary ways" failure mode is the one i hadn't thought through. i was assuming cross-review works because disagreement is a signal, but if they're both confidently wrong in different directions you just end up with two plausible-looking bad answers instead of one. that's way harder to catch than one obviously wrong output. your PDCA split where the human stays as architect makes more sense for that scenario than trying to automate the tiebreaker

灯里/iku

@nedcodes
haha yeah exactly!
When both outputs look plausible, you actually let your guard down. that's the sneaky part.

and to be clear, i'm not against automation at all! i just think LLMs genuinely can't tell when they're wrong, so someone's gotta cover that blind spot. it's less about control and more about... caring for the process, i guess?

we're still super early in figuring out how humans and AI work together. i'd love to keep exploring what that looks like with people who actually think about it like you do;)

Ned C

the tricky part is that the failure modes are different from what we're used to, so we don't even have good instincts for when to double check yet. i think that's what makes the "both wrong in complementary ways" thing so dangerous, you can't just pattern match your way out of it

灯里/iku

@nedcodes
i tried to reply in the thread but dev.to said "you two talk too much" lol.
so continuing here!

good morning! glad to be part of your coffee routine now haha.
it's 11pm here so we're on opposite ends of the day.

and yeah, that's exactly it. when two outputs are both wrong in complementary ways, pattern matching won't save you. the instincts we built from reviewing human code just don't transfer cleanly.

that's exactly why the human has to stay in the loop not just as a user, but as the one who guards and makes the judgment calls. no model can own that responsibility yet, and personally, even if someday a model gets praised for being "reliable enough," i still think in any real business context, accountability has to stay with humans.

for hobby projects, sure, let AI take the wheel and see what happens that's half the fun! but at the end of the day, it's still a human who receives the result and decides "okay, what now?" lol

which is honestly why i think engineers aren't going anywhere. the role shifts, but it shifts toward something harder — redesigning architectures that have AI-generated code baked in, maintaining systems where you didn't write half the code, and knowing when to trust the output and when to push back. structure over instinct becomes the whole game.

anyway, have a great day at work~!
