Ned C

I Tested Whether Cursor's Auto Mode Actually Picks the Right Model

There's a recurring debate on the Cursor forum about Auto mode. Some people swear by it. Others are convinced it's quietly routing their complex tasks to cheaper, faster models and giving them worse output. The "Auto is cheating you" take comes up pretty often in model discussion threads.

I wanted to find out, so I tested it.

The Setup

I created five tasks of increasing complexity and ran each one twice in Cursor:

  • Run A: Auto mode (let Cursor pick the model)
  • Run B: Sonnet 4.5 (manually selected)

Same prompt, same project, same files. The only variable was who picked the model.

The Tasks

  1. Simple: Add a loading spinner to a button component
  2. Medium: Refactor a 200-line function into smaller modules with proper error handling
  3. Complex: Debug why an async data fetch sometimes returns stale data in a React component with multiple useEffect hooks (roughly the race sketched after this list)
  4. Architecture: Design the folder structure and module boundaries for a multi-tenant SaaS dashboard
  5. Reasoning: A test suite that passes individually but fails when run together. Find the shared state causing the conflict.
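
To make Task 3 concrete, here's a minimal sketch of the kind of stale-fetch race it targets, assuming a component that refetches when a prop changes. The component, endpoint, and field names are made up for this post (the real component was messier); the `cancelled` guard is the piece that was missing in the buggy version.

```tsx
import { useEffect, useState } from "react";

// Hypothetical example: without the cancelled guard, a slow earlier request
// can resolve after a newer one and overwrite it, leaving stale data on screen.
function UserPanel({ userId }: { userId: string }) {
  const [user, setUser] = useState<unknown>(null);

  useEffect(() => {
    let cancelled = false;
    fetch(`/api/users/${userId}`)
      .then((res) => res.json())
      .then((data) => {
        if (!cancelled) setUser(data); // ignore responses from stale requests
      });
    return () => {
      cancelled = true; // cleanup runs when userId changes or on unmount
    };
  }, [userId]);

  return <pre>{JSON.stringify(user, null, 2)}</pre>;
}
```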

The Results

| Task | Type | Auto | Sonnet 4.5 | Faster |
|------|------|------|------------|--------|
| Loading spinner | Simple | 21.6s | 21.5s | Tie |
| Refactor function | Medium | 41.4s | 36.9s | Sonnet |
| Debug stale data | Complex | 26.0s | 28.8s | Auto |
| Architecture design | Architecture | 66.1s | 83.4s | Auto |
| Test debugging | Reasoning | 44.6s | 39.9s | Sonnet |

What I Expected

I expected Auto to pick a lighter model for the simple tasks and maybe get away with it, then struggle on the harder ones where you'd want Opus or Sonnet doing the heavy lifting.

That's not what happened.

What Actually Happened

Auto was faster on the two hardest tasks. On the architecture task the gap wasn't small: Auto finished about 17 seconds ahead of Sonnet (66.1s vs 83.4s). On the React debugging task, it was almost 3 seconds faster.

The output quality looked the same across all 5 tasks. Based on reviewing the generated code, both modes solved every problem correctly. I didn't notice any obvious hallucinations or missing pieces in either set of outputs.

The only interesting difference was on Task 5 (the shared state test bug). Auto fixed it by moving the setup into a global beforeEach. Sonnet fixed it by calling .clear() on the shared state between tests. Both approaches work. Neither is obviously better.
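
For concreteness, here's a minimal sketch of what those two fixes look like in a Jest-style suite. `sharedStore` and its contents are stand-ins I made up for this post; the real shared state lived in my project's test fixtures.

```ts
// Stand-in for the shared state that leaked between tests.
const sharedStore = new Map<string, unknown>();

// Auto's approach: move the setup into a global beforeEach, so every test
// starts from freshly initialized state regardless of run order.
beforeEach(() => {
  sharedStore.set("session", { userId: "test-user" });
});

// Sonnet's approach: keep the existing setup, but clear the shared state
// between tests so one test's writes can't leak into the next.
afterEach(() => {
  sharedStore.clear();
});
```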

The Limitation I Couldn't Get Around

I ran this through the Cursor CLI, which doesn't show which model Auto actually selected for each task. So I can tell you the outcomes were equivalent, but I can't tell you whether Auto picked Sonnet for everything, or whether it routed the architecture task to Opus and the spinner task to something lighter.

That routing data would make this a much more interesting test. If anyone knows how to pull model metadata from Auto mode responses, I'd love to re-run this with that visibility.

What This Means

Based on this test, the "Auto is cheating you" concern doesn't hold up. At least not for these task types. Output quality was equivalent on every task, and on speed the two modes roughly split the wins (Auto took two, Sonnet took two, one was a tie), with the largest gap, on the architecture task, going in Auto's favor.

That doesn't mean Auto is perfect. Five tasks is a small sample, all in a clean TypeScript/React project with minimal context. A large, messy codebase with thousands of files might route differently. I also only tested one language and framework. There could be edge cases where Auto routes poorly, especially for very domain-specific work. But the blanket "never use Auto, always pick your model manually" advice that floats around the forum doesn't seem supported by the data.

If you're spending mental energy on model selection for every prompt, this test suggests you can probably stop. Auto handled it fine.

My Takeaway

I went into this expecting to write "Auto mode is quietly downgrading your work." Instead I'm writing "Auto mode is fine, maybe stop worrying about it."

Sometimes the boring answer is the right one.

If you've run your own Auto vs manual comparisons, I'd be curious what you found.


Test environment: Cursor CLI, fresh TypeScript/React project, February 2026. Models available at time of testing included Sonnet 4.5 and Opus among others.
