Andrey Kolkov

From Vibe Coding to Agentic Engineering: What Karpathy Got Right (and What's Missing)

On February 4, 2026, Andrej Karpathy — the person who gave us "vibe coding" exactly one year earlier — declared it passé. His replacement term: Agentic Engineering.

"Agentic — because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents. Engineering — because there is an art & science and expertise to it."

The term landed in a landscape already primed for it. Anthropic had just released their 2026 Agentic Coding Trends Report. A Hacker News thread asking "Do you have any evidence that agentic coding works?" pulled 461 points and 455 comments. An academic paper on arXiv had already formalized the paradigm split. And within days of Karpathy's post, Addy Osmani (Google) published a detailed framework and announced an O'Reilly book titled Beyond Vibe Coding.

I've been practicing and writing about this exact shift since January 2026 — a month before Karpathy named it. My Smart Coding article laid out the same core idea: engineering discipline plus AI as an accelerator, not a replacement.

I'm not claiming credit for the term. But the conversation is still incomplete. Karpathy gave us a great name. Osmani gave us a great workflow. Both frameworks miss a critical piece that I've found essential across 35+ production projects.

The Three Paradigms: Understanding the Spectrum

The community keeps framing this as a binary: Vibe Coding vs. Agentic Engineering. That's wrong. There are actually three distinct paradigms, and understanding all three is the key to working effectively with AI.

Vibe Coding: Exploration Mode

Karpathy's original description (February 2025): "You fully give in to the vibes, embrace exponentials, and forget that the code even exists."

When it works: Prototypes, hackathons, learning new APIs, feasibility spikes. You prompt, you accept, you run, you see if it works. Fast and cheap — and the code is disposable.

When it fails: Production. In one survey of 18 CTOs, 16 had experienced production disasters directly caused by unreviewed AI-generated code. The code compiled. The tests (auto-generated, shallow) passed. The bugs were subtle, the security vulnerabilities real.

Agentic Engineering: Orchestration Mode

Karpathy's new framework: you orchestrate AI agents while maintaining oversight.

What it gets right:

  • Developers are no longer typists — they're orchestrators
  • Multiple agents can work in parallel (code, test, review)
  • Quality gates and automated validation are essential
  • The skill is in specification and oversight, not syntax

What it misses: More on this below.

Smart Coding: The Unifying Framework

My take: Smart Coding isn't the opposite of either Vibe Coding or Agentic Engineering. It's the framework that encompasses both.

┌─────────────────────────────────────────────┐
│              SMART CODING                   │
│         (The Meta-Framework)                │
│                                             │
│  ┌─────────────┐    ┌───────────────────┐   │
│  │ Vibe Coding │    │    Agentic        │   │
│  │ (Explore)   │───→│    Engineering    │   │
│  │             │    │    (Build)        │   │
│  └─────────────┘    └───────────────────┘   │
│         ↑                    │              │
│         └────────────────────┘              │
│    Feedback loop + Knowledge accumulation   │
└─────────────────────────────────────────────┘

Smart Coding is knowing when to Vibe and when to Engineer — and, crucially, how to make each session compound on the last. It's the judgment layer that neither paradigm addresses on its own.

Vibe Coding answers: how do I explore quickly?
Agentic Engineering answers: how do I build with AI agents?
Smart Coding answers: how do I decide, learn, and get better at both over time?

What Karpathy Got Right

Credit where it's due — Karpathy nailed several things:

1. The 99% observation is accurate.
This isn't hyperbole. Boris Cherny, head of Claude Code at Anthropic, says he hasn't written a line of code by hand in over two months — shipping 22-27 PRs per day, all 100% AI-generated. OpenAI researcher Roon says the same: "100%, I don't write code anymore." Anthropic's CPO Mike Krieger confirms that for most products it's "effectively 100%", with the company-wide average at 70-90%. In my own daily work across multiple Go ecosystems, I'd put it at 90-95%. My role is architecture, specification, review.

2. "Engineering" is the right word.
Calling it "engineering" demands rigor. It's not "agentic vibing" or "agentic prompting." The word engineering implies standards, validation, discipline.

3. The timing is right.
Models in early 2026 (Claude Opus 4.6, GPT-5, Gemini 2.5) are genuinely capable of multi-step autonomous work. Agentic workflows that failed in 2024 now succeed reliably with proper guidance.

4. The self-awareness is refreshing.
Karpathy calling his original vibe coding tweet a "shower of thoughts throwaway" is honest. The fact that it resonated shows the community was ready for the vocabulary, not that the term was deeply considered.

What's Missing: The Bidirectional Learning Gap

Neither Karpathy nor Osmani fully address this: the AI doesn't learn from you, and most developers don't systematically learn from AI.

Osmani's framework says: "Start with a plan. Direct. Review. Test. Own the codebase." All correct. But it treats AI as a stateless contractor — you give instructions, it delivers code, you review. Next session, same blank slate.

This is like hiring a developer who forgets everything every morning.

Teaching Your AI Agent

In my Smart Coding article, I described the Knowledge File Pattern — a dedicated context file that your AI agent reads at the start of every session:

# PROJECT_CONTEXT.md

## Architecture
- Hexagonal architecture with ports/adapters
- All external services accessed through interfaces
- No business logic in HTTP handlers

## Conventions
- Error wrapping with context, never naked errors
- Structured JSON logging with request_id
- Table-driven tests for validation logic

## Domain Knowledge
- "Settlement" = end-of-day batch processing
- Customer IDs are UUIDs, Order IDs are sequential

## Known Pitfalls
- Redis cluster doesn't support MULTI in our setup
- Legacy API returns 200 for errors — check body
- Date fields from Partner X: ISO but no timezone

## Lessons Learned
- Week 1: AI suggested time.Now() → added: "Use injected clock"
- Week 2: AI used fmt logger → added: "Use zerolog with context"
- Week 3: AI flat package layout → added: "Domain-driven packages"

This file is a living document that accumulates project wisdom. Each AI mistake becomes a permanent correction. Each session starts where the last one ended.
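To make the "Lessons Learned" entries concrete, here is a minimal Go sketch (names hypothetical, not from any specific repo) of how two of those corrections look once they become code conventions: an injected clock instead of time.Now(), and wrapped rather than naked errors.

```go
// Illustrative sketch only: how two knowledge-file rules ("use injected clock",
// "never naked errors") show up in code. All names are hypothetical.
package billing

import (
	"fmt"
	"time"
)

// Clock is injected so tests can control time instead of calling time.Now() directly.
type Clock interface {
	Now() time.Time
}

type Settlement struct {
	clock Clock
}

func (s *Settlement) Close(batchID string) error {
	cutoff := s.clock.Now().Truncate(24 * time.Hour)
	if err := s.flush(batchID, cutoff); err != nil {
		// Wrap with context instead of returning the naked error.
		return fmt.Errorf("close settlement batch %s: %w", batchID, err)
	}
	return nil
}

func (s *Settlement) flush(batchID string, cutoff time.Time) error {
	// persistence elided
	return nil
}
```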

In practice, I go further. Each project has a STATUS.md that the agent reads at session start and updates on exit — current progress, near-term plans, discovered problems that need investigation. Detailed tasks go into an internal kanban broken into stages with interdependencies. There are LINTER_RULES files that agents read during coding and update when they discover new patterns — so code passes golangci-lint on the first try, not the fifth.
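A STATUS.md does not need to be elaborate. A sketch of the shape I mean (contents illustrative, not from a real project):

```markdown
# STATUS.md

## Current
- Parser refactor: partial reads done, compression filters in review

## Next
- Wire the remaining filter into the read path
- Re-run benchmarks against the previous release

## Open Problems
- Intermittent CI failure on Windows path handling (needs investigation)
```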

Each repository also maintains a research knowledge base — results of investigations, benchmarks, comparisons with alternative approaches, and crucially, why specific design decisions were made, with the full context of the decision. For an ecosystem like GoGPU with 9 interrelated repositories, there's an ecosystem-wide research base on top of per-repo ones. When you revisit a decision six months later, you don't have to reverse-engineer your own reasoning — it's documented, with the alternatives you considered and why you rejected them.

And the knowledge is hierarchical. The root-level context knows about all repositories in the ecosystem but doesn't go deep. Each project essentially has its own agent that fully owns its repository — knows every module, every convention, every quirk. It practically lives inside the codebase. When I switch between projects, the context switches with me, and each agent picks up right where it left off.

The agent doesn't just know how to write code for this project — it knows where we left off, what problems we've spotted, and what's next.

The Feedback Loop Karpathy Doesn't Mention

Session 1: AI makes 20 mistakes. You correct 20.
           → 20 corrections added to knowledge file.

Session 2: AI makes 5 mistakes (15 prevented by context).
           → 5 new corrections added.

Session 3: AI makes 1 mistake.
           → Your knowledge file is now a comprehensive project spec.

This is bidirectional learning: you learn AI's patterns and capabilities, AI (via context) learns your project's constraints and conventions. Over weeks, this compounds dramatically.

Without this feedback loop, agentic engineering is like conducting an orchestra that can't remember the piece it played yesterday. You spend half your time re-explaining context instead of making progress.

The exchange is asymmetric, and that's the point. You know better: project context, business domain, team conventions, past mistakes, infrastructure quirks, stakeholder constraints. AI knows better: syntax across languages, algorithm tradeoffs, library APIs, boilerplate patterns, cross-ecosystem solutions. Smart Coding makes this exchange systematic and cumulative, not ad hoc and forgotten.

Knowledge Consolidation: Keeping Context Sharp

There's a catch that nobody talks about. Knowledge files grow. Fast. After weeks of daily sessions, your PROJECT_CONTEXT.md becomes a sprawling document with redundant entries, outdated lessons, and contradictory notes from different project phases. The AI agent starts drowning in its own context — reading hundreds of lines of accumulated wisdom where half no longer applies.

You need to periodically consolidate your knowledge files. Think of it as garbage collection for project context.

Week 1-4:   Knowledge file grows organically (additions only)
Week 5:     Consolidation pass
            - Remove lessons the AI no longer needs (patterns now habitual)
            - Merge duplicate entries
            - Update outdated conventions (API changed, dependency upgraded)
            - Restructure: separate "always relevant" from "situational"
            - Archive historical context that's no longer actionable

In practice, I do a consolidation pass every few weeks — or whenever I notice the agent making mistakes it shouldn't, which often means it's losing important rules in a sea of less relevant ones. Context windows are large but not infinite, and signal-to-noise ratio matters more than volume.

The consolidation itself is a form of learning. Reviewing what accumulated forces you to reflect: which patterns stuck? Which conventions evolved? What assumptions turned out wrong? It's a checkpoint for both the project and your own understanding of it.

Pro tip: Keep a separate archive file for removed entries. You might need them if you revisit an old subsystem or onboard someone new. The active knowledge file should be lean and current — a briefing, not a history book.

Real-World Case: Building Where Go Was Weakest

Enough theory. What does Smart Coding look like at scale?

It started last summer, when I decided to publish my internal HDF5 and MATLAB parsers as open-source libraries. These were working code extracted from private projects — battle-tested, but not shaped for public consumption. Packaging them properly (documentation, CI, tests, API design) was the first exercise.

Then something clicked. I had years of accumulated private code solving problems where Go was traditionally weak — and modern AI agents (Claude Opus 4.6 in particular) made it feasible to extract, refactor, and publish at a pace I couldn't have imagined before. Without these tools, I could realistically maintain one, maybe two open-source projects alongside my regular work. With Smart Coding, I've shipped 35+.

There's a personal angle here too. After COVID-19, my vision deteriorated — I developed polyopia and ghosting, where letters and digits multiply across different depth layers on screen, like looking through layered glass at night in a subway tunnel. Not simple doubling — images shift along random axes, some closer, some further, overlapping unpredictably. Post-COVID corneal and lens changes are well-documented, and mine made reading code character by character genuinely exhausting. Agentic engineering changed my relationship with the screen: I focus on code flow, architecture, and diffs — not on typing every semicolon and bracket. My eyes track the big picture while agents handle the detail work. It's not just a productivity story. For developers dealing with vision impairment, RSI, or other physical constraints, this way of working can be genuinely life-changing.

I want to be honest about this: AI didn't design these libraries. I did. The architecture, the API decisions, the choice of which problems to solve — that's decades of engineering experience. But AI turned what would have been years of solo implementation work into months. That's the force multiplier effect in action.

What that looks like in practice — across a systematic effort to fill Go's blind spots:

The Pattern: Identifying Architectural Gaps

Go is excellent for servers, CLI tools, and infrastructure. But it has well-known blind spots:

| Domain | The Gap | What Existed |
| --- | --- | --- |
| GPU/Graphics | No unified ecosystem (vs. Rust's wgpu+naga+bevy) | Individual libs (Ebiten, Gio, Fyne), all CGO-based |
| Regex | stdlib is intentionally slow (single RE2 engine) | No multi-engine alternative |
| ML/Deep Learning | Python dominance, Rust has Burn | No production Go framework |
| Scientific formats (HDF5, MATLAB) | CGO wrappers only, no write support | No pure Go read/write implementations |
| Race detection | Built-in requires CGO | Can't use on Lambda/Alpine |
| PDF processing | Limited libraries | No enterprise-grade option |

AI agents can't identify these gaps. They don't understand ecosystem dynamics, community pain points, or the strategic value of filling a specific niche. That's architecture thinking — the human's job.

GoGPU: 380K+ Lines of Pure Go Graphics

GoGPU is a full GPU computing ecosystem: 380,000+ lines of pure Go across 9 repositories. A WebGPU implementation, a 2D graphics library (155K LOC), a shader compiler translating WGSL to SPIR-V/MSL/GLSL/HLSL, a UI toolkit (54K LOC, 1,400+ tests), PDF and SVG export backends. 312+ stars and growing. Five GPU backends (Vulkan, DirectX 12, Metal, GLES, Software), three platforms.

Go had graphics libraries before this — Ebitengine for 2D games, Gio and Fyne for UI. But what it lacked was what Rust has built over years: a cohesive, integrated GPU ecosystem. Rust's wgpu + naga + Bevy + Iced form an interconnected stack from low-level HAL to high-level UI — all pure Rust, all interoperable. Go had nothing comparable. Every existing option required CGO, and there was no shader compiler, no unified abstraction layer, no path from GPU compute to rendered pixels without leaving Go.

GoGPU fills that gap: a pure Go stack from shader compilation to 2D graphics to UI toolkit.

This was impossible to Vibe Code. The architecture decisions — how to structure a shader compiler pipeline, how to map WebGPU's memory model to Go's garbage collector, how to design cross-platform rendering abstractions across five GPU backends — require deep understanding of both GPU programming and Go's runtime characteristics.

But it was also impossible without AI agents. Translating shader compiler patterns from Rust's naga to idiomatic Go? Implementing a WebGPU HAL across five backends? That's exactly where agentic engineering shines — clearly specified translation tasks with well-defined inputs and outputs, executed at a pace no single developer could match.

The Smart Coding approach:

  1. Vibe phase: Explore Rust's naga codebase, understand the architecture (throwaway spikes)
  2. Architecture phase: Design Go-idiomatic module boundaries, memory layout, API surface (human)
  3. Agentic phase: AI agents implement well-specified components under strict review
  4. Knowledge accumulation: Each module taught the AI about Go-specific GPU patterns, making subsequent modules faster
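To give a flavor of what a "well-specified component" means in that third phase, here is a deliberately simplified Go sketch of a cross-backend device abstraction. This is not GoGPU's real API; it only illustrates the kind of boundary a human defines once and agents then implement per backend against a shared test suite.

```go
// Deliberately simplified sketch of a cross-backend abstraction. This is NOT
// GoGPU's actual API, just the shape of the boundary the agents fill in.
package hal

type BackendKind int

const (
	Vulkan BackendKind = iota
	DX12
	Metal
	GLES
	Software
)

// Device is the narrow surface each backend must implement. The human decides
// this boundary; agents implement it per backend against a shared test suite.
type Device interface {
	CreateBuffer(size int, usage BufferUsage) (Buffer, error)
	CreateShaderModule(spirv []byte) (ShaderModule, error)
	Submit(cmds []CommandBuffer) error
}

type BufferUsage uint32

type Buffer interface{ Release() }
type ShaderModule interface{ Release() }
type CommandBuffer interface{}
```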

Coregex: Multi-Engine Regex, 3-3000x Faster Than stdlib

Coregex (75 stars) is a multi-engine regex library inspired by Rust's regex crate architecture — multiple execution engines (Thompson NFA, Pike VM, one-pass DFA, bounded backtracker) with an intelligent meta-engine that selects the optimal strategy per pattern. It outperforms Go's stdlib by 3 to 3000x depending on the pattern.

You can't Vibe Code a multi-engine regex library. The meta-engine selection logic, Thompson NFA construction, Pike VM execution, SIMD-accelerated literal search — each requires deep algorithmic understanding and careful architectural decisions about when to dispatch to which engine. But you can use AI agents to implement well-specified automaton transitions, generate comprehensive test suites from regex grammar specifications, and benchmark systematically.
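For intuition, here is a simplified Go sketch of the meta-engine idea. It is not coregex's actual code; it only shows the technique of dispatching on cheap, statically known properties of the pattern and the input.

```go
// Illustrative sketch of meta-engine dispatch; NOT coregex's real implementation.
package metaengine

type engine int

const (
	onePassDFA engine = iota
	boundedBacktracker
	pikeVM
)

// backtrackLimit caps pattern-states x input-bytes for the bounded backtracker,
// keeping its worst case small (the same trick Rust's regex crate uses).
const backtrackLimit = 256 * 1024

type patternInfo struct {
	IsOnePass  bool // every input byte admits at most one NFA transition
	StateCount int  // number of compiled NFA states
}

func selectEngine(p patternInfo, haystackLen int) engine {
	switch {
	case p.IsOnePass:
		return onePassDFA // fastest path when the pattern allows it
	case p.StateCount*haystackLen <= backtrackLimit:
		return boundedBacktracker // cheap for small pattern-times-input products
	default:
		return pikeVM // always-correct fallback with linear-time guarantees
	}
}
```

The human decides the dispatch policy and the guarantees it must preserve; the agents handle the mechanical part, implementing each engine against a shared conformance suite.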

The knowledge file for this project grew to include regex-specific patterns:

## Regex Engine Conventions
- NFA states use uint32, not int (SIMD alignment)
- All character class operations must be branchless
- Thompson construction: no epsilon-removal optimization
  (benchmarks showed it's slower for our pattern distribution)

Every session with AI was more productive than the last because the context accumulated.

Born: Go's Answer to Burn

Born (39 stars) is a production ML framework with a PyTorch-like API, automatic differentiation, and type-safe tensors — Go's answer to Rust's Burn framework. Where Burn brought deep learning to Rust with swappable backends and ONNX interop, Born brings it to Go with the same philosophy: single binary deployment, compile-time type safety, and zero external dependencies.

Here, the cross-ecosystem research pattern was critical. AI agents researched Burn's backend trait architecture, PyTorch's autograd implementation, JAX's tracing approach, and Tinygrad's minimalist design. But the human decided: Go's strengths (single binary deployment, generics-based type safety, goroutine parallelism) dictate a different design than either Python's dynamic typing or Rust's ownership model allows.
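As a rough illustration of what generics-based type safety buys here (not Born's actual API), consider a minimal tensor type where mixing element types fails at compile time:

```go
// Illustrative only, not Born's real API: generics-based element-type safety.
package tensor

// Num constrains the element types a tensor may hold.
type Num interface {
	~float32 | ~float64
}

type Tensor[T Num] struct {
	data  []T
	shape []int
}

// Add returns the element-wise sum; mixing element types is a compile-time error.
func Add[T Num](a, b Tensor[T]) Tensor[T] {
	out := Tensor[T]{data: make([]T, len(a.data)), shape: a.shape}
	for i := range a.data {
		out.data[i] = a.data[i] + b.data[i]
	}
	return out
}
```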

Smart prompt vs. naive prompt:

Naive: "Implement backpropagation in Go"
→ Gets a textbook implementation that ignores Go's type system

Smart: "I've studied Burn's backend trait architecture, JAX's tracing,
and PyTorch's autograd. For Go, I want backpropagation that:
- Uses Go generics for type-safe tensor operations
- Leverages goroutines for parallel gradient computation
- Produces a computational graph that can be serialized
- Follows the conventions in PROJECT_CONTEXT.md
What are the tradeoffs vs. a tape-based approach for our use case?"
→ Gets an informed implementation that fits the ecosystem

The Pattern Across All Projects

Racedetector (30 stars) — a pure Go race detector that combines FastTrack happens-before tracking (Flanagan & Freund, PLDI 2009) with escape analysis integration, shadow memory, vector clocks, and AST instrumentation. 359 tests ported from Go's official race detector suite. This required studying concurrent systems theory across several academic papers and synthesizing it into a cohesive architecture — then AI accelerated the implementation of each well-specified component.
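For readers unfamiliar with the underlying machinery, here is a simplified sketch of the vector-clock happens-before check at the heart of FastTrack-style detection (this is not the racedetector code itself, just the core idea):

```go
// Simplified vector-clock sketch, not the racedetector implementation.
package vc

// VectorClock maps a goroutine ID to the last logical time observed from it.
type VectorClock map[int]uint64

// HappensBefore reports whether every event recorded in a is already known to b.
func (a VectorClock) HappensBefore(b VectorClock) bool {
	for tid, t := range a {
		if b[tid] < t {
			return false
		}
	}
	return true
}

// RacesWith reports a data race: two accesses to the same address race if
// neither is ordered before the other and at least one of them is a write.
func RacesWith(prev, cur VectorClock, prevIsWrite, curIsWrite bool) bool {
	if !prevIsWrite && !curIsWrite {
		return false
	}
	return !prev.HappensBefore(cur) && !cur.HappensBefore(prev)
}
```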

HDF5 (20 stars) — full read/write support for HDF5 2.0.0 (Format Spec v4.0), including chunked datasets, GZIP/LZF/BZIP2 compression, hyperslab selection (10-250x faster partial reads), and 100% pass rate on the official 378-file test suite. MATLAB (8 stars) — read/write for both legacy v5-v7.2 and modern HDF5-based v7.3+ formats, covering all numeric types, complex numbers, sparse matrices, structures, and cell arrays. Both are pure Go, zero CGO — the only complete implementations in the Go ecosystem. Designing the architecture for these complex binary format specifications required deep human analysis; AI agents then implemented the byte-level parsing and serialization under strict test coverage.

GxPDF — started when a friend asked me to help parse bank statements from PDFs. Every bank had its own table format for transactions and balances. What began as a quick script grew into an enterprise PDF library — because the PDF spec is 1,300 pages and "quick" doesn't exist in PDF parsing. The human identified the subset that matters for production use; AI agents implemented the extraction pipeline.

GRPM — next-gen package manager for Gentoo. Here, AI agents first helped study and translate the entire Gentoo PMS (Package Manager Specification) into structured Markdown — creating a machine-readable knowledge base from dense technical documentation. That knowledge base then became the spec for implementation: each section mapped to code, tracked via a PMS compliance matrix. This is bidirectional learning in action — agents helped build the context, then used that same context to implement. SAT-based dependency resolution still required human CS fundamentals; but the documentation-to-implementation pipeline was pure agentic workflow.

Every single project followed the same cycle:

  1. Human identifies the gap and designs the architecture
  2. Vibe exploration of reference implementations
  3. Knowledge file creation with project-specific constraints
  4. Agentic implementation with growing context
  5. Bidirectional learning compound effect

This is Smart Coding. Not Vibe. Not Agentic. Both, orchestrated by engineering judgment.

The Architect Mindset: Why Experience Still Matters

Karpathy wrote: "I've never felt this much behind as a programmer."

I'd reframe that: you've never been more valuable as an architect.

When AI handles 95% of implementation, what remains is the 5% that determines whether the software succeeds or fails:

  • System boundaries and module design — AI optimizes locally, architects think globally
  • Technology selection with full context — AI knows APIs, architects know ecosystems
  • Tradeoff evaluation — AI can list options, architects can weigh them against constraints AI doesn't see
  • Failure mode anticipation — AI builds the happy path, architects prevent the disasters
  • Conceptual integrity — AI generates code, architects maintain vision

Traditional:  Junior → Mid → Senior → Lead → Architect
              (Years of progression)

With AI:      Every developer must think like an architect
              (AI handles the junior-to-senior implementation work)

The Hacker News discussion on agentic coding confirmed this: the consensus was that "you get the most value when you know exactly what you want." Clear specifications matter more than model capability.

But knowing what you want requires experience. And experience compounds faster when you maintain a systematic feedback loop with your AI tools.

Practical Smart Coding Framework

The workflow I use daily, distilled from months of production work:

Phase 1: Assess (5 minutes)

Before touching any tool:

☐ Am I exploring or building?
☐ What are the system boundaries affected?
☐ What would I sketch on a whiteboard?
☐ Can I write a one-paragraph spec for this task?

If you can't write the spec, you're not ready for Agentic Engineering. Start with a Vibe spike.

Phase 2: Prepare Context

Update your knowledge file with anything relevant to this session:

☐ New constraints discovered since last session?
☐ Known pitfalls for this specific task?
☐ Conventions that must be followed?
☐ Related decisions already made?

This takes 2-3 minutes and saves 30+ minutes of correcting AI mistakes.

Phase 3: Execute (Vibe or Agentic)

Vibe mode (exploration):

  • Time-boxed to 30-60 minutes
  • Goal: knowledge, not code
  • Everything produced is throwaway

Agentic mode (building):

  • Detailed specs per component
  • AI agents implement, you review every diff
  • Tests required before proceeding
  • Knowledge file updated with lessons learned

Phase 4: Capture

After every session:

☐ What did AI get wrong? → Add to knowledge file
☐ What did AI teach me? → Add to personal notes
☐ What pattern emerged? → Document for future sessions
☐ Did my architecture assumptions hold? → Adjust if not

The 70/30 Rule

A practical heuristic: spend 70% of time on architecture, specification, review, and validation. Let AI accelerate the remaining 30% — the mechanical implementation.

This ratio seems counterintuitive. But the 70% investment is what makes the 30% valuable. Without understanding, AI output is just random characters that happen to compile.

The Terminology Doesn't Matter. The Practice Does.

Call it Smart Coding. Call it Agentic Engineering. Call it whatever you want. The principles are the same:

  1. You own the architecture. AI owns the implementation.
  2. You validate everything. AI generates candidates.
  3. Context compounds. Every session builds on the last.
  4. Mode awareness matters. Know when to explore and when to build.
  5. Experience is your moat. AI levels the playing field on syntax. Architecture and judgment are your edge.

Karpathy gave us a great term. The industry needed vocabulary to distinguish disciplined AI-assisted development from reckless prompt-and-pray. "Agentic Engineering" is professional, precise, and descriptive.

But the term alone isn't enough. Without the bidirectional learning loop — without systematically teaching your AI agent about your project while learning from its capabilities — you're orchestrating an amnesiac. Productive today, starting from zero tomorrow.

Smart Coding is the practice that makes Agentic Engineering compound.


In January 2026, I published Smart Coding vs Vibe Coding — exploring the same ideas before Karpathy coined "Agentic Engineering." The principles hold up. The vocabulary evolved. The practice continues.

How are you managing the Vibe-to-Agentic transition? Are you maintaining knowledge files? Have you found the right explore/build ratio for your work? I'd love to compare approaches — share your experience in the comments.


About the Author

I'm Andrey Kolkov — a Full Stack developer (Go backend, Angular frontend) maintaining 35+ open source projects. I build in the spaces where Go is traditionally weakest: GPU computing, regex engines (3-3000x faster than stdlib), ML frameworks, and scientific computing. Each project is a daily exercise in Smart Coding — using AI as a force multiplier while maintaining engineering rigor.

GitHub: @kolkov | Projects: GoGPU, coregx, born-ml, scigolib


Tags: #ai #programming #productivity #softwaredevelopment
