Daniel Stolf

Posted on May 29 • Originally published at linkedin.com

I Was the Retrieval Layer

#ai #programming #deepseek #chatgpt

I once spent half a day debugging code that was completely correct.

The problem wasn't the logic. The problem was that the functions the LLM had written didn't exist.

Not deprecated. Not renamed. Never existed.

Here's what had happened: I caught the model using an outdated API parameter and corrected it. Instead of fixing the issue, it started compensating: hallucinating function names, inventing method signatures, generating plausible-looking code that had no basis in reality. The more I pushed back, the deeper into fiction it went.

That afternoon is why I started doing RAG before the industry had a name for it.

At the time, I was building a Kubernetes Operator using free-tier LLMs (ChatGPT and DeepSeek). No agentic tooling. No memory. No orchestration frameworks. Just a chat window and whatever I could fit into the context.

I had two problems:

Problem 1:
The model didn't know current APIs. Kubernetes controller-runtime, Operator SDK, and Delphix APIs move fast. The model's training data was already stale. Left to its own devices, it would confidently generate code against API versions that no longer existed. When corrected, it would sometimes make things worse.

Problem 2:
The context window ran out. Long sessions degraded. The model would start contradicting earlier decisions, losing track of architecture choices, rehashing solved problems. On a free tier, hitting the limit meant starting over and losing everything.

Here's what I built to solve both:

For the API problem, manual retrieval and injection. Before writing any implementation code for a new component, I would research the relevant documentation myself. Then I'd summarize it (sometimes by hand, sometimes by feeding the raw docs into a separate chat session just for summarization) and inject only the relevant fragments into the working session. Confirmed, current, scoped to exactly what the model needed.

The model wasn't searching. I was the retrieval layer.

For the context problem, session state documents. When a session was getting too long, I'd ask the model to generate a structured Markdown file: current architecture decisions, what had been built, what was left, key constraints and open questions. Then I'd start a fresh session, paste the MD file as context, and continue exactly where I'd left off.

The model wasn't remembering. I was the memory layer.

What I was doing, without knowing it:

Retrieval-Augmented Generation: surfacing accurate, current information and injecting it as context to ground model outputs.

Session state management: the manual precursor to what agent memory systems now handle automatically.

Multi-session LLM chaining: using one model to process and compress information for another, before orchestration frameworks made this trivial.

I didn't invent these patterns. I arrived at them by necessity, the hard way, after a hallucination loop cost me half a day.

That's usually how the best practices emerge.

The tools have improved dramatically since then. But the underlying problem (models that hallucinate on fast-moving APIs, context that degrades over long sessions, outputs that need grounding in verified information) hasn't gone away. It's just more visible now, at scale, in enterprise deployments.

The engineers and companies figuring this out today are rediscovering the same lessons. Usually also the hard way.

Have you hit a hallucination loop that cost you real time? What was your fix?

Top comments (2)

Harjot Singh • May 31

"I was the retrieval layer" is a poignant framing - the realization that a lot of knowledge work was, functionally, humans doing retrieval-and-synthesis (find the relevant info, condense it, hand it over), and that's exactly the slice AI now does instantly. It's an unsettling mirror: the part of the job that felt like expertise was sometimes just being a good RAG pipeline. The discomfort is real and worth sitting with rather than waving away.

But the hopeful read is the same shift everyone smart is making: if retrieval/synthesis is commoditized, human value moves up to judgment, taste, knowing which question to ask, and verifying the machine's output is actually right. You stop being the retrieval layer and become the layer that decides what matters and catches what's wrong. That's the bet I'm making with Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - let the machine do the retrieval/execution, humans operate at intent + judgment. The floor rises; you rise with it. Genuinely reflective piece - this is the emotional reality under the productivity hype. Did landing on "move up to judgment" feel like a relief, or are you still sitting in the unsettling part? Honest answer's more useful than the tidy one.

Daniel Stolf • May 31

Both, honestly... and the ratio shifts week to week.

The relief was in naming it. For a long time the work felt high-skill in the moment and oddly disposable in retrospect, and I couldn't articulate why. Realising the slice that felt like expertise was the slice AI now does instantly lands hard... but it lands. You can plan around a named thing.

The unsettling part sits underneath your "move up to judgment" framing. I think it's right and I'm betting on it too. But this isn't the first time we've watched a layer get eaten, and each time the next swallow comes faster. Part of me makes the bet with full conviction. Part of me is already wondering what gets eaten next. Taste? Spec authoring? Agents are starting to draft specs now too.

What's helped me sit with that: "move up to judgment" isn't a one-time relocation. It's a loop. Every time AI swallows a layer, you re-locate to wherever verification still has to happen, and build whatever scaffolding lets you actually verify there. The retrieval layer became the orchestration layer became the spec layer became... whatever's next. The work isn't being safe from the next swallow. It's staying the person who decides what gets verified, and being honest when that location moves.

So: relief in the framing, ongoing discomfort in the trajectory. Your Moonshift bet and what I'm building converge on the same instinct: machine does the execution, human holds the line at intent and verification. Probably why your comment hit the way it did.