DEV Community

Building Your First Real GPT Is Not a Prompting Exercise

Simon Griffiths on May 29, 2026

I recently built my first non-trivial GPT. The interesting lesson was not about clever prompting. It was almost the opposite. The GPT only starte...


Gilder Miller • May 29

Building a reliable GPT isn’t about crafting the perfect prompt: it’s about treating it like a tiny software project. I found that splitting knowledge into clear, focused files made it far more consistent. The prompt should control behavior, not carry all the information. Testing early and often caught gaps I would have otherwise missed. How do you keep track of updates as the knowledge base grows?

Simon Griffiths • May 30

I put everything into git, including the source files and the intermediate build files. My build process outputs my knowledge modules as a set of markdown files, plus one instructions file, also in markdown. The final step is to manually update the GPT, which only takes a few minutes. Not ideal, but it's the best I could come up with.
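A build step of the shape Simon describes (source in git, one markdown file per knowledge module, one instructions file) might look roughly like the sketch below. It is not his actual pipeline: the `MODULES` and `INSTRUCTIONS` data, the module names, and the `build()` / `render_module()` helpers are all hypothetical, and a real build would read its sources from files tracked in the repo rather than from inline Python values.

```python
import tempfile
from pathlib import Path

# Hypothetical source content: module name -> list of (section heading, section text).
# In a real repo these would be read from source files tracked in git.
MODULES = {
    "pricing": [("Overview", "How pricing works."), ("Discounts", "Discount rules.")],
    "onboarding": [("Overview", "First-week checklist.")],
}

# Hypothetical instruction text; the real instructions would live in their own source file.
INSTRUCTIONS = "Answer from the attached knowledge modules. Say so when they do not cover a question."


def render_module(name: str, sections: list[tuple[str, str]]) -> str:
    """Render one knowledge module as markdown, one '##' heading per section."""
    lines = [f"# {name.title()} knowledge module", ""]
    for heading, text in sections:
        lines += [f"## {heading}", "", text.strip(), ""]
    return "\n".join(lines)


def build(modules: dict[str, list[tuple[str, str]]], instructions: str, out_dir: Path) -> list[Path]:
    """Write one markdown file per module plus instructions.md; return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, sections in sorted(modules.items()):
        path = out_dir / f"{name}.md"
        path.write_text(render_module(name, sections), encoding="utf-8")
        written.append(path)
    instructions_path = out_dir / "instructions.md"
    instructions_path.write_text(instructions.strip() + "\n", encoding="utf-8")
    written.append(instructions_path)
    return written


if __name__ == "__main__":
    # Dry run into a throwaway directory; a real build would target e.g. Path("build").
    with tempfile.TemporaryDirectory() as tmp:
        for path in build(MODULES, INSTRUCTIONS, Path(tmp)):
            print(path.name)
```

Because the build output is plain files, committing it alongside the sources means every change to what the GPT will see shows up in an ordinary git diff before the manual upload.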

Gilder Miller • May 31

I like this approach. Treating the knowledge base as code makes updates way less scary once it starts growing. Have you looked at automating that last GPT update step, or is the manual review intentional as a safety check?

Simon Griffiths • Jun 1

At the moment I don't think there's a way to automate this. I'd love to be able to copy everything from a repo or a set of files. If you are aware of a mechanism, I'd love to hear about it.

Gilder Miller • Jun 3

Yeah, fully automatic sync from a repo into a GPT still isn't really a first-class workflow. The closest practical setup I've seen is a CI step that builds your markdown bundle and outputs a single artifact you upload manually or reattach to the GPT. If you move toward the API side, the Assistants-style file search setup gets you closer to programmatic updates, but it's still not a clean git hook experience.

Keeping that final manual step actually isn’t a bad thing either since it forces a quick sanity check before publishing changes.
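The CI step Gilder describes (build the markdown, emit one artifact, upload it by hand) can be sketched with only the Python standard library. This is an illustration under assumptions, not a known GPT workflow: it zips every `.md` file from a hypothetical `build/` directory and adds a `MANIFEST.txt` of SHA-256 hashes, so whoever does the manual upload can compare manifests between releases and see which modules actually changed. It does not call any OpenAI API; the `bundle()` and `demo()` names and the file names are made up for the example.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def bundle(build_dir: Path, artifact: Path) -> list[str]:
    """Zip every markdown file in build_dir, plus a MANIFEST.txt of hashes, into one artifact.

    Returns the manifest lines ("<sha256>  <filename>"), sorted by filename.
    """
    md_files = sorted(build_dir.glob("*.md"))
    if not md_files:
        raise FileNotFoundError(f"no markdown files found in {build_dir}")
    manifest = [f"{sha256_of(p)}  {p.name}" for p in md_files]
    with zipfile.ZipFile(artifact, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in md_files:
            zf.write(p, arcname=p.name)
        zf.writestr("MANIFEST.txt", "\n".join(manifest) + "\n")
    return manifest


def demo() -> tuple[list[str], list[str]]:
    """Bundle two sample files in a throwaway directory; return (manifest, names in the zip)."""
    with tempfile.TemporaryDirectory() as tmp:
        build_dir = Path(tmp) / "build"
        build_dir.mkdir()
        (build_dir / "pricing.md").write_bytes(b"# Pricing\n")
        (build_dir / "instructions.md").write_bytes(b"Use the modules.\n")
        artifact = Path(tmp) / "gpt-knowledge.zip"
        manifest = bundle(build_dir, artifact)
        with zipfile.ZipFile(artifact) as zf:
            names = zf.namelist()
        return manifest, names


if __name__ == "__main__":
    manifest, names = demo()
    print("\n".join(manifest))
    print(names)
```

In a CI job the call would be something like `bundle(Path("build"), Path("gpt-knowledge.zip"))`, with the zip published as a build artifact. The manifest keeps the manual step Gilder mentions, but turns the sanity check into a quick hash comparison.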

Simon Griffiths • Jun 3

Yes, this is essentially what I do, but I have six files, one for each knowledge module. The idea was that each module could be maintained by a different SME, but I'm not sure that's actually feasible. For now I'll stick with updating the six files manually.

Harjot Singh • May 31

Strongly agree - "it's not a prompting exercise" is the lesson people learn after their first GPT that demos great and falls apart in real use. A real GPT is a product: it needs the right data/knowledge wired in, tools it can actually call, guardrails for when it's out of its depth, and handling for the messy inputs real users throw at it. The prompt is maybe 10% of that; the other 90% is the engineering around it that makes it reliable.

This is the same gap that shows up everywhere in applied AI - the model/prompt is the easy, visible part, and the durable value is in the unglamorous scaffolding (knowledge grounding, tool wiring, failure handling, verification). That's literally the thesis behind Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - the prompt starts it, but the product is the harness that turns it into something real and shippable, which is also why a build stays at a flat ~$3. Spot-on post; the "real GPT = product, not prompt" framing needs to be louder. What was the biggest non-prompt thing that surprised you building your first real one - the knowledge/data wiring, the tool integration, or the edge-case handling?

Simon Griffiths • May 31

The most interesting part was how far the structure of the knowledge modules I built diverged from the original input documents. I didn't impose this structure; I worked with Codex iteratively over a few sessions to discover it. Nor was it Codex just applying some best practice. It was much more a discovery of the structure already underlying the original documents.

Harjot Singh • May 31

That divergence is the whole insight, isn't it: what you thought the knowledge modules needed vs what actually made the GPT behave well are different things, and you only learn it by building, not prompting. Prompting is asking; building is engineering the context and constraints so the right behavior is the default. The gap between those two is exactly where "AI can't do my use case" usually turns out to be "I prompted instead of structured." Curious what surprised you most in the divergence, the granularity, or what you had to leave out to keep it focused? That trimming decision is where I always learn the most.

Simon Griffiths • Jun 1

The trickiest part was that the GPT had four main knowledge areas, each fairly large, but there was also a significant amount of common instructions and "routing" that needed to be handled. I ended up with a minimal set of instructions that referred to a "routing" file, which in turn directed the GPT to a specific knowledge module. I was not convinced this would work, but it did. The underlying mechanism is basically RAG-like, but it seems flexible enough to let some knowledge files refer to other files.

The other thing that surprised me was how much repetition I needed. My guess is that repeating YAML sections within the more text-based markdown helped ensure that the best chunks were loaded into context at run time. For a long document, I suspect (I have no proof of this) there is a risk that information at the top of the file is not enough to get sections lower down in the file retrieved, so repeating the YAML seemed to help.
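The YAML repetition Simon describes can be applied mechanically at build time. The sketch below assumes each module has flat `key: value` metadata and that sections start with `## ` headings; it copies the metadata under every such heading so that any chunk a retriever pulls from a long file still carries it. As Simon says, there is no proof this improves retrieval, and nothing here reflects how GPT file search actually chunks documents; the `repeat_metadata()` helper and the example metadata keys are hypothetical.

```python
import re

# Matches a level-2 markdown heading on its own line, e.g. "## Discounts".
# "# Title" and "### Sub" do not match because the third character must be a space.
SECTION_HEADING = re.compile(r"^## .+$", re.MULTILINE)


def yaml_lines(meta: dict[str, str]) -> str:
    """Render flat metadata as simple 'key: value' YAML lines (no nesting, no quoting)."""
    return "\n".join(f"{key}: {value}" for key, value in meta.items())


def repeat_metadata(body: str, meta: dict[str, str]) -> str:
    """Insert the module's metadata directly after every '## ' heading in body.

    Assumption: retrieval splits long files into chunks, so copying the metadata
    into each section raises the odds that a retrieved chunk carries it.
    """
    block = yaml_lines(meta)
    # A function replacement avoids any backslash escaping in the inserted text.
    return SECTION_HEADING.sub(lambda m: f"{m.group(0)}\n\n{block}\n", body)


if __name__ == "__main__":
    body = "# Pricing\n\n## Overview\nHow pricing works.\n\n## Discounts\nDiscount rules.\n"
    print(repeat_metadata(body, {"module": "pricing", "owner": "finance"}))
```

Running it as the last step of a build like the ones discussed earlier in the thread would keep the repetition out of the hand-edited sources, so SMEs only maintain the metadata once per module.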