The problem I kept hitting
Every time I went deep on a topic with ChatGPT, one tangent would
poison the whole thread. You ask a follow-up question, and suddenly
your entire conversation context is contaminated with an irrelevant
detour. The LLM loses the plot.
The standard workaround? Open a new chat. Paste context manually.
Repeat. That's not a solution, that's giving up.
I wanted branches — real ones. Not tabs. Not separate threads you
manage yourself. Branches that inherit the right context automatically
and stay isolated from each other.
So I built ContextTree.
What it does
ContextTree is a node-based visual canvas for LLM conversations.
Every message is a node. Branching is a first-class action, not a
workaround.
The core invariant: a child node only inherits its direct parent
lineage — never siblings, never cousins. No cross-contamination.
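The invariant above can be sketched with plain parent pointers. This is a minimal illustration, not ContextTree's actual data model: the `Node` class, the `lineage` helper, and the sample messages are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical message node; only the parent link matters here."""
    node_id: str
    message: str
    parent: Optional["Node"] = None


def lineage(node: Node) -> list[Node]:
    """Return the path from the root to `node`, following parent links only."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = current.parent
    path.reverse()  # root first, the node itself last
    return path


root = Node("root", "Explain transformers")
a = Node("a", "How does attention work?", parent=root)
b = Node("b", "Tangent: history of RNNs", parent=root)  # sibling of a
a1 = Node("a1", "Show the softmax step", parent=a)

# Context for a1 is root -> a -> a1; the sibling tangent b never appears.
context_ids = [n.node_id for n in lineage(a1)]
```

Because the walk only ever follows `parent`, siblings and cousins are unreachable by construction rather than filtered out afterwards.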
But the feature that surprised me most during development is what
each node carries independently:
- Its own LLM model (GPT-4o on one branch, Gemini Flash on another)
- Its own custom system prompt (scoped to that node and its children)
- Its own advanced settings: temperature, max output tokens, history mode, last K messages, context budget in tokens, external context chunk count
This means on one canvas you can have a general assistant node, fork
into a strict legal-persona branch with a lawyer system prompt and
tight context budget, then fork again into a summarizer with low
temperature. Three personalities, zero interference, one visual graph.
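Here is one way the per-node configuration might look, as a sketch under stated assumptions. The field names, the default values, and the model identifier strings are hypothetical. The only behavior taken from the description above is that a system prompt applies to a node and its children, which this example models by walking up to the nearest ancestor that sets one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeSettings:
    """Hypothetical per-node settings; defaults are illustrative only."""
    model: str
    system_prompt: Optional[str] = None  # None means "use the nearest ancestor's"
    temperature: float = 0.7
    max_output_tokens: int = 1024
    history_mode: str = "last_k"
    last_k_messages: int = 10
    context_budget_tokens: int = 4000
    external_chunk_count: int = 3


@dataclass
class ConfiguredNode:
    name: str
    settings: NodeSettings
    parent: Optional["ConfiguredNode"] = None


def effective_system_prompt(node: ConfiguredNode) -> Optional[str]:
    """Walk up the lineage and return the first system prompt that is set."""
    current = node
    while current is not None:
        if current.settings.system_prompt is not None:
            return current.settings.system_prompt
        current = current.parent
    return None


general = ConfiguredNode(
    "general",
    NodeSettings(model="gpt-4o", system_prompt="You are a helpful assistant."),
)
legal = ConfiguredNode(
    "legal",
    NodeSettings(
        model="gpt-4o",
        system_prompt="You are a cautious lawyer.",
        context_budget_tokens=1500,  # tight budget for the strict branch
    ),
    parent=general,
)
summarizer = ConfiguredNode(
    "summarizer",
    NodeSettings(model="gemini-flash", temperature=0.1),  # no prompt of its own
    parent=legal,
)
brainstorm = ConfiguredNode(
    "brainstorm",
    NodeSettings(model="gemini-flash", temperature=1.0),
    parent=general,  # sibling of legal, so the lawyer prompt never reaches it
)
```

In this sketch the summarizer picks up the lawyer prompt from its parent, while the sibling `brainstorm` branch falls back to the general assistant prompt. Every other setting stays local to the node that declares it.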
The hardest design decision: context inheritance
The honest rule in the codebase:
A child node never reads the parent's live state: no shared LangGraph
state, no reads of the parent's current summary after the fork
moment. Each node evolves independently.
However, ancestry-scoped vector search lets a child retrieve
relevant snippets from any ancestor's history, capped at the
fork point. Branches inherit knowledge, not state.
This distinction took a while to nail. "Knowledge not state" is the
mental model that made the architecture clean. If you want hard
isolation, set SIMILAR_CONTEXT_LIMIT=0 per node.
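To make "knowledge, not state" concrete, here is a toy sketch of ancestry-scoped retrieval with a fork-point cap. It is not the real implementation: word overlap stands in for vector similarity, and `Branch`, `fork`, `retrieve`, and the `similar_context_limit` field (mirroring SIMILAR_CONTEXT_LIMIT) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    fork_index: int = 0  # how many parent messages existed at fork time
    messages: list[str] = field(default_factory=list)
    similar_context_limit: int = 3  # 0 disables ancestor retrieval


def fork(parent: Branch, name: str, similar_context_limit: int = 3) -> Branch:
    """Create a child and record the parent's history length as the fork point."""
    return Branch(
        name,
        parent=parent,
        fork_index=len(parent.messages),
        similar_context_limit=similar_context_limit,
    )


def visible_ancestor_messages(node: Branch) -> list[str]:
    """Collect each ancestor's messages, truncated at the fork point on this path."""
    visible = []
    child = node
    while child.parent is not None:
        visible.extend(child.parent.messages[: child.fork_index])
        child = child.parent
    return visible


def overlap_score(query: str, text: str) -> float:
    """Jaccard word overlap, a stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def retrieve(node: Branch, query: str) -> list[str]:
    """Return the top matching ancestor snippets, or nothing when the limit is 0."""
    if node.similar_context_limit <= 0:
        return []
    scored = [(overlap_score(query, m), m) for m in visible_ancestor_messages(node)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [m for _, m in ranked[: node.similar_context_limit]]


root = Branch("root")
root.messages += ["The contract renewal date is March 1", "Payment terms are net 30"]
child = fork(root, "legal")
root.messages.append("The renewal date moved to April 15")  # written after the fork
isolated = fork(root, "isolated", similar_context_limit=0)

hits = retrieve(child, "what is the renewal date")
```

The child can still find the March 1 snippet, but the April 15 update written after the fork is invisible to it, and the branch with a limit of 0 retrieves nothing at all.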
What I'm still figuring out
- Prompt stack order — should users be able to reorder layers?
- Is per-node system prompt enough, or do people want per-node RAG sources pinned differently?
- The multi-LLM branching UX — is it obvious enough what's happening?
Try it
Demo: CONTEXTTREE
Video walkthrough: https://youtu.be/AqmICcc26VI
Built solo. Early stage. Brutal feedback welcome — especially from
anyone who's built multi-agent or prompt engineering tooling.

Top comments (1)
Per-branch model/prompt/context is a genuinely smart canvas design - it makes the routing decision visual and explorable instead of buried in config, which is exactly how people SHOULD think about multi-model work: this branch needs the cheap fast model, that one needs the heavy reasoner, and you can see the tradeoff laid out. Most tools force one model for the whole session; letting each branch pick its own is the right mental model for how real multi-model pipelines actually work.
The thing that'd make it even more powerful: surface the cost/latency per branch on the canvas, so the visual routing decision also shows the economic consequence (this branch on Opus costs 10x that one on Haiku). Then it's not just "which model" but "is the quality worth the cost here," made visible. That per-branch-routing-with-cost-visibility is essentially the internal model of Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - different steps, different models, cost-aware - you've built a visual front-end for the same idea. Really cool tool. Do branches show their per-call cost yet, or is it focused on the model/prompt config? The cost overlay would turn it into a routing-optimization canvas, not just a config one.