Same prompt, two models, different outputs. No tooling was actually showing me where they diverged.
So I built tokenflame. It gives you entropy heatmaps, tokenizer diffs, divergence markers, and token-by-token replay. One command, one HTML file.
pip install tokenflame
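For a rough idea of what "entropy" and "divergence marker" mean at the token level, here is a minimal sketch in plain Python. It is not tokenflame's API or implementation; the function names `token_entropy` and `first_divergence` are hypothetical, and it assumes you already have each model's next-token probabilities and decoded token lists from somewhere.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of one next-token probability distribution.

    High entropy means the model was unsure at this position.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

def first_divergence(tokens_a, tokens_b):
    """Index of the first position where two token sequences differ.

    Returns None when the sequences are identical.
    """
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    # One sequence is a strict prefix of the other.
    if len(tokens_a) != len(tokens_b):
        return min(len(tokens_a), len(tokens_b))
    return None

# Hypothetical outputs from two models for the same prompt.
model_a = ["The", " cat", " sat", " down"]
model_b = ["The", " cat", " slept", " there"]
print(first_divergence(model_a, model_b))
print(token_entropy([0.5, 0.5]))
```

A heatmap is then just these per-position entropy values mapped to colors, with the divergence index marking where the two outputs stop agreeing.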
Top comments (1)
a token-level diff between two LLMs is genuinely useful tooling, model selection is mostly vibes otherwise. that kind of visibility is what makes routing decisions in Moonshift defensible: agents build + deploy + market a SaaS overnight, and picking the right model per step matters for cost + quality. nice tool. first run's free if you ever want a real workload to test it against.