This is the third post of the Building TinyAgent series, where we are building a small agent from scratch in Node.js with no frameworks, just the API cal...
Nice way to make the state visible. The messages array gets really interesting in voice, where every turn you keep is latency you pay at synthesis time. We ended up summarizing everything older than a few turns into one system-side note and keeping only the recent turns verbatim, which felt wrong until we measured that response quality barely moved and time-to-first-token dropped noticeably. The array is a budget, not just a log.
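A minimal sketch of the compaction described above, assuming the summary text comes from a separate model call that is not shown. The helper names `splitHistory` and `compact`, the `keepRecent` default, and the wording of the note are all hypothetical, not taken from the post or from any library.

```javascript
// Hypothetical helper: split history into an older part (to be summarized)
// and a recent part (kept verbatim).
function splitHistory(messages, keepRecent = 6) {
  let cut = Math.max(0, messages.length - keepRecent);
  // Move the cut earlier until the verbatim part starts on a user turn,
  // so the trimmed array still begins with a user message.
  while (cut > 0 && messages[cut].role !== "user") cut--;
  return { older: messages.slice(0, cut), recent: messages.slice(cut) };
}

// Hypothetical helper: fold a summary of the older turns into the system prompt.
// `summary` is assumed to come from a separate summarization call.
function compact(systemPrompt, messages, summary, keepRecent = 6) {
  const { older, recent } = splitHistory(messages, keepRecent);
  // Nothing old enough to summarize: return the inputs unchanged.
  if (older.length === 0) return { system: systemPrompt, messages };
  const note = `Summary of earlier conversation:\n${summary}`;
  return { system: `${systemPrompt}\n\n${note}`, messages: recent };
}
```

The cut is only aligned to a user turn here. A real agent would probably also need to avoid cutting between a tool call and its result, which this sketch does not handle.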
Thanks @realmarcuschen
That's a great analysis. How much difference did you observe in cost and output rendering time?
The messages array abstraction works fine for simple chatbots but starts breaking when you need agents that maintain state across tool calls. The moment you have parallel tool execution or need to inject system context mid-conversation, the linear array model gets awkward fast. Most agent frameworks end up building a graph on top of it anyway.
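To make the parallel case concrete, here is a rough sketch of how concurrent tool results could be flattened back into the linear array. The `runToolsInParallel` name, the `registry` object, and the `{ id, name, input }` call shape are hypothetical. The `tool_result` block shape follows Anthropic's tool-use format as I understand it, so treat it as an assumption and adapt it to your provider.

```javascript
// Hypothetical sketch: run every requested tool call concurrently, then
// flatten the results back into ONE user message, in the original call order.
async function runToolsInParallel(toolCalls, registry) {
  const settled = await Promise.allSettled(
    toolCalls.map(async (call) => {
      const tool = registry[call.name];
      if (!tool) throw new Error(`Unknown tool: ${call.name}`);
      return tool(call.input);
    })
  );
  // allSettled preserves input order, so index i still matches toolCalls[i],
  // no matter which tool finished first.
  const content = settled.map((result, i) => ({
    type: "tool_result",
    tool_use_id: toolCalls[i].id,
    content:
      result.status === "fulfilled"
        ? String(result.value)
        : String(result.reason?.message ?? result.reason),
    is_error: result.status === "rejected",
  }));
  return { role: "user", content };
}
```

The concurrency lives entirely inside this one step. The array only ever sees a single appended message, which is part of why branching or dependent tool calls tend to need extra structure on top.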
Thanks @mininglamp
Nice addition! I will definitely include this point in the upcoming post of the series on tool calls.
Exactly the questions I had today. Thanks for this write-up + the cool illustrations 👏
Thanks @yvem
Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.
We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.
Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.
We hope you understand and take care to follow our guidelines going forward!
Great series! 👏
The caching section is the part most agent tutorials skip entirely. Worth adding: you're not limited to a single cache marker. Anthropic gives you up to 4 cache_control breakpoints, so a common layout is one on the tools, one on the system prompt, one on the last message. Since a read checks for the longest matching cached prefix, append-only history keeps hitting the cache without you re-marking every turn. The gotcha you nailed is the real one though! Any edit before a breakpoint (a window drop, a summary swap) busts everything after it, which is exactly why window and caching fight each other.
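The three-marker layout mentioned above could look roughly like this. It is a sketch that only builds the request body and does not call the API. The helper names (`markLast`, `toBlocks`, `buildCachedRequest`) and the `max_tokens` value are hypothetical, and the `cache_control: { type: "ephemeral" }` field follows Anthropic's prompt caching format as I understand it, so check the current docs before relying on it.

```javascript
const EPHEMERAL = { type: "ephemeral" };

// Return a copy of `items` with a cache_control marker on the last element.
function markLast(items) {
  if (items.length === 0) return items;
  const last = { ...items[items.length - 1], cache_control: EPHEMERAL };
  return [...items.slice(0, -1), last];
}

// Normalize string content into block form so a marker can be attached.
function toBlocks(content) {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}

// Hypothetical builder: one breakpoint on the tools, one on the system
// prompt, one on the last message (3 of the 4 allowed breakpoints).
function buildCachedRequest({ model, systemPrompt, tools, messages }) {
  const lastIdx = messages.length - 1;
  return {
    model,
    max_tokens: 1024, // arbitrary value for the sketch
    tools: markLast(tools),
    system: markLast([{ type: "text", text: systemPrompt }]),
    // Only the final message is copied and marked; earlier ones pass through,
    // and the caller's stored history is never mutated.
    messages: messages.map((m, i) =>
      i === lastIdx ? { ...m, content: markLast(toBlocks(m.content)) } : m
    ),
  };
}
```

Because the marker is applied at build time and the stored history stays unmarked, each turn simply rebuilds the request with the marker on the new last message while the earlier prefix stays byte-identical.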
Thanks @nazar_boyko !
Your feedback is really motivating. Thanks for adding to it! :)
Thanks @lovestaco