Rudson Kiyoshi Souza Carvalho

Posted on May 5

TERSE Tool Catalog (TTC): Cut Tool Catalog Token Usage by 66.6% in Your AI Agents

#llm #mcp #token #terse

If you’ve ever built or worked with AI agents that use tools via the Model Context Protocol (MCP), you’ve probably felt the pain that nobody talks about out loud:

The tool catalog is eating your entire context window and budget.

A single tool defined in MCP JSON Schema typically consumes 100–270 tokens. With 50 tools installed, you’re already spending 5,000–13,500 tokens before the user even writes their first message.
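The overhead arithmetic above can be reproduced with a short sketch. The function name `catalog_overhead` is illustrative, and the per-tool figures are simply the 100–270 range quoted in the text, not new measurements.

```python
def catalog_overhead(num_tools: int, tokens_per_tool: int) -> int:
    """Tokens spent on tool definitions before any user input arrives."""
    return num_tools * tokens_per_tool

# Range quoted in the text: 100 to 270 tokens per MCP tool definition.
low = catalog_overhead(50, 100)
high = catalog_overhead(50, 270)
print(f"50 tools: {low:,} to {high:,} tokens per request")
```

Because the catalog is resent on every request, this cost is multiplied by the number of calls the agent makes.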

This isn’t just expensive — it actively hurts performance:

Higher cost on every single request
Lower tool-selection accuracy as the catalog grows (attention dilution)
Less room for actual user instructions, memory, or reasoning

The good news? There’s a clean, elegant solution: TERSE Tool Catalog (TTC).

The Problem with Today’s MCP JSON Schema

The current MCP format was designed for machine-to-machine execution contracts, not for LLM reasoning. As a result:

There is no explicit trigger condition (WHEN) — the LLM has to guess from a free-form description string.
There is no error contract (ERR) — the model has no idea what to do when a tool fails.
There is no retrieval taxonomy (TAGS) — dynamic tool retrieval (RAG over tools) becomes painful.
Verbose parameter descriptions add noise with almost zero signal for the LLM.

The result is high cost + mediocre tool selection.

Introducing the TERSE Tool Catalog (TTC)

TTC is an official extension of the TERSE Format — a specification for dense, deterministic, human-and-machine-readable representations optimized for LLMs.

It is not just a compression of MCP JSON. It is a semantic reformulation of the tool contract.

TTC keeps everything the LLM actually needs for execution, condenses the free-form description into a one-line PURPOSE, and adds three fields that MCP is missing (WHEN, ERR, and TAGS):

PURPOSE — clear one-line intent
WHEN — explicit semantic trigger (the most important field for selection)
ERR — declared failure modes
TAGS — taxonomy for semantic grouping and retrieval

Measured result: 66.6% fewer tokens in aggregate across a 10-tool benchmark, with net information gain.

TTC Syntax — Clean and Simple

TOOL <tool-id>
  PURPOSE: <one-line description of what the tool does>
  IN: <param1>:<type>, <param2>:<type>?
  OUT: <return-type>
  ERR: <error1> | <error2> | <error3>
  WHEN: <natural language trigger condition>
  TAGS: <tag1>, <tag2>, <tag3>

Supported Types

string, int, float, bool
array[string], array[int], etc.
object, any

The ? suffix marks an optional parameter.
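To make the syntax concrete, here is a minimal parsing sketch. It is not the reference implementation: the `Param` and `Tool` classes and the `parse_params` and `parse_tool` functions are hypothetical names, and the parser assumes one `KEY: value` pair per line and no commas inside type names.

```python
from dataclasses import dataclass, field

@dataclass
class Param:
    name: str
    type: str
    optional: bool = False

@dataclass
class Tool:
    tool_id: str
    fields: dict[str, str] = field(default_factory=dict)
    params: list[Param] = field(default_factory=list)

def parse_params(spec: str) -> list[Param]:
    """Parse an IN value such as 'to:string, cc:string?'."""
    params = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, type_ = item.partition(":")
        optional = type_.endswith("?")  # the ? suffix marks an optional parameter
        params.append(Param(name.strip(), type_.rstrip("?").strip(), optional))
    return params

def parse_tool(text: str) -> Tool:
    """Parse one TOOL block into its id, raw fields, and typed parameters."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, *body = lines
    keyword, _, tool_id = header.partition(" ")
    if keyword != "TOOL" or not tool_id:
        raise ValueError(f"expected 'TOOL <tool-id>', got {header!r}")
    tool = Tool(tool_id.strip())
    for ln in body:
        # Split on the first colon only, so 'OUT: message_id:string' keeps its type.
        key, sep, value = ln.partition(":")
        if not sep:
            raise ValueError(f"expected 'KEY: value', got {ln!r}")
        tool.fields[key.strip()] = value.strip()
    tool.params = parse_params(tool.fields.get("IN", ""))
    return tool

example = """
TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
"""
tool = parse_tool(example)
print(tool.tool_id, [p.name for p in tool.params if p.optional])
```

Keeping the raw fields in a dictionary means unknown or future fields pass through untouched, which seems a reasonable default for a format that plans extensions.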

Real-World Example: `gmail_send_email`

MCP JSON Schema (208 tokens):

{
  "name": "gmail_send_email",
  "description": "Sends an email message via the Gmail API to one or more recipients...",
  "input_schema": { ... }  // very verbose
}

TTC (55 tokens):

TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication

Same semantic content. 73.6% fewer tokens. And the LLM now has structured fields to make much better decisions.

Real Benchmark (10 Production Tools)

Tool	JSON Schema (tokens)	TTC (tokens)	Reduction
gmail_send_email	208	55	73.6%
gmail_read_inbox	121	52	57.0%
drive_list_files	141	53	62.4%
calendar_create_event	262	78	70.2%
slack_send_message	206	69	66.5%
github_create_issue	269	84	68.8%
...	...	...	...
TOTAL (10 tools)	1948	650	66.6%

Projection at scale:

50 tools → ~9,740 tokens (JSON Schema) → ~3,250 tokens (TTC)
100 tools → ~19,480 tokens (JSON Schema) → ~6,500 tokens (TTC)

Savings at 100 tools: ~13,000 tokens per request
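The table and the projection can be checked with a few lines of arithmetic. This sketch uses only the figures shown above; the helper names `reduction_pct` and `project` are illustrative, and the projection assumes the 10-tool average holds linearly for larger catalogs.

```python
# Token counts (JSON Schema, TTC) for the rows shown in the benchmark table.
ROWS = {
    "gmail_send_email": (208, 55),
    "gmail_read_inbox": (121, 52),
    "drive_list_files": (141, 53),
    "calendar_create_event": (262, 78),
    "slack_send_message": (206, 69),
    "github_create_issue": (269, 84),
}
# Totals over all 10 benchmark tools, as reported in the TOTAL row.
TOTAL_JSON, TOTAL_TTC, BENCH_TOOLS = 1948, 650, 10

def reduction_pct(before: int, after: int) -> str:
    """Percentage of tokens removed, formatted to one decimal place."""
    return f"{100 * (before - after) / before:.1f}%"

def project(num_tools: int) -> tuple[int, int]:
    """Scale the 10-tool totals linearly to a catalog of num_tools tools."""
    return (TOTAL_JSON * num_tools // BENCH_TOOLS,
            TOTAL_TTC * num_tools // BENCH_TOOLS)

for name, (before, after) in ROWS.items():
    print(f"{name}: {reduction_pct(before, after)}")
print("total:", reduction_pct(TOTAL_JSON, TOTAL_TTC))
print("50 tools:", project(50))
print("100 tools:", project(100))
```

At 100 tools the difference is 19,480 - 6,500 = 12,980 tokens, which the text rounds to ~13,000. Note that 66.6% is the reduction of the totals, not the mean of the per-tool percentages.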

Why TTC Works So Well

It follows the core TERSE principles:

Maximum information density per token
Determinism (same input → same output)
Human + machine readability
Full composability (tools → servers → agent context)

And it adds exactly what LLMs need for better reasoning:

WHEN becomes the primary discriminator for tool selection
ERR enables graceful degradation and fallback strategies
TAGS makes dynamic tool retrieval (RAG over tools) trivial
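One way an agent runtime might use a declared ERR line is sketched below. The recovery action names and the `FALLBACKS` table are hypothetical; TTC itself only declares the failure modes and does not prescribe how to react to them.

```python
# Hypothetical mapping from declared error codes to recovery actions.
FALLBACKS = {
    "auth_failed": "ask_user_to_reauthenticate",
    "quota_exceeded": "retry_later",
    "invalid_recipient": "ask_user_to_correct_input",
}

def parse_err(err_value: str) -> list[str]:
    """Split an ERR value such as 'a | b | c' into error codes."""
    return [code.strip() for code in err_value.split("|") if code.strip()]

def plan_recovery(err_value: str, raised: str) -> str:
    """Pick a recovery action for a failure, given the tool's declared ERR line."""
    if raised not in parse_err(err_value):
        return "abort_and_report"  # failure the tool never declared
    return FALLBACKS.get(raised, "abort_and_report")

declared = "auth_failed | quota_exceeded | invalid_recipient"
print(plan_recovery(declared, "quota_exceeded"))
print(plan_recovery(declared, "timeout"))
```

The point of the sketch is that a declared error contract lets the fallback be chosen by lookup instead of being improvised from an opaque error string.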

How to Use It in Your Agent Context

At the start of a conversation (or via dynamic retrieval), you inject:

TOOLS v1.0 [3/47]
  MCP gmail v1.2
    TOOL gmail_send_email
      ...
  MCP google_drive v2.0
    TOOL drive_read_file
      ...

With semantic tool retrieval, you inject only the 3–5 most relevant tools per request. Context cost is then bounded by the number of tools retrieved rather than by the size of your total catalog.
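The retrieval step can be sketched with plain keyword overlap standing in for real semantic search. This is a simplification: a production setup would more likely use embeddings. Only the `gmail_send_email` entry comes from the article; the WHEN and TAGS values for the other two tools, the stopword list, and the scoring weights are assumptions made for illustration.

```python
import re

# Small illustrative catalog. Only the first entry is taken from the article;
# the other WHEN/TAGS values are hypothetical.
CATALOG = [
    {"id": "gmail_send_email",
     "when": "user wants to send or compose an email",
     "tags": ["gmail", "email", "communication"]},
    {"id": "drive_list_files",
     "when": "user wants to browse or find files in Drive",
     "tags": ["drive", "files", "storage"]},
    {"id": "calendar_create_event",
     "when": "user wants to schedule a meeting or event",
     "tags": ["calendar", "event", "scheduling"]},
]

STOPWORDS = {"a", "an", "the", "to", "or", "in", "user", "wants", "about"}

def words(text: str) -> set[str]:
    """Lowercase word set with common filler words removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def score(query: str, tool: dict) -> int:
    """Tag matches count double; WHEN matches count once (arbitrary weights)."""
    q = words(query)
    tag_hits = len(q & {t.lower() for t in tool["tags"]})
    when_hits = len(q & words(tool["when"]))
    return 2 * tag_hits + when_hits

def retrieve(query: str, catalog: list[dict], k: int = 3) -> list[str]:
    """Return the ids of up to k tools with a non-zero score, best first."""
    ranked = sorted(catalog, key=lambda t: score(query, t), reverse=True)
    return [t["id"] for t in ranked[:k] if score(query, t) > 0]

print(retrieve("send an email to Ana about the meeting", CATALOG, k=1))
```

Only the TTC blocks for the returned ids would then be injected into the context, so the injected size depends on `k` and not on the length of `CATALOG`.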

Reference Converter (Python)

A ready-to-use reference implementation is available:

github.com/RudsonCarvalho/terse-format

It converts MCP JSON Schema → TTC with sensible defaults. For production use, you simply add explicit annotations for OUT, ERR, WHEN, and TAGS on the server side.
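For intuition, the conversion step might look like the sketch below. This is not the reference implementation from the repository: `to_ttc`, `ttc_type`, and `TYPE_MAP` are hypothetical names, and the input is a made-up schema whose parameters mirror the IN line of the `gmail_send_email` example (the article's actual schema is elided). It emits only the fields that can be derived from the JSON Schema; OUT, ERR, WHEN, and TAGS would still need explicit annotations.

```python
# Assumed mapping from JSON Schema types to TTC types.
TYPE_MAP = {
    "string": "string",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
    "object": "object",
}

def ttc_type(schema: dict) -> str:
    """Map a JSON Schema property to a TTC type, falling back to 'any'."""
    kind = schema.get("type")
    if kind == "array":
        items = schema.get("items") or {}
        return "array[" + ttc_type(items) + "]"
    return TYPE_MAP.get(kind, "any")

def to_ttc(tool: dict) -> str:
    """Render the schema-derivable part of a TTC block."""
    # Accept both spellings: MCP uses 'inputSchema', the article's example 'input_schema'.
    schema = tool.get("inputSchema") or tool.get("input_schema") or {}
    required = set(schema.get("required", []))
    params = []
    for name, prop in schema.get("properties", {}).items():
        suffix = "" if name in required else "?"
        params.append(f"{name}:{ttc_type(prop)}{suffix}")
    # Naive default: the first sentence of the description becomes PURPOSE.
    purpose = tool.get("description", "").split(".")[0].strip()
    name = tool["name"]
    return "\n".join([
        f"TOOL {name}",
        f"  PURPOSE: {purpose}",
        "  IN: " + ", ".join(params),
    ])

# Hypothetical input for illustration only.
mcp_tool = {
    "name": "gmail_send_email",
    "description": "Sends an email message via the Gmail API.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
            "cc": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}
print(to_ttc(mcp_tool))
```

A parameter is marked optional whenever it is absent from the schema's `required` list, which matches how JSON Schema treats properties by default.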

Planned Future Extensions

EXAMPLE block — input/output examples for few-shot learning
COST annotation — estimated token/latency cost per call
CHAIN annotation — tool dependencies and composition patterns
ALIAS field — alternative trigger phrases
AUTH annotation — required OAuth scopes

Conclusion

The TERSE Tool Catalog is not just a token-saving trick. It is designed to improve agent quality as well: better tool selection, better error handling, and native support for semantic tool retrieval.

If you work with agents, MCP, LangGraph, CrewAI, AutoGen, or any modern agentic framework, TTC is worth trying today.

Links

📄 Full spec (Zenodo): https://doi.org/10.5281/zenodo.19869007

💻 GitHub: https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc

🌐 Landing page: https://rudsoncarvalho.github.io/terse-format/

📦 TERSE Format (parent spec): https://doi.org/10.5281/zenodo.19058364

Top comments (4)

Harjot Singh • May 31

Cutting tool-catalog token usage is attacking a cost most people don't even see. The tool definitions sit in the prompt on every single call whether or not the agent uses them, so a fat catalog is a flat tax on every request, and it scales with the number of tools, not the work done. A terser catalog is pure margin: you pay less per call for the same capability. The deeper insight your title hints at is that the catalog is also a context-quality problem, not just cost: a huge verbose tool list doesn't only cost tokens, it makes the model choose worse, because more options and more noise dilute attention on the right tool. So trimming it usually helps accuracy and cost together, which is the best kind of optimization. The thing I'd watch is the floor: terse can't become ambiguous. The tool name and signature still have to carry enough meaning for the model to pick correctly, so the art is minimum tokens that preserve unambiguous selection, not just shortest. Trim the always-present overhead, but keep each tool legible enough to choose right. That cut-the-flat-tax-without-losing-clarity instinct is core to how I think about cost in Moonshift. Did the terser catalog also improve tool-selection accuracy, or purely the token bill?

Rudson Kiyoshi Souza Carvalho • Jun 10

Both, but for structural reasons, not as a side effect of compression. TTC attacks selection quality through three mechanisms that operate on the context itself.
First, WHEN turns selection from inference into matching. With MCP, the model has to reconstruct the trigger condition from a free-form description; with TTC, the discriminator is declared. Less reasoning spent on "what is this tool for?" means more attention on "is this the right tool now?"
Second, exactly the attention-dilution point you raised: a 66% smaller catalog isn't just cheaper, it's a cleaner signal-to-noise ratio over the same capability set. The tokens removed were mostly redundant parameter prose, noise by your own definition.
Third, and this is where it compounds: TAGS makes RAG-over-tools trivial, so at scale the model never even sees 50 tools... it sees the 3–5 relevant ones. That doesn't mitigate attention dilution; it removes it. Catalog size stops being a variable in selection quality at all.
On the floor: fully agree, and it's why TTC is a semantic reformulation rather than maximal compression. PURPOSE + WHEN + typed signature are the legibility floor; below that, you trade a token tax for a selection tax. Formal accuracy numbers (MCP vs. TTC, same model, same tasks) are the next thing I'm publishing.
