SEN LLC

Posted on May 29

Try the Tech Radar #1 — TOON Cuts JSON Token Cost by 71% for LLM Context

#json #llm #webdev #javascript

Thoughtworks Technology Radar Vol 34 (April 2026) put TOON (Token-Oriented Object Notation) in the Assess ring. It's a JSON alternative designed for the moments when "fewer tokens" matters more than "more conventional" — typically LLM context windows. I built a 500-line vanilla JS JSON ⇔ TOON converter with a side-by-side token estimator to see what's actually doing the work. Spoiler: for typical API-response shapes, −70% is normal. Here's the breakdown.

🌐 Demo: https://sen.ltd/portfolio/toon-converter/
📦 GitHub: https://github.com/sen-ltd/toon-converter

What's the problem?

When you feed an API response to an LLM ("parse this and tell me what changed"), the JSON token cost is worse than you'd guess. Ten users:

{
  "results": [
    { "id": 101, "name": "Alice Tanaka", "role": "admin", "active": true },
    { "id": 102, "name": "Bob Yamada",   "role": "user",  "active": true },
    // ... 8 more
  ]
}

The thing chewing tokens isn't the data. It's "id", "name", "role", "active" showing up ten times. Each row's repetition of those keys costs a BPE tokenizer roughly 4–8 tokens, so ten rows spend 40+ tokens on column names that you, the LLM, and the reader could all have agreed on once.
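One rough way to see how much of a payload is key text rather than data is to count the characters spent on quoted keys in the serialized JSON. The sketch below is illustrative only and not part of the converter: `keyOverhead` is a hypothetical helper, and it counts characters rather than tokens, so the ratio is a proxy at best.

```javascript
// Hypothetical helper: share of serialized characters spent on repeated keys.
// It counts characters, not tokens, so treat the result as a rough proxy.
function keyOverhead(rows) {
  const total = JSON.stringify(rows).length;
  let keyChars = 0;
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      keyChars += JSON.stringify(key).length + 1; // quoted key plus its colon
    }
  }
  return { total, keyChars, share: keyChars / total };
}

const users = [
  { id: 101, name: "Alice Tanaka", role: "admin", active: true },
  { id: 102, name: "Bob Yamada", role: "user", active: true },
];
const overhead = keyOverhead(users);
console.log(
  `${overhead.keyChars} of ${overhead.total} characters are keys ` +
    `(${Math.round(overhead.share * 100)}%)`
);
```

Because every row carries the same keys, the key share stays roughly constant as rows are added, which is why declaring the keys once pays off more as the array grows.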

What TOON does

TOON formats uniform arrays of objects as a CSV-like table:

results[10]{id,name,role,active}:
  101,Alice Tanaka,admin,true
  102,Bob Yamada,user,true
  ...

A header line declares columns once: results[10]{id,name,role,active}:
Each row is comma-separated raw values
Strings get quoted only when they have to (commas inside, special chars)

545 JSON tokens → 159 TOON tokens. −71% on this payload. Scale to 1000 rows and the ratio gets sharper, not worse.
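The percentage follows directly from the two counts. A minimal sketch of the arithmetic, where `savingsPercent` is a hypothetical helper name:

```javascript
// Relative saving between two token counts, as a percentage of the baseline.
function savingsPercent(baselineTokens, encodedTokens) {
  return (1 - encodedTokens / baselineTokens) * 100;
}

const saving = savingsPercent(545, 159);
console.log(saving.toFixed(1)); // "70.8", which rounds to the 71% quoted above
```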

The implementation hinge — `isUniformObjectArray`

The whole conversion gates on one predicate: can this array be rendered as a table?

function isUniformObjectArray(arr) {
  if (arr.length < 1) return false;
  if (!arr.every((v) => v !== null && typeof v === "object" && !Array.isArray(v))) {
    return false;
  }
  const cols = Object.keys(arr[0]);
  if (cols.length === 0) return false;
  for (const row of arr) {
    const k = Object.keys(row);
    if (k.length !== cols.length) return false;
    for (let i = 0; i < cols.length; i++) {
      if (k[i] !== cols[i]) return false;            // same keys, same order
      const v = row[cols[i]];
      if (v !== null && typeof v === "object") return false;  // scalars only
    }
  }
  return true;
}

Three rules:

Every element is an object (no nulls, no arrays mixed in).
Every row has the exact same keys in the exact same order.
Every cell value is a scalar — you can't fit a nested object into a CSV row.

If any rule fails, fall back to a regular indented block per element. In practice the typical API response sails through.
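To make the three rules concrete, the snippet below runs the predicate against a few small arrays. The function body is repeated from above so the snippet runs on its own; the sample arrays are made up for illustration.

```javascript
// The predicate from the article, repeated so this snippet is self-contained.
function isUniformObjectArray(arr) {
  if (arr.length < 1) return false;
  if (!arr.every((v) => v !== null && typeof v === "object" && !Array.isArray(v))) {
    return false;
  }
  const cols = Object.keys(arr[0]);
  if (cols.length === 0) return false;
  for (const row of arr) {
    const k = Object.keys(row);
    if (k.length !== cols.length) return false;
    for (let i = 0; i < cols.length; i++) {
      if (k[i] !== cols[i]) return false; // same keys, same order
      const v = row[cols[i]];
      if (v !== null && typeof v === "object") return false; // scalars only
    }
  }
  return true;
}

// Illustrative inputs, one per rule plus two passing cases.
const cases = {
  uniform: [{ id: 1, role: "admin" }, { id: 2, role: "user" }],        // true
  nullCell: [{ id: 1, role: null }, { id: 2, role: "user" }],          // true: null is a scalar here
  mixedElements: [{ id: 1 }, null],                                    // false: rule 1
  reorderedKeys: [{ id: 1, role: "admin" }, { role: "user", id: 2 }],  // false: rule 2
  nestedCell: [{ id: 1, tags: ["a"] }, { id: 2, tags: ["b"] }],        // false: rule 3
};

for (const [name, arr] of Object.entries(cases)) {
  console.log(name, isUniformObjectArray(arr));
}
```

Note that the order check is strict: two rows with the same keys in a different insertion order fall back to the indented form.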

Table render

function tableArray(key, arr, indent) {
  const pad = INDENT.repeat(indent);
  const cols = Object.keys(arr[0]);
  const head = key
    ? `${pad}${formatKey(key)}[${arr.length}]{${cols.join(",")}}:`
    : `${pad}[${arr.length}]{${cols.join(",")}}:`;
  const rowPad = INDENT.repeat(indent + 1);
  const rows = arr.map((row) => {
    const cells = cols.map((c) => formatCell(row[c]));
    return `${rowPad}${cells.join(",")}`;
  });
  return [head, ...rows].join("\n");
}

{col1,col2,...} is the schema declaration; the [N] count is a hint to the LLM that N rows follow. Every row is just cells.join(","), so the structural noise of {, }, ", : is gone.
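Here is a runnable sketch of the renderer on a two-row sample. `INDENT`, `formatKey`, and `formatCell` are not shown at this point in the article, so the versions below are simplified stand-ins (two-space indent, identity key formatting, no quoting); the real helpers may behave differently.

```javascript
// Stand-ins for helpers defined elsewhere in the project; these are assumptions.
const INDENT = "  ";                                      // assumed: two spaces per level
const formatKey = (key) => key;                           // assumed: plain keys need no quoting
const formatCell = (v) => (v === null ? "" : String(v));  // simplified: no quote handling

// tableArray as shown in the article.
function tableArray(key, arr, indent) {
  const pad = INDENT.repeat(indent);
  const cols = Object.keys(arr[0]);
  const head = key
    ? `${pad}${formatKey(key)}[${arr.length}]{${cols.join(",")}}:`
    : `${pad}[${arr.length}]{${cols.join(",")}}:`;
  const rowPad = INDENT.repeat(indent + 1);
  const rows = arr.map((row) => {
    const cells = cols.map((c) => formatCell(row[c]));
    return `${rowPad}${cells.join(",")}`;
  });
  return [head, ...rows].join("\n");
}

const out = tableArray(
  "results",
  [
    { id: 101, name: "Alice Tanaka", role: "admin", active: true },
    { id: 102, name: "Bob Yamada", role: "user", active: true },
  ],
  0
);
console.log(out);
// results[2]{id,name,role,active}:
//   101,Alice Tanaka,admin,true
//   102,Bob Yamada,user,true
```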

Cell-level quote elision

For string cells, drop the quotes if the value looks safe:

function formatCell(v) {
  if (v === null) return "";
  if (typeof v === "string") {
    if (v === "") return '""';
    if (/^[A-Za-z0-9_\-./@+ ]+$/.test(v) && !v.includes(",")) return v;
    return JSON.stringify(v);
  }
  return formatScalar(v);
}

"admin" → admin, "Alice Tanaka" → Alice Tanaka. With BPE, dropping the opening and closing " saves 2 tokens per quoted value. Ten rows × two string columns × 2 tokens = 40 tokens. The "hello, world" case stays quoted because the comma would split the row.

null cells become empty (,,). JSON's literal null is 4 chars (~1 token); empty is 0.
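The snippet below exercises those rules on a few values. `formatScalar` is not shown in the article, so `String()` is used as an assumed stand-in; `formatCell` is repeated from above.

```javascript
// formatScalar is not shown in the article; String() is an assumed stand-in.
const formatScalar = (v) => String(v);

// formatCell as shown in the article.
function formatCell(v) {
  if (v === null) return "";
  if (typeof v === "string") {
    if (v === "") return '""';
    if (/^[A-Za-z0-9_\-./@+ ]+$/.test(v) && !v.includes(",")) return v;
    return JSON.stringify(v);
  }
  return formatScalar(v);
}

console.log(formatCell("admin"));        // admin
console.log(formatCell("Alice Tanaka")); // Alice Tanaka
console.log(formatCell("hello, world")); // "hello, world"
console.log(formatCell(""));             // ""
console.log(formatCell(null));           // prints an empty line
console.log(formatCell("101") === formatCell(101)); // true
```

The last line is worth noting if round-tripping matters: with the rules exactly as written here, the string "101" and the number 101 produce the same cell text, so a decoder reading only the cell cannot tell them apart. Whether the full converter handles this case elsewhere is not covered by the snippet.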

Token estimation without bundling a tokenizer

A real BPE counter like gpt-tokenizer needs ~1 MB of vocabulary data. That's the wrong cost profile for a "paste JSON, see the savings" tool. So I went with a heuristic:

export function estimateTokens(text) {
  let total = 0;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (/[A-Za-z0-9]/.test(c)) {
      // Alphanumeric run: roughly 1 token per 4 chars
      let j = i;
      while (j < text.length && /[A-Za-z0-9]/.test(text[j])) j++;
      total += Math.max(1, Math.ceil((j - i) / 4));
      i = j;
    } else if (c === " " || c === "\t") {
      // Whitespace run: 1 token
      while (i < text.length && (text[i] === " " || text[i] === "\t")) i++;
      total += 1;
    } else if (c === "\n") {
      total += 1; i++;
    } else {
      total += 1; i++; // punctuation: usually 1 token each
    }
  }
  return total;
}

Against real GPT-4o / Claude tokenizers on JSON-shaped text this is accurate to about ±5–10%. The verdict to surface isn't "your prompt will cost exactly N cents" — it's "this format is ~70% cheaper than the other for the same payload." That comparison stays reliable inside the error bar.
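To illustrate the relative comparison, the sketch below runs the estimator on a two-row payload in both encodings. The function is repeated from above with `export` dropped so it runs as a plain script; the TOON text is written by hand to match the table format shown earlier, and the exact counts are heuristic estimates, not tokenizer output.

```javascript
// estimateTokens as shown in the article, minus the `export` keyword.
function estimateTokens(text) {
  let total = 0;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (/[A-Za-z0-9]/.test(c)) {
      // Alphanumeric run: roughly 1 token per 4 chars
      let j = i;
      while (j < text.length && /[A-Za-z0-9]/.test(text[j])) j++;
      total += Math.max(1, Math.ceil((j - i) / 4));
      i = j;
    } else if (c === " " || c === "\t") {
      // Whitespace run: 1 token
      while (i < text.length && (text[i] === " " || text[i] === "\t")) i++;
      total += 1;
    } else {
      total += 1; i++; // newline or punctuation: 1 token each
    }
  }
  return total;
}

const payload = {
  results: [
    { id: 101, name: "Alice Tanaka", role: "admin", active: true },
    { id: 102, name: "Bob Yamada", role: "user", active: true },
  ],
};
const jsonText = JSON.stringify(payload, null, 2);
const toonText = [
  "results[2]{id,name,role,active}:",
  "  101,Alice Tanaka,admin,true",
  "  102,Bob Yamada,user,true",
].join("\n");

const jsonTokens = estimateTokens(jsonText);
const toonTokens = estimateTokens(toonText);
const savedPct = Math.round((1 - toonTokens / jsonTokens) * 100);
console.log({ jsonTokens, toonTokens, savedPct });
```

With only two rows the saving is smaller than on the ten-row sample, since the header line is amortized over fewer rows.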

When TOON doesn't win

The tool ships a "Mixed types" preset to show the failure mode:

{
  "title": "Mixed type sample",
  "counts": [1, 2, 3, 5, 8, 13, 21],
  "flags": { "ready": true, "locked": false }
}

Savings on this: 2–5%. The reasons are clear once you see the algorithm:

Arrays are short → key repetition wasn't the cost
Nesting is shallow → no structural noise to compress
No object arrays → no table form to use

TOON's leverage is uniform arrays of length ≥ ~5. Enterprise API responses, log streams, extraction results, search hits — all great. App config, package.json, CMS settings — barely worth converting.
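That rule of thumb can be turned into a quick pre-check before converting anything. The sketch below is an assumption-laden illustration, not something the tool is described as doing: `findTableCandidates` is a hypothetical helper, the uniformity check is a compact restatement of the predicate shown earlier, and the default threshold of 5 rows is just the rough figure from the paragraph above.

```javascript
// Compact restatement of the uniformity check described earlier.
function isUniformObjectArray(arr) {
  if (arr.length < 1) return false;
  const isObj = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
  if (!arr.every(isObj)) return false;
  const cols = Object.keys(arr[0]);
  if (cols.length === 0) return false;
  return arr.every((row) => {
    const k = Object.keys(row);
    return (
      k.length === cols.length &&
      cols.every((c, i) => k[i] === c && (row[c] === null || typeof row[c] !== "object"))
    );
  });
}

// Hypothetical helper: collect paths of arrays that could use the table form
// and are long enough for the header line to pay for itself.
function findTableCandidates(value, minRows = 5, path = "$", found = []) {
  if (Array.isArray(value)) {
    if (value.length >= minRows && isUniformObjectArray(value)) found.push(path);
    value.forEach((v, i) => findTableCandidates(v, minRows, `${path}[${i}]`, found));
  } else if (value !== null && typeof value === "object") {
    for (const [k, v] of Object.entries(value)) {
      findTableCandidates(v, minRows, `${path}.${k}`, found);
    }
  }
  return found;
}

const mixed = {
  title: "Mixed type sample",
  counts: [1, 2, 3, 5, 8, 13, 21],
  flags: { ready: true, locked: false },
};
const userList = {
  results: Array.from({ length: 5 }, (_, i) => ({ id: 101 + i, role: "user" })),
};

console.log(findTableCandidates(mixed));    // []
console.log(findTableCandidates(userList)); // [ '$.results' ]
```

An empty result suggests the payload is in the "barely worth converting" group.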

Architecture

toon.js     ← Pure JSON → TOON converter (22 tests)
tokens.js   ← Heuristic token estimator (8 tests)
presets.js  ← 5 sample payloads
app.js      ← UI glue

Neither toon.js nor tokens.js touches document or window. 30 unit tests under node --test cover scalars, flat objects, nested objects, primitive arrays (inline + multiline), uniform table arrays (basic, embedded commas, null cells, non-uniform fallback), complex API-shape payloads, and the estimator itself.

Try it

Demo: https://sen.ltd/portfolio/toon-converter/
GitHub: https://github.com/sen-ltd/toon-converter

Pick "User list" or "Log lines" to see the big savings. Pick "Mixed types" to see why TOON isn't a universal answer.

Takeaways

JSON's token cost is dominated by key repetition in arrays of records. Cut the repetition and most of the bill goes away.
TOON wins by declaring columns once. Uniform arrays of objects compress 50–80%. Everything else compresses a bit or not at all.
Cell-level quote elision is a small but real additional win (~2 tokens per safe string).
Heuristic token counting beats bundling a 1 MB BPE vocabulary if all you need is relative comparison.
Test the pure code at the seam. 30 Node tests gave a deterministic floor under the converter before any browser rendering existed.

This is OSS portfolio #247 from SEN LLC (Tokyo), the first entry in our "Try the Tech Radar" series — picking blips from the Thoughtworks Technology Radar and shipping a small demo for each. Next up: Typst (Trial). We ship continuously: https://sen.ltd/portfolio/

Top comments (4)

Harjot Singh • May 31

This is a great catch from the Radar, and the insight underneath it is one most people miss: JSON's structural overhead (repeated keys on every object, braces, quotes) is pure token tax when the consumer is an LLM, not a parser. A 1,000-row API response repeats the same field names 1,000 times, and you pay frontier prices for every one. Stripping that to a token-oriented shape for the moment-of-context-injection is exactly the right place to optimize, because it's a lossless transform that touches only the LLM boundary, not your actual data model. The 70% is real and it compounds: it's both a direct cost cut and a context-budget win (more real data fits before you hit the window). The one thing I'd verify carefully is round-trip fidelity and whether the model parses TOON as reliably as JSON: a token saving that costs you accuracy because the model misreads the format is a false economy. If the model reads it cleanly, it's free money. That trim-the-context-without-losing-signal discipline is core to how I think about cost in Moonshift. Did you test whether models actually extract from TOON as accurately as JSON, or just measure the token delta?

SEN LLC • Jun 1

Honestly, you've hit the part I didn't measure. The tool surfaces the token delta (a heuristic counter vs the JSON baseline), but I didn't benchmark extraction accuracy. The 71% is the encoding cost reduction, not "the model performs equally well on the encoded form."

The question of whether models read TOON as cleanly as JSON splits two ways that matter for cost:

Read-side: TOON is structurally close to CSV (header line + comma-separated rows), and modern frontier models are heavily trained on tabular text. Informal probes with Claude / GPT-4o on extraction-from-context tasks suggest accuracy holds up well, because the model treats the table as a table, not as a novel format. This is where 70% is closest to free money: context budget compounds the direct cost win, exactly as you put it.

Write-side: different story. If you ask a model to emit TOON instead of JSON, accuracy drops noticeably for the same reason JSON Schema mode beats freeform: the model has far more training signal for JSON output than for any custom serialisation, and structural mistakes show up at the row boundary. For pipelines where the LLM produces structured output (which is most pipelines now), I'd keep JSON on the output side and only use TOON on the context-injection side.

So the honest answer: I measured the token delta, observed read-side accuracy informally, and wouldn't recommend TOON as an output format without a real benchmark. The right experiment is a fixed extraction task — SQuAD-style, or function-call argument extraction — with the same payload encoded both ways, measuring F1 by format across two or three model families. I haven't published that and don't know of one for the Vol 34 TOON dialect specifically. Which is one reason it sits at Assess, not Adopt.

The Moonshift framing — "trim the context without losing signal" — is exactly the discipline this needs. The trim is easy to measure; the "without losing signal" half is the work.