A new data format for LLMs called TOON (Token-Oriented Object Notation) has been getting some crazy attention in the last couple of weeks.
First - I think it’s a very cool concept and it looks like there’s an impressive engineering effort in a very short time here.
And having said that - I think it’s worth trying to understand what are the use cases where TOON is actually the best option.
(I’m not saying anything that doesn’t appear in the TOON repo itself, btw - just trying to take a closer look at a fairly over-hyped situation)
What is TOON?
From the repo:
Token-Oriented Object Notation is a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
The basic idea sounds very good: Send the same info as JSON, but it’ll use fewer tokens and it won’t lose accuracy.
The high-level benchmarks on the README look promising, too:
TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
Should you use TOON?
But…
These are aggregate results.
The question you should ask yourself is not:
“Is TOON better than other formats on average?”
But rather:
“Is TOON better than others for my specific use case?”
Or more generally, as an industry:
“What are the use cases for which TOON is the best choice?”
What do the benchmarks show?
The TOON repo doesn’t pretend it’s a perfect match for everything:
TOON’s sweet spot is uniform arrays of objects (multiple fields per row, same structure across items).
It also discusses
- The format’s similarity to CSV (and it really is very similar when the data is tabular)
- A useful list of “When Not to Use TOON”, which mentions, for example, that purely tabular or highly nested data have better alternatives.
Several benchmarks are provided in the docs.
My take from them is that indeed, there are limited use cases where TOON appears to be the best option.
Tabular data: CSV
Here are a couple of examples:
Uniform employee records
| Format | Accuracy | Tokens | Correct/Total |
|---|---|---|---|
csv |
72.0% | 2,352 | 118/164 |
toon |
73.8% | 2,518 | 121/164 |
Time-series analytics data
| Format | Accuracy | Tokens | Correct/Total |
|---|---|---|---|
csv |
73.3% | 1,406 | 88/120 |
toon |
72.5% | 1,548 | 87/120 |
These are pretty close.
CSV is more compact, and the accuracy difference isn’t significant either way (and probably depends on the model).
Larger complex data: Compact JSON
Semi-uniform event logs
| Format | Accuracy | Tokens | Correct/Total |
|---|---|---|---|
json-compact |
63.3% | 4,819 | 76/120 |
toon |
57.5% | 5,799 | 69/120 |
Cases where TOON was better:
Deeply nested configuration
On the one hand, this is a very interesting scenario, because it’s really free-form data, and TOON’s accuracy outperforms all other formats.
On the other hand, this is really small data. One configuration sample of less than 1,000 tokens.
So it’s difficult to know whether or not this will be consistent.
And also - when it’s this small, the token savings aren’t that significant.
| Format | Accuracy | Tokens | Correct/Total |
|---|---|---|---|
json-compact |
92.2% | 574 | 107/116 |
toon |
95.7% | 666 | 111/116 |
yaml |
91.4% | 686 | 106/116 |
json-pretty |
94.0% | 932 | 109/116 |
xml |
92.2% | 1,018 | 107/116 |
E-commerce orders with nested structures
This is the sweet spot mentioned in the docs.
If this is your use case, TOON looks promising.
| Format | Accuracy | Tokens | Correct/Total |
|---|---|---|---|
toon |
81.1% | 7,232 | 133/164 |
json-compact |
76.8% | 6,794 | 126/164 |
To give a concrete sense, this is the structure of each of the orders:
export interface Order {
orderId: string
customer: {
id: number
name: string
email: string
phone: string
}
items: {
sku: string
name: string
quantity: number
price: number
}[]
subtotal: number
tax: number
total: number
status: string
orderDate?: string
createdAt?: string
}
What can we learn from this?
From an industry perspective, I can’t help but wonder if the improvement here really justifies another format. This immortal xkcd always makes a good point:
But putting that aside, the bottom line is that there are a couple of use cases where TOON shows promise - but it’s not the best solution in the most common cases (and it’s not claiming to be).
What should you do?
I’d say these are the defaults:
┌──────────────────────────────────────────────────────────────────┐
│ Is structured data tokens/accuracy actually a bottleneck for you?│
└────────────────┬─────────────────────────────────────────────────┘
│
┌────────┴────────┐
│ │
No Yes
│ │
▼ ▼
┌─────────────┐ ┌─────────────────────────┐
│ Don't worry │ │ What's your data shape? │
│ about it │ └───────────┬─────────────┘
└─────────────┘
│
┌───────────┼───────────┬──────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌───────────┐ ┌──────────┐ ┌────────────┐ ┌──────────┐
│ Tabular │ │ Highly │ │ Arrays of │ │Free-form │
│ data │ │ nested │ │ objects │ │ complex │
│ without │ │ data │ │ (not flat, │ │ data │
│ nesting │ │ │ │not deeply │ │ │
│ │ │ │ │ nested) │ │ │
└─────┬─────┘ └────┬─────┘ └─────┬──────┘ └────┬─────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌───────────┐ ┌────────────────┐
│ CSV │ │ Compact │ │ Consider │ │ Maybe TOON, │
│ │ │ JSON │ │ TOON │ │ test carefully │
└─────────┘ └──────────┘ └───────────┘ └────────────────┘
And either way:
- Test the make-sense alternatives on your actual data!
- Balance the improvements against the complexity of adding another format to your stack.
