
Data Format Selection for Multi-Agent LLM Systems: An Empirical Analysis of Token Efficiency

We evaluated TOON as a candidate format for our agent messaging protocol and compared it with JSON, YAML, and CSV across datasets with varying structure. The results showed clear patterns. TOON delivered meaningful token savings on simple and moderately nested data but failed on deeply nested hierarchies. CSV was slightly more efficient than TOON on flat tables, although its flattening step introduced ambiguity when structure mattered. JSON remained the only dependable choice for deep nesting. The broader conclusion was straightforward. No format performs well across every structure. Real gains come from matching the data to the format rather than expecting a single representation to handle every use case.

By Patrick Bartsch · November 16, 2025 · 15 min read

[Figure: JSON-to-TOON conversion example, 41% token reduction]

At WYZER, we're building a sophisticated multi-agent AI platform where expert knowledge meets intelligent automation. Like many teams working with LLMs at scale, we found token costs becoming a significant operational constraint. Every interaction between our agents, every data exchange, every API call added up.

We investigated TOON (Token-Oriented Object Notation) for our Agent-to-Agent (A2A) protocol and ran controlled experiments to measure its actual performance. The results are nuanced: TOON achieved 51.6-53.8% token savings on simple tabular structures, but completely failed on deeply nested data.

More importantly, independent benchmarks show that while CSV sometimes edges out TOON by a few percentage points of token efficiency, format selection depends critically on data structure: CSV excels on flat tables, TOON on moderately nested data, and JSON remains necessary for deep hierarchies.

This isn't promotional content. It's an honest technical assessment with empirical data to guide format selection decisions.

Experimental Methodology

We tested four data serialization formats (JSON, YAML, TOON, CSV) across three dataset types using the Claude API for token counting:

Dataset                   | Structure Type    | Examples Tested | JSON Baseline (avg tokens)
AgentResults              | Simple tabular    | 10              | 580
KnowledgeGaps             | Moderately nested | 10              | 1,148
BookstoreBorrowingHistory | Deeply nested     | 3               | 2,176

Each dataset was serialized in all four formats and token-counted via Claude Haiku 4.5 API to ensure measurement consistency.
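The measurement loop can be sketched as follows. This is an illustration, not our actual harness: the token counter below is a crude approximation invented for the sketch, whereas the numbers reported in this post came from the Claude Haiku 4.5 tokenizer via the API and will differ.

```python
import csv
import io
import json
import re

def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts runs of word
    # characters plus individual punctuation marks. Actual Claude
    # token counts come from the API and will not match this.
    return len(re.findall(r"\w+|[^\w\s]", text))

rows = [
    {"order_id": 1001, "customer": "Alice", "total": 124.98},
    {"order_id": 1002, "customer": "Bob", "total": 45.50},
    {"order_id": 1003, "customer": "Carol", "total": 89.99},
]

# JSON serialization: every key repeats for every record.
json_text = json.dumps({"orders": rows}, indent=2)

# CSV serialization: keys appear once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["order_id", "customer", "total"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

print(rough_token_count(json_text), rough_token_count(csv_text))
```

Even this rough counter shows the pattern the real measurements confirm: repeated keys and brackets dominate JSON's cost on flat records.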

Results: Token Efficiency by Data Structure

Simple Tabular Structures (AgentResults)

For uniform tabular data with consistent schemas:

Format | Avg Tokens | Token Savings vs JSON
JSON   | 580        | baseline
YAML   | 427        | 26.4%
TOON   | 268        | 53.8%
CSV    | 252        | 56.6%

Key finding: CSV outperforms TOON by 2.8 percentage points for token efficiency on simple structures. TOON's structural overhead (array length declarations, field schemas) provides no advantage when data is already flat and uniform.

Moderately Nested Structures (KnowledgeGaps)

For data with one level of nesting and repeating sub-structures:

Format | Avg Tokens | Token Savings vs JSON
JSON   | 1,148      | baseline
YAML   | 844        | 26.5%
TOON   | 555        | 51.6%
CSV    | 511        | 55.4%

Key finding: CSV again shows 3.8 percentage point advantage in token efficiency. However, CSV's flattening of nested structures creates ambiguity that can degrade LLM parsing accuracy (detailed below).

Deeply Nested Structures (BookstoreBorrowingHistory)

For hierarchical data with multiple nesting levels:

Format | Avg Tokens        | Token Savings vs JSON
JSON   | 2,176             | baseline
YAML   | 1,619             | 25.6%
TOON   | not representable | N/A
CSV    | not representable | N/A

Critical finding: TOON and CSV both fail completely on deeply nested structures. Neither format can represent complex hierarchical relationships without either losing information or introducing flattening hacks that negate the efficiency gains.

For this use case, JSON remains the only practical choice, with YAML offering modest (~26%) savings.

The Critical Factor: LLM Comprehension Accuracy

Token efficiency alone is misleading. In multi-agent systems, parsing errors cost far more than the tokens you save. When an agent misinterprets data, the entire workflow suffers: retry loops, failed tasks, incorrect outputs.

Independent research demonstrates that comprehension varies significantly by data structure and format choice.

TOON Benchmark Results: Nested Data Comprehension

The official TOON benchmarks tested 209 questions across four LLM architectures on datasets with varying nesting complexity:

Overall Accuracy (209 questions):

  • TOON: 73.9% accuracy
  • JSON: 69.7% accuracy
  • Improvement: +4.2 percentage points

Per-Model Results:

Model            | TOON Accuracy | JSON Accuracy | TOON Improvement
Claude Haiku 4.5 | 59.8%         | 57.4%         | +2.4 pp
Gemini 2.5 Flash | 87.6%         | 77.0%         | +10.6 pp
GPT-5 Nano       | 90.9%         | 89.0%         | +1.9 pp
Grok 4 Fast      | 57.4%         | 55.5%         | +1.9 pp

Critical limitation: CSV was excluded from 100 of 209 questions (48%) because it cannot represent nested structures without losing information.

CSV Comprehension: It Depends on Structure

Research shows CSV comprehension is highly context-dependent:

Scenario 1: Pure Flat Tabular Data

A GetCrux.ai study tested Claude 3.5 Sonnet on flat tabular question-answering:

Format | Accuracy | Cost per 1K queries
CSV    | 95.45%   | $0.23
JSON   | 87.50%   | $0.41

Key finding: For pure flat tables, CSV achieves roughly 8 percentage points higher accuracy and 44% lower cost than JSON.

Scenario 2: Mixed Text and Tabular Data

An independent benchmark comparing 11 formats on mixed-structure tasks found:

Format                | Accuracy
Markdown (key: value) | 60.7%
JSON                  | 57.1%
CSV                   | 44.3%

Key finding: CSV was among the weakest performers when data includes narrative text or non-tabular elements.

Understanding Format-Specific Strengths

The research reveals clear patterns:

CSV excels when:

  • Data is purely tabular with no nesting
  • All records share identical schema
  • Queries involve simple lookup or aggregation
  • Example: "What is the price in row 5?"

TOON excels when:

  • Data has one level of nesting with repeating sub-structures
  • Structure needs to be preserved for comprehension
  • Queries involve parent-child relationships
  • Example: "What items belong to order 1005?"

JSON remains necessary when:

  • Data has deep nesting (3+ levels)
  • Structure is irregular or highly dynamic
  • Universal compatibility is required

Why Format Choice Affects Comprehension

Let's examine concrete examples showing how structure impacts LLM parsing.

Example 1: Array Length Ambiguity

CSV format (implicit count):

order_id,customer,total
1001,Alice,124.98
1002,Bob,45.50
1003,Carol,89.99

Question: "How many orders are there?"

The LLM must count rows. With streaming data or partial contexts, this is error-prone. TOON benchmark shows 59.8% accuracy on count queries with CSV.

TOON format (explicit count):

orders[3]:
  order_id,customer,total
  1001,Alice,124.98
  1002,Bob,45.50
  1003,Carol,89.99

The [3] declaration explicitly states the count. TOON benchmarks show 73.9% accuracy on the same queries—a 14-point improvement.
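As a sketch of how an encoder produces that explicit count, here is a minimal, hypothetical tabular-TOON emitter for uniform, flat rows. It writes the `[N]{fields}` header once and one delimited row per record; the real TOON spec additionally covers nesting, quoting, and alternative delimiters, which this sketch skips.

```python
def to_toon_table(name, rows):
    """Encode uniform, flat records in TOON's tabular layout with an
    explicit length declaration. Minimal sketch only: no quoting or
    nesting support."""
    fields = list(rows[0])
    # Header line: name[count]{field1,field2,...}:
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        # One comma-delimited value row per record, indented.
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

orders = [
    {"order_id": 1001, "customer": "Alice", "total": 124.98},
    {"order_id": 1002, "customer": "Bob", "total": 45.5},
    {"order_id": 1003, "customer": "Carol", "total": 89.99},
]
print(to_toon_table("orders", orders))
```

The count in the header is derived from the data at encode time, so the consuming model never has to count rows itself.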

Example 2: Nested Structure Flattening

CSV format (flattened, loses relationships):

order_id,customer,item_name,item_qty,item_price
1001,Alice,Widget,2,9.99
1001,Alice,Gadget,1,105.00
1002,Bob,Doohickey,3,15.00

Question: "What is Alice's total order value?"

The LLM must:

  1. Identify all rows where customer == "Alice"
  2. Calculate qty × price for each row
  3. Sum the results

This multi-step reasoning is error-prone. GetCrux.ai shows CSV achieves 95.45% accuracy only when questions map directly to single cells—not aggregations across rows.
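Written out in Python, the same three steps look like this. The code is trivial, but it is exactly the chain of operations the LLM must perform implicitly from the flattened text:

```python
import csv
import io

flat_csv = """\
order_id,customer,item_name,item_qty,item_price
1001,Alice,Widget,2,9.99
1001,Alice,Gadget,1,105.00
1002,Bob,Doohickey,3,15.00
"""

def order_total(csv_text, customer):
    # Step 1: filter rows by customer; step 2: multiply qty by price
    # per row; step 3: sum the products.
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["customer"] == customer:
            total += int(row["item_qty"]) * float(row["item_price"])
    return round(total, 2)

print(order_total(flat_csv, "Alice"))  # 124.98
```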

TOON format (preserves hierarchy):

orders[2]:
  - order_id: 1001
    customer: Alice
    total: 124.98
    items[2]{name,qty,price}:
      Widget,2,9.99
      Gadget,1,105.00

  - order_id: 1002
    customer: Bob
    total: 45.50
    items[1]{name,qty,price}:
      Doohickey,3,15.00

The nested structure explicitly groups items under their parent order. The LLM can directly extract total: 124.98 without calculation.

Example 3: Field Schema Boundaries

CSV format (implicit schema):

order_id,customer,items
1001,Alice,"Widget,Gadget,Doohickey"
1002,Bob,"Thingamajig"

When a field contains comma-separated values, CSV becomes ambiguous. Is "Widget,Gadget" one field or two?

TOON format (explicit schema):

orders[2]{order_id,customer,items[*]}:
  1001,Alice,[Widget,Gadget,Doohickey]
  1002,Bob,[Thingamajig]

The items[*] declaration tells the LLM that this field contains an array, eliminating parsing ambiguity.

Understanding Format Efficiency: Real Examples

Let's examine the e-commerce order scenario to see how format choice affects both tokens and comprehension.

JSON: The Verbose Standard

{
  "orders": [
    {
      "order_id": 1001,
      "customer": "Alice",
      "total": 124.98,
      "items": [
        {"sku": "A1", "name": "Widget", "qty": 2, "price": 9.99},
        {"sku": "B2", "name": "Gadget", "qty": 1, "price": 105.00}
      ]
    },
    {
      "order_id": 1002,
      "customer": "Bob",
      "total": 45.50,
      "items": [
        {"sku": "C3", "name": "Doohickey", "qty": 3, "price": 15.00},
        {"sku": "D4", "name": "Thingamajig", "qty": 1, "price": 0.50}
      ]
    }
  ]
}

Token count: ~200 tokens | Comprehension: 69.7%

JSON's explicit structure is clear but verbose. Every field name repeats for every object.

CSV: Compact but Context-Dependent

order_id,customer,total,item_sku,item_name,item_qty,item_price
1001,Alice,124.98,A1,Widget,2,9.99
1001,Alice,124.98,B2,Gadget,1,105.00
1002,Bob,45.50,C3,Doohickey,3,15.00
1002,Bob,45.50,D4,Thingamajig,1,0.50

Token count: ~80 tokens | Comprehension: 95.45% (flat queries) / 44.3% (complex queries)

CSV achieves maximum token efficiency but loses hierarchical relationships. Performance depends entirely on query type.

TOON: Structure Meets Efficiency

orders[2]:
  - order_id: 1001
    customer: Alice
    total: 124.98
    items[2]{sku,name,qty,price}:
      A1,Widget,2,9.99
      B2,Gadget,1,105.00

  - order_id: 1002
    customer: Bob
    total: 45.50
    items[2]{sku,name,qty,price}:
      C3,Doohickey,3,15.00
      D4,Thingamajig,1,0.50

Token count: ~110 tokens | Comprehension: 73.9%

TOON balances token efficiency (45% better than JSON) with explicit structure. Array declarations and field schemas create parsing guardrails while maintaining nested relationships.

The Multi-Agent Token Problem

In a multi-agent architecture, LLMs don't just process user queries—they communicate with each other. Agents exchange structured data: task assignments, context updates, results, metadata. When your system processes thousands of these exchanges daily, both token efficiency and comprehension accuracy matter enormously.

Traditional formats like JSON work, but they weren't designed for the token economy of LLMs. Every repeated key name, every redundant bracket, every verbose structure costs tokens. And in a multi-agent system, these costs multiply across every agent interaction.

But token savings mean nothing if agents misinterpret data. The research shows that format selection must match data structure:

  • Flat tables: CSV wins on both tokens and comprehension (95.45% accuracy)
  • Nested structures: TOON wins on both tokens and comprehension (73.9% accuracy)
  • Deep hierarchies: JSON remains necessary (TOON/CSV not applicable)

Evidence-Based Format Selection Guidelines

Based on our empirical testing combined with independent research, here are data-driven selection criteria:

Use CSV when:

  • Data is purely tabular with no nesting whatsoever
  • All records share identical schema with no optional fields
  • Queries are simple lookups or single-row aggregations
  • Working with traditional data analysis tools (pandas, Excel, databases)
  • Maximum token efficiency is critical (55-57% savings)
  • Interoperating with systems that require CSV format

Benchmark support: GetCrux.ai shows 95.45% accuracy + 44% cost savings vs JSON for flat tables.

Use TOON when:

  • Data has one level of nesting with repeating sub-structures
  • Working with multi-agent systems where parsing errors cascade
  • Need both structure preservation and token efficiency (51-54% savings)
  • Queries involve parent-child relationships or nested aggregations
  • Building Agent-to-Agent protocols where comprehension reliability matters

Benchmark support: TOON benchmarks show 73.9% accuracy vs JSON's 69.7% on nested data.

Use JSON when:

  • Data has deep nesting (3+ levels)
  • Interoperating with existing APIs that expect JSON
  • Working with highly irregular, non-tabular data structures
  • TOON/CSV are not applicable to your data structure
  • Need universal compatibility across ecosystems

Benchmark support: Remains the only format supporting arbitrary depth.

Use YAML when:

  • Writing configuration files for human readability
  • Structure includes extensive nested references and anchors
  • Modest token savings (~26%) are acceptable
  • Working with infrastructure-as-code tools (Kubernetes, Ansible)

Multi-Agent Architecture Insights

One Reddit discussion on multi-agent token efficiency reported similar findings: distributing tasks across specialized agent types (extractor, analyzer, answerer) with targeted context resulted in 60% token reduction with no accuracy loss.

The pattern is clear: when agents communicate with purpose-built, token-efficient formats matched to data structure, the entire system becomes more scalable.

For WYZER's A2A protocol, we now selectively apply formats based on data structure characteristics:

  • Task definitions with flat parameters → CSV (56.6% token savings + 95.45% accuracy)
  • Search results with metadata and nested relevance scores → TOON (51.6% token savings + 73.9% accuracy)
  • Knowledge fragments with deep citation hierarchies → JSON (TOON/CSV not applicable)
  • Processing outcomes with validation structures → TOON (51.6% token savings + structural clarity)
  • Analytics exports for external tools → CSV (interoperability requirement)
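A routing layer of this kind can be sketched as a simple depth check. The helper below is hypothetical, not WYZER's actual implementation, and its thresholds simply mirror the routing list above:

```python
def nesting_depth(value):
    """Count nested container levels in a JSON-like payload."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0

def select_format(payload):
    # Hypothetical routing rule: a flat table of records measures
    # depth 2 (a list of dicts of scalars); one level of repeating
    # sub-structures measures depth 4; anything deeper goes to JSON.
    depth = nesting_depth(payload)
    if depth <= 2:
        return "csv"
    if depth <= 4:
        return "toon"
    return "json"
```

Measuring depth on the payload itself, rather than trusting a declared schema, also catches irregular records that would otherwise flatten badly.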

Implementation Insights

Integrating format optimization into our A2A protocol required building conversion layers and updating our agent communication schemas. Key implementation learnings:

  1. Match format to data structure — CSV for flat, TOON for nested, JSON for deep hierarchies
  2. Prioritize comprehension for critical paths — A 3% token saving isn't worth accuracy loss
  3. Start with high-frequency exchanges where token savings and accuracy improvements compound
  4. Use schema validation to catch format errors early in development
  5. Benchmark your specific use cases since both token efficiency and comprehension vary by structure
  6. Maintain JSON fallbacks for external integrations during transition
  7. Instrument error rates to measure the real-world impact of format changes
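As an example of learning 4, a minimal pre-serialization check (a sketch, not a real schema validator) can catch records that would flatten ambiguously before they ever reach a CSV or tabular-TOON encoder:

```python
def validate_flat_rows(rows, required):
    """Check that every record is a flat dict with exactly the
    required fields, so CSV or tabular-TOON output stays unambiguous.
    A minimal stand-in for a real schema validator."""
    errors = []
    for i, row in enumerate(rows):
        if set(row) != set(required):
            errors.append(f"row {i}: fields {sorted(row)} != {sorted(required)}")
        for key, value in row.items():
            # Nested containers cannot be represented in a flat row.
            if isinstance(value, (dict, list)):
                errors.append(f"row {i}: field {key!r} is nested")
    return errors
```

Payloads that fail this check can be routed to JSON instead of being silently flattened.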

The Blunt Reality: Match Format to Structure

Data format selection is not ideological. It's engineering. The evidence from our controlled testing and independent benchmarks is unambiguous:

CSV wins for flat tabular data: 56.6% token savings + 95.45% comprehension accuracy. When data has no nesting and queries are simple lookups, CSV delivers maximum efficiency on both dimensions.

TOON wins for moderately nested data: 51-54% token savings + 73.9% comprehension accuracy. When data has one level of nesting with repeating structures, TOON's explicit schemas provide both efficiency and parsing reliability—the optimal choice for multi-agent systems.

JSON wins for deeply nested data: TOON/CSV simply cannot represent hierarchies beyond 1-2 levels without losing information. YAML offers modest savings (~26%), but JSON remains the pragmatic default.

The research is clear: CSV excels on pure tables (GetCrux: 95.45%) but fails on nested data (it was excluded from 48% of TOON benchmark questions). TOON excels on nested structures (73.9% vs JSON's 69.7%) but trails CSV on flat tables (53.8% vs CSV's 56.6% token savings).

Claims of universal superiority from any single format are misleading. The optimal choice depends entirely on your data structure characteristics and whether you optimize for flat tables, nested structures, or deep hierarchies.

Conclusion

Token efficiency in multi-agent systems isn't optional—it's foundational. But format selection must match data structure. Every agent interaction, every data exchange, every context update contributes to your operational costs and system reliability.

Our testing across actual examples with actual Claude API token counting, combined with independent comprehension benchmarks, reveals the truth:

Token Efficiency:

  • Simple structures: CSV (56.6%) > TOON (53.8%) > YAML (26.4%)
  • Moderately nested: CSV (55.4%) > TOON (51.6%) > YAML (26.5%)
  • Deeply nested: YAML (25.6%) > JSON (baseline), TOON/CSV not applicable

LLM Comprehension Accuracy:

  • Flat tables: CSV (95.45%) > JSON (87.50%)
  • Nested structures: TOON (73.9%) > JSON (69.7%)
  • Mixed structures: Markdown (60.7%) > JSON (57.1%) > CSV (44.3%)

The optimal choice:

  • Flat tables: CSV wins on both tokens and comprehension
  • Nested structures: TOON wins on both tokens and comprehension
  • Deep hierarchies: JSON remains necessary (TOON/CSV not applicable)

Neither TOON, CSV, nor JSON is a universal solution. Each is a specialized tool that excels in a specific performance envelope. If you're building multi-agent systems or processing significant structured data through LLMs, empirical testing with your actual data structures is essential. Match format to structure, not marketing claims.
