LLM Integration Guide

November 4, 2025

Best practices for using TOON with Large Language Models to maximize token efficiency and response quality.

Why TOON for LLMs?

Traditional JSON wastes tokens on structural characters:

Braces & brackets: {}, []
Repeated keys and quotes: Every key is quoted, and arrays of objects repeat every key in every element
Commas everywhere: Between all elements

TOON removes most of this redundancy, typically cutting token counts by 30-60% depending on data structure and tokenizer, while maintaining readability.

Quick Example

JSON (45 tokens with GPT-5):

{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

TOON (20 tokens with GPT-5, 56% reduction):

users[2,]{id,name}:
  1,Alice
  2,Bob

Basic Integration Patterns

1. Prompting the Model

Explicit format instruction:

Respond using TOON format (Token-Oriented Object Notation):
- Use `key: value` for objects
- Use indentation for nesting
- Use `[N]` to indicate array lengths
- Use tabular format `[N,]{fields}:` for uniform arrays

Example:
users[2,]{id,name}:
  1,Alice
  2,Bob
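
To reuse these instructions across requests, you can keep them in one constant and attach them as a system message. This is a minimal sketch: `TOON_INSTRUCTIONS` and `build_messages` are hypothetical names, and the `role`/`content` message dictionaries are the shape common chat APIs accept, not something defined by toon_format.

```python
# Hypothetical helper: keeps the TOON instructions in one place so every
# request uses identical wording.
TOON_INSTRUCTIONS = """Respond using TOON format (Token-Oriented Object Notation):
- Use `key: value` for objects
- Use indentation for nesting
- Use `[N]` to indicate array lengths
- Use tabular format `[N,]{fields}:` for uniform arrays

Example:
users[2,]{id,name}:
  1,Alice
  2,Bob"""


def build_messages(user_prompt: str) -> list:
    """Return a chat-style message list with the TOON instructions as the system message."""
    return [
        {"role": "system", "content": TOON_INSTRUCTIONS},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("List the three largest files in the report.")
```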

2. Code Block Wrapping

Always wrap TOON in code blocks for clarity:

```toon
users[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
```

This helps the model distinguish TOON from natural language.
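
When you embed TOON in a prompt yourself, a small helper keeps the fencing consistent. A sketch only; `wrap_toon` is a hypothetical name, and the `toon` fence label is simply the one used above.

```python
def wrap_toon(toon_text: str, label: str = "toon") -> str:
    """Wrap TOON text in a fenced code block (hypothetical helper).

    Trailing whitespace is stripped so the closing fence sits on its own line.
    """
    return f"```{label}\n{toon_text.rstrip()}\n```"


payload = "users[3,]{id,name,age}:\n  1,Alice,30\n  2,Bob,25\n  3,Charlie,35"
prompt = "Here is the user table:\n\n" + wrap_toon(payload) + "\n\nWho is the oldest user?"
```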

3. Validation with Length Markers

Use lengthMarker="#" for explicit validation hints:

from toon_format import encode

data = {"items": ["a", "b", "c"]}
toon = encode(data, {"lengthMarker": "#"})
# items[#3]: a,b,c

Tell the model:

"Array lengths are prefixed with #. Ensure your response matches these counts exactly."

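You can also check declared counts yourself before decoding, for example to decide whether to ask the model for a correction. The sketch below is a deliberately simplified, hypothetical checker (`find_length_mismatches` is not part of toon_format): it only handles single-line primitive arrays such as `items[#3]: a,b,c` with comma delimiters and no quoted commas. A full decoder covers far more cases.

```python
import re

# Matches single-line primitive arrays such as "items[#3]: a,b,c".
# Simplification: comma delimiter only, no quoted values containing commas.
INLINE_ARRAY = re.compile(r"^\s*([\w.]+)\[#(\d+)\]:\s*(.*)$")


def find_length_mismatches(toon_text: str) -> list:
    """Return (key, declared, actual) for each inline array whose count is wrong."""
    mismatches = []
    for line in toon_text.splitlines():
        match = INLINE_ARRAY.match(line)
        if not match:
            continue  # not an inline "[#N]" array (e.g. a tabular header)
        key, declared, values = match.group(1), int(match.group(2)), match.group(3)
        actual = len(values.split(",")) if values.strip() else 0
        if actual != declared:
            mismatches.append((key, declared, actual))
    return mismatches


print(find_length_mismatches("items[#3]: a,b\ntags[#2]: x,y"))  # [('items', 3, 2)]
```
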
Measuring Token Savings

Before integrating TOON with your LLM application, measure actual savings for your data:

Basic Measurement

from toon_format import estimate_savings

# Your actual data structure
user_data = {
    "users": [
        {"id": 1, "name": "Alice", "email": "alice@example.com", "active": True},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "active": True},
        {"id": 3, "name": "Charlie", "email": "charlie@example.com", "active": False}
    ]
}

# Compare formats
result = estimate_savings(user_data)
print(f"JSON: {result['json_tokens']} tokens")
print(f"TOON: {result['toon_tokens']} tokens")
print(f"Savings: {result['savings_percent']:.1f}%")
# JSON: 112 tokens
# TOON: 68 tokens
# Savings: 39.3%

Cost Estimation

Calculate actual dollar savings based on your API usage:

from toon_format import estimate_savings

# Your typical prompt data
prompt_data = {
    "context": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this data"}
    ],
    "data": [
        {"id": i, "value": f"Item {i}", "score": i * 10}
        for i in range(1, 101)  # 100 items
    ]
}

result = estimate_savings(prompt_data["data"])

# Example rate: $0.01 per 1K tokens (substitute your model's actual pricing)
cost_per_1k = 0.01
json_cost = (result['json_tokens'] / 1000) * cost_per_1k
toon_cost = (result['toon_tokens'] / 1000) * cost_per_1k

print(f"JSON cost per request: ${json_cost:.4f}")
print(f"TOON cost per request: ${toon_cost:.4f}")
print(f"Savings per request: ${json_cost - toon_cost:.4f}")
print(f"Savings per 10,000 requests: ${(json_cost - toon_cost) * 10000:.2f}")

Detailed Comparison

Get a formatted report for documentation or analysis:

from toon_format import compare_formats

api_response = {
    "status": "success",
    "results": [
        {"id": 1, "score": 0.95, "category": "A"},
        {"id": 2, "score": 0.87, "category": "B"},
        {"id": 3, "score": 0.92, "category": "A"}
    ],
    "total": 3
}

print(compare_formats(api_response))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            78             189
# TOON            48             112
# ────────────────────────────────────────────────
# Savings: 30 tokens (38.5%)

Integration Pattern

Use token counting in production to monitor savings:

import json
from toon_format import encode, count_tokens

def send_to_llm(data, use_toon=True):
    """Send data to LLM with optional TOON encoding."""
    if use_toon:
        formatted = encode(data)
        format_type = "TOON"
    else:
        formatted = json.dumps(data, indent=2)
        format_type = "JSON"

    tokens = count_tokens(formatted)
    print(f"[{format_type}] Sending {tokens} tokens")

    # Your LLM API call here
    # response = client.chat.completions.create(...)

    return formatted, tokens

# Example usage
data = {"items": [{"id": 1}, {"id": 2}]}
formatted, token_count = send_to_llm(data, use_toon=True)
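
To monitor savings across many requests rather than per call, you can accumulate the counts. A minimal sketch, assuming you already obtain JSON and TOON token counts for each payload (for example with `count_tokens`); `SavingsTracker` is a hypothetical class, not part of toon_format.

```python
from dataclasses import dataclass


@dataclass
class SavingsTracker:
    """Accumulates JSON vs. TOON token counts across requests (hypothetical helper)."""

    json_tokens: int = 0
    toon_tokens: int = 0
    requests: int = 0

    def record(self, json_count: int, toon_count: int) -> None:
        self.json_tokens += json_count
        self.toon_tokens += toon_count
        self.requests += 1

    @property
    def savings_percent(self) -> float:
        # Share of JSON tokens avoided by sending TOON instead.
        if self.json_tokens == 0:
            return 0.0
        return 100.0 * (self.json_tokens - self.toon_tokens) / self.json_tokens


tracker = SavingsTracker()
tracker.record(json_count=112, toon_count=68)
tracker.record(json_count=78, toon_count=48)
print(f"{tracker.requests} requests, {tracker.savings_percent:.1f}% fewer tokens")
```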

Real-World Use Cases

Use Case 1: Structured Data Extraction

Prompt:

Extract user information from the text below. Respond in TOON format.

Text: "Alice (age 30) works at ACME. Bob (age 25) works at XYZ."

Format:
users[N,]{name,age,company}:
  ...

Model Response:

users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ

Processing:

from toon_format import decode

response = """users[2,]{name,age,company}:
  Alice,30,ACME
  Bob,25,XYZ"""

data = decode(response)
# {'users': [
#   {'name': 'Alice', 'age': 30, 'company': 'ACME'},
#   {'name': 'Bob', 'age': 25, 'company': 'XYZ'}
# ]}
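
Decoded tabular rows are plain dictionaries, so you can map them onto typed records before the rest of your application uses them. A sketch using a standard-library dataclass; the `User` type and `to_users` helper are illustrative names mirroring the extraction prompt above.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int
    company: str


def to_users(decoded: dict) -> list:
    """Build User records from decoded rows.

    Raises KeyError if a field is missing and ValueError if age cannot be
    converted to an integer.
    """
    return [
        User(name=str(row["name"]), age=int(row["age"]), company=str(row["company"]))
        for row in decoded["users"]
    ]


# The dict shown above, as returned by decode(response)
decoded = {"users": [
    {"name": "Alice", "age": 30, "company": "ACME"},
    {"name": "Bob", "age": 25, "company": "XYZ"},
]}
users = to_users(decoded)
```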

Use Case 2: Configuration Generation

Prompt:

Generate a server configuration in TOON format with:
- app: "myapp"
- port: 8080
- database settings (host, port, name)
- enabled features: ["auth", "logging", "cache"]

Model Response:

app: myapp
port: 8080
database:
  host: localhost
  port: 5432
  name: myapp_db
features[3]: auth,logging,cache

Processing:

config = decode(response)
# Use config dict directly in your application
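
Model-generated configuration is worth checking before use. One option, sketched here with hypothetical defaults (taken from the example response above) and a hypothetical `check_port` helper, is to merge the decoded dictionary over known-good defaults and validate the values you depend on.

```python
import copy

# Hypothetical known-good defaults; adjust to your application.
DEFAULTS = {
    "app": "myapp",
    "port": 8080,
    "database": {"host": "localhost", "port": 5432, "name": "myapp_db"},
    "features": [],
}


def merge_config(generated: dict, defaults: dict = DEFAULTS) -> dict:
    """Overlay generated values on a deep copy of the defaults, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in generated.items():
        if isinstance(value, dict) and isinstance(defaults.get(key), dict):
            merged[key] = merge_config(value, defaults[key])
        else:
            merged[key] = value
    return merged


def check_port(value: object, field: str) -> None:
    """Raise ValueError unless value is an int in the valid TCP port range."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 65535:
        raise ValueError(f"{field} must be an integer between 1 and 65535, got {value!r}")


# Stand-in for the dict returned by decode(response), here only partially filled in
config = merge_config({"port": 9000, "database": {"name": "prod_db"}})
check_port(config["port"], "port")
check_port(config["database"]["port"], "database.port")
```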

Use Case 3: API Response Formatting

Prompt:

Convert this data to TOON format for efficient transmission:

Products:
1. Widget A ($9.99, stock: 50)
2. Widget B ($14.50, stock: 30)
3. Widget C ($19.99, stock: 0)

Model Response:

products[3,]{id,name,price,stock}:
  1,"Widget A",9.99,50
  2,"Widget B",14.50,30
  3,"Widget C",19.99,0

Advanced Techniques

1. Few-Shot Learning

Provide examples in your prompt:

Convert the following to TOON format. Examples:

Input: {"name": "Alice", "age": 30}
Output:
name: Alice
age: 30

Input: [{"id": 1, "item": "A"}, {"id": 2, "item": "B"}]
Output:
[2,]{id,item}:
  1,A
  2,B

Now convert this: <your data>
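
If you keep example pairs in code, you can assemble the few-shot prompt programmatically so the examples stay consistent across requests. A sketch; `build_few_shot_prompt` is hypothetical, and the TOON outputs are hand-written copies of the examples above rather than encoder output.

```python
import json

# (input data, hand-written TOON output) pairs, copied from the prompt above.
EXAMPLES = [
    ({"name": "Alice", "age": 30}, "name: Alice\nage: 30"),
    ([{"id": 1, "item": "A"}, {"id": 2, "item": "B"}], "[2,]{id,item}:\n  1,A\n  2,B"),
]


def build_few_shot_prompt(data, examples=EXAMPLES) -> str:
    """Build a few-shot conversion prompt that ends with the data to convert."""
    parts = ["Convert the following to TOON format. Examples:", ""]
    for example_input, example_output in examples:
        parts += [f"Input: {json.dumps(example_input)}", "Output:", example_output, ""]
    parts.append(f"Now convert this: {json.dumps(data)}")
    return "\n".join(parts)


prompt = build_few_shot_prompt([{"id": 3, "item": "C"}, {"id": 4, "item": "D"}])
```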

2. Validation Instructions

Add explicit validation rules:

Respond in TOON format. Rules:
1. Array lengths MUST match actual count: [3] means exactly 3 items
2. Tabular arrays require uniform keys across all objects
3. Use quotes for: empty strings, keywords (null/true/false), numeric strings
4. Indentation: 2 spaces per level

If you cannot provide valid TOON, respond with an error message.
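
Rule 4 is easy to check mechanically before decoding. A hypothetical pre-check (not part of toon_format) that flags tab indentation and indentation that is not a multiple of two spaces:

```python
def indentation_problems(toon_text: str, indent: int = 2) -> list:
    """Describe each line whose leading whitespace breaks the indentation rule."""
    problems = []
    for number, line in enumerate(toon_text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # ignore blank lines
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append(f"line {number}: tab used for indentation")
        elif len(leading) % indent != 0:
            problems.append(f"line {number}: {len(leading)} spaces is not a multiple of {indent}")
    return problems


sample = "users[2,]{id,name}:\n  1,Alice\n   2,Bob"
print(indentation_problems(sample))
```

Tabs used as a delimiter inside tabular rows are not leading whitespace, so this check does not flag them.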

3. Delimiter Selection

Choose delimiters based on your data:

# For data with commas (addresses, descriptions)
encode(data, {"delimiter": "\t"})  # Use tab

# For data with tabs (code snippets)
encode(data, {"delimiter": "|"})   # Use pipe

# For general use
encode(data, {"delimiter": ","})   # Use comma (default)

Tell the model which delimiter to use:

"Use tab-separated values in tabular arrays due to commas in descriptions."

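The choice can be automated by scanning the strings you are about to encode. A minimal sketch of that heuristic; `choose_delimiter` and `iter_strings` are hypothetical helpers, and the preference order (comma, then tab, then pipe) simply follows the guidance above.

```python
def iter_strings(value):
    """Yield every string (keys and values) in a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield str(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def choose_delimiter(data) -> str:
    """Return the first of comma, tab, pipe that no string in data contains."""
    strings = list(iter_strings(data))
    for candidate in (",", "\t", "|"):
        if not any(candidate in s for s in strings):
            return candidate
    return ","  # all candidates occur; keep the default and rely on quoting
```

You would then call `encode(data, {"delimiter": choose_delimiter(data)})` and tell the model which delimiter was chosen.
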
Error Handling

Graceful Degradation

Always wrap TOON decoding in error handling:

from toon_format import decode, ToonDecodeError

def safe_decode(toon_str):
    try:
        return decode(toon_str)
    except ToonDecodeError as e:
        print(f"TOON decode error: {e}")
        # Fall back to asking model to regenerate
        return None

Model Error Prompting

If decoding fails, ask the model to fix it:

The TOON you provided has an error: "Expected 3 items, but got 2"

Please regenerate with correct array lengths. Original:
items[3]: a,b

Should be either:
items[2]: a,b  (fix length)
OR
items[3]: a,b,c  (add missing item)
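
The decode-and-retry cycle can be wrapped in a loop that feeds the decoder's message back to the model. This is a sketch under stated assumptions: `call_model` is a hypothetical callable standing in for your LLM client (it takes a list of chat messages and returns text), and the decoder and its exception type are passed in, so with toon_format you would supply `decode` and `ToonDecodeError`.

```python
from typing import Any, Callable


def decode_with_retry(
    call_model: Callable[[list], str],
    decode: Callable[[str], Any],
    messages: list,
    error_type: type = ValueError,
    max_attempts: int = 3,
) -> Any:
    """Call the model, decode its reply, and feed decode errors back for a corrected reply."""
    history = list(messages)  # copy so the caller's list is not modified
    last_error = None
    for _ in range(max_attempts):
        reply = call_model(history)
        try:
            return decode(reply)
        except error_type as err:
            last_error = err
            # Show the model its own output together with the decoder's message.
            history += [
                {"role": "assistant", "content": reply},
                {"role": "user", "content": (
                    f'The TOON you provided has an error: "{err}". '
                    "Please regenerate it with correct array lengths."
                )},
            ]
    raise RuntimeError(f"no valid TOON after {max_attempts} attempts") from last_error
```

Usage would look like `decode_with_retry(call_model, decode, messages, error_type=ToonDecodeError)`.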

Token Efficiency Best Practices

1. Prefer Tabular Format

Less efficient (list format):

users[3]:
  - id: 1
    name: Alice
  - id: 2
    name: Bob
  - id: 3
    name: Charlie

More efficient (tabular format):

users[3,]{id,name}:
  1,Alice
  2,Bob
  3,Charlie
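
Tabular form only applies when every element is an object with the same keys and primitive values. The sketch below shows that check and the resulting layout. It is a simplified illustration, not the toon_format encoder: it does no quoting or escaping (values must not contain commas, colons, or newlines), and it emits the `[N,]{fields}:` header style used in this guide.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(rows: list) -> bool:
    """True if rows are dicts with the same non-empty key set and only primitive values."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return False
    keys = set(rows[0])
    return bool(keys) and all(
        set(row) == keys and all(isinstance(v, PRIMITIVES) for v in row.values())
        for row in rows
    )


def format_value(value) -> str:
    # Booleans and null are spelled in lowercase, as in JSON.
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    return str(value)


def encode_tabular(name: str, rows: list, indent: str = "  ") -> str:
    """Emit a tabular block; no quoting, so values must not contain commas, colons, or newlines."""
    if not is_tabular(rows):
        raise ValueError("rows are not uniform objects with primitive values")
    fields = list(rows[0])  # header order follows the first row
    header = f"{name}[{len(rows)},]{{{','.join(fields)}}}:"
    lines = [indent + ",".join(format_value(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Charlie"}]
print(encode_tabular("users", users))
```

The printed result matches the tabular example above: the field names appear once in the header instead of once per element.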

2. Minimize Nesting

Less efficient:

data:
  metadata:
    items:
      list[2]: a,b

More efficient:

items[2]: a,b

3. Use Compact Keys

Less efficient:

user_identification_number: 123
user_full_name: Alice

More efficient:

id: 123
name: Alice
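
If your application uses long field names, you can shorten them only for the prompt and restore them after decoding. A sketch with a hypothetical mapping; `rename_keys` recursively renames dictionary keys and leaves values untouched.

```python
KEY_MAP = {"user_identification_number": "id", "user_full_name": "name"}  # hypothetical mapping
REVERSE_MAP = {short: full for full, short in KEY_MAP.items()}


def rename_keys(value, mapping: dict):
    """Return a copy of a nested dict/list structure with keys renamed via mapping."""
    if isinstance(value, dict):
        return {mapping.get(k, k): rename_keys(v, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_keys(item, mapping) for item in value]
    return value


record = {"user_identification_number": 123, "user_full_name": "Alice"}
compact = rename_keys(record, KEY_MAP)        # encode and send this to the model
restored = rename_keys(compact, REVERSE_MAP)  # apply to the decoded response
```

The mapping must be one-to-one, and short names must not collide with keys that already exist in the data, or the round trip will lose fields.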

Common Pitfalls

❌ Don't: Trust Model Without Validation

from toon_format import decode, ToonDecodeError

# BAD: No validation
response = llm.generate(prompt)
data = decode(response)  # May raise error

# GOOD: Validate and handle errors
response = llm.generate(prompt)
try:
    data = decode(response, {"strict": True})
except ToonDecodeError:
    data = None  # Retry or fall back

❌ Don't: Mix Formats Mid-Conversation

First response: JSON
Second response: TOON

Be consistent - stick to TOON throughout the conversation.

❌ Don't: Forget Quoting Rules

When the value is meant to be the string "123", a model might produce:

code: 123  # Wrong! Numeric string needs quotes

Should be:

code: "123"  # Correct

Solution: Explicitly mention quoting in prompts.
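
The cases named in this guide (empty strings, the keywords null/true/false, and numeric-looking strings) can be checked with a small function, for example to post-check model output or to build prompt examples. This is an approximation for illustration only: `needs_quotes` and `format_string` are hypothetical helpers, the extra checks for the delimiter, colons, and surrounding whitespace are assumptions of this sketch, and the TOON specification remains the authority on quoting.

```python
import re

# Numeric-looking text such as "123", "-4.5" or "1e6"; unquoted, it would decode as a number.
NUMERIC_LIKE = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")
KEYWORDS = {"null", "true", "false"}


def needs_quotes(value: str, delimiter: str = ",") -> bool:
    """Approximate TOON quoting check (a subset of the specification's rules)."""
    return (
        value == ""                          # empty string
        or value in KEYWORDS                 # would be read as null/true/false
        or NUMERIC_LIKE.match(value) is not None
        or delimiter in value                # would split a row or inline array
        or ":" in value                      # could be mistaken for a key separator
        or value != value.strip()            # surrounding whitespace may be trimmed
    )


def format_string(value: str, delimiter: str = ",") -> str:
    """Quote only when needed; escapes backslashes and double quotes only."""
    if not needs_quotes(value, delimiter):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print("code: " + format_string("123"))  # code: "123"
```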

Integration Examples

With OpenAI API

from openai import OpenAI
from toon_format import decode

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_for_toon_data(prompt):
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Respond using TOON format"},
            {"role": "user", "content": prompt}
        ]
    )

    toon_str = response.choices[0].message.content

    # Extract TOON from code blocks if wrapped
    if "```toon" in toon_str:
        toon_str = toon_str.split("```toon")[1].split("```")[0].strip()
    elif "```" in toon_str:
        toon_str = toon_str.split("```")[1].split("```")[0].strip()

    return decode(toon_str)

With Anthropic Claude API

import anthropic
from toon_format import decode

def claude_toon(prompt):
    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,  # required by the Messages API
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nRespond in TOON format (Token-Oriented Object Notation)."
        }]
    )

    toon_str = message.content[0].text

    # Remove code blocks if present
    if "```" in toon_str:
        toon_str = toon_str.split("```")[1].strip()
        if toon_str.startswith("toon\n"):
            toon_str = toon_str[5:]

    return decode(toon_str)
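
Both integration examples strip Markdown code fences with string splitting. A regular expression copes with more variants: an optional language tag, prose around the block, several blocks, or no fence at all. A sketch; `extract_toon` is a hypothetical helper you could call before `decode` in either function.

```python
import re

# A fenced block with an optional language tag on the opening line.
FENCE = re.compile(r"```[ \t]*([\w-]*)[ \t]*\n(.*?)\n?```", re.DOTALL)


def extract_toon(reply: str) -> str:
    """Return the body of the first block tagged 'toon', else the first fenced block."""
    blocks = FENCE.findall(reply)
    for tag, body in blocks:
        if tag.lower() == "toon":
            return body.strip()
    if blocks:
        return blocks[0][1].strip()
    return reply.strip()  # no fence: assume the whole reply is TOON
```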

Performance Metrics

Based on testing with GPT-5 and Claude:

Data Type	JSON Tokens	TOON Tokens	Reduction
Simple config (10 keys)	45	28	38%
User list (50 users)	892	312	65%
Nested structure	234	142	39%
Mixed arrays	178	95	47%

Typical reduction: 30-60%, depending on data structure and tokenizer (38-65% in the tests above).

Note: Comprehensive benchmarks across GPT-5, GPT-5 mini, and other models are coming soon. See the roadmap for details.
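
The Reduction column is the share of JSON tokens saved, (JSON - TOON) / JSON. Recomputing it from the table's token counts reproduces the listed percentages:

```python
# (JSON tokens, TOON tokens) from the table above.
measurements = {
    "Simple config (10 keys)": (45, 28),
    "User list (50 users)": (892, 312),
    "Nested structure": (234, 142),
    "Mixed arrays": (178, 95),
}

for label, (json_tokens, toon_tokens) in measurements.items():
    reduction = 100 * (json_tokens - toon_tokens) / json_tokens
    print(f"{label}: {reduction:.0f}%")
```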

Debugging Tips

1. Log Raw TOON

Always log the raw TOON before decoding:

from toon_format import decode, ToonDecodeError

print("Raw TOON from model:")
print(repr(toon_str))

try:
    data = decode(toon_str)
except ToonDecodeError as e:
    print(f"Decode error: {e}")

2. Test with Strict Mode

Enable strict validation during development:

decode(toon_str, {"strict": True})  # Strict validation

Disable for production if lenient parsing is acceptable:

decode(toon_str, {"strict": False})  # Lenient

3. Validate Against Schema

After decoding, validate the Python structure:

data = decode(toon_str)

# Validate structure
assert "users" in data
assert isinstance(data["users"], list)
assert all("id" in user for user in data["users"])
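
Python skips `assert` statements when run with the `-O` flag, so checks that must hold in production are safer as explicit errors. A minimal sketch of the same checks that raises `ValueError` instead; `validate_users` is a hypothetical name.

```python
def validate_users(data: object) -> list:
    """Check the decoded structure and return the users list, or raise ValueError."""
    if not isinstance(data, dict) or "users" not in data:
        raise ValueError("expected an object with a 'users' key")
    users = data["users"]
    if not isinstance(users, list):
        raise ValueError("'users' must be a list")
    for index, user in enumerate(users):
        if not isinstance(user, dict) or "id" not in user:
            raise ValueError(f"users[{index}] is missing an 'id' field")
    return users
```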

Resources

Format Specification - Complete TOON syntax reference
API Reference - Function documentation
Official Spec - Normative specification
Benchmarks - Token efficiency analysis

Summary

Key Takeaways:

Explicit prompting - Tell the model to use TOON format clearly
Validation - Always validate model output with error handling
Examples - Provide few-shot examples in prompts
Consistency - Use TOON throughout the conversation
Tabular format - Prefer tabular arrays for maximum efficiency
Error recovery - Handle decode errors gracefully

TOON can cut the tokens spent on structured data by roughly 30-60%, lowering LLM costs accordingly, while maintaining readability and structure. Start with simple use cases and expand as you become familiar with the format.