TOON Format Specification
November 4, 2025 · View on GitHub
Detailed format rules, syntax, and examples for TOON (Token-Oriented Object Notation).
Overview
TOON uses indentation-based structure like YAML for nested objects and tabular format like CSV for uniform arrays. This document explains the complete syntax and formatting rules.
Objects
Objects use key: value pairs with indentation for nesting.
Simple Objects
{"name": "Alice", "age": 30, "active": True}
name: Alice
age: 30
active: true
Nested Objects
{
"user": {
"name": "Alice",
"settings": {
"theme": "dark"
}
}
}
user:
name: Alice
settings:
theme: dark
Object Keys
Keys follow identifier rules or must be quoted:
{
"simple_key": 1,
"with-dash": 2,
"123": 3, # Numeric key
"with space": 4, # Spaces require quotes
"": 5 # Empty key requires quotes
}
simple_key: 1
with-dash: 2
"123": 3
"with space": 4
"": 5
Arrays
All arrays include length indicator [N] for validation.
Primitive Arrays
Arrays of primitives use inline format with comma separation:
[1, 2, 3, 4, 5]
[5]: 1,2,3,4,5
["alpha", "beta", "gamma"]
[3]: alpha,beta,gamma
Note: Comma delimiter is hidden in primitive arrays: [5]: not [5,]:
Tabular Arrays
Uniform objects with primitive-only fields use CSV-like format:
[
{"id": 1, "name": "Alice", "age": 30},
{"id": 2, "name": "Bob", "age": 25},
{"id": 3, "name": "Charlie", "age": 35}
]
[3,]{id,name,age}:
1,Alice,30
2,Bob,25
3,Charlie,35
Tabular Format Rules:
- All objects must have identical keys
- All values must be primitives (no nested objects/arrays)
- Field order in header determines column order
- Delimiter appears in header:
[N,]or[N|]or[N\t]
List Arrays
Non-uniform or nested arrays use list format with - markers:
[
{"name": "Alice"},
42,
"hello"
]
[3]:
- name: Alice
- 42
- hello
Nested Arrays
{
"matrix": [
[1, 2, 3],
[4, 5, 6]
]
}
matrix[2]:
- [3]: 1,2,3
- [3]: 4,5,6
Empty Arrays
{"items": []}
items[0]:
Delimiters
Three delimiter options for array values:
Comma (Default)
encode([1, 2, 3]) # Default delimiter
[3]: 1,2,3
For tabular arrays, delimiter shown in header:
users[2,]{id,name}:
1,Alice
2,Bob
Tab
encode([1, 2, 3], {"delimiter": "\t"})
[3 ]: 1 2 3
Tabular with tab:
users[2 ]{id,name}:
1 Alice
2 Bob
Pipe
encode([1, 2, 3], {"delimiter": "|"})
[3|]: 1|2|3
Tabular with pipe:
users[2|]{id,name}:
1|Alice
2|Bob
String Quoting Rules
Strings are quoted only when necessary to avoid ambiguity.
Unquoted Strings (Safe)
"hello" # Simple identifier
"hello world" # Internal spaces OK
"user_name" # Underscores OK
"hello-world" # Hyphens OK
hello
hello world
user_name
hello-world
Quoted Strings (Required)
Empty strings:
""
""
Reserved keywords:
"null"
"true"
"false"
"null"
"true"
"false"
Numeric-looking strings:
"42"
"-3.14"
"1e5"
"0123" # Leading zero
"42"
"-3.14"
"1e5"
"0123"
Leading/trailing whitespace:
" hello"
"hello "
" hello "
" hello"
"hello "
" hello "
Structural characters:
"key: value" # Colon
"[array]" # Brackets
"{object}" # Braces
"- item" # Leading hyphen
"key: value"
"[array]"
"{object}"
"- item"
Delimiter characters:
# When using comma delimiter
"a,b"
"a,b"
Control characters:
"line1\nline2"
"tab\there"
"line1\nline2"
"tab\there"
Escape Sequences
Inside quoted strings:
| Sequence | Meaning |
|---|---|
\" | Double quote |
\\ | Backslash |
\n | Newline |
\r | Carriage return |
\t | Tab |
\uXXXX | Unicode character (4 hex digits) |
Example:
{
"text": "Hello \"world\"\nNew line",
"path": "C:\\Users\\Alice"
}
text: "Hello \"world\"\nNew line"
path: "C:\\Users\\Alice"
Primitives
Numbers
Integers:
42
-17
0
42
-17
0
Floats:
3.14
-0.5
0.0
3.14
-0.5
0
Special Numbers:
- Scientific notation accepted in decoding:
1e5,-3.14E-2 - Encoders must NOT use scientific notation - always decimal form
- Negative zero normalized:
-0.0→0 - Non-finite values → null:
Infinity,-Infinity,NaN→null
Large integers (>-1):
9007199254740993 # Exceeds JS safe integer
"9007199254740993" # Quoted for JS compatibility
Booleans
True # true in TOON (lowercase)
False # false in TOON (lowercase)
true
false
Null
None # null in TOON (lowercase)
null
Indentation
Default: 2 spaces per level (configurable)
{
"level1": {
"level2": {
"level3": "value"
}
}
}
level1:
level2:
level3: value
With 4-space indent:
encode(data, {"indent": 4})
level1:
level2:
level3: value
Strict mode rules:
- Indentation must be consistent multiples of
indentvalue - Tabs not allowed in indentation
- Mixing spaces and tabs causes errors
Array Length Indicators
All arrays include [N] to indicate element count for validation.
Without Length Marker (Default)
items[3]: a,b,c
users[2,]{id,name}:
1,Alice
2,Bob
With Length Marker (#)
encode(data, {"lengthMarker": "#"})
items[#3]: a,b,c
users[#2,]{id,name}:
1,Alice
2,Bob
The # prefix makes length indicators more explicit for validation-focused use cases.
Blank Lines
Within arrays: Blank lines are not allowed in strict mode
# ❌ Invalid (blank line in array)
items[3]:
- a
- b
- c
# ✅ Valid (no blank lines)
items[3]:
- a
- b
- c
Between top-level keys: Blank lines are allowed and ignored
# ✅ Valid (blank lines between objects)
name: Alice
age: 30
Comments
TOON does not support comments. The format prioritizes minimal syntax for token efficiency.
If you need to document TOON data, use surrounding markdown or separate documentation files.
Whitespace
Trailing Whitespace
Trailing whitespace on lines is allowed and ignored.
Leading Whitespace in Values
Leading/trailing whitespace in string values requires quoting:
{"text": " value "}
text: " value "
Order Preservation
Object key order and array element order are always preserved during encoding and decoding.
from collections import OrderedDict
data = OrderedDict([("z", 1), ("a", 2), ("m", 3)])
toon = encode(data)
z: 1
a: 2
m: 3
Decoding preserves order:
decoded = decode(toon)
list(decoded.keys()) # ['z', 'a', 'm']
Complete Examples
Simple Configuration
{
"app": "myapp",
"version": "1.0.0",
"debug": False,
"port": 8080
}
app: myapp
version: "1.0.0"
debug: false
port: 8080
Nested Structure with Arrays
{
"metadata": {
"version": 2,
"author": "Alice"
},
"items": [
{"id": 1, "name": "Item1", "qty": 10},
{"id": 2, "name": "Item2", "qty": 5}
],
"tags": ["alpha", "beta", "gamma"]
}
metadata:
version: 2
author: Alice
items[2,]{id,name,qty}:
1,Item1,10
2,Item2,5
tags[3]: alpha,beta,gamma
Mixed Array Types
{
"data": [
{"type": "user", "id": 1},
{"type": "user", "id": 2, "extra": "field"}, # Non-uniform
42,
"hello"
]
}
data[4]:
- type: user
id: 1
- type: user
id: 2
extra: field
- 42
- hello
Token Efficiency Comparison
JSON (177 chars):
{"users":[{"id":1,"name":"Alice","age":30,"active":true},{"id":2,"name":"Bob","age":25,"active":true},{"id":3,"name":"Charlie","age":35,"active":false}]}
TOON (85 chars, 52% reduction):
users[3,]{id,name,age,active}:
1,Alice,30,true
2,Bob,25,true
3,Charlie,35,false
See Also
- API Reference - Complete function documentation
- LLM Integration - Best practices for LLM usage
- Official Specification - Normative spec