AVP Binary Format Specification

April 3, 2026

Message Structure

Every AVP message consists of three parts:

+---------------------+
|   Header (12 bytes) |
+---------------------+
|   Metadata          |
+---------------------+
|   Payload           |
+---------------------+

Header (12 bytes)

Byte 0-1:  Magic number (0x4156 = "AV" in ASCII)
Byte 2:    Protocol version (0x01)
Byte 3:    Flags
           Bit 0: Compressed (0=no, 1=yes, zstd)
           Bit 1: Has AVP map (0=no, 1=yes, cross-model projection)
           Bit 2: KV-cache payload (0=no, 1=yes)
           Bit 3-7: Reserved
Byte 4-7:  Payload length (uint32, little-endian)
           Total bytes after header (metadata + tensor data)
Byte 8-11: Metadata length (uint32, little-endian)
           Length of the Protocol Buffer metadata section

Note on payload length: The payload_length field (bytes 4-7) encodes the total number of bytes following the header, i.e. len(metadata) + len(tensor_bytes). The metadata_length field (bytes 8-11) allows the decoder to locate the boundary between metadata and tensor data without parsing the protobuf first.
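The header layout above maps directly onto a fixed-size struct. The following is a minimal sketch rather than the reference implementation: the helper names (pack_header, parse_header) and flag constants are hypothetical, and it assumes the magic number appears on the wire as the two bytes "A" then "V".

```python
import struct

MAGIC = b"AV"              # bytes 0-1; ASSUMPTION: stored as 'A' then 'V'
VERSION = 0x01             # byte 2
# magic (2s), version (B), flags (B), payload_length (I), metadata_length (I)
HEADER = struct.Struct("<2sBBII")

FLAG_COMPRESSED = 0x01     # bit 0
FLAG_HAS_AVP_MAP = 0x02    # bit 1
FLAG_KV_CACHE = 0x04       # bit 2


def pack_header(flags, metadata, tensor_bytes):
    """Build the 12-byte header for the given metadata and tensor bytes."""
    # payload_length counts everything after the header.
    payload_length = len(metadata) + len(tensor_bytes)
    return HEADER.pack(MAGIC, VERSION, flags, payload_length, len(metadata))


def parse_header(data):
    """Return (flags, payload_length, metadata_length) from the first 12 bytes."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 12 bytes for the header")
    magic, version, flags, payload_length, metadata_length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if version != VERSION:
        raise ValueError("unsupported protocol version")
    if metadata_length > payload_length:
        raise ValueError("metadata_length exceeds payload_length")
    return flags, payload_length, metadata_length
```

Because metadata_length is carried in the header, the tensor data always starts at offset 12 + metadata_length, and its length is payload_length - metadata_length.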

Metadata (Variable length)

Protocol Buffer encoded metadata. See schemas/avp.proto for the canonical schema.

Fields:

| Field | Number | Type | Description |
|---|---|---|---|
| session_id | 1 | string | Session identifier from handshake |
| source_agent_id | 2 | string | Sender agent identifier |
| target_agent_id | 3 | string | Recipient agent identifier |
| model_id | 4 | string | Model that produced the payload, e.g. "meta-llama/Llama-2-7b" |
| hidden_dim | 5 | uint32 | Hidden state dimensionality, e.g. 4096 |
| num_layers | 6 | uint32 | Number of transformer layers |
| payload_type | 7 | PayloadType | HIDDEN_STATE (0) or KV_CACHE (1) |
| dtype | 8 | DataType | FLOAT32 (0), FLOAT16 (1), BFLOAT16 (2), INT8 (3) |
| tensor_shape | 9 | repeated uint32 | Shape of the tensor payload |
| mode | 10 | CommunicationMode | LATENT (0) or JSON_MODE (1) |
| compression | 11 | string | Compression algorithm if compressed, e.g. "zstd" |
| avp_map_id | 13 | string | Cross-model projection map identifier. Format: "vocab:{tokenizer_hash[:16]}" for vocabulary-mediated, "vocab_overlap:{overlap_count}" for vocabulary-overlap, "{src_hash[:16]}_{tgt_hash[:16]}" for pre-calibrated maps. Empty for same-model communication. |
| extra | 14 | map<string,string> | Extensible key-value pairs |
| payload_checksum | 15 | uint32 (optional) | CRC32 of pre-compression payload bytes. Omit for same-process transfers. Decoders SHOULD verify when present and reject on mismatch. |
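As an illustration of the payload_checksum rule, the sketch below computes and verifies a CRC32 over the uncompressed tensor bytes. The text does not name a CRC32 variant, so this assumes the common CRC-32 implemented by Python's zlib module; the function names are hypothetical.

```python
import zlib


def payload_checksum(tensor_bytes):
    """CRC32 of the pre-compression tensor bytes as an unsigned 32-bit value."""
    # ASSUMPTION: the standard CRC-32 used by zlib; the spec only says "CRC32".
    return zlib.crc32(tensor_bytes) & 0xFFFFFFFF


def verify_checksum(tensor_bytes, expected=None):
    """Return True if no checksum was sent or it matches; raise on mismatch."""
    if expected is None:  # field omitted, e.g. a same-process transfer
        return True
    actual = payload_checksum(tensor_bytes)
    if actual != expected:
        raise ValueError(
            f"payload checksum mismatch: {actual:#010x} != {expected:#010x}"
        )
    return True
```

Since the checksum covers pre-compression bytes, a decoder would call verify_checksum only after decompressing the payload.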

Payload Types

HIDDEN_STATE (0): Raw hidden state tensor bytes from a transformer layer. Little-endian, dtype specified in metadata. Used for same-model latent communication where agents share intermediate representations.
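A decoder can sanity-check a hidden state payload by comparing its byte length with tensor_shape and dtype before interpreting it. The sketch below shows that check plus a plain float32 unpack using only the standard library; the helper names are hypothetical, and a real implementation would more likely hand the bytes to an array library.

```python
import math
import struct

# Bytes per element for each DataType name listed in the metadata table.
ITEMSIZE = {"FLOAT32": 4, "FLOAT16": 2, "BFLOAT16": 2, "INT8": 1}


def expected_nbytes(tensor_shape, dtype):
    """Number of raw bytes a tensor of this shape and dtype should occupy."""
    return math.prod(tensor_shape) * ITEMSIZE[dtype]


def decode_float32(tensor_bytes, tensor_shape):
    """Unpack little-endian float32 bytes into a flat list after a size check."""
    if len(tensor_bytes) != expected_nbytes(tensor_shape, "FLOAT32"):
        raise ValueError("tensor byte length does not match tensor_shape")
    count = len(tensor_bytes) // 4
    return list(struct.unpack(f"<{count}f", tensor_bytes))
```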

KV_CACHE (1): Serialized key-value cache from transformer attention layers. Layout: [K_l0][V_l0][K_l1][V_l1]... where each tensor is contiguous little-endian. The tensor data is preceded by a 17-byte KV-cache header (num_layers, num_kv_heads, head_dim, seq_len, dtype).
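The text gives the KV-cache header's total size and field order but not the width of each field. The sketch below assumes four little-endian uint32 values followed by a one-byte dtype code, which adds up to 17 bytes; treat that layout, and the name parse_kv_header, as assumptions to check against the reference implementation.

```python
import struct

# ASSUMPTION: num_layers, num_kv_heads, head_dim, seq_len as little-endian
# uint32, then dtype as a single byte (4 * 4 + 1 = 17 bytes). The field
# widths are not stated in the specification text.
KV_HEADER = struct.Struct("<IIIIB")


def parse_kv_header(payload):
    """Split a KV_CACHE payload into its header fields and the tensor bytes."""
    if len(payload) < KV_HEADER.size:
        raise ValueError("payload shorter than the 17-byte KV-cache header")
    num_layers, num_kv_heads, head_dim, seq_len, dtype = KV_HEADER.unpack_from(payload)
    fields = {
        "num_layers": num_layers,
        "num_kv_heads": num_kv_heads,
        "head_dim": head_dim,
        "seq_len": seq_len,
        "dtype": dtype,
    }
    # Everything after the header is [K_l0][V_l0][K_l1][V_l1]...
    return fields, payload[KV_HEADER.size:]
```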

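If the compressed flag (header flags bit 0) is set, the tensor payload is zstd-compressed. The metadata section is not compressed, so it can be parsed before decompression.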

Decoding Algorithm

1. Read 12 bytes -> header
2. Verify magic == 0x4156
3. Verify version == 0x01
4. Read payload_length bytes after header
5. First metadata_length bytes -> parse as Protobuf Metadata
6. Remaining bytes -> raw tensor payload (decompress if flag bit 0 set)
7. Interpret payload using payload_type, dtype, and tensor_shape from metadata
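Steps 1-6 above can be sketched with the standard library alone. The function below is illustrative, not the reference decoder: split_message is a hypothetical name, the magic is assumed to be the bytes "A" then "V", and both protobuf parsing and zstd decompression are left to the caller (the decompressor is passed in as a callable).

```python
import struct

# magic, version, flags, payload_length, metadata_length
HEADER = struct.Struct("<2sBBII")


def split_message(data, decompress=None):
    """Follow steps 1-6 and return (flags, metadata_bytes, tensor_bytes)."""
    if len(data) < HEADER.size:
        raise ValueError("message shorter than the 12-byte header")
    # Step 1: read the header.
    magic, version, flags, payload_length, metadata_length = HEADER.unpack_from(data)
    if magic != b"AV":        # step 2
        raise ValueError("bad magic number")
    if version != 0x01:       # step 3
        raise ValueError("unsupported protocol version")
    # Step 4: take payload_length bytes after the header.
    body = data[HEADER.size:HEADER.size + payload_length]
    if len(body) != payload_length or metadata_length > payload_length:
        raise ValueError("truncated or inconsistent message")
    # Step 5: these bytes go to the protobuf parser for the Metadata message.
    metadata_bytes = body[:metadata_length]
    # Step 6: the rest is the tensor payload, decompressed if bit 0 is set.
    tensor_bytes = body[metadata_length:]
    if flags & 0x01:
        if decompress is None:
            raise ValueError("compressed payload but no decompressor supplied")
        tensor_bytes = decompress(tensor_bytes)
    return flags, metadata_bytes, tensor_bytes
```

Step 7 then uses payload_type, dtype, and tensor_shape from the parsed metadata to interpret tensor_bytes. Keeping the decompressor as a parameter lets the caller plug in whichever zstd binding is available.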

Example

4096-dimensional float32 hidden state:

  • Header: 12 bytes
  • Metadata: ~50 bytes
  • Payload: 16,384 bytes (4096 x 4)
  • Total: ~16,446 bytes
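The arithmetic behind these figures, written out as a short script. The 50-byte metadata size is the approximate value from the list above and varies with the string fields.

```python
HEADER_BYTES = 12
metadata_bytes = 50        # approximate; depends on the string fields
hidden_dim = 4096
bytes_per_float32 = 4

tensor_bytes = hidden_dim * bytes_per_float32
payload_length = metadata_bytes + tensor_bytes  # value stored in header bytes 4-7
total = HEADER_BYTES + payload_length

print(tensor_bytes, payload_length, total)  # 16384 16434 16446
```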

Size Comparison (measured, hidden state payloads)

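| Dimensions | dtype | AVP (bytes) | AVP+zstd (bytes) | JSON (bytes) | Ratio (JSON/AVP) |
|---|---|---|---|---|---|
| 384 | float32 | 1,567 | 1,515 | 7,963 | 5.1x |
| 768 | float32 | 3,103 | 2,931 | 15,930 | 5.1x |
| 1,024 | float32 | 4,127 | 3,855 | 21,190 | 5.1x |
| 4,096 | float32 | 16,415 | 15,212 | 84,654 | 5.2x |
| 384 | float16 | 799 | 815 | 5,802 | 7.3x |
| 4,096 | float16 | 8,223 | 7,638 | 61,213 | 7.4x |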

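Note: zstd compression provides significant savings for embeddings but is much less effective for hidden states and KV-cache data (typically a 1-7% reduction; for the 384-dimension float16 payload in the table above, the compressed message is slightly larger, 815 bytes versus 799). The primary value of latent communication is skipping autoregressive generation, not bandwidth savings.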
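The ratio column and the zstd savings can be recomputed from the measured byte counts in the table. The script below copies those numbers verbatim; the variable and function names are illustrative only.

```python
# (dimensions, dtype, AVP bytes, AVP+zstd bytes, JSON bytes), from the table.
ROWS = [
    (384, "float32", 1567, 1515, 7963),
    (768, "float32", 3103, 2931, 15930),
    (1024, "float32", 4127, 3855, 21190),
    (4096, "float32", 16415, 15212, 84654),
    (384, "float16", 799, 815, 5802),
    (4096, "float16", 8223, 7638, 61213),
]


def json_ratio(avp, json_bytes):
    """How many times larger the JSON encoding is than the AVP message."""
    return json_bytes / avp


def zstd_reduction_percent(avp, avp_zstd):
    """Percentage saved by zstd; negative when the compressed form is larger."""
    return 100.0 * (avp - avp_zstd) / avp


for dims, dtype, avp, avp_zstd, json_bytes in ROWS:
    print(
        f"{dims:>5} {dtype}: JSON/AVP = {json_ratio(avp, json_bytes):.1f}x, "
        f"zstd saves {zstd_reduction_percent(avp, avp_zstd):.1f}%"
    )
```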