TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

April 28, 2026

Project Website arXiv

Update 2026-04-27:

We are glad to share that we have released Tuna-2, a new encoder-free unified multimodal model that does not depend on any pretrained VAE or representation encoder components. Please check out these resources if you are interested!

We have also released Tuna's inference code in the tuna-2 repo. We are currently working on releasing model checkpoints for both Tuna and Tuna-2.