TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

April 28, 2026

Update 2026-04-27:

We are glad to share that we have released Tuna-2, a new encoder-free unified multimodal model that does not depend on any pretrained VAE or representation encoder. Please check out the resources below if you are interested!

Project Website
Paper link
Code

We have also released Tuna's inference code in the tuna-2 repo. We are currently working on releasing model checkpoints for both Tuna and Tuna-2.