TBAC-UniImage-3B
August 14, 2025

Overview
This repository contains the official model checkpoints of TBAC-UniImage-3B, a unified understanding and generation model developed by the Basic Algorithm Center, Platform and Content Group, Tencent.
Our model has two components: Qwen2.5-VL-3B-Instruct serves as the understanding module, and SANA-1600M serves as the generation module. The conditions for generation originate from the representations of different Qwen2.5-VL-3B-Instruct layers.
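As a rough illustration of conditioning on several layers, the toy sketch below picks per-token hidden states from a few layers and joins them along the feature dimension. This README does not state which layers are used or how their representations are combined, so the function name `multi_layer_condition`, the choice of layers, and the concatenation step are all assumptions made for illustration, not the released implementation.

```python
from typing import List, Sequence

Vector = List[float]
LayerStates = List[Vector]  # one feature vector per token position


def multi_layer_condition(
    hidden_states: Sequence[LayerStates], layer_ids: Sequence[int]
) -> List[Vector]:
    """Join each token's features from the selected layers.

    Hypothetical fusion: concatenation along the feature dimension.
    The actual model may combine layers differently.
    """
    if not layer_ids:
        raise ValueError("layer_ids must name at least one layer")
    selected = [hidden_states[i] for i in layer_ids]
    num_tokens = len(selected[0])
    return [
        [value for layer in selected for value in layer[t]]
        for t in range(num_tokens)
    ]


# Toy input: 3 layers, 2 token positions, feature size 2.
toy_states = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[1.0, 1.1], [1.2, 1.3]],
    [[2.0, 2.1], [2.2, 2.3]],
]
condition = multi_layer_condition(toy_states, layer_ids=[0, 2])
```

With two layers selected, each token in `condition` carries a feature vector twice as long as a single layer's.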

Update
2025.8.14 Updated Image-Text-to-Image results.
2025.8.13 Released training code.
Text-to-Image Generation Performance
Qualitative Results

GenEval and DPG-Bench
| Method | Base (M)LLM | GenEval | DPG-Bench |
|---|---|---|---|
| MetaQuery | Qwen2.5-VL-3B-Instruct | 0.78 | 81.10 |
| MetaQuery | Qwen2.5-VL-7B-Instruct | 0.80 | 82.05 |
| BLIP3-o | Qwen2.5-VL-3B-Instruct | 0.81 | 79.36 |
| BLIP3-o | Qwen2.5-VL-7B-Instruct | 0.83 | 80.73 |
| BAGEL | MoT-7B | 0.82 | - |
| Show-o2 | Qwen2.5-1.5B-Instruct | 0.73 | 85.02 |
| Show-o2 | Qwen2.5-7B-Instruct | 0.76 | 86.14 |
| Tar | Qwen2.5-1.5B-Instruct | 0.76 | 82.96 |
| Tar | Qwen2.5-7B-Instruct | 0.84 | 84.65 |
| Qwen-Image | Qwen2.5-VL-7B-Instruct | 0.87 | 88.32 |
| Ours | Qwen2.5-VL-3B-Instruct | 0.87 | 80.97 |
TIIF-Bench

Image Editing Performance
The input image is processed by the Qwen2.5-VL image encoder and then fed into the MLLM together with the text and the learnable queries. Only the learnable queries, which have fused the multimodal information, serve as the generative condition; unlike other works, we do not directly incorporate any image VAE representations. Despite this, the model still achieves promising multimodal understanding and consistency in image editing tasks. This validates the feasibility of high-fidelity image editing using only the intrinsic features of an MLLM, without external generative priors.
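The data flow described above can be sketched with a toy stand-in for the MLLM. Everything here is an assumption made for illustration: the token order (image, then text, then queries), the helper names, and `toy_mllm`, which replaces the real model with a causal running mean so that each query output mixes in the image and text tokens before it. Only the outputs at the query positions are returned as the condition.

```python
from typing import List, Tuple

Vector = List[float]


def build_sequence(
    image_tokens: List[Vector], text_tokens: List[Vector], query_tokens: List[Vector]
) -> Tuple[List[Vector], slice]:
    """Concatenate the inputs and remember where the queries sit (assumed order)."""
    sequence = image_tokens + text_tokens + query_tokens
    start = len(image_tokens) + len(text_tokens)
    return sequence, slice(start, start + len(query_tokens))


def toy_mllm(sequence: List[Vector]) -> List[Vector]:
    """Stand-in for the MLLM: each output is the mean of all tokens up to it."""
    outputs: List[Vector] = []
    running = [0.0] * len(sequence[0])
    for position, token in enumerate(sequence, start=1):
        running = [r + x for r, x in zip(running, token)]
        outputs.append([r / position for r in running])
    return outputs


def editing_condition(
    image_tokens: List[Vector], text_tokens: List[Vector], query_tokens: List[Vector]
) -> List[Vector]:
    """Return only the query-position outputs; no image VAE features are used."""
    sequence, query_slice = build_sequence(image_tokens, text_tokens, query_tokens)
    return toy_mllm(sequence)[query_slice]


# Toy usage: 2 image tokens, 1 text token, 2 learnable queries.
example_condition = editing_condition(
    image_tokens=[[1.0, 0.0], [1.0, 0.0]],
    text_tokens=[[0.0, 1.0]],
    query_tokens=[[0.0, 0.0], [0.0, 0.0]],
)
```

In this toy setup the condition has one vector per learnable query, and changing the image tokens changes those vectors even though no image features are passed to the generator directly.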
Qualitative Results

ImgEdit

Installation
pip install -r requirements.txt
Quick Start
## Inference
python app.py --checkpoint_path TencentBAC/TBAC-UniImage-3B
## Train
sh train.sh
Acknowledgements
The training and inference code is adapted from MetaQuery. We thank the authors for their contribution!
About
Created by the Tencent PCG Basic Algorithm Center. All rights reserved.