EVE Series: Encoder-Free VLMs from BAAI

July 24, 2025 · View on GitHub

💡 Motivation

  • Can we remove the vision encoder from VLMs?

  • How to transfer an LLM to an encoder-free VLM efficiently and stably?

  • How to bridge the performance gap between encoder-free and encoder-based VLMs?

📜 News

[2025/06] 🔥🔥🔥 EVEv2 has been accepted by ICCV 2025 (highlight)!
[2025/02] The paper, weights, and code of EVEv2 are released!
[2024/11] 💥💥💥 EVEv2 has been completed!
[2024/09] 🔥🔥🔥 EVE has been accepted by NeurIPS 2024 (spotlight)!
[2024/06] The paper, weights, and code of EVE are released!
[2024/05] 💥💥💥 EVE has been completed!

💡 Highlights

  • 🔥 Superior Capability: An encoder-free LVLM built from scratch that supports arbitrary image aspect ratios, outperforming its encoder-free counterparts and approaching existing modular encoder-based LVLMs.

  • 🔥 Data Efficiency: We filter and recaption fewer than 100M publicly available samples from OpenImages, SAM, LAION, and Datacomp for pre-training.

  • 🔥 Pioneering Route: We attempt to provide an efficient, transparent, and practical training strategy and procedure for developing a pure decoder-only architecture across modalities (see the sketch after this list).
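
As a rough illustration of the encoder-free idea (a minimal sketch, not EVE's actual implementation; the `PatchEmbed` module, patch size, and hidden dimension are assumptions chosen for readability), image patches are projected directly into the LLM's token space and concatenated with text embeddings, so a single decoder-only transformer processes both modalities with no pretrained vision encoder in between:

```python
# Illustrative sketch only: names, patch size, and hidden dim are assumptions,
# not EVE's real code. It shows the encoder-free pattern of feeding linearly
# embedded image patches straight into a decoder-only LLM.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Turns an image into patch tokens via a single linear projection."""
    def __init__(self, patch: int = 16, dim: int = 4096):
        super().__init__()
        # A strided convolution is equivalent to a per-patch linear layer.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -- any aspect ratio whose sides divide by `patch`
        x = self.proj(image)                    # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, dim)

patch_embed = PatchEmbed()
image = torch.randn(1, 3, 448, 336)             # non-square input is fine
vision_tokens = patch_embed(image)              # (1, 588, 4096)
text_tokens = torch.randn(1, 32, 4096)          # embedded text prompt (placeholder)
sequence = torch.cat([vision_tokens, text_tokens], dim=1)
# `sequence` would now be passed through the LLM's decoder layers;
# there is no separate vision encoder anywhere in the pipeline.
```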

โœ’๏ธ Citation

If the EVE series is helpful for your research, please consider giving it a star ⭐ and a citation 📝:

@article{diao2024EVE,
  title={Unveiling Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Cui, Yufeng and Li, Xiaotong and Wang, Yueze and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2406.11832},
  year={2024}
}
@article{diao2025EVEv2,
  title={EVEv2: Improved Baselines for Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Li, Xiaotong and Cui, Yufeng and Wang, Yueze and Deng, Haoge and Pan, Ting and Wang, Wenxuan and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2502.06788},
  year={2025}
}

📄 License

The content of this project itself is licensed under the terms of the LICENSE file.