README.md
May 21, 2025 ยท View on GitHub
๐ค HuggingFace Demoย ย | ย ย ๐ Homepageย ย | ย ย ๐ arXiv
Today, we are excited to introduce Seed1.5-VL ๐, a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning.
๐ Highlights
- ๐ง Efficient Powerhouse: Achieves top performance with a relatively modest architecture, 532M vision encoder & 20B active parameter MoE LLM.
- ๐ Exceptional Benchmark Performance: Delivers State-of-the-Art results on 38 out of 60 public VLM benchmarks, demonstrating broad competence.
- ๐ก Versatile Capabilities: Excels across diverse capabilities including complex reasoning (e.g., visual puzzles like Rebus), OCR, diagram understanding, visual grounding, 3D spatial understanding, and video comprehension.
- ๐ค Advanced Agent-Centric Abilities: Demonstrates leading performance in interactive agent tasks, showcasing strong capabilities in GUI control and gameplay.
This repository offers usage cookbook and best practices designed to help developers effectively use Seed1.5-VL.
๐ข News
2025-05-13:We have deployed our Seed1.5-VL on ๐ค HuggingFace Spaces, Welcome to try out our model!2025-05-12:We have released the Seed1.5-VL Technical Report.2025-05-12:We are extremely delighted to release the flagship Seed1.5-VL on Volcano Engine. The Model ID isdoubao-1-5-thinking-vision-pro-250428. You can try it now!
๐ฎ Notice
Call for Bad Cases: If you have encountered any cases where the model performs poorly, we would greatly appreciate it if you could share them in the issue [https://github.com/ByteDance-Seed/Seed1.5-VL/issues/12]
๐ Seed1.5-VL Cookbook
The Seed1.5-VL cookbook is designed to help you start using the Seed1.5-VL API with diverse code samples. Our flagship Seed1.5-VL has been deployed on Volcano Engine. After obtaining your API_KEY, you can use the examples in this cookbook to rapidly understand and leverage the diverse capabilities of our Seed1.5-VL.
Quick Start
- Cookbook for online/offline Gradio Demo
- Cookbook for turning on/off LongCoT
- Cookbook for 2D Grounding
- Cookbook for 3D Understanding
- Cookbook for Video Understanding
- Cookbook for GUI Agents
Citations
If you Seed1.5-VL useful in your research or applications, please consider giving us a star ๐ and citing it by the following BibTeX entry.
@article{seed2025seed1_5vl,
title={Seed1.5-VL Technical Report},
author={ByteDance Seed Team},
journal={arXiv preprint arXiv:2505.07062},
year={2025}
}
License
This repo is under Apache-2.0 License.
About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.