VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning
May 10, 2026
The official repository for "VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning".
🤗 VCM-7B (Coming Soon) | 🤗 VCM-13B (Coming Soon) | 📑 Paper
Introduction
VCM (Vision Concept Modeling) is a novel framework for making Large Multimodal Models (LMMs) more efficient. It introduces adaptive vision token compression during the instruction fine-tuning stage: VCM dynamically identifies and preserves essential visual concepts while discarding redundant vision tokens. Because the language model then processes a shorter input sequence, this approach significantly lowers computational overhead without compromising performance on downstream multimodal tasks.
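To make the efficiency argument concrete, here is a rough estimate of how much of a decoder's forward (prefill) cost remains when only some of the vision tokens are kept. It uses the common approximation of 24·n·d² + 4·n²·d FLOPs per layer for n tokens and hidden size d, and ignores embeddings, the LM head and normalization. The 576 vision tokens (a 24 × 24 patch grid from LLaVA-1.5's 336px CLIP ViT-L/14 encoder) and the 4096 hidden size and 32 layers of its 7B language model are LLaVA-1.5 settings. The 64 text tokens and 144 kept tokens are made-up illustrative values, not VCM's reported compression rate, and the function names are hypothetical.

```python
def decoder_forward_flops(num_tokens: int, hidden: int, layers: int) -> int:
    """Approximate forward FLOPs of a standard decoder-only transformer.

    Per layer: 8*n*d^2 for the Q/K/V/output projections, 16*n*d^2 for a
    4x-wide MLP, and 4*n^2*d for attention scores plus the weighted sum of
    values. Embeddings, the LM head and normalization layers are ignored.
    """
    n, d = num_tokens, hidden
    return layers * (24 * n * d * d + 4 * n * n * d)


def compute_ratio(
    text_tokens: int,
    vision_tokens: int,
    kept_vision_tokens: int,
    hidden: int = 4096,
    layers: int = 32,
) -> float:
    """Fraction of the full prefill cost that remains after token compression."""
    if not 0 <= kept_vision_tokens <= vision_tokens:
        raise ValueError("kept_vision_tokens must be between 0 and vision_tokens")
    full = decoder_forward_flops(text_tokens + vision_tokens, hidden, layers)
    reduced = decoder_forward_flops(text_tokens + kept_vision_tokens, hidden, layers)
    return reduced / full


# 576 = 24 x 24 patches from LLaVA-1.5's vision encoder.
# The prompt length (64) and kept count (144, i.e. 25%) are illustrative only.
ratio = compute_ratio(text_tokens=64, vision_tokens=576, kept_vision_tokens=144)
print(f"{ratio:.2f}")
```

With these inputs the script prints 0.32, roughly a third of the original cost. At d = 4096 the per-token linear terms dominate, so under this model the saving tracks the shortened sequence length almost linearly.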
Installation and Setup
VCM is built upon the LLaVA framework. To use VCM, please follow these steps:
- Clone the official LLaVA repository:
  git clone https://github.com/haotian-liu/LLaVA.git
  cd LLaVA
- Install the environment: follow the original LLaVA installation instructions to set up your Python environment and dependencies.
- Apply the VCM modifications: replace the original llava_arch.py file in the LLaVA source code with the one provided in this repository:
  cp path/to/vcm/llava_arch.py llava/model/llava_arch.py
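If you switch between stock LLaVA and VCM, it can help to keep the original file around. The sketch below is a hypothetical helper (`apply_vcm_arch` is not part of either repository) that performs the same copy as the `cp` step, checks that the target looks like a LLaVA checkout, and saves the original once as `llava_arch.py.orig`. It assumes the layout `llava/model/llava_arch.py` used in the step above.

```python
import shutil
from pathlib import Path


def apply_vcm_arch(vcm_arch_file, llava_root, keep_backup: bool = True) -> Path:
    """Copy VCM's llava_arch.py over the one in a LLaVA checkout.

    The original file is backed up once as llava_arch.py.orig, so re-running
    the helper never overwrites the backup with the VCM version.
    """
    src = Path(vcm_arch_file)
    dst = Path(llava_root) / "llava" / "model" / "llava_arch.py"
    if not src.is_file():
        raise FileNotFoundError(f"VCM file not found: {src}")
    if not dst.parent.is_dir():
        raise FileNotFoundError(f"not a LLaVA checkout (missing {dst.parent})")
    backup = dst.with_name(dst.name + ".orig")
    if keep_backup and dst.exists() and not backup.exists():
        shutil.copy2(dst, backup)  # preserve the stock LLaVA file
    shutil.copy2(src, dst)  # install the VCM version
    return dst
```

For example, `apply_vcm_arch("path/to/vcm/llava_arch.py", "LLaVA")` run from the directory containing the clone. If LLaVA was installed in editable mode (`pip install -e .`, as its instructions suggest), the copied file takes effect without reinstalling.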
Training and Inference
After replacing the architecture file, you can follow the standard LLaVA training and inference pipelines unchanged. VCM applies its adaptive vision token compression automatically during the forward pass.
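In LLaVA, the `prepare_inputs_labels_for_multimodal` method in `llava/model/llava_arch.py` encodes the images and splices the resulting vision tokens into the text embedding sequence, which is presumably why VCM ships a replacement for that file. The sketch below only illustrates what "adaptive" compression at that point can mean. It is a generic stand-in, not VCM's algorithm: vision tokens are scored by cosine similarity to a pooled instruction embedding, and the smallest set whose softmax mass reaches a threshold is kept (top-p style), so the number of kept tokens varies with the instruction. Plain Python lists stand in for tensors, and all names and defaults are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_vision_tokens(
    vision_tokens: Sequence[Vector],
    instruction_embedding: Vector,
    mass: float = 0.9,
    temperature: float = 0.1,
    min_keep: int = 1,
) -> List[int]:
    """Return indices of vision tokens to keep, in their original order."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    n = len(vision_tokens)
    if n == 0:
        return []
    if mass == 1.0:
        return list(range(n))
    scores = [cosine_similarity(t, instruction_embedding) / temperature for t in vision_tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    ranked = sorted(range(n), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        # Stop once enough relevance mass is covered and the minimum is met.
        if cumulative >= mass and len(kept) >= min(min_keep, n):
            break
    return sorted(kept)  # restore spatial order before handing tokens to the LLM


def compress(vision_tokens: Sequence[Vector], instruction_embedding: Vector, **kwargs) -> List[Vector]:
    """Drop the vision tokens that the selection step deems redundant."""
    keep = select_vision_tokens(vision_tokens, instruction_embedding, **kwargs)
    return [vision_tokens[i] for i in keep]
```

For instance, with tokens `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]` and instruction embedding `[1, 0]`, the default settings keep indices 0 and 2, the two tokens aligned with the instruction. Sorting the kept indices back into order preserves the spatial layout the language model saw during training.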
Refer to the LLaVA documentation for detailed commands on:
- Pre-training (Feature Alignment)
- Visual Instruction Tuning
Evaluation
We use the lmms-eval toolkit for comprehensive benchmarking.
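A minimal sketch of launching lmms-eval from Python, assuming its documented command-line interface (`--model`, `--model_args`, `--tasks`, `--batch_size`, `--log_samples`, `--output_path`) launched through Hugging Face Accelerate as in the lmms-eval README. It further assumes that lmms-eval's `llava` model wrapper imports the locally installed, VCM-patched LLaVA package. The checkpoint path is a placeholder (VCM weights are not released yet), `mme` and `pope` are example task names, and the helper functions are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def build_lmms_eval_command(
    checkpoint: str,
    tasks: Sequence[str],
    num_processes: int = 1,
    batch_size: int = 1,
    output_path: str = "./logs/",
) -> List[str]:
    """Assemble an lmms-eval invocation for a LLaVA-format checkpoint."""
    if not tasks:
        raise ValueError("at least one lmms-eval task is required")
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", "llava",
        "--model_args", f"pretrained={checkpoint}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--log_samples",
        "--output_path", output_path,
    ]


def run_lmms_eval(checkpoint: str, tasks: Sequence[str], **kwargs) -> None:
    """Run the evaluation; check=True raises CalledProcessError on failure."""
    subprocess.run(build_lmms_eval_command(checkpoint, tasks, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; the checkpoint path is a placeholder.
    print(" ".join(build_lmms_eval_command("path/to/vcm-checkpoint", ["mme", "pope"])))
```

Building the argument list separately from running it makes the command easy to inspect or log before committing GPU time to a full benchmark run.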
Citation
If you find VCM useful for your research, please cite our paper:
@article{vcm2025,
title={VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning},
author={Run Luo and Renke Shan and Longze Chen and Ziqiang Liu and Lu Wang and Min Yang and Xiaobo Xia},
journal={arXiv preprint arXiv:2504.19627},
year={2025}
}