Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
March 20, 2026
A reinforcement-learning-based visual token pruning framework that accelerates inference in Large Vision-Language Models (LVLMs).
Method Overview
TPRL formulates visual token pruning as a Markov Decision Process (MDP) and trains the pruning policy in two stages before one-shot inference:
- Learning from Demonstrations (LfD): Generate demonstration trajectories using heuristics and pretrain the policy network.
- PPO Fine-tuning: Fine-tune the policy with Proximal Policy Optimization to jointly optimize task performance and computational efficiency.
- Inference: One-shot pruning that retains the most important visual tokens.
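The PPO stage above relies on the standard clipped surrogate objective. The following is a minimal sketch of that loss in plain Python, not the code in train_ppo.py; the function name, its arguments, and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math

def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped-surrogate PPO loss over a batch of pruning episodes.

    new_logps / old_logps: log-probability of each sampled keep/prune mask
    under the current policy and the data-collecting policy (hypothetical inputs).
    advantages: one advantage estimate per episode.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # PPO maximizes min(ratio * A, clipped * A); negate to obtain a loss.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping the ratio keeps each update close to the policy that collected the data, which is likely helpful here because the policy starts from the LfD-pretrained weights.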
Architecture
visual input → ViT → Projector → [TPRL pruner] → LLM → output
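To illustrate what the pruner stage does between the projector and the LLM, here is a minimal sketch of one-shot pruning by score ranking. It is not the code in pruner/rl_pruner.py: the function name, the `keep_ratio` parameter, and the use of a fixed keep budget are assumptions.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring visual tokens in a single pass.

    tokens: projected visual token embeddings (any objects).
    scores: one importance score per token, e.g. a policy's keep probability.
    keep_ratio: fraction of tokens to retain (hypothetical parameter).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    num_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by score, then restore the original order so the LLM
    # still sees the surviving tokens in their spatial sequence.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:num_keep]
    kept = sorted(top)
    return [tokens[i] for i in kept], kept


# Usage: four projected tokens, half of them retained.
kept_tokens, kept_indices = prune_tokens(["t0", "t1", "t2", "t3"], [0.1, 0.9, 0.3, 0.8])
```

Only `kept_tokens` would be concatenated with the text embeddings and passed to the LLM, so the sequence length shrinks once, before any LLM layer runs.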
Quick Start
Installation
# Clone the repository
git clone https://github.com/MagicVicCoder/TPRL.git
cd TPRL
# Install requirements
pip install -r requirements.txt
Training
Step 1: Learning from Demonstrations
python train_lfd.py
Step 2: PPO Training
# Set the LfD checkpoint path in config.py first
python train_ppo.py
Evaluation
python main.py
Project Structure
TPRL/
├── model/
│   ├── autoencoder.py      # Token compression (optional)
│   ├── rl_networks.py      # Policy and value networks
│   ├── llava_mllm.py       # LLaVA model wrapper
│   └── qwen_mllm.py        # Qwen model wrapper
├── pruner/
│   ├── rl_pruner.py        # RL-based pruner
│   ├── random_pruner.py    # Baseline random pruner
│   └── mlp_pruner.py       # MLP-based pruner
├── train_lfd.py            # LfD training script
├── train_ppo.py            # PPO training script
├── config.py               # Configuration
└── main.py                 # Evaluation / inference script
Core Idea
MDP Formulation
- State: (visual tokens, text query)
- Action: keep / prune decision for each token
- Reward: downstream task performance + computational efficiency
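One way to realize the per-token keep / prune action is to treat each token as an independent Bernoulli decision. The sketch below assumes that factorization and assumes the keep probabilities come from a policy network conditioned on the visual tokens and the text query; neither detail is confirmed by this README, and the function name is hypothetical.

```python
import math
import random

def sample_keep_mask(keep_probs, rng=random):
    """Sample one keep (1) / prune (0) action per visual token.

    keep_probs: per-token keep probabilities (assumed policy output).
    Returns the sampled mask and its total log-probability.
    """
    mask, log_prob = [], 0.0
    for p in keep_probs:
        keep = 1 if rng.random() < p else 0
        mask.append(keep)
        # Independent Bernoulli actions: log-probabilities add across tokens.
        log_prob += math.log(p if keep else 1.0 - p)
    return mask, log_prob


# Usage: a seeded generator makes the sampled mask reproducible.
mask, log_prob = sample_keep_mask([0.9, 0.2, 0.7], random.Random(0))
```

The summed log-probability is the quantity a policy-gradient method such as PPO needs for each sampled mask.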
Reward Function
reward = alpha * task_reward + beta * efficiency_reward
- task_reward: change in task performance (e.g., IoU / accuracy)
- efficiency_reward: compression / efficiency metric
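As a worked example of the weighted sum above, the sketch below measures the task term as the accuracy change relative to the unpruned model and the efficiency term as the fraction of visual tokens removed. Both definitions, the function name, and the default weights `alpha=1.0` and `beta=0.5` are assumptions; the repository may define them differently.

```python
def compute_reward(acc_pruned, acc_full, num_kept, num_total, alpha=1.0, beta=0.5):
    """Combine task and efficiency terms into a scalar reward.

    acc_pruned / acc_full: task metric with and without pruning (assumed inputs).
    num_kept / num_total: visual tokens retained vs. originally produced.
    alpha, beta: trade-off weights (illustrative values, not from the repo).
    """
    # Change in task performance caused by pruning (negative if it hurts).
    task_reward = acc_pruned - acc_full
    # Fraction of visual tokens removed (assumed efficiency metric).
    efficiency_reward = 1.0 - num_kept / num_total
    return alpha * task_reward + beta * efficiency_reward


# Usage: keeping 144 of 576 tokens with unchanged accuracy.
r = compute_reward(acc_pruned=0.8, acc_full=0.8, num_kept=144, num_total=576)
```

With these definitions, a larger `beta` pushes the policy toward more aggressive pruning, while a larger `alpha` penalizes any drop in task performance more heavily.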
Requirements
- Python >= 3.8
- PyTorch >= 2.0
- Transformers >= 4.37.0
- See requirements.txt for the full dependency list
If you find this repository useful, please give it a Star!
Citation
If you find this work useful, please cite:
@misc{cao2026languageguidedtokencompressionreinforcement,
title={Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models},
author={Sihan Cao and Jianwei Zhang and Pengcheng Zheng and Jiaxin Yan and Caiyan Qin and Yalan Ye and Wei Dong and Peng Wang and Yang Yang and Chaoning Zhang},
year={2026},
eprint={2603.13394},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.13394}
}