Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models

March 20, 2026

TPRL is a reinforcement-learning-based visual token pruning framework that accelerates inference in Large Vision-Language Models (LVLMs).

📋 Method Overview

TPRL formulates visual token pruning as a Markov Decision Process (MDP) and learns the pruning policy in two training stages, followed by one-shot pruning at inference:

  1. Learning from Demonstrations (LfD): Generate demonstration trajectories using heuristics and pretrain the policy network.
  2. PPO Fine-tuning: Fine-tune the policy with Proximal Policy Optimization to jointly optimize task performance and computational efficiency.
  3. Inference: One-shot pruning that retains the most important visual tokens.
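The one-shot pruning step can be pictured as a single ranking pass over per-token importance scores. The sketch below is a minimal illustration and not the repository's implementation: the function name `prune_top_k`, the `keep_ratio` parameter, and the top-k selection rule are all assumptions made for the example, and plain Python lists stand in for PyTorch tensors.

```python
from typing import List


def prune_top_k(scores: List[float], keep_ratio: float) -> List[int]:
    """Return indices of the tokens to keep, in their original order.

    scores: hypothetical per-token importance scores from the policy.
    keep_ratio: assumed fraction of tokens to retain, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token so the LLM always sees some visual context.
    k = max(1, round(len(scores) * keep_ratio))
    # Rank token indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore positional order so the kept tokens stay in sequence.
    return sorted(ranked[:k])


scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]
print(prune_top_k(scores, keep_ratio=0.5))  # [1, 3, 5]
```

Because the selection happens once, before the LLM runs, every LLM layer then operates on the shorter token sequence.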

Architecture

visual input → ViT → Projector → [TPRL pruner] → LLM → output

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/MagicVicCoder/TPRL.git
cd TPRL

# Install requirements
pip install -r requirements.txt

Training

Step 1: Learning from Demonstrations

python train_lfd.py
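Conceptually, the LfD stage amounts to behavior cloning: a heuristic labels each token as keep or prune, and the policy is pretrained to reproduce those labels. The sketch below illustrates that idea only and does not reflect the contents of `train_lfd.py`. The cosine-similarity heuristic, the binary cross-entropy loss, and every function name here are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def heuristic_demo(tokens, query, keep_ratio: float) -> List[int]:
    """Hypothetical heuristic: keep the tokens most similar to the query."""
    sims = [cosine(t, query) for t in tokens]
    k = max(1, round(len(tokens) * keep_ratio))
    order = sorted(range(len(tokens)), key=lambda i: sims[i], reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(tokens))]


def bce_loss(keep_probs, labels, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between policy outputs and demo labels."""
    total = 0.0
    for p, y in zip(keep_probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.0]
labels = heuristic_demo(tokens, query, keep_ratio=0.5)
print(labels)  # [1, 0, 1, 0]
print(bce_loss([0.9, 0.1, 0.8, 0.2], labels))
```

Minimizing this loss over many demonstrations would give the policy a reasonable starting point before PPO fine-tuning.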

Step 2: PPO Training

# Set the LfD checkpoint path in config.py first
python train_ppo.py
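For readers unfamiliar with PPO, the quantity it optimizes is the clipped surrogate objective, which limits how far one update can move the policy away from the one that collected the data. The sketch below shows the standard objective in plain Python. It is not taken from `train_ppo.py`, and the function name and the `clip_eps=0.2` default are assumptions.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective (a quantity to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to push the ratio
        # outside the [1 - clip_eps, 1 + clip_eps] band.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 1.5 with a positive advantage is capped at 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

In a pruning setting, each log-probability would be that of a whole keep/prune decision, and the advantage would be derived from the reward and the value network's estimate.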

Evaluation

python main.py

📁 Project Structure

TPRL/
├── model/
│   ├── autoencoder.py      # Token compression (optional)
│   ├── rl_networks.py      # Policy and value networks
│   ├── llava_mllm.py       # LLaVA model wrapper
│   └── qwen_mllm.py        # Qwen model wrapper
├── pruner/
│   ├── rl_pruner.py        # RL-based pruner
│   ├── random_pruner.py    # Baseline random pruner
│   └── mlp_pruner.py       # MLP-based pruner
├── train_lfd.py            # LfD training script
├── train_ppo.py            # PPO training script
├── config.py               # Configuration
└── main.py                 # Evaluation / inference script

🎯 Core Idea

MDP Formulation

  • State: (visual tokens, text query)
  • Action: keep / prune decision for each token
  • Reward: downstream task performance + computational efficiency
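The state and action above can be sketched as a small data structure plus a sampling routine. This is an illustrative toy, not the policy in `model/rl_networks.py`: the `State` class, the `toy_policy` stand-in (a sigmoid of the token-query dot product), and the independent per-token Bernoulli sampling are all assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    visual_tokens: List[List[float]]  # one embedding per visual token
    text_query: List[float]           # pooled query embedding (assumed)


def toy_policy(state: State) -> List[float]:
    """Stand-in policy: keep probability = sigmoid(token . query)."""
    probs = []
    for tok in state.visual_tokens:
        logit = sum(t * q for t, q in zip(tok, state.text_query))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


def sample_action(
    keep_probs: List[float], rng: random.Random
) -> Tuple[List[int], float]:
    """Sample keep (1) / prune (0) per token; return the action and its log-prob."""
    action, log_prob = [], 0.0
    for p in keep_probs:
        keep = 1 if rng.random() < p else 0
        action.append(keep)
        log_prob += math.log(p if keep else 1.0 - p)
    return action, log_prob


state = State(
    visual_tokens=[[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]],
    text_query=[1.0, 0.0],
)
probs = toy_policy(state)
action, log_prob = sample_action(probs, random.Random(0))
print(probs, action, log_prob)
```

The log-probability of the sampled action is what a policy-gradient method such as PPO needs in order to weight updates by the reward.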

Reward Function

reward = alpha * task_reward + beta * efficiency_reward
  • task_reward: change in task performance (e.g., IoU / accuracy)
  • efficiency_reward: compression / efficiency metric
  • alpha, beta: weights that balance the two terms
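A worked instance of the reward makes the trade-off concrete. The sketch below assumes that `efficiency_reward` is the fraction of visual tokens pruned and that `task_reward` is the accuracy after pruning minus the accuracy without pruning. Both definitions, the function names, and the default weights are illustrative assumptions, not values from `config.py`.

```python
def efficiency_reward(num_kept: int, num_total: int) -> float:
    """Assumed metric: fraction of visual tokens pruned, in [0, 1]."""
    if num_total <= 0 or not 0 <= num_kept <= num_total:
        raise ValueError("need 0 <= num_kept <= num_total and num_total > 0")
    return 1.0 - num_kept / num_total


def compute_reward(
    task_reward: float,
    num_kept: int,
    num_total: int,
    alpha: float = 1.0,  # hypothetical weight on task performance
    beta: float = 0.5,   # hypothetical weight on efficiency
) -> float:
    """reward = alpha * task_reward + beta * efficiency_reward"""
    return alpha * task_reward + beta * efficiency_reward(num_kept, num_total)


# Illustrative numbers: accuracy drops from 0.80 to 0.78 while
# keeping 144 of 576 visual tokens (75% pruned).
task_reward = 0.78 - 0.80
r = compute_reward(task_reward, num_kept=144, num_total=576)
print(round(r, 3))  # 0.355
```

With these weights, a small accuracy loss is outweighed by the efficiency gain. Raising `alpha` relative to `beta` would make the policy more conservative about pruning.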

🛠️ Requirements

  • Python >= 3.8
  • PyTorch >= 2.0
  • Transformers >= 4.37.0
  • See requirements.txt for full dependency list

⭐ If you find this repository useful, please give it a Star!

📄 Citation

If you find this work useful, please cite:

@misc{cao2026languageguidedtokencompressionreinforcement,
  title={Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models},
  author={Sihan Cao and Jianwei Zhang and Pengcheng Zheng and Jiaxin Yan and Caiyan Qin and Yalan Ye and Wei Dong and Peng Wang and Yang Yang and Chaoning Zhang},
  year={2026},
  eprint={2603.13394},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.13394}
}