
April 12, 2026

Awesome MLLM Compression

This repository contains a regularly updated paper list for MLLM Compression.


From Data to Model: A Survey of the Compression Lifecycle in MLLMs

Hao Wu*,1, Junlong Tong*,1,2, Xudong Wang1, Yang Tan3, Changyu Zeng1, Anastasia Antsiferova4, Xiaoyu Shen†,1

1Institute of Digital Twin, Eastern Institute of Technology, Ningbo; 2Shanghai Jiao Tong University; 3Southeast University; 4Innopolis University

* Core Contribution, † Corresponding Author.

Contact: haowu.ai.research@gmail.com, xyshen@eitech.edu.cn

If you find our paper or this resource helpful, please consider citing:

@article{Wu_2026,
    title={From Data to Model: A Survey of the Compression Lifecycle in MLLMs},
    url={http://dx.doi.org/10.36227/techrxiv.177220375.55495124/v1},
    DOI={10.36227/techrxiv.177220375.55495124/v1},
    publisher={Institute of Electrical and Electronics Engineers (IEEE)},
    author={Wu, Hao and Tong, Junlong and Wang, Xudong and Tan, Yang and Zeng, Changyu and Antsiferova, Anastasia and Shen, Xiaoyu},
    year={2026},
    month=feb 
}

Important

We actively maintain this repository and welcome community contributions. If you would like to:

  • Add newly released MLLM compression papers
  • Propose refinements to our taxonomy
  • Correct or update existing entries
  • Discuss classification or methodology

Please submit a pull request or contact the authors.

🔥 News

  • [2026.02.27] The preprint is now available on TechRxiv!

💡 Highlights

  • Lifecycle perspective for MLLM compression: We introduce a Data-to-Model view that organizes compression methods according to where compression occurs in the MLLM pipeline, including the Input, Encoder, Projector, and LLM stages.
  • Five fundamental compression operations: We distill existing methods into five fundamental operations: Dropping, Aggregation, Encoding, Resampling, and Skipping, providing a unified abstraction for analyzing compression strategies.
  • Joint compression across efficiency dimensions: We advocate jointly considering token compression, operation compression, and KV cache compression as complementary strategies for improving the efficiency of MLLMs.
  • Cross-level compression coordination: We advocate that coordinated compression across multiple pipeline levels provides a more effective way to balance efficiency and model performance.
  • Beyond efficiency-oriented compression: We argue that compression should not be viewed solely as an efficiency technique, but also as a design principle that can reshape representations, architectures, and multimodal processing in MLLMs.
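The five operations above are abstractions rather than code, but two of them are easy to sketch on toy data. The snippet below is illustrative only and is not taken from any surveyed method: it contrasts Dropping with Aggregation on synthetic "visual tokens", with plain L2 norms standing in for the attention-based importance scores real methods use.

```python
import numpy as np

def drop_tokens(tokens, keep):
    """Dropping: discard all but the `keep` highest-scoring tokens."""
    scores = np.linalg.norm(tokens, axis=1)     # stand-in importance score
    kept = np.sort(np.argsort(scores)[-keep:])  # top-`keep`, original order
    return tokens[kept]

def aggregate_tokens(tokens, keep):
    """Aggregation: merge each discarded token into its most similar kept token."""
    scores = np.linalg.norm(tokens, axis=1)
    order = np.argsort(scores)
    kept, dropped = np.sort(order[-keep:]), order[:-keep]
    merged = tokens[kept].copy()
    counts = np.ones((keep, 1))
    for i in dropped:
        sims = tokens[kept] @ tokens[i]         # similarity to each kept token
        j = int(np.argmax(sims))
        merged[j] += tokens[i]                  # accumulate into the merge group
        counts[j] += 1
    return merged / counts                      # mean of each merge group

rng = np.random.default_rng(0)
vis_tokens = rng.normal(size=(16, 8))           # 16 "visual tokens", dim 8
compact = drop_tokens(vis_tokens, keep=4)       # shape (4, 8)
merged = aggregate_tokens(vis_tokens, keep=4)   # shape (4, 8)
```

Dropping is lossy but free of extra computation; Aggregation spends a small matching step to retain information from the discarded tokens, which is the basic trade-off the surveyed methods navigate.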

📚 Contents

  • News: Latest updates, news, and announcements.
  • Highlights: Core insights and perspectives that this survey aims to emphasize.
  • Tag Description: Brief explanation of tags in this repository.
  • Libraries: A collection of MLLM compression papers compiled in this repository.
  • License: License information for this repository.
  • Acknowledgments: Credits to projects and contributors that inspired or supported this work.
  • Contact: Contact information for questions, feedback, or collaboration.
  • Related Projects: Research projects from our group (EIT-NLP) related to MLLM compression.

📋 Tag Description

  • Preprint for preprint papers.
  • PDF for conference or journal papers.
  • GitHub for GitHub repositories.
  • Area for research areas (primarily categorized by modality).
  • Level for compression positions (i.e., Input, Encoder, Projector, LLM).
  • Op for compression operation types (i.e., Dropping, Aggregation, Encoding, Resampling, Skipping).
  • Mech for specific compression mechanisms (the third level in our taxonomy).
  • Target for compression dimensions (i.e., Token Compression, Operation Compression, KV Cache Compression).
  • Cost for training cost (i.e., Training-Free, Retraining, Post-Training).
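To make the Target dimension concrete, here is a minimal, hypothetical sketch of the KV Cache Compression case: a budget-constrained eviction pass that keeps only the cache entries with the highest scores. The scores are synthetic stand-ins for the accumulated attention mass that actual eviction methods track; nothing here is taken from a specific surveyed paper.

```python
import numpy as np

def evict_kv(keys, values, attn_scores, budget):
    """Keep only the `budget` cache entries with the highest attention scores."""
    keep = np.sort(np.argsort(attn_scores)[-budget:])  # top-`budget`, original order
    return keys[keep], values[keep]

rng = np.random.default_rng(1)
K = rng.normal(size=(32, 64))        # 32 cached keys, head dim 64
V = rng.normal(size=(32, 64))        # matching cached values
scores = rng.random(32)              # stand-in: attention mass per entry
K2, V2 = evict_kv(K, V, scores, budget=8)
```

Token Compression shrinks the sequence before or inside the model, Operation Compression skips or cheapens computation, and KV Cache Compression, as above, bounds the memory that grows with decoded length.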

🌟 Libraries

🔔 Please check out the papers by selecting the sub-area you are interested in. Within each sub-area, papers are organized according to our compression taxonomy. The main page presents all survey papers, together with papers from the major conferences (i.e., ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, NAACL) of the past year and papers released within the last six months. Papers already listed under the past year's major conferences are excluded from the recent-papers section.

Survey

Title & Authors & Links | Date | Taxonomy | Highlight
PDF Preprint
Efficient Inference for Large Vision-Language Models: Bottlenecks, Techniques, and Prospects
Jun Zhang, Yicheng Ji, Feiyang Ren, Yihang Li, Bowen Zeng, Zonghao Chen, Ke Chen, Lidan Shou, Gang Chen, Huan Li
26.04.07
Preprint GitHub
From Data to Model: A Survey of the Compression Lifecycle in MLLMs
Hao Wu, Junlong Tong, Xudong Wang, Yang Tan, Changyu Zeng, Anastasia Antsiferova, Xiaoyu Shen
26.02.27 | Compression position & Compression operation & Mechanism | Compression Lifecycle
Preprint
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
Xin Jin, Jinming Liu, Yuntao Wei, Junyan Lin, Zhicheng Wang, Jianguo Huang, Xudong Yang, Yanxiao Liu, Wenjun Zeng
26.01.28 | Codec & Token Technology | Compression as Intelligence
Preprint GitHub
Towards Efficient Multimodal Large Language Models: A Survey on Token Compression
Linli Yao, Long Xing, Yang Shi, Sida Li, Yuanxin Liu, Yuhao Dong, Yi-Fan Zhang, Lei Li, Qingxiu Dong, Xiaoyi Dong, Qidong Huang, Haotian Wang, Feng Wu, Yuanxing Zhang, Pengfei Wan, Zhouchen Lin, Xu Sun
26.01.12 | Compression Position & Mechanism | -
PDF Preprint
Revisiting MLLM Token Technology through the Lens of Classical Visual Coding
Jinming Liu, Junyan Lin, Yuntao Wei, Kele Shao, Keda Tao, Jianguo Huang, Xudong Yang, Zhibo Chen, Huan Wang, Xin Jin
25.08.19 | Codec & Token Technology | -
PDF Preprint GitHub
A Survey of Token Compression for Efficient Multimodal Large Language Models
Kele Shao, Keda Tao, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, Huan Wang
25.07.27 | Modality & Mechanism | Modality-centric
Preprint GitHub
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality
Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, Marinka Zitnik
25.05.23 | Compression operation | Compression Beyond Efficiency

Recent Papers (Last 6 Months)

Image (TODO)
Title & Authors & Links | Areas | Tags
Video (TODO)
Title & Authors & Links | Areas | Tags
Audio
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Umberto Cappellazzo, Xubo Liu, Pingchuan Ma, Stavros Petridis, Maja Pantic
PDF Preprint
Segmentwise Pruning in Audio-Language Models
Marcel Gibier, Raphaël Duroselle, Pierre Serrano, Olivier Boeffard, Jean-François Bonastre
Preprint
Towards Audio Token Compression in Large Audio Language Models
Saurabhchand Bhati, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass
3D
Title & Authors & Links | Areas | Tags
Preprint GitHub
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer
Si-Yu Lu, Po-Ting Chen, Hui-Che Hsu, Sin-Ye Jhong, Wen-Huang Cheng, Yung-Yao Chen
PDF Preprint GitHub
HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models
Liheng Zhang, Jin Wang, Hui Li, Bingfeng Zhang, Weifeng Liu
Omni (TODO)
Title & Authors & Links | Areas | Tags

Published in Recent Conferences (Last 12 months)

CVPR 2026 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking
Hao Wu, Xudong Wang, Jialiang Zhang, Junlong Tong, Xinghao Chen, Junyan Lin, Yunpu Ma, Xiaoyu Shen
PDF Preprint GitHub
Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models in Autonomous Driving
Minhao Xiong, Zichen Wen, Zhuangcheng Gu, Xuyang Liu, Rui Zhang, Hengrui Kang, Jiabing Yang, Junyuan Zhang, Weijia Li, Conghui He, Yafei Wang, Linfeng Zhang
ICLR 2026 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
Hao Wu, Yingqi Fan, Jinyang Dai, Junlong Tong, Yunpu Ma, Xiaoyu Shen
EMNLP 2025 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
Mengyue Wang, Shuo Chen, Kristian Kersting, Volker Tresp, Yunpu Ma
NeurIPS 2025 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding
PDF Preprint GitHub
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
PDF Preprint GitHub
Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
Yulin Li, Haokun Gui, Ziyang Fan, Junjie Wang, Bin Kang, Bin Chen, Zhuotao Tian
PDF Preprint GitHub
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu,
Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu,
Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny,
Vikas Chandra
ICCV 2025 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Xiaojun Jia, Jianwu Fang, Shanmin Pang
PDF Preprint
STORM: Token-Efficient Long Video Understanding for Multimodal LLMs
Jindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen, Zhiqi Li, De-An Huang, Guilin Liu,
Zhiding Yu, Kurt Keutzer, Sungjin Ahn, Jan Kautz, Hongxu Yin, Yao Lu, Song Han, Wonmin Byeon
PDF Preprint GitHub
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin
PDF Preprint GitHub
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
PDF Preprint GitHub
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
PDF Preprint GitHub
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang, Aosong Cheng, Ming Lu, Renrui Zhang, Zhiyong Zhuo, Jiajun Cao,
Shaobo Guo, Qi She, Shanghang Zhang
PDF Preprint GitHub
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang
ICML 2025 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng,
Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
ACL 2025 (TODO)
Title & Authors & Links | Areas | Tags
PDF Preprint GitHub
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
Xiaohu Huang, Hao Zhou, Kai Han
PDF Preprint GitHub
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro
NAACL 2025
Title & Authors & Links | Areas | Tags
PDF Preprint
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Souvik Kundu, Anahita Bhiwandiwalla, Sungduk Yu, Phillip Howard, Tiep Le,
Sharath Nittur Sridhar, David Cobbley, Hao Kang, Vasudev Lal
PDF Preprint GitHub
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang
PDF Preprint
LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro

📄 License

This project is released under the MIT License.

πŸ™ Acknowledgments

This repository is inspired by Awesome-Multimodal-Token-Compression, Awesome-Latent-CoT, and Awesome-Efficient-LLM.

βœ‰οΈ Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out: