Token Compression for Vision Domain

August 10, 2025

📢 A collection of awesome token compression resources for the vision understanding domain.

📚 Contents

Image Recognition

2021

  • [1] IA-RED²: Interpretability-aware Redundancy Reduction for Vision Transformers, NeurIPS 2021.

    Pan, Bowen and Panda, Rameswar and Jiang, Yifan and Wang, Zhangyang and Feris, Rogerio and Oliva, Aude.

    [Paper] [Code]

    BibTex
    @inproceedings{Pan2021:IA-RED2,
      title={{IA-RED$^2$}: Interpretability-aware redundancy reduction for vision transformers},
      author={Pan, Bowen and Panda, Rameswar and Jiang, Yifan and Wang, Zhangyang and Feris, Rogerio and Oliva, Aude},
      booktitle=NIPS,
      volume={34},
      pages={24898--24911},
      year={2021}
    }
    
  • [2] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification, NeurIPS 2021.

    Rao, Yongming and Zhao, Wenliang and Liu, Benlin and Lu, Jiwen and Zhou, Jie and Hsieh, Cho-Jui.

    [Paper] [Code]

    BibTex
    @inproceedings{Rao2021:DynamicViT,
      title={{DynamicViT}: Efficient Vision Transformers with Dynamic Token Sparsification},
      author={Yongming Rao and Wenliang Zhao and Benlin Liu and Jiwen Lu and Jie Zhou and Cho{-}Jui Hsieh},
      booktitle=NIPS,
      volume={34},
      pages={13937--13949},
      year={2021}
    }
    
  • [3] Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition, NeurIPS 2021.

    Wang, Yulin and Huang, Rui and Song, Shiji and Huang, Zeyi and Huang, Gao.

    [Paper] [Code]

    BibTex
    @inproceedings{wang2021dvt,
      title={Not all images are worth 16x16 words: Dynamic transformers for efficient image recognition},
      author={Wang, Yulin and Huang, Rui and Song, Shiji and Huang, Zeyi and Huang, Gao},
      booktitle=NIPS,
      volume={34},
      pages={11960--11973},
      year={2021}
    }
    

2022

  • [1] Adaptive Token Sampling for Efficient Vision Transformers, ECCV 2022.

    Fayyaz, Mohsen and Koohpayegani, Soroush Abbasi and Jafari, Farnoush Rezaei and Sengupta, Sunando and Joze, Hamid Reza Vaezi and Sommerlade, Eric and Pirsiavash, Hamed and Gall, Jürgen.

    [Paper] [Code]

  • [2] SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning, ECCV 2022.

    Kong, Zhenglun and Dong, Peiyan and Ma, Xiaolong and Meng, Xin and Niu, Wei and Sun, Mengshu and Shen, Xuan and Yuan, Geng and Ren, Bin and Tang, Hao and others.

    [Paper] [Code]

    BibTex
    @inproceedings{Kong2022:SPViT,
      title={{SPViT}: Enabling faster vision transformers via latency-aware soft token pruning},
      author={Kong, Zhenglun and Dong, Peiyan and Ma, Xiaolong and Meng, Xin and Niu, Wei and Sun, Mengshu and Shen, Xuan and Yuan, Geng and Ren, Bin and Tang, Hao and others},
      booktitle={European conference on computer vision},
      pages={620--640},
      year={2022},
      organization={Springer}
    }
    
  • [3] SaiT: Sparse Vision Transformers through Adaptive Token Pruning, arXiv 2022.

    Li, Ling and Thorsley, David and Hassoun, Joseph.

    [Paper] [Code]

    BibTex
    @article{Li2022SaiT,
      title={{SaiT}: Sparse vision transformers through adaptive token pruning},
      author={Li, Ling and Thorsley, David and Hassoun, Joseph},
      journal={arXiv preprint arXiv:2210.05832},
      year={2022}
    }
    
  • [4] Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations, ICLR 2022.

    Liang, Youwei and Ge, Chongjian and Tong, Zhan and Song, Yibing and Wang, Jue and Xie, Pengtao.

    [Paper] [Code]

  • [5] Patch Slimming for Efficient Vision Transformers, CVPR 2022.

    Tang, Yehui and Han, Kai and Wang, Yunhe and Xu, Chang and Guo, Jianyuan and Xu, Chao and Tao, Dacheng.

    [Paper] [Code]

    BibTex
    @inproceedings{Tang2022:PatchSlim,
      title={Patch slimming for efficient vision transformers},
      author={Tang, Yehui and Han, Kai and Wang, Yunhe and Xu, Chang and Guo, Jianyuan and Xu, Chao and Tao, Dacheng},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={12165--12174},
      year={2022}
    }
    
  • [6] Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer, AAAI 2022.

    Xu, Yifan and Zhang, Zhijie and Zhang, Mengdan and Sheng, Kekai and Li, Ke and Dong, Weiming and Zhang, Liqing and Xu, Changsheng and Sun, Xing.

    [Paper] [Code]

    BibTex
    @inproceedings{Xu2022:Evo-ViT,
      title={{Evo-ViT}: Slow-fast token evolution for dynamic vision transformer},
      author={Xu, Yifan and Zhang, Zhijie and Zhang, Mengdan and Sheng, Kekai and Li, Ke and Dong, Weiming and Zhang, Liqing and Xu, Changsheng and Sun, Xing},
      booktitle={Proceedings of the AAAI conference on artificial intelligence},
      volume={36},
      number={3},
      pages={2964--2972},
      year={2022}
    }
    
  • [7] A-ViT: Adaptive Tokens for Efficient Vision Transformer, CVPR 2022.

    Yin, Hongxu and Vahdat, Arash and Alvarez, Jose M and Mallya, Arun and Kautz, Jan and Molchanov, Pavlo.

    [Paper] [Code]

    BibTex
    @inproceedings{Yin2022:A-ViT,
      title={{A-ViT}: Adaptive tokens for efficient vision transformer},
      author={Yin, Hongxu and Vahdat, Arash and Alvarez, Jose M and Mallya, Arun and Kautz, Jan and Molchanov, Pavlo},
      booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
      pages={10809--10818},
      year={2022}
    }
    
  • [8] Learning to Merge Tokens in Vision Transformers, arXiv 2022.

    Renggli, Cedric and Pinto, André Susano and Houlsby, Neil and Mustafa, Basil and Puigcerver, Joan and Riquelme, Carlos.

    [Paper] [Code]

  • [9] Self-slimmed Vision Transformer, ECCV 2022.

    Zong, Zhuofan and Li, Kunchang and Song, Guanglu and Wang, Yali and Qiao, Yu and Leng, Biao and Liu, Yu.

    [Paper] [Code]

    BibTex
    @inproceedings{Zong2022:SiT,
      title={Self-slimmed vision transformer},
      author={Zong, Zhuofan and Li, Kunchang and Song, Guanglu and Wang, Yali and Qiao, Yu and Leng, Biao and Liu, Yu},
      booktitle={European Conference on Computer Vision},
      pages={432--448},
      year={2022},
      organization={Springer}
    }
    

2023

  • [1] Token Pooling in Vision Transformers for Image Classification, WACV 2023.

    Marin, Dmitrii and Chang, Jen-Hao Rick and Ranjan, Anurag and Prabhu, Anish and Rastegari, Mohammad and Tuzel, Oncel.

    [Paper] [Code]

    BibTex
    @inproceedings{Marin2023:TokenPooling,
      title={Token pooling in vision transformers for image classification},
      author={Marin, Dmitrii and Chang, Jen-Hao Rick and Ranjan, Anurag and Prabhu, Anish and Rastegari, Mohammad and Tuzel, Oncel},
      booktitle={Proceedings of the IEEE/CVF winter conference on applications of computer vision},
      pages={12--21},
      year={2023}
    }
    
  • [2] Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers, CVPR 2023.

    Long, Sifan and Zhao, Zhen and Pi, Jimin and Wang, Shengsheng and Wang, Jingdong.

    [Paper] [Code]

    BibTex
    @inproceedings{Long2023:BAT,
      title={Beyond attentive tokens: Incorporating token importance and diversity for efficient vision transformers},
      author={Long, Sifan and Zhao, Zhen and Pi, Jimin and Wang, Shengsheng and Wang, Jingdong},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={10334--10343},
      year={2023}
    }
    
  • [3] Which Tokens to Use? Investigating Token Reduction in Vision Transformers, ICCVw 2023.

    Haurum, Joakim Bruslund and Escalera, Sergio and Taylor, Graham W and Moeslund, Thomas B.

    [Paper] [Code]

  • [4] Efficient Vision Transformer via Token Merger, TIP 2023.

    Feng, Zhanzhou and Zhang, Shiliang.

    [Paper] [Code]

    BibTex
    @article{Feng2023TokenMerger,
      title={Efficient vision transformer via token merger},
      author={Feng, Zhanzhou and Zhang, Shiliang},
      journal=TIP,
      year={2023},
      publisher={IEEE}
    }
    
  • [5] Token Merging: Your ViT But Faster, ICLR 2023.

    Bolya, Daniel and Fu, Cheng-Yang and Dai, Xiaoliang and Zhang, Peizhao and Feichtenhofer, Christoph and Hoffman, Judy.

    [Paper] [Code]

    BibTex
    @inproceedings{Bolya2023:ToMe,
      title={Token Merging: Your {ViT} But Faster},
      author={Daniel Bolya and Cheng{-}Yang Fu and Xiaoliang Dai and Peizhao Zhang and Christoph Feichtenhofer and Judy Hoffman},
      booktitle=ICLR,
      year={2023}
    }
    

2024

  • [1] Efficient Transformer Adaptation with Soft Token Merging, CVPRw 2024.

    Yuan, Xin and Fei, Hongliang and Baek, Jinoo.

    [Paper] [Code]

    BibTex
    @inproceedings{Yuan2024:SoftToMe,
      title={Efficient transformer adaptation with soft token merging},
      author={Yuan, Xin and Fei, Hongliang and Baek, Jinoo},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={3658--3668},
      year={2024}
    }
    
  • [2] Agglomerative Token Clustering, ECCV 2024.

    Haurum, Joakim Bruslund and Escalera, Sergio and Taylor, Graham W and Moeslund, Thomas B.

    [Paper] [Code]

    BibTex
    @inproceedings{haurum2024agglomerative,
      title={Agglomerative Token Clustering},
      author={Haurum, Joakim Bruslund and Escalera, Sergio and Taylor, Graham W and Moeslund, Thomas B},
      booktitle={European Conference on Computer Vision},
      pages={200--218},
      year={2024},
      organization={Springer}
    }
    
  • [3] Token Pruning using a Lightweight Background Aware Vision Transformer, NeurIPSw 2024.

    Sah, Sudhakar and Kumar, Ravish and Rohmetra, Honnesh and Saboori, Ehsan.

    [Paper] [Code]

  • [4] GTP-ViT: Efficient Vision Transformers via Graph-Based Token Propagation, WACV 2024.

    Xu, Xuwei and Wang, Sen and Chen, Yudong and Zheng, Yanping and Wei, Zhewei and Liu, Jiajun.

    [Paper] [Code]

    BibTex
    @inproceedings{Xu2024:GTP-ViT,
      title={{GTP-ViT}: Efficient vision transformers via graph-based token propagation},
      author={Xu, Xuwei and Wang, Sen and Chen, Yudong and Zheng, Yanping and Wei, Zhewei and Liu, Jiajun},
      booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
      pages={86--95},
      year={2024}
    }
    
  • [5] PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference, arXiv 2024.

    Li, Ye and Tang, Chen and Meng, Yuan and Fan, Jiajun and Chai, Zenghao and Ma, Xinzhu and Wang, Zhi and Zhu, Wenwu.

    [Paper] [Code]

  • [6] TPC-ViT: Token Propagation Controller for Efficient Vision Transformer, WACV 2024.

    Zhu, Wentao.

    [Paper] [Code]

  • [7] Efficient Visual Transformer by Learnable Token Merging, arXiv 2024.

    Wang, Yancheng and Yang, Yingzhen.

    [Paper] [Code]

  • [8] HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers, HPCA 2023.

    Dong, Peiyan and Sun, Mengshu and Lu, Alec and Xie, Yanyue and Liu, Kenneth and Kong, Zhenglun and Meng, Xin and Li, Zhengang and Lin, Xue and Fang, Zhenman and others.

    [Paper] [Code]

  • [9] Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers, arXiv 2024.

    Lee, Dong Hoon and Hong, Seunghoon.

    [Paper] [Code]

  • [10] Energy Minimizing-based Token Merging for Accelerating Transformers, ICLRw 2024.

    Tran, Hoai-Chau and Nguyen, Duy Minh Ho and Nguyen, Manh-Duy and Le, Ngan Hoang and Nguyen, Binh T.

    [Paper] [Code]

  • [11] Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers, CVPR 2024.

    Wang, Hongjie and Dedhia, Bhishma and Jha, Niraj K.

    [Paper] [Code]

    BibTex
    @inproceedings{wang2024zero,
      title={Zero-TPrune: Zero-shot token pruning through leveraging of the attention graph in pre-trained transformers},
      author={Wang, Hongjie and Dedhia, Bhishma and Jha, Niraj K},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={16070--16079},
      year={2024}
    }
    
  • [12] PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference, ECCV 2024.

    Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu, Diana Marculescu.

    [Paper] [Code]

    BibTex
    @inproceedings{Mahmud2024:PaPr,
      title={{PaPr}: Training-free one-step patch pruning with lightweight convnets for faster inference},
      author={Mahmud, Tanvir and Yaman, Burhaneddin and Liu, Chun-Hao and Marculescu, Diana},
      booktitle={European Conference on Computer Vision},
      pages={110--128},
      year={2024},
      organization={Springer}
    }
    
  • [13] Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer, arXiv 2024.

    Shuai Peng, Di Fu, Baole Wei, Yong Cao, Liangcai Gao, Zhi Tang.

    [Paper] [Code]

  • [14] Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation, NeurIPS 2024.

    Zhao, Wangbo and Tang, Jiasheng and Han, Yizeng and Song, Yibing and Wang, Kai and Huang, Gao and Wang, Fan and You, Yang.

    [Paper] [Code]

    BibTex
    @inproceedings{Zhao2024:DyT,
      title={Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation},
      author={Zhao, Wangbo and Tang, Jiasheng and Han, Yizeng and Song, Yibing and Wang, Kai and Huang, Gao and Wang, Fan and You, Yang},
      booktitle=NIPS,
      year={2024}
    }
    

2025

  • [1] TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning, ICCV 2025.

    Luo, Siqi and Yang, Haoran and Xin, Yi and Yi, Mingyang and Wu, Guangyang and Zhai, Guangtao and Liu, Xiaohong.

    [Paper] [Code]

Video Recognition

  • [1] TokenLearner: Adaptive Space-Time Tokenization for Videos, NeurIPS 2021.

    Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova.

    [Paper] [Code]

    BibTex
    @inproceedings{Ryoo2021:TokenLearner,
      title={{TokenLearner}: Adaptive space-time tokenization for videos},
      author={Ryoo, Michael and Piergiovanni, AJ and Arnab, Anurag and Dehghani, Mostafa and Angelova, Anelia},
      booktitle=NIPS,
      volume={34},
      pages={12786--12797},
      year={2021}
    }
    
  • [2] Efficient Video Transformers with Spatial-Temporal Token Selection, ECCV 2022.

    Junke Wang and Xitong Yang and Hengduo Li and Li Liu and Zuxuan Wu and Yu-Gang Jiang.

    [Paper] [Code]

    BibTex
    @inproceedings{Wang2022:STTS,
      title={Efficient video transformers with spatial-temporal token selection},
      author={Wang, Junke and Yang, Xitong and Li, Hengduo and Liu, Li and Wu, Zuxuan and Jiang, Yu-Gang},
      booktitle={European Conference on Computer Vision},
      pages={69--86},
      year={2022},
      organization={Springer}
    }
    
  • [3] Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation, ICCV 2023.

    Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian.

    [Paper] [Code]

    BibTex
    @inproceedings{ding2023:PSTA,
      title={Prune spatio-temporal tokens by semantic-aware temporal accumulation},
      author={Ding, Shuangrui and Zhao, Peisen and Zhang, Xiaopeng and Qian, Rui and Xiong, Hongkai and Tian, Qi},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      pages={16945--16956},
      year={2023}
    }
    
  • [4] vid-TLDR: Training Free Token merging for Light-weight Video Transformer, CVPR 2024.

    Joonmyung Choi and Sanghyeok Lee and Jaewon Chu and Minhyuk Choi and Hyunwoo J. Kim.

    [Paper] [Code]

  • [5] Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition, ACM TOMM 2024.

    Zhanzhou Feng, Jiaming Xu, Lei Ma, Shiliang Zhang.

    [Paper] [Code]

    BibTex
    @article{Feng2024:STTM,
      title={Efficient video transformers via spatial-temporal token merging for action recognition},
      author={Feng, Zhanzhou and Xu, Jiaming and Ma, Lei and Zhang, Shiliang},
      journal={ACM Transactions on Multimedia Computing, Communications and Applications},
      volume={20},
      number={4},
      pages={1--21},
      year={2024},
      publisher={ACM New York, NY}
    }
    
  • [6] Don't Look Twice: Faster Video Transformers with Run-Length Tokenization, NeurIPS 2024.

    Choudhury, Rohan and Zhu, Guanglei and Liu, Sihan and Niinuma, Koichiro and Kitani, Kris M and Jeni, Laszlo Attila.

    [Paper] [Code]

    BibTex
    @article{Choudhury2024:RLT,
      title={Don't Look Twice: Faster Video Transformers with Run-Length Tokenization},
      author={Choudhury, Rohan and Zhu, Guanglei and Liu, Sihan and Niinuma, Koichiro and Kitani, Kris and Jeni, L{\'a}szl{\'o}},
      journal={Advances in Neural Information Processing Systems},
      volume={37},
      pages={28127--28149},
      year={2024}
    }
    
  • [7] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval, arXiv 2024.

    Shen, Leqi and Hao, Tianxiang and Zhao, Sicheng and Zhang, Yifeng and Liu, Pengzhang and Bao, Yongjun and Ding, Guiguang.

    [Paper] [Code]

  • [8] Efficient Video Action Detection with Token Dropout and Context Refinement, ICCV 2023.

    Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin.

    [Paper] [Code]

    BibTex
    @inproceedings{Chen2023:EVAD,
      title={Efficient video action detection with token dropout and context refinement},
      author={Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      pages={10388--10399},
      year={2023}
    }
    

Dense Prediction

2022

  • [1] GroupViT: Semantic Segmentation Emerges From Text Supervision, CVPR 2022.

    Xu, Jiarui and De Mello, Shalini and Liu, Sifei and Byeon, Wonmin and Breuel, Thomas and Kautz, Jan and Wang, Xiaolong.

    [Paper] [Code]

  • [2] Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning, NeurIPS 2022.

    Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han.

    [Paper] [Code]

    BibTex
    @article{liang2022expediting,
      title={Expediting large-scale vision transformer for dense prediction without fine-tuning},
      author={Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
      journal={Advances in Neural Information Processing Systems},
      volume={35},
      pages={35462--35477},
      year={2022}
    }
    
    @article{yuan2023expediting,
      title={Expediting large-scale vision transformer for dense prediction without fine-tuning},
      author={Yuan, Yuhui and Liang, Weicong and Ding, Henghui and Liang, Zhanhao and Zhang, Chao and Hu, Han},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      volume={46},
      number={1},
      pages={250--266},
      year={2023},
      publisher={IEEE}
    }
    

2023

  • [1] Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation, ICCV 2023.

    Tang, Quan and Zhang, Bowen and Liu, Jiajun and Liu, Fagui and Liu, Yifan.

    [Paper] [Code]

    BibTex
    @inproceedings{Tang2023:DToP,
      title={Dynamic token pruning in plain vision transformers for semantic segmentation},
      author={Tang, Quan and Zhang, Bowen and Liu, Jiajun and Liu, Fagui and Liu, Yifan},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      pages={777--786},
      year={2023}
    }
    
  • [2] Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers, CVPR 2023.

    Lu, Chenyang and de Geus, Daan and Dubbelman, Gijs.

    [Paper] [Code]

    BibTex
    @inproceedings{Lu2023:CTS,
      title={Content-aware token sharing for efficient semantic segmentation with vision transformers},
      author={Lu, Chenyang and de Geus, Daan and Dubbelman, Gijs},
      booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
      pages={23631--23640},
      year={2023}
    }
    
  • [3] Efficient Video Action Detection with Token Dropout and Context Refinement, ICCV 2023.

    Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin.

    [Paper] [Code]

    BibTex
    @inproceedings{Chen2023:EVAD,
      title={Efficient video action detection with token dropout and context refinement},
      author={Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      pages={10388--10399},
      year={2023}
    }
    
  • [4] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer, CVPR 2023.

    Chen, Xuanyao and Liu, Zhijian and Tang, Haotian and Yi, Li and Zhao, Hang and Han, Song.

    [Paper] [Code]

    BibTex
    @inproceedings{Chen2023:SparseViT,
      title={{SparseViT}: Revisiting activation sparsity for efficient high-resolution vision transformer},
      author={Chen, Xuanyao and Liu, Zhijian and Tang, Haotian and Yi, Li and Zhao, Hang and Han, Song},
      booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
      pages={2061--2070},
      year={2023}
    }
    

2024

  • [1] Revisiting Token Pruning for Object Detection and Instance Segmentation, WACV 2024.

    Liu, Yifei and Gehrig, Mathias and Messikommer, Nico and Cannici, Marco and Scaramuzza, Davide.

    [Paper] [Code]

    BibTex
    @inproceedings{liu2024revisiting,
      title={Revisiting token pruning for object detection and instance segmentation},
      author={Liu, Yifei and Gehrig, Mathias and Messikommer, Nico and Cannici, Marco and Scaramuzza, Davide},
      booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
      pages={2658--2668},
      year={2024}
    }
    
  • [2] Dynamic Token-Pass Transformers for Semantic Segmentation, WACV 2024.

    Liu, Yuang and Zhou, Qiang and Wang, Jing and Wang, Zhibin and Wang, Fan and Wang, Jun and Zhang, Wei.

    [Paper] [Code]

    BibTex
    @inproceedings{Liu2024:DoViT,
      title={Dynamic token-pass transformers for semantic segmentation},
      author={Liu, Yuang and Zhou, Qiang and Wang, Jing and Wang, Zhibin and Wang, Fan and Wang, Jun and Zhang, Wei},
      booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
      pages={1827--1836},
      year={2024}
    }
    
  • [3] DTMFormer: Dynamic Token Merging for Boosting Transformer-Based Medical Image Segmentation, AAAI 2024.

    Wang, Zhehao and Lin, Xian and Wu, Nannan and Yu, Li and Cheng, Kwang-Ting and Yan, Zengqiang.

    [Paper] [Code]

    BibTex
    @inproceedings{Wang2024:DTMFormer,
      title={DTMFormer: Dynamic Token Merging for Boosting Transformer-Based Medical Image Segmentation},
      author={Wang, Zhehao and Lin, Xian and Wu, Nannan and Yu, Li and Cheng, Kwang-Ting and Yan, Zengqiang},
      booktitle={Proceedings of the AAAI conference on artificial intelligence},
      volume={38},
      number={6},
      pages={5814--5822},
      year={2024}
    }
    
  • [4] Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation, MIPR 2024.

    Daniel Kienzle and Marco Kantonis and Robin Schön and Rainer Lienhart.

    [Paper] [Code]

    BibTex
    @inproceedings{Kienzle2024:Segformer++,
      title={Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation},
      author={Kienzle, Daniel and Kantonis, Marco and Sch{\"o}n, Robin and Lienhart, Rainer},
      booktitle={2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR)},
      pages={75--81},
      year={2024},
      organization={IEEE}
    }
    

2025

  • [1] Less is More: Token Context-aware Learning for Object Tracking, AAAI 2025.

    Chenlong Xu and Bineng Zhong and Qihua Liang and Yaozong Zheng and Guorong Li and Shuxiang Song.

    [Paper] [Code]