RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference
May 6, 2026
🔥News
- [2026.05.01] 🎉 Our training-free inference acceleration method RTPrune has been accepted at ICML 2026.
✨Highlights
- RTPrune consistently outperforms prior token pruning methods on DeepSeek-OCR, retaining over 97.88% of the original accuracy on olmOCR-Bench with 84% of the visual tokens.
- On OmniDocBench, RTPrune reduces GFLOPs by nearly 15.29% and prefill time by nearly 18.90% while maintaining 99.47% of the original accuracy.
🌈Method
- We introduce RTPrune, a plug-and-play visual token pruning method for DeepSeek-OCR that mimics the LLM's reading-twice behavior through a two-stage pipeline: it first retains high-norm tokens and then merges the remaining ones via optimal transport.
- We propose a dynamic pruning strategy that combines post-encoding inter-token similarity with the textual density of the original image, enabling a better efficiency–accuracy trade-off.
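The two-stage pipeline can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the RTPrune implementation: "high-norm" is read as the L2 norm of each encoded visual token, the transport cost is cosine distance, the plan is computed with entropic (Sinkhorn) optimal transport using uniform marginals, and each discarded token is averaged into the retained tokens in proportion to its transport mass. The names `two_stage_prune` and `sinkhorn` and the parameters `keep_ratio` and `eps` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan between histograms a (M,) and b (K,)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    # Plan P = diag(u) K diag(v); its column sums equal b after the last update.
    return u[:, None] * kernel * v[None, :]


def two_stage_prune(tokens, keep_ratio, eps=0.1):
    """Hypothetical sketch: keep high-norm tokens, then OT-merge the rest into them.

    tokens: (N, D) array of visual tokens after the vision encoder.
    Returns a (k, D) array with k = max(1, round(keep_ratio * N)).
    """
    n = tokens.shape[0]
    k = max(1, int(round(keep_ratio * n)))
    norms = np.linalg.norm(tokens, axis=1)

    # Stage 1: retain the k highest-norm tokens, preserving their original order.
    keep_idx = np.sort(np.argsort(-norms)[:k])
    drop_mask = np.ones(n, dtype=bool)
    drop_mask[keep_idx] = False
    kept, rest = tokens[keep_idx], tokens[drop_mask]
    if rest.shape[0] == 0:
        return kept.copy()

    # Stage 2: cosine-distance cost between discarded and retained tokens.
    kept_unit = kept / np.maximum(np.linalg.norm(kept, axis=1, keepdims=True), 1e-8)
    rest_unit = rest / np.maximum(np.linalg.norm(rest, axis=1, keepdims=True), 1e-8)
    cost = 1.0 - rest_unit @ kept_unit.T

    m = rest.shape[0]
    plan = sinkhorn(cost, np.full(m, 1.0 / m), np.full(k, 1.0 / k), eps)

    # Rescale so each discarded token distributes a total weight of about 1,
    # then take a weighted average of each retained token and its assigned mass.
    weights = plan * m
    merged = (kept + weights.T @ rest) / (1.0 + weights.sum(axis=0))[:, None]
    return merged
```

For example, `two_stage_prune(tokens, 0.84)` on a `(256, 64)` array returns 215 tokens. Merging instead of dropping keeps information from the discarded tokens in the sequence the LLM reads; a larger `eps` gives a smoother, more numerically stable plan at the cost of blurrier assignments.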
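The dynamic strategy can be illustrated with a hypothetical rule that maps its two signals to a keep ratio. We assume that inter-token similarity is the mean pairwise cosine similarity of the encoded tokens (more similar means more redundant, so fewer tokens are kept) and that textual density is an externally supplied score in [0, 1] (denser text means more tokens are kept). The function names, the linear blend, and the parameters `min_ratio`, `max_ratio`, and `weight` are illustrative choices, not the formula used by RTPrune.

```python
import numpy as np


def mean_cosine_similarity(tokens):
    """Average off-diagonal cosine similarity between encoded tokens, in [-1, 1]."""
    n = tokens.shape[0]
    if n < 2:
        return 1.0  # a single token is treated as fully redundant
    unit = tokens / np.maximum(np.linalg.norm(tokens, axis=1, keepdims=True), 1e-8)
    sim = unit @ unit.T
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


def dynamic_keep_ratio(tokens, text_density, min_ratio=0.5, max_ratio=1.0, weight=0.5):
    """Hypothetical rule: keep more tokens for dense text and for diverse tokens.

    text_density: assumed estimate in [0, 1] of how much text the original image holds.
    weight: how much the density term counts relative to the similarity term.
    """
    redundancy = (mean_cosine_similarity(tokens) + 1.0) / 2.0  # map to [0, 1]
    need = weight * text_density + (1.0 - weight) * (1.0 - redundancy)
    ratio = min_ratio + (max_ratio - min_ratio) * need
    return float(np.clip(ratio, min_ratio, max_ratio))
```

The returned ratio could then serve as the keep ratio of a token pruning step, so that sparse or repetitive pages are pruned more aggressively than text-dense ones.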
📦Installation
- Install the DeepSeek-OCR environment.
- Download the checkpoint files from Hugging Face and put them in ./DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-ckpt.
- Replace the corresponding files with our code, or add our new files. All of our changes are marked with "[modified]", so you can locate them by searching for that tag.
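Because every change is tagged "[modified]", a short script can list the affected lines before you copy files over. This is a minimal sketch assuming the modified sources are Python files under the DeepSeek-OCR directory; the helper name `find_marked_lines` is hypothetical, and `grep -rn "\[modified\]" DeepSeek-OCR` does the same job from a shell.

```python
import os


def find_marked_lines(root, marker="[modified]", exts=(".py",)):
    """Yield (path, line_number, line) for every line containing the marker."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue  # only scan the assumed source-file extensions
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if marker in line:
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Print each hit as path:line: text, similar to grep -n output.
    for path, lineno, line in find_marked_lines("DeepSeek-OCR"):
        print(f"{path}:{lineno}: {line.strip()}")
```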
🚀Quick Start
Run the following commands:
cd DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-hf
python run_dpsk_ocr.py
📊Evaluation
- The evaluation code follows the pipelines of OmniDocBench, olmOCR-Bench, and Ocean-OCR Benchmark.
- Our code also measures prefill time and decoding time.
- We also provide implementations of VisionZip, DivPrune, and CDPruner for DeepSeek-OCR.
👏Acknowledgement
- This work is built upon DeepSeek-OCR. We thank its authors for their excellent open-source contributions.
- We also thank the authors of VisionZip, DivPrune, CDPruner, and other works, whose contributions provided valuable insights.