RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

May 6, 2026

🔥News

[2026.05.01] 🎉 Our training-free inference acceleration method RTPrune has been accepted at ICML 2026.

✨Highlights

RTPrune consistently outperforms prior token pruning methods on DeepSeek-OCR, retaining over 97.88% of the accuracy while keeping 84% of the visual tokens on olmOCR-Bench.
RTPrune reduces GFLOPs by nearly 15.29% and prefill time by nearly 18.90% on OmniDocBench while maintaining 99.47% accuracy.

🌈Method

We introduce RTPrune, a plug-and-play visual token pruning method for DeepSeek-OCR that mimics the reading-twice behavior of the LLM through a two-stage pipeline: it first retains the high-norm tokens, then merges the remaining ones into them via optimal transport.
We propose a dynamic pruning strategy that combines the post-encoding inter-token similarity with the original textual density of the image, yielding a better efficiency–accuracy trade-off.
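
The two-stage idea described above can be illustrated with a small NumPy sketch. This is not the released implementation; it is a minimal illustration under several assumptions of ours: the function names (`prune_and_merge`, `sinkhorn`, `dynamic_keep_ratio`), the cosine-distance cost, entropy-regularized Sinkhorn iterations with uniform marginals as the optimal-transport solver, the mass-weighted merge rule, and the equal 0.5/0.5 weighting and `lo`/`hi` bounds in the dynamic ratio are all hypothetical. `text_density` is assumed to be a number in [0, 1] computed elsewhere from the input image.

```python
import numpy as np


def _unit(x):
    """L2-normalize each row."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)


def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropy-regularized OT plan between uniform marginals (assumed solver)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]  # shape (n, m)


def prune_and_merge(tokens, keep_ratio=0.84):
    """Stage 1: keep high-norm tokens. Stage 2: merge the rest into them via OT."""
    n = tokens.shape[0]
    n_keep = min(n, max(1, int(round(n * keep_ratio))))

    # Stage 1: keep the n_keep tokens with the largest L2 norm, in original order.
    norms = np.linalg.norm(tokens, axis=1)
    keep_idx = np.sort(np.argsort(-norms)[:n_keep])
    drop_mask = np.ones(n, dtype=bool)
    drop_mask[keep_idx] = False
    kept, dropped = tokens[keep_idx], tokens[drop_mask]
    if len(dropped) == 0:
        return kept.copy(), keep_idx

    # Stage 2: transport the dropped tokens onto the kept ones.
    cost = 1.0 - _unit(dropped) @ _unit(kept).T      # cosine distance (assumed)
    plan = sinkhorn(cost) * len(dropped)             # each dropped token ~ unit mass
    # Mass-weighted average: every kept token has unit mass of its own.
    merged = (kept + plan.T @ dropped) / (1.0 + plan.sum(axis=0))[:, None]
    return merged, keep_idx


def dynamic_keep_ratio(tokens, text_density, lo=0.5, hi=1.0):
    """Hypothetical rule: keep more tokens when they are dissimilar or text is dense.

    Requires at least two tokens.
    """
    u = _unit(tokens)
    sim = u @ u.T
    n = len(tokens)
    redundancy = (sim.sum() - np.trace(sim)) / (n * (n - 1))  # mean off-diagonal cosine
    score = 0.5 * (1.0 - redundancy) + 0.5 * text_density
    return lo + (hi - lo) * float(np.clip(score, 0.0, 1.0))
```

For example, `prune_and_merge(tokens, keep_ratio=dynamic_keep_ratio(tokens, 0.7))` would choose the ratio per image before pruning. In this sketch the retained tokens keep their original order, so positional structure is preserved for the decoder.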

📦Installation

Install the DeepSeek-OCR environment.
Download the checkpoint files from Hugging Face and put them in ./DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-ckpt.
Replace the corresponding files with ours, or add the new files from our code. The parts we changed can be found by searching for "[modified]".
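
To locate the changed parts after copying the files, a small helper like the one below can list every line that carries the "[modified]" marker. The function name `find_modified` and the restriction to `.py` files are our assumptions, not part of the repository.

```python
from pathlib import Path


def find_modified(root, marker="[modified]", suffixes=(".py",)):
    """Return (path, line_number, line) for every line containing the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only scan regular files with one of the chosen suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_modified("./DeepSeek-OCR")` from the directory that contains the checkout returns the matches as a list, which can then be printed or diffed against the upstream files.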

🚀Quick Start

Run the following command:

cd DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-hf
python run_dpsk_ocr.py

📊Evaluation

The evaluation code follows the pipelines of OmniDocBench, olmOCR-Bench, and Ocean-OCR Benchmark.
Our code also includes the measurement of prefill time and decoding time.
We also provide the implementations of VisionZip, DivPrune and CDPruner on DeepSeek-OCR.
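
The timing code itself ships with the repository. As a generic illustration of how prefill and decoding time can be measured separately and turned into a relative saving, here is a sketch with stand-in callables. `time_stages`, `relative_reduction`, `prefill_fn`, and `decode_step_fn` are hypothetical names, not functions from the repository.

```python
import time


def time_stages(prefill_fn, decode_step_fn, n_new_tokens):
    """Time one prefill call and n_new_tokens decode steps separately (seconds)."""
    # On a GPU, synchronize the device (e.g. torch.cuda.synchronize()) before
    # each clock read; otherwise asynchronous kernels distort the numbers.
    start = time.perf_counter()
    state = prefill_fn()
    prefill_s = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(n_new_tokens):
        state = decode_step_fn(state)
    decode_s = time.perf_counter() - start
    return prefill_s, decode_s


def relative_reduction(baseline, pruned):
    """Fractional saving of `pruned` relative to `baseline` (0.189 means 18.9%)."""
    return (baseline - pruned) / baseline
```

Running `time_stages` once for the unpruned model and once with pruning enabled, then passing the two prefill times to `relative_reduction`, gives a figure comparable in kind to the prefill-time reduction reported in the highlights.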

👏Acknowledgement

This work is built upon DeepSeek-OCR. We thank its authors for their excellent open-source contributions.
We also thank VisionZip, DivPrune, CDPruner, and others for their contributions, which have provided valuable insights.