RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

May 6, 2026

🔥News

  • [2026.05.01] 🎉 Our training-free inference acceleration method RTPrune has been accepted at ICML 2026.

✨Highlights

  1. RTPrune consistently outperforms prior token pruning methods on DeepSeek-OCR, retaining over 97.88% of the accuracy while keeping 84% of the visual tokens on olmOCR-Bench.
  2. On OmniDocBench, RTPrune reduces GFLOPs by nearly 15.29% and prefill time by nearly 18.90% while maintaining 99.47% accuracy.

🌈Method

  1. We introduce RTPrune, a plug-and-play visual token pruning method for DeepSeek-OCR that mimics the reading-twice behavior of the LLM through a two-stage pipeline: it first retains the high-norm tokens, then merges the remaining tokens via optimal transport.
  2. We propose a dynamic pruning strategy that combines the post-encoding inter-token similarity with the original textual density of the image to achieve a better efficiency–accuracy trade-off.
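
The two points above can be illustrated with a small NumPy sketch. It is not the released RTPrune code: the function names (`keep_ratio`, `sinkhorn`, `prune_and_merge`), the 50/50 weighting of similarity and text density, the cosine cost, the entropic Sinkhorn solver, and the blending weight `alpha` are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def keep_ratio(tokens, text_density, r_min=0.5, r_max=1.0):
    """Hypothetical heuristic: keep more tokens when the encoded tokens are
    dissimilar and the page is text-dense. The 50/50 weighting is an assumption."""
    n = len(tokens)
    if n < 2:
        return r_max
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    mean_sim = (sim.sum() - np.trace(sim)) / (n * (n - 1))  # off-diagonal mean
    redundancy = float(np.clip(mean_sim, 0.0, 1.0))
    density = float(np.clip(text_density, 0.0, 1.0))
    score = 0.5 * (1.0 - redundancy) + 0.5 * density
    return r_min + (r_max - r_min) * score


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic optimal transport plan between marginals a (rows) and b (columns)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def prune_and_merge(tokens, ratio, alpha=0.1):
    """Stage 1: keep the highest-norm tokens. Stage 2: fold the remaining
    tokens into the kept ones using a transport plan (assumed merge rule)."""
    n = len(tokens)
    k = max(1, min(n, int(round(ratio * n))))
    order = np.argsort(-np.linalg.norm(tokens, axis=1))
    keep_idx = np.sort(order[:k])  # preserve the original token order
    rest_idx = np.sort(order[k:])
    kept, rest = tokens[keep_idx], tokens[rest_idx]
    if len(rest) == 0:
        return kept.copy()

    # Cosine distance between each dropped token and each kept token.
    kept_unit = kept / (np.linalg.norm(kept, axis=1, keepdims=True) + 1e-8)
    rest_unit = rest / (np.linalg.norm(rest, axis=1, keepdims=True) + 1e-8)
    cost = 1.0 - rest_unit @ kept_unit.T  # shape (m, k)

    m = len(rest)
    plan = sinkhorn(cost, np.full(m, 1.0 / m), np.full(k, 1.0 / k))

    # Barycentric projection: what each kept token receives from the dropped ones.
    received = (plan.T @ rest) / plan.sum(axis=0)[:, None]
    return (1.0 - alpha) * kept + alpha * received


# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 16))
ratio = keep_ratio(features, text_density=0.6)
pruned = prune_and_merge(features, ratio)
print(features.shape, "->", pruned.shape)
```

In this sketch the keep ratio is computed once per image, so sparse or highly redundant pages are pruned more aggressively than dense ones; the actual scoring used by RTPrune may differ.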

📦Installation

  1. Install the DeepSeek-OCR environment.

  2. Download the checkpoint files from Hugging Face and place them in ./DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-ckpt.

  3. Replace the corresponding files with ours, or add our new files where none exist. Every part we added or changed is marked with "[modified]", so you can search for that string to locate it.
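
To review what step 3 changes, a short helper like the one below can list every line carrying the "[modified]" marker. This is only a convenience sketch: the function name `find_modified_markers`, the restriction to `.py` files, and the `DeepSeek-OCR` root directory in the usage lines are assumptions, not part of the repository.

```python
from pathlib import Path


def find_modified_markers(root, marker="[modified]", suffixes=(".py",)):
    """Return (path, line number, line text) for every line containing the marker."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


# Assumed root directory; adjust to wherever the code was copied.
for path, lineno, line in find_modified_markers("DeepSeek-OCR"):
    print(f"{path}:{lineno}: {line}")
```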

🚀Quick Start

Run the following commands:

cd DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-hf
python run_dpsk_ocr.py

📊Evaluation

👏Acknowledgement

  • This work is built upon DeepSeek-OCR. We thank its authors for their excellent open-source contributions.

  • We also thank VisionZip, DivPrune, CDPruner, and others for their contributions, which have provided valuable insights.