ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models
March 12, 2026
Yingxin Lai1, Zitong Yu1⋆, Jun Wang1⋆, Linlin Shen2, Yong Xu3, and Xiaochun Cao4
1 Great Bay University
2 Shenzhen University
3 Harbin Institute of Technology
4 School of Cyber Science and Technology, Sun Yat-sen University
🔍 Overview
Multimodal Large Language Models (MLLMs) enable interpretable multimedia forensics by generating textual rationales for forgery detection. However, processing dense visual sequences incurs high computational cost, especially for high-resolution images and videos. Existing visual token pruning methods are mostly semantic-driven: they preserve salient objects while often discarding background regions where manipulation traces such as high-frequency anomalies and temporal jitters reside.
To address this issue, we introduce ForensicZip, a training-free framework that reformulates token compression from a forgery-driven perspective. ForensicZip models temporal token evolution as a Birth-Death Optimal Transport problem with a slack dummy node, quantifying physical discontinuities associated with transient generative artifacts. The final forensic score further integrates transport-based novelty with high-frequency priors, allowing forensic evidence to be preserved under large-ratio compression.
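To make the birth-death transport idea concrete, here is a minimal NumPy sketch, not the repository's implementation. It assumes a cosine-distance ground cost, unit mass per token, and a plain Sinkhorn solver; the function name `birth_death_sinkhorn` and all default values are hypothetical. Each token set is augmented with one dummy node: mass routed from the dummy source to a current-frame token counts as a "birth", and mass routed from a previous-frame token to the dummy sink counts as a "death".

```python
import numpy as np


def birth_death_sinkhorn(prev_tokens, curr_tokens, birth_cost=0.5,
                         death_cost=0.5, eps=0.1, iters=200):
    """Entropic OT between two token sets, each augmented with a dummy node.

    Returns the (n+1, m+1) transport plan plus per-token birth and death mass.
    Assumes every token has a non-zero feature norm.
    """
    n, m = len(prev_tokens), len(curr_tokens)

    # Assumed ground cost: cosine distance between L2-normalised features.
    p = prev_tokens / np.linalg.norm(prev_tokens, axis=1, keepdims=True)
    c = curr_tokens / np.linalg.norm(curr_tokens, axis=1, keepdims=True)

    cost = np.zeros((n + 1, m + 1))
    cost[:n, :m] = 1.0 - p @ c.T
    cost[n, :m] = birth_cost   # dummy source -> current token ("birth")
    cost[:n, m] = death_cost   # previous token -> dummy sink ("death")
    # cost[n, m] stays 0: unused slack flows dummy -> dummy for free.

    # Unit mass per real token; the dummies carry enough slack to balance.
    a = np.append(np.ones(n), m)
    b = np.append(np.ones(m), n)

    kernel = np.exp(-cost / eps)
    u, v = np.ones(n + 1), np.ones(m + 1)
    for _ in range(iters):
        u = a / (kernel @ v)       # match row marginals
        v = b / (kernel.T @ u)     # match column marginals

    plan = u[:, None] * kernel * v[None, :]
    return plan, plan[n, :m], plan[:n, m]


prev = np.eye(4)[:3]   # three tokens in frame t-1
curr = np.eye(4)       # the same three tokens plus one unmatched token
plan, birth, death = birth_death_sinkhorn(prev, curr)
print(birth.round(3))  # per-token birth mass, usable as a novelty signal
```

In this toy case the unmatched fourth token has no cheap partner in the previous frame, so most of its mass comes from the dummy source. A token whose nearest match costs more than `birth_cost` is treated as newly appeared rather than forced into a poor match.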
On deepfake and AIGC benchmarks, ForensicZip delivers strong detection performance at aggressive compression ratios, achieving 2.97× speedup and over 90% FLOPs reduction at 10% token retention while maintaining state-of-the-art accuracy.
Figure 1. Overview of the ForensicZip framework. The method preserves forgery-relevant evidence under aggressive token compression by combining transport-based novelty with forensic priors.
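The combination of transport-based novelty with a forensic prior, followed by pruning to a fixed retention ratio, might look like the sketch below. The additive fusion rule, the min-max normalisation, and the names `minmax`, `select_tokens`, and `eta` are assumptions made for illustration; the source states only that the final score integrates both cues.

```python
import numpy as np


def minmax(x):
    """Rescale to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def select_tokens(novelty, hf_prior, retention_ratio=0.1, eta=0.5):
    """Return sorted indices of the tokens kept after fusing both cues."""
    # Assumed fusion: normalised novelty plus a weighted high-frequency prior.
    score = minmax(novelty) + eta * minmax(hf_prior)
    k = max(1, int(round(retention_ratio * score.size)))
    top = np.argsort(-score, kind="stable")[:k]
    return np.sort(top)   # keep the original token order


novelty = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
hf_prior = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=float)
keep = select_tokens(novelty, hf_prior, retention_ratio=0.2)
print(keep)
```

Sorting the kept indices preserves the spatial and temporal order of the surviving tokens, which matters if position information is derived from sequence order.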
🧱 Repository Structure
forensiczip/: method implementation and helper utilities
fakevlm/: FakeVLM-compatible skeleton modules
scripts/: evaluation entrypoints
docs/: running and data preparation notes
imgs/: method figures
🛠️ Installation
conda create -n forensiczip python=3.10 -y
conda activate forensiczip
pip install -r requirements.txt
If you already have a compatible environment, you can reuse it directly.
🚀 Running
1. FakeClue Evaluation
MODEL_PATH_7B=<MODEL_PATH> \
FAKECLUE_TEST_JSON=<FAKECLUE_JSON> \
FAKECLUE_DATA_BASE=<FAKECLUE_MEDIA_DIR> \
CUDA_DEVICES=0 \
PYTHON_BIN=python \
bash scripts/eval_forensiczip_fakeclue.sh
2. LOKI Evaluation
MODEL_PATH_7B=<MODEL_PATH> \
LOKI_JSON_DIR=<LOKI_JSON_DIR> \
LOKI_MEDIA_ROOT=<LOKI_MEDIA_ROOT> \
CUDA_DEVICES=0 \
PYTHON_BIN=python \
bash scripts/eval_forensiczip_loki.sh
3. Common Options
RETENTION_RATIOS_STR
VAL_BATCH_SIZE
WORKERS
MAX_LENGTH
MAX_NEW_TOKENS
FORENSICZIP_SELECT_LAYER
FORENSICZIP_BIRTH_COST
FORENSICZIP_DEATH_COST
FORENSICZIP_SINKHORN_EPS
FORENSICZIP_SINKHORN_ITERS
FORENSICZIP_EMA_BETA
FORENSICZIP_BIRTH_WEIGHT
FORENSICZIP_POS_LAMBDA
FORENSICZIP_FORENSIC_ETA
Detailed usage notes are available in docs/running.md.
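If you drive the evaluation scripts from Python, a small helper can catch misspelled option names before a long run starts. The sketch below is hypothetical and not part of the repository: `build_eval_env` and `KNOWN_OPTIONS` are illustrative names, the option list is simply copied from this README, and the override value is a placeholder rather than a recommended setting. Check docs/running.md for the accepted value formats.

```python
import os

# Option names as listed in this README; not read from the actual scripts.
KNOWN_OPTIONS = {
    "RETENTION_RATIOS_STR", "VAL_BATCH_SIZE", "WORKERS", "MAX_LENGTH",
    "MAX_NEW_TOKENS", "FORENSICZIP_SELECT_LAYER", "FORENSICZIP_BIRTH_COST",
    "FORENSICZIP_DEATH_COST", "FORENSICZIP_SINKHORN_EPS",
    "FORENSICZIP_SINKHORN_ITERS", "FORENSICZIP_EMA_BETA",
    "FORENSICZIP_BIRTH_WEIGHT", "FORENSICZIP_POS_LAMBDA",
    "FORENSICZIP_FORENSIC_ETA",
}


def build_eval_env(required, overrides=None, base=None):
    """Merge required variables and validated overrides into an env dict."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - KNOWN_OPTIONS)
    if unknown:
        raise KeyError(f"unrecognised option(s): {', '.join(unknown)}")
    env = dict(os.environ if base is None else base)
    env.update({k: str(v) for k, v in required.items()})
    env.update({k: str(v) for k, v in overrides.items()})
    return env


env = build_eval_env(
    {
        "MODEL_PATH_7B": "<MODEL_PATH>",
        "FAKECLUE_TEST_JSON": "<FAKECLUE_JSON>",
        "FAKECLUE_DATA_BASE": "<FAKECLUE_MEDIA_DIR>",
        "CUDA_DEVICES": 0,
        "PYTHON_BIN": "python",
    },
    overrides={"FORENSICZIP_SINKHORN_ITERS": 50},  # illustrative value only
)
# To launch the run:
# import subprocess
# subprocess.run(["bash", "scripts/eval_forensiczip_fakeclue.sh"],
#                env=env, check=True)
```

Environment variables must be strings, so the helper converts every value with `str` before it is passed to the subprocess.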
📦 External Resources
These resources are used by this repository but are not introduced by this work.
See docs/data_preparation.md for the expected local file layout.
🙏 Acknowledgement
This codebase is built on top of FakeVLM. We thank the FakeVLM project for providing the base model and evaluation structure used in this release.
📝 Citation
If you find this repository useful, please consider citing:
@article{lai2026forensiczip,
title={ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models},
author={Lai, Yingxin and Yu, Zitong and Wang, Jun and Shen, Linlin and Xu, Yong and Cao, Xiaochun},
journal={arXiv preprint},
year={2026}
}
📬 Contact
For questions about this repository, please contact: yingxinlai2@gmail.com