OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer

March 9, 2026

Si-Yu Lu¹, Po-Ting Chen², Hui-Che Hsu², Sin-Ye Jhong², Wen-Huang Cheng¹, Yung-Yao Chen²

¹National Taiwan University ²National Taiwan University of Science and Technology

TL;DR: OVGGT is a training-free framework enabling streaming 3D reconstruction from arbitrarily long video with constant memory and compute — achieving O(1) per-frame cost while surpassing full-cache baselines in accuracy.

Left: Quantitative comparison on 7-Scenes across 200 frames. Right: Qualitative 3D reconstructions demonstrating OVGGT's stability over long sequences (50–500 frames).


Overview

OVGGT is a training-free framework that enables streaming 3D reconstruction from arbitrarily long video with constant memory and compute. It combines Self-Selective Caching (SSC) for zero-overhead KV cache compression via FFN residual magnitudes, and Dynamic Anchor Protection (DAP) to shield geometrically critical tokens from eviction, suppressing coordinate drift over long sequences. OVGGT is fully compatible with FlashAttention and processes videos within a fixed VRAM envelope while surpassing full-cache baselines in accuracy.
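To make the constant-memory idea concrete, here is a minimal sketch of fixed-budget cache eviction with protected anchors. It is not the OVGGT implementation: the `CachedToken` class, the `score` field (a stand-in for the FFN residual magnitude), the `is_anchor` flag, and the `budget` parameter are all hypothetical names chosen for illustration, and the actual scoring and anchor-selection rules are defined in the paper and the repository code.

```python
from dataclasses import dataclass


@dataclass
class CachedToken:
    token_id: int            # position in the stream (hypothetical)
    score: float             # stand-in for an importance score
    is_anchor: bool = False  # protected tokens are never evicted


def evict_to_budget(cache, budget):
    """Keep all anchors, then the highest-scoring tokens up to `budget`."""
    anchors = [t for t in cache if t.is_anchor]
    others = [t for t in cache if not t.is_anchor]
    if len(anchors) > budget:
        raise ValueError("budget must be at least the number of anchors")
    # Rank the unprotected tokens by score, highest first.
    others.sort(key=lambda t: t.score, reverse=True)
    kept = anchors + others[: budget - len(anchors)]
    # Restore stream order so the cache stays temporally sorted.
    kept.sort(key=lambda t: t.token_id)
    return kept


def stream(frames, budget):
    """Process frames one by one; the cache never exceeds `budget` tokens."""
    cache, sizes = [], []
    for frame_tokens in frames:
        cache = evict_to_budget(cache + frame_tokens, budget)
        sizes.append(len(cache))
    return cache, sizes


frames = [
    [CachedToken(0, 0.1, is_anchor=True), CachedToken(1, 0.9)],
    [CachedToken(2, 0.5), CachedToken(3, 0.2)],
    [CachedToken(4, 0.8), CachedToken(5, 0.3)],
]
cache, sizes = stream(frames, budget=3)
print([t.token_id for t in cache], sizes)
```

Because the cache is trimmed back to the same budget after every frame, memory and attention cost per frame stay bounded no matter how long the video runs. The low-scoring anchor (token 0) survives every round, which is the role the overview assigns to DAP.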

⚙️ Installation

Clone OVGGT

git clone https://github.com/<your-username>/OVGGT.git
cd OVGGT

Create conda environment

conda create -n OVGGT python=3.11 cmake=3.14.0
conda activate OVGGT

Install requirements

pip install -r requirements.txt
conda install 'llvm-openmp<16'

Download Checkpoints

Please download the StreamVGGT checkpoint from Hugging Face or Tsinghua Cloud.

Evaluation

The evaluation code follows MonST3R, CUT3R, TTT3R, StreamVGGT and InfiniteVGGT.

cd src/

Multi-view Reconstruction

bash eval/mv_recon/run.sh

Results will be saved in eval_results/mv_recon/${model_name}_${ckpt_name}/logs_all.txt.

Video Depth

bash eval/video_depth/run.sh

Results will be saved in eval_results/video_depth/${data}_${model_name}/result_scale.json.

Pose Evaluation

bash eval/pose_evaluation/run.sh

Results will be saved in eval_results/pose_evaluation/${data}_${model_name}/_error_log.txt.
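The three output locations above follow one pattern, so a small helper can collect them when post-processing results. This is a sketch, not part of the repository: the function name `result_paths` is hypothetical, and the example values for `model_name`, `ckpt_name`, and `data` are placeholders for whatever the `run.sh` scripts set.

```python
from pathlib import Path


def result_paths(model_name, ckpt_name, data, root="eval_results"):
    """Build the result file paths described in the Evaluation section."""
    root = Path(root)
    return {
        "mv_recon": root / "mv_recon" / f"{model_name}_{ckpt_name}" / "logs_all.txt",
        "video_depth": root / "video_depth" / f"{data}_{model_name}" / "result_scale.json",
        "pose_evaluation": root / "pose_evaluation" / f"{data}_{model_name}" / "_error_log.txt",
    }


# Placeholder values; substitute the names used in your run.sh.
paths = result_paths(model_name="my_model", ckpt_name="my_ckpt", data="my_data")
for task, path in paths.items():
    print(f"{task}: {path} (exists: {path.exists()})")
```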

🚀 Quick Start

Viser Demo (Interactive 3D Visualization)

We provide a demo for OVGGT, based on the demo code from InfiniteVGGT. You can follow the instructions below to launch it.

python demo_viser.py \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera  # optional
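If `--frame_interval` subsamples the sequence by keeping every N-th frame (an assumption; check `demo_viser.py` for the exact behaviour), the selection can be previewed with a sketch like the one below. The function `subsample_frames` and the file names are hypothetical.

```python
def subsample_frames(frame_names, frame_interval=10):
    """Keep every `frame_interval`-th frame after sorting by name."""
    if frame_interval < 1:
        raise ValueError("frame_interval must be a positive integer")
    return sorted(frame_names)[::frame_interval]


# Hypothetical zero-padded file names so that sorting matches frame order.
names = [f"{i:04d}.png" for i in range(25)]
print(subsample_frames(names, frame_interval=10))
```

Under this assumption, a 25-frame sequence with an interval of 10 yields three frames (indices 0, 10, and 20), so larger intervals shorten the stream passed to the model.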

Gradio Demo (Web UI)

We provide a demo for OVGGT, based on the demo code from VGGT. You can follow the instructions below to launch it.

pip install -r requirements_demo.txt
python demo_gradio.py

🙏 Acknowledgements

Our code is based on the following brilliant repositories:

DUSt3R, MonST3R, Spann3R, CUT3R, VGGT, Point3R, StreamVGGT, TTT3R, Evict3R, InfiniteVGGT

Many thanks to these authors!

📝 Citation

@article{lu2026ovggt,
  title={OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer},
  author={Si-Yu Lu and Po-Ting Chen and Hui-Che Hsu and Sin-Ye Jhong and Wen-Huang Cheng and Yung-Yao Chen},
  journal={arXiv preprint arXiv:2603.05959},
  year={2026}
}