Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Paper | Project Page
Yuqi Wu*, Wenzhao Zheng*†, Jie Zhou, Jiwen Lu
* Equal contribution. † Project leader.
Point3R is an online framework for dense streaming 3D reconstruction that uses an explicit spatial pointer memory and achieves competitive performance at low training cost.
News
- [2025/7/3] Training/finetuning/evaluation code release.
Overview
Given streaming image inputs, our method maintains an explicit spatial pointer memory in which each pointer is assigned a 3D position and points to a changing spatial feature. Through a pointer-image interaction, we integrate each new observation into the global coordinate system and update the spatial pointer memory accordingly. Our method achieves competitive or state-of-the-art performance across various tasks: dense 3D reconstruction, monocular and video depth estimation, and camera pose estimation.
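To make the idea concrete, below is a minimal conceptual sketch of an explicit spatial pointer memory. It is an illustration only: the class, the merge radius, and the nearest-neighbor update are assumptions standing in for the learned pointer-image interaction, not the actual Point3R implementation.

import torch

class SpatialPointerMemory:
    """Illustrative sketch: each pointer stores a 3D position and a feature."""
    def __init__(self, feat_dim):
        self.positions = torch.empty(0, 3)        # one global 3D position per pointer
        self.features = torch.empty(0, feat_dim)  # the spatial feature each pointer points to

    def interact(self, new_pos, new_feats, merge_radius=0.05):
        # Integrate an observation already mapped into the global coordinate system.
        if self.positions.numel() == 0:
            self.positions, self.features = new_pos, new_feats
            return
        dist = torch.cdist(new_pos, self.positions)   # (N_new, N_mem) pairwise distances
        nearest, idx = dist.min(dim=1)
        close = nearest < merge_radius
        # refresh the features of matched pointers, append unmatched points as new pointers
        self.features[idx[close]] = 0.5 * (self.features[idx[close]] + new_feats[close])
        self.positions = torch.cat([self.positions, new_pos[~close]])
        self.features = torch.cat([self.features, new_feats[~close]])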
Getting Started
Installation
Our code has been tested in the following environment.
1. Clone
git clone https://github.com/YkiWu/Point3R.git
cd Point3R
2. Create conda environment
conda create -n point3r python=3.11 cmake=3.14.0
conda activate point3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
conda install 'llvm-openmp<16'
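A quick sanity check that PyTorch sees the GPU (standard PyTorch calls, nothing project-specific):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"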
Data Preparation
Please follow CUT3R to prepare the training datasets. Official links to all datasets used are listed below.
- ARKitScenes
- BlendedMVS
- CO3Dv2
- Hypersim
- MegaDepth
- MVS-Synth
- OmniObject3D
- PointOdyssey
- ScanNet++
- ScanNet
- Spring
- Virtual KITTI 2
- Waymo Open Dataset
- WildRGB-D
Training from Scratch
We provide the following commands for training from scratch.
Please download DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth and place it at a path of your choice, then point the training configs to it.
cd src/
# stage 1, 224 version + 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 224_stage1
# stage 2, 512 version + 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 512_stage2
# stage 3, freeze the encoder and fine-tune other parts on 8-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name long_stage3
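All three commands assume 8 GPUs. --num_processes is a standard accelerate flag, so with fewer GPUs you can lower it directly; you may also want to adjust the batch size in the corresponding config to keep the effective batch size comparable. For example:

# stage 1 on 4 GPUs instead of 8
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=4 train.py --config-name 224_stage1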
Fine-tuning
If you want to fine-tune from our released checkpoint, follow the steps below.
1. Download Checkpoints
Click HERE to download our checkpoint and place it at a path of your choice.
2. Start Fine-tuning
Modify the configuration file as needed, then run:
cd src/
# finetune
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name finetune
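Since the training entry point is Hydra-based, config values can also be overridden on the command line instead of editing the file. The option name below (pretrained) is hypothetical and should be checked against the actual finetune config:

# hypothetical override: point the finetune config at the downloaded checkpoint
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name finetune pretrained=/path/to/point3r_ckpt.pth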
Evaluation
Data Preparation
Please follow MonST3R and Spann3R to prepare the evaluation datasets.
Scripts
Our evaluation code follows MonST3R and CUT3R.
3D Reconstruction
bash eval/mv_recon/run.sh
Results will be saved in eval_results/mv_recon/${model_name}_${ckpt_name}/logs_all.txt.
Monodepth
bash eval/monodepth/run.sh
Results will be saved in eval_results/monodepth/${data}_${model_name}/metric.json.
Video Depth
bash eval/video_depth/run.sh
Results will be saved in eval_results/video_depth/${data}_${model_name}/result_scale.json.
Camera Pose Estimation
bash eval/relpose/run.sh
Results will be saved in eval_results/relpose/${data}_${model_name}/_error_log.txt.
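To compare runs across datasets, the per-dataset JSON results can be gathered with a short script. This is a minimal sketch: it assumes only the eval_results layout shown above and prints whatever keys each metric.json contains.

import json
from pathlib import Path

# walk eval_results/monodepth/<data>_<model_name>/metric.json and print each file's metrics
for f in sorted(Path("eval_results/monodepth").glob("*/metric.json")):
    with open(f) as fh:
        print(f.parent.name, json.load(fh))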
Acknowledgements
Our code builds on the following awesome repositories:
- DUSt3R
- CUT3R
- MonST3R
- Spann3R
Many thanks to these authors!
Citation
If you find this project helpful, please consider citing the following paper:
@article{point3r,
    title={Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory},
    author={Yuqi Wu and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
    journal={arXiv preprint arXiv:2507.02863},
    year={2025}
}