Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Paper | Project Page
Yuqi Wu*, Wenzhao Zheng*†, Jie Zhou, Jiwen Lu
* Equal contribution. † Project leader.
Point3R is an online framework for dense streaming 3D reconstruction that uses an explicit spatial pointer memory and achieves competitive performance at low training cost.
News
- [2025/7/3] Training/finetuning/evaluation code release.
Overview
Given streaming image inputs, our method maintains an explicit spatial pointer memory in which each pointer is assigned a 3D position and points to a changing spatial feature. Through a pointer-image interaction, we integrate each new observation into the global coordinate system and update the spatial pointer memory accordingly. Our method achieves competitive or state-of-the-art performance across various tasks: dense 3D reconstruction, monocular and video depth estimation, and camera pose estimation.
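To make the idea concrete, below is a minimal conceptual sketch of an explicit spatial pointer memory. It is an illustration only: the class, the merge radius, and the nearest-neighbor update are assumptions standing in for the learned pointer-image interaction, not the actual Point3R implementation.

import torch

class SpatialPointerMemory:
    """Illustrative sketch: each pointer stores a 3D position and a feature."""
    def __init__(self, feat_dim):
        self.positions = torch.empty(0, 3)        # one global 3D position per pointer
        self.features = torch.empty(0, feat_dim)  # the spatial feature each pointer points to

    def interact(self, new_pos, new_feats, merge_radius=0.05):
        # Integrate an observation already mapped into the global coordinate system.
        if self.positions.numel() == 0:
            self.positions, self.features = new_pos, new_feats
            return
        dist = torch.cdist(new_pos, self.positions)   # (N_new, N_mem) pairwise distances
        nearest, idx = dist.min(dim=1)
        close = nearest < merge_radius
        # refresh the features of matched pointers, append unmatched points as new pointers
        self.features[idx[close]] = 0.5 * (self.features[idx[close]] + new_feats[close])
        self.positions = torch.cat([self.positions, new_pos[~close]])
        self.features = torch.cat([self.features, new_feats[~close]])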
Getting Started
Installation
Our code has been tested in the following environment.
1. Clone
git clone https://github.com/YkiWu/Point3R.git
cd Point3R
2. Create conda environment
conda create -n point3r python=3.11 cmake=3.14.0
conda activate point3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
conda install 'llvm-openmp<16'
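A quick sanity check that PyTorch sees the GPU (standard PyTorch calls, nothing project-specific):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"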
Data Preparation
Please follow CUT3R to prepare the training datasets. Official links to all datasets used are listed below.
- ARKitScenes
- BlendedMVS
- CO3Dv2
- Hypersim
- MegaDepth
- MVS-Synth
- OmniObject3D
- PointOdyssey
- ScanNet++
- ScanNet
- Spring
- Virtual KITTI 2
- Waymo Open Dataset
- WildRGB-D
Training from Scratch
We provide the following commands for training from scratch.
Please download DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth and place it at a path of your choice, then point the training configs to it.
cd src/
# stage 1, 224 version + 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 224_stage1
# stage 2, 512 version + 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 512_stage2
# stage 3, freeze the encoder and fine-tune other parts on 8-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name long_stage3
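All three commands assume 8 GPUs. --num_processes is a standard accelerate flag, so with fewer GPUs you can lower it directly; you may also want to adjust the batch size in the corresponding config to keep the effective batch size comparable. For example:

# stage 1 on 4 GPUs instead of 8
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=4 train.py --config-name 224_stage1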
Fine-tuning
If you want to fine-tune from our released checkpoint, follow the steps below.
1. Download Checkpoints
Click HERE to download our checkpoint and place it at a path of your choice.
2. Start Fine-tuning
Modify the configuration file as needed, then run:
cd src/
# finetune
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name finetune
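Since the training entry point is Hydra-based, config values can also be overridden on the command line instead of editing the file. The option name below (pretrained) is hypothetical and should be checked against the actual finetune config:

# hypothetical override: point the finetune config at the downloaded checkpoint
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name finetune pretrained=/path/to/point3r_ckpt.pth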
Evaluation
Data Preparation
Please follow MonST3R and Spann3R to prepare the evaluation datasets.
Scripts
Our evaluation code follows MonST3R and CUT3R.
3D Reconstruction
bash eval/mv_recon/run.sh
Results will be saved in eval_results/mv_recon/${model_name}_${ckpt_name}/logs_all.txt.
Monodepth
bash eval/monodepth/run.sh
Results will be saved in eval_results/monodepth/${data}_${model_name}/metric.json.
Video Depth
bash eval/video_depth/run.sh
Results will be saved in eval_results/video_depth/${data}_${model_name}/result_scale.json.
Camera Pose Estimation
bash eval/relpose/run.sh
Results will be saved in eval_results/relpose/${data}_${model_name}/_error_log.txt.
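To compare runs across datasets, the per-dataset JSON results can be gathered with a short script. This is a minimal sketch: it assumes only the eval_results layout shown above and prints whatever keys each metric.json contains.

import json
from pathlib import Path

# walk eval_results/monodepth/<data>_<model_name>/metric.json and print each file's metrics
for f in sorted(Path("eval_results/monodepth").glob("*/metric.json")):
    with open(f) as fh:
        print(f.parent.name, json.load(fh))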
Acknowledgements
Our code builds on the following awesome repositories:
- DUSt3R
- CUT3R
- MonST3R
- Spann3R
Many thanks to these authors!
Citation
If you find this project helpful, please consider citing the following paper:
@article{point3r,
    title={Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory},
    author={Yuqi Wu and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
    journal={arXiv preprint arXiv:2507.02863},
    year={2025}
}