Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
June 2, 2025
Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi
3DV, 2025. arXiv 2406.04343
News
- 19.07.2024: Training code and data release
Setup
Create a python environment
Flash3D has been trained and tested with the following software versions:
- Python 3.10
- PyTorch 2.2.2
- CUDA 11.8
- GCC 11.2 (or more recent)
Begin by installing CUDA 11.8 and adding the directory containing the nvcc compiler to the PATH environment variable.
Then create the Python environment, either via conda:
conda create -y python=3.10 -n flash3d
conda activate flash3d
or using Python's venv module (assuming you already have access to Python 3.10 on your system):
python3.10 -m venv .venv
. .venv/bin/activate
Finally, install the required packages as follows:
pip install -r requirements-torch.txt --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
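After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch using only the standard library; the helper names `describe_environment` and `report` are hypothetical and not part of this repository, and the check only compares version strings rather than testing that CUDA actually works.

```python
import shutil
import sys
from importlib import metadata


def describe_environment():
    """Collect the versions relevant to the setup steps above."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Full path to nvcc if it is on PATH, otherwise None.
        "nvcc": shutil.which("nvcc"),
    }
    try:
        # Reads the installed package metadata without importing torch.
        info["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        info["torch"] = None
    return info


def report(info, expected_python="3.10", expected_torch="2.2.2"):
    """Return a list of warnings; the list is empty when everything matches."""
    warnings = []
    if info["python"] != expected_python:
        warnings.append(f"Python {info['python']} found, {expected_python} expected")
    if info["nvcc"] is None:
        warnings.append("nvcc not found on PATH")
    if info["torch"] is None:
        warnings.append("torch is not installed")
    elif not info["torch"].startswith(expected_torch):
        # A CUDA wheel may report a local suffix such as 2.2.2+cu118.
        warnings.append(f"torch {info['torch']} found, {expected_torch} expected")
    return warnings


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    for warning in report(env):
        print("warning:", warning)
```

A mismatch here does not necessarily mean the code will fail; these are simply the versions the project was trained and tested with.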
Download training data
RealEstate10K dataset
Our instructions for downloading the RealEstate10K dataset are based on the Behind The Scenes scripts.
First, download the video sequence metadata, including camera poses, from https://google.github.io/realestate10k/download.html and unpack it into data/ so that the folder layout is as follows:
data/RealEstate10K/train
data/RealEstate10K/test
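Since the download step takes a long time, it may be worth checking the layout before starting it. The sketch below verifies that both split folders exist and counts the entries in each; `check_layout` is a hypothetical helper, not part of this repository, and it makes no assumption about the names or format of the files inside each folder.

```python
from pathlib import Path


def check_layout(root="data/RealEstate10K", splits=("train", "test")):
    """Return {split: number of entries}; raise if a split folder is missing."""
    root = Path(root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"expected folder {split_dir}")
        # Count every entry directly inside the split folder.
        counts[split] = sum(1 for _ in split_dir.iterdir())
    return counts


if __name__ == "__main__":
    try:
        for split, count in check_layout().items():
            print(f"{split}: {count} entries")
    except FileNotFoundError as error:
        print("layout problem:", error)
```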
Next, download the training and test sets of the dataset with the following commands:
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m train
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m test
This step will take several days to complete. Finally, download additional data for the RealEstate10K dataset. In particular, we provide a pre-processed COLMAP cache containing sparse point clouds, which are used to estimate the scaling factor for depth predictions. The last two commands below filter out of the training and test sets any video sequences that are missing.
sh datasets/dowload_realestate10k_colmap.sh
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s train
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s test
Download and evaluate the pretrained model
We provide model weights that can be downloaded and evaluated on the RealEstate10K test set:
python -m misc.download_pretrained_models -o exp/re10k_v2
sh evaluate.sh exp/re10k_v2
Training
To train the model on the RealEstate10K dataset, run this command:
python train.py \
+experiment=layered_re10k \
model.depth.version=v1 \
train.logging=false
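The arguments after train.py are configuration overrides. Assuming they follow Hydra's override syntax, as the configs/hydra path suggests, a leading `+` adds a key that is not already present in the config, while a plain `key=value` changes an existing one. The sketch below shows one way to assemble the same command from Python, for example when scripting several runs; `build_train_command` is a hypothetical helper, not part of this repository.

```python
import shlex


def build_train_command(overrides, script="train.py"):
    """Compose a `python train.py key=value ...` argument list."""
    return ["python", script, *overrides]


# Overrides copied from the training command above.
overrides = [
    "+experiment=layered_re10k",
    "model.depth.version=v1",
    "train.logging=false",
]
command = build_train_command(overrides)

# Print the command as a single shell-safe line for inspection.
print(shlex.join(command))
# To launch it, pass the list to subprocess.run(command, check=True).
```

Keeping the overrides in a list makes it easy to vary a single setting between runs without editing the shell command by hand.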
To train on multiple GPUs, run this command:
sh train.sh
You can modify the cluster information in configs/hydra/cluster.
BibTeX
@article{szymanowicz2024flash3d,
author = {Szymanowicz, Stanislaw and Insafutdinov, Eldar and Zheng, Chuanxia and Campbell, Dylan and Henriques, Joao and Rupprecht, Christian and Vedaldi, Andrea},
title = {Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image},
journal = {arxiv},
year = {2024},
}