Sat3DGen
May 18, 2026
[ICLR 2026] Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Ming Qian, Zimin Xia, Changkun Liu, Shuailei Ma, Wen Wang, Zeran Ke, Bin Tan, Hang Zhang, Gui-Song Xia
https://github.com/user-attachments/assets/4efaf089-9cdc-4663-ab2a-1ac128d44454
⭐ If you find this work interesting or useful, please give us a star! It helps others discover the project and motivates us to keep improving it.
📢 News
- [May 15, 2025] The paper is now publicly available on arXiv.
- [Apr 27, 2025] Code, data, and model weights are now publicly available, and the online demo is live on HuggingFace Spaces.
- [Jan 29, 2025] Repository initialized.
Abstract
Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry.
We introduce Sat3DGen to address these fundamental challenges, embodying a geometry-first methodology. This methodology enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the primary sources of geometric error. This geometry-centric strategy yields a dramatic leap in both 3D accuracy and photorealism. We demonstrate the versatility of our high-quality 3D assets through diverse downstream applications, including semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image Digital Surface Model (DSM) estimation.
About This Release
This repository contains the public release of Sat3DGen, including:
- training
- single-image inference
- large-image slicing inference
- DSM export and DSM evaluation
- DSM preparation and alignment utilities
🎮 Online Demo
Try Sat3DGen directly in your browser on HuggingFace Spaces (no installation needed):
🤗 https://huggingface.co/spaces/qian43/Sat3DGen
Note: The online demo runs on CPU and may be slow. For faster inference, we recommend deploying locally with a GPU.
Documentation
For different parts of the release, please refer to:
- training: docs/training.md
- inference: docs/inference.md
- evaluation: docs/evaluation.md
- dataset layout: docs/dataset_layout.md
- released config notes: docs/config_notes.md
🔧 Installation
The released training, inference, and checkpoint-based evaluation paths assume a CUDA-enabled environment.
1. Create Environment
conda create -n sat3dgen python=3.10
conda activate sat3dgen
2. Install PyTorch
Please install a PyTorch version that matches your CUDA environment.
Example:
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
3. Install Dependencies
pip install -r requirements.txt
4. Model Weights
Our model is hosted on HuggingFace, so no manual download is required:
🤗 https://huggingface.co/qian43/Sat3DGen
All inference scripts default to loading from HuggingFace automatically.
The first run downloads the weights (~1.5 GB) and caches them under
~/.cache/huggingface/hub/. Subsequent runs load the weights from this cache without downloading them again.
Quick test (no local checkpoints needed):
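from source.generator import Sat3DGen

# Skip loading the separate DINOv3 backbone weights; the released checkpoint already bundles them.
Sat3DGen._skip_backbone_weights = True
# Downloads from the HuggingFace Hub on the first call, then reuses the local cache.
model = Sat3DGen.from_pretrained("qian43/Sat3DGen")
model = model.to("cuda:0").eval()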
Alternative: use local checkpoints or ModelScope
You can also download the weights manually and place them under ./checkpoints/:
checkpoints/
├── config.json
└── diffusion_pytorch_model.safetensors
Pass --model_path checkpoints to the inference scripts to use local weights.
Weights are also available on ModelScope: https://modelscope.cn/models/xasDsacxsax/Sat3DGen
DINOv3 backbone (only needed for training)
For inference, the DINOv3 backbone weights are already bundled in the released checkpoint, so no extra download is needed.
For training, the DINOv3 backbone weights are loaded separately. The
default backbone is facebook/dinov3-vitl16-pretrain-sat493m.
Download: request access on HuggingFace, then download facebook/dinov3-vitl16-pretrain-sat493m.
The code looks up weights in this order:
1. Environment variable SAT3DGEN_DINOV3_SAT_PATH (or SAT3DGEN_DINOV3_LVD_PATH).
2. Local Hugging Face cache: ~/.cache/huggingface/hub/...
3. Hugging Face Hub: facebook/dinov3-vitl16-pretrain-sat493m.
If you have been granted access on HuggingFace, the first training run downloads the weights automatically. Otherwise, download them manually and set the environment variable:
export SAT3DGEN_DINOV3_SAT_PATH=/path/to/your/dinov3-vitl16-pretrain-sat493m
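The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the repository's loader: the function name resolve_dinov3_source and its (source, location) return value are hypothetical, checking SAT3DGEN_DINOV3_SAT_PATH before SAT3DGEN_DINOV3_LVD_PATH is an assumption, and the cache check relies on huggingface_hub's default models--<org>--<name> folder naming under ~/.cache/huggingface/hub/.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

# Hypothetical names for illustration; not taken from the Sat3DGen source.
DEFAULT_REPO_ID = "facebook/dinov3-vitl16-pretrain-sat493m"
# Assumption: the SAT variable takes precedence over the LVD variable.
ENV_VARS = ("SAT3DGEN_DINOV3_SAT_PATH", "SAT3DGEN_DINOV3_LVD_PATH")


def resolve_dinov3_source(
    env: Optional[Mapping[str, str]] = None,
    cache_root: Optional[Path] = None,
    repo_id: str = DEFAULT_REPO_ID,
) -> Tuple[str, str]:
    """Return (source, location) following the documented lookup order."""
    env = os.environ if env is None else env

    # 1. An explicitly set, non-empty environment variable wins.
    for var in ENV_VARS:
        value = env.get(var)
        if value:
            return "env", value

    # 2. A local Hugging Face cache entry; huggingface_hub stores model
    #    snapshots under <cache_root>/models--<org>--<name> by default.
    if cache_root is None:
        cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    cached = cache_root / ("models--" + repo_id.replace("/", "--"))
    if cached.is_dir():
        return "cache", str(cached)

    # 3. Otherwise fall back to downloading from the Hugging Face Hub.
    return "hub", repo_id
```

For example, with SAT3DGEN_DINOV3_SAT_PATH exported as shown above, resolve_dinov3_source() returns ("env", "/path/to/your/dinov3-vitl16-pretrain-sat493m"). Passing env and cache_root explicitly keeps the sketch easy to test without touching the real environment.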
📦 Dataset Preparation
Our experiments in the released code are based on VIGOR. Please download the original VIGOR dataset first.
The original VIGOR release provides the satellite and panorama RGB images. We release the project-specific supplements on HuggingFace (available now):
🤗 https://huggingface.co/datasets/qian43/VIGOR_SAT3DGEN_add_skymask_DSM_satdepth
The supplement includes:
- sat_depth/
- pano_sky_mask/
- Seattle_DSM/
- training split .txt files
- test split .txt files
Important note:
Seattle_DSM/ should be placed at the same level as the city folders (such as Seattle/), not inside Seattle/.
For the expected folder organization, please see docs/dataset_layout.md.
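Because misplacing Seattle_DSM/ is an easy mistake, a quick check of the VIGOR root can help. The sketch below is a hypothetical helper (check_dsm_placement is not part of the release) that only tests the placement rule stated above; the full expected layout is described in docs/dataset_layout.md.

```python
from pathlib import Path
from typing import List


def check_dsm_placement(vigor_root: str, city: str = "Seattle") -> List[str]:
    """Return layout problems for <city>_DSM/ (an empty list means it looks correct).

    Hypothetical helper: it checks only that <city>_DSM/ sits next to <city>/
    rather than inside it. The released supplement ships Seattle_DSM/ only.
    """
    root = Path(vigor_root)
    problems = []
    if not (root / city).is_dir():
        problems.append(f"missing city folder: {root / city}")
    if not (root / f"{city}_DSM").is_dir():
        problems.append(f"missing DSM folder at the city level: {root / f'{city}_DSM'}")
    nested = root / city / f"{city}_DSM"
    if nested.exists():
        problems.append(f"DSM folder is nested inside the city folder: {nested}")
    return problems
```

For example, check_dsm_placement("/path/to/VIGOR") should return an empty list once Seattle/ and Seattle_DSM/ are siblings under the same root.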
Gradio Web Demo
We provide an interactive Gradio web demo for users who prefer a graphical interface (no command line, no notebook required):
python app.py
Then open http://localhost:7860 in your browser.
The web demo provides two tabs:
- 3D Mesh Generation: upload a satellite image and download the reconstructed .obj mesh, with an in-browser 3D preview.
- Video Rendering: select or upload a satellite image and a sky panorama, then render a walkthrough video along a pre-generated trajectory.
Optional environment variables:
- GRADIO_SERVER_PORT: change the listening port (default: 7860).
Command-Line Demo
After installing the dependencies (the checkpoint is downloaded from HuggingFace automatically on first use), you can run the end-to-end demo on a single satellite image:
bash inference.sh data/vigor/Seattle/satellite/<your_image>.png 0
The script will:
- Generate a 3D triplane representation and export a textured mesh.
- If no trajectory is found at results/demo/<image_stem>/trajectory.csv, it will pause and ask you to draw one interactively in inference/make_trajectory.ipynb (open it in VS Code and run all cells; on first use, install ipympl with pip install ipympl).
- Render panorama and 4-direction perspective views along the trajectory.
- Render a 3D mesh orbit video.
- Compose a final demo video.
Outputs are saved under results/demo/<image_stem>/:
input_sat.png # input satellite image
trajectory.csv # trajectory used for rendering
trajectory.png # trajectory visualization
mesh.obj # extracted 3D mesh
trajectory_video.mp4 # satellite + moving camera marker
mesh_orbit_video.mp4 # orbiting view of the 3D mesh
panorama_video.mp4 # panorama rendering along trajectory
streetview_video.mp4 # 4 perspective views along trajectory
demo_video.mp4 # composed final video
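To see where a run will write its results, or whether inference.sh will stop to ask for a trajectory, the rules above can be mirrored in a few lines of Python. This is a hypothetical helper, not part of the release: it assumes the default results/demo root and the file names listed above, and it simply reproduces the stated rule that the script pauses when trajectory.csv is missing.

```python
from pathlib import Path
from typing import Dict

# Output file names listed in this README for results/demo/<image_stem>/.
DEMO_OUTPUTS = (
    "input_sat.png",
    "trajectory.csv",
    "trajectory.png",
    "mesh.obj",
    "trajectory_video.mp4",
    "mesh_orbit_video.mp4",
    "panorama_video.mp4",
    "streetview_video.mp4",
    "demo_video.mp4",
)


def demo_output_dir(image_path: str, results_root: str = "results/demo") -> Path:
    """Output folder for an input image: <results_root>/<image_stem>/."""
    return Path(results_root) / Path(image_path).stem


def demo_output_status(image_path: str, results_root: str = "results/demo") -> Dict[str, bool]:
    """Map each expected output file name to whether it already exists."""
    out_dir = demo_output_dir(image_path, results_root)
    return {name: (out_dir / name).is_file() for name in DEMO_OUTPUTS}


def needs_trajectory(image_path: str, results_root: str = "results/demo") -> bool:
    """True if trajectory.csv is missing, i.e. the demo would pause for a drawn trajectory."""
    return not (demo_output_dir(image_path, results_root) / "trajectory.csv").is_file()
```

For an input such as data/vigor/Seattle/satellite/example.png (a placeholder name), demo_output_dir returns results/demo/example.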
For more inference options (single-image inference, large-image slicing, DSM visualization, etc.), see docs/inference.md.
Code Structure
Sat3DGen_clean/
├── configs/
│   └── dino_v3_large_sat_0906.json
├── demo_train.sh
├── train.py
├── app.py # interactive Gradio web demo
├── inference.sh # one-click end-to-end CLI demo
├── inference/
│   ├── demo_inference.py # demo pipeline
│   ├── make_trajectory.py # CLI trajectory tool (needs display)
│   ├── make_trajectory.ipynb # notebook trajectory tool (no display)
│   ├── single_image_inference.py
│   ├── big_image_slice_inference.py
│   ├── evaluate_img_metrics.py
│   └── visualize_dsm.py
├── DSM_processing/
│   ├── calculate_DSM_metric2.py
│   └── processing_DSM_pair_from_txt.py
├── metrics/
├── my_datasets/
├── requirements.txt
├── src/
├── source/
├── docs/
│   ├── config_notes.md
│   ├── dataset_layout.md
│   ├── evaluation.md
│   ├── inference.md
│   └── training.md
├── LICENSE
└── CONTRIBUTING.md
Notes
- This public release does not ship a data/vigor directory. Users should prepare their own VIGOR root.
- Model weights are released on HuggingFace and ModelScope; VIGOR supplements are available on HuggingFace Datasets.
- The current evaluation pipeline does not include the DINO metric.
Contributing
We welcome contributions of any kind, including bug fixes, new features, documentation improvements, and benchmark extensions. See CONTRIBUTING.md for details.
License
This project is released under the MIT License.
Citation
If our work helps your research, please cite:
@inproceedings{
qian2026satdgen,
title={Sat3{DG}en: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image},
author={Ming Qian and Zimin Xia and Changkun Liu and Shuailei Ma and Wen Wang and Zeran Ke and Bin Tan and Hang Zhang and Gui-Song Xia},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=E7JzkZCofa}
}
@ARTICLE{Qian_2026_Sat2Densitypp,
author={Qian, Ming and Tan, Bin and Wang, Qiuyu and Zheng, Xianwei and Xiong, Hanjiang and Xia, Gui-Song and Shen, Yujun and Xue, Nan},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Seeing Through Satellite Images at Street Views},
year={2026},
volume={48},
number={5},
pages={5692-5709},
doi={10.1109/TPAMI.2026.3652860}}
@InProceedings{Qian_2023_Sat2Density,
author = {Qian, Ming and Xiong, Jincheng and Xia, Gui-Song and Xue, Nan},
title = {Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {3683-3692}
}
Acknowledgements
This work builds on a number of excellent projects and prior efforts, including:
We also thank our collaborators and colleagues for their discussions and feedback during the project.