Sat3DGen

May 18, 2026

⭐ If you find this work interesting or useful, please give us a star! It helps others discover the project and motivates us to keep improving it.

📢 News

[May 15, 2025] 🎉 ArXiv paper is now publicly available!
[Apr 27, 2025] 🎉 Code, data, and model weights are now publicly available! Online demo is live on HuggingFace Spaces.
[Jan 29, 2025] Repository initialized.

Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry.

We introduce Sat3DGen, which addresses these challenges with a geometry-first methodology. It enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the primary sources of geometric error. This geometry-centric strategy substantially improves both 3D accuracy and photorealism. We demonstrate the versatility of the resulting 3D assets through diverse downstream applications, including semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image Digital Surface Model (DSM) estimation.

📝 About This Release

This repository contains the public release of Sat3DGen, including:

training
single-image inference
large-image slicing inference
DSM export and DSM evaluation
DSM preparation and alignment utilities

🎮 Online Demo

Try Sat3DGen directly in your browser on HuggingFace Spaces (no installation needed):

🤗 https://huggingface.co/spaces/qian43/Sat3DGen

Note: The online demo runs on CPU and may be slow. For faster inference, we recommend deploying locally with a GPU.

📚 Documentation

For different parts of the release, please refer to:

training: docs/training.md
inference: docs/inference.md
evaluation: docs/evaluation.md
dataset layout: docs/dataset_layout.md
released config notes: docs/config_notes.md

🔧 Installation

The released training, inference, and checkpoint-based evaluation paths assume a CUDA-enabled environment.

1. Create Environment

conda create -n sat3dgen python=3.10
conda activate sat3dgen

2. Install PyTorch

Please install a PyTorch version that matches your CUDA environment.

Example (for CUDA 12.4):

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
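
Once PyTorch is installed, a quick check can confirm that it sees the GPU before you run training or inference. The sketch below relies only on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`); the helper name `describe_torch` is hypothetical and not part of this repository.

```python
import importlib.util


def describe_torch() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }

    import torch

    return {
        "installed": True,
        "version": str(torch.__version__),
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the installed PyTorch build most likely does not match the local CUDA driver, and the CUDA-dependent scripts in this release are not expected to run.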

3. Install Dependencies

pip install -r requirements.txt

4. Model Weights

Our model is hosted on HuggingFace — no manual download required:

🤗 https://huggingface.co/qian43/Sat3DGen

All inference scripts default to loading from HuggingFace automatically. The first run downloads the weights (~1.5 GB) and caches them under ~/.cache/huggingface/hub/. Subsequent runs load directly from the cache without downloading again.
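
To see whether the weights are already cached, you can compute the expected cache folder yourself. This is a minimal sketch that assumes the standard Hugging Face Hub cache layout (`models--<owner>--<name>`) and the common `HF_HUB_CACHE` / `HF_HOME` overrides; the helper name `hub_cache_dir` is hypothetical, and other overrides such as `XDG_CACHE_HOME` are not handled.

```python
import os
from pathlib import Path


def hub_cache_dir(repo_id: str, env=None) -> Path:
    """Return the folder where the Hugging Face Hub caches a model repo."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        root = Path(env["HF_HUB_CACHE"])
    elif env.get("HF_HOME"):
        root = Path(env["HF_HOME"]) / "hub"
    else:
        root = Path.home() / ".cache" / "huggingface" / "hub"
    # Model repos are stored as models--<owner>--<name>.
    return root / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    cache = hub_cache_dir("qian43/Sat3DGen")
    status = "cached" if cache.is_dir() else "not downloaded yet"
    print(f"{cache} ({status})")
```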

Quick test (no local checkpoints needed):

from source.generator import Sat3DGen

Sat3DGen._skip_backbone_weights = True
model = Sat3DGen.from_pretrained("qian43/Sat3DGen")
model = model.to("cuda:0").eval()

Alternative: use local checkpoints or ModelScope

You can also download the weights manually and place them under ./checkpoints/:

checkpoints/
├── config.json
└── diffusion_pytorch_model.safetensors

Pass --model_path checkpoints to the inference scripts to use local weights.
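
Before pointing the scripts at a local folder, it can help to confirm that both files listed above are present. The following is a small sketch using only the standard library; the helper name `missing_checkpoint_files` is hypothetical, and it checks file presence only, not file integrity.

```python
from pathlib import Path

# File names taken from the checkpoints/ layout shown above.
REQUIRED_FILES = ("config.json", "diffusion_pytorch_model.safetensors")


def missing_checkpoint_files(model_path) -> list:
    """Return the required checkpoint files that are absent from model_path."""
    root = Path(model_path)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints")
    if missing:
        print("missing from checkpoints/:", ", ".join(missing))
    else:
        print("checkpoints/ contains both expected files")
```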

Weights are also available on ModelScope: https://modelscope.cn/models/xasDsacxsax/Sat3DGen

DINOv3 backbone (only needed for training)

For inference, DINOv3 backbone weights are already bundled in the released checkpoint — no extra download needed.

For training, the DINOv3 backbone weights are loaded separately. The default backbone is facebook/dinov3-vitl16-pretrain-sat493m.

Download: request access and download from HuggingFace: facebook/dinov3-vitl16-pretrain-sat493m

The code looks up weights in this order:

Environment variable SAT3DGEN_DINOV3_SAT_PATH (or SAT3DGEN_DINOV3_LVD_PATH).
Local Hugging Face cache: ~/.cache/huggingface/hub/...
Hugging Face Hub: facebook/dinov3-vitl16-pretrain-sat493m.

If you have HuggingFace access to the backbone, the first training run downloads the weights automatically. Otherwise, download them manually and set the environment variable:

export SAT3DGEN_DINOV3_SAT_PATH=/path/to/your/dinov3-vitl16-pretrain-sat493m
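
The lookup order above can be illustrated with a short sketch. This is not the repository's loader: the function name `resolve_backbone` and its return format are hypothetical, only the `SAT3DGEN_DINOV3_SAT_PATH` variable is handled, and the cache folder name assumes the standard Hugging Face Hub layout.

```python
import os
from pathlib import Path

HUB_ID = "facebook/dinov3-vitl16-pretrain-sat493m"


def resolve_backbone(env=None, cache_root=None) -> tuple:
    """Return (source, location) describing where the backbone would be found."""
    env = os.environ if env is None else env

    # 1. An explicit path in the environment variable wins.
    override = env.get("SAT3DGEN_DINOV3_SAT_PATH")
    if override:
        return ("env", override)

    # 2. Otherwise look for a previously downloaded copy in the local cache.
    if cache_root is None:
        cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    cached = Path(cache_root) / ("models--" + HUB_ID.replace("/", "--"))
    if cached.is_dir():
        return ("cache", str(cached))

    # 3. Fall back to downloading from the Hugging Face Hub.
    return ("hub", HUB_ID)


if __name__ == "__main__":
    source, location = resolve_backbone()
    print(f"backbone would be loaded from {source}: {location}")
```

Running it before the first training job shows which of the three sources applies on your machine, which is useful on clusters without outbound network access.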

📦 Dataset Preparation

Our experiments in the released code are based on VIGOR. Please download the original VIGOR dataset first.

The original VIGOR release provides the satellite and panorama RGB images. We release the project-specific supplements on HuggingFace (available now):

🤗 https://huggingface.co/datasets/qian43/VIGOR_SAT3DGEN_add_skymask_DSM_satdepth

The supplement includes:

sat_depth/
pano_sky_mask/
Seattle_DSM/
training split .txt files
test split .txt files

Important note:

Seattle_DSM/ should be placed at the same level as the city folders such as Seattle/, not inside Seattle/.

For the expected folder organization, please see docs/dataset_layout.md.
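
Because a misplaced Seattle_DSM/ folder is an easy mistake, a quick check like the one below may save a failed run. It is a sketch that verifies only the placement rule stated above; the helper name `check_dsm_placement` is hypothetical, and the rest of the layout is described in docs/dataset_layout.md rather than checked here.

```python
from pathlib import Path


def check_dsm_placement(vigor_root) -> str:
    """Classify where Seattle_DSM/ sits relative to the VIGOR root."""
    root = Path(vigor_root)
    if (root / "Seattle_DSM").is_dir():
        return "ok"  # sibling of Seattle/, as required
    if (root / "Seattle" / "Seattle_DSM").is_dir():
        return "misplaced"  # nested inside Seattle/, must be moved up one level
    return "missing"


if __name__ == "__main__":
    # "data/vigor" is the root used in the command-line demo; adjust to your own VIGOR root.
    print(check_dsm_placement("data/vigor"))
```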

🌐 Gradio Web Demo

We provide an interactive Gradio web demo for users who prefer a graphical interface (no command line, no notebook required):

python app.py

Then open http://localhost:7860 in your browser.

The web demo provides two tabs:

3D Mesh Generation: upload a satellite image and download the reconstructed .obj mesh, with an in-browser 3D preview.
Video Rendering: select or upload a satellite image and a sky panorama, then render a walkthrough video along a pre-generated trajectory.

Optional environment variables:

GRADIO_SERVER_PORT — change the listening port (default: 7860).

🚀 Command-Line Demo

After installing dependencies (model weights are fetched from HuggingFace on the first run, or loaded from a local checkpoint), you can run the end-to-end demo on a single satellite image:

bash inference.sh data/vigor/Seattle/satellite/<your_image>.png 0

The script will:

Generate a 3D triplane representation and export a textured mesh.
If no trajectory is found at results/demo/<image_stem>/trajectory.csv, pause and ask you to draw one interactively in inference/make_trajectory.ipynb (open it in VSCode and run all cells; on first use, install the notebook widget backend with pip install ipympl).
Render panorama + 4-direction perspective views along the trajectory.
Render a 3D mesh orbit video.
Compose a final demo video.

Outputs are saved under results/demo/<image_stem>/:

input_sat.png            # input satellite image
trajectory.csv           # trajectory used for rendering
trajectory.png           # trajectory visualization
mesh.obj                 # extracted 3D mesh
trajectory_video.mp4     # satellite + moving camera marker
mesh_orbit_video.mp4     # orbiting view of the 3D mesh
panorama_video.mp4       # panorama rendering along trajectory
streetview_video.mp4     # 4 perspective views along trajectory
demo_video.mp4           # composed final video
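
To confirm that a demo run finished completely, you can compare the output folder against the list above. This sketch assumes that <image_stem> is the input file name without its extension; the helper names `demo_output_dir` and `missing_outputs` are hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the output listing above.
EXPECTED_OUTPUTS = (
    "input_sat.png",
    "trajectory.csv",
    "trajectory.png",
    "mesh.obj",
    "trajectory_video.mp4",
    "mesh_orbit_video.mp4",
    "panorama_video.mp4",
    "streetview_video.mp4",
    "demo_video.mp4",
)


def demo_output_dir(image_path, results_root="results/demo") -> Path:
    """Map an input image to its output folder (assumes stem = name without extension)."""
    return Path(results_root) / Path(image_path).stem


def missing_outputs(image_path, results_root="results/demo") -> list:
    """Return the expected output files that are not present yet."""
    out_dir = demo_output_dir(image_path, results_root)
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical input path, used only to illustrate the call.
    image = "data/vigor/Seattle/satellite/example.png"
    print("output folder:", demo_output_dir(image))
    print("still missing:", missing_outputs(image))
```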

For more inference options (single-image inference, large-image slicing, DSM visualization, etc.), see docs/inference.md.

📂 Code Structure

Sat3DGen_clean/
├── configs/
│   └── dino_v3_large_sat_0906.json
├── demo_train.sh
├── train.py
├── app.py                          # interactive Gradio web demo
├── inference.sh                    # one-click end-to-end CLI demo
├── inference/
│   ├── demo_inference.py           # demo pipeline
│   ├── make_trajectory.py          # CLI trajectory tool (needs display)
│   ├── make_trajectory.ipynb       # notebook trajectory tool (no display)
│   ├── single_image_inference.py
│   ├── big_image_slice_inference.py
│   ├── evaluate_img_metrics.py
│   └── visualize_dsm.py
├── DSM_processing/
│   ├── calculate_DSM_metric2.py
│   └── processing_DSM_pair_from_txt.py
├── metrics/
├── my_datasets/
├── requirements.txt
├── src/
├── source/
├── docs/
│   ├── config_notes.md
│   ├── dataset_layout.md
│   ├── evaluation.md
│   ├── inference.md
│   └── training.md
├── LICENSE
└── CONTRIBUTING.md

📌 Notes

This public release does not ship a data/vigor directory. Users should prepare their own VIGOR root.
Model weights are released on HuggingFace and ModelScope; VIGOR supplements are available on HuggingFace Datasets.
The current evaluation pipeline does not include the DINO metric.

📖 Citation

@inproceedings{qian2026satdgen,
    title={Sat3{DG}en: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image},
    author={Ming Qian and Zimin Xia and Changkun Liu and Shuailei Ma and Wen Wang and Zeran Ke and Bin Tan and Hang Zhang and Gui-Song Xia},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=E7JzkZCofa}
}

@ARTICLE{Qian_2026_Sat2Densitypp,
    author    = {Qian, Ming and Tan, Bin and Wang, Qiuyu and Zheng, Xianwei and Xiong, Hanjiang and Xia, Gui-Song and Shen, Yujun and Xue, Nan},
    title     = {Seeing Through Satellite Images at Street Views},
    journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year      = {2026},
    volume    = {48},
    number    = {5},
    pages     = {5692-5709},
    doi       = {10.1109/TPAMI.2026.3652860}
}

@InProceedings{Qian_2023_Sat2Density,
    author    = {Qian, Ming and Xiong, Jincheng and Xia, Gui-Song and Xue, Nan},
    title     = {Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {3683-3692}
}

🙏 Acknowledgements

This work builds on a number of excellent projects and prior efforts, including:

We also thank our collaborators and colleagues for their discussions and feedback during the project.