VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics

March 9, 2026

arXiv | Project Page | Hugging Face Space

Daniel Cher*, Brian Wei*, Srikumar Sastry, Nathan Jacobs

(*Corresponding Author)

This repository is the official implementation of VectorSynth. VectorSynth is a suite of models for synthesizing satellite images with global style and text-driven layout control.

🤗 Models

VectorSynth: Hugging Face Model

VectorSynth-COSA: Hugging Face Model

🌍 Inference

from diffusers import StableDiffusionControlNetPipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained("MVRL/VectorSynth")

See inference.py for a complete example with hint processing.

🔬 COSA

To use COSA, see cosa/README.md.

🧑‍💻 Setup and Training

Create a conda environment:

conda env create -f environment.yaml
conda activate vectorsynth

The dataset can be downloaded from here. See dataset.md for generating your own data from OpenStreetMap.

See train.md for training details.

📑 Citation

@misc{cher2025vectorsynth,
  title={VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics},
  author={Cher, Daniel and Wei, Brian and Sastry, Srikumar and Jacobs, Nathan},
  year={2025},
  eprint={2511.07744},
  archivePrefix={arXiv},
  note={arXiv preprint}
}

Check out our lab website for other work on geospatial understanding and mapping:

  • Multi-Modal Vision Research Lab (MVRL) - Link
  • Related Works from MVRL - Link
  • See our previous work - Link