Generative Spatial Transformer (GST)
August 9, 2025
PyTorch implementation of GST from "Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction".
News
- 2025-2: Code is released.
Installation
- Environment setting
conda create -n gst python=3.8
conda activate gst
pip install -r requirements.txt
- Model weight download
We provide checkpoints for the image tokenizer, the camera tokenizer, and the auto-regressive model. Please download the following three checkpoints and place them in the ./ckpts folder.
image-16.pt # Adopted from LlamaGen
camera-4.pt
gst.pt
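Before running inference, it can help to confirm that all three files are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and only the file names and the ./ckpts folder come from the list above.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is illustrative
# and is not provided by the GST repository.
EXPECTED_CKPTS = ("image-16.pt", "camera-4.pt", "gst.pt")


def missing_checkpoints(ckpt_dir="./ckpts"):
    """Return the expected checkpoint names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_CKPTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found in ./ckpts")
```

An empty return value means all three checkpoints were found.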
Inference
GST models the joint distribution of images and their corresponding camera perspectives.
Use the following command to sample --num-sample perspectives and images conditioned on the observation given by --image-path.
python run_sample_camera_image.py \
--image-ckpt /path/to/image-16.pt \
--gpt-ckpt /path/to/gst.pt \
--camera-ckpt /path/to/camera-4.pt \
--image-path assets/hydrant.jpg \
--num-sample 16
More optional parameters can be found in the script run_sample_camera_image.py.
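To sample from several observations in one go, the command above can be driven from Python. This is a sketch under stated assumptions: `build_sample_command` and `run_sampling` are hypothetical helpers, only the five flags documented above are used, and the script is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_sample_command(image_ckpt, gpt_ckpt, camera_ckpt, image_path, num_sample=16):
    """Assemble the argument list for run_sample_camera_image.py.

    Only the flags shown in this README are used; any other option of the
    script is left at its default.
    """
    return [
        sys.executable, "run_sample_camera_image.py",
        "--image-ckpt", str(image_ckpt),
        "--gpt-ckpt", str(gpt_ckpt),
        "--camera-ckpt", str(camera_ckpt),
        "--image-path", str(image_path),
        "--num-sample", str(num_sample),
    ]


def run_sampling(**kwargs):
    """Run the sampling script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_sample_command(**kwargs), check=True)
```

For example, `run_sampling(image_ckpt="ckpts/image-16.pt", gpt_ckpt="ckpts/gst.pt", camera_ckpt="ckpts/camera-4.pt", image_path="assets/hydrant.jpg")` is equivalent to the shell command above with the checkpoints stored in ./ckpts.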
After sampling, the results will be saved in the folder sample.
The folder structure is as follows:
sample
├── camera.ply # 3D positions and orientations of the sampled perspectives
├── images.obj # Images corresponding to each perspective
│
├── material_0.png # Texture
├── material_1.png
├── ...
├── material.mtl # Texture mapping for the 3D files
│
├── sample_0.png # Sampled image
├── sample_0.npy # Camera matrix converted from the sampled camera
├── sample_1.png
├── sample_1.npy
└── ...
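Each sampled image sample_i.png is paired with a camera file sample_i.npy. The sketch below collects these pairs in numeric order; the helper name `list_sample_pairs` is hypothetical, and only the file-naming pattern is taken from the listing above.

```python
import re
from pathlib import Path


def list_sample_pairs(sample_dir="sample"):
    """Return (index, png_path, npy_path) tuples for samples that have both files."""
    root = Path(sample_dir)
    pairs = []
    for png in root.glob("sample_*.png"):
        match = re.fullmatch(r"sample_(\d+)", png.stem)
        if match is None:
            continue  # skip files that only resemble the naming pattern
        npy = png.with_suffix(".npy")
        if npy.is_file():
            pairs.append((int(match.group(1)), png, npy))
    # Sort numerically so that sample_10 follows sample_9 rather than sample_1.
    return sorted(pairs, key=lambda item: item[0])
```

Each .npy file can then be read with `numpy.load`. The README does not state the shape or convention of the stored camera matrix, so it is worth inspecting `.shape` before using it.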
GST uses the RDF coordinate system:
the positive x-axis points to the right (R),
the positive y-axis points downward (D),
and the positive z-axis points forward (F).
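RDF is the same axis layout as the OpenCV camera convention, whereas OpenGL-style viewers typically expect right, up, back (RUB). Converting a point or direction between the two only requires negating y and z. The function below is an illustrative sketch; `rdf_to_rub` is a hypothetical name and is not part of the repository.

```python
def rdf_to_rub(point):
    """Convert an (x, y, z) point from RDF (right, down, forward)
    to RUB (right, up, back)."""
    x, y, z = point
    # x is shared by both conventions; y and z point in opposite directions.
    return (x, -y, -z)
```

Because two axes are flipped at once, the conversion preserves handedness, and applying it twice returns the original point.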
The sampled .ply and .obj files can be opened in MeshLab or other 3D software.
License
The majority of this project is licensed under the MIT License. Portions of the project are available under the separate licenses of the referenced projects, as detailed in the corresponding files.
Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@article{chen2024and,
title={Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction},
author={Chen, Junyi and Huang, Di and Ye, Weicai and Ouyang, Wanli and He, Tong},
journal={arXiv preprint arXiv:2410.18962},
year={2024}
}
Acknowledgement
We would like to express our gratitude to the contributors of the codebase provided by LlamaGen, which served as the foundation for our work. Special thanks are extended to the pioneering contributions of Zero123, ZeroNVS and RayDiffusion within the field, which have enriched our understanding and inspired our endeavors.