October 20, 2025
MMGeo: Multimodal Compositional Geo-Localization for UAVs
- Part I: Code Released
- Part II: Dataset
Dataset Access
Coming Soon.
Quick Start
cd MMGeo
pip install -r requirements.txt
First, initialize the visual model with Game4Loc training:
# GTA-UAV-MM cross-area setting
python train_gta.py \
--data_root <The directory of the GTA-UAV-MM dataset> \
--train_pairs_meta_file "cross-area-drone2sate-train.json" \
--test_pairs_meta_file "cross-area-drone2sate-test.json" \
--model "vit_base_patch16_rope_reg1_gap_256.sbb_in1k" \
--gpu_ids 0,1 --lr 0.0001 --batch_size 64 \
--with_weight --k 5 --epoch 5
Then run the multimodal training with the text modality:
# with text
python train_gta_mm.py \
--data_root <The directory of the GTA-UAV-MM dataset> \
--train_pairs_meta_file "cross-area-drone2sate-train.json" \
--test_pairs_meta_file "cross-area-drone2sate-test.json" \
--model "vit_base_patch16_rope_reg1_gap_256.sbb_in1k" \
--checkpoint_start <pretrained visual model .pth> \
--with_text --token_length 50 \
--gpu_ids 0,1 --lr 0.0001 --batch_size 64 \
--with_weight --k 5 --epoch 5
To train with the point-cloud modality, pass --with_pc; to train with depth, pass --with_depth.
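As a rough sketch of how the point-cloud or depth variant might be launched, the helper below assembles the `train_gta_mm.py` command with a single modality flag and prints it for review. It assumes the modality flag simply takes the place of `--with_text --token_length 50` and that every other option is unchanged, which this README does not confirm. `DATA_ROOT`, `CKPT`, and the function `build_mm_cmd` are hypothetical names used only for illustration.

```shell
# Hypothetical placeholders: point these at your dataset and pretrained checkpoint.
DATA_ROOT="${DATA_ROOT:-/path/to/GTA-UAV-MM}"
CKPT="${CKPT:-/path/to/visual_model.pth}"

# Print (do not run) the multimodal training command for one modality flag.
# $1 is a flag named in this README: --with_pc or --with_depth.
# Assumption: the flag replaces "--with_text --token_length 50".
build_mm_cmd() {
  echo python train_gta_mm.py \
    --data_root "$DATA_ROOT" \
    --train_pairs_meta_file "cross-area-drone2sate-train.json" \
    --test_pairs_meta_file "cross-area-drone2sate-test.json" \
    --model "vit_base_patch16_rope_reg1_gap_256.sbb_in1k" \
    --checkpoint_start "$CKPT" \
    "$1" \
    --gpu_ids 0,1 --lr 0.0001 --batch_size 64 \
    --with_weight --k 5 --epoch 5
}

# Dry run: show the point-cloud command.
build_mm_cmd --with_pc
```

Once the printed command looks right, copy it into the terminal to start training, or call `build_mm_cmd --with_depth` to see the depth variant.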
Citation
Coming Soon.