NMT: Normalized Matching Transformer (arXiv)
March 25, 2025
Normalized Matching Transformer (NMT) is an end-to-end deep learning pipeline for sparse keypoint correspondence. It combines a Swin Transformer backbone, a SplineCNN for geometry-aware keypoint refinement, and a normalized transformer decoder with Sinkhorn matching. It is trained with contrastive and hyperspherical losses and achieves state-of-the-art results.
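The matching step can be illustrated with a generic sketch. This is not NMT's actual code. Keypoint descriptors are L2-normalized onto the unit hypersphere and scored pairwise by cosine similarity. The score matrix is then turned into a soft assignment by log-domain Sinkhorn normalization, which alternates row and column normalization. The function names, the temperature `tau`, and the iteration count are illustrative assumptions. Plain Python lists stand in for the tensors the real pipeline uses.

```python
import math


def l2_normalize(vec):
    """Project a feature vector onto the unit hypersphere (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_scores(src, tgt):
    """Pairwise cosine similarities between source and target keypoint descriptors."""
    src_n = [l2_normalize(v) for v in src]
    tgt_n = [l2_normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_n] for s in src_n]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, tau=0.1, iters=100):
    """Log-domain Sinkhorn: alternately normalize rows and columns of exp(scores / tau)."""
    log_p = [[s / tau for s in row] for row in scores]
    n_rows, n_cols = len(log_p), len(log_p[0])
    for _ in range(iters):
        # Row step: make every row sum to 1.
        for i in range(n_rows):
            z = logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column step: make every column sum to 1.
        for j in range(n_cols):
            z = logsumexp([log_p[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


def match(src, tgt, tau=0.1, iters=100):
    """Soft-assign with Sinkhorn, then pick the best target for each source keypoint."""
    p = sinkhorn(cosine_scores(src, tgt), tau, iters)
    return [max(range(len(row)), key=row.__getitem__) for row in p]


src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.0, 2.0], [1.0, 1.1], [3.0, 0.0]]
print(match(src, tgt))  # → [2, 0, 1]
```

Working in log space avoids overflow when `tau` is small. A final argmax per row, or a Hungarian solver, turns the soft assignment into discrete matches.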

Results
PascalVOC

SPair-71k

Requirements
We use CUDA 12.4 and GCC 11.4.0. All required packages and libraries are listed in environment.yml.
Download datasets
Run the download_data.sh script.
Backbone weights
We use SwinV2 as our backbone. Download the SwinV2-L* weights, which were pretrained on ImageNet-22K and fine-tuned on ImageNet-1K (SwinV2).
Place the weights in ./utils/checkpoints/.
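A quick pre-flight check for the backbone weights can save a failed run. The helper below is hypothetical and not part of the repository. The README does not state the checkpoint's file name, so the helper only confirms that ./utils/checkpoints/ exists and contains at least one file.

```python
from pathlib import Path


def find_checkpoints(ckpt_dir="./utils/checkpoints/"):
    """Return the file names in the checkpoint folder, or raise if it is missing or empty.

    Hypothetical helper: the README does not name the weight file, so any file counts.
    """
    path = Path(ckpt_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"checkpoint folder not found: {path}")
    files = sorted(p.name for p in path.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files in {path}; download the SwinV2-L* weights first")
    return files
```

For example, call `find_checkpoints()` from the repository root before launching training.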
Installation
Conda environment
- Enter the path of your conda environment folder in the last line of the environment.yml file.
- Run the command:
conda env create -f environment.yml
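The first installation step can also be scripted. This sketch assumes the last line of environment.yml is a `prefix:` entry. That is usual for conda environment files, but the README does not state it. `set_env_prefix` is a hypothetical helper.

```python
from pathlib import Path


def set_env_prefix(yml_path, env_dir):
    """Point the `prefix:` entry on the last line of environment.yml at env_dir.

    Assumption: the last non-blank line is the conda `prefix:` key.
    """
    path = Path(yml_path)
    lines = path.read_text().splitlines()
    # Ignore trailing blank lines so we edit the real last entry.
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines or not lines[-1].startswith("prefix:"):
        raise ValueError("expected the last line of environment.yml to be a 'prefix:' entry")
    lines[-1] = f"prefix: {env_dir}"
    path.write_text("\n".join(lines) + "\n")
```

For example, run `set_env_prefix("environment.yml", "/home/<user>/miniconda3/envs/nmt")`, then `conda env create -f environment.yml`. The path is only an illustration.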
Usage
Parameters
All parameters are set in the JSON files in the experiments/ folder.
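To inspect which parameters an experiment uses, you can load the JSON files directly. This is a minimal sketch that assumes each file holds a single JSON object. It does not guess at key names, because the README does not document them. `list_experiments` and `load_experiment` are hypothetical helpers.

```python
import json
from pathlib import Path


def list_experiments(exp_dir="./experiments/"):
    """Names of the JSON experiment files, e.g. voc_basic.json and spair.json."""
    return sorted(p.name for p in Path(exp_dir).glob("*.json"))


def load_experiment(config_path):
    """Load one experiment file into a dict without assuming any particular keys."""
    with Path(config_path).open() as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} should contain a JSON object")
    return config
```

For example, `sorted(load_experiment("./experiments/voc_basic.json"))` lists the top-level parameter names of the PascalVOC experiment.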
Running Training / Evaluation
python -m torch.distributed.run --nproc_per_node=1 train_eval.py ./experiments/voc_basic.json
--nproc_per_node=1 sets how many GPUs the model runs on (for now, use 1 GPU to reproduce our results).
./experiments/voc_basic.json selects which parameters and dataset to use; the alternative is ./experiments/spair.json.
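You can wrap the launch command for scripting, for example to switch between the two configurations. This sketch reproduces the command above as an argument list. The `CONFIGS` mapping and the helper names are hypothetical.

```python
import subprocess
import sys

# Hypothetical short names for the two experiment files mentioned in the README.
CONFIGS = {
    "voc": "./experiments/voc_basic.json",
    "spair": "./experiments/spair.json",
}


def build_launch_command(dataset="voc", nproc_per_node=1):
    """Assemble the torch.distributed.run command from the README as an argv list."""
    if dataset not in CONFIGS:
        raise ValueError(f"unknown dataset {dataset!r}; choose from {sorted(CONFIGS)}")
    return [
        sys.executable, "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",  # keep 1 to reproduce the reported results
        "train_eval.py",
        CONFIGS[dataset],
    ]


def launch(dataset="voc", nproc_per_node=1):
    """Run training/evaluation; raises CalledProcessError if the run fails."""
    subprocess.run(build_launch_command(dataset, nproc_per_node), check=True)
```

`sys.executable` is used instead of a bare `python` so that the launcher runs under the interpreter of the active conda environment. Passing a list to `subprocess.run` also avoids shell quoting issues.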