Flickr30k

AnyBox protocol

| Backbone | Pre-training Image Data | Val R@1 | Val R@5 | Val R@10 | Test R@1 | Test R@5 | Test R@10 | url | size |
|---|---|---|---|---|---|---|---|---|---|
| Resnet-101 | COCO+VG+Flickr | 82.5 | 92.9 | 94.9 | 83.4 | 93.5 | 95.3 | model | 3GB |
| EfficientNet-B3 | COCO+VG+Flickr | 82.9 | 93.2 | 95.2 | 84.0 | 93.8 | 95.6 | model | 2.4GB |
| EfficientNet-B5 | COCO+VG+Flickr | 83.6 | 93.4 | 95.1 | 84.3 | 93.9 | 95.8 | model | 2.7GB |

MergedBox protocol

| Backbone | Pre-training Image Data | Val R@1 | Val R@5 | Val R@10 | Test R@1 | Test R@5 | Test R@10 | url | size |
|---|---|---|---|---|---|---|---|---|---|
| Resnet-101 | COCO+VG+Flickr | 82.3 | 91.8 | 93.7 | 83.8 | 92.7 | 94.4 | model | 3GB |

Data preparation

The config for this dataset can be found in configs/flickr.json and is also shown below:

{
  "combine_datasets": ["flickr"],
  "combine_datasets_val": ["flickr"],
  "GT_type" : "separate",
  "flickr_img_path" : "",
  "flickr_dataset_path" : "" ,
  "flickr_ann_path" : "mdetr_annotations/"
}
  • Download the original Flickr30k image dataset from the Flickr30K webpage and update flickr_img_path to the folder containing the images.
  • Download the original Flickr30k Entities annotations from Flickr30k annotations and update flickr_dataset_path to the folder containing those annotations.
  • Download our pre-processed annotations, converted to COCO format (all datasets are in the same zip of MDETR annotations): Pre-processed annotations, and update flickr_ann_path to that folder. A sketch of these path updates is shown after this list.
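For reference, here is a minimal sketch of those config edits. The /path/to/... values are placeholders, not part of the repo; substitute wherever you actually extracted each download.

# Minimal sketch: fill in the dataset path fields of configs/flickr.json.
# The /path/to/... values are placeholders -- point them at the extracted
# Flickr30k images, the Flickr30k Entities annotations, and the MDETR
# pre-processed annotations respectively.
import json

CONFIG = "configs/flickr.json"

with open(CONFIG) as f:
    cfg = json.load(f)

cfg["flickr_img_path"] = "/path/to/flickr30k-images"
cfg["flickr_dataset_path"] = "/path/to/flickr30k_entities"
cfg["flickr_ann_path"] = "/path/to/mdetr_annotations"

with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)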

Script to reproduce results

Model weights (these can also be loaded directly from the URL; see the sketch after this list):

  1. pretrained_resnet101_checkpoint.pth
  2. flickr_merged_resnet101_checkpoint.pth
  3. pretrained_EB3_checkpoint.pth
  4. pretrained_EB5_checkpoint.pth
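
The evaluation commands below handle the download automatically when the URL is passed to --resume. If you want to fetch a checkpoint yourself, a minimal sketch using torch.hub is shown here; the "model"/"model_ema" key names are an assumption about the checkpoint layout, so inspect the keys if they differ.

# Minimal sketch: download a checkpoint from its Zenodo URL and pull out the weights.
# The "model" / "model_ema" keys are assumed, not guaranteed by this page.
import torch

url = "https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth"
checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")

print(list(checkpoint.keys()))  # inspect what the checkpoint actually contains
state_dict = checkpoint.get("model_ema", checkpoint.get("model", checkpoint))
# state_dict can then be loaded into an MDETR model built with the matching backbone.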

For results using the AnyBox protocol, the pre-trained models are directly evaluated on the val/test set.

The script to run the evaluation for the Resnet-101 backbone pre-trained model is given below. This command runs the evaluation on val; for test results, pass --test.

MDETR-Resnet101:

python run_with_submitit.py --dataset_config configs/flickr.json  --resume https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth  --ngpus 1 --nodes 2  --ema  --eval 

To run on a single node with 2 GPUs:

python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/flickr.json --resume https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth --ema --eval

MDETR-EB3:

python run_with_submitit.py --backbone "timm_tf_efficientnet_b3_ns" --dataset_config configs/flickr.json --resume https://zenodo.org/record/4721981/files/pretrained_EB3_checkpoint.pth  --ngpus 1 --nodes 2 --ema  --eval 

To run on a single node with 2 GPUs:

python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/flickr.json --backbone timm_tf_efficientnet_b3_ns --resume https://zenodo.org/record/4721981/files/pretrained_EB3_checkpoint.pth --ema --eval

MDETR-EB5:

python run_with_submitit.py --backbone "timm_tf_efficientnet_b5_ns" --dataset_config configs/flickr.json --resume https://zenodo.org/record/4721981/files/pretrained_EB5_checkpoint.pth  --ngpus 1 --nodes 2  --ema  --eval 

To run on a single node with 2 GPUs:

python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/flickr.json --backbone timm_tf_efficientnet_b5_ns --resume https://zenodo.org/record/4721981/files/pretrained_EB5_checkpoint.pth --ema --eval

For the MergedBox protocol, we provide the model fine-tuned on the merged ground truth.

Change the "GT_type" option in configs/flickr.json to "merged", and then run:

python run_with_submitit.py --dataset_config configs/flickr.json --resume https://zenodo.org/record/4721981/files/flickr_merged_resnet101_checkpoint.pth  --ngpus 1 --nodes 2 --ema  --eval 

Similarly to the above, pass --test for test set evaluation.
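
The config edit itself is a one-key change; a minimal sketch, following the same pattern as the path update above:

# Minimal sketch: switch configs/flickr.json to the MergedBox ground truth.
import json

CONFIG = "configs/flickr.json"

with open(CONFIG) as f:
    cfg = json.load(f)

cfg["GT_type"] = "merged"  # "separate" is the AnyBox setting

with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)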

To run on a single node with 2 GPUs:

python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/flickr.json --resume https://zenodo.org/record/4721981/files/flickr_merged_resnet101_checkpoint.pth --ema --eval