TaxaBind: A Unified Embedding Space for Ecological Applications

November 8, 2024

Srikumar Sastry*, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs (*Corresponding Author)

WACV 2025

This repository is the official implementation of TaxaBind, a suite of multimodal models for downstream ecological tasks that covers six modalities: ground-level images, geographic location, satellite images, text, audio, and environmental features.

🎯 Zero-Shot Image Classification

On zero-shot image classification, our framework outperforms state-of-the-art models in both the unimodal setting (BioCLIP, ArborCLIP) and the multimodal setting (ImageBind).
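Zero-shot classification in a shared embedding space is commonly done CLIP-style: embed the image, embed one text prompt per candidate class, and pick the class whose text embedding is most similar. The sketch below shows only that scoring step on hand-written toy vectors; the embedding values, the class list, and the temperature of 0.01 (a logit scale of 100, typical of CLIP-style models) are assumptions, not outputs of TaxaBind.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_probs(image_emb, class_text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image and each class prompt."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in class_text_embs:
        txt = l2_normalize(text_emb)
        logits.append(sum(a * b for a, b in zip(img, txt)) / temperature)
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-d embeddings standing in for real encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.0]
class_names = ["Cardinalis cardinalis", "Cyanocitta cristata", "Turdus migratorius"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_embedding, text_embeddings)
prediction = class_names[probs.index(max(probs))]
print(prediction)  # Cardinalis cardinalis
```

With real models, the toy vectors would be replaced by the outputs of the image and text encoders; the scoring logic itself does not depend on the embedding dimension.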

🔥 Large Multimodal Ecological Datasets

  • We release TaxaBench-8k, a truly multimodal dataset containing six paired modalities for evaluating large ecological models.
  • We release iSatNat, containing 2.7M pairs of satellite images and ground-level species images.
  • We release iSoundNat, containing 88,130 pairs of audio and ground-level species images.
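Each of these datasets pairs another modality with ground-level species images. A standard way to learn a shared embedding space from such pairs is a symmetric contrastive (InfoNCE) loss, as in CLIP. The following is a minimal sketch of that generic loss, assuming a batch of already-computed embeddings in which item i of one modality pairs with item i of the other; it is not claimed to be TaxaBind's exact training objective, and the temperature of 0.07 and the toy batch are illustrative.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_rows(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)

def symmetric_info_nce(emb_a, emb_b, temperature=0.07):
    """CLIP-style contrastive loss for a batch where emb_a[i] pairs with emb_b[i]."""
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # logits[i][j] = cosine(a_i, b_j) / temperature
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average the a->b and b->a directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))

# Hypothetical batch: ground-level image embeddings and paired satellite embeddings.
ground = [[1.0, 0.0], [0.0, 1.0]]
aligned_sat = [[0.9, 0.1], [0.1, 0.9]]
swapped_sat = [[0.1, 0.9], [0.9, 0.1]]

print(symmetric_info_nce(ground, aligned_sat) < symmetric_info_nce(ground, swapped_sat))  # True
```

The loss is low when each ground-level embedding is closest to its own partner and high when partners are mismatched, which is what pulls paired modalities together in the shared space.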

โš™๏ธ Usage

Our pretrained models are available through the rshf and transformers packages for easy inference.

Load the TaxaBind configuration and instantiate TaxaBind:

from transformers import PretrainedConfig
from rshf.taxabind import TaxaBind

config = PretrainedConfig.from_pretrained("MVRL/taxabind-config")
taxabind = TaxaBind(config)

📎 Loading ground-level image and text encoders:

# Loads an open_clip-style image-text model
model = taxabind.get_image_text_encoder()
tokenizer = taxabind.get_tokenizer()
processor = taxabind.get_image_processor()
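For zero-shot classification, the text encoder needs one prompt per candidate class. A minimal sketch of building BioCLIP-style taxonomic prompt strings follows; the taxonomy record, rank order, and template are hypothetical, and whether TaxaBind's text encoder was trained on this exact format is an assumption. In an open_clip-style workflow these strings would next be passed through `tokenizer` and the model's text encoder; those calls are omitted here because their exact signatures are not given in this README.

```python
# Hypothetical taxonomy records; the real label set and prompt format are assumptions.
TAXONOMY = [
    {"kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
     "order": "Passeriformes", "family": "Cardinalidae",
     "genus": "Cardinalis", "species": "cardinalis"},
]
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def taxonomic_prompt(record, template="a photo of {}."):
    """Join the ranks from kingdom to species into one prompt string."""
    return template.format(" ".join(record[rank] for rank in RANKS))

prompts = [taxonomic_prompt(r) for r in TAXONOMY]
print(prompts[0])
# a photo of Animalia Chordata Aves Passeriformes Cardinalidae Cardinalis cardinalis.
```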

๐Ÿ›ฐ๏ธ Loading satellite image encoder:

sat_encoder = taxabind.get_sat_encoder()
sat_processor = taxabind.get_sat_processor()
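Because satellite and ground-level embeddings live in one space, cross-modal retrieval reduces to ranking by similarity. The sketch below ranks a hypothetical gallery of ground-level observation embeddings against one satellite-tile embedding using cosine similarity; the identifiers and vectors are made up, and in practice they would come from `sat_encoder` and the image encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query_emb, gallery, k=2):
    """Return the k gallery ids whose embeddings are most similar to the query."""
    scored = sorted(gallery.items(), key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [item_id for item_id, _ in scored[:k]]

# Hypothetical embeddings: one satellite tile and three ground-level observations.
sat_tile = [0.2, 0.9, 0.1]
ground_gallery = {
    "obs_wetland_heron": [0.1, 1.0, 0.0],
    "obs_desert_lizard": [1.0, 0.0, 0.2],
    "obs_marsh_frog": [0.3, 0.8, 0.3],
}
print(top_k(sat_tile, ground_gallery))  # ['obs_wetland_heron', 'obs_marsh_frog']
```

For large galleries, a vector index would replace the full sort, but the ranking criterion stays the same.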

๐Ÿ“ Loading location encoder:

location_encoder = taxabind.get_location_encoder()
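A location encoder can act as a geographic prior for species classification: the image indicates what the organism looks like, and the location indicates which species are plausible there. One common way to combine the two, assuming the sources are conditionally independent given the class and the class prior is uniform, is to multiply per-class scores and renormalize. How the outputs of `location_encoder` would be turned into per-class `location_scores` is not documented in this README, so the scores below are hypothetical.

```python
def fuse_with_location(image_probs, location_scores):
    """Multiply per-class image probabilities by per-class location scores and renormalize."""
    joint = [p * s for p, s in zip(image_probs, location_scores)]
    total = sum(joint)
    return [j / total for j in joint]

classes = ["Cardinalis cardinalis", "Cardinalis sinuatus"]
image_probs = [0.55, 0.45]      # the image alone barely separates the two species
location_scores = [0.2, 0.8]    # hypothetical per-class scores at one location
fused = fuse_with_location(image_probs, location_scores)
print(classes[fused.index(max(fused))])  # Cardinalis sinuatus
```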

🔈 Loading audio encoder:

audio_encoder = taxabind.get_audio_encoder()
audio_processor = taxabind.get_audio_processor()
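A shared space also allows queries that combine modalities, for example an audio recording and a photo of the same observation. The sketch below averages unit-normalized embeddings and renormalizes, an ImageBind-style form of embedding arithmetic; whether TaxaBind embeddings compose well this way is an assumption, and it requires all embeddings to have the same dimensionality. The vectors are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fuse_embeddings(*embs):
    """Average unit-normalized embeddings from several modalities, then renormalize."""
    units = [l2_normalize(e) for e in embs]
    mean = [sum(vals) / len(units) for vals in zip(*units)]
    return l2_normalize(mean)

audio_emb = [0.0, 3.0]   # hypothetical audio embedding
image_emb = [4.0, 0.0]   # hypothetical ground-level image embedding
query = fuse_embeddings(audio_emb, image_emb)
print([round(x, 4) for x in query])  # [0.7071, 0.7071]
```

Normalizing before averaging keeps one modality with a larger raw norm from dominating the fused query.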

๐ŸŒฆ๏ธ Loading environmental encoder:

env_encoder = taxabind.get_env_encoder()
env_processor = taxabind.get_env_processor()
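Environmental covariates such as temperature and precipitation sit on very different scales, so tabular inputs like these are commonly standardized per feature before encoding. We have not checked what `env_processor` does internally; the following is only a generic sketch of per-feature z-scoring, and the feature names and values are hypothetical.

```python
import statistics

def fit_standardizer(rows):
    """Compute per-feature mean and population std from a list of feature rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero
    return means, stds

def standardize(row, means, stds):
    """Z-score one feature row with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

# Hypothetical rows: [mean annual temperature (°C), annual precipitation (mm)]
train_rows = [[10.0, 500.0], [20.0, 1500.0]]
means, stds = fit_standardizer(train_rows)
print(standardize([15.0, 1000.0], means, stds))  # [0.0, 0.0]
```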

📑 Citation

@inproceedings{sastry2025taxabind,
    title={TaxaBind: A Unified Embedding Space for Ecological Applications},
    author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Ahmad, Adeel and Jacobs, Nathan},
    booktitle={Winter Conference on Applications of Computer Vision},
    year={2025},
    organization={IEEE/CVF}
}

Check out our lab website for other interesting works on geospatial understanding and mapping:

  • Multi-Modal Vision Research Lab (MVRL)
  • Related Works from MVRL