Relabeled Mapillary Traffic Sign Validation Dataset (2000 Images)
December 17, 2025
Welcome to the official repository for our relabeled Mapillary validation dataset, focused on traffic sign understanding. In this project, we release 2,000 curated images from the Mapillary dataset with manually re-annotated traffic sign labels, structured into a new, cleaner label taxonomy.
Project Overview (Published in ICCV Workshop)
This project aims to standardize and improve traffic sign annotations by relabeling existing Mapillary data using modern Visual-Language Models (VLMs) such as InternVL-3 and Gemma-3.
These models were leveraged to assist human annotators in refining the dataset, improving label clarity and coverage while reducing noise.
Dataset Description
We relabeled 2,000 validation images from Mapillary with a focus on traffic sign recognition, especially relevant for autonomous driving and urban scene understanding.
New Label Taxonomy
Each traffic sign in the dataset is annotated into one of the following 12 categories:
- stop signs
- speed limit signs
- yield signs
- do not enter
- crosswalk signs
- parking signs
- no parking
- roundabout signs
- turn signs
- cycle lane signs
- others
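A relabeling pipeline has to turn a model's free-text answer into exactly one of these categories. The sketch below shows one possible way to do that. The helper name `normalize_label` and the matching rule (case-insensitive substring match on the category stem, falling back to `others`) are assumptions for illustration, not the repository's actual logic.

```python
# Categories as listed above; "others" doubles as the fallback bucket.
CATEGORIES = [
    "stop signs",
    "speed limit signs",
    "yield signs",
    "do not enter",
    "crosswalk signs",
    "parking signs",
    "no parking",
    "roundabout signs",
    "turn signs",
    "cycle lane signs",
    "others",
]


def normalize_label(reply: str) -> str:
    """Map a free-text model reply to one category (hypothetical helper)."""
    text = " ".join(reply.lower().split())  # lowercase and collapse whitespace
    best_category, best_len = "others", 0
    for category in CATEGORIES:
        # Match on the stem so "stop sign" and "stop signs" both hit.
        stem = category[: -len(" signs")] if category.endswith(" signs") else category
        # Keep the longest matching stem so "no parking" beats "parking".
        if stem in text and len(stem) > best_len:
            best_category, best_len = category, len(stem)
    return best_category
```

Plain substring matching is deliberately simple and can misfire on replies such as "bus stop", so a production pipeline would likely constrain the model's output format instead.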
Motivation
While Mapillary provides a large and diverse dataset, the traffic sign annotations are often coarse or inconsistent. Our goal is to:
- Create a cleaner, task-specific benchmark for traffic sign detection and classification
- Enable VLM-based semi-automated relabeling pipelines
- Serve as a reference for fine-tuning or evaluating vision-language models on structured scene understanding
Getting Started
1. Clone the repository
git clone https://github.com/nec-labs-ma/relabeling.git
cd relabeling
2. Install requirements
pip install -r requirements.txt
pip install awscli==1.25.0
3. Download the dataset
- Download images here:
https://mapillary-signs.s3.us-west-2.amazonaws.com/images.zip
or
aws s3 cp s3://mapillary-signs/images.zip .
- Download annotations for signs:
https://mapillary-signs.s3.us-west-2.amazonaws.com/instances_default.json
or
aws s3 cp s3://mapillary-signs/instances_default.json .
or
https://huggingface.co/datasets/sparshgarg57/mapillary_traffic_signs
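Once the annotation file is downloaded, a quick sanity check is to count annotations per category. The sketch below assumes that `instances_default.json` follows the COCO layout (top-level `images`, `annotations`, and `categories` keys); that layout is an assumption based on the file name, and the helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_annotations(path: str) -> dict:
    """Read the annotation file from disk into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_per_category(coco: dict) -> Counter:
    """Count annotations per category name in a COCO-style dict (assumed layout)."""
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    return Counter(
        # Annotations pointing at an undeclared category are grouped as "unknown".
        id_to_name.get(a["category_id"], "unknown")
        for a in coco.get("annotations", [])
    )
```

Calling `count_per_category(load_annotations("instances_default.json")).most_common()` would then list the categories from most to least frequent, provided the file really uses this layout.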
VLM-Based Dataset Relabeling
This repository supports batch relabeling of vision datasets using multiple variants of InternVL-3 and Gemma-3 Visual Language Models (VLMs). It covers four tasks across the Mapillary and BDD100K datasets.
Tasks
The relabeling scripts support the following visual tasks:
- Mapillary Vehicles
- Mapillary Pedestrians
- Mapillary Traffic Signs
- BDD Vehicles
File Structure
.
├── Relabeling_gemmi3-bdd_vehicles.py
├── Relabeling_gemmi3-mapillary_vehicles.py
├── Relabeling_gemmi3-mapillary_signs.py
├── Relabeling_gemmi3-pedestrains.py
├── Relabeling_internvl-bdd_vehicles.py
├── Relabeling_internvl-mapillary_vehicles.py
├── Relabeling_internvl-mapillary_signs.py
├── Relabeling_internvl-pedestrains.py
└── load_parallel_jobs.sh
Each script expects a single model name as a command-line argument.
Supported Models
InternVL-3
- OpenGVLab/InternVL3-1B
- OpenGVLab/InternVL3-2B
- OpenGVLab/InternVL3-8B
- OpenGVLab/InternVL3-9B
- OpenGVLab/InternVL3-14B
Gemma-3
- google/gemma-3-4b-it
- google/gemma-3-12b-it
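Since each relabeling script takes a single model name on the command line, it can help to validate that name before a job is queued. The following is a minimal sketch of such a check; the repository's scripts may read their argument differently, and `SUPPORTED_MODELS`, `model_family`, and `parse_args` are hypothetical names.

```python
import argparse

# Model ids copied from the lists above, grouped by family.
SUPPORTED_MODELS = {
    "internvl": [
        "OpenGVLab/InternVL3-1B",
        "OpenGVLab/InternVL3-2B",
        "OpenGVLab/InternVL3-8B",
        "OpenGVLab/InternVL3-9B",
        "OpenGVLab/InternVL3-14B",
    ],
    "gemma": [
        "google/gemma-3-4b-it",
        "google/gemma-3-12b-it",
    ],
}


def model_family(model_name: str) -> str:
    """Return "internvl" or "gemma", or raise ValueError for an unlisted model."""
    for family, names in SUPPORTED_MODELS.items():
        if model_name in names:
            return family
    raise ValueError(f"Unsupported model: {model_name!r}")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a single positional model name and validate it."""
    parser = argparse.ArgumentParser(description="Relabel a dataset with a VLM.")
    parser.add_argument("model", help="Model id, e.g. OpenGVLab/InternVL3-9B")
    args = parser.parse_args(argv)
    model_family(args.model)  # fail early on unknown model ids
    return args
```

Knowing the family also tells you which script prefix applies: `Relabeling_internvl-*` for InternVL-3 and `Relabeling_gemmi3-*` for Gemma-3.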
Running the Jobs
Jobs are dispatched using a Slurm batch script load_parallel_jobs.sh.
Configuration Steps
- Edit the models array in load_parallel_jobs.sh to include the desired model variants.
- Replace the Python script name in the sbatch --wrap= section according to the task and model family (Gemma or InternVL).
- Submit jobs with:
bash load_parallel_jobs.sh
Example: Relabel with InternVL-3 9B for BDD Vehicles
models=(
"OpenGVLab/InternVL3-9B"
)
# In load_parallel_jobs.sh:
--wrap="python Relabeling_internvl-bdd_vehicles.py ${model}"
Example: Relabel with Gemma-3 12B for Mapillary Signs
models=(
"google/gemma-3-12b-it"
)
# In load_parallel_jobs.sh:
--wrap="python Relabeling_gemmi3-mapillary_signs.py ${model}"
Output
Each model variant generates updated labels or predictions for the task-specific dataset in .txt and .json format. All Slurm logs are saved under the logs/ directory:
logs/
├── OpenGVLab-InternVL3-9B_bdd.out
├── OpenGVLab-InternVL3-9B_bdd.err
...
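The log names above suggest that each job is labeled with the model id (slashes replaced by dashes) plus a short task tag. The sketch below rebuilds an equivalent `sbatch` command line in Python so the naming can be checked before submitting. It is only an illustration: `load_parallel_jobs.sh` is a shell script whose exact flags are not shown here, and the helper names and the `task_tag` parameter are hypothetical. The `--job-name`, `--output`, `--error`, and `--wrap` options are standard `sbatch` flags.

```python
import shlex
from typing import List


def log_stem(model: str, task_tag: str) -> str:
    """Build the log file stem, e.g. OpenGVLab-InternVL3-9B_bdd."""
    return f"{model.replace('/', '-')}_{task_tag}"


def sbatch_command(model: str, script: str, task_tag: str, log_dir: str = "logs") -> List[str]:
    """Return the sbatch argument list for one model (nothing is executed here)."""
    stem = log_stem(model, task_tag)
    # Quote both parts so unusual characters survive the shell inside --wrap.
    wrapped = f"python {shlex.quote(script)} {shlex.quote(model)}"
    return [
        "sbatch",
        f"--job-name={stem}",
        f"--output={log_dir}/{stem}.out",
        f"--error={log_dir}/{stem}.err",
        f"--wrap={wrapped}",
    ]
```

Returning a list rather than a single string means the command can be passed to `subprocess.run` without an extra layer of shell quoting.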
Classifier-Based Relabeling (ResNet50 / ResNet101)
In addition to VLMs, this repository includes support for relabeling using ResNet-based classifiers for all four tasks.
Tasks
- Mapillary Vehicles
- Mapillary Pedestrians
- Mapillary Traffic Signs
- BDD Vehicles
Training
Trainer scripts are available for each task:
bdd_vehicles_classifier.py
humans_classifier.py
mapillary_vehicles_classifier.py
signs_classifier.py
Each script trains a ResNet-50 or ResNet-101 model and saves:
- classifier.pth: The trained PyTorch model weights
- label_mapping.pkl: A mapping of class indices to labels
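To illustrate how a file like label_mapping.pkl can be written and read back, here is a minimal sketch using the standard `pickle` module. It assumes the mapping is a plain dict from integer class index to label string; the actual structure saved by the trainer scripts is not documented here, and the helper names are hypothetical.

```python
import pickle
from typing import Dict, Iterable, List


def save_label_mapping(index_to_label: Dict[int, str], path: str) -> None:
    """Write the index-to-label dict to disk."""
    with open(path, "wb") as f:
        pickle.dump(index_to_label, f)


def load_label_mapping(path: str) -> Dict[int, str]:
    """Read the mapping back. Only unpickle files you created or trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def decode(indices: Iterable[int], index_to_label: Dict[int, str]) -> List[str]:
    """Turn predicted class indices into human-readable labels."""
    return [index_to_label[i] for i in indices]
```

At inference time, the predicted class indices from the classifier would be passed through `decode` with the mapping saved during training, so that both stages agree on the class order.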
Inference
After training, use the corresponding inference script:
bdd_vehicles_classifier_inference.py
human_classifier_inference.py
mapillary_vehicles_classifier_inference.py
signs_classifier_inference.py
Make sure to specify the correct path to the classifier.pth and label_mapping.pkl files.
Output
Each inference script generates updated labels or predictions for the task-specific dataset in .txt and .json format.
External Data Integration & DINOv2 Feature Extraction
This repository also supports using external datasets (e.g., Roboflow and Objects365) to enhance performance through DINOv2-based feature extraction.
External Data Sources
Roboflow
- Download the relevant dataset ZIP files from Roboflow.
Traffic Signs: https://universe.roboflow.com/ai-camp-weekend-t3odm/traffic-signs-detection-dpnpl
https://universe.roboflow.com/radu-oprea-r4xnm/traffic-signs-detection-europe
https://universe.roboflow.com/kendrickxy/european-road-signs
MotorCycles: https://universe.roboflow.com/cc-kzuq0/helmeteeeeeeeee
Person: https://universe.roboflow.com/mochammad-giri-wiwaha-ngulandoro/person-vthiu
Cyclists: https://universe.roboflow.com/bicycle-detection/bike-detect-ct
Pedestrians: https://universe.roboflow.com/erickson49366-gmail-com/cyclist-detector-training-data-v3
- List the paths to these ZIPs in extract_boxes.py.
- Run the script to extract and crop object instances:
python extract_boxes.py
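Cropping an object instance comes down to turning each bounding box into pixel bounds that stay inside the image. The sketch below assumes boxes in COCO-style `[x, y, width, height]` pixel format. Roboflow offers several export formats, so the format that `extract_boxes.py` actually reads is not known here, and `crop_bounds` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def crop_bounds(
    box: Sequence[float],
    image_width: int,
    image_height: int,
    min_size: int = 1,
) -> Optional[Tuple[int, int, int, int]]:
    """Convert [x, y, w, h] to clamped (left, top, right, bottom) pixel bounds.

    Returns None when the clamped box is smaller than min_size on either side.
    """
    x, y, w, h = box
    left = max(0, int(round(x)))
    top = max(0, int(round(y)))
    right = min(image_width, int(round(x + w)))
    bottom = min(image_height, int(round(y + h)))
    if right - left < min_size or bottom - top < min_size:
        return None  # box lies outside the image or is degenerate
    return left, top, right, bottom
```

The returned tuple uses the same `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so it can be passed to that method directly.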
Objects365
- Use fetch_object_365.py to extract and crop objects:
python fetch_object_365.py
All cropped object images will be saved in task-specific directories.
DINOv2 Feature Extraction

- Use extract_features.py to compute DINOv2 features for cropped objects (e.g., speed signs, yield signs).
- Features will be saved as:
facebook_dinov2-giant_<object>.pt
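The file name pattern appears to combine the model id with the object name. The sketch below builds and parses such names, assuming the prefix comes from the Hugging Face id `facebook/dinov2-giant` with the slash replaced by an underscore. Both helper names and the example object names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable

MODEL_ID = "facebook/dinov2-giant"  # assumed source of the file name prefix


def feature_filename(object_name: str, model_id: str = MODEL_ID) -> str:
    """Build a name such as facebook_dinov2-giant_<object>.pt."""
    return f"{model_id.replace('/', '_')}_{object_name}.pt"


def index_feature_files(paths: Iterable[str], model_id: str = MODEL_ID) -> Dict[str, str]:
    """Map object name -> path for every file that matches the pattern."""
    prefix = model_id.replace("/", "_") + "_"
    index = {}
    for p in paths:
        name = Path(p).name
        if name.startswith(prefix) and name.endswith(".pt"):
            # Strip the model prefix and the .pt suffix to recover the object name.
            index[name[len(prefix):-len(".pt")]] = p
    return index
```

An index like this makes it easy to confirm that a feature file exists for every class before running the downstream scripts.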
Using DINOv2 Features
Use the extracted .pt feature files as input to the following scripts via the class_features parameter:
dinov2_bdd_vehicles.py
dinov2_humans.py
dinov2_mapillary_signs.py
dinov2_mapillary_vehicles.py
Run these scripts with the class_features parameter pointing to the precomputed feature files.
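How these scripts consume the class features is not described here. One common way to use precomputed embeddings is nearest-prototype classification: average the features of each class and assign a query to the class whose mean is closest in cosine similarity. The sketch below illustrates that idea with plain Python lists standing in for tensors. It is an assumption about the general technique, not the repository's implementation, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (the class prototype)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_class(query: Sequence[float], class_features: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Return the class whose prototype is most similar to the query."""
    prototypes = {name: mean_vector(vs) for name, vs in class_features.items()}
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))
```

With real DINOv2 embeddings the same logic would normally be expressed as batched tensor operations, but the decision rule is the same.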
License
This dataset is released for research and academic use only. Please check LICENSE.txt for details.
Acknowledgements
- Mapillary Vistas Dataset
- InternVL by OpenGVLab
- Gemma by Google DeepMind
Contact
If you have any questions, suggestions, or collaboration ideas, feel free to reach out:
- Email: sparsh@nec-labs.com
- LinkedIn: linkedin.com/in/garg-sparsh