Protected-Attribute Tag Association dataset

November 16, 2022

This dataset is a proposed benchmark for measuring biases in vision-language models such as OpenAI CLIP. It consists of images organized into scenes, together with a set of captions applicable to each scene, grouped by protected attribute. We consider three protected attributes: binary gender (female, male), five ethno-racial groups (Black, Caucasian, East-Asian, Hispanic-Latino and Indian), and two age categories (young, old).

Distribution of images in the different protected attribute groups

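| Attribute | Scenes | Label     | Count |
|-----------|--------|-----------|-------|
| Age       | 8      | Young     | 3748  |
| Age       | 8      | Old       | 1186  |
| Race      | 24     | Black     | 1024  |
| Race      | 24     | Caucasian | 1033  |
| Race      | 24     | EastAsian | 1095  |
| Race      | 24     | Hispanic  | 948   |
| Race      | 24     | Indian    | 834   |
| Gender    | 24     | Female    | 2529  |
| Gender    | 24     | Male      | 2405  |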
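
The counts above can be turned into per-label shares to see how balanced each attribute is. The following is a minimal sketch, not part of the dataset tooling: the `COUNTS` dictionary and the `label_shares` helper are hypothetical names introduced here for illustration, and the numbers are copied by hand from the distribution table.

```python
# Image counts per label, copied from the distribution table.
# COUNTS and label_shares are illustrative names, not part of the dataset's code.
COUNTS = {
    "Age": {"Young": 3748, "Old": 1186},
    "Race": {
        "Black": 1024,
        "Caucasian": 1033,
        "EastAsian": 1095,
        "Hispanic": 948,
        "Indian": 834,
    },
    "Gender": {"Female": 2529, "Male": 2405},
}


def label_shares(counts):
    """Return each label's fraction of the images within one attribute."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Print the total and the per-label share for every attribute.
for attribute, counts in COUNTS.items():
    total = sum(counts.values())
    print(f"{attribute}: {total} images")
    for label, share in label_shares(counts).items():
        print(f"  {label:<10} {share:.1%}")
```

Each attribute's counts add up to the same 4934 images, which suggests that every image carries one label per attribute. The Gender and Race splits are fairly even, while the Young group makes up roughly three quarters of the Age split, which is worth keeping in mind when comparing results across attributes.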

Measuring mean Skew and mean Skew@k