Protected-Attribute Tag Association dataset

November 16, 2022

This dataset is a proposed benchmark for measuring bias in vision-language models such as OpenAI's CLIP. It consists of images grouped into scenes, plus a set of captions applicable to each scene, with the images labeled according to specific protected attributes. We consider three attributes: binary gender (female, male), five ethno-racial groups (Black, Caucasian, East-Asian, Hispanic-Latino and Indian), and two age categories (young, old).

Distribution of images in the different protected attribute groups

Attribute	Scenes	Label	Count
Age	8	Young	3748
Age	8	Old	1186
Race	24	Black	1024
Race	24	Caucasian	1033
Race	24	EastAsian	1095
Race	24	Hispanic	948
Race	24	Indian	834
Gender	24	Female	2529
Gender	24	Male	2405
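
To make the balance of each group easier to read, the counts above can be turned into per-attribute totals and shares. The snippet below is only a minimal sketch: the numbers are copied from the table, while the names `COUNTS` and `summarize` are illustrative and not part of the dataset or its tooling.

```python
from collections import defaultdict

# Counts copied from the table above: (attribute, label, image count).
COUNTS = [
    ("Age", "Young", 3748),
    ("Age", "Old", 1186),
    ("Race", "Black", 1024),
    ("Race", "Caucasian", 1033),
    ("Race", "EastAsian", 1095),
    ("Race", "Hispanic", 948),
    ("Race", "Indian", 834),
    ("Gender", "Female", 2529),
    ("Gender", "Male", 2405),
]


def summarize(counts):
    """Return per-attribute totals and each label's share of its attribute."""
    totals = defaultdict(int)
    for attribute, _label, n in counts:
        totals[attribute] += n
    shares = {
        (attribute, label): n / totals[attribute]
        for attribute, label, n in counts
    }
    return dict(totals), shares


totals, shares = summarize(COUNTS)
for attribute, label, n in COUNTS:
    # Print one row per label with its share of the attribute's images.
    print(f"{attribute:<7}{label:<10}{n:>5}  {shares[(attribute, label)]:6.1%}")
```

Each attribute sums to 4,934 images, which suggests that every image carries one label per attribute. The split is close to even for gender, moderately uneven across the five ethno-racial groups, and skewed for age, where roughly 76% of images are labeled Young.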