README.md

June 4, 2026


MiVOLO: Multi-input Transformer for Age and Gender Estimation

MiVOLO: Multi-input Transformer for Age and Gender Estimation, Maksim Kuprashevich, Irina Tolstykh, 2023 arXiv 2307.04616

Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation, Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, 2024 arXiv 2403.02302

[Paper 2023] [Paper 2024] [Demo] [🤗 HuggingFace] [🤗 HuggingFace Detector] [Telegram Bot] [BibTex] [Data]

⚡ NEW MiVOLO-Next: live demos on 🤗 Spaces

Age & Gender demo · Adult vs Minor demo

Successor of MiVOLO v2: dual-stream face + person backbone · APPA-Real MAE 4.07 · 28 722 FPS on A100

MiVOLO pretrained models

Gender & Age recognition performance.

| Model | Type | Dataset (train and test) | Age MAE | Age CS@5 | Gender Accuracy | download |
|---|---|---|---|---|---|---|
| volo_d1 | face_only, age | IMDB-cleaned | 4.29 | 67.71 | - | checkpoint |
| volo_d1 | face_only, age, gender | IMDB-cleaned | 4.22 | 68.68 | 99.38 | checkpoint |
| mivolo_d1 | face_body, age, gender | IMDB-cleaned | 4.24 [face+body]; 6.87 [body] | 68.32 [face+body]; 46.32 [body] | 99.46 [face+body]; 96.48 [body] | model_imdb_cross_person_4.24_99.46.pth.tar |
| volo_d1 | face_only, age | UTKFace | 4.23 | 69.72 | - | checkpoint |
| volo_d1 | face_only, age, gender | UTKFace | 4.23 | 69.78 | 97.69 | checkpoint |
| mivolo_d1 | face_body, age, gender | Lagenda | 3.99 [face+body] | 71.27 [face+body] | 97.36 [face+body] | demo |
| mivolov2_d1_384x384 | face_body, age, gender | Lagenda | 3.65 [face+body] | 74.48 [face+body] | 97.99 [face+body] | checkpoint; telegram bot |
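
The Age MAE and Age CS@5 columns can be illustrated with a minimal sketch. It assumes the usual definitions: MAE is the mean absolute error in years, and CS@5 is the percentage of samples whose absolute age error is at most 5 years. The function names `age_mae` and `age_cs` are hypothetical and are not taken from this repository; the sample ages are made up for illustration.

```python
from typing import Sequence


def age_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and true ages, in years."""
    if len(pred) != len(true) or len(true) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def age_cs(pred: Sequence[float], true: Sequence[float], threshold: float = 5.0) -> float:
    """Cumulative score: percentage of samples with absolute error <= threshold.

    Assumption: CS@5 in the tables follows this common definition.
    """
    if len(pred) != len(true) or len(true) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    hits = sum(1 for p, t in zip(pred, true) if abs(p - t) <= threshold)
    return 100.0 * hits / len(true)


# Illustrative (made-up) values: absolute errors are 2, 6.5, 0 and 7 years.
predicted = [23.0, 41.5, 8.0, 67.0]
actual = [25.0, 35.0, 8.0, 60.0]
print(age_mae(predicted, actual))  # 3.875
print(age_cs(predicted, actual))   # 50.0
```

Lower MAE and higher CS@5 are better, which is why the two columns tend to move in opposite directions across the rows.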

MiVOLO regression benchmarks

Gender & Age recognition performance.

Use valid_age_gender.sh to reproduce results with our checkpoints.

| Model | Type | Train Dataset | Test Dataset | Age MAE | Age CS@5 | Gender Accuracy | download |
|---|---|---|---|---|---|---|---|
| mivolo_d1 | face_body, age, gender | Lagenda | AgeDB | 5.55 [face] | 55.08 [face] | 98.3 [face] | demo |
| mivolo_d1 | face_body, age, gender | IMDB-cleaned | AgeDB | 5.58 [face] | 55.54 [face] | 97.93 [face] | model_imdb_cross_person_4.24_99.46.pth.tar |

MiVOLO classification benchmarks

Gender & Age recognition performance.

| Model | Type | Train Dataset | Test Dataset | Age Accuracy | Gender Accuracy |
|---|---|---|---|---|---|
| mivolo_d1 | face_body, age, gender | Lagenda | FairFace | 61.07 [face+body] | 95.73 [face+body] |
| mivolo_d1 | face_body, age, gender | Lagenda | Adience | 68.69 [face] | 96.51 [face] |
| mivolov2_d1_384 | face_body, age, gender | Lagenda | Adience | 69.43 [face] | 97.39 [face] |

Dataset

Please cite our papers if you use any of this data!

  • Lagenda dataset: images and annotation.

  • IMDB-clean: follow these instructions to get images and download our annotations.

  • UTK dataset: original full images and our annotations: split from the article, random full split.

  • Adience dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── Adience
        ├── annotations  (folder with our annotations)
        ├── aligned      (will not be used)
        ├── faces
        ├── fold_0_data.txt
        ├── fold_1_data.txt
        ├── fold_2_data.txt
        ├── fold_3_data.txt
        └── fold_4_data.txt
    

    We use the coarsely aligned images from the faces/ dir.

    Using our detector, we found a face bbox for each image (see tools/prepare_adience.py).

    This dataset has five folds. The performance metric is accuracy under five-fold cross-validation.

    | images before removal | fold 0 | fold 1 | fold 2 | fold 3 | fold 4 |
    |---|---|---|---|---|---|
    | 19,370 | 4,484 | 3,730 | 3,894 | 3,446 | 3,816 |

    Incomplete data

    | only age not found | only gender not found | SUM |
    |---|---|---|
    | 40 | 1170 | 1,210 (6.2 %) |

    Removed data

    | failed to process image | age and gender not found | SUM |
    |---|---|---|
    | 0 | 708 | 708 (3.6 %) |

    Genders

    | female | male |
    |---|---|
    | 9,372 | 8,120 |

    Ages (8 classes) after mapping to non-overlapping age intervals

    | 0-2 | 4-6 | 8-12 | 15-20 | 25-32 | 38-43 | 48-53 | 60-100 |
    |---|---|---|---|---|---|---|---|
    | 2,509 | 2,140 | 2,293 | 1,791 | 5,589 | 2,490 | 909 | 901 |

  • FairFace dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── FairFace
       ├── annotations  (folder with our annotations)
       ├── fairface-img-margin025-trainval   (will not be used)
           ├── train
           ├── val
       ├── fairface-img-margin125-trainval
           ├── train
           ├── val
       ├── fairface_label_train.csv
       ├── fairface_label_val.csv
    
    

    We use the aligned images from the fairface-img-margin125-trainval/ dir.

    Using our detector, we found a face bbox for each image and added a person bbox where possible (see tools/prepare_fairface.py).

    This dataset has two splits: train and val. The performance metric is accuracy on the validation split.

    | images train | images val |
    |---|---|
    | 86,744 | 10,954 |

    Genders for validation

    | female | male |
    |---|---|
    | 5,162 | 5,792 |

    Ages for validation (9 classes):

    | 0-2 | 3-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ |
    |---|---|---|---|---|---|---|---|---|
    | 199 | 1,356 | 1,181 | 3,300 | 2,330 | 1,353 | 796 | 321 | 118 |

  • AgeDB dataset: follow these instructions to get images and download our annotations.

    Ages: 1 - 101

    Genders: 9,788 male faces, 6,700 female faces

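    | images 0 | images 1 | images 2 | images 3 | images 4 | images 5 | images 6 | images 7 | images 8 | images 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | 1701 | 1721 | 1615 | 1619 | 1626 | 1643 | 1634 | 1596 | 1676 | 1657 |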

    Data splits were taken from here

    Note: all splits (the entire dataset) were used for model evaluation.
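
To score a regressed age against the nine FairFace age classes (0-2, 3-9, ..., 70+), the predicted number first has to be turned into a class label. The sketch below shows one plausible way to do that. It is an assumption for illustration, not the confirmed procedure used by tools/prepare_fairface.py or the evaluation code; the names `FAIRFACE_AGE_CLASSES` and `fairface_age_class` are hypothetical, and fractional ages are assumed to fall into the class of their integer part.

```python
import bisect

# Class labels as listed in the FairFace validation statistics.
FAIRFACE_AGE_CLASSES = ["0-2", "3-9", "10-19", "20-29", "30-39",
                        "40-49", "50-59", "60-69", "70+"]
# Lower bound (inclusive) of each class, in the same order.
_LOWER_BOUNDS = [0, 3, 10, 20, 30, 40, 50, 60, 70]


def fairface_age_class(age: float) -> str:
    """Map an age in years to its FairFace class label (hypothetical helper)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right returns the number of lower bounds <= age;
    # subtracting 1 gives the index of the class that contains the age.
    index = bisect.bisect_right(_LOWER_BOUNDS, age) - 1
    return FAIRFACE_AGE_CLASSES[index]


print(fairface_age_class(34.6))  # 30-39
print(fairface_age_class(70))    # 70+
```

Because the FairFace intervals are contiguous, every non-negative age maps to exactly one class. The Adience intervals have gaps (for example between 0-2 and 4-6), so they would need an extra rule for ages that fall between classes.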

Install

Install PyTorch 1.13+ and the other requirements.

pip install -r requirements.txt
pip install .

Demo

  1. Download body + face detector model to models/yolov8x_person_face.pt
  2. Download mivolo checkpoint to models/mivolo_imbd.pth.tar
wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg

python3 demo.py \
--input "jennifer_lawrence.jpg" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--with-persons \
--draw

To run the demo on a YouTube video:

python3 demo.py \
--input "https://www.youtube.com/shorts/pVh32k0hGEI" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--draw \
--with-persons

Validation

To reproduce validation metrics:

  1. Download the prepared annotations for imdb-clean / utk / adience / lagenda / fairface.
  2. Download a checkpoint.
  3. Run validation:
python3 eval_pretrained.py \
  --dataset_images /path/to/dataset/utk/images \
  --dataset_annotations /path/to/dataset/utk/annotation \
  --dataset_name utk \
  --split valid \
  --batch-size 512 \
  --checkpoint models/mivolo_imbd.pth.tar \
  --half \
  --with-persons \
  --device "cuda:0"

Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".

Changelog

CHANGELOG.md

ONNX and TensorRT export

As of 11.08.2023, ONNX export is technically feasible but not advisable, because the resulting model performs poorly with batch processing. TensorRT and OpenVINO export is impossible because neither supports col2im.

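If you still want to use ONNX export, you can refer to these instructions.

The recommended export method at present is TorchScript. It takes a single line of code; note that torch.jit.trace also needs an example input to trace the model with (example_input below is a placeholder for a tensor shaped like the model's real input):

torch.jit.trace(model, example_input)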

This approach gives you a model that keeps its original speed and ships as a single file, with no additional code required to load it.

License

Please see here.

Citing

If you use our models, code or dataset, we kindly ask you to cite the following papers and give the repository a :star:

@article{mivolo2023,
   Author = {Maksim Kuprashevich and Irina Tolstykh},
   Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
   Year = {2023},
   Eprint = {arXiv:2307.04616},
}
@article{mivolo2024,
   Author = {Maksim Kuprashevich and Grigorii Alekseenko and Irina Tolstykh},
   Title = {Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation},
   Year = {2024},
   Eprint = {arXiv:2403.02302},
}