@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology (WACV 2025)
October 14, 2024
by Xin Jiang*, Junwei Zheng*, Ruiping Liu, Jiahang Li, Jiaming Zhang†, Sven Matthiesen, Rainer Stiefelhagen
* denotes equal contribution and † denotes corresponding author
News
- [2024.09.17] ATBench (Assistive Technology Benchmark) has been accepted to WACV 2025.
- [2024.10.13] We are excited to release the ATModel (Assistive Technology Model) training code (INSTALL.md, DATASET.md, TRAIN.md, EVALUATION.md).

Introduction

ATBench was designed on the basis of a pre-design user study with people with visual impairments (PVIs). It covers the five most crucial vision-language tasks: Panoptic Segmentation, Image Captioning, Visual Question Answering (VQA), Depth Estimation, and Optical Character Recognition (OCR). We also propose ATModel, a novel model that addresses all five tasks simultaneously.
More details can be found in our arXiv paper.
Getting Started
Checkpoints and Numbers:
| Model | PS (ADE-150), PQ | DE (NYU-V2), RMSE | OCR (6 datasets avg), Acc (%) | IC (VizWiz_Cap), CIDEr | VQA (VizWiz_VQA), Acc (%) | #Params |
|---|---|---|---|---|---|---|
| Unified-IO (S) | - | 0.649 | - | - | 42.4 | 71M |
| Unified-IO (B) | - | 0.469 | - | - | 45.8 | 241M |
| Unified-IO (L) | - | 0.402 | - | - | 47.7 | 776M |
| X-Decoder (T) | 41.6 | - | - | - | - | 164M |
| GIT (T) | - | - | - | 113.1 | 68.0 | 0.7B |
| PaLI (T) | - | - | - | 117.2 | 67.5 | 3.0B |
| ATModel | 38.5 | 0.425 | 80.1 | 52.5 | 53.7 | 62M |
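In the table, RMSE is the only column where lower is better; PQ, accuracy, and CIDEr are higher-is-better scores. As a reminder of what the depth estimation (DE) column measures, here is a minimal sketch of root-mean-square error in plain Python. It is an illustration only, not the repository's evaluation code: the function name `rmse` and the toy values are assumptions, and real evaluation on NYU-V2 typically involves details (valid-pixel masks, depth ranges) that are not shown here.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between paired depth values in the same unit."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    # Average the squared per-element errors, then take the square root.
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only; they are not taken from NYU-V2.
predicted_depth = [1.0, 2.0, 3.0]
ground_truth_depth = [1.0, 2.0, 5.0]
error = rmse(predicted_depth, ground_truth_depth)
```

Because the errors are squared before averaging, a single large depth error raises the score more than several small ones.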
Installation, Dataset, Training and Evaluation Guide:
- INSTALL.md
- DATASET.md
- TRAIN.md
- EVALUATION.md
Acknowledgement
- Our work builds on X-Decoder and reuses its code. We thank the X-Decoder authors for open-sourcing their repository.
Citation
If you find our work useful in your research, please cite:
@inproceedings{jiang2025atbench,
title={@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology},
author={Jiang, Xin and Zheng, Junwei and Liu, Ruiping and Li, Jiahang and Zhang, Jiaming and Matthiesen, Sven and Stiefelhagen, Rainer},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}