June 5, 2025
UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
Introduction • Quick Start • Citation
✨ Introduction
We present UBench, a novel benchmark for evaluating uncertainty estimation in large language models (LLMs). This work has been accepted to ACL 2025 Findings.
- Unlike other benchmarks, UBench is based on confidence intervals. It encompasses 11,978 multiple-choice questions spanning knowledge, language, understanding, and reasoning capabilities.
- We use UBench to evaluate 20 widely adopted LLMs.
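This README does not describe how answers are scored, so the following is only a rough illustration of what evaluating uncertainty over confidence intervals can look like. It computes expected calibration error (ECE), a standard calibration metric, over ten equal-width confidence intervals. The helper name `expected_calibration_error`, the choice of ECE, the ten-interval split, and the sample data are all assumptions made for illustration and are not taken from this repository.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: ECE over equal-width confidence intervals.

    confidences: the model's stated confidence in [0, 1] for each answer.
    correct:     whether each answer was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Per-interval running totals.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Map the confidence to an interval index; 1.0 goes in the last one.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hits[idx] += int(ok)

    total = len(confidences)
    ece = 0.0
    for n, conf_sum, hit in zip(counts, conf_sums, hits):
        if n == 0:
            continue  # empty intervals contribute nothing
        accuracy = hit / n
        avg_conf = conf_sum / n
        # Weight each interval's |accuracy - confidence| gap by its share of samples.
        ece += (n / total) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Made-up data: four answers with stated confidences and correctness flags.
    sample_conf = [0.95, 0.95, 0.55, 0.55]
    sample_correct = [True, True, True, False]
    print(f"ECE = {expected_calibration_error(sample_conf, sample_correct):.3f}")
```

A lower value means the stated confidences track actual accuracy more closely; a model that is always right when it reports a confidence of 1.0 would score 0.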
🚀 Quick Start
- todo
☕️ Citation
If you find this repository helpful, please consider citing our paper:
@misc{wang2025ubenchbenchmarkinguncertaintylarge,
title={UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions},
author={Xunzhi Wang and Zhuowei Zhang and Gaonan Chen and Qiongyu Li and Bitong Luo and Zhixin Han and Haotian Wang and Zhiyu Li and Hang Gao and Mengting Hu},
year={2025},
eprint={2406.12784},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.12784},
}