IFEval-Audio
May 29, 2026
Part of the AudioBench benchmark suite. See the main README for installation, the full dataset list, and how to run evaluations.
Overview
IFEval-Audio is a dataset to evaluate instruction-following in audio-based LLMs, with 280 audio-instruction-answer triples across six dimensions: Content, Capitalization, Symbol, List Structure, Length, and Format.
Dataset Structure
- Audio Input: Drawn from Spoken SQuAD, TED-LIUM 3, MuChoMusic, and other sources.
- Text Instruction: Specifies one dimension (e.g., "Use JSON format").
- Expected Answer: Reference output.
- Dimensions: Content, Capitalization, Symbol, List Structure, Length, Format.
- Distribution: 240 speech triples (40/dimension), 40 music/environmental triples.
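As a quick sanity check on the distribution above, the arithmetic can be sketched in a few lines of shell. The variable names are illustrative only and do not come from AudioBench.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the stated distribution adds up to 280 triples.
# All variable names here are hypothetical, chosen for illustration.
set -euo pipefail

DIMENSIONS=(Content Capitalization Symbol "List Structure" Length Format)
SPEECH_PER_DIMENSION=40   # speech triples per dimension
MUSIC_ENV_TOTAL=40        # music/environmental triples in total

# 6 dimensions x 40 speech triples each
speech_total=$(( ${#DIMENSIONS[@]} * SPEECH_PER_DIMENSION ))
total=$(( speech_total + MUSIC_ENV_TOTAL ))

echo "speech=$speech_total music_env=$MUSIC_ENV_TOTAL total=$total"
```

This prints `speech=240 music_env=40 total=280`, matching the 280 triples stated in the overview.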
Dataset
The dataset is hosted on Hugging Face: YichenG170/AudioLLMInstructionFollowing
(you may need to log in and accept the access conditions). It is loaded automatically by the
evaluation code, so no manual download is required.
How to Run
IFEval-Audio is integrated into AudioBench as the dataset audiollm_instructionfollowing
with the metric llama3_70b_judge_combined. From the repo root:
# Step 1 (separate process): serve the Llama-3-70B judge used to score correctness.
bash vllm_model_judge_llama_3_70b.sh
# Step 2: run inference + evaluation for your model.
GPU=0
BATCH_SIZE=1
OVERWRITE=True
NUMBER_OF_SAMPLES=-1 # -1 = all 280 triples
MODEL_NAME=Qwen2-Audio-7B-Instruct # any supported model
DATASET=audiollm_instructionfollowing
METRICS=llama3_70b_judge_combined
bash eval.sh $DATASET $MODEL_NAME $GPU $BATCH_SIZE $OVERWRITE $METRICS $NUMBER_OF_SAMPLES
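To evaluate more than one model, the Step 2 command can be wrapped in a small loop. The following is a minimal sketch, assuming the judge from Step 1 is already being served; the `run_eval` helper and the `DRY_RUN` switch are hypothetical additions, not part of AudioBench, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: run the IFEval-Audio evaluation for a list of models.
# run_eval and DRY_RUN are illustrative helpers, not AudioBench features.
set -euo pipefail

DATASET=audiollm_instructionfollowing
METRICS=llama3_70b_judge_combined
GPU=0
BATCH_SIZE=1
OVERWRITE=True
NUMBER_OF_SAMPLES=-1   # -1 = all 280 triples

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run_eval() {
  local model_name=$1
  # Same argument order as the eval.sh call in Step 2.
  local cmd=(bash eval.sh "$DATASET" "$model_name" "$GPU" "$BATCH_SIZE" \
             "$OVERWRITE" "$METRICS" "$NUMBER_OF_SAMPLES")
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Add further supported model names to this array as needed.
MODELS=(Qwen2-Audio-7B-Instruct)

for model in "${MODELS[@]}"; do
  run_eval "$model"
done
```

Run it from the repo root so that `eval.sh` resolves. Building the command as an array keeps each argument intact even if a value contains spaces.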
The judge reports the three metrics below, broken down by the six dimensions.
Evaluation Metrics
- IFR: Format adherence score (0/1).
- SCR: Semantic correctness score (0/1).
- OSR: Triples with IFR=1 and SCR=1.
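The way the three metrics relate can be sketched with a short aggregation script. It assumes per-triple 0/1 scores are available as tab-separated lines of the form `dimension<TAB>ifr<TAB>scr`, and that each reported value is the average over the triples in a dimension. Both the input layout and the `summarize_scores` function are hypothetical; this is not the format AudioBench itself writes.

```shell
#!/usr/bin/env bash
# Sketch: aggregate per-triple 0/1 scores into per-dimension rates.
# Assumed input (hypothetical layout): dimension<TAB>ifr<TAB>scr, one triple per line.
set -euo pipefail

summarize_scores() {
  awk -F'\t' '
    {
      n[$1]++                             # triples seen for this dimension
      ifr[$1] += $2                       # triples that followed the instruction
      scr[$1] += $3                       # triples that were semantically correct
      if ($2 == 1 && $3 == 1) osr[$1]++   # triples that satisfied both
    }
    END {
      for (d in n)
        printf "%s\tIFR=%.2f\tSCR=%.2f\tOSR=%.2f\n", d, ifr[d]/n[d], scr[d]/n[d], osr[d]/n[d]
    }
  ' | sort   # awk iteration order is unspecified, so sort for stable output
}

# Toy input with made-up scores, for illustration only.
printf 'Format\t1\t1\nFormat\t1\t0\nLength\t0\t1\nLength\t1\t1\n' | summarize_scores
```

On the toy input, Format comes out as IFR=1.00, SCR=0.50, OSR=0.50 and Length as IFR=0.50, SCR=1.00, OSR=0.50. OSR can never exceed either IFR or SCR, since a triple counts toward it only when both scores are 1.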
Citation
@article{gao2025ifevalaudio,
title={IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models},
author={Gao, Yiming and Wang, Bin and Wei, Chengwei and Sun, Shuo and Aw, AiTi},
journal={arXiv preprint arXiv:2505.16774},
year={2025}
}