README.md

December 29, 2025

YourBench: A Dynamic Benchmark Generation Framework

[GitHub] · [Dataset] · [Documentation] · [Paper]


Generate high-quality QA pairs and evaluation datasets from any source documents. YourBench transforms your PDFs, Word docs, and text files into structured benchmark datasets with configurable output formats. Appearing at COLM 2025. 100% free and open source.

Features

  • Document Ingestion – Parse PDFs, Word docs, HTML, and text files into standardized Markdown
  • Question Generation – Create single-hop and multi-hop questions with customizable schemas
  • Custom Output Schemas – Define your own Pydantic models for question/answer format
  • Multi-Model Support – Use different LLMs for different pipeline stages
  • HuggingFace Integration – Push datasets directly to the Hub or save locally
  • Quality Filtering – Citation scoring and deduplication built-in

Quick Start

Use uv to run the packaged CLI directly:

uvx --from yourbench yourbench run example/default_example/config.yaml --debug

The example config works out of the box once the environment variables listed in .env.template are set in a .env file.

Install locally if you prefer:

uv pip install yourbench
yourbench run example/default_example/config.yaml

Installation

Requires Python 3.12+.

# With uv (recommended)
uv pip install yourbench

# With pip
pip install yourbench

From source:

git clone https://github.com/huggingface/yourbench.git
cd yourbench
pip install -e .

Usage

Minimal config:

hf_configuration:
  hf_dataset_name: my-benchmark

model_list:
  - model_name: openai/gpt-4o-mini
    api_key: $OPENAI_API_KEY

pipeline:
  ingestion:
    source_documents_dir: ./my-documents
  summarization:
  chunking:
  single_hop_question_generation:
  prepare_lighteval:

yourbench run config.yaml
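
If you need the same minimal config for several document folders, it can be generated from a small template. The following is a sketch only: it reuses the keys from the minimal config above, the function name `render_minimal_config` is hypothetical and not part of YourBench, and it does no YAML quoting or escaping, so it assumes plain values without special characters.

```python
from textwrap import dedent


def render_minimal_config(
    dataset_name: str,
    model_name: str,
    api_key_var: str,
    documents_dir: str,
) -> str:
    """Fill the minimal config layout with caller-supplied values.

    Hypothetical helper: only the YAML keys come from the README;
    the function itself is not part of YourBench.
    """
    return dedent(f"""\
        hf_configuration:
          hf_dataset_name: {dataset_name}

        model_list:
          - model_name: {model_name}
            api_key: ${api_key_var}

        pipeline:
          ingestion:
            source_documents_dir: {documents_dir}
          summarization:
          chunking:
          single_hop_question_generation:
          prepare_lighteval:
        """)
```

Writing the returned string to config.yaml and running `yourbench validate config.yaml` is a reasonable way to confirm the generated file is accepted.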

With custom output schema:

pipeline:
  single_hop_question_generation:
    question_schema: ./my_schema.py  # Must export DataFormat class

# my_schema.py
from pydantic import BaseModel, Field

class DataFormat(BaseModel):
    question: str = Field(description="The question")
    answer: str = Field(description="The answer")
    difficulty: str = Field(description="easy, medium, or hard")
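
A schema like this can be exercised on its own before it is wired into a pipeline. The sketch below is an illustration of Pydantic behavior, not of YourBench internals: it narrows `difficulty` to a `Literal` so that unexpected values are rejected, and the helper `is_valid` is hypothetical. Whether YourBench's schema loader accepts `Literal` fields is an assumption that this README does not confirm.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class DataFormat(BaseModel):
    question: str = Field(description="The question")
    answer: str = Field(description="The answer")
    # Assumption: constraining the field with Literal instead of plain str.
    difficulty: Literal["easy", "medium", "hard"] = Field(
        description="easy, medium, or hard"
    )


def is_valid(record: dict) -> bool:
    """Return True if record satisfies DataFormat (hypothetical helper)."""
    try:
        DataFormat(**record)
    except ValidationError:
        return False
    return True
```

Checking a few hand-written records this way can catch a missing field or a misspelled difficulty label before any model calls are made.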

CLI Commands

YourBench provides several CLI commands:

Command                      Description
yourbench run <config>       Run the full pipeline
yourbench validate <config>  Check config without running
yourbench estimate <config>  Estimate token usage
yourbench init               Generate starter config interactively
yourbench stages             List available pipeline stages
yourbench version            Show version

See CLI Reference for full documentation.
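
The commands above can be chained so that a config is validated and its token usage estimated before the full pipeline runs. This is a minimal sketch: the subcommands are the ones listed in this README, but the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes each command exits with a non-zero status on failure, which is not stated here.

```python
import subprocess


def build_commands(config_path: str) -> list[list[str]]:
    """Return the CLI invocations in the order they should run."""
    return [
        ["yourbench", "validate", config_path],
        ["yourbench", "estimate", config_path],
        ["yourbench", "run", config_path],
    ]


def run_pipeline(config_path: str) -> int:
    """Run each step in order and stop at the first failing one.

    Assumption: a non-zero return code signals failure.
    """
    for cmd in build_commands(config_path):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


# Example (requires yourbench on PATH):
# exit_code = run_pipeline("config.yaml")
```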

Documentation

Guide                     Description
Configuration             Full config reference with all options
Custom Schemas            Define your own output formats
How It Works              Pipeline architecture and stages
CLI Reference             All CLI commands and options
FAQ                       Common questions and troubleshooting
OpenAI-Compatible Models  Use vLLM, Ollama, etc.
Dataset Columns           Output field descriptions
Academic Paper            COLM 2025 submission

Try Online

No installation needed:

Example Configs

The example/ folder contains ready-to-use configurations:

  • default_example/ – Basic setup with sample documents
  • harry_potter_quizz/ – Generate quiz questions from books
  • custom_prompts_demo/ – Custom prompts for domain-specific questions
  • local_vllm_private_data/ – Use local models for private data
  • rich_pdf_extraction_with_gemini/ – LLM-based PDF extraction for charts/figures

Run any example:

yourbench run example/default_example/config.yaml

API Keys

Set in environment or .env file:

HF_TOKEN=hf_xxx              # For Hub upload
OPENAI_API_KEY=sk-xxx        # For OpenAI models

Use $VAR_NAME in config to reference environment variables.
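
A quick pre-flight check can list which `$VAR_NAME` references in a config are not set in the current environment. This is a sketch under assumptions: it only recognizes the plain `$VAR_NAME` form shown in this README, the helper `missing_env_vars` is hypothetical, and YourBench's own substitution rules may differ (for example, it does not read a .env file).

```python
import os
import re

# Matches $NAME where NAME is a conventional environment variable identifier.
_VAR_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_env_vars(config_text: str) -> list[str]:
    """Return sorted names referenced as $VAR_NAME that are unset or empty."""
    names = sorted(set(_VAR_PATTERN.findall(config_text)))
    return [name for name in names if not os.environ.get(name)]
```

Reading config.yaml as text and passing it to this function before `yourbench run` gives an early, readable error instead of a failed API call partway through the pipeline.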

Contributing

PRs welcome! Open an issue first for major changes.

📜 License

Apache 2.0 – see LICENSE.

📚 Citation

@misc{shashidhar2025yourbencheasycustomevaluation,
      title={YourBench: Easy Custom Evaluation Sets for Everyone},
      author={Sumuk Shashidhar and Clémentine Fourrier and Alina Lozovskia and Thomas Wolf and Gokhan Tur and Dilek Hakkani-Tür},
      year={2025},
      eprint={2504.01833},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.01833}
}