YourBench CLI Reference

December 29, 2025

YourBench provides a rich command-line interface for generating evaluation datasets from your documents.

Installation

# Install with uv (recommended)
uv pip install yourbench

# Or run directly without installing
uvx --from yourbench yourbench --help

Commands Overview

Command   Description
run       Run the full pipeline with a config file
validate  Check a config file without running
estimate  Estimate token usage before running
init      Generate a starter config interactively
stages    List all available pipeline stages
version   Show YourBench version

yourbench run

Run the YourBench pipeline with a configuration file.

yourbench run <config_path> [OPTIONS]

Arguments:

  • config_path - Path to your YAML configuration file (required)

Options:

  • --debug, -d - Enable debug logging (shows detailed progress)
  • --quiet, -q - Minimal output (only errors)
  • --no-banner - Hide the startup banner

Examples:

# Basic run
yourbench run config.yaml

# With debug output
yourbench run config.yaml --debug

# Quiet mode for scripts
yourbench run config.yaml --quiet

Output:

  • Progress bars for each pipeline stage
  • Token usage statistics per stage
  • Final dataset location (Hub URL or local path)

yourbench validate

Validate a configuration file without running the pipeline. Useful for catching errors before a long run.

yourbench validate <config_path>

Arguments:

  • config_path - Path to YAML config file to validate (required)

Examples:

yourbench validate config.yaml

Output:

✓ Configuration is valid!

                             Configuration Summary                              
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Setting     ┃ Value                                                          ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Dataset     │ my-benchmark                                                   │
│ Push to Hub │ ✓                                                              │
│ Private     │ ✗                                                              │
│ Models      │ openai/gpt-4o-mini                                             │
│ Stages      │ ingestion, summarization, chunking, ...                        │
└─────────────┴────────────────────────────────────────────────────────────────┘

Enabled stages (5):
  1. ingestion
  2. summarization
  3. chunking
  4. single_hop_question_generation
  5. prepare_lighteval

Checks performed:

  • YAML syntax validity
  • Required fields present
  • Model configuration correct
  • Stage dependencies satisfied
  • Environment variables resolved

yourbench estimate

Estimate token usage for a pipeline run before executing it. Helps with cost planning.

yourbench estimate <config_path>

Arguments:

  • config_path - Path to YAML config file (required)

Examples:

yourbench estimate config.yaml

Output:

Source Documents:
  Files: 3
  Estimated tokens: 15.2K

                           Token Estimation by Stage                            
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Stage           ┃ Input Tokens ┃ Output Tokens ┃ API Calls ┃ Notes           ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Ingestion       │            - │             - │         - │ No LLM calls    │
│ Summarization   │         4.5K │          6.0K │         3 │                 │
│ Chunking        │            - │             - │         - │ No LLM calls    │
│ Single Hop QG   │        27.6K │          4.5K │         3 │                 │
└─────────────────┴──────────────┴───────────────┴───────────┴─────────────────┘

╭─────── Summary ────────╮
│ Total Estimated Usage: │
│   Input tokens:  32.1K │
│   Output tokens: 10.5K │
│   Total:         42.6K │
╰────────────────────────╯

Notes:

  • Estimates count tokens with tiktoken
  • Actual usage may vary based on model responses
  • Stages without LLM calls (ingestion, chunking) show "-"
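
Because the command is meant to help with cost planning, the sample totals above can be turned into a rough cost figure. The following is a minimal sketch and not part of YourBench: the helper names `total_tokens` and `estimate_cost` are hypothetical, and the per-million-token prices are placeholders, so substitute your provider's actual rates.

```python
# Per-stage token estimates copied from the sample output above.
STAGE_ESTIMATES = {
    "summarization": {"input": 4_500, "output": 6_000},
    "single_hop_question_generation": {"input": 27_600, "output": 4_500},
}


def total_tokens(estimates):
    """Sum input and output tokens across all stages."""
    total_in = sum(stage["input"] for stage in estimates.values())
    total_out = sum(stage["output"] for stage in estimates.values())
    return total_in, total_out


def estimate_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Convert token counts to US dollars, given prices per one million tokens."""
    return (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )


input_tokens, output_tokens = total_tokens(STAGE_ESTIMATES)
# Placeholder prices (hypothetical): 1.00 USD per 1M input tokens,
# 2.00 USD per 1M output tokens.
cost = estimate_cost(input_tokens, output_tokens, 1.00, 2.00)
print(f"{input_tokens} input + {output_tokens} output tokens -> ${cost:.4f}")
```

The summed totals match the Summary panel in the sample output (32.1K input, 10.5K output). Since actual usage varies with model responses, treat the resulting figure as a planning estimate rather than a bill.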

yourbench init

Generate a starter configuration file interactively.

yourbench init [OPTIONS]

Options:

  • --output, -o - Output file path (default: config.yaml)
  • --force, -f - Overwrite existing file without prompting

Examples:

# Create config.yaml in current directory
yourbench init

# Create with custom name
yourbench init -o my-project/config.yaml

# Overwrite existing
yourbench init -o config.yaml --force

Interactive prompts:

  1. Dataset name for HuggingFace Hub
  2. Model provider (OpenAI, HuggingFace, local vLLM, custom)
  3. Source documents directory
  4. Pipeline stages to enable
  5. Output preferences (Hub push, local save)

yourbench stages

Display all available pipeline stages with descriptions.

yourbench stages

Output:

                                Pipeline Stages                                 
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Stage                              ┃ Description                       ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ ingestion                          │ Process source documents          │
│ 2   │ summarization                      │ Generate summaries                │
│ 3   │ chunking                           │ Split into chunks                 │
│ 4   │ single_hop_question_generation     │ Generate standalone Q&A pairs     │
│ 5   │ multi_hop_question_generation      │ Multi-chunk questions             │
│ 6   │ cross_document_question_generation │ Cross-document questions          │
│ 7   │ question_rewriting                 │ Rewrite for clarity               │
│ 8   │ prepare_lighteval                  │ Format for LightEval              │
│ 9   │ citation_score_filtering           │ Filter by citation quality        │
└─────┴────────────────────────────────────┴───────────────────────────────────┘

Stage details:

Stage                               LLM Required  Description
ingestion                           No            Parse PDFs, Word docs, HTML into Markdown
summarization                       Yes           Generate document summaries
chunking                            No            Split documents into semantic chunks
single_hop_question_generation      Yes           Q&A pairs from individual chunks
multi_hop_question_generation       Yes           Questions requiring multiple chunks
cross_document_question_generation  Yes           Questions spanning documents
question_rewriting                  Yes           Improve question clarity
prepare_lighteval                   No            Format for evaluation framework
citation_score_filtering            No            Filter low-quality citations
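
The stage details above can be used to work out which stages in a given config will call a model. Here is a minimal sketch, assuming the LLM requirements are exactly as listed; the `LLM_REQUIRED` mapping and the `llm_stages` helper are illustrative names, not part of the YourBench API.

```python
# LLM requirement per stage, transcribed from the stage details table.
LLM_REQUIRED = {
    "ingestion": False,
    "summarization": True,
    "chunking": False,
    "single_hop_question_generation": True,
    "multi_hop_question_generation": True,
    "cross_document_question_generation": True,
    "question_rewriting": True,
    "prepare_lighteval": False,
    "citation_score_filtering": False,
}


def llm_stages(enabled):
    """Return the enabled stages that need a model, preserving their order."""
    unknown = [stage for stage in enabled if stage not in LLM_REQUIRED]
    if unknown:
        raise ValueError(f"Unknown stage(s): {', '.join(unknown)}")
    return [stage for stage in enabled if LLM_REQUIRED[stage]]


# The five stages enabled in the sample `yourbench validate` output.
enabled = [
    "ingestion",
    "summarization",
    "chunking",
    "single_hop_question_generation",
    "prepare_lighteval",
]
print(llm_stages(enabled))
```

For the sample configuration, the two stages returned are the same two that report token usage in the sample `yourbench estimate` output.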

yourbench version

Show the installed YourBench version.

yourbench version

Output:

YourBench v0.9.0

Environment Variables

The CLI reads the following environment variables, which can also be set in a .env file:

Variable         Description
HF_TOKEN         HuggingFace token for Hub operations
HF_ORGANIZATION  Default organization for dataset uploads
OPENAI_API_KEY   OpenAI API key
OPENAI_BASE_URL  Custom OpenAI-compatible endpoint
OPENAI_MODEL     Default model name

Use $VAR_NAME syntax in config files to reference environment variables:

model_list:
  - model_name: $OPENAI_MODEL
    api_key: $OPENAI_API_KEY
    base_url: $OPENAI_BASE_URL
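
Before running, it can help to confirm that every variable a config references is actually set. The sketch below is illustrative only: it scans config text for `$VAR_NAME` references with a regular expression and reports those that are missing or empty. It does not reproduce YourBench's own resolution logic, and `find_unset_vars` is a hypothetical helper name.

```python
import os
import re

# Matches $VAR_NAME references: a dollar sign followed by a shell-style identifier.
VAR_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def find_unset_vars(config_text, env=None):
    """Return referenced variable names that are missing or empty, in first-seen order."""
    env = os.environ if env is None else env
    missing = []
    for name in VAR_PATTERN.findall(config_text):
        if not env.get(name) and name not in missing:
            missing.append(name)
    return missing


config_text = """
model_list:
  - model_name: $OPENAI_MODEL
    api_key: $OPENAI_API_KEY
    base_url: $OPENAI_BASE_URL
"""

# Example environment with one variable deliberately left unset (placeholder values).
example_env = {"OPENAI_MODEL": "gpt-4o-mini", "OPENAI_API_KEY": "sk-placeholder"}
print(find_unset_vars(config_text, example_env))
```

Calling `find_unset_vars(config_text)` without the second argument checks against the real process environment instead of the example dictionary.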

Workflow Example

A typical workflow for generating a benchmark:

# 1. Generate starter config
yourbench init -o my-benchmark/config.yaml

# 2. Edit config as needed
vim my-benchmark/config.yaml

# 3. Validate before running
yourbench validate my-benchmark/config.yaml

# 4. Estimate token usage
yourbench estimate my-benchmark/config.yaml

# 5. Run the pipeline
yourbench run my-benchmark/config.yaml --debug
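
The validate, estimate, and run steps can also be scripted so that a failed validation stops the workflow before any tokens are spent. This is a minimal sketch that assumes the CLI exits with a non-zero status on failure, which this reference does not state; `workflow_commands` and `run_workflow` are hypothetical helper names.

```python
import subprocess


def workflow_commands(config_path, debug=False):
    """Build the validate, estimate, and run commands for one config file."""
    run_cmd = ["yourbench", "run", config_path]
    if debug:
        run_cmd.append("--debug")
    return [
        ["yourbench", "validate", config_path],
        ["yourbench", "estimate", config_path],
        run_cmd,
    ]


def run_workflow(config_path, debug=False, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Assumes (not confirmed by the docs) that yourbench signals failure
    through its exit status. Returns the list of commands that were executed.
    """
    executed = []
    for cmd in workflow_commands(config_path, debug=debug):
        executed.append(cmd)
        result = runner(cmd)
        if result.returncode != 0:
            break
    return executed
```

A call such as `run_workflow("my-benchmark/config.yaml", debug=True)` would execute the three commands in sequence. The `runner` parameter exists only so the function can be exercised with a stub instead of the real CLI.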

Troubleshooting

"Config validation failed"

  • Run yourbench validate config.yaml for detailed error messages
  • Check that all required environment variables are set

"No documents found"

  • Verify source_documents_dir path exists
  • Check file extensions are supported (.pdf, .md, .txt, .docx, .html)
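
A quick check of the source directory can confirm whether any supported files are present. The sketch below is an illustration under stated assumptions: it searches subdirectories recursively and matches extensions case-insensitively, which may differ from how the ingestion stage actually discovers files. `find_supported_documents` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions listed as supported in the troubleshooting note above.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".docx", ".html"}


def find_supported_documents(source_dir):
    """Return sorted paths of files under source_dir with a supported extension."""
    root = Path(source_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory does not exist: {root}")
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

An empty result from `find_supported_documents("path/to/docs")` suggests the directory holds no files with a supported extension, while a `FileNotFoundError` points to a wrong path.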

"API rate limit exceeded"

  • Reduce max_concurrent_requests in model config
  • Add delays between runs

"Token limit exceeded"

  • Use yourbench estimate to check token usage
  • Reduce chunk size or number of questions per chunk

See the FAQ for more troubleshooting tips.