YourBench CLI Reference
December 29, 2025 · View on GitHub
YourBench provides a rich command-line interface for generating evaluation datasets from your documents.
Installation
# Install with uv (recommended)
uv pip install yourbench
# Or run directly without installing
uvx --from yourbench yourbench --help
Commands Overview
| Command | Description |
|---|---|
run | Run the full pipeline with a config file |
validate | Check a config file without running |
estimate | Estimate token usage before running |
init | Generate a starter config interactively |
stages | List all available pipeline stages |
version | Show YourBench version |
yourbench run
Run the YourBench pipeline with a configuration file.
yourbench run <config_path> [OPTIONS]
Arguments:
config_path- Path to your YAML configuration file (required)
Options:
--debug, -d- Enable debug logging (shows detailed progress)--quiet, -q- Minimal output (only errors)--no-banner- Hide the startup banner
Examples:
# Basic run
yourbench run config.yaml
# With debug output
yourbench run config.yaml --debug
# Quiet mode for scripts
yourbench run config.yaml --quiet
Output:
- Progress bars for each pipeline stage
- Token usage statistics per stage
- Final dataset location (Hub URL or local path)
yourbench validate
Validate a configuration file without running the pipeline. Useful for catching errors before a long run.
yourbench validate <config_path>
Arguments:
config_path- Path to YAML config file to validate (required)
Examples:
yourbench validate config.yaml
Output:
✓ Configuration is valid!
Configuration Summary
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Setting ┃ Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Dataset │ my-benchmark │
│ Push to Hub │ ✓ │
│ Private │ ✗ │
│ Models │ openai/gpt-4o-mini │
│ Stages │ ingestion, summarization, chunking, ... │
└─────────────┴────────────────────────────────────────────────────────────────┘
Enabled stages (5):
1. ingestion
2. summarization
3. chunking
4. single_hop_question_generation
5. prepare_lighteval
Checks performed:
- YAML syntax validity
- Required fields present
- Model configuration correct
- Stage dependencies satisfied
- Environment variables resolved
yourbench estimate
Estimate token usage for a pipeline run before executing it. Helps with cost planning.
yourbench estimate <config_path>
Arguments:
config_path- Path to YAML config file (required)
Examples:
yourbench estimate config.yaml
Output:
Source Documents:
Files: 3
Estimated tokens: 15.2K
Token Estimation by Stage
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Stage ┃ Input Tokens ┃ Output Tokens ┃ API Calls ┃ Notes ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Ingestion │ - │ - │ - │ No LLM calls │
│ Summarization │ 4.5K │ 6.0K │ 3 │ │
│ Chunking │ - │ - │ - │ No LLM calls │
│ Single Hop QG │ 27.6K │ 4.5K │ 3 │ │
└─────────────────┴──────────────┴───────────────┴───────────┴─────────────────┘
╭─────── Summary ────────╮
│ Total Estimated Usage: │
│ Input tokens: 32.1K │
│ Output tokens: 10.5K │
│ Total: 42.6K │
╰────────────────────────╯
Notes:
- Estimates use tiktoken for accurate token counting
- Actual usage may vary based on model responses
- Stages without LLM calls (ingestion, chunking) show "-"
yourbench init
Generate a starter configuration file interactively.
yourbench init [OPTIONS]
Options:
--output, -o- Output file path (default:config.yaml)--force, -f- Overwrite existing file without prompting
Examples:
# Create config.yaml in current directory
yourbench init
# Create with custom name
yourbench init -o my-project/config.yaml
# Overwrite existing
yourbench init -o config.yaml --force
Interactive prompts:
- Dataset name for HuggingFace Hub
- Model provider (OpenAI, HuggingFace, local vLLM, custom)
- Source documents directory
- Pipeline stages to enable
- Output preferences (Hub push, local save)
yourbench stages
Display all available pipeline stages with descriptions.
yourbench stages
Output:
Pipeline Stages
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ # ┃ Stage ┃ Description ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ ingestion │ Process source documents │
│ 2 │ summarization │ Generate summaries │
│ 3 │ chunking │ Split into chunks │
│ 4 │ single_hop_question_generation │ Generate standalone Q&A pairs │
│ 5 │ multi_hop_question_generation │ Multi-chunk questions │
│ 6 │ cross_document_question_generation │ Cross-document questions │
│ 7 │ question_rewriting │ Rewrite for clarity │
│ 8 │ prepare_lighteval │ Format for LightEval │
│ 9 │ citation_score_filtering │ Filter by citation quality │
└─────┴────────────────────────────────────┴───────────────────────────────────┘
Stage details:
| Stage | LLM Required | Description |
|---|---|---|
ingestion | No | Parse PDFs, Word docs, HTML into Markdown |
summarization | Yes | Generate document summaries |
chunking | No | Split documents into semantic chunks |
single_hop_question_generation | Yes | Q&A pairs from individual chunks |
multi_hop_question_generation | Yes | Questions requiring multiple chunks |
cross_document_question_generation | Yes | Questions spanning documents |
question_rewriting | Yes | Improve question clarity |
prepare_lighteval | No | Format for evaluation framework |
citation_score_filtering | No | Filter low-quality citations |
yourbench version
Show the installed YourBench version.
yourbench version
Output:
YourBench v0.9.0
Environment Variables
The CLI respects these environment variables (can also be set in .env):
| Variable | Description |
|---|---|
HF_TOKEN | HuggingFace token for Hub operations |
HF_ORGANIZATION | Default organization for dataset uploads |
OPENAI_API_KEY | OpenAI API key |
OPENAI_BASE_URL | Custom OpenAI-compatible endpoint |
OPENAI_MODEL | Default model name |
Use $VAR_NAME syntax in config files to reference environment variables:
model_list:
- model_name: $OPENAI_MODEL
api_key: $OPENAI_API_KEY
base_url: $OPENAI_BASE_URL
Workflow Example
Typical workflow for generating a benchmark:
# 1. Generate starter config
yourbench init -o my-benchmark/config.yaml
# 2. Edit config as needed
vim my-benchmark/config.yaml
# 3. Validate before running
yourbench validate my-benchmark/config.yaml
# 4. Estimate costs
yourbench estimate my-benchmark/config.yaml
# 5. Run the pipeline
yourbench run my-benchmark/config.yaml --debug
Troubleshooting
"Config validation failed"
- Run
yourbench validate config.yamlfor detailed error messages - Check that all required environment variables are set
"No documents found"
- Verify
source_documents_dirpath exists - Check file extensions are supported (.pdf, .md, .txt, .docx, .html)
"API rate limit exceeded"
- Reduce
max_concurrent_requestsin model config - Add delays between runs
"Token limit exceeded"
- Use
yourbench estimateto check token usage - Reduce chunk size or number of questions per chunk
See FAQ for more troubleshooting tips.