CLI Reference

March 6, 2026

AgentAssay provides a command-line interface with six commands.

agentassay --help
agentassay --version

agentassay run

Run agent assay trials against test scenarios.

agentassay run [OPTIONS]

Options

| Option | Short | Description |
| --- | --- | --- |
| --config PATH | -c | YAML config file with AssayConfig parameters |
| --scenario PATH | -s | YAML scenario file with TestScenario definition |
| --n INTEGER | -n | Number of trials (overrides config file) |
| --output PATH | -o | Output JSON file for results |

Examples

# Validate config and scenario files
agentassay run --config assay.yaml --scenario qa.yaml

# Override trial count
agentassay run -c config.yaml -s scenario.yaml -n 50

# Save validated config to JSON
agentassay run -c config.yaml -s scenario.yaml -o validated.json

Notes

The run command validates configuration files and displays parameters. For full trial execution with real agents, use the Python API (TrialRunner). The CLI is designed for configuration validation, result analysis, and reporting.


agentassay compare

Compare baseline vs. current results for regression detection.

agentassay compare [OPTIONS]

Options

| Option | Short | Required | Description |
| --- | --- | --- | --- |
| --baseline PATH | -b | Yes | JSON file with baseline trial results |
| --current PATH | -c | Yes | JSON file with current trial results |
| --alpha FLOAT | -a | No | Significance level (default: 0.05) |
| --output PATH | -o | No | Output JSON file for comparison results |

Input Format

The JSON files must contain trial results in one of these formats:

[{"passed": true}, {"passed": false}, {"passed": true}]

or:

{"results": [{"passed": true}, {"passed": false}]}

or:

{"trials": [{"success": true}, {"success": false}]}
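The three layouts above carry the same information: one boolean outcome per trial. The following is a minimal sketch of a reader that accepts all three. It is not AgentAssay's own loader; the function name `load_outcomes` is hypothetical, and treating `passed` and `success` as interchangeable is an assumption based only on the examples shown.

```python
import json


def load_outcomes(text):
    """Return a list of booleans from any of the three documented layouts."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Wrapped layouts: {"results": [...]} or {"trials": [...]}
        for key in ("results", "trials"):
            if key in data:
                data = data[key]
                break
        else:
            raise ValueError("expected a 'results' or 'trials' key")
    outcomes = []
    for item in data:
        # Assumption: "passed" and "success" are equivalent per-trial flags.
        if "passed" in item:
            outcomes.append(bool(item["passed"]))
        elif "success" in item:
            outcomes.append(bool(item["success"]))
        else:
            raise ValueError("trial has neither 'passed' nor 'success'")
    return outcomes


print(load_outcomes('{"trials": [{"success": true}, {"success": false}]}'))
```

A helper like this can be handy for checking that a results file is well-formed before passing it to the CLI.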

Examples

# Basic regression comparison
agentassay compare --baseline v1.json --current v2.json

# Stricter significance level
agentassay compare -b baseline.json -c current.json --alpha 0.01

# Save comparison results
agentassay compare -b v1.json -c v2.json -o comparison.json

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | No regression detected |
| 1 | Regression detected |
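Because the exit code encodes the verdict, the command can gate a CI job or a release script. The sketch below wraps the call from Python using only the flags documented above. The helper names `compare_command` and `regression_detected` are hypothetical, and treating any exit code other than 0 or 1 as an error is an assumption, since no other codes are documented here.

```python
import subprocess


def compare_command(baseline, current, alpha=None):
    """Build the argv list for `agentassay compare` from documented flags."""
    argv = ["agentassay", "compare", "--baseline", baseline, "--current", current]
    if alpha is not None:
        argv += ["--alpha", str(alpha)]
    return argv


def regression_detected(baseline, current, alpha=None, run=subprocess.run):
    """Run the comparison and map the documented exit codes to a boolean."""
    code = run(compare_command(baseline, current, alpha)).returncode
    if code == 0:
        return False
    if code == 1:
        return True
    # Assumption: codes other than 0 and 1 indicate a failure to compare.
    raise RuntimeError(f"unexpected exit code {code}")
```

The `run` parameter exists only so that the subprocess call can be replaced by a stub in tests; in normal use it can be left at its default.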

Output

Displays a table with:

  • Baseline and current trial counts, pass counts, and pass rates
  • 95% confidence intervals for both versions
  • Hypothesis test result (test name, statistic, p-value, effect size)
  • Verdict: REGRESSION DETECTED or NO REGRESSION
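To make the quantities in that table concrete, here is a rough sketch of how a pass-rate confidence interval and a regression test can be computed from pass counts. This page does not state which interval or test AgentAssay uses, so the Wilson score interval and the pooled one-sided two-proportion z-test below are stand-ins chosen for illustration, and both function names are hypothetical.

```python
import math


def wilson_interval(passes, n, z=1.959964):
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def one_sided_z_test(base_pass, base_n, cur_pass, cur_n):
    """Pooled two-proportion z-test; H1: current pass rate < baseline."""
    p1, p2 = base_pass / base_n, cur_pass / cur_n
    pooled = (base_pass + cur_pass) / (base_n + cur_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / cur_n))
    if se == 0:
        return 0.0, 1.0  # both rates are 0 or both are 1: no evidence of a drop
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail, P(Z >= z)
    return z, p_value


z, p = one_sided_z_test(45, 50, 30, 50)
print("REGRESSION DETECTED" if p < 0.05 else "NO REGRESSION")
```

Under these assumptions, a p-value below the chosen `--alpha` corresponds to the REGRESSION DETECTED verdict.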

agentassay mutate

Run mutation testing on an agent to evaluate test suite sensitivity.

agentassay mutate [OPTIONS]

Options

| Option | Short | Description |
| --- | --- | --- |
| --config PATH | -c | YAML config file with AgentConfig parameters |
| --scenario PATH | -s | YAML scenario file with TestScenario definition |
| --operators TEXT | | Comma-separated operator categories: prompt, tool, model, context |
| --output PATH | -o | Output JSON file for mutation results |

Operator Categories

| Category | Operators |
| --- | --- |
| prompt | Synonym substitution, instruction order, noise injection, instruction drop |
| tool | Tool removal, tool reorder, tool noise |
| model | Model swap, model version |
| context | Context truncation, context noise, context permutation |
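The value passed to --operators is a comma-separated list drawn from the four categories above. As a sketch of how such a value might be validated before invoking the CLI, the hypothetical helper below splits and checks it. Treating an omitted value as "all categories" is an assumption, and the CLI's own parsing rules (case handling, whitespace, duplicates) may differ.

```python
CATEGORIES = ("prompt", "tool", "model", "context")


def parse_operators(value=None):
    """Split a comma-separated --operators value and validate each category."""
    if value is None:
        # Assumption: omitting --operators selects every category.
        return list(CATEGORIES)
    selected = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue  # ignore empty items such as a trailing comma
        if name not in CATEGORIES:
            raise ValueError(f"unknown operator category: {name!r}")
        if name not in selected:
            selected.append(name)  # keep first occurrence, drop duplicates
    return selected


print(parse_operators("prompt,tool"))
```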

Examples

# List all mutation operators
agentassay mutate --config agent.yaml --scenario qa.yaml

# Only prompt and tool mutations
agentassay mutate -c agent.yaml -s qa.yaml --operators prompt,tool

# Save operator list
agentassay mutate -c agent.yaml -s qa.yaml -o mutations.json

agentassay coverage

Compute and display the five-dimensional coverage metrics from trial results.

agentassay coverage [OPTIONS]

Options

| Option | Short | Required | Description |
| --- | --- | --- | --- |
| --results PATH | -r | Yes | JSON file with trial results containing execution traces |
| --tools TEXT | | No | Comma-separated list of known tool names |
| --models TEXT | | No | Comma-separated list of known model names |

Examples

# Basic coverage analysis
agentassay coverage --results trials.json

# With known tools for accurate tool coverage
agentassay coverage -r trials.json --tools search,calculate,write_file

# With known models
agentassay coverage -r trials.json --tools search --models gpt-4o,claude-opus-4-6

Output

Displays:

  • Five-dimensional coverage vector with visual bars
  • Overall score (geometric mean)
  • Weakest dimension identification
  • Summary statistics (traces analyzed, tools observed, unique paths)

Coverage Interpretation

| Score | Status |
| --- | --- |
| >= 80% | GOOD |
| 50-79% | MODERATE |
| < 50% | LOW |
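The overall score is described as a geometric mean of the five dimensions, which means a single weak dimension pulls the total down more than it would under an arithmetic mean. The sketch below illustrates that calculation and the status thresholds. The dimension names and values are hypothetical placeholders, since this page does not name the five dimensions, and scores between 79% and 80% are assumed to fall under MODERATE.

```python
import math


def overall_score(dimensions):
    """Geometric mean of per-dimension coverage values in [0, 1]."""
    values = list(dimensions.values())
    if not values or any(v <= 0 for v in values):
        return 0.0  # a zero in any dimension drives the geometric mean to zero
    return math.exp(sum(math.log(v) for v in values) / len(values))


def status(score):
    """Map a score in [0, 1] to the documented status labels."""
    if score >= 0.80:
        return "GOOD"
    if score >= 0.50:
        return "MODERATE"
    return "LOW"


# Hypothetical dimension names and values, for illustration only.
coverage = {"tool": 0.9, "path": 0.6, "state": 0.8, "boundary": 0.5, "model": 1.0}
score = overall_score(coverage)
weakest = min(coverage, key=coverage.get)
print(f"{score:.1%} {status(score)}, weakest: {weakest}")
```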

agentassay report

Generate a self-contained HTML report from trial results.

agentassay report [OPTIONS]

Options

| Option | Short | Required | Description |
| --- | --- | --- | --- |
| --results PATH | -r | Yes | JSON file with trial results |
| --output PATH | -o | No | Output HTML file path (default: agentassay-report.html) |

Examples

# Generate with default output filename
agentassay report --results trials.json

# Custom output path
agentassay report -r results.json -o reports/latest.html

Report Contents

The generated HTML report includes:

  • Overall verdict (PASS / FAIL / INCONCLUSIVE / NO DATA)
  • Summary table: total trials, passed, failed, pass rate, confidence interval
  • Visual pass rate bar
  • Methodology section: framework version, CI method, regression test, verdict semantics

The report is self-contained (no external CSS or JavaScript dependencies) and can be opened in any browser.


agentassay dashboard

Launch the interactive dashboard for visualizing test results, trends, and behavioral fingerprints.

agentassay dashboard [OPTIONS]

Options

| Option | Short | Description |
| --- | --- | --- |
| --port INTEGER | -p | Port to run dashboard on (default: 8501) |
| --host TEXT | | Host to bind to (default: localhost) |
| --no-browser | | Don't auto-open browser |
| --theme TEXT | | Color theme: dark, light (default: dark) |

Prerequisites

Install with dashboard extras:

pip install agentassay[dashboard]

Examples

# Launch with defaults (opens browser automatically)
agentassay dashboard

# Custom port, no auto-open
agentassay dashboard --port 9000 --no-browser

# Light theme
agentassay dashboard --theme light

# Bind to all interfaces (for remote access)
agentassay dashboard --host 0.0.0.0

Notes

The dashboard reads from the local results database. Run the agentassay run command at least once so that there is data to visualize. See the Dashboard Guide for full documentation.