The AIPerf plugin system provides a flexible, extensible architecture for customizing benchmark behavior. It uses YAML-based configuration with lazy loading, priority-based conflict resolution, and dynamic enum generation.
The plugin system enables:
- Extensibility: Add custom endpoints, exporters, and timing strategies without modifying core code
- Lazy Loading: Classes load on first access, avoiding circular imports
- Conflict Resolution: Higher priority plugins override lower priority ones
- Type Safety: Auto-generated enums provide IDE autocomplete
- Validation: Validate plugins without importing them
| Term | Description | Code Type |
|---|
| Registry | Global singleton holding all plugins | _PluginRegistry |
| Package | Python package providing plugins | PackageInfo |
| Manifest | plugins.yaml declaring plugins | PluginsManifest |
| Category | Plugin type (e.g., endpoint, transport) | PluginType enum |
| Entry | Single registered plugin (name, class_path, priority, metadata) | PluginEntry |
| Class | Python class implementing a plugin (lazy-loaded) | type |
| Metadata | Typed configuration (e.g., EndpointMetadata) | Pydantic model |
Hierarchy:
Registry (singleton)
└── Package (1+) ─── discovered via entry points
└── Manifest (1+ per package) ─── plugins.yaml files
└── Category (1+)
└── Entry (1+) ─── PluginEntry
├── Class ─── lazy-loaded Python class
└── Metadata ─── optional typed config
| Component | File | Purpose |
|---|
| Plugin Registry | src/aiperf/plugin/plugins.py | Singleton managing discovery and loading |
| Plugin Entry | src/aiperf/plugin/types.py | Lazy-loading entry with metadata |
| Categories | src/aiperf/plugin/categories.yaml | Category definitions with protocols |
| Built-in Plugins | src/aiperf/plugin/plugins.yaml | Built-in plugin registrations |
| Schemas | src/aiperf/plugin/schema/schemas.py | Pydantic models for validation |
| Enums | src/aiperf/plugin/enums.py | Auto-generated enums from registry |
| CLI | src/aiperf/cli_commands/plugins.py | Plugin exploration commands |
Entry Points → plugins.yaml → Pydantic Validation → Registry
↓
get_class() → Import Module → Cache
| Phase | Action |
|---|
| 1. Discovery | Scan aiperf.plugins entry points for plugins.yaml files |
| 2. Loading | Parse YAML, validate with Pydantic, register with conflict resolution |
| 3. Access | get_class() imports module, caches class for reuse |
The plugin registry follows the singleton pattern with module-level exports:
from aiperf.plugin import plugins
from aiperf.plugin.enums import PluginType
# Get a plugin class by name
EndpointClass = plugins.get_class(PluginType.ENDPOINT, "chat")
# Iterate all plugins in a category
for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
print(f"{entry.name}: {entry.description}")
AIPerf supports 33 plugin categories organized by function, including api_router and public_dataset_loader:
| Category | Enum | Description |
|---|
timing_strategy | TimingMode | Request scheduling strategies (fixed schedule, request rate, user-centric) |
arrival_pattern | ArrivalPattern | Inter-arrival time distributions (constant, Poisson, gamma, concurrency burst) |
ramp | RampType | Value ramping strategies (linear, exponential, Poisson) |
| Category | Enum | Description |
|---|
dataset_backing_store | DatasetBackingStoreType | Server-side dataset storage |
dataset_client_store | DatasetClientStoreType | Worker-side dataset access |
dataset_sampler | DatasetSamplingStrategy | Sampling strategies (random, sequential, shuffle) |
dataset_composer | ComposerType | Dataset generation (synthetic, custom, rankings) |
custom_dataset_loader | CustomDatasetType | JSONL format loaders |
public_dataset_loader | PublicDatasetType | Shared benchmark dataset fetchers (HTTP, HuggingFace) |
| Category | Enum | Description |
|---|
api_router | APIRouterType | Lifecycle-managed HTTP/WebSocket routers exposed via BaseRouter |
endpoint | EndpointType | API endpoint implementations (chat, completions, embeddings, etc.) |
transport | TransportType | Network transport (HTTP via aiohttp) |
| Category | Enum | Description |
|---|
record_processor | RecordProcessorType | Per-record metric computation |
results_processor | ResultsProcessorType | Aggregated results computation |
gpu_telemetry_processor | GPUTelemetryProcessorType | Side-channel GPU telemetry aggregation/export within GPUTelemetryManager |
server_metrics_processor | ServerMetricsProcessorType | Side-channel Prometheus server metrics aggregation/export within ServerMetricsManager |
data_exporter | DataExporterType | File format exporters (CSV, JSON, Parquet) |
console_exporter | ConsoleExporterType | Terminal output exporters |
| Category | Enum | Description |
|---|
accuracy_benchmark | AccuracyBenchmarkType | Accuracy benchmark problem sets (MMLU, AIME, HellaSwag, BigBench, etc.) |
accuracy_grader | AccuracyGraderType | Grading strategies for accuracy evaluation (exact match, math, multiple choice, code execution) |
| Category | Enum | Description |
|---|
ui | UIType | UI implementations (dashboard, simple, none) |
url_selection_strategy | URLSelectionStrategy | Request distribution (round-robin) |
| Category | Enum | Description |
|---|
service | ServiceType | Core AIPerf services |
service_manager | ServiceRunType | Service orchestration. The built-in multiprocessing service-manager plugin is registered; Kubernetes execution is referenced by future-facing code paths but is not a registered service-manager plugin in this checkout. |
| Category | Enum | Description |
|---|
plot | PlotType | Chart types (scatter, histogram, timeline, etc.) |
gpu_telemetry_collector | GPUTelemetryCollectorType | GPU metric collection (DCGM, pynvml) |
| Category | Enum | Description |
|---|
communication | CommunicationBackend | ZMQ backends (IPC, TCP, dual-bind) |
communication_client | CommClientType | Socket patterns (PUB, SUB, PUSH, PULL) |
zmq_proxy | ZMQProxyType | Message routing proxies |
| Category | Enum | Description |
|---|
search_recipe | SearchRecipeType | Named presets that compile to AdaptiveSearchSweep or grid sweep parameters; selected via --search-recipe |
search_recipe_post_process | SearchRecipePostProcessType | Stateless handlers emitting derived artifacts (curves, knee points) into sweep_aggregate/ after SweepAnalyzer.compute() |
convergence_criterion | ConvergenceCriterionType | Decides when metrics have stabilized across repeated runs; selected via --convergence-mode |
search_planner | SearchPlannerType | Drives the adaptive outer loop via ask()/tell(); selected via --search-planner |
from aiperf.plugin import plugins
from aiperf.plugin.enums import PluginType, EndpointType
# Get class by name, enum, or full path
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "chat")
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, EndpointType.CHAT)
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "aiperf.endpoints.openai_chat:ChatEndpoint")
# Iterate plugins
for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
print(f"{entry.name}: {entry.class_path}")
# Get metadata (raw dict or typed)
metadata = plugins.get_metadata("endpoint", "chat")
endpoint_meta = plugins.get_endpoint_metadata("chat") # Returns EndpointMetadata
| Function | Returns | Use Case |
|---|
get_class(category, name) | type | Get plugin class |
iter_all(category) | Iterator[tuple[PluginEntry, type]] | List all plugins |
get_metadata(category, name) | dict | Raw metadata |
get_endpoint_metadata(name) | EndpointMetadata | Typed endpoint config |
get_transport_metadata(name) | TransportMetadata | Typed transport config |
get_plot_metadata(name) | PlotMetadata | Typed plot config |
get_service_metadata(name) | ServiceMetadata | Typed service config |
get_gpu_telemetry_collector_metadata(name) | GPUTelemetryCollectorMetadata | Typed GPU collector config |
Tip
Contributing directly to AIPerf? You only need two things:
- Add your class under
src/aiperf/
- Register it in
src/aiperf/plugin/plugins.yaml
The pyproject.toml entry points and separate package install below are only needed for external/third-party plugins.
Quick Start (4 steps):
| Step | File | Action |
|---|
| 1 | my_endpoint.py | Create class extending BaseEndpoint |
| 2 | plugins.yaml | Register with class path, description, and metadata |
| 3 | pyproject.toml | Add entry point: my-package = "my_package:plugins.yaml" |
| 4 | Terminal | uv pip install -e . && aiperf plugins endpoint my_custom |
# my_package/endpoints/custom_endpoint.py
class MyCustomEndpoint(BaseEndpoint):
def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
turn = request_info.turns[-1]
texts = [content for text in turn.texts for content in text.contents if content]
return {"prompt": texts[0] if texts else ""}
def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
if json_obj := response.get_json():
return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=json_obj.get("text", "")))
return None
# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
# my_package/plugins.yaml
schema_version: "1.0"
endpoint:
my_custom:
class: my_package.endpoints.custom_endpoint:MyCustomEndpoint
description: Custom endpoint for my API.
metadata: { endpoint_path: /v1/generate, supports_streaming: true, produces_tokens: true, tokenizes_input: true, metrics_title: My Custom Metrics }
Note
Extend base classes (BaseEndpoint, etc.) to get logging, helpers, and default implementations. Only implement core methods.
Defines plugin categories with their protocols and metadata schemas:
# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/categories.schema.json
schema_version: "1.0"
endpoint:
protocol: aiperf.endpoints.protocols:EndpointProtocol
metadata_class: aiperf.plugin.schema.schemas:EndpointMetadata
enum: EndpointType
description: |
Endpoints define how to format requests and parse responses for different APIs.
internal: false # Set to true for infrastructure categories
Registers plugin implementations:
# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
schema_version: "1.0"
endpoint:
chat:
class: aiperf.endpoints.openai_chat:ChatEndpoint
description: OpenAI Chat Completions endpoint.
priority: 0 # Higher priority wins conflicts
metadata:
endpoint_path: /v1/chat/completions
supports_streaming: true
produces_tokens: true
tokenizes_input: true
metrics_title: LLM Metrics
Category-specific metadata is validated against Pydantic models in aiperf.plugin.schema.schemas:
| Model | Key Fields |
|---|
EndpointMetadata | endpoint_path, supports_streaming, produces_tokens, tokenizes_input, metrics_title + optional streaming/service/multimodal/polling fields |
TransportMetadata | transport_type, url_schemes |
PlotMetadata | display_name, category |
ServiceMetadata | required, auto_start, disable_gc, replicable |
GPUTelemetryCollectorMetadata | is_local |
| Command | Output |
|---|
aiperf plugins | Installed packages with versions and plugin counts |
aiperf plugins --all | All categories with registered plugins |
aiperf plugins endpoint | All endpoint types with descriptions |
aiperf plugins endpoint chat | Details: class path, package, metadata |
aiperf plugins --validate | Validates class paths and existence |
$ aiperf plugins endpoint chat
╭───────────────── endpoint:chat ─────────────────╮
│ Type: chat │
│ Category: endpoint │
│ Package: aiperf │
│ Class: aiperf.endpoints.openai_chat:ChatEndpoint│
│ │
│ OpenAI Chat Completions endpoint. Supports │
│ multi-modal inputs and streaming responses. │
╰─────────────────────────────────────────────────╯
| Priority | Rule |
|---|
| 1 | Higher priority value wins |
| 2 | External packages beat built-in (equal priority) |
| 3 | First registered wins (with warning) |
Tip
Shadowed plugins remain accessible via full class path: plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")
# Runtime registration (testing)
plugins.register("endpoint", "test", TestEndpoint, priority=10)
plugins.reset_registry() # Reset to initial state
# Dynamic enum generation
MyEndpointType = plugins.create_enum(PluginType.ENDPOINT, "MyEndpointType", module=__name__)
# Validation without importing
errors = plugins.validate_all(check_class=True) # {category: [(name, error), ...]}
# Reverse lookup
name = plugins.find_registered_name(PluginType.ENDPOINT, ChatEndpoint) # "chat"
# Package metadata
pkg = plugins.get_package_metadata("aiperf") # PackageInfo(version, author, ...)
Type Safety: get_class() returns typed results (e.g., type[EndpointProtocol]) with IDE autocomplete.
| Name | Class | Description |
|---|
chat | ChatEndpoint | OpenAI Chat Completions API |
chat_embeddings | ChatEmbeddingsEndpoint | vLLM multimodal embeddings via chat API |
completions | CompletionsEndpoint | OpenAI Completions API |
cohere_rankings | CohereRankingsEndpoint | Cohere Reranking API |
embeddings | EmbeddingsEndpoint | OpenAI Embeddings API |
hf_tei_rankings | HFTeiRankingsEndpoint | HuggingFace TEI Rankings |
huggingface_generate | HuggingFaceGenerateEndpoint | HuggingFace TGI |
image_edit | ImageEditEndpoint | OpenAI Image Edit (image-to-image) API; multipart upload of reference image + prompt to /v1/images/edits. Compatible with SGLang FLUX.2 unified diffusion serving. |
image_generation | ImageGenerationEndpoint | OpenAI Image Generation API |
image_retrieval | ImageRetrievalEndpoint | Image retrieval API |
nim_embeddings | NIMEmbeddingsEndpoint | NVIDIA NIM Embeddings |
nim_rankings | NIMRankingsEndpoint | NVIDIA NIM Rankings |
responses | ResponsesEndpoint | OpenAI Responses API |
solido_rag | SolidoEndpoint | Solido RAG Pipeline |
template | TemplateEndpoint | Template for custom endpoints |
video_generation | VideoGenerationEndpoint | Text-to-video generation API |
| Name | Class | Description |
|---|
fixed_schedule | FixedScheduleStrategy | Send requests at exact timestamps |
request_rate | RequestRateStrategy | Send requests at specified rate |
user_centric_rate | UserCentricStrategy | Each session acts as separate user |
| Name | Class | Description |
|---|
constant | ConstantIntervalGenerator | Fixed intervals between requests |
poisson | PoissonIntervalGenerator | Poisson process arrivals |
gamma | GammaIntervalGenerator | Gamma distribution with tunable smoothness |
concurrency_burst | ConcurrencyBurstIntervalGenerator | Send ASAP up to concurrency limit |
| Name | Class | Description |
|---|
synthetic | SyntheticDatasetComposer | Generate synthetic conversations |
custom | CustomDatasetComposer | Load from JSONL files |
synthetic_rankings | SyntheticRankingsDatasetComposer | Generate ranking tasks |
| Name | Class | Description |
|---|
dashboard | AIPerfDashboardUI | Rich terminal dashboard |
simple | TQDMProgressUI | Simple tqdm progress bar |
none | NoUI | Headless execution |
| Name | Class | Description |
|---|
mmlu | MMLUBenchmark | Massive Multitask Language Understanding |
aime | AIMEBenchmark | American Invitational Mathematics Examination |
aime24 | AIME24Benchmark | AIME 2024 competition problems |
aime25 | AIME25Benchmark | AIME 2025 competition problems |
hellaswag | HellaSwagBenchmark | HellaSwag commonsense reasoning |
bigbench | BigBenchBenchmark | BIG-Bench benchmark tasks |
math_500 | Math500Benchmark | MATH-500 problem set |
gpqa_diamond | GPQADiamondBenchmark | GPQA Diamond graduate-level science |
lcb_codegeneration | LCBCodeGenerationBenchmark | LiveCodeBench code generation |
| Name | Class | Description |
|---|
exact_match | ExactMatchGrader | Exact string matching |
math | MathGrader | Mathematical expression evaluation |
multiple_choice | MultipleChoiceGrader | Multiple choice answer extraction |
code_execution | CodeExecutionGrader | Code execution and output comparison |
TypeNotFoundError: Type 'my_plugin' not found for category 'endpoint'.
Solutions:
- Verify the plugin is registered in
plugins.yaml
- Check the entry point is defined in
pyproject.toml
- Reinstall the package in the active environment:
uv pip install -e .
- Run
aiperf plugins --validate to check for errors
ImportError: Failed to import module for endpoint:my_plugin
Solutions:
- Verify the class path format:
module.path:ClassName
- Check all dependencies are installed
- Verify the module is importable:
python -c "import module.path"
AttributeError: Class 'MyClass' not found
Solutions:
- Verify the class name matches exactly (case-sensitive)
- Ensure the class is exported from the module
- Run
aiperf plugins --validate for detailed error
If your plugin is being shadowed by another:
- Use higher priority:
priority: 10 in plugins.yaml
- Access by full class path:
plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")
- Check
aiperf plugins to see which packages are loaded