AIPerf Plugin System

May 15, 2026 · View on GitHub

The AIPerf plugin system provides a flexible, extensible architecture for customizing benchmark behavior. It uses YAML-based configuration with lazy loading, priority-based conflict resolution, and dynamic enum generation.

Overview
- Terminology
- Key Components
Architecture
Plugin Categories
Using Plugins
Creating Custom Plugins
Plugin Configuration
CLI Commands
Advanced Topics

Overview

The plugin system enables:

Extensibility: Add custom endpoints, exporters, and timing strategies without modifying core code
Lazy Loading: Classes load on first access, avoiding circular imports
Conflict Resolution: Higher priority plugins override lower priority ones
Type Safety: Auto-generated enums provide IDE autocomplete
Validation: Validate plugins without importing them

Terminology

Term	Description	Code Type
Registry	Global singleton holding all plugins	`_PluginRegistry`
Package	Python package providing plugins	`PackageInfo`
Manifest	`plugins.yaml` declaring plugins	`PluginsManifest`
Category	Plugin type (e.g., `endpoint`, `transport`)	`PluginType` enum
Entry	Single registered plugin (name, class_path, priority, metadata)	`PluginEntry`
Class	Python class implementing a plugin (lazy-loaded)	`type`
Metadata	Typed configuration (e.g., `EndpointMetadata`)	Pydantic model

Hierarchy:

Registry (singleton)
└── Package (1+) ─── discovered via entry points
    └── Manifest (1+ per package) ─── plugins.yaml files
        └── Category (1+)
            └── Entry (1+) ─── PluginEntry
                ├── Class ─── lazy-loaded Python class
                └── Metadata ─── optional typed config

Key Components

Component	File	Purpose
Plugin Registry	`src/aiperf/plugin/plugins.py`	Singleton managing discovery and loading
Plugin Entry	`src/aiperf/plugin/types.py`	Lazy-loading entry with metadata
Categories	`src/aiperf/plugin/categories.yaml`	Category definitions with protocols
Built-in Plugins	`src/aiperf/plugin/plugins.yaml`	Built-in plugin registrations
Schemas	`src/aiperf/plugin/schema/schemas.py`	Pydantic models for validation
Enums	`src/aiperf/plugin/enums.py`	Auto-generated enums from registry
CLI	`src/aiperf/cli_commands/plugins.py`	Plugin exploration commands

Architecture

Discovery Flow

Entry Points → plugins.yaml → Pydantic Validation → Registry
                                                      ↓
                              get_class() → Import Module → Cache

Phase	Action
1. Discovery	Scan `aiperf.plugins` entry points for `plugins.yaml` files
2. Loading	Parse YAML, validate with Pydantic, register with conflict resolution
3. Access	`get_class()` imports module, caches class for reuse

Registry Singleton Pattern

The plugin registry follows the singleton pattern with module-level exports:

from aiperf.plugin import plugins
from aiperf.plugin.enums import PluginType

# Get a plugin class by name
EndpointClass = plugins.get_class(PluginType.ENDPOINT, "chat")

# Iterate all plugins in a category
for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
    print(f"{entry.name}: {entry.description}")

Plugin Categories

AIPerf supports 33 plugin categories organized by function, including api_router and public_dataset_loader:

Timing Categories

Category	Enum	Description
`timing_strategy`	`TimingMode`	Request scheduling strategies (fixed schedule, request rate, user-centric)
`arrival_pattern`	`ArrivalPattern`	Inter-arrival time distributions (constant, Poisson, gamma, concurrency burst)
`ramp`	`RampType`	Value ramping strategies (linear, exponential, Poisson)

Dataset Categories

Category	Enum	Description
`dataset_backing_store`	`DatasetBackingStoreType`	Server-side dataset storage
`dataset_client_store`	`DatasetClientStoreType`	Worker-side dataset access
`dataset_sampler`	`DatasetSamplingStrategy`	Sampling strategies (random, sequential, shuffle)
`dataset_composer`	`ComposerType`	Dataset generation (synthetic, custom, rankings)
`custom_dataset_loader`	`CustomDatasetType`	JSONL format loaders
`public_dataset_loader`	`PublicDatasetType`	Shared benchmark dataset fetchers (HTTP, HuggingFace)

Endpoint and Transport Categories

Category	Enum	Description
`api_router`	`APIRouterType`	Lifecycle-managed HTTP/WebSocket routers exposed via `BaseRouter`
`endpoint`	`EndpointType`	API endpoint implementations (chat, completions, embeddings, etc.)
`transport`	`TransportType`	Network transport (HTTP via aiohttp)

Processing Categories

Category	Enum	Description
`record_processor`	`RecordProcessorType`	Per-record metric computation
`results_processor`	`ResultsProcessorType`	Aggregated results computation
`gpu_telemetry_processor`	`GPUTelemetryProcessorType`	Side-channel GPU telemetry aggregation/export within `GPUTelemetryManager`
`server_metrics_processor`	`ServerMetricsProcessorType`	Side-channel Prometheus server metrics aggregation/export within `ServerMetricsManager`
`data_exporter`	`DataExporterType`	File format exporters (CSV, JSON, Parquet)
`console_exporter`	`ConsoleExporterType`	Terminal output exporters

Accuracy Categories

Category	Enum	Description
`accuracy_benchmark`	`AccuracyBenchmarkType`	Accuracy benchmark problem sets (MMLU, AIME, HellaSwag, BigBench, etc.)
`accuracy_grader`	`AccuracyGraderType`	Grading strategies for accuracy evaluation (exact match, math, multiple choice, code execution)

UI and Selection Categories

Category	Enum	Description
`ui`	`UIType`	UI implementations (dashboard, simple, none)
`url_selection_strategy`	`URLSelectionStrategy`	Request distribution (round-robin)

Service Categories

Category	Enum	Description
`service`	`ServiceType`	Core AIPerf services
`service_manager`	`ServiceRunType`	Service orchestration. The built-in `multiprocessing` service-manager plugin is registered; Kubernetes execution is referenced by future-facing code paths but is not a registered service-manager plugin in this checkout.

Visualization and Telemetry Categories

Category	Enum	Description
`plot`	`PlotType`	Chart types (scatter, histogram, timeline, etc.)
`gpu_telemetry_collector`	`GPUTelemetryCollectorType`	GPU metric collection (DCGM, pynvml)

Infrastructure Categories (Internal)

Category	Enum	Description
`communication`	`CommunicationBackend`	ZMQ backends (IPC, TCP, dual-bind)
`communication_client`	`CommClientType`	Socket patterns (PUB, SUB, PUSH, PULL)
`zmq_proxy`	`ZMQProxyType`	Message routing proxies

Sweep / Adaptive Search Categories

Category	Enum	Description
`search_recipe`	`SearchRecipeType`	Named presets that compile to AdaptiveSearchSweep or grid sweep parameters; selected via `--search-recipe`
`search_recipe_post_process`	`SearchRecipePostProcessType`	Stateless handlers emitting derived artifacts (curves, knee points) into `sweep_aggregate/` after `SweepAnalyzer.compute()`
`convergence_criterion`	`ConvergenceCriterionType`	Decides when metrics have stabilized across repeated runs; selected via `--convergence-mode`
`search_planner`	`SearchPlannerType`	Drives the adaptive outer loop via `ask()`/`tell()`; selected via `--search-planner`

Using Plugins

from aiperf.plugin import plugins
from aiperf.plugin.enums import PluginType, EndpointType

# Get class by name, enum, or full path
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "chat")
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, EndpointType.CHAT)
ChatEndpoint = plugins.get_class(PluginType.ENDPOINT, "aiperf.endpoints.openai_chat:ChatEndpoint")

# Iterate plugins
for entry, cls in plugins.iter_all(PluginType.ENDPOINT):
    print(f"{entry.name}: {entry.class_path}")

# Get metadata (raw dict or typed)
metadata = plugins.get_metadata("endpoint", "chat")
endpoint_meta = plugins.get_endpoint_metadata("chat")  # Returns EndpointMetadata

Function	Returns	Use Case
`get_class(category, name)`	`type`	Get plugin class
`iter_all(category)`	`Iterator[tuple[PluginEntry, type]]`	List all plugins
`get_metadata(category, name)`	`dict`	Raw metadata
`get_endpoint_metadata(name)`	`EndpointMetadata`	Typed endpoint config
`get_transport_metadata(name)`	`TransportMetadata`	Typed transport config
`get_plot_metadata(name)`	`PlotMetadata`	Typed plot config
`get_service_metadata(name)`	`ServiceMetadata`	Typed service config
`get_gpu_telemetry_collector_metadata(name)`	`GPUTelemetryCollectorMetadata`	Typed GPU collector config

Creating Custom Plugins

Tip

Contributing directly to AIPerf? You only need two things:

Add your class under src/aiperf/
Register it in src/aiperf/plugin/plugins.yaml

The pyproject.toml entry points and separate package install below are only needed for external/third-party plugins.

Quick Start (4 steps):

Step	File	Action
1	`my_endpoint.py`	Create class extending `BaseEndpoint`
2	`plugins.yaml`	Register with class path, description, and metadata
3	`pyproject.toml`	Add entry point: `my-package = "my_package:plugins.yaml"`
4	Terminal	`uv pip install -e . && aiperf plugins endpoint my_custom`

Minimal Endpoint Example

# my_package/endpoints/custom_endpoint.py
class MyCustomEndpoint(BaseEndpoint):
    def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
        turn = request_info.turns[-1]
        texts = [content for text in turn.texts for content in text.contents if content]
        return {"prompt": texts[0] if texts else ""}

    def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
        if json_obj := response.get_json():
            return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=json_obj.get("text", "")))
        return None

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
# my_package/plugins.yaml
schema_version: "1.0"
endpoint:
  my_custom:
    class: my_package.endpoints.custom_endpoint:MyCustomEndpoint
    description: Custom endpoint for my API.
    metadata: { endpoint_path: /v1/generate, supports_streaming: true, produces_tokens: true, tokenizes_input: true, metrics_title: My Custom Metrics }

Note

Extend base classes (BaseEndpoint, etc.) to get logging, helpers, and default implementations. Only implement core methods.

Plugin Configuration

categories.yaml Schema

Defines plugin categories with their protocols and metadata schemas:

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/categories.schema.json
schema_version: "1.0"

endpoint:
  protocol: aiperf.endpoints.protocols:EndpointProtocol
  metadata_class: aiperf.plugin.schema.schemas:EndpointMetadata
  enum: EndpointType
  description: |
    Endpoints define how to format requests and parse responses for different APIs.
  internal: false  # Set to true for infrastructure categories

plugins.yaml Schema

Registers plugin implementations:

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
schema_version: "1.0"

endpoint:
  chat:
    class: aiperf.endpoints.openai_chat:ChatEndpoint
    description: OpenAI Chat Completions endpoint.
    priority: 0  # Higher priority wins conflicts
    metadata:
      endpoint_path: /v1/chat/completions
      supports_streaming: true
      produces_tokens: true
      tokenizes_input: true
      metrics_title: LLM Metrics

Metadata Schemas

Category-specific metadata is validated against Pydantic models in aiperf.plugin.schema.schemas:

Model	Key Fields
`EndpointMetadata`	`endpoint_path`, `supports_streaming`, `produces_tokens`, `tokenizes_input`, `metrics_title` + optional streaming/service/multimodal/polling fields
`TransportMetadata`	`transport_type`, `url_schemes`
`PlotMetadata`	`display_name`, `category`
`ServiceMetadata`	`required`, `auto_start`, `disable_gc`, `replicable`
`GPUTelemetryCollectorMetadata`	`is_local`

CLI Commands

Command	Output
`aiperf plugins`	Installed packages with versions and plugin counts
`aiperf plugins --all`	All categories with registered plugins
`aiperf plugins endpoint`	All endpoint types with descriptions
`aiperf plugins endpoint chat`	Details: class path, package, metadata
`aiperf plugins --validate`	Validates class paths and existence

$ aiperf plugins endpoint chat
╭───────────────── endpoint:chat ─────────────────╮
│ Type: chat                                      │
│ Category: endpoint                              │
│ Package: aiperf                                 │
│ Class: aiperf.endpoints.openai_chat:ChatEndpoint│
│                                                 │
│ OpenAI Chat Completions endpoint. Supports      │
│ multi-modal inputs and streaming responses.     │
╰─────────────────────────────────────────────────╯

Advanced Topics

Conflict Resolution

Priority	Rule
1	Higher `priority` value wins
2	External packages beat built-in (equal priority)
3	First registered wins (with warning)

Tip

Shadowed plugins remain accessible via full class path: plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")

API Reference

# Runtime registration (testing)
plugins.register("endpoint", "test", TestEndpoint, priority=10)
plugins.reset_registry()  # Reset to initial state

# Dynamic enum generation
MyEndpointType = plugins.create_enum(PluginType.ENDPOINT, "MyEndpointType", module=__name__)

# Validation without importing
errors = plugins.validate_all(check_class=True)  # {category: [(name, error), ...]}

# Reverse lookup
name = plugins.find_registered_name(PluginType.ENDPOINT, ChatEndpoint)  # "chat"

# Package metadata
pkg = plugins.get_package_metadata("aiperf")  # PackageInfo(version, author, ...)

Type Safety: get_class() returns typed results (e.g., type[EndpointProtocol]) with IDE autocomplete.

Built-in Plugins Reference

Endpoints

Name	Class	Description
`chat`	`ChatEndpoint`	OpenAI Chat Completions API
`chat_embeddings`	`ChatEmbeddingsEndpoint`	vLLM multimodal embeddings via chat API
`completions`	`CompletionsEndpoint`	OpenAI Completions API
`cohere_rankings`	`CohereRankingsEndpoint`	Cohere Reranking API
`embeddings`	`EmbeddingsEndpoint`	OpenAI Embeddings API
`hf_tei_rankings`	`HFTeiRankingsEndpoint`	HuggingFace TEI Rankings
`huggingface_generate`	`HuggingFaceGenerateEndpoint`	HuggingFace TGI
`image_edit`	`ImageEditEndpoint`	OpenAI Image Edit (image-to-image) API; multipart upload of reference image + prompt to `/v1/images/edits`. Compatible with SGLang FLUX.2 unified diffusion serving.
`image_generation`	`ImageGenerationEndpoint`	OpenAI Image Generation API
`image_retrieval`	`ImageRetrievalEndpoint`	Image retrieval API
`nim_embeddings`	`NIMEmbeddingsEndpoint`	NVIDIA NIM Embeddings
`nim_rankings`	`NIMRankingsEndpoint`	NVIDIA NIM Rankings
`responses`	`ResponsesEndpoint`	OpenAI Responses API
`solido_rag`	`SolidoEndpoint`	Solido RAG Pipeline
`template`	`TemplateEndpoint`	Template for custom endpoints
`video_generation`	`VideoGenerationEndpoint`	Text-to-video generation API

Timing Strategies

Name	Class	Description
`fixed_schedule`	`FixedScheduleStrategy`	Send requests at exact timestamps
`request_rate`	`RequestRateStrategy`	Send requests at specified rate
`user_centric_rate`	`UserCentricStrategy`	Each session acts as separate user

Arrival Patterns

Name	Class	Description
`constant`	`ConstantIntervalGenerator`	Fixed intervals between requests
`poisson`	`PoissonIntervalGenerator`	Poisson process arrivals
`gamma`	`GammaIntervalGenerator`	Gamma distribution with tunable smoothness
`concurrency_burst`	`ConcurrencyBurstIntervalGenerator`	Send ASAP up to concurrency limit

Dataset Composers

Name	Class	Description
`synthetic`	`SyntheticDatasetComposer`	Generate synthetic conversations
`custom`	`CustomDatasetComposer`	Load from JSONL files
`synthetic_rankings`	`SyntheticRankingsDatasetComposer`	Generate ranking tasks

UI Types

Name	Class	Description
`dashboard`	`AIPerfDashboardUI`	Rich terminal dashboard
`simple`	`TQDMProgressUI`	Simple tqdm progress bar
`none`	`NoUI`	Headless execution

Accuracy Benchmarks

Name	Class	Description
`mmlu`	`MMLUBenchmark`	Massive Multitask Language Understanding
`aime`	`AIMEBenchmark`	American Invitational Mathematics Examination
`aime24`	`AIME24Benchmark`	AIME 2024 competition problems
`aime25`	`AIME25Benchmark`	AIME 2025 competition problems
`hellaswag`	`HellaSwagBenchmark`	HellaSwag commonsense reasoning
`bigbench`	`BigBenchBenchmark`	BIG-Bench benchmark tasks
`math_500`	`Math500Benchmark`	MATH-500 problem set
`gpqa_diamond`	`GPQADiamondBenchmark`	GPQA Diamond graduate-level science
`lcb_codegeneration`	`LCBCodeGenerationBenchmark`	LiveCodeBench code generation

Accuracy Graders

Name	Class	Description
`exact_match`	`ExactMatchGrader`	Exact string matching
`math`	`MathGrader`	Mathematical expression evaluation
`multiple_choice`	`MultipleChoiceGrader`	Multiple choice answer extraction
`code_execution`	`CodeExecutionGrader`	Code execution and output comparison

Troubleshooting

Plugin Not Found

TypeNotFoundError: Type 'my_plugin' not found for category 'endpoint'.

Solutions:

Verify the plugin is registered in plugins.yaml
Check the entry point is defined in pyproject.toml
Reinstall the package in the active environment: uv pip install -e .
Run aiperf plugins --validate to check for errors

Module Import Errors

ImportError: Failed to import module for endpoint:my_plugin

Solutions:

Verify the class path format: module.path:ClassName
Check all dependencies are installed
Verify the module is importable: python -c "import module.path"

Class Not Found

AttributeError: Class 'MyClass' not found

Solutions:

Verify the class name matches exactly (case-sensitive)
Ensure the class is exported from the module
Run aiperf plugins --validate for detailed error

Conflict Resolution Issues

If your plugin is being shadowed by another:

Use higher priority: priority: 10 in plugins.yaml
Access by full class path: plugins.get_class("endpoint", "my_pkg.endpoints:MyEndpoint")
Check aiperf plugins to see which packages are loaded