IPP Plugins Reference
June 28, 2026
Table of Contents
- Overview
- Plugin Categories
- Processing Pipeline
- Request Handling Plugins
- Profile Picker Plugins
- Model Selector Plugins
- Data Layer Plugins
- Response Handling Plugins
- Pre- and Post-Processors
- Configuration Example
- References
Overview
All IPP behavior is implemented as plugins. The framework defines the pipeline and the extension
points; concrete behavior lives in plugins that are selected, parameterized, and ordered in a
PayloadProcessorConfig. Each plugin has a registered type name (a constant in
its source) and a configurable instance name; because instances are named, the same plugin type
can be configured more than once with different parameters.
This document lists every in-tree plugin, grouped by category, with its registered type name, purpose, parameters, and a link to its source. For the conceptual model — profiles, the ext-proc lifecycle, model selection, and the data layer — see the Architecture document.
Plugin Categories
A plugin belongs to exactly one category, determined by the framework interface it implements. The config loader routes each plugin to the right extension point based on that interface.
| Category | Purpose | When executed |
|---|---|---|
| Request Handling | Inspect and mutate the request (headers, body) before it is routed. | During a profile's request stage, before the model server is reached. |
| Response Handling | Inspect and mutate the response on its way back to the client. | During a profile's response stage, after the model server replies. |
| Model Selector — Filter | Remove candidate models that cannot serve the request. | First phase of the ModelSelector pipeline (inside model-selector). |
| Model Selector — Scorer | Score the remaining candidate models, conventionally in [0, 1]. | Second phase of the ModelSelector pipeline; scores combine via per-reference weight. |
| Model Selector — Picker | Select exactly one final model from the scored candidates. | Third phase of the ModelSelector pipeline; exactly one picker runs. |
| Profile Picker | Choose which profile runs for a request. | Globally, before the profile's request plugins. |
| Data Layer | Maintain cross-request state (collectors, extractors, datasources) consumed by Scorers (and Filters). | Continuously in the background, decoupled from any single request. |
Processing Pipeline
IPP executes plugins in a fixed sequence of stages:
ProfilePicker → Profile Request Plugins → [Model Server] → Profile Response Plugins
(The config API also defines global preProcessing / postProcessing stages; these are reserved
extension points and are not yet invoked by the request path — see Architecture.)
Plugins are declared once under the top-level plugins list of the PayloadProcessorConfig
(each with a type, an optional name, and optional parameters) and then referenced by name
elsewhere in the config via pluginRef:
- profiles[].plugins.request[] and profiles[].plugins.response[] reference request- and response-handling plugins. A profile's request list may also reference model-selector plugins (Filter / Scorer / Picker); the loader routes each reference by the interface the plugin implements. Scorer references carry a weight.
- profilePicker.pluginRef references the profile picker. When exactly one profile is defined and no picker is configured, single-profile-picker is enabled automatically.
- preProcessing.plugins[] and postProcessing.plugins[] reference pre- and post-processors.
- datalayer.collectors[], datalayer.extractors[], and datalayer.datasources[] reference data-layer plugins. These are not part of any profile's request list.
For the conceptual model behind profiles, model selection, and the data layer, see Architecture. For the full configuration schema, see Configuration.
Request Handling Plugins
Request-handling plugins implement the RequestProcessor interface and process the request body and
headers before routing.
body-field-to-header
Extracts a single field from the JSON request body and sets its value as an HTTP header. If the
field is absent or empty, the plugin records a metric and skips without error. This is the generic
building block behind model-aware routing — for example, copying the model body field into the
X-Gateway-Model-Name header.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| fieldName | string | yes | Name of the request-body field to extract. |
| headerName | string | yes | Name of the HTTP header to set with the extracted value. |
Source: pkg/framework/plugins/requesthandling/bodyfieldtoheader/
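The extract-and-set behavior described above can be illustrated with a minimal Python sketch. It is not the plugin's Go implementation: the function name and the plain-dict headers are hypothetical, and treating an unparseable body like a missing field is an assumption.

```python
import json


def body_field_to_header(body: bytes, headers: dict, field_name: str, header_name: str) -> bool:
    """Copy one top-level JSON body field into a header; return True if the header was set."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Assumption: a body that is not valid JSON is skipped like a missing field.
        return False
    value = payload.get(field_name) if isinstance(payload, dict) else None
    if value is None or value == "":
        # Field absent or empty: skip without error (the real plugin records a metric here).
        return False
    headers[header_name] = str(value)
    return True


headers = {}
body_field_to_header(b'{"model": "llama-3-8b"}', headers, "model", "X-Gateway-Model-Name")
```

After the call, headers holds X-Gateway-Model-Name set to llama-3-8b; a body without a model field leaves headers untouched.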
base-model-to-header
Maps the request's model name — including LoRA adapter names — to its base model and writes the
result to the X-Gateway-Base-Model-Name routing header. The adapter-to-base-model mapping is
maintained by a ConfigMap reconciler that watches IPP-managed model-mapping ConfigMaps, so the
mapping updates at runtime without restarts. If the request has no model field the plugin skips; if
the model is neither a known adapter nor a registered base model, the header is set to the empty
string. This plugin powers multi-pool routing.
Parameters: None. The plugin is wired to the controller-runtime client and reconciler via its plugin handle; the model mappings come from labeled ConfigMaps (see Configuration), not from plugin parameters.
Source: pkg/framework/plugins/requesthandling/basemodelextractor/
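The three outcomes described above (skip, resolved base model, empty header) can be sketched as follows. This is an illustrative Python sketch only: the function name and the two in-memory lookups are hypothetical stand-ins for the state the ConfigMap reconciler maintains, and it assumes a registered base model resolves to itself.

```python
from typing import Optional


def resolve_base_model(model: Optional[str], adapter_to_base: dict, base_models: set) -> Optional[str]:
    """Return the header value to write, or None when the plugin should skip."""
    if not model:
        return None  # no model field in the request: skip
    if model in adapter_to_base:
        return adapter_to_base[model]  # LoRA adapter name maps to its base model
    if model in base_models:
        return model  # assumption: a registered base model resolves to itself
    return ""  # unknown model: the header is set to the empty string


# Hypothetical mapping data for illustration.
adapters = {"sql-lora": "llama-3-8b"}
bases = {"llama-3-8b"}
header_value = resolve_base_model("sql-lora", adapters, bases)
```

Keeping the lookup separate from the reconciler is what lets the mapping change at runtime: only the two collections are swapped, not the resolution logic.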
model-selector
Entry point for the ModelSelector framework. When present in a profile's request list, it runs
the Filter → Score → Pick pipeline over the candidate models in the datastore, then writes the
selected model name back into the request body's model field so the rest of the pipeline proceeds
as if the client had requested that model directly. The Filter, Scorer, and Picker plugins for this
profile are declared in the same profile request list and wired into the selector by the config
loader; if no picker is referenced, max-score-picker is used by default. If no
candidate models are available, the plugin returns an error.
Parameters: None. The pipeline is assembled from the profile's other model-selector references, not from parameters on this plugin.
Source: pkg/framework/plugins/requesthandling/modelselector/
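The Filter → Score → Pick flow and the body rewrite can be sketched in a few lines of Python. This only illustrates the control flow described above: the callable signatures, the example filter, and the price-based scorer are hypothetical and do not mirror the Go interfaces.

```python
def run_model_selector(body: dict, candidates: list, filters: list, scorer, picker) -> dict:
    """Filter -> Score -> Pick, then write the winner into body["model"]."""
    remaining = [m for m in candidates if all(keep(m, body) for keep in filters)]
    if not remaining:
        # No candidate can serve the request: surface an error.
        raise LookupError("no candidate models available")
    scores = {m: scorer(m, body) for m in remaining}
    body["model"] = picker(scores)  # downstream plugins see the selected model
    return body


def pick_highest(scores: dict) -> str:
    """Stand-in for the default picker: take the highest-scoring model."""
    return max(scores, key=scores.get)


prices = {"small": 1.0, "large": 3.0}  # hypothetical per-model prices
request = {"model": "auto", "prompt": "hello"}
result = run_model_selector(
    request,
    candidates=list(prices),
    filters=[lambda m, body: m != "retired"],
    scorer=lambda m, body: 1.0 / prices[m],
    picker=pick_highest,
)
```

Here result["model"] ends up as small, so any later plugin behaves as if the client had asked for that model directly.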
Profile Picker Plugins
A profile picker chooses which profile runs for a request. It implements the ProfilePicker
interface and is referenced via the top-level profilePicker field.
single-profile-picker
Selects the single configured profile for every request. It is enabled by default when exactly one profile is defined and no profile picker is configured, so you typically do not declare it explicitly. Useful when an IPP deployment runs a single processing path.
Parameters: None.
Source: README
Model Selector Plugins
Model-selector plugins implement the ModelSelector framework's Filter, Scorer, and Picker
interfaces. They are referenced inside a profile's request list alongside the
model-selector plugin, and the loader routes each reference to the correct phase
by interface. See the ModelSelector proposal for the framework design.
Filters
Filters remove candidate models that cannot serve a request; if filtering leaves zero candidates the framework returns an error to the client.
There are no in-tree filter plugins. Filtering is a framework extension point — implement the
Filter interface and register it to add one (see Creating a Plugin).
Scorers
Scorers assign each remaining candidate model a score, conventionally in [0, 1]. Multiple scorers
combine via the per-reference weight set in the profile (a scorer reference requires a
weight).
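As a rough illustration of weighted combination, the sketch below assumes the combined score is a weighted sum of each scorer's output. The text above only states that scorers combine via their weights, so the exact formula is an assumption, and the function and variable names are hypothetical.

```python
def combine_scores(per_scorer: dict, weights: dict) -> dict:
    """Sum weight * score per model (assumed combination rule)."""
    combined = {}
    for scorer_name, scores in per_scorer.items():
        for model, score in scores.items():
            combined[model] = combined.get(model, 0.0) + weights[scorer_name] * score
    return combined


per_scorer = {
    "cost-scorer": {"small": 0.75, "large": 0.25},
    "inflight-requests-scorer": {"small": 0.0, "large": 1.0},
}
weights = {"cost-scorer": 1.0, "inflight-requests-scorer": 2.0}
combined = combine_scores(per_scorer, weights)
```

Under this assumption, small combines to 0.75 and large to 2.25: the load signal, weighted twice as heavily, outweighs the price advantage of the cheaper model.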
cost-scorer
Scores candidate models by price so that cheaper models score higher. Each model carries a price
attribute (USD per 1M tokens); the score is computed with inverted sum normalization,
1 - price / sum(prices). A single candidate receives a neutral score of 0.5, and when every
candidate's price is zero all candidates receive 1.0. Use it for cost-aware model selection.
Parameters: None. Prices are read from each model's price attribute in the datastore, not from
plugin parameters.
Source: pkg/framework/plugins/modelselector/scorer/costaware/
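The inverted sum normalization and its two special cases can be worked through in Python. This is a sketch of the formula as described above, not the plugin's Go code; the function name is hypothetical and prices are assumed to be non-negative.

```python
def cost_scores(prices: dict) -> dict:
    """Score models so that cheaper ones score higher: 1 - price / sum(prices)."""
    if len(prices) == 1:
        # A lone candidate gets a neutral score (the formula alone would give 0).
        return {model: 0.5 for model in prices}
    total = sum(prices.values())
    if total == 0:
        # Every candidate is free: all of them score 1.0.
        return {model: 1.0 for model in prices}
    return {model: 1.0 - price / total for model, price in prices.items()}


scores = cost_scores({"small": 1.0, "large": 3.0})
```

With prices of 1.0 and 3.0 USD per 1M tokens, small scores 0.75 and large scores 0.25.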
Note
cost-scorer ships in-tree as a reference implementation but is not registered in the default
runner. To use it, register its factory in registerInTreePlugins (see
Creating a Plugin) or register it from your own runner build.
inflight-requests-scorer
Scores candidate models by current in-flight request load so that the least-loaded model scores
highest. For a model with in-flight count count, the score is (max - count) / (max - min), where max and min are the highest and lowest counts among the candidates; the
least-loaded model scores 1.0 and the most-loaded scores 0.0. Models with no in-flight-request
attribute are treated as idle (zero), and if all candidates share the same count they all score
1.0. It consumes the in-flight counts produced by
request-metadata-extractor, so configure that data-layer extractor
alongside this scorer.
Parameters: None. In-flight counts are read from each model's request-metadata attribute in the
datastore.
Source: pkg/framework/plugins/modelselector/scorer/inflightrequests/
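The min-max normalization above, including the idle and all-equal cases, can be sketched in Python as follows. The function name is hypothetical, and a missing attribute is modeled here as None.

```python
def inflight_scores(counts: dict) -> dict:
    """Score models by load: (max - count) / (max - min), least loaded scores 1.0."""
    # A model without an in-flight attribute (None here) is treated as idle.
    normalized = {model: (count or 0) for model, count in counts.items()}
    if not normalized:
        return {}
    highest = max(normalized.values())
    lowest = min(normalized.values())
    if highest == lowest:
        # All candidates carry the same load: everyone scores 1.0.
        return {model: 1.0 for model in normalized}
    spread = highest - lowest
    return {model: (highest - count) / spread for model, count in normalized.items()}


scores = inflight_scores({"a": 0, "b": 5, "c": 10})
```

With counts of 0, 5, and 10, the scores are 1.0, 0.5, and 0.0 respectively.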
Pickers
A picker selects the single final model from the scored candidates. Exactly one picker runs per
model-selection profile; if none is referenced, max-score-picker is added by
default.
max-score-picker
Selects the candidate with the highest score, shuffling first so that ties are broken at random. It maximizes adherence to the scoring objective (e.g. lowest cost or lowest load) but is susceptible to hot-spotting when many concurrent requests produce identical scores for the same model.
Parameters: None.
Source: README
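Shuffle-then-maximize tie-breaking can be sketched in Python as below. The function name and the injectable random generator are hypothetical; the sketch relies on max returning the first maximal element it encounters, so the shuffle decides which tied model wins.

```python
import random


def max_score_pick(scores: dict, rng: random.Random = None) -> str:
    """Pick the highest-scoring model, breaking ties at random."""
    rng = rng or random.Random()
    models = list(scores)
    rng.shuffle(models)  # randomize order so ties do not always favor the same model
    return max(models, key=lambda m: scores[m])


winner = max_score_pick({"a": 0.2, "b": 0.9})
```

When scores differ, the shuffle has no effect on the outcome; when several models tie for the top score, each of them is equally likely to be returned.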
random-picker
Selects a candidate uniformly at random, ignoring all scores. It spreads load uniformly in expectation, which makes it immune to score-driven hot-spotting, but it cannot use cost or load signals.
Parameters: None.
Source: README
weighted-random-picker
Selects a candidate at random with probability proportional to its score, using the A-Res
weighted reservoir sampling algorithm (Efraimidis and Spirakis), which makes the draw exact rather than approximate. It
balances the trade-off between max-score-picker and random-picker, favoring higher-scoring models
while retaining exploration to avoid extreme hot-spotting. If every candidate scores zero or less, it
falls back to random-picker for uniform selection.
Parameters: None.
Source: README
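For a single draw, A-Res reduces to giving each candidate the key u^(1/w), with u uniform in [0, 1) and w the candidate's score, and keeping the largest key. A minimal Python sketch under stated assumptions: the function name is hypothetical, and excluding non-positive scores whenever at least one positive score exists is an assumption about behavior the text does not spell out.

```python
import random


def weighted_random_pick(scores: dict, rng: random.Random) -> str:
    """Pick one model with probability proportional to its score (A-Res, sample size 1)."""
    positive = {m: s for m, s in scores.items() if s > 0}
    if not positive:
        # Every score is zero or less: fall back to a uniform pick.
        return rng.choice(list(scores))
    # A-Res key: u ** (1 / weight); the candidate with the largest key wins.
    keys = {m: rng.random() ** (1.0 / w) for m, w in positive.items()}
    return max(keys, key=keys.get)


rng = random.Random(0)
draws = [weighted_random_pick({"a": 3.0, "b": 1.0}, rng) for _ in range(4000)]
share_a = draws.count("a") / len(draws)
```

With scores of 3.0 and 1.0, share_a should land near 0.75, so the lower-scoring model still receives roughly a quarter of the traffic.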
Data Layer Plugins
Data-layer plugins maintain cross-request state consumed by Scorers (and Filters). They run
continuously in the background (extractors on request/response events, collectors on a timer,
datasources as watchers), decoupled from any single request, and are referenced under the top-level
datalayer section as collectors, extractors, or datasources — never in a profile's
request list. See Data Layer for the conceptual model.
request-metadata-extractor
An extractor that tracks in-flight request counts and token sums per model. On each request event
it increments the model's request count and adds the request's max_tokens to its token sum; on the
corresponding response event it decrements both (flooring at zero). The result is written to each
model's request-metadata attribute, which inflight-requests-scorer
consumes. Reference it under datalayer.extractors.
Parameters: None. The extractor is wired to the shared datastore via its plugin handle.
Source: pkg/framework/plugins/datalayer/requestmetadata/
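The increment-on-request, decrement-on-response bookkeeping can be sketched in Python as follows. The class and field names are hypothetical, and the sketch omits the locking a concurrent implementation would need.

```python
from dataclasses import dataclass


@dataclass
class RequestMetadata:
    inflight_requests: int = 0
    inflight_tokens: int = 0


class RequestMetadataTracker:
    """Per-model in-flight counters (illustrative, not thread-safe)."""

    def __init__(self):
        self.by_model = {}

    def on_request(self, model: str, max_tokens: int) -> None:
        meta = self.by_model.setdefault(model, RequestMetadata())
        meta.inflight_requests += 1
        meta.inflight_tokens += max_tokens

    def on_response(self, model: str, max_tokens: int) -> None:
        meta = self.by_model.setdefault(model, RequestMetadata())
        # Floor at zero so an unmatched response cannot drive counters negative.
        meta.inflight_requests = max(0, meta.inflight_requests - 1)
        meta.inflight_tokens = max(0, meta.inflight_tokens - max_tokens)


tracker = RequestMetadataTracker()
tracker.on_request("llama-3-8b", 100)
tracker.on_request("llama-3-8b", 50)
tracker.on_response("llama-3-8b", 100)
```

After two requests and one response, the model has one request and 50 tokens in flight.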
model-config-datasource
A datasource that imports the set of known model names into the datastore from a JSON config
file, keeping the datastore in sync as the file changes. It watches the file's parent directory (to
handle atomic, rename-based replacements such as Kubernetes ConfigMap remounts), registers every
listed model, and removes any datastore model no longer present in the file. This populates the
candidate-model set that model-selector reads. Reference it under
datalayer.datasources.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| modelsPath | string | yes | Path to a JSON file with the schema {"models": [{"name": "..."}]}. Must be an existing file (not a directory). |
Source: pkg/framework/plugins/datalayer/modelconfigcollector/
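The register-and-prune step that runs on each file change can be sketched in Python. The function name and the set-based datastore are hypothetical stand-ins, and the directory watch itself is left out; only the reconciliation against the documented schema is shown.

```python
import json


def sync_models(datastore: set, config_text: str) -> tuple:
    """Make the datastore match the file: add listed models, drop unlisted ones."""
    listed = {entry["name"] for entry in json.loads(config_text)["models"]}
    added = listed - datastore
    removed = datastore - listed
    datastore.update(added)  # register every newly listed model
    datastore.difference_update(removed)  # remove models no longer in the file
    return added, removed


store = {"old-model", "kept-model"}
added, removed = sync_models(
    store, '{"models": [{"name": "kept-model"}, {"name": "new-model"}]}'
)
```

After the sync, store contains kept-model and new-model; old-model has been removed because the file no longer lists it.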
Note
model-config-datasource ships in-tree as a reference implementation but is not registered in the
default runner. To use it, register its factory in registerInTreePlugins (see
Creating a Plugin) or register it from your own runner build.
Response Handling Plugins
Response-handling plugins implement the ResponseProcessor interface and run during a profile's
response stage. There are no in-tree response-handling plugins — this is a framework
extension point. To add one, implement ResponseProcessor and register it (see
Creating a Plugin).
Pre- and Post-Processors
PreProcessing and PostProcessing are reserved global extension points in the config API (before profile selection and after the response plugins). They are not yet invoked by the request path, and there are no in-tree pre- or post-processors.
Configuration Example
A complete PayloadProcessorConfig that performs cost- and load-aware model selection. Plugins are
declared once under the top-level plugins list and referenced by name. The model-selection profile
references model-selector together with two weighted scorers and the weighted-random-picker, plus
the two header plugins. The request-metadata-extractor and model-config-datasource are wired under
the top-level datalayer section — not in the profile's request list.
Note
This example uses cost-scorer and model-config-datasource, which are in-tree but not registered
in the default runner (see their notes above). Register their factories before applying this config,
or drop them to run with the default plugin set.
```yaml
apiVersion: llm-d.ai/v1alpha1
kind: PayloadProcessorConfig
plugins:
  - type: body-field-to-header
    parameters:
      fieldName: model
      headerName: X-Gateway-Model-Name
  - type: base-model-to-header
  - type: model-selector
  - type: cost-scorer
  - type: inflight-requests-scorer
  - type: weighted-random-picker
  - type: request-metadata-extractor
  - type: model-config-datasource
    parameters:
      modelsPath: /etc/ipp/models.json
profiles:
  - name: model-selection
    plugins:
      request:
        - pluginRef: body-field-to-header
        - pluginRef: base-model-to-header
        - pluginRef: model-selector
        - pluginRef: cost-scorer
          weight: 1.0
        - pluginRef: inflight-requests-scorer
          weight: 2.0
        - pluginRef: weighted-random-picker
datalayer:
  extractors:
    - pluginRef: request-metadata-extractor
  datasources:
    - pluginRef: model-config-datasource
```
With a single profile and no profilePicker configured, IPP auto-enables
single-profile-picker. The model-config-datasource populates the
candidate models, request-metadata-extractor maintains their in-flight counts, and the two scorers
combine by weight (here the load signal is weighted twice the cost signal) before
weighted-random-picker chooses the final model.
For the full schema, Helm values, ConfigMaps, and proxy integration, see Configuration.
References
- Architecture: The conceptual model (ext-proc, profiles, model selection, and the data layer).
- Configuration: Full configuration reference for the PayloadProcessorConfig API.
- Creating a Plugin: Tutorial for writing and registering a custom plugin.
- ModelSelector proposal: Design of the model-selection framework.