Dexbotic Inference API
June 18, 2026
Dexbotic inference servers keep the legacy `/process_frame` route and add a
v1 API for policy-based VLA/VLM serving. The v1 API gives clients one stable
contract for:
- action inference through `/v1/infer`
- episode reset through `/v1/reset`
- model capability discovery through `/v1/capabilities`
- optional chat-style generation through `/v1/chat/completions`
This lets benchmark and deployment clients switch model wrappers without rewiring HTTP routes or request schemas.
Start an inference server
Each benchmark entry still owns model loading and checkpoint configuration. For example:
```bash
python playground/benchmarks/libero/libero_dm0.py --task inference
```
Most project-specific launch scripts set the checkpoint path in Python before
calling exp.inference():
```python
from playground.benchmarks.libero.libero_dm0 import DM0Exp

exp = DM0Exp()
exp.inference_config.model_name_or_path = "/path/to/checkpoint"
exp.inference_config.port = 7891
exp.inference()
```
The server listens on 0.0.0.0:<port> and registers both legacy and v1 routes.
DM0 realtime inference
DM0's optional realtime backend keeps the same v1 API while replacing the core
action-generation call with a Triton-backed optimized runtime. As of 2026-06-18,
the realtime path shows roughly a 5x speedup in core inference on the LIBERO DM0
probe benchmark. See DM0 realtime inference for launch instructions, benchmark
numbers, timing scope, and backend-specific constraints.
Routes
GET /health
Lightweight readiness check.
```bash
curl http://localhost:7891/health
```
Response:
```json
{"status": "ok"}
```
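Since `/health` is a readiness check, a client or evaluator may want to poll it after launching the server and before sending work. The sketch below uses only the standard library; `http_health_probe` and `wait_until_ready` are hypothetical helper names, not part of Dexbotic, and the default timeout and polling interval are arbitrary choices.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def http_health_probe(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return isinstance(body, dict) and body.get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all mean "not ready yet".
        return False


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 120.0,
                     interval_s: float = 1.0) -> bool:
    """Call `probe` repeatedly until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Typical use is `wait_until_ready(lambda: http_health_probe("http://localhost:7891"))`. Taking the probe as a callable keeps the polling loop testable without a running server.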
GET /v1/capabilities
Returns the model's declared inference contract. Call this before wiring a new environment or client.
```bash
curl http://localhost:7891/v1/capabilities
```
Example response:
```json
{
  "model_family": "DM0InferenceConfig",
  "vla": true,
  "vlm": false,
  "modalities": {
    "images": {
      "format": "image/{slot_index}",
      "slots": [
        {"slot": 1, "name": "front", "required": true},
        {"slot": 2, "name": "left_wrist", "required": true},
        {"slot": 3, "name": "right_wrist", "required": false}
      ]
    },
    "state": {"used": false, "required": false, "dim": null},
    "prompt": {"required": true}
  },
  "action_spec": {
    "action_dim": 7,
    "chunk_size": null,
    "action_mode": "absolute"
  },
  "max_batch_size": 1,
  "sampling_defaults": {"num_steps": 10, "cfg_scale": 1.0}
}
```
Important fields:
- `modalities.images.slots`: image slots expected by the policy. Slot names come from `camera_order`; `null` slots are zero-padded by the model wrapper.
- `modalities.state`: whether proprio/state is used or required.
- `action_spec.action_mode`: whether returned actions are already absolute or should be interpreted as relative/delta actions by the caller.
- `vla` and `vlm`: whether the server supports action inference and text generation, respectively.
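A client can compare an outgoing observation against the `/v1/capabilities` response before sending it. The sketch below is a hypothetical helper (`check_observation` is not a Dexbotic function) that reads only the fields shown in the example response. The state-length check is a client-side extra: the server's HTTP layer only checks that the `state` key is present.

```python
from typing import Any, Dict, List, Optional


def check_observation(caps: Dict[str, Any], images: Dict[str, str],
                      prompt: Optional[str] = None,
                      state: Optional[List[float]] = None) -> List[str]:
    """Hypothetical pre-flight check; an empty list means nothing looks missing."""
    problems: List[str] = []
    modalities = caps.get("modalities", {})
    for slot in modalities.get("images", {}).get("slots", []):
        key = str(slot["slot"])  # public image keys are 1-based numeric strings
        if slot.get("required") and key not in images:
            problems.append(f"missing required image slot {key} ({slot.get('name')})")
    if modalities.get("prompt", {}).get("required") and not prompt:
        problems.append("prompt is required")
    state_spec = modalities.get("state", {})
    if state_spec.get("required") and state is None:
        problems.append("state is required but was not provided")
    dim = state_spec.get("dim")
    if state is not None and dim is not None and len(state) != dim:
        # Client-side extra: the server itself only checks that `state` exists.
        problems.append(f"state has {len(state)} values, expected {dim}")
    return problems
```

With the example response above, omitting slot `"3"` is fine because `right_wrist` is not required, while omitting slot `"2"` is reported.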
POST /v1/infer
Runs one policy inference request.
Request schema:
```json
{
  "observation": {
    "prompt": "pick up a cube and move it to the green point",
    "images": {
      "1": "<front camera base64 encoded image>",
      "2": "<left wrist camera base64 encoded image>",
      "3": "<right wrist camera base64 encoded image>"
    },
    "state": [0.0, 0.0, 0.0]
  },
  "sampling": {
    "num_steps": 10,
    "cfg_scale": 1.5,
    "seed": 42
  }
}
```
Response schema:
```json
{
  "actions": [
    [0.01, 0.02, 0.03, 0.0, 0.0, 0.0, 1.0]
  ],
  "metadata": {
    "latency_ms": 58.2
  }
}
```
Notes:
- `images` keys must be 1-based numeric strings: `"1"`, `"2"`, ...
- Images are base64-encoded PNG/JPEG bytes.
- `state` is optional unless `/v1/capabilities` says it is required. When it is required, the HTTP layer only checks that the `state` key is present; concrete policies decide how to consume or validate its value.
- `sampling` is optional. Supported fields are `num_steps`, `cfg_scale`, and `seed`; any other fields are ignored before the request is dispatched to the policy.
- The response always uses the `actions` key; legacy `/process_frame` uses `response`.
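The notes above translate into a small request builder and response reader. This is a sketch: `build_infer_payload` and `parse_actions` are hypothetical names, images are assumed to arrive as already-encoded PNG/JPEG bytes ordered by slot, and dropping unsupported `sampling` keys client-side is optional because the server ignores them anyway.

```python
import base64
from typing import Any, Dict, List, Optional, Sequence

SUPPORTED_SAMPLING_KEYS = ("num_steps", "cfg_scale", "seed")


def build_infer_payload(prompt: str, images: Sequence[bytes],
                        state: Optional[Sequence[float]] = None,
                        sampling: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a /v1/infer body; images[0] goes to slot "1", images[1] to "2", ..."""
    observation: Dict[str, Any] = {
        "prompt": prompt,
        # Public slot keys are 1-based numeric strings.
        "images": {str(i): base64.b64encode(img).decode("ascii")
                   for i, img in enumerate(images, start=1)},
    }
    if state is not None:
        observation["state"] = list(state)
    payload: Dict[str, Any] = {"observation": observation}
    if sampling:
        # Keep only the fields the server documents as supported.
        payload["sampling"] = {k: v for k, v in sampling.items()
                               if k in SUPPORTED_SAMPLING_KEYS}
    return payload


def parse_actions(response: Dict[str, Any],
                  action_dim: Optional[int] = None) -> List[List[float]]:
    """Return the action chunk, optionally checking each row's width."""
    actions = response["actions"]
    if action_dim is not None:
        for row in actions:
            if len(row) != action_dim:
                raise ValueError(f"expected {action_dim} values per action, got {len(row)}")
    return actions
```

`action_dim` can be taken from `action_spec.action_dim` in the capabilities response.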
POST /v1/reset
Marks an episode boundary.
```bash
curl -X POST http://localhost:7891/v1/reset
```
Response:
```json
{"status": "ok"}
```
Most current policies are stateless and treat reset as a no-op. It matters for stateful policies such as memory-based models that keep cross-step or cross-episode context. Evaluators can safely call it at the beginning of every episode.
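An evaluator loop that resets at every episode boundary can stay independent of the HTTP library by taking a `post(path, body)` callable. The sketch below is illustrative; `run_episode` and the `post` signature are assumptions, and `body=None` stands for the empty-body reset request shown in the curl example.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

# post(path, body) -> decoded JSON response; body=None sends no request body.
# With requests this could be:
#   lambda path, body: requests.post("http://localhost:7891" + path,
#                                    json=body, timeout=30).json()
PostFn = Callable[[str, Optional[Dict[str, Any]]], Dict[str, Any]]


def run_episode(post: PostFn, observations: Iterable[Dict[str, Any]]) -> List[Any]:
    """Mark the episode boundary, then request one action chunk per observation."""
    if post("/v1/reset", None).get("status") != "ok":
        raise RuntimeError("reset failed")
    chunks: List[Any] = []
    for obs in observations:
        chunks.append(post("/v1/infer", {"observation": obs})["actions"])
    return chunks
```

Calling reset unconditionally costs one cheap request per episode and keeps the loop correct if a stateful policy is swapped in later.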
POST /v1/chat/completions
OpenAI-style chat endpoint for models whose policy declares VLM generation
support. Requests to VLA-only models return HTTP 501 (Not Implemented).
```json
{
  "model": "checkpoint-name",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe the image briefly."},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<base64 encoded image>"
          }
        }
      ]
    }
  ],
  "temperature": 0,
  "max_tokens": 16
}
```
For image-conditioned VLM generation, send each image as an OpenAI-compatible
image_url content part. The current server parser accepts data URLs such as
data:image/png;base64,... and combines them with the text parts before calling
policy.generate(). Pure text content is also accepted, but questions about
the visual scene should include at least one image.
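Building the request body above from raw image bytes only needs base64 and a data URL per image. The helpers below are a hypothetical sketch (`image_part` and `build_chat_request` are not Dexbotic functions), assuming PNG input by default; the body can then be sent with any HTTP client, and a 501 status means the served model is VLA-only.

```python
import base64
from typing import Any, Dict, List


def image_part(image_bytes: bytes, mime: str = "image/png") -> Dict[str, Any]:
    """Wrap encoded image bytes as an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def build_chat_request(model: str, question: str, images: List[bytes],
                       max_tokens: int = 16) -> Dict[str, Any]:
    """One user turn: the text part first, then one data-URL part per image."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": question}]
    content += [image_part(img) for img in images]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }
```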
Python client
Use DexClient for both legacy and v1 inference.
```python
import cv2
from dexbotic.client import DexClient

client = DexClient(
    base_url="http://localhost:7891",
    api_style="v1",
    use_delta=False,
    sampling={"num_steps": 10, "cfg_scale": 1.5},
)

obs = {
    "image": [
        cv2.cvtColor(cv2.imread("front.png"), cv2.COLOR_BGR2RGB),
        cv2.cvtColor(cv2.imread("left_wrist.png"), cv2.COLOR_BGR2RGB),
        cv2.cvtColor(cv2.imread("right_wrist.png"), cv2.COLOR_BGR2RGB),
    ],
    "state": [0.0] * 7,
}

client.reset()
action = client.act(obs, "put the moka pot on the stove")
```
For legacy clients:
```python
client = DexClient(base_url="http://localhost:7891", api_style="legacy")
```
use_delta is a client-side post-processing option. If it is true, the client
accumulates returned actions as deltas against the previous action. Choose this
based on the environment's control mode and the model's action_mode.
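The accumulation that `use_delta` describes can be sketched as a running sum over each returned chunk. This is an illustration, not DexClient's implementation: `DeltaAccumulator` is a hypothetical name, the initial action is assumed to come from the environment, and the optional `delta_mask` (for example, to pass a gripper channel through unchanged) is an assumption about how some environments treat individual dimensions.

```python
from typing import List, Optional, Sequence


class DeltaAccumulator:
    """Hypothetical sketch: turn delta actions into absolute targets."""

    def __init__(self, initial_action: Sequence[float],
                 delta_mask: Optional[Sequence[bool]] = None):
        self.prev = list(initial_action)
        # True: treat that dimension as a delta; False: use the value as-is.
        self.mask = list(delta_mask) if delta_mask is not None else [True] * len(self.prev)

    def reset(self, initial_action: Sequence[float]) -> None:
        """Restart accumulation at an episode boundary."""
        self.prev = list(initial_action)

    def __call__(self, chunk: Sequence[Sequence[float]]) -> List[List[float]]:
        out: List[List[float]] = []
        for step in chunk:
            if len(step) != len(self.prev):
                raise ValueError("action width does not match the previous action")
            # Each step is applied on top of the previous (already accumulated) action.
            self.prev = [p + d if m else d for p, d, m in zip(self.prev, step, self.mask)]
            out.append(list(self.prev))
        return out
```

Applying this to actions that are already absolute would compound them, which is why the choice has to follow the model's `action_mode`.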
Direct HTTP example
```python
import base64
import requests

with open("front.png", "rb") as f:
    image_1 = base64.b64encode(f.read()).decode("utf-8")
with open("left_wrist.png", "rb") as f:
    image_2 = base64.b64encode(f.read()).decode("utf-8")
with open("right_wrist.png", "rb") as f:
    image_3 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "observation": {
        "prompt": "pick up the cube",
        "images": {
            "1": image_1,
            "2": image_2,
            "3": image_3,
        },
        "state": [0.0] * 7,
    }
}

resp = requests.post("http://localhost:7891/v1/infer", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["actions"])
```
Benchmark configuration
Dexbotic benchmark configs select the protocol with api_style.
```yaml
base_url: "http://host:7891"
api_style: v1
```
Use api_style: legacy to keep the old /process_frame route.
Some environments also need action semantics configured. For example, ManiSkill2 uses delta end-effector control and its VLA agent requires:
```yaml
use_delta: true
```
Always check /v1/capabilities for action_mode and state.required before
reusing a checkpoint in a different benchmark.
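The two fields named above can be checked mechanically before a run. The sketch below is hypothetical (`check_benchmark_config` is not part of Dexbotic) and deliberately conservative: it only flags the clear mismatch of accumulating absolute actions as deltas, and for relative actions it asks for manual confirmation, since the right `use_delta` value depends on the environment (ManiSkill2, for example, uses `use_delta: true`).

```python
from typing import Any, Dict, List


def check_benchmark_config(caps: Dict[str, Any], config: Dict[str, Any],
                           env_provides_state: bool) -> List[str]:
    """Hypothetical pre-flight check of a benchmark config against /v1/capabilities."""
    notes: List[str] = []
    mode = caps.get("action_spec", {}).get("action_mode", "unknown")
    use_delta = bool(config.get("use_delta", False))
    if mode == "absolute" and use_delta:
        # Accumulating already-absolute actions as deltas would compound them.
        notes.append("action_mode is absolute but use_delta is true")
    elif mode == "relative":
        # The correct use_delta depends on the environment's control mode.
        notes.append("action_mode is relative: confirm use_delta matches the environment")
    elif mode == "unknown":
        notes.append("action_mode is unknown: confirm action semantics manually")
    state_required = caps.get("modalities", {}).get("state", {}).get("required", False)
    if state_required and not env_provides_state:
        notes.append("the policy requires state but the environment does not provide it")
    return notes
```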
Implementing v1 for a model
To expose /v1/infer, an inference config must build a policy:
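```python
# InferenceConfig is Dexbotic's base inference config class (import omitted here).
class MyInferenceConfig(InferenceConfig):
    def _build_policy(self):
        # Hand the loaded model, tokenizer, normalization stats, and camera
        # order to the policy wrapper that serves /v1/infer.
        return MyPolicy(
            model=self.model,
            tokenizer=self.tokenizer,
            norm_stats=self.norm_stats,
            camera_order=self.camera_order,
        )
```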
The policy should subclass BasePolicy and implement select_action():
```python
from dexbotic.policy.base_policy import BasePolicy
from dexbotic.policy.types import ActionOutput


class MyPolicy(BasePolicy):
    action_mode = "relative"
    state_used = False
    state_required = False

    def select_action(self, observation, sampling_config=None):
        # observation contains prompt, internal image/0, image/1, ..., and optional state.
        # HTTP image keys are public 1-based slots: "1", "2", ...
        actions = ...
        return [ActionOutput(actions=actions)]
```
Policy declarations drive /v1/capabilities:
- `action_mode`: `absolute`, `relative`, or `unknown`
- `state_used`: whether the model consumes state
- `state_required`: whether requests must include a `state` key
- `state_dim`: expected state dimension, if known
- `max_batch_size`: maximum supported batch size
- `supports_vlm()`: whether `/v1/chat/completions` is available
If a model has episode-level memory, override reset().
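To illustrate the declarations and the `reset()` override together, the sketch below uses minimal local stand-ins for `BasePolicy` and `ActionOutput` so it runs without Dexbotic installed; the real classes may define more. `RunningMemoryPolicy` is a toy stateful policy, and `describe()` is a hypothetical mapping from declarations to capability-style fields, not the server's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ActionOutput:  # stand-in for dexbotic.policy.types.ActionOutput
    actions: Any


class BasePolicy:  # minimal stand-in for dexbotic.policy.base_policy.BasePolicy
    action_mode = "unknown"
    state_used = False
    state_required = False
    state_dim: Optional[int] = None
    max_batch_size = 1

    def supports_vlm(self) -> bool:
        return False

    def reset(self) -> None:
        pass  # stateless default: reset is a no-op

    def select_action(self, observation, sampling_config=None):
        raise NotImplementedError


class RunningMemoryPolicy(BasePolicy):
    """Toy stateful policy: remembers every prompt seen since the last reset."""
    action_mode = "relative"

    def __init__(self, action_dim: int = 7):
        self.action_dim = action_dim
        self.history: List[str] = []

    def reset(self) -> None:
        # Clear episode-level memory; intended to run on /v1/reset.
        self.history.clear()

    def select_action(self, observation: Dict[str, Any], sampling_config=None):
        self.history.append(observation["prompt"])
        # Placeholder for real model inference: one all-zero action.
        return [ActionOutput(actions=[[0.0] * self.action_dim])]


def describe(policy: BasePolicy) -> Dict[str, Any]:
    """Hypothetical mapping from policy declarations to capability fields."""
    return {
        "vla": True,
        "vlm": policy.supports_vlm(),
        "state": {"used": policy.state_used, "required": policy.state_required,
                  "dim": policy.state_dim},
        "action_spec": {"action_mode": policy.action_mode},
        "max_batch_size": policy.max_batch_size,
    }
```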
Model notes
Current policy wrappers cover the main VLA inference paths used by the benchmark integration. The table below shows the default declarations made by the current serving wrappers; these values are not intrinsic properties of a model architecture.
| Model family | Default v1 action mode | State |
|---|---|---|
| Pi0 / Pi05-style policy | absolute | optional, used when provided |
| DM0 | absolute | not required by current policy |
| OFT / OFT-discrete | relative | not required by current policy |
| CogACT | relative | not required |
| DiscreteVLA / GR00T-N1 | relative | not required |
| MemVLA | relative | reset is meaningful for memory state |
action_mode is a serving contract: it declares how the current policy wrapper
interprets and returns actions after model-specific post-processing. It is not
read automatically from the checkpoint. OFT and CogACT default to relative
because the existing legacy inference paths and benchmark adapters have used
their outputs with relative/delta semantics. If a checkpoint was trained and
post-processed to produce absolute actions, the policy declaration or benchmark
configuration must be changed accordingly.
A checkpoint still needs to match the environment it is evaluated on: camera setup, action space, normalization/denormalization statistics, gripper convention, and control mode are checkpoint- and benchmark-specific.
Navigation-oriented models such as NaviLA, MuVLA, and UniNaVid use different interaction patterns and were not folded into this VLA policy interface in this round. They should be integrated with a separate policy contract when their inputs, memory reset semantics, and output actions are standardized.
Compatibility
- `/process_frame` is unchanged for existing clients.
- v1 uses JSON and base64 images instead of multipart form data.
- The server runs Flask with `threaded=False`; concurrent benchmark jobs are serialized by the service process.
- `seed` in `sampling` sets the Python, NumPy, and Torch random seeds for that request. Avoid relying on it for concurrent multi-threaded serving.
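Per-request seeding touches process-global RNG state, so in a multi-threaded server another request could reseed or draw numbers between seeding and inference. The sketch below shows one way to keep a seeded call reproducible by holding a lock for the whole seed-then-run section; `seed_everything` and `run_seeded` are hypothetical names, NumPy and Torch are treated as optional, and the lock only protects against other callers of `run_seeded`, not against unrelated code using the same global RNGs.

```python
import random
import threading
from typing import Any, Callable, Optional

_SEED_LOCK = threading.Lock()


def seed_everything(seed: int) -> None:
    """Seed Python's RNG and, if installed, NumPy's and Torch's global RNGs."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def run_seeded(fn: Callable[[], Any], seed: Optional[int]) -> Any:
    """Run fn; when a seed is given, seed and run under a process-wide lock."""
    if seed is None:
        return fn()
    with _SEED_LOCK:
        seed_everything(seed)
        return fn()
```

With the default `threaded=False` server this lock is redundant, since requests are already serialized.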