DeepDetect Python CLI Specification

June 12, 2026 · View on GitHub

Summary

The Python CLI provides ready-to-use image training and inference workflows on top of the in-process deepdetect bindings. Version 1 is intentionally scoped to two model profiles:

yolox: object detection with bounding boxes.
segformer: semantic segmentation.

The canonical command shape is task first:

deepdetect train yolox ...
deepdetect train segformer ...
deepdetect infer yolox ...
deepdetect infer segformer ...
deepdetect job status RUN_DIR
deepdetect inspect models

All commands are discoverable through --help. Long-running training emits structured events that are easy for humans to watch and easy for agents to parse. YAML config files are supported, and explicit CLI flags override config values.

Configuration

Each train and inference command accepts:

--config CONFIG.yaml
--set key=value
--output-format jsonl|json|text

Configuration precedence is:

Model profile defaults.
YAML config file.
CLI flags.
--set overrides.

YAML is intended for repeatable runs. CLI flags are intended for discoverable interactive use and agent function-call arguments. Config files and --set map to top-level CLI option names such as base_lr=0.001, iter_size=2, batch_size=4, or visdom=true. Training YAML files may also define an augmentation mapping. This mapping is deep-merged into the selected model profile's training augmentation defaults without adding one command-line flag per augmentation parameter. Class weighting is available as a YAML-only top-level class_weights list.

Example default-style configs are provided next to this document:

yolox-default.yaml
segformer-default.yaml

They include training keys plus inference-only keys that are ignored by training commands. Replace the dataset and model paths before use:

deepdetect train yolox --config bindings/python/deepdetect/cli/yolox-default.yaml
deepdetect infer yolox image.jpg --config bindings/python/deepdetect/cli/yolox-default.yaml

Training Commands

Required inputs may come from CLI flags or config:

deepdetect train yolox \
  --train-data train.txt \
  --test-data test0.txt test1.txt \
  --weights yolox-nano_cls2.pt \
  --repository runs/yolox-model \
  --nclasses 2

deepdetect train segformer \
  --train-data train.txt \
  --test-data test0.txt test1.txt \
  --weights segformer-b0-cls2.pt \
  --repository runs/segformer-model \
  --nclasses 13

Shared training options:

--test-data PATH [PATH ...]: one or more test dataset lists. The first test set is reported as test0, the second as test1, and so on when the backend emits per-test-set metrics.
--gpu / --no-gpu: request or disable GPU execution.
--gpuid ID [ID ...]: select one or more GPU ids. Comma-separated values are also accepted, for example --gpuid 0,1. Use --gpuid -1 for all GPUs. Supplying --gpuid implies --gpu unless --no-gpu is explicitly passed, which is rejected.
--width, --height: image input size. Defaults stay profile-specific: YOLOX uses 640x640, SegFormer uses 480x480.
--run-name: name for the training run. Defaults strictly to the directory name from --repository. No date, model prefix, or random suffix is added unless an explicit --run-name is provided.
--resume latest|best: resume training from --repository instead of staging --weights. latest uses the newest checkpoint and solver state found in the repository. best reads best_model.txt and resumes the matching checkpoint-N.* and solver-N.pt files.
--iterations, --batch-size, --iter-size, --base-lr, --test-interval: common solver controls. --iter-size controls gradient accumulation before each optimizer update.
--service-name: DeepDetect service name.
--job-dir: directory for CLI run manifests. Defaults to --repository. When omitted, run.json and metrics.jsonl are written directly in the model repository. When explicitly set, the run files are written under <job-dir>/<run-name>/.
--poll-interval, --timeout: async monitoring controls.
--terminal verbose|live: choose stdout behavior for async training. verbose is the default and preserves --output-format. live uses a fixed-size Rich display when stdout is a TTY and falls back to JSONL when it is not.
--sync / --no-sync: use blocking DeepDetect training or the default async job.
--dataset-check full|none: choose dataset validation mode before training. full is the default and performs the model-specific validation described below. none skips dataset validation for fast startup on datasets that have already been checked.

Training YAML files can override augmentation values for both YOLOX and SegFormer:

augmentation:
  mirror: true
  rotate: true
  crop_size: 384
  cutout: 0.25
  geometry:
    prob: 0.3
    zoom_in: false
  noise:
    prob: 0.02
  distort:
    prob: 0.02
class_weights: [1.0, 0.5, 2.0]

Only keys that are provided need to be listed; nested values are merged with the profile defaults. class_weights is passed to DeepDetect as mllib_parameters.class_weights.

Training creates a run directory containing:

run.json: run ID, command, model, service name, config snapshot, job ID when available, status, and latest status body.
metrics.jsonl: metric events extracted from DeepDetect status payloads.

Training also writes config.yaml at the root of --repository. This file is the effective training configuration after profile defaults, YAML config, explicit CLI arguments, and --set overrides have all been applied.

On resume, --weights is optional because the model state comes from --repository. When Visdom is enabled, the CLI replays previous metric history from --repository/metrics.json into the run environment before streaming new points, then ignores already-seen history points from the first live status poll.

The CLI emits events such as run_started, dataset_check, training_status, metric, and run_finished. Dataset checks are deliberately structured as extension points. With --dataset-check full, list files are checked for basic structure; YOLOX validates image paths, bbox sidecar paths, bbox line format, class ranges, and numeric bbox values; SegFormer validates mask values unless --skip-mask-validation is passed. With --dataset-check none, the CLI emits a skipped dataset-check event and leaves dataset validation to the DeepDetect backend.

Visdom Metric Sink

Training can also stream scalar losses and metrics to Visdom:

deepdetect train yolox ... --visdom --run-name ring-hand-yolox

Visdom options:

--visdom / --no-visdom: enable or disable Visdom streaming.
--visdom-server: server URL, default http://localhost.
--visdom-port: server port, default 8097.
--visdom-base-url: base URL, default /.
--visdom-offline-ok / --no-visdom-offline-ok: continue with a warning if Visdom is unavailable, default enabled.
--visdom-save / --no-visdom-save: call vis.save([run_name]) on close.
--visdom-results / --no-visdom-results: upload sampled test-set prediction overlays, default enabled when Visdom is enabled.
--visdom-results-count: number of sampled result images per test set, default 10. Use 0 to disable result image uploads.
--visdom-results-seed: random seed for deterministic result-image resampling.

The Visdom environment name is the training run_name. Loss-like metrics whose names contain loss are each sent to their own line plot. mAP metrics are split by metric name, for example map, map-05, map-50, and map-90 use separate windows. Per-test-set metrics such as map-50_test0 and map-50_test1 are sent to the same map-50 window with traces named test0 and test1. Other numeric metrics are grouped into scale-compatible line plots. Non-numeric metrics are ignored with a one-time warning. Non-finite numeric values such as NaN and Inf are silently skipped for Visdom. Result images are resampled from every test set at each evaluation interval. Detection results draw predicted boxes and labels on input-sized images; segmentation results show mask overlays on input-sized images.

Inference Commands

Inference accepts one or more images and supports batches:

deepdetect infer yolox image1.jpg image2.jpg \
  --weights yolox-nano_cls2.pt \
  --repository runs/yolox-model \
  --batch-size 2 \
  --confidence-threshold 0.25 \
  --benchmark

deepdetect infer segformer image1.png image2.png \
  --weights segformer-b0-cls2.pt \
  --repository runs/segformer-model \
  --batch-size 2 \
  --visualize \
  --output outputs/

Shared inference options:

--gpu / --no-gpu: request or disable GPU execution.
--gpuid ID [ID ...]: select one or more GPU ids. Comma-separated values are also accepted, for example --gpuid 0,1. Use --gpuid -1 for all GPUs. Supplying --gpuid implies --gpu unless --no-gpu is explicitly passed, which is rejected.
--width, --height: image input size. Defaults stay profile-specific: YOLOX uses 640x640, SegFormer uses 480x480.
--batch-size: number of images per DeepDetect predict call.
--benchmark / --no-benchmark: emit or suppress benchmark statistics.
--warmup: run unmeasured warmup predictions before benchmarking.
--visualize / --no-visualize: produce or suppress visual outputs.
--output: file or directory for visual outputs.

Prediction events include the image path, per-image wall-clock inference time, and the raw DeepDetect prediction payload, including available confidence values.

Monitoring

Version 1 uses DeepDetect async training plus structured stdout as the primary agent monitoring path. Agents can monitor the active process by reading JSONL events and can inspect the latest persisted status with:

deepdetect job status MODEL_REPOSITORY

An embedded REST status endpoint is intentionally deferred. A future HTTP adapter should reuse the same run manifest and DeepDetect job polling logic.

Extension Points

The implementation is split into small modules so future model families and agent interfaces can be added without changing the command contract:

Model profiles define service, train, and predict parameter defaults.
Dataset checks return structured results and may be expanded into richer validators.
Metric sinks accept metric events and can later forward them to TensorBoard, MLflow, Weights & Biases, or a DeepDetect-specific dashboard.
Visualization helpers are optional and only run when requested.