Inference extension cli

May 29, 2026 · View on GitHub

RBG (RoleBasedGroup) CLI extension for LLM inference workload management on Kubernetes.

Features

  • Service Management — Deploy, list, delete LLM inference services, interactive chat
  • Model Management — List and pull models from various sources (HuggingFace, ModelScope, etc.)
  • Benchmark — Run LLM inference benchmarks, view logs and results via dashboard
  • Auto-Benchmark — Automated benchmark orchestration with SLA evaluation, parameter search (Optuna), convergence analysis
  • Configuration — Manage engine, storage, and data source plugin configurations
  • Generate — Generate RBG deployment manifests from templates
  • Visualization — Web dashboards for experiment overview, parameter comparison, and convergence analysis

Project Structure

cmd/
├── llmctl/             # CLI entry point (llmctl binary)
└── autobenchmark/      # Auto-benchmark controller binary
pkg/
├── autobenchmark/      # Auto-benchmark core logic (controller, config, search, lifecycle)
├── config/             # Shared configuration
├── plugin/             # Plugin system (engine, source, storage)
└── util/               # Utilities
ui/
├── auto-benchmark/     # Auto-benchmark dashboard (React)
└── benchmark/          # Benchmark viewer (React)
tools/
├── genai/              # genai-bench Docker image build
└── optuna/             # Optuna bridge for parameter search
doc/
└── usage/              # CLI usage documentation

Prerequisites

  • Go 1.26+
  • Docker (for image builds)
  • Access to a Kubernetes cluster with RBG installed

Build

# Build CLI binary (current platform)
make build-cli

# Build for all platforms (linux/darwin, amd64/arm64)
make build-cli-all

# Build all binaries (CLI + autobenchmark controller + dashboard)
make build-all

# Install to GOPATH/bin
make install

Docker Images

# Build all images
make docker-build

# Build individual images
make docker-build-autobenchmark-ctl
make docker-build-benchmark-dashboard
make docker-build-autobenchmark-dashboard

# Multi-arch build and push (linux/amd64 + linux/arm64)
make docker-buildx

# Push images
make docker-push

Override registry:

IMG_REPO=your-registry.com/namespace make docker-build

Development

# Run tests
make test

# Format code
make fmt

# Lint
make lint

# Tidy modules
make tidy

Usage

See doc/usage/ for detailed CLI documentation.

# Basic usage
llmctl --help

# Service management
llmctl svc run <name> <model-id> [--engine vllm]
llmctl svc list
llmctl svc delete <name>

# Benchmark
llmctl benchmark run <rbg-name> [--config <config.yaml>]
llmctl benchmark list <rbg-name>
llmctl benchmark dashboard

# Model operations
llmctl model list
llmctl model pull <model-id>

# Configuration
llmctl config view
llmctl config get-engines
llmctl config get-sources
llmctl config get-storages

License

Apache License 2.0