RoleBasedGroup (RBG)
June 16, 2026
English | 简体中文
A Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.
Official Website: rolebasedgroup.github.io
Architecture

Latest News
| Date | Release | Highlights |
|---|---|---|
| 2026-06-11 | v0.7.0 | v1alpha2 API stable release, conversion webhooks, CLI multi-node LLM serving, pod port allocator, coordinated policies, gang scheduling |
| 2026-02-18 | v0.6.0 | Coordinated scaling, stateful InstanceSet |
| 2025-12-03 | v0.5.0 | Native InstanceSet, in-place updates, Mooncake integration |
| 2025-09-23 | v0.4.0 | RBGS scaling, Volcano podgroup support |
Why RBG?
Traditional Kubernetes primitives (StatefulSets, Deployments) struggle with LLM inference services, which pose the following challenges:
| Challenge | Description |
|---|---|
| Multi-role topologies | gateway → router → prefill → decode |
| Performance-sensitive | GPU/network topology matters |
| Atomic operations | deploy, upgrade, scale, failover across roles |
RBG treats an inference service as a role-based group: a topology-aware, stateful, coordinated multi-role organism managed as a single unit.
Key Concepts
| Concept | Description |
|---|---|
| Role | Basic scheduling and rollout unit. Each role (e.g., prefill, decode) has its own spec, lifecycle, and policies. |
| RoleBasedGroup | A group of roles forming one logical service (e.g., one LLM inference deployment). |
| RoleInstance | A collection of Pods with a tightly bound lifecycle. Supports in-place updates and controls upgrades and status for the Pod group. |
| CoordinatedPolicy | A separate CRD for coordinating operations across roles. Controls maxSkew and progression during rolling updates and scaling. |
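To make the idea of a skew bound concrete, here is a minimal sketch of how a maximum skew could gate a coordinated rollout across two roles. It is an illustration only: the function names and the reading of skew as a gap in percentage points of update progress are assumptions, not RBG's documented semantics or controller logic.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; RBG's actual controller logic is not shown in this README.
# Assumption: "skew" is the gap, in percentage points, between the update
# progress of a role and that of its peer role.

# progress_pct UPDATED TOTAL: prints the integer percentage of updated replicas.
progress_pct() {
  local updated=$1 total=$2
  echo $(( updated * 100 / total ))
}

# can_advance ROLE_UPDATED ROLE_TOTAL PEER_UPDATED PEER_TOTAL MAX_SKEW
# Succeeds if updating one more replica of the role keeps it at most
# MAX_SKEW percentage points ahead of its peer.
can_advance() {
  local next peer
  next=$(progress_pct $(( $1 + 1 )) "$2")
  peer=$(progress_pct "$3" "$4")
  (( next - peer <= $5 ))
}

# Example: prefill has 4 replicas, decode has 2, and the bound is 25 points.
if can_advance 0 4 0 2 25; then echo "prefill may update its first replica"; fi
if ! can_advance 1 4 0 2 25; then echo "prefill waits for decode to catch up"; fi
```

Under these assumptions, prefill can move to 25% while decode is at 0%, but must wait before reaching 50% until decode has progressed.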
Key Features: SCOPE
| Capability | Description |
|---|---|
| Stable | Topology-aware deterministic operations with unique RoleID injection |
| Coordination | Cross-role policy engine: deployment pairing, coordinated upgrades, linked recovery |
| Orchestration | Role dependencies, precise startup sequences, topology self-aware service discovery |
| Performance | Hardware affinity scheduling: GPU-NVLink → PCIe → RDMA → VPC |
| Extensible | Declarative APIs and plugin mechanisms for future architectures |
Getting Started
Installation
Install from GitHub Releases (latest version):
VERSION=$(curl -sL https://api.github.com/repos/sgl-project/rbg/releases/latest | grep '"tag_name"' | sed -E 's/.*"v([^"]+)".*/\1/')
helm upgrade --install rbgs https://github.com/sgl-project/rbg/releases/download/v$VERSION/rbgs-$VERSION.tgz \
--namespace rbgs-system --create-namespace --wait
For detailed instructions, see Installation Guide.
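The `VERSION=` line in the install command strips the leading `v` from the release tag so the same value can be used in both the download path and the chart file name. The sketch below runs that same `grep`/`sed` pipeline against a trimmed sample payload instead of the live API; the sample JSON and the `extract_version` helper name are illustrative assumptions.

```shell
#!/usr/bin/env bash
# extract_version reads GitHub "releases/latest" JSON on stdin and prints the
# tag without its leading "v", using the same pipeline as the install command.
extract_version() {
  grep '"tag_name"' | sed -E 's/.*"v([^"]+)".*/\1/'
}

# Trimmed, illustrative payload standing in for the API response.
sample='{
  "tag_name": "v0.7.0",
  "name": "v0.7.0"
}'

version=$(printf '%s\n' "$sample" | extract_version)
echo "rbgs-$version.tgz"   # prints rbgs-0.7.0.tgz
```

If the pipeline ever prints an empty string (for example when the API request is rate limited), the Helm URL will be malformed, so it is worth checking `$VERSION` before running `helm upgrade`.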
Quick Start
Deploy a basic RoleBasedGroup with two roles and startup dependencies:
apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: nginx-cluster
spec:
  roles:
    - name: frontend
      replicas: 1
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 80
    - name: backend
      replicas: 3
      dependencies: ["frontend"]  # backend starts after frontend is ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 8080
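The `dependencies` field in the manifest above implies a startup order: a role starts only after the roles it depends on are ready. As a rough illustration of that ordering (not of how the RBG controller computes it), the sketch below derives an order from "role dependency" pairs with the standard `tsort` utility. The `startup_order` helper is a hypothetical name.

```shell
#!/usr/bin/env bash
# startup_order reads "role dependency" pairs on stdin, one per line, and
# prints the roles so that every dependency comes before its dependents.
startup_order() {
  # tsort expects "before after" pairs, so swap the two columns first.
  awk '{ print $2, $1 }' | tsort
}

# backend depends on frontend, as in the manifest above.
printf '%s\n' 'backend frontend' | startup_order
# prints:
# frontend
# backend
```

`tsort` also reports an error when the pairs contain a cycle, which is a quick way to sanity-check a dependency list before applying a manifest.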
Deployment Patterns
| Pattern | Used For | Description |
|---|---|---|
| standalonePattern | Single-node deployment | Single pod per instance |
| leaderWorkerPattern | Multi-node distributed deployment | Leader + workers for tensor parallelism |
RoleTemplates
Reduce configuration duplication with reusable templates:
spec:
  roleTemplates:
    - name: base-template
      template:
        spec:
          containers:
            - name: nginx
              image: nginx:1.14.1
  roles:
    - name: frontend
      replicas: 2
      standalonePattern:
        templateRef:
          name: base-template
    - name: backend
      replicas: 3
      standalonePattern:
        templateRef:
          name: base-template
          patch:  # role-specific overrides
            spec:
              containers:
                - name: nginx
                  resources:
                    requests:
                      memory: "128Mi"
CLI Tool
kubectl-rbg is a CLI tool for managing RBG resources and LLM deployments.
Installation
# Build from source
make build-cli
chmod +x bin/kubectl-rbg
sudo mv bin/kubectl-rbg /usr/local/bin/
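kubectl discovers plugins as executables named `kubectl-<name>` on `PATH`, which is why the binary is moved to `/usr/local/bin/`. A small check like the following can confirm the plugin is visible before first use; the `has_plugin` helper is a hypothetical name introduced for this sketch.

```shell
#!/usr/bin/env bash
# has_plugin NAME: succeeds if an executable named kubectl-NAME is on PATH,
# which is the convention kubectl uses to find plugins.
has_plugin() {
  command -v "kubectl-$1" >/dev/null 2>&1
}

if has_plugin rbg; then
  echo "kubectl-rbg found; it can be invoked as: kubectl rbg"
else
  echo "kubectl-rbg not found on PATH" >&2
fi
```

`kubectl plugin list` gives a similar view across all installed plugins.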
Inference Examples
Prefill/Decode Disaggregated
SGLang PD-disaggregated examples in examples/inference/:
| Example | Pattern | Description |
|---|---|---|
| pd-disagg-standalone.yaml | standalonePattern | Single pod per role, suitable for single-GPU instances |
| pd-disagg-leader-worker.yaml | leaderWorkerPattern | Multi-GPU tensor parallelism for decode role |
Aggregated Inference
SGLang aggregated examples:
| Example | Pattern | Description |
|---|---|---|
| agg-standalone.yaml | standalonePattern | Single-GPU aggregated inference |
| agg-leader-worker.yaml | leaderWorkerPattern | Multi-GPU tensor parallelism |
Ecosystem Integration
RBG integrates with ecosystem components for production LLM inference:
NVIDIA Dynamo
NVIDIA Dynamo is an open-source, datacenter-scale inference stack that orchestrates multi-node AI workloads above inference engines like vLLM and SGLang:
| Example | Description |
|---|---|
| dynamo/pd-disagg.yaml | PD-disaggregated with Dynamo SGLang runtime |
| dynamo/pd-disagg-multi-nodes.yaml | Multi-node PD-disaggregated |
| dynamo/agg.yaml | Aggregated inference with Dynamo |
| dynamo/agg-multi-nodes.yaml | Multi-node aggregated |
Mooncake
Mooncake is a disaggregated architecture for LLM serving, providing KV cache transfer and reuse across distributed inference:
| Example | Description |
|---|---|
| mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml | PD-disaggregated with KV cache reuse |
| mooncake-store/agg-kvcache-reuse-with-mooncake.yaml | Aggregated with KV cache reuse |
| mooncake-transfer-engine/sgl-pd-disagg-with-mooncake-te.yaml | SGLang PD-disaggregated with transfer engine |
| mooncake-transfer-engine/vllm-pd-disagg-with-mooncake-te.yaml | vLLM PD-disaggregated with transfer engine |
Examples Directory
Basic Examples (examples/basic/)
| Path | Description |
|---|---|
| rbg/base.yaml | Basic RoleBasedGroup with role dependencies |
| rbg/dependency/ | Role dependency configurations |
| rbg/patterns/ | Deployment patterns: standalone, leader-worker, custom-components |
| rbg/scheduling/ | Gang scheduling: Volcano, scheduler-plugins |
| rbg/update-strategy/ | Rolling update with partition support |
| rbg/restart-policy/ | Restart policy configurations |
| rbg/scaling/ | Scaling adapter with HPA integration |
| rbg/role-template/ | RoleTemplates for reducing duplication |
| coordinated-policy/ | Coordinated rollout and scaling policies |
| engine-runtime/ | Engine runtime profile configurations |
Inference Examples (examples/inference/)
| Path | Description |
|---|---|
| agg-standalone.yaml | Aggregated SGLang (standalone pattern) |
| agg-leader-worker.yaml | Aggregated (leader-worker pattern) |
| pd-disagg-standalone.yaml | Prefill/Decode disaggregated (standalone) |
| pd-disagg-leader-worker.yaml | Prefill/Decode disaggregated (leader-worker) |
| ecosystem/ | NATS, etcd, Dynamo, Mooncake integration |
| ecosystem/dynamo/ | NVIDIA Dynamo runtime examples |
| ecosystem/mooncake/ | Mooncake KV cache transfer engine |
Documentation
| Source | Link |
|---|---|
| Official Docs | rolebasedgroup.github.io |
| Local Docs | doc/TOC.md |
Version Compatibility
| RBG Version | Kubernetes | LeaderWorkerSet |
|---|---|---|
| v0.7.0 | >=v1.22.x | Not Required |
| v0.6.0 | >=v1.28.x | >=v0.7.0 |
| v0.5.0 | >=v1.28.x | >=v0.6.0 |
| v0.4.0 | >=v1.28.x | >=v0.7.0 |
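The table can be turned into a quick pre-install check. The sketch below compares a cluster version against the minimum Kubernetes version listed for each RBG release. It reads `>=v1.22.x` as `1.22.0`, relies on `sort -V` from GNU coreutils, and uses a hypothetical cluster version, all of which are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# min_k8s RBG_VERSION: prints the minimum Kubernetes version from the table,
# reading ">=v1.22.x" as 1.22.0. Fails for releases not listed there.
min_k8s() {
  case "$1" in
    v0.7.0) echo "1.22.0" ;;
    v0.6.0|v0.5.0|v0.4.0) echo "1.28.0" ;;
    *) return 1 ;;
  esac
}

cluster="1.27.4"   # hypothetical cluster version, for illustration
for rbg in v0.7.0 v0.6.0; do
  if version_ge "$cluster" "$(min_k8s "$rbg")"; then
    echo "$rbg: Kubernetes $cluster is new enough"
  else
    echo "$rbg: Kubernetes $cluster is too old"
  fi
done
```

In practice the cluster version would come from `kubectl version` rather than a hard-coded string.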
Ecosystem Projects
The rolebasedgroup GitHub organization hosts companion projects that extend RBG with autoscaling, CLI tooling, AI agent integration, and documentation:
| Project | Description |
|---|---|
| rbg-planner | Engine-agnostic, SLA-driven autoscaler for LLM inference on Kubernetes. Supports SGLang, vLLM, NVIDIA Dynamo via pluggable metrics adapters. Uses ARIMA-based load prediction and automatic SLA profiling to scale prefill/decode roles to meet TTFT/ITL latency targets. |
| inference-engine-runtime | Python-based sidecar runtime for AI inference engines. Provides LoRA adapter management, unified Prometheus metrics, and distributed topology management for SGLang and vLLM engines. |
| inference-ext-cli | RBG CLI extension (llmctl) for LLM inference workload management. Provides service/model management, benchmark orchestration, automated parameter search (Optuna), convergence analysis, and web dashboards for experiment visualization. |
| rbg-agent-guide | AI agent skill guides for RBG operations. Provides deployment skills for AI coding assistants (e.g., Claude Code) to help users deploy LLM models to Kubernetes using RBG CRD and CLI. |
| rolebasedgroup.github.io | Official RBG documentation website built with Docusaurus, deployed at rolebasedgroup.github.io. |
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Verify copyright headers
make copyright-check
# Add missing headers
make copyright-fix
Community
| Channel | Link |
|---|---|
| Slack | #rbg channel |
| Issues | GitHub Issues |
| Discussions | Community Discussions |
Code of Conduct
This project follows the Kubernetes Code of Conduct.
Acknowledgment
RBG is inspired by and reuses code from LeaderWorkerSet (LWS).