RoleBasedGroup (RBG) 🚀

June 16, 2026



🎯 A Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.

🌐 Official Website: rolebasedgroup.github.io


🏗️ Architecture

[Figure: RBG architecture]


📰 Latest News

| Date | Release | Highlights |
|---|---|---|
| 2026-06-11 | v0.7.0 | v1alpha2 API stable release, conversion webhooks, CLI multi-node LLM serving, pod port allocator, coordinated policies, gang scheduling |
| 2026-02-18 | v0.6.0 | Coordinated scaling, stateful InstanceSet |
| 2025-12-03 | v0.5.0 | Native InstanceSet, in-place updates, Mooncake integration |
| 2025-09-23 | v0.4.0 | RBGS scaling, Volcano podgroup support |

🤔 Why RBG?

Traditional Kubernetes primitives (StatefulSets, Deployments) struggle with LLM inference services, which typically involve:

| Challenge | Description |
|---|---|
| Multi-role topologies | gateway → router → prefill → decode |
| Performance sensitivity | GPU and network topology directly affect performance |
| Atomic operations | deploy, upgrade, scale, and fail over across roles together |

RBG treats an inference service as a role-based group: a topology-aware, stateful set of coordinated roles managed as a single unit.


🎯 Key Concepts

| Concept | Description |
|---|---|
| Role | The basic scheduling and rollout unit. Each role (e.g., prefill, decode) has its own spec, lifecycle, and policies. |
| RoleBasedGroup | A group of roles forming one logical service (e.g., one LLM inference deployment). |
| RoleInstance | A set of Pods with a tightly coupled lifecycle. Supports in-place updates and manages upgrades and status for the Pod group. |
| CoordinatedPolicy | A separate CRD that coordinates operations across roles, controlling maxSkew and progression during rolling updates and scaling. |
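As a minimal sketch of how Role and RoleBasedGroup fit together, the manifest below models the gateway → router → prefill → decode topology from the "Why RBG?" table as one RoleBasedGroup with four roles. It reuses only fields that appear in the Quick Start example (`roles`, `replicas`, `dependencies`, `standalonePattern.template`). The group name and container images are hypothetical placeholders, and the startup order (prefill and decode first, then router, then gateway) is an illustrative assumption, not a requirement of RBG.

```yaml
apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: llm-service                # hypothetical name
spec:
  roles:
    - name: prefill
      replicas: 2
      standalonePattern:
        template:
          spec:
            containers:
              - name: engine
                image: registry.example.com/llm/engine:latest    # placeholder image
    - name: decode
      replicas: 2
      standalonePattern:
        template:
          spec:
            containers:
              - name: engine
                image: registry.example.com/llm/engine:latest    # placeholder image
    - name: router
      replicas: 1
      dependencies: ["prefill", "decode"]   # assumed order: start after both worker roles are ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: router
                image: registry.example.com/llm/router:latest    # placeholder image
    - name: gateway
      replicas: 1
      dependencies: ["router"]              # assumed order: start after the router is ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: gateway
                image: registry.example.com/llm/gateway:latest   # placeholder image
```

Because all four roles live in one object, upgrading or deleting `llm-service` acts on the whole topology rather than on four unrelated workloads.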

✨ Key Features: SCOPE

| Capability | Description |
|---|---|
| Stable | Topology-aware, deterministic operations with unique RoleID injection |
| Coordination | Cross-role policy engine: deployment pairing, coordinated upgrades, linked recovery |
| Orchestration | Role dependencies, precise startup sequences, topology-aware service discovery |
| Performance | Hardware-affinity scheduling: GPU-NVLink → PCIe → RDMA → VPC |
| Extensible | Declarative APIs and plugin mechanisms for future architectures |

🚀 Getting Started

📦 Installation

Install from GitHub Releases (latest version):

VERSION=$(curl -sL https://api.github.com/repos/sgl-project/rbg/releases/latest | grep '"tag_name"' | sed -E 's/.*"v([^"]+)".*/\1/')
helm upgrade --install rbgs https://github.com/sgl-project/rbg/releases/download/v$VERSION/rbgs-$VERSION.tgz \
            --namespace rbgs-system --create-namespace --wait

For detailed instructions, see Installation Guide.

🎮 Quick Start

Deploy a basic RoleBasedGroup with two roles and startup dependencies:

apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: nginx-cluster
spec:
  roles:
    - name: frontend
      replicas: 1
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 80

    - name: backend
      replicas: 3
      dependencies: ["frontend"]  # backend starts after frontend is ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 8080

Deployment Patterns

| Pattern | Used For | Description |
|---|---|---|
| standalonePattern | Single-node deployment | Single pod per instance |
| leaderWorkerPattern | Multi-node distributed deployment | Leader + workers for tensor parallelism |

RoleTemplates

Reduce configuration duplication with reusable templates:

spec:
  roleTemplates:
    - name: base-template
      template:
        spec:
          containers:
            - name: nginx
              image: nginx:1.14.1

  roles:
    - name: frontend
      replicas: 2
      standalonePattern:
        templateRef:
          name: base-template

    - name: backend
      replicas: 3
      standalonePattern:
        templateRef:
          name: base-template
          patch:  # role-specific overrides
            spec:
              containers:
                - name: nginx
                  resources:
                    requests:
                      memory: "128Mi"
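To make the override concrete: assuming `patch` is merged into the referenced template the way a Kubernetes strategic merge patch merges `containers` by `name` (an assumption; the examples under rbg/role-template/ are the authoritative reference for the exact semantics), the backend role above ends up with an effective Pod template roughly like the following, while the frontend role uses `base-template` unchanged.

```yaml
# Effective Pod template for the "backend" role (illustrative, not generated output)
template:
  spec:
    containers:
      - name: nginx                 # same name as in base-template, so the two entries merge
        image: nginx:1.14.1         # inherited from base-template
        resources:
          requests:
            memory: "128Mi"         # added by the role-specific patch
```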

🖥️ CLI Tool

kubectl-rbg is a CLI tool for managing RBG resources and LLM deployments.

Installation

# Build from source
make build-cli
chmod +x bin/kubectl-rbg
sudo mv bin/kubectl-rbg /usr/local/bin/

🧠 Inference Examples

Prefill/Decode Disaggregated

SGLang PD-disaggregated examples in examples/inference/:

| Example | Pattern | Description |
|---|---|---|
| pd-disagg-standalone.yaml | standalonePattern | Single pod per role instance, suitable for single-GPU instances |
| pd-disagg-leader-worker.yaml | leaderWorkerPattern | Multi-GPU tensor parallelism for the decode role |
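For orientation before opening those files, here is a heavily trimmed sketch of a standalone prefill/decode layout that combines RoleTemplates with two roles. The group name, image tag, model, and port are assumptions; `--disaggregation-mode` is SGLang's `launch_server` option for selecting the prefill or decode side, but check it against your SGLang version. The real pd-disagg-standalone.yaml also configures the router, KV cache transfer, and networking, all of which this sketch omits.

```yaml
apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: sglang-pd-sketch                   # hypothetical name
spec:
  roleTemplates:
    - name: sglang-base
      template:
        spec:
          containers:
            - name: sglang
              image: lmsysorg/sglang:latest   # assumed image tag
              resources:
                limits:
                  nvidia.com/gpu: "1"         # one GPU per instance (standalone pattern)
  roles:
    - name: prefill
      replicas: 1
      standalonePattern:
        templateRef:
          name: sglang-base
          patch:
            spec:
              containers:
                - name: sglang
                  command: ["python3", "-m", "sglang.launch_server"]
                  args:
                    - --model-path
                    - Qwen/Qwen2.5-7B-Instruct   # example model
                    - --disaggregation-mode
                    - prefill                    # this role runs the prefill phase
                    - --port
                    - "30000"
    - name: decode
      replicas: 1
      standalonePattern:
        templateRef:
          name: sglang-base
          patch:
            spec:
              containers:
                - name: sglang
                  command: ["python3", "-m", "sglang.launch_server"]
                  args:
                    - --model-path
                    - Qwen/Qwen2.5-7B-Instruct   # example model
                    - --disaggregation-mode
                    - decode                     # this role runs the decode phase
                    - --port
                    - "30000"
```

Each role spells out its full `args` list because a list of strings is typically replaced rather than merged when a patch is applied.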

Aggregated Inference

SGLang aggregated examples:

| Example | Pattern | Description |
|---|---|---|
| agg-standalone.yaml | standalonePattern | Single-GPU aggregated inference |
| agg-leader-worker.yaml | leaderWorkerPattern | Multi-GPU tensor parallelism |

🔗 Ecosystem Integration

RBG integrates with ecosystem components for production LLM inference:

NVIDIA Dynamo

NVIDIA Dynamo is an open-source, datacenter-scale inference stack that orchestrates multi-node AI workloads above inference engines like vLLM and SGLang:

| Example | Description |
|---|---|
| dynamo/pd-disagg.yaml | PD-disaggregated with Dynamo SGLang runtime |
| dynamo/pd-disagg-multi-nodes.yaml | Multi-node PD-disaggregated |
| dynamo/agg.yaml | Aggregated inference with Dynamo |
| dynamo/agg-multi-nodes.yaml | Multi-node aggregated |

Mooncake

Mooncake is a disaggregated architecture for LLM serving that provides KV cache transfer and reuse across distributed inference instances:

| Example | Description |
|---|---|
| mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml | PD-disaggregated with KV cache reuse |
| mooncake-store/agg-kvcache-reuse-with-mooncake.yaml | Aggregated with KV cache reuse |
| mooncake-transfer-engine/sgl-pd-disagg-with-mooncake-te.yaml | SGLang PD-disaggregated with transfer engine |
| mooncake-transfer-engine/vllm-pd-disagg-with-mooncake-te.yaml | vLLM PD-disaggregated with transfer engine |

📂 Examples Directory

🧱 Basic Examples (examples/basic/)

| Path | Description |
|---|---|
| rbg/base.yaml | Basic RoleBasedGroup with role dependencies |
| rbg/dependency/ | Role dependency configurations |
| rbg/patterns/ | Deployment patterns: standalone, leader-worker, custom-components |
| rbg/scheduling/ | Gang scheduling: Volcano, scheduler-plugins |
| rbg/update-strategy/ | Rolling update with partition support |
| rbg/restart-policy/ | Restart policy configurations |
| rbg/scaling/ | Scaling adapter with HPA integration |
| rbg/role-template/ | RoleTemplates for reducing duplication |
| coordinated-policy/ | Coordinated rollout and scaling policies |
| engine-runtime/ | Engine runtime profile configurations |

🧠 Inference Examples (examples/inference/)

| Path | Description |
|---|---|
| agg-standalone.yaml | Aggregated SGLang (standalone pattern) |
| agg-leader-worker.yaml | Aggregated (leader-worker pattern) |
| pd-disagg-standalone.yaml | Prefill/Decode disaggregated (standalone) |
| pd-disagg-leader-worker.yaml | Prefill/Decode disaggregated (leader-worker) |
| ecosystem/ | NATS, etcd, Dynamo, Mooncake integration |
| ecosystem/dynamo/ | NVIDIA Dynamo runtime examples |
| ecosystem/mooncake/ | Mooncake KV cache transfer engine |

📚 Documentation

| Source | Link |
|---|---|
| Official Docs | rolebasedgroup.github.io |
| Local Docs | doc/TOC.md |

Version Compatibility

| RBG Version | Kubernetes | LeaderWorkerSet |
|---|---|---|
| v0.7.0 | >=v1.22.x | Not Required |
| v0.6.0 | >=v1.28.x | >=v0.7.0 |
| v0.5.0 | >=v1.28.x | >=v0.6.0 |
| v0.4.0 | >=v1.28.x | >=v0.7.0 |

🌐 Ecosystem Projects

The rolebasedgroup GitHub organization hosts companion projects that extend RBG with autoscaling, CLI tooling, AI agent integration, and documentation:

| Project | Description |
|---|---|
| rbg-planner | Engine-agnostic, SLA-driven autoscaler for LLM inference on Kubernetes. Supports SGLang, vLLM, and NVIDIA Dynamo via pluggable metrics adapters. Uses ARIMA-based load prediction and automatic SLA profiling to scale prefill/decode roles to meet TTFT/ITL latency targets. |
| inference-engine-runtime | Python-based sidecar runtime for AI inference engines. Provides LoRA adapter management, unified Prometheus metrics, and distributed topology management for SGLang and vLLM engines. |
| inference-ext-cli | RBG CLI extension (llmctl) for LLM inference workload management. Provides service/model management, benchmark orchestration, automated parameter search (Optuna), convergence analysis, and web dashboards for experiment visualization. |
| rbg-agent-guide | AI agent skill guides for RBG operations. Provides deployment skills that help AI coding assistants (e.g., Claude Code) deploy LLM models to Kubernetes using the RBG CRDs and CLI. |
| rolebasedgroup.github.io | Official RBG documentation website, built with Docusaurus and deployed at rolebasedgroup.github.io. |

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Verify copyright headers
make copyright-check

# Add missing headers
make copyright-fix

💬 Community

| Channel | Link |
|---|---|
| Slack | #rbg channel |
| Issues | GitHub Issues |
| Discussions | Community Discussions |

📜 Code of Conduct

This project follows the Kubernetes Code of Conduct.


🙏 Acknowledgment

RBG is inspired by and reuses code from LeaderWorkerSet (LWS).