Kubernetes Operator
May 27, 2026
End-to-end Kubernetes operator design and construction. Covers the operator pattern (control loops for stateful workloads), CRD design (schema, validation, conversion, status), the reconciliation loop (idempotency, convergence, level-triggered vs edge-triggered), framework selection (controller-runtime / Kubebuilder / operator-SDK / metacontroller), and the operational concerns (finalizers, leader election, RBAC scoping, status subresource, observability).
This skill works with Go-based operators (the dominant ecosystem) and notes when alternative languages or frameworks (Python / KOPF, Java / JOSDK) apply.
When to use this skill
| Situation | Skill applies |
|---|---|
| Building a new operator for an internal platform primitive | Yes — start with the operator pattern decision |
| Auditing an existing operator for production-readiness | Yes — use anti-patterns + scripts/reconciliation_audit.py |
| Designing CRDs for a custom resource | Yes — use CRD design + scripts/crd_validator.py |
| Deciding "operator vs Helm chart vs plain manifests" | Yes — use the decision matrix below |
| Scaffolding a new operator project | Yes — scripts/operator_scaffold.py |
| Debugging a controller that "isn't reconciling" | Yes — use reconciliation troubleshooting |
| Just running someone else's operator (Postgres, Kafka, etc.) | Partially — useful for understanding what it does and how to monitor it |
When NOT to write an operator
Operators are extensions of Kubernetes. They cost ongoing maintenance, RBAC review, security audit, version-skew handling, and (often) a dedicated SRE rotation. Avoid if:
- Your resource is stateless and fits CronJob + Deployment + ConfigMap (no extra control loop needed)
- You can model the workload as a Helm chart with values overrides
- An existing community operator (CNCF / vendor) covers your needs — adopt it instead
- The lifecycle has < 5 transitions (then a Job + readiness probe is simpler)
- You'd be the only consumer (consider a less heavyweight extension point: admission webhook, Lua/Jsonnet, GitOps controller config)
A useful rule: write an operator only when the resource has at least 2-3 non-trivial state transitions AND existing primitives can't model them cleanly.
The operator pattern in one paragraph
An operator is a controller (Kubernetes control-loop program) that watches CustomResources (CRs) representing application-specific state, and drives the cluster toward the desired state declared in the CR's spec, recording observed state in the CR's status. The pattern: a domain expert encodes operational knowledge as software — "when CR X is created, do A; when X.spec.replicas changes, do B; when underlying Pod fails, do C" — so users get a declarative API instead of a wiki of runbooks.
Three building blocks:
- CRD (CustomResourceDefinition) — schema for the new resource type
- Controller — control loop that reconciles spec → state
- Custom Resource (CR) — instances users create to ask for things
Operator vs alternatives — decision matrix
| Need | Use |
|---|---|
| Run a stateless app | Deployment + Service |
| Run a stateful app with simple lifecycle | StatefulSet + headless Service |
| Manage configuration drift | Argo CD / Flux (GitOps controllers) |
| Add validation/mutation to existing K8s API | Admission webhook (no operator) |
| Manage external resources from inside cluster | Operator (if non-trivial), OR Crossplane (if matches its model) |
| Automate complex stateful workloads with domain expertise | Operator |
| Multi-cluster federation | Operator + push to other clusters (e.g., Karmada, KubeFed) |
| Simple "do X on YAML change" | Metacontroller (declarative composition, no Go) |
CRD design — the foundation
A bad CRD = perpetual operator pain. Spend time here.
Required design decisions
| Decision | Options | Default |
|---|---|---|
| Scope | Namespaced / Cluster | Namespaced (unless the resource makes no sense inside a single namespace) |
| Versioning | v1alpha1 / v1beta1 / v1 | Start v1alpha1, graduate per Kubernetes API conventions |
| Conversion | None / Webhook | None while all served versions share one schema; Webhook once you serve > 1 version and the schemas differ |
| Subresources | status / scale / both | Always enable status; scale if user-controlled replicas |
| Validation | OpenAPI schema / admission webhook / both | OpenAPI for shape; webhook for cross-field |
| Printer columns | Yes (recommended) | Always — kubectl get is much nicer |
| Categories | Optional | Add for grouping (e.g., kubectl get all,databases) |
| Short names | Optional | Add (e.g., db for Database) |
Spec / Status separation
Spec = what the user wants (desired state)
Status = what the operator observes (actual state)
The user writes Spec. The operator writes Status. Never the other way round.
apiVersion: example.com/v1
kind: Database
metadata:
  name: prod-orders
spec:                  # user-owned
  version: "14.10"
  storage: 100Gi
  replicas: 3
  backup:
    schedule: "0 2 * * *"
status:                # operator-owned
  phase: Running
  observedGeneration: 5
  conditions:
    - type: Available
      status: "True"
      reason: AllReplicasReady
      lastTransitionTime: "2026-05-27T08:00:00Z"
  endpoints:
    primary: prod-orders-primary.default.svc.cluster.local
    replicas: ["prod-orders-replica-0.default.svc.cluster.local"]
  observedVersion: "14.10"
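The ownership split above can be sketched with plain Go structs. This is a minimal illustration using only the standard library, not generated Kubebuilder types; the field set is trimmed from the example CR and the `observe` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec is user-owned desired state (fields trimmed from the example CR).
type DatabaseSpec struct {
	Version  string `json:"version"`
	Storage  string `json:"storage"`
	Replicas int32  `json:"replicas"`
}

// DatabaseStatus is operator-owned observed state.
type DatabaseStatus struct {
	Phase              string `json:"phase,omitempty"`
	ObservedGeneration int64  `json:"observedGeneration,omitempty"`
	ObservedVersion    string `json:"observedVersion,omitempty"`
}

// Database pairs the two halves; the controller reads Spec and writes Status.
type Database struct {
	Spec   DatabaseSpec   `json:"spec"`
	Status DatabaseStatus `json:"status"`
}

// observe (hypothetical helper) builds a fresh status from observed facts.
// It takes the object by value and never modifies Spec.
func observe(db Database, generation int64, runningVersion string, ready int32) DatabaseStatus {
	phase := "Pending"
	if ready >= db.Spec.Replicas {
		phase = "Running"
	}
	return DatabaseStatus{
		Phase:              phase,
		ObservedGeneration: generation,
		ObservedVersion:    runningVersion,
	}
}

func main() {
	db := Database{Spec: DatabaseSpec{Version: "14.10", Storage: "100Gi", Replicas: 3}}
	db.Status = observe(db, 5, "14.10", 3)
	out, _ := json.Marshal(db)
	fmt.Println(string(out))
}
```

Because status is always rebuilt from observation, a value the user types into status has no way to leak into the operator's decisions.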
CRD schema rules
- Use OpenAPI v3 schema, structural (no x-kubernetes-preserve-unknown-fields unless you really need it)
- Validate at the API edge: required fields, enum values, format, regex pattern, min/max
- Use defaults sparingly: a default is written into the stored object, so the operator can no longer tell "the user chose this value" from "the user left it for me to decide"
- Add descriptions: they show up in kubectl explain, which is how users learn your API
- Use x-kubernetes-validations (CEL, enabled by default since K8s 1.25, GA in 1.29) for cross-field validation that doesn't need a webhook
- Version your API thoughtfully: v1alpha1 can break; v1 is forever
See references/operator-pattern-and-crds.md for the full schema design guide including conversion webhooks, status subresource semantics, and the printer-column DSL.
The reconciliation loop
The control loop is the heart of an operator. It runs whenever a watched resource changes (or periodically).
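loop {
    obj = fetch(CR_key)
    if (obj not found) {
        return                                  # already gone; nothing to reconcile
    }
    if (obj is being deleted) {
        handle_finalizer(obj)
        return
    }
    desired = compute_desired_state(obj.spec)   # derived from spec only
    actual  = observe_actual_state(cluster)     # read from the cluster, not from memory
    if (desired != actual) {
        apply_diff(desired, actual)
    }
    update_status(obj, actual)                  # report what was observed
    if (transient_error) requeue_after(backoff)
}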
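To make the loop concrete, here is a minimal sketch in Go with an in-memory map standing in for the cluster. The `Cluster` type and `reconcile` function are illustrative assumptions, not controller-runtime APIs; a real controller would read and write through a Kubernetes client.

```go
package main

import "fmt"

// Cluster is a stand-in for the API server: child resource name -> replica count.
type Cluster map[string]int

// reconcile drives actual toward desired and returns the number of writes made.
// It compares full state on every call instead of reacting to a change event.
func reconcile(desired map[string]int, actual Cluster) int {
	writes := 0
	for name, want := range desired {
		if got, ok := actual[name]; !ok || got != want {
			actual[name] = want // "ensure it exists with this value", never "create or fail"
			writes++
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			delete(actual, name) // remove children that are no longer wanted
			writes++
		}
	}
	return writes
}

func main() {
	desired := map[string]int{"primary": 1, "replica": 2}
	actual := Cluster{"replica": 1, "stale": 1}
	fmt.Println("writes on first pass:", reconcile(desired, actual))
	fmt.Println("writes on second pass:", reconcile(desired, actual))
}
```

The second pass makes no writes because the first one already converged, which is the property that makes retries and restarts safe.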
Five rules of reconciliation
- Idempotent. Running reconcile twice with the same inputs produces the same effect. No "create or fail" — always "ensure exists."
- Level-triggered, not edge-triggered. Reconcile from current state, not from the diff of what changed. Don't say "user changed X, so increment Y" — say "Y should equal f(X), check and set if needed."
- Converge. Each reconcile gets closer to desired state, or stays there. Doesn't oscillate.
- Fail safe. If something errors, requeue with exponential backoff. Don't crash; don't loop tight.
- Observed generation tracking. Status.observedGeneration tells users "yes, I saw your latest spec change."
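The "fail safe" rule depends on a capped exponential backoff. The sketch below shows the arithmetic only, under assumed base and cap values; controller-runtime's work queue applies its own rate limiter, so treat this as an illustration of the idea rather than its implementation.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before retry number `failures` (1-based),
// doubling from base and never exceeding max.
func backoff(failures int, base, max time.Duration) time.Duration {
	if failures < 1 {
		return 0
	}
	d := base
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= max {
			return max // stop doubling once the cap is reached (also avoids overflow)
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Base of 1s and cap of 5m are illustrative values.
	for _, n := range []int{1, 2, 3, 4, 10} {
		fmt.Println(n, backoff(n, time.Second, 5*time.Minute))
	}
}
```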
Common reconciliation patterns
| Pattern | When |
|---|---|
| Direct apply | Stateless dependent resources (Deployments, Services, ConfigMaps owned by this CR) |
| Phase machine | Multi-step lifecycle with explicit phases (Pending → Bootstrapping → Running → Updating → Terminating) |
| Sub-controllers | One controller per logical concern (e.g., one for the StatefulSet, one for backups, one for network policies) |
| Owner references | All child resources have OwnerReference back to the CR → garbage collection is automatic on delete |
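A phase machine stays level-triggered if the phase is derived from what the controller observes rather than from the previous phase. A minimal sketch, assuming a hypothetical `Observed` struct and the phase names from the table:

```go
package main

import "fmt"

type Phase string

const (
	Pending       Phase = "Pending"
	Bootstrapping Phase = "Bootstrapping"
	Running       Phase = "Running"
	Updating      Phase = "Updating"
	Terminating   Phase = "Terminating"
)

// Observed is what the controller sees on this pass (hypothetical fields).
type Observed struct {
	Deleting      bool // deletionTimestamp is set
	ChildrenExist bool // owned resources have been created
	VersionDrift  bool // running version differs from spec.version
	ReadyReplicas int
	WantReplicas  int
}

// phaseFor derives the phase from observed state alone, so a restarted
// controller lands on the same phase without remembering the previous one.
func phaseFor(o Observed) Phase {
	switch {
	case o.Deleting:
		return Terminating
	case !o.ChildrenExist:
		return Pending
	case o.VersionDrift:
		return Updating
	case o.ReadyReplicas < o.WantReplicas:
		return Bootstrapping
	default:
		return Running
	}
}

func main() {
	fmt.Println(phaseFor(Observed{}))
	fmt.Println(phaseFor(Observed{ChildrenExist: true, ReadyReplicas: 1, WantReplicas: 3}))
	fmt.Println(phaseFor(Observed{ChildrenExist: true, ReadyReplicas: 3, WantReplicas: 3}))
}
```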
See references/controller-runtime-patterns.md for Go-flavored controller-runtime examples per pattern, plus error handling, requeueing strategies, and indexer use.
Framework selection
The three dominant Go frameworks (and others):
| Framework | Use when | Skip when |
|---|---|---|
| controller-runtime | You want full control; building a complex multi-controller operator; library, not framework | You want batteries-included scaffolding |
| Kubebuilder | Greenfield operator; want scaffolding, conventions, makefile, CI templates | You don't want code generation in your repo |
| operator-SDK | You come from the Operator Lifecycle Manager (OLM) world; want OLM packaging | OLM isn't your target |
| Metacontroller | Operator logic fits a simple "given parent + children YAML → output YAML" model, written in any language | You need fine-grained control or complex state |
| KOPF (Python) | Team is Python-first; operator is mostly orchestration, not perf-critical | Need maximum K8s API surface |
| JOSDK (Java) | Team is Java-first; integrating with Java ecosystem libs | Same as above |
| kube-rs (Rust) | Team is Rust-first; perf or reliability concerns where Go isn't enough | No Rust expertise on the team |
Default recommendation: Kubebuilder for greenfield operators. It uses controller-runtime under the hood, gives you scaffolding without lock-in, and you can drop into raw controller-runtime any time.
Operational concerns
Finalizers
Without finalizers, when a CR is deleted, the controller doesn't get to clean up external resources (database in cloud, DNS record, S3 bucket).
Pattern:
- On CR create, add finalizer string to metadata.finalizers
- On CR delete (when metadata.deletionTimestamp is set), do cleanup
- After successful cleanup, remove the finalizer
- K8s sees no finalizers, completes the delete
Anti-pattern: finalizer that can't complete (cleanup waits on something unavailable). User can't delete the CR; must manually edit out the finalizer (kubectl patch). Always have a max-retry / timeout for cleanup.
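The finalizer steps can be sketched without any Kubernetes dependency. The `Object` type, the finalizer name, and `reconcileFinalizer` are assumptions made for illustration; in a real controller the same branches operate on the object's metadata and persist each change through the API server.

```go
package main

import (
	"errors"
	"fmt"
)

const finalizer = "example.com/cleanup" // hypothetical finalizer name

var errUnavailable = errors.New("backend unavailable")

// Object models only the metadata this sketch needs.
type Object struct {
	Finalizers []string
	Deleting   bool // stands in for metadata.deletionTimestamp being set
}

func hasFinalizer(o *Object) bool {
	for _, f := range o.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(o *Object) {
	kept := make([]string, 0, len(o.Finalizers))
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
}

// reconcileFinalizer returns done=true once no finalizers block the delete.
func reconcileFinalizer(o *Object, cleanup func() error) (done bool, err error) {
	if !o.Deleting {
		if !hasFinalizer(o) {
			o.Finalizers = append(o.Finalizers, finalizer) // register before creating external resources
		}
		return false, nil
	}
	if hasFinalizer(o) {
		if err := cleanup(); err != nil {
			return false, err // keep the finalizer; the caller requeues with backoff
		}
		removeFinalizer(o)
	}
	return len(o.Finalizers) == 0, nil
}

func main() {
	obj := &Object{}
	reconcileFinalizer(obj, nil) // cleanup is not called while the object is live
	obj.Deleting = true
	_, err := reconcileFinalizer(obj, func() error { return errUnavailable })
	fmt.Println(obj.Finalizers, err)
	done, _ := reconcileFinalizer(obj, func() error { return nil })
	fmt.Println(obj.Finalizers, done)
}
```

The sketch retries cleanup indefinitely; the timeout or max-retry guard recommended above would wrap the `cleanup` call.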
Leader election
With > 1 controller replica (for HA), only one should reconcile at a time. Use lease-based leader election (built into controller-runtime).
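// ctrl is sigs.k8s.io/controller-runtime; cfg is the *rest.Config for the cluster.
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
	LeaderElection:          true,
	LeaderElectionID:        "myop.example.com",
	LeaderElectionNamespace: "myop-system",
})
if err != nil {
	panic(err) // without a manager the operator cannot start
}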
Without leader election, two controller replicas reconcile the same objects at once: they overwrite each other's changes and status flaps.
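The idea behind lease-based election can be sketched in a few lines. This is a single-process illustration with a hypothetical `Lease` type and integer timestamps; the real mechanism stores the lease in the API server and relies on optimistic concurrency so that only one candidate's update succeeds.

```go
package main

import "fmt"

// Lease records one holder and when it last renewed (unix seconds).
type Lease struct {
	Holder          string
	RenewedAt       int64
	DurationSeconds int64
}

// tryAcquire reports whether id is the leader after this call.
// A candidate wins if the lease is free, already its own, or expired.
func (l *Lease) tryAcquire(id string, now int64) bool {
	expired := now-l.RenewedAt > l.DurationSeconds
	if l.Holder == "" || l.Holder == id || expired {
		l.Holder = id
		l.RenewedAt = now
		return true
	}
	return false
}

func main() {
	lease := &Lease{DurationSeconds: 15}
	fmt.Println("a at t=100:", lease.tryAcquire("a", 100)) // a takes the free lease
	fmt.Println("b at t=105:", lease.tryAcquire("b", 105)) // b must wait
	fmt.Println("a at t=110:", lease.tryAcquire("a", 110)) // a renews
	fmt.Println("b at t=126:", lease.tryAcquire("b", 126)) // a stopped renewing; b takes over
}
```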
RBAC scoping
Default RBAC generated by Kubebuilder is wide ("get/list/watch/create/update/delete/patch on everything in my group + my CRD"). Tighten it:
- Get/list/watch on resources the controller observes
- Create/update/patch only on resources the controller manages
- Delete only on resources it owns
- No * verbs unless absolutely necessary
- Choose cluster scope vs namespace scope carefully; many operators don't need cluster scope
A controller with cluster-admin RBAC is a privilege-escalation vector.
Status subresource
Always enable status as a subresource:
versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
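Why:
- Status gets its own write path (/status), so what kubectl get -o yaml shows under status is unambiguously operator-written
- Updates to status don't bump metadata.generation (they do change resourceVersion), so the controller can filter them out and avoid an infinite reconcile loop
- RBAC can grant status-only updates separately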
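With the status subresource enabled, metadata.generation changes on spec writes but not on status writes, which gives the controller a cheap filter. The sketch below shows the comparison with a hypothetical `Object` type; controller-runtime ships a predicate built on the same idea (GenerationChangedPredicate), which this does not reproduce.

```go
package main

import "fmt"

// Object carries the two counters involved (other metadata omitted).
type Object struct {
	Generation         int64 // metadata.generation: bumped on spec changes
	ObservedGeneration int64 // status.observedGeneration: written by the controller
}

// shouldEnqueue filters update events: status-only writes leave Generation untouched.
func shouldEnqueue(oldObj, newObj Object) bool {
	return oldObj.Generation != newObj.Generation
}

// upToDate tells users whether the latest spec has been processed.
func upToDate(o Object) bool {
	return o.ObservedGeneration == o.Generation
}

func main() {
	before := Object{Generation: 5, ObservedGeneration: 4}
	statusWrite := Object{Generation: 5, ObservedGeneration: 5} // controller caught up
	specWrite := Object{Generation: 6, ObservedGeneration: 5}   // user edited spec

	fmt.Println(shouldEnqueue(before, statusWrite), upToDate(statusWrite))
	fmt.Println(shouldEnqueue(statusWrite, specWrite), upToDate(specWrite))
}
```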
Observability
Operators are silent until they aren't. Wire metrics:
- controller_runtime_reconcile_total{controller="db"}: count of reconciliations
- controller_runtime_reconcile_errors_total{controller="db"}: errored reconciliations
- controller_runtime_reconcile_time_seconds{controller="db"}: histogram of reconcile duration
- workqueue_depth{name="db"}: backlog
- workqueue_unfinished_work_seconds{name="db"}: pending work age
- controller_runtime_active_workers{controller="db"}: concurrency
Controller-runtime exposes these via the /metrics endpoint. Scrape with Prometheus.
Logging:
- Structured (logr / zap)
- Include namespace/name of the reconciled CR
- Include reconcileID (UUID per reconciliation, threading through sub-calls)
- WARN/ERROR on requeue with reason
- INFO on phase transitions
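The logging rules above can be illustrated with the standard library's log/slog as a stand-in for logr/zap. The `reconcileLogger` and `newReconcileID` helpers are hypothetical, and the ID is a random hex string rather than a formatted UUID.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"io"
	"log/slog"
	"os"
)

// newReconcileID returns a random 16-byte hex ID (a stand-in for a UUID).
func newReconcileID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "unknown"
	}
	return hex.EncodeToString(b)
}

// reconcileLogger returns a JSON logger that stamps every line with the
// CR's namespace/name and one ID for the whole reconciliation.
func reconcileLogger(w io.Writer, namespace, name string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With(
		"namespace", namespace,
		"name", name,
		"reconcileID", newReconcileID(),
	)
}

func main() {
	log := reconcileLogger(os.Stdout, "default", "prod-orders")
	log.Info("phase transition", "from", "Bootstrapping", "to", "Running")
	log.Warn("requeue", "reason", "replicas not ready", "after", "30s")
}
```

Passing the returned logger down to sub-calls keeps the same reconcileID on every line of one pass, which is what makes a single reconciliation greppable.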
End-to-end workflows
Workflow: Scaffold a new operator
- Decide. Confirm an operator is the right pattern (see "When NOT to write an operator" above).
- Bootstrap.
  kubebuilder init --domain example.com --repo example.com/db-operator
  kubebuilder create api --group example.com --version v1alpha1 --kind Database
  Or use scripts/operator_scaffold.py --name db-operator --group example.com --kind Database for our pre-templated structure including stricter RBAC, observability, and finalizer scaffolding. Note that Kubebuilder forms the API group as <group>.<domain>, so choose the two flags to produce the group you actually want.
- Design CRD. Write the spec/status schema; validate with scripts/crd_validator.py.
- Implement controller. Start with: fetch → handle delete (finalizer) → reconcile children → update status.
- Add tests. Use envtest (sets up a real K8s API server for tests).
- Wire observability. Prometheus metrics, structured logging.
- Document. README with example CR, available fields, status meanings, common errors.
Workflow: Audit an existing operator
- Run scripts/reconciliation_audit.py --controller-path ./internal/controllers --crd ./config/crd/bases/*.yaml.
- Review flagged anti-patterns:
  - Missing finalizer + creates external resources
  - No leader election + replicas > 1
  - Status writes spec fields
  - Reconcile is not idempotent (creates duplicates on retry)
  - No requeue backoff (tight loop on error)
  - Wide RBAC verbs / wide scopes
  - Missing observed generation tracking
- File issues for each finding.
- Re-audit after fixes.
Workflow: Design a CRD
- Sketch the spec: minimum fields the user must provide to make sense of the resource.
- Sketch the status: what the operator will report back.
- Validate with scripts/crd_validator.py --schema my-crd.yaml. Output: warnings on missing descriptions, dangerous fields (preserve-unknown), missing printer columns, etc.
- Add OpenAPI validation: required, enum, pattern, min/max.
- Add x-kubernetes-validations for cross-field rules.
- Add printer columns: at minimum, Phase and Age.
- Add kubectl explain descriptions for every field.
- Iterate with users: read the YAML they write and note where they get confused.
Workflow: Upgrade CRD versions
The v1alpha1 → v1 upgrade is one of the most error-prone parts of operator development.
- Add the new version (v1beta1) alongside v1alpha1. Both served: true. Pick the storage version carefully (whichever you'd rather have in etcd).
- Write a conversion webhook if fields differ. Conversion must round-trip: converting an object to the other version and back should yield the original, with no data lost. If the schemas are identical, the None strategy (which only rewrites apiVersion) is enough.
- Test conversion both directions with test cases.
- Deploy the new operator with both versions served.
- Migrate users to v1beta1 (update tooling, docs, examples).
- Remove v1alpha1 (served: false first; later, once stored objects have been rewritten to the new storage version and v1alpha1 is gone from status.storedVersions, remove it from the CRD).
Don't delete an old version while users are still writing it.
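A round-trip test is the cheapest guard against lossy conversion. The sketch below uses two hypothetical versions of a trimmed Database spec: v1beta1 renames one field and adds another, and the older version carries the new field in an annotation so nothing is lost. The type names, the annotation key, and the annotation technique itself are assumptions for illustration, not something the workflow above prescribes.

```go
package main

import "fmt"

const backupAnno = "example.com/backup-schedule" // hypothetical annotation key

type V1alpha1 struct {
	Annotations map[string]string
	Storage     string // renamed to StorageSize in v1beta1
	Replicas    int
}

type V1beta1 struct {
	Annotations    map[string]string
	StorageSize    string
	Replicas       int
	BackupSchedule string // new in v1beta1
}

// copyWithout copies annotations, dropping the one used to carry the new field.
func copyWithout(m map[string]string, key string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		if k != key {
			out[k] = v
		}
	}
	return out
}

// toBeta restores BackupSchedule from the annotation if it was stashed there.
func toBeta(a V1alpha1) V1beta1 {
	return V1beta1{
		Annotations:    copyWithout(a.Annotations, backupAnno),
		StorageSize:    a.Storage,
		Replicas:       a.Replicas,
		BackupSchedule: a.Annotations[backupAnno],
	}
}

// toAlpha stashes the field v1alpha1 cannot express so the round trip is lossless.
func toAlpha(b V1beta1) V1alpha1 {
	a := V1alpha1{
		Annotations: copyWithout(b.Annotations, backupAnno),
		Storage:     b.StorageSize,
		Replicas:    b.Replicas,
	}
	if b.BackupSchedule != "" {
		a.Annotations[backupAnno] = b.BackupSchedule
	}
	return a
}

func main() {
	orig := V1beta1{StorageSize: "100Gi", Replicas: 3, BackupSchedule: "0 2 * * *"}
	back := toBeta(toAlpha(orig))
	fmt.Printf("%+v\n", back)
}
```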
Anti-patterns
- Status fields in spec. User writes "phase: Running", operator never overrides — chaos. Status is operator-only.
- No finalizer + external resources. User deletes CR, cloud resources leak.
- No leader election + multiple replicas. Two controllers race; status flaps.
- Tight loop on error. Reconcile returns error, K8s requeues immediately, infinite spin. Always backoff.
- Reconcile creates resources directly with name = CR name (no ownerRef). Garbage collection won't clean up; orphaned resources accumulate.
- Spec field is array; controller appends instead of setting. Each reconcile grows the array; eventually hits etcd object size limit.
- Cross-CR coupling without watchers. Controller for A reads B; if B changes, A doesn't reconcile until something else triggers it.
- Wide RBAC. cluster-admin for convenience; massive blast radius.
- No status conditions. User has no way to tell why a CR is failing.
- Operator that requires kubectl edit for normal operations. Defeats the declarative API.
- No observed generation. User can't tell if their spec change has been processed.
- Operator that mutates Pods directly. Bypasses StatefulSet/Deployment semantics; gets very confusing very fast.
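Two items in this list (missing status conditions, and appending to a list instead of setting it) share one fix: upsert conditions by type. The sketch below is a simplified, stdlib-only take on that idea with a hypothetical `Condition` struct and integer timestamps; apimachinery provides a helper in the same spirit (meta.SetStatusCondition), which this does not reproduce.

```go
package main

import "fmt"

// Condition is a trimmed-down status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	ObservedGeneration int64
	LastTransitionTime int64 // unix seconds, to keep the sketch small
}

// setCondition upserts by Type, so repeated reconciles never grow the slice.
// LastTransitionTime only moves when Status actually flips.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			if conds[i].Status == c.Status {
				c.LastTransitionTime = conds[i].LastTransitionTime
			}
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Available", Status: "False", Reason: "Bootstrapping", ObservedGeneration: 5, LastTransitionTime: 100})
	conds = setCondition(conds, Condition{Type: "Available", Status: "False", Reason: "ReplicasNotReady", ObservedGeneration: 5, LastTransitionTime: 200})
	conds = setCondition(conds, Condition{Type: "Available", Status: "True", Reason: "AllReplicasReady", ObservedGeneration: 5, LastTransitionTime: 300})
	fmt.Printf("%d condition(s): %+v\n", len(conds), conds)
}
```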
Tooling outputs
| Script | Input | Output |
|---|---|---|
| scripts/crd_validator.py | Path to one or more CRD YAML files | Markdown report of schema issues, missing descriptions, dangerous fields, printer column suggestions |
| scripts/operator_scaffold.py | Operator name, group, kind, namespace-scope | Bootstrapped project structure with stricter RBAC, observability, finalizer skeleton |
| scripts/reconciliation_audit.py | Controller source directory (Go) + CRD YAMLs | Findings: missing finalizers, no leader election, tight loops, wide RBAC, status anti-patterns |
All scripts: stdlib only, argparse CLI, JSON or markdown output.
References
- operator-pattern-and-crds.md — pattern fundamentals, CRD schema design, versioning, conversion, status subresource
- controller-runtime-patterns.md — Go controller-runtime examples, reconciliation patterns, finalizers, leader election, indexers
- operator-anti-patterns.md — the full anti-pattern catalog with detection heuristics and fixes
Related skills
- engineering/chaos-engineering: chaos-test operators (kill the controller, partition from API server)
- engineering/observability-designer: wire metrics + logging for operators
- engineering/incident-commander: operators amplify blast radius; incident response matters more
- engineering/feature-flags-architect: operators with spec.feature.<x>.enabled fields effectively become flag systems; consider the trade-off