Design & Decisions

April 8, 2026

This document captures architectural decisions and design patterns for the ToolHive Operator.

Operator Design Principles

CRD Attribute vs `PodTemplateSpec`

When building operators, whether to expose a setting as a dedicated CRD attribute or through a `PodTemplateSpec` is a recurring point of debate. The ToolHive Operator follows the rule of thumb below.

Use Dedicated CRD Attributes For:

Business logic that affects your operator's behavior
Validation requirements (ranges, formats, constraints)
Cross-resource coordination (affects Services, ConfigMaps, etc.)
Operator decision making (triggers different reconciliation paths)

Use PodTemplateSpec For:

Infrastructure concerns (node selection, resources, affinity)
Sidecar containers
Standard Kubernetes pod configuration
Things a cluster admin would typically configure

Quick Decision Test:

"Does this affect my operator's reconciliation logic?" -> Dedicated attribute
"Is this standard Kubernetes pod configuration?" -> PodTemplateSpec
"Do I need to validate this beyond basic Kubernetes validation?" -> Dedicated attribute

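As a rough illustration of this rule of thumb, the sketch below models a spec with two dedicated attributes and a pass-through pod template. The type name `ExampleSpec`, the `Transport` and `Port` fields, and the allowed transport values are hypothetical and are not taken from the ToolHive API. The sketch uses only the Go standard library; a real CRD type would more likely hold the pod template as `runtime.RawExtension` or `corev1.PodTemplateSpec` than as `json.RawMessage`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExampleSpec is a hypothetical CRD spec, not the actual ToolHive API.
type ExampleSpec struct {
	// Transport and Port are dedicated attributes: the operator validates
	// them and would use them to configure other resources such as Services.
	Transport string `json:"transport"`
	Port      int    `json:"port"`

	// PodTemplateSpec carries standard pod configuration (node selection,
	// resources, sidecars). The operator passes it through without
	// interpreting it, so it is kept as raw JSON in this sketch.
	PodTemplateSpec json.RawMessage `json:"podTemplateSpec,omitempty"`
}

// Validate checks only the dedicated attributes; pod-level fields are left
// to standard Kubernetes validation.
func (s ExampleSpec) Validate() error {
	switch s.Transport {
	case "stdio", "sse": // illustrative values only
	default:
		return fmt.Errorf("unsupported transport %q", s.Transport)
	}
	if s.Port < 1 || s.Port > 65535 {
		return fmt.Errorf("port %d out of range 1-65535", s.Port)
	}
	return nil
}

func main() {
	raw := []byte(`{"transport":"sse","port":8080,` +
		`"podTemplateSpec":{"spec":{"nodeSelector":{"disktype":"ssd"}}}}`)

	var spec ExampleSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		panic(err)
	}
	fmt.Println("dedicated attributes valid:", spec.Validate() == nil)
	fmt.Println("pod template passed through:", len(spec.PodTemplateSpec) > 0)
}
```

The point of the split is that `Validate` never needs to look inside the pod template: anything the operator must reason about gets its own typed field.
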
MCPRegistry Architecture Decisions

Status Management Design

Decision: Use the standard Kubernetes workload status pattern, matching MCPServer: a flat Phase, a Ready condition, ReadyReplicas, and URL.

Rationale:

Consistency with MCPServer and standard Kubernetes workload patterns
Enables kubectl wait --for=condition=Ready and standard monitoring
The operator only needs to track deployment readiness, not internal registry server state
Tracking internal sync/API states would require the operator to call the registry server, which is not feasible when auth is enabled

Implementation: Controller sets Phase, Message, URL, ReadyReplicas, and a Ready condition directly based on the API deployment's readiness. The latest resource version is refetched before status updates to avoid conflicts.
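A minimal sketch of this flow is shown below, assuming simplified stand-in types rather than the real ToolHive or Kubernetes API types. The `Client` interface, the `fakeClient`, the function names, and the phase and reason strings (`Ready`, `Pending`, `NotReady`) are all assumptions made for illustration; the actual controller would use its Kubernetes client and the project's own constants.

```go
package main

import "fmt"

// The types below are simplified stand-ins, not the real API types.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

type RegistryStatus struct {
	Phase         string
	Message       string
	URL           string
	ReadyReplicas int32
	Conditions    []Condition
}

type Registry struct {
	Name            string
	ResourceVersion string
	Status          RegistryStatus
}

// Client is a hypothetical minimal API client.
type Client interface {
	Get(name string) (*Registry, error)
	UpdateStatus(r *Registry) error
}

// deriveStatus maps deployment readiness onto the flat status fields.
// The phase and reason strings are illustrative.
func deriveStatus(readyReplicas int32, url string) RegistryStatus {
	st := RegistryStatus{URL: url, ReadyReplicas: readyReplicas}
	cond := Condition{Type: "Ready"}
	if readyReplicas > 0 {
		st.Phase, st.Message = "Ready", "API deployment is ready"
		cond.Status, cond.Reason = "True", "Ready"
	} else {
		st.Phase, st.Message = "Pending", "waiting for API deployment"
		cond.Status, cond.Reason = "False", "NotReady"
	}
	cond.Message = st.Message
	st.Conditions = []Condition{cond}
	return st
}

// updateStatus refetches the latest object so the write carries a current
// resource version, then applies the derived status in a single update.
func updateStatus(c Client, name string, readyReplicas int32, url string) error {
	latest, err := c.Get(name)
	if err != nil {
		return fmt.Errorf("refetch %s: %w", name, err)
	}
	latest.Status = deriveStatus(readyReplicas, url)
	return c.UpdateStatus(latest)
}

// fakeClient is an in-memory Client used only to exercise the sketch.
type fakeClient struct{ stored Registry }

func (f *fakeClient) Get(string) (*Registry, error) {
	r := f.stored
	return &r, nil
}

func (f *fakeClient) UpdateStatus(r *Registry) error {
	f.stored = *r
	return nil
}

func main() {
	c := &fakeClient{stored: Registry{Name: "demo", ResourceVersion: "42"}}
	if err := updateStatus(c, "demo", 1, "http://demo-api:8080"); err != nil {
		panic(err)
	}
	fmt.Println(c.stored.Status.Phase, c.stored.Status.Conditions[0].Status)
}
```

Because the status is a pure function of deployment readiness, the derivation needs no collector or sub-phase merging; the only coordination left is fetching a fresh copy before writing.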

History: The original design used a StatusCollector pattern (mcpregistrystatus package) that batched status changes from multiple independent sources — an APIStatusCollector for deployment state and originally a sync collector — then applied them atomically via a single Status().Update(). A StatusDeriver computed the overall phase from sub-phases (SyncPhase + APIPhase → MCPRegistryPhase). This was removed because with sync operations moved to the registry server itself, only one status source remained (deployment readiness), making the batching/derivation indirection unnecessary. The new approach produces the same number of API server calls with less abstraction.

Registry API Service Pattern

Decision: Deploy an individual API service per MCPRegistry rather than a shared service.

Rationale:

Isolation: Each registry has independent lifecycle and scaling
Security: Per-registry access control possible
Reliability: Failure of one registry doesn't affect others
Lifecycle Management: Automatic cleanup via owner references

Trade-offs: Higher resource consumption in exchange for better isolation and security.

Error Handling Strategy

Decision: Structured error types (registryapi.Error) with condition metadata.

Rationale:

Different error types need different handling strategies
Structured errors carry ConditionReason for setting Kubernetes conditions with specific failure reasons (e.g., ConfigMapFailed, DeploymentFailed)
Enables better observability via condition reasons

Implementation: registryapi.Error carries ConditionReason and Message. The controller uses errors.As to extract the structured fields when available, falling back to a generic NotReady reason for unstructured errors.

Performance Design Decisions

Resource Optimization

Status Updates: Single refetch-then-update per reconciliation cycle
API Deployment: Lazy creation only when needed (implemented)

Security Architecture

Permission Model

Minimal required permissions, following the principle of least privilege:

ConfigMaps: For storage management
Services/Deployments: For API service management
MCPRegistry: For status updates

Network Security

Optional network policies for registry API access control in security-sensitive environments.