Workloads Lifecycle Management
June 23, 2026 · View on GitHub
The workloads API provides a unified interface for managing MCP server deployments across different runtimes. This document explains how workloads are created, managed, and destroyed.
Overview
The workloads manager abstracts lifecycle operations across:
- Local Docker/Podman deployments
- Remote MCP servers
- Kubernetes deployments (via operator)
Implementation: pkg/workloads/manager.go
Workload Lifecycle
stateDiagram-v2
[*] --> Starting: Deploy
Starting --> Running: Success
Starting --> Error: Failed
Running --> Stopping: Stop
Running --> Unhealthy: Health Failed
Running --> Unauthenticated: Auth Failed
Running --> AuthRetrying: Transient Token Refresh Failures
Running --> Stopped: Container Exit
Stopping --> Stopped: Success
Stopped --> Starting: Restart
Stopped --> Removing: Delete
Unauthenticated --> Starting: Re-authenticate
Unauthenticated --> Removing: Delete
AuthRetrying --> Running: Refresh Succeeds
AuthRetrying --> Unauthenticated: Ceiling Exceeded or Permanent Error
Removing --> [*]: Success
Error --> Starting: Restart
Error --> Removing: Delete
States: pkg/container/runtime/types.go
starting,running,stopping,stoppedremoving,error,unhealthy,unauthenticated,auth_retryingpolicy_stopped,unknown
The auth_retrying cadence and ceiling can be tuned via environment
variables on the proxy process:
TOOLHIVE_TOKEN_AUTH_RETRYING_TICK_INTERVAL(default10m): cadence between background refresh attempts during the AuthRetrying window.TOOLHIVE_TOKEN_AUTH_RETRYING_MAX_ELAPSED(default24h): ceiling before the workload is finally markedunauthenticated.
Core Operations
Deploy
Foreground:
thv run my-server --foreground
Creates transport → deploys container → starts proxy → blocks until shutdown
Detached:
thv run my-server
Saves state → forks process → returns immediately → child runs in background
Implementation: pkg/workloads/manager.go
Stop
thv stop my-server
Container workload: Stops proxy process → stops container → preserves state
Remote workload: Stops proxy → preserves state
Implementation: pkg/workloads/manager.go
Start
thv start my-server
Note:
thv restartremains available as an alias for backward compatibility.
Loads state → verifies not running → starts workload with saved config
Implementation: pkg/workloads/manager.go
Delete
thv rm my-server
Container workload: Stops proxy → removes container → deletes state
Remote workload: Stops proxy → deletes state
Implementation: pkg/workloads/manager.go
Upgrade
thv upgrade check [my-server] # offline metadata comparison, never pulls
thv upgrade apply my-server --yes # verify + pull, then recreate
Upgrades apply only to registry-sourced workloads — those run by a registry entry name, which records RunConfig.RegistryServerName. A workload run from a raw image reference has no registry server name and is reported as not-registry-sourced.
Check is an offline comparison. The checker (pkg/workloads/upgrade.Checker) looks up the workload's RegistryServerName in the configured registry and compares the running image tag against the candidate the registry advertises (semver, conservative: a latest tag, a different repository, or non-comparable tags yield unknown). When the registry advertises a strictly newer tag the status is upgrade-available, and the result also surfaces env-var drift (variables the candidate now declares that the workload does not yet supply) and posture drift (transport, permission profile — network isolation is a local-only choice the registry cannot express, so it is not reported as drift). Checks never pull images.
Apply is the single security-critical path (pkg/workloads/upgrade.Applier). It re-derives the check against the registry on every apply (closing the time-of-check/time-of-use window — a CheckResult from an earlier check is never trusted), then, in order:
- Resolves and verifies the candidate image's provenance (by registry server name).
- Builds a merged
RunConfigthat preserves the entire user configuration — env vars, secrets, OIDC/authz/audit/telemetry, tool filters, middleware, transport/posture — and changes only the image, any merged env/secrets supplied via--env/--secret, and the registry source URLs. - Runs the policy gate and performs the verified pull.
- Only then asks the manager to recreate the workload via
UpdateWorkload(stop → delete → start with the new config).
Steps 1–3 all complete before any destruction, so a failure while preparing the candidate leaves the running workload untouched. There is no automatic rollback: once recreation begins, the previous image/config is not restored — recovery is a forward operation. Posture drift (transport, permission profile) is surfaced as a warning, not converged: the upgraded workload keeps its full existing posture, including transport, network isolation, and permission profile.
Runtime boundary: the "verified pull before destruction" guarantee holds precisely for local container runtimes (the scope here). On Kubernetes the verification and policy gate still precede recreation, but the byte-level pull is delegated to the kubelet and happens after recreation.
The same Applier backs both the CLI (thv upgrade apply) and the API (POST /api/v1beta/workloads/{name}/upgrade, with GET .../{name}/upgrade-check and GET .../upgrade-check for single and bulk checks), so the verify-then-pull ordering lives in exactly one place. thv list --check-upgrades annotates the workload list with each workload's check result.
Implementation: pkg/workloads/upgrade/, cmd/thv/app/upgrade.go
Proxy termination
Stop and delete terminate the proxy using its recorded PID. When PID-based termination is unavailable or fails (for example, the status file is missing, records no PID, or the process is already gone), they fall back to port-based cleanup: the process holding the proxy port is terminated only after it is confirmed to be this workload's proxy. This prevents an orphaned proxy from continuing to hold the port after the container has been stopped or removed.
Implementation: pkg/workloads/manager.go
List
Listing combines container workloads from the runtime with remote workloads from persisted state. The manager can filter workloads by label or group, and can optionally include stopped workloads.
Implementation: pkg/workloads/manager.go
Batch Operations
Some operations (stop, delete) support processing multiple workloads in a single invocation, handling each workload sequentially or in parallel as appropriate.
Pattern: Operations return a CompletionFunc (call to block until completion); internally uses golang.org/x/sync/errgroup
Timeout: 5 minutes per operation
Container vs Remote
Container Workloads
Components:
- Container (via runtime)
- Proxy process (detached mode)
- Permission profile
- Network isolation
Available operations: All
Remote Workloads
Components:
- Proxy process only
- No container
- No permission profile
Available operations: Deploy, stop, restart, delete, list
Detection: RunConfig.RemoteURL != ""
Implementation: pkg/workloads/manager.go
State Management
Storage Locations
RunConfig state:
- Path:
$XDG_STATE_HOME/toolhive/runconfigs/<name>.json - Default:
~/.local/state/toolhive/runconfigs/<name>.json - Contains: Full RunConfig
- Used for: Restart, export
Status file:
- Path:
$XDG_DATA_HOME/toolhive/statuses/<name>.json - Default:
~/.local/share/toolhive/statuses/<name>.json - Contains: Status, PID, timestamps
- Used for: List, monitoring
PID file (container workloads only):
- Path:
$XDG_DATA_HOME/toolhive/pids/toolhive-<name>.pid - Default:
~/.local/share/toolhive/pids/toolhive-<name>.pid - Contains: Proxy process PID
- Used for: Stop operation
Implementation: pkg/state/, pkg/workloads/statuses/
Status Manager
Provides atomic status updates:
SetWorkloadStatus- Update statusGetWorkload- Read statusSetWorkloadPID- Set PIDDeleteWorkloadStatus- Remove status
Implementation: pkg/workloads/statuses/file_status.go
Labels and Filtering
Standard Labels
The system automatically applies standard labels to workloads:
toolhive-name- Full workload nametoolhive-basename- Base name without timestamptoolhive-transport- Transport protocol typetoolhive-port- Proxy port number
Implementation: pkg/labels/, pkg/runner/config.go
Custom Labels
Users can apply custom labels for organizational purposes. Labels support filtering during list operations.
Implementation: pkg/workloads/types/labels.go
Related Documentation
- Core Concepts - Workload concept
- Deployment Modes - Lifecycle per mode
- Transport Architecture - Transport lifecycle
- Groups - Group operations