Virtual MCP Server Observability
April 21, 2026
This document describes the observability features of the Virtual MCP Server (vMCP), which aggregates multiple backend MCP servers into a unified interface. The vMCP provides OpenTelemetry-based instrumentation for monitoring backend operations and composite tool workflow executions.
For general ToolHive observability concepts and proxy runner telemetry, see the main Observability and Telemetry documentation.
For migrating from legacy attribute names to the new OTEL MCP semantic conventions, see the Telemetry Migration Guide.
Overview
The vMCP telemetry provides visibility into:
- Backend operations: Track requests to individual backend MCP servers including tool calls, resource reads, prompt retrieval, and capability listing
- Workflow executions: Monitor composite tool workflow performance and errors
- Distributed tracing: Correlate requests across the vMCP and its backends
The vMCP uses a decorator pattern to wrap backend clients and workflow executors with telemetry instrumentation. This approach provides consistent metrics and tracing without modifying the core business logic.
Both metrics and traces are implemented in pkg/vmcp/server/telemetry.go.
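The decorator pattern described above can be sketched in a few lines. This is a minimal illustration using only the Go standard library, not the code in pkg/vmcp/server/telemetry.go: the BackendClient interface, the Recorder type, and WithTelemetry are hypothetical names, and the in-memory counters stand in for the OpenTelemetry instruments the real implementation uses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackendClient is a hypothetical, simplified stand-in for the vMCP
// backend client interface.
type BackendClient interface {
	CallTool(name string) (string, error)
}

// Recorder is a hypothetical in-memory substitute for OpenTelemetry
// counters and histograms. It is not safe for concurrent use.
type Recorder struct {
	Requests  int
	Errors    int
	Durations []time.Duration
}

// telemetryClient decorates a BackendClient without changing its behavior.
type telemetryClient struct {
	next BackendClient
	rec  *Recorder
}

// WithTelemetry wraps next so that every call is counted and timed.
func WithTelemetry(next BackendClient, rec *Recorder) BackendClient {
	return &telemetryClient{next: next, rec: rec}
}

func (c *telemetryClient) CallTool(name string) (string, error) {
	start := time.Now()
	c.rec.Requests++
	out, err := c.next.CallTool(name) // business logic stays in the wrapped client
	c.rec.Durations = append(c.rec.Durations, time.Since(start))
	if err != nil {
		c.rec.Errors++
	}
	return out, err
}

// fakeBackend is a toy backend that fails for one tool name.
type fakeBackend struct{}

func (fakeBackend) CallTool(name string) (string, error) {
	if name == "broken" {
		return "", errors.New("backend unavailable")
	}
	return "result of " + name, nil
}

func main() {
	rec := &Recorder{}
	client := WithTelemetry(fakeBackend{}, rec)
	_, _ = client.CallTool("fetch")
	_, _ = client.CallTool("broken")
	fmt.Printf("requests=%d errors=%d\n", rec.Requests, rec.Errors)
}
```

Because the wrapper satisfies the same interface as the client it wraps, callers do not need to know whether telemetry is enabled.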
Metrics
Backend Metrics
Backend metrics track requests to individual backend MCP servers.
toolhive_vmcp_backends_discovered (Gauge)
Number of backends discovered. Recorded once at startup.
toolhive_vmcp_backend_requests (Counter)
Total number of requests sent to backend MCP servers.
| Attribute | Type | Description |
|---|---|---|
| target.workload_id | string | Backend workload ID |
| target.workload_name | string | Backend workload name |
| target.base_url | string | Backend base URL |
| target.transport_type | string | Backend transport type (stdio, sse, streamable-http) |
| action | string | Internal action name (call_tool, read_resource, get_prompt, list_capabilities) |
| mcp.method.name | string | MCP method name (tools/call, resources/read, prompts/get, list_capabilities) |
Method-specific attributes (added in addition to the above):
| Attribute | Method | Description |
|---|---|---|
| tool_name | call_tool | Tool name (ToolHive-specific) |
| gen_ai.tool.name | call_tool | Tool name (OTEL MCP semconv) |
| resource_uri | read_resource | Resource URI (ToolHive-specific) |
| mcp.resource.uri | read_resource | Resource URI (OTEL MCP semconv) |
| prompt_name | get_prompt | Prompt name (ToolHive-specific) |
| gen_ai.prompt.name | get_prompt | Prompt name (OTEL MCP semconv) |
toolhive_vmcp_backend_errors (Counter)
Total number of errors from backend MCP servers.
Attributes: Same as toolhive_vmcp_backend_requests.
toolhive_vmcp_backend_requests_duration (Histogram, seconds)
Duration of requests to backend MCP servers. Uses default histogram bucket boundaries.
Attributes: Same as toolhive_vmcp_backend_requests.
mcp.client.operation.duration (Histogram, seconds)
Duration of MCP client operations per the OTEL MCP semantic conventions.
Bucket boundaries: [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]
| Attribute | Type | Condition | Description |
|---|---|---|---|
| mcp.method.name | string | Always | MCP method name |
| network.transport | string | Always | "tcp" or "pipe" |
| error.type | string | On error | Go error type (e.g., *url.Error) |
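To make the bucket boundaries listed above concrete, the following sketch shows which bucket a given duration would be counted in. It assumes the usual OpenTelemetry explicit-bucket convention, where each boundary is an inclusive upper bound and values above the last boundary go to an overflow bucket. The bucketIndex helper is hypothetical and is not part of the vMCP code.

```go
package main

import (
	"fmt"
	"sort"
)

// boundaries are the bucket boundaries documented for
// mcp.client.operation.duration, in seconds.
var boundaries = []float64{0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300}

// bucketIndex returns the index of the bucket a duration falls into,
// treating each boundary as an inclusive upper bound. The index
// len(boundaries) denotes the overflow bucket above the last boundary.
func bucketIndex(seconds float64) int {
	// SearchFloat64s returns the smallest i with boundaries[i] >= seconds.
	return sort.SearchFloat64s(boundaries, seconds)
}

func main() {
	for _, d := range []float64{0.015, 0.1, 45, 400} {
		fmt.Printf("%gs -> bucket %d\n", d, bucketIndex(d))
	}
}
```

With 14 boundaries there are 15 buckets in total, so a request that takes longer than 300 seconds is still recorded, only without finer resolution.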
Workflow Metrics
Workflow metrics track composite tool workflow executions.
toolhive_vmcp_workflow_executions (Counter)
Total number of workflow executions.
| Attribute | Type | Description |
|---|---|---|
| workflow.name | string | Workflow name |
toolhive_vmcp_workflow_errors (Counter)
Total number of workflow execution errors.
Attributes: Same as toolhive_vmcp_workflow_executions.
toolhive_vmcp_workflow_duration (Histogram, seconds)
Duration of workflow executions.
Attributes: Same as toolhive_vmcp_workflow_executions.
Distributed Tracing
Backend Operation Spans
The vMCP creates a span of kind SpanKindClient for each backend operation.
Span naming convention: {mcp.method.name} {target} where target is the
tool name or prompt name. For methods without a bounded target (e.g.,
resources/read, list_capabilities), only the method name is used to avoid
unbounded cardinality in span names. The resource URI is captured in span
attributes instead.
Examples:
- "tools/call fetch": tool call to the "fetch" tool
- "resources/read": resource read (URI in the mcp.resource.uri attribute)
- "prompts/get summarize": prompt retrieval for "summarize"
- "list_capabilities": capability listing
Span attributes include both ToolHive-specific backward-compatible attributes
(target.workload_id, target.workload_name, target.base_url,
target.transport_type, action) and OTEL MCP spec attributes
(mcp.method.name, gen_ai.tool.name, mcp.resource.uri,
gen_ai.prompt.name).
Error handling: On error, the span records the error via span.RecordError()
and sets status to codes.Error.
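The span naming convention described in this section can be expressed as a small function. This is a sketch under the assumption that only tools/call and prompts/get carry a bounded target. The spanName helper is a hypothetical name and does not appear in the vMCP source.

```go
package main

import "fmt"

// spanName builds "{mcp.method.name} {target}" for methods with a bounded
// target (tool or prompt name). For every other method it returns only the
// method name, so unbounded values such as resource URIs never end up in
// span names.
func spanName(method, target string) string {
	switch method {
	case "tools/call", "prompts/get":
		if target != "" {
			return method + " " + target
		}
	}
	return method
}

func main() {
	fmt.Println(spanName("tools/call", "fetch"))
	fmt.Println(spanName("resources/read", "file:///tmp/report.txt"))
	fmt.Println(spanName("prompts/get", "summarize"))
	fmt.Println(spanName("list_capabilities", ""))
}
```

Keeping the URI out of the span name means trace backends can group spans by name without creating one group per resource.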
Workflow Execution Spans
Workflow executor spans use the name telemetryWorkflowExecutor.ExecuteWorkflow
with the workflow.name attribute. These spans nest the individual backend
operation spans, enabling attribution of workflow errors or latency to specific
tool calls.
Trace Context Propagation
The vMCP client passes the current context through to backend calls, preserving
trace context across the vMCP aggregation layer. The
InjectMetaTraceContext function (pkg/telemetry/propagation.go) can inject
W3C Trace Context (traceparent, tracestate) into the MCP _meta field for
backends that support it.
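As an illustration of what injecting W3C Trace Context into _meta looks like on the wire, the sketch below builds a traceparent value and places it in a request parameter map. It is a standalone example using only the standard library. The injectTraceparent function and its signature are hypothetical and do not reflect the actual InjectMetaTraceContext API, which this document does not show.

```go
package main

import "fmt"

// injectTraceparent is a hypothetical helper (not the real
// InjectMetaTraceContext). It writes a W3C traceparent header value into
// params["_meta"], creating the _meta map if it does not exist yet.
// traceID must be 32 lowercase hex characters and spanID 16.
func injectTraceparent(params map[string]any, traceID, spanID string, sampled bool) {
	flags := 0
	if sampled {
		flags = 1 // the "sampled" trace flag
	}
	meta, ok := params["_meta"].(map[string]any)
	if !ok {
		meta = map[string]any{}
		params["_meta"] = meta
	}
	// Format: version "00", trace ID, parent span ID, two-digit hex flags.
	meta["traceparent"] = fmt.Sprintf("00-%s-%s-%02x", traceID, spanID, flags)
}

func main() {
	params := map[string]any{"name": "fetch"}
	injectTraceparent(params, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(params["_meta"])
}
```

A backend that understands this field can start its own spans as children of the vMCP span. Backends that do not will ignore the extra _meta key.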
Configuration
MCPTelemetryConfig (preferred): Define telemetry settings in a shared
MCPTelemetryConfig resource and reference it via spec.telemetryConfigRef
in VirtualMCPServer. This eliminates duplication when managing multiple servers
and keeps telemetry configuration consistent across MCPServer, MCPRemoteProxy,
and VirtualMCPServer resources.
# Shared telemetry configuration
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPTelemetryConfig
metadata:
  name: shared-otel
spec:
  openTelemetry:
    enabled: true
    endpoint: otel-collector:4318
    insecure: true
    tracing:
      enabled: true
      samplingRate: "0.1"
    metrics:
      enabled: true
  prometheus:
    enabled: true
---
# VirtualMCPServer referencing shared telemetry config
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  telemetryConfigRef:
    name: shared-otel
    serviceName: my-vmcp
  groupRef:
    name: my-group
  incomingAuth:
    type: anonymous
See examples/operator/virtual-mcps/vmcp_with_telemetry_ref.yaml
for a complete example with an MCPGroup and backend MCPServer.
Inline (deprecated): The inline spec.config.telemetry field still works
but is deprecated and will be removed in a future API version. It is mutually
exclusive with telemetryConfigRef (enforced by a CEL validation rule). Migrate
to telemetryConfigRef to use the shared MCPTelemetryConfig pattern.
# Deprecated: use telemetryConfigRef instead
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  groupRef:
    name: my-group
  config:
    telemetry:
      endpoint: "otel-collector:4317"
      serviceName: "my-vmcp"
      insecure: true
      tracingEnabled: true
      samplingRate: "0.1"
      metricsEnabled: true
      enablePrometheusMetricsPath: true
      useLegacyAttributes: true
  incomingAuth:
    type: anonymous
See the VirtualMCPServer API reference for complete CRD documentation.
Related Documentation
- Observability and Telemetry - Main ToolHive observability documentation
- Telemetry Migration Guide - Legacy to new attribute migration
- VirtualMCPServer API Reference - Complete CRD specification