Virtual MCP Server Observability

April 21, 2026

This document describes the observability features of the Virtual MCP Server (vMCP), which aggregates multiple backend MCP servers into a unified interface. The vMCP provides OpenTelemetry-based instrumentation for monitoring backend operations and composite tool workflow executions.

For general ToolHive observability concepts and proxy runner telemetry, see the main Observability and Telemetry documentation.

For migrating from legacy attribute names to the new OTEL MCP semantic conventions, see the Telemetry Migration Guide.

Overview

The vMCP telemetry provides visibility into:

  1. Backend operations: Track requests to individual backend MCP servers, including tool calls, resource reads, prompt retrieval, and capability listing
  2. Workflow executions: Monitor composite tool workflow performance and errors
  3. Distributed tracing: Correlate requests across the vMCP and its backends

The vMCP uses a decorator pattern to wrap backend clients and workflow executors with telemetry instrumentation. This approach provides consistent metrics and tracing without modifying the core business logic.

The implementation of both metrics and traces can be found in pkg/vmcp/server/telemetry.go.
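The decorator pattern described above can be sketched as follows. This is a minimal illustration in Python, not the Go code in pkg/vmcp/server/telemetry.go: InMemoryRecorder, EchoBackendClient, and TelemetryBackendClient are hypothetical names, and only the three backend metric names are taken from this document.

```python
import time


class InMemoryRecorder:
    """Stand-in for a metrics backend; keeps counters and durations in memory."""

    def __init__(self):
        self.counters = {}
        self.durations = []

    def add(self, name, value=1):
        self.counters[name] = self.counters.get(name, 0) + value

    def observe(self, name, seconds):
        self.durations.append((name, seconds))


class EchoBackendClient:
    """Hypothetical backend client holding the business logic."""

    def call_tool(self, tool_name, arguments):
        if tool_name == "missing":
            raise LookupError(f"unknown tool: {tool_name}")
        return {"tool": tool_name, "arguments": arguments}


class TelemetryBackendClient:
    """Decorator: same call_tool interface, plus request/error/duration recording."""

    def __init__(self, inner, recorder):
        self._inner = inner
        self._recorder = recorder

    def call_tool(self, tool_name, arguments):
        self._recorder.add("toolhive_vmcp_backend_requests")
        start = time.monotonic()
        try:
            return self._inner.call_tool(tool_name, arguments)
        except Exception:
            self._recorder.add("toolhive_vmcp_backend_errors")
            raise  # the caller still sees the original error
        finally:
            # Runs on both success and failure, so every request has a duration.
            self._recorder.observe(
                "toolhive_vmcp_backend_requests_duration",
                time.monotonic() - start,
            )


recorder = InMemoryRecorder()
client = TelemetryBackendClient(EchoBackendClient(), recorder)
result = client.call_tool("fetch", {"url": "https://example.com"})
try:
    client.call_tool("missing", {})
except LookupError:
    pass
```

Because the wrapper exposes the same method as the wrapped client, callers do not need to know whether telemetry is enabled, and the inner client contains no instrumentation code.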

Metrics

Backend Metrics

Backend metrics track requests to individual backend MCP servers.

toolhive_vmcp_backends_discovered (Gauge)

Number of backends discovered. Recorded once at startup.

toolhive_vmcp_backend_requests (Counter)

Total number of requests sent to backend MCP servers.

| Attribute | Type | Description |
| --- | --- | --- |
| target.workload_id | string | Backend workload ID |
| target.workload_name | string | Backend workload name |
| target.base_url | string | Backend base URL |
| target.transport_type | string | Backend transport type (stdio, sse, streamable-http) |
| action | string | Internal action name (call_tool, read_resource, get_prompt, list_capabilities) |
| mcp.method.name | string | MCP method name (tools/call, resources/read, prompts/get, list_capabilities) |

Method-specific attributes (added in addition to the above):

| Attribute | Method | Description |
| --- | --- | --- |
| tool_name | call_tool | Tool name (ToolHive-specific) |
| gen_ai.tool.name | call_tool | Tool name (OTEL MCP semconv) |
| resource_uri | read_resource | Resource URI (ToolHive-specific) |
| mcp.resource.uri | read_resource | Resource URI (OTEL MCP semconv) |
| prompt_name | get_prompt | Prompt name (ToolHive-specific) |
| gen_ai.prompt.name | get_prompt | Prompt name (OTEL MCP semconv) |
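To make the combination of common and method-specific attributes concrete, the following Python sketch assembles the attribute set for one backend request. The attribute keys and the action-to-method mapping come from the tables above; the backend_attributes helper and the example backend values are hypothetical.

```python
# Action -> MCP method name, as listed in the attribute table above.
METHOD_NAMES = {
    "call_tool": "tools/call",
    "read_resource": "resources/read",
    "get_prompt": "prompts/get",
    "list_capabilities": "list_capabilities",
}

# Action -> (ToolHive-specific key, OTEL MCP semconv key) for the target.
TARGET_KEYS = {
    "call_tool": ("tool_name", "gen_ai.tool.name"),
    "read_resource": ("resource_uri", "mcp.resource.uri"),
    "get_prompt": ("prompt_name", "gen_ai.prompt.name"),
}


def backend_attributes(backend, action, target=None):
    """Build the attribute set for one backend request (illustrative helper)."""
    attrs = {
        "target.workload_id": backend["id"],
        "target.workload_name": backend["name"],
        "target.base_url": backend["base_url"],
        "target.transport_type": backend["transport"],
        "action": action,
        "mcp.method.name": METHOD_NAMES[action],
    }
    if action in TARGET_KEYS and target is not None:
        legacy_key, semconv_key = TARGET_KEYS[action]
        # Both names carry the same value so old and new dashboards keep working.
        attrs[legacy_key] = target
        attrs[semconv_key] = target
    return attrs


# Hypothetical backend used only for this example.
backend = {
    "id": "github-1",
    "name": "github",
    "base_url": "http://github:8080",
    "transport": "streamable-http",
}
attrs = backend_attributes(backend, "call_tool", "fetch")
```

A list_capabilities request has no target, so it carries only the six common attributes.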

toolhive_vmcp_backend_errors (Counter)

Total number of errors from backend MCP servers.

Attributes: Same as toolhive_vmcp_backend_requests.

toolhive_vmcp_backend_requests_duration (Histogram, seconds)

Duration of requests to backend MCP servers. Uses default histogram bucket boundaries.

Attributes: Same as toolhive_vmcp_backend_requests.

mcp.client.operation.duration (Histogram, seconds)

Duration of MCP client operations per the OTEL MCP semantic conventions.

Bucket boundaries: [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]

| Attribute | Type | Condition | Description |
| --- | --- | --- | --- |
| mcp.method.name | string | Always | MCP method name |
| network.transport | string | Always | "tcp" or "pipe" |
| error.type | string | On error | Go error type (e.g., *url.Error) |
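When reading this histogram, it helps to know which bucket a given duration lands in. The Python sketch below uses the boundaries listed above and assumes the usual OpenTelemetry explicit-bucket rule that each bucket includes its upper bound; bucket_index and bucket_label are hypothetical helpers, not part of the vMCP.

```python
from bisect import bisect_left

# Boundaries in seconds, copied from the list above.
BOUNDARIES = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]


def bucket_index(seconds):
    """Index of the bucket whose upper bound is the first boundary >= seconds."""
    return bisect_left(BOUNDARIES, seconds)


def bucket_label(seconds):
    """Human-readable range of the bucket that a duration falls into."""
    i = bucket_index(seconds)
    if i == 0:
        return f"[0, {BOUNDARIES[0]}]"
    if i == len(BOUNDARIES):
        return f"({BOUNDARIES[-1]}, +inf)"
    return f"({BOUNDARIES[i - 1]}, {BOUNDARIES[i]}]"
```

For example, a 300 ms backend call is counted in the (0.2, 0.5] bucket, and anything slower than five minutes falls into the overflow bucket.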

Workflow Metrics

Workflow metrics track composite tool workflow executions.

toolhive_vmcp_workflow_executions (Counter)

Total number of workflow executions.

| Attribute | Type | Description |
| --- | --- | --- |
| workflow.name | string | Workflow name |

toolhive_vmcp_workflow_errors (Counter)

Total number of workflow execution errors.

Attributes: Same as toolhive_vmcp_workflow_executions.

toolhive_vmcp_workflow_duration (Histogram, seconds)

Duration of workflow executions.

Attributes: Same as toolhive_vmcp_workflow_executions.

Distributed Tracing

Backend Operation Spans

The vMCP creates a span of kind SpanKindClient for each backend operation.

Span naming convention: {mcp.method.name} {target} where target is the tool name or prompt name. For methods without a bounded target (e.g., resources/read, list_capabilities), only the method name is used to avoid unbounded cardinality in span names. The resource URI is captured in span attributes instead.

Examples:

  • "tools/call fetch" — tool call to the "fetch" tool
  • "resources/read" — resource read (URI in mcp.resource.uri attribute)
  • "prompts/get summarize" — prompt retrieval for "summarize"
  • "list_capabilities" — capability listing
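The naming rule can be expressed as a small function. This Python sketch mirrors the convention and examples above; span_name is a hypothetical helper, not the function used in the vMCP source.

```python
# Methods whose target is a bounded name (a tool or a prompt).
METHODS_WITH_NAMED_TARGET = {"tools/call", "prompts/get"}


def span_name(method, target=""):
    """Return '{method} {target}' for bounded targets, otherwise just the method."""
    if method in METHODS_WITH_NAMED_TARGET and target:
        return f"{method} {target}"
    # Resource URIs and capability listings never appear in the span name.
    return method
```

Keeping URIs out of the name means the set of distinct span names grows only with the number of tools and prompts, not with the number of resources read.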

Span attributes include both ToolHive-specific backward-compatible attributes (target.workload_id, target.workload_name, target.base_url, target.transport_type, action) and OTEL MCP spec attributes (mcp.method.name, gen_ai.tool.name, mcp.resource.uri, gen_ai.prompt.name).

Error handling: On error, the span records the error via span.RecordError() and sets status to codes.Error.

Workflow Execution Spans

Workflow executor spans use the name telemetryWorkflowExecutor.ExecuteWorkflow with the workflow.name attribute. These spans nest the individual backend operation spans, enabling attribution of workflow errors or latency to specific tool calls.

Trace Context Propagation

The vMCP client passes the current context through to backend calls, preserving trace context across the vMCP aggregation layer. The InjectMetaTraceContext function (pkg/telemetry/propagation.go) can inject W3C Trace Context (traceparent, tracestate) into the MCP _meta field for backends that support it.
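As a rough picture of what injection into _meta produces, the Python sketch below formats a W3C traceparent value and adds it to a copy of the request parameters. The signature of the real InjectMetaTraceContext function is not shown in this document, so inject_meta_trace_context and format_traceparent here are hypothetical stand-ins; the IDs are the sample values from the W3C Trace Context specification.

```python
def format_traceparent(trace_id, span_id, sampled):
    """W3C traceparent: version 00, 32-hex trace ID, 16-hex span ID, 2-hex flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def inject_meta_trace_context(params, trace_id, span_id, sampled=True, tracestate=""):
    """Return a copy of MCP request params with trace context added under _meta."""
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = format_traceparent(trace_id, span_id, sampled)
    if tracestate:
        meta["tracestate"] = tracestate
    return {**params, "_meta": meta}


params = {"name": "fetch", "arguments": {"url": "https://example.com"}}
traced = inject_meta_trace_context(
    params,
    trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
    span_id=0x00F067AA0BA902B7,
)
```

A backend that understands these fields can start its own spans under the same trace ID; a backend that does not can ignore _meta without affecting the call.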

Configuration

MCPTelemetryConfig (preferred): Define telemetry settings in a shared MCPTelemetryConfig resource and reference it via spec.telemetryConfigRef in VirtualMCPServer. This eliminates duplication when managing multiple servers and keeps telemetry configuration consistent across MCPServer, MCPRemoteProxy, and VirtualMCPServer resources.

# Shared telemetry configuration
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPTelemetryConfig
metadata:
  name: shared-otel
spec:
  openTelemetry:
    enabled: true
    endpoint: otel-collector:4318
    insecure: true
    tracing:
      enabled: true
      samplingRate: "0.1"
    metrics:
      enabled: true
  prometheus:
    enabled: true
---
# VirtualMCPServer referencing shared telemetry config
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  telemetryConfigRef:
    name: shared-otel
    serviceName: my-vmcp
  groupRef:
    name: my-group
  incomingAuth:
    type: anonymous

See examples/operator/virtual-mcps/vmcp_with_telemetry_ref.yaml for a complete example with an MCPGroup and backend MCPServer.

Inline (deprecated): The inline spec.config.telemetry field still works but is deprecated and will be removed in a future API version. It is mutually exclusive with telemetryConfigRef; a CEL validation rule rejects resources that set both. Migrate to telemetryConfigRef to use the shared MCPTelemetryConfig pattern.

# Deprecated — use telemetryConfigRef instead
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  groupRef:
    name: my-group
  config:
    telemetry:
      endpoint: "otel-collector:4317"
      serviceName: "my-vmcp"
      insecure: true
      tracingEnabled: true
      samplingRate: "0.1"
      metricsEnabled: true
      enablePrometheusMetricsPath: true
      useLegacyAttributes: true
  incomingAuth:
    type: anonymous
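For migration planning, the Python sketch below maps the inline fields onto the shared MCPTelemetryConfig layout. The field correspondence is inferred only from the two YAML examples in this section, not from the CRD schema, so treat it as an assumption and check the API reference before relying on it. migrate_inline_telemetry is a hypothetical helper.

```python
# Inline field -> path under MCPTelemetryConfig.spec (inferred, see lead-in).
FIELD_PATHS = {
    "endpoint": ("openTelemetry", "endpoint"),
    "insecure": ("openTelemetry", "insecure"),
    "tracingEnabled": ("openTelemetry", "tracing", "enabled"),
    "samplingRate": ("openTelemetry", "tracing", "samplingRate"),
    "metricsEnabled": ("openTelemetry", "metrics", "enabled"),
    "enablePrometheusMetricsPath": ("prometheus", "enabled"),
}


def migrate_inline_telemetry(inline):
    """Split inline telemetry into (shared spec, serviceName, unmapped fields)."""
    spec, unmapped = {}, {}
    service_name = None
    for key, value in inline.items():
        if key == "serviceName":
            service_name = value  # moves to telemetryConfigRef.serviceName
        elif key in FIELD_PATHS:
            *parents, leaf = FIELD_PATHS[key]
            node = spec
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
        else:
            unmapped[key] = value  # no counterpart in the shared example
    if "openTelemetry" in spec:
        # The shared example sets this flag explicitly; assumed to be required.
        spec["openTelemetry"]["enabled"] = True
    return spec, service_name, unmapped


inline = {
    "endpoint": "otel-collector:4317",
    "serviceName": "my-vmcp",
    "insecure": True,
    "tracingEnabled": True,
    "samplingRate": "0.1",
    "metricsEnabled": True,
    "enablePrometheusMetricsPath": True,
    "useLegacyAttributes": True,
}
spec, service_name, unmapped = migrate_inline_telemetry(inline)
```

In this run useLegacyAttributes ends up in the unmapped result because the shared example above shows no equivalent field, so it needs a manual decision during migration.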

See the VirtualMCPServer API reference for complete CRD documentation.