Observability and Telemetry
April 21, 2026
This document describes the observability architecture implemented in ToolHive for monitoring MCP (Model Context Protocol) server interactions. ToolHive provides OpenTelemetry-based instrumentation with support for distributed tracing, metrics collection, and structured logging.
This document is intended for developers working on ToolHive. For user guides on setting up and using these features, see the ToolHive documentation:
- Observability overview, including trace structure and example metrics
- CLI guide, including how to enable and configure telemetry and send to common backends
For migrating from legacy attribute names to the new OTEL MCP semantic conventions, see the Telemetry Migration Guide.
Overview
ToolHive's observability stack provides complete visibility into MCP proxy operations through:
- Distributed tracing: Track requests across the proxy-container boundary with OpenTelemetry traces
- Metrics collection: Monitor performance, usage patterns, and error rates with Prometheus and OTLP metrics
- Structured logging: Capture detailed audit events for compliance and debugging
- Protocol-aware instrumentation: MCP-specific insights beyond generic HTTP metrics
See the original design document for more details on the design and goals of this observability architecture.
Architecture
graph TD
A[MCP Client] --> B[ToolHive Proxy Runner]
B --> C[Container MCP Server]
B --> D[OpenTelemetry Middleware]
D --> E[Trace Exporter]
D --> F[Metrics Exporter]
E --> G[OTLP Endpoint]
E --> H[Jaeger]
E --> I[DataDog]
F --> J[Prometheus /metrics]
F --> K[OTLP Metrics]
G --> L[Observability Backend]
K --> L
J --> M[Prometheus Server]
classDef toolhive fill:#EDD9A3,color:#000;
classDef external fill:#7AB7FF,color:#000;
class B,D toolhive;
class L,M external;
Integration with Existing Middleware
The OpenTelemetry middleware runs inside ToolHive's existing middleware stack, after authentication and MCP parsing and before authorization and audit:
graph TD
A[HTTP Request] --> B[Authentication Middleware]
B --> C[MCP Parsing Middleware]
C --> D[OpenTelemetry Middleware]
D --> E[Authorization Middleware]
E --> F[Audit Middleware]
F --> G[MCP Server Handler]
style D fill:#EDD9A3,color:#000;
The telemetry middleware:
- Leverages parsed MCP data from the parsing middleware
- Includes authentication context from JWT claims
- Captures authorization decisions for compliance
- Correlates with audit events for complete observability
This provides end-to-end visibility across the entire request lifecycle while maintaining the modular architecture of ToolHive's middleware system.
Configuration
CLI Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| --otel-endpoint | string | "" | OTLP endpoint URL (e.g., localhost:4317). Telemetry is disabled when empty and Prometheus is not enabled. |
| --otel-tracing-enabled | bool | true | Enable distributed tracing (requires endpoint) |
| --otel-metrics-enabled | bool | true | Enable OTLP metrics export (requires endpoint) |
| --otel-sampling-rate | float | 0.1 | Trace sampling rate (0.0–1.0). The CLI default is 0.1 (10%); the Kubernetes CRD default is 0.05 (5%). Config file values override the CLI default when the flag is not explicitly set. |
| --otel-service-name | string | "toolhive-mcp-proxy" | Service name for telemetry resource |
| --otel-headers | string[] | nil | OTLP authentication headers (key=value format) |
| --otel-insecure | bool | false | Use HTTP instead of HTTPS for the OTLP endpoint |
| --otel-enable-prometheus-metrics-path | bool | false | Expose Prometheus /metrics endpoint on the transport port |
| --otel-env-vars | string[] | nil | Environment variables to include in spans (comma-separated) |
| --otel-custom-attributes | string | "" | Custom resource attributes (key1=value1,key2=value2) |
| --otel-use-legacy-attributes | bool | true | Emit legacy attribute names alongside new OTEL semantic convention names |
Configuration File
Telemetry can also be configured via ~/.toolhive/config.yaml:
otel:
  endpoint: "localhost:4317"
  sampling-rate: 0.1
  env-vars:
    - NODE_ENV
    - DEPLOYMENT_ENV
  insecure: true
  use-legacy-attributes: false
CLI flags take precedence over configuration file values when explicitly set.
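The precedence rule can be sketched in a few lines of Go. This is a minimal illustration, not ToolHive's actual code: `resolveSamplingRate` and its parameters are hypothetical names, and only the ordering (explicit flag, then config file, then the CLI default of 0.1) comes from this document.

```go
package main

import "fmt"

// resolveSamplingRate is a hypothetical helper illustrating the precedence
// rule: an explicitly set CLI flag wins, then the config file value, then
// the CLI default of 0.1.
func resolveSamplingRate(flagValue float64, flagSet bool, fileValue *float64) float64 {
	const cliDefault = 0.1
	if flagSet {
		return flagValue
	}
	if fileValue != nil {
		return *fileValue
	}
	return cliDefault
}

func main() {
	fromFile := 0.25
	fmt.Println(resolveSamplingRate(0.1, false, &fromFile)) // config file value is used
	fmt.Println(resolveSamplingRate(0.5, true, &fromFile))  // explicit flag wins
	fmt.Println(resolveSamplingRate(0.1, false, nil))       // falls back to the CLI default
}
```

Tracking whether the flag was explicitly set, rather than comparing against the default value, is what lets a user pass `--otel-sampling-rate 0.1` and still override a different value in the config file.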
Kubernetes CRD
MCPTelemetryConfig (preferred): Define telemetry settings in a shared
MCPTelemetryConfig resource and reference it via spec.telemetryConfigRef
in MCPServer, MCPRemoteProxy, or VirtualMCPServer. This eliminates duplication
when managing multiple servers. Each server provides a unique serviceName
override. Sensitive headers (API keys, bearer tokens) are stored in Kubernetes
Secrets via sensitiveHeaders[].secretKeyRef.
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPTelemetryConfig
metadata:
  name: shared-otel
spec:
  openTelemetry:
    enabled: true
    endpoint: otel-collector:4318
    insecure: true
    tracing:
      enabled: true
      samplingRate: "0.1"
    metrics:
      enabled: true
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: my-server
spec:
  # ... other fields ...
  telemetryConfigRef:
    name: shared-otel
    serviceName: my-server # unique per server
See examples/operator/mcp-servers/mcpserver_fetch_otel.yaml
for a complete example.
Inline (deprecated): The inline spec.telemetry (MCPServer, MCPRemoteProxy)
and spec.config.telemetry (VirtualMCPServer) fields still work but are
deprecated and will be removed in a future API version. They are mutually exclusive with
telemetryConfigRef (CEL enforced). All three resource types now support
spec.telemetryConfigRef.
For VirtualMCPServer telemetry, see the vMCP observability docs.
Validation Rules
- If an OTLP endpoint is configured but both tracingEnabled and metricsEnabled are false, configuration validation fails.
- If only enablePrometheusMetricsPath is enabled (no OTLP endpoint), Prometheus metrics are served without OTLP export.
- If nothing is configured (no endpoint, no Prometheus), telemetry is disabled.
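The three rules above can be expressed as a single check. The sketch below is illustrative only: the `Config` struct and `validate` function are hypothetical stand-ins that mirror the field names used in the rules, not the real ToolHive types.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, simplified view of the telemetry settings.
type Config struct {
	Endpoint                    string
	TracingEnabled              bool
	MetricsEnabled              bool
	EnablePrometheusMetricsPath bool
}

var errNoSignals = errors.New("OTLP endpoint configured but tracing and metrics are both disabled")

// validate reports whether telemetry ends up enabled, or returns an error
// for the one combination the rules above reject.
func validate(c Config) (enabled bool, err error) {
	if c.Endpoint != "" && !c.TracingEnabled && !c.MetricsEnabled {
		return false, errNoSignals
	}
	if c.Endpoint == "" && !c.EnablePrometheusMetricsPath {
		return false, nil // nothing configured: telemetry is disabled
	}
	return true, nil // OTLP export, Prometheus only, or both
}

func main() {
	fmt.Println(validate(Config{Endpoint: "localhost:4317"}))
	fmt.Println(validate(Config{EnablePrometheusMetricsPath: true}))
	fmt.Println(validate(Config{}))
}
```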
Metrics Reference
MCP Proxy Metrics
These metrics are emitted by the telemetry middleware (pkg/telemetry/middleware.go)
for each MCP server proxy.
toolhive_mcp_requests (Counter)
Total number of MCP requests processed.
| Attribute | Type | Description |
|---|---|---|
| method | string | HTTP method (POST, GET) |
| status_code | string | HTTP status code (200, 500) |
| status | string | "success" or "error" (error if status >= 400) |
| mcp_method | string | MCP method name (tools/call, resources/read, etc.) |
| mcp_resource_id | string | Tool name, resource URI, or prompt name |
| server | string | MCP server name |
| transport | string | Backend transport type (stdio, sse, streamable-http) |
Note: SSE connection establishment events also increment this counter with
mcp_method="sse_connection" and do not include mcp_resource_id.
toolhive_mcp_request_duration (Histogram, seconds)
Duration of MCP requests. Uses default histogram bucket boundaries.
Attributes: Same as toolhive_mcp_requests.
mcp.server.operation.duration (Histogram, seconds)
Duration of MCP server operations per the OTEL MCP semantic conventions.
Bucket boundaries: [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]
| Attribute | Type | Condition | Description |
|---|---|---|---|
| mcp.method.name | string | Always | MCP method (tools/call, resources/read, etc.) |
| jsonrpc.protocol.version | string | Always | Always "2.0" |
| network.transport | string | Always | "tcp" or "pipe" |
| network.protocol.name | string | If applicable | "http" for SSE/streamable-http |
| network.protocol.version | string | If available | HTTP protocol version (1.1, 2) |
| error.type | string | On HTTP 5xx | HTTP status code as string |
| gen_ai.operation.name | string | For tools/call | Always "execute_tool" |
| gen_ai.tool.name | string | For tools/call | Tool name |
| gen_ai.prompt.name | string | For prompts/get | Prompt name |
toolhive_mcp_tool_calls (Counter)
Total number of MCP tool invocations (only recorded for tools/call requests).
| Attribute | Type | Description |
|---|---|---|
| server | string | MCP server name |
| tool | string | Tool name |
| status | string | "success" or "error" |
toolhive_mcp_active_connections (UpDownCounter)
Number of currently active MCP connections.
| Attribute | Type | Description |
|---|---|---|
| server | string | MCP server name |
| transport | string | Backend transport type |
| connection_type | string | "sse" (only present for SSE connections) |
Span Attributes
HTTP Attributes
These follow the OTEL HTTP semantic conventions. They are always emitted.
Request attributes:
| Attribute | Type | Description |
|---|---|---|
| http.request.method | string | HTTP request method |
| url.full | string | Full request URL |
| url.scheme | string | URL scheme (http, https) |
| url.path | string | URL path |
| url.query | string | URL query string (if present) |
| server.address | string | Server host |
| user_agent.original | string | User agent string |
| http.request.body.size | int64 | Request body size (if > 0) |
Response attributes:
| Attribute | Type | Description |
|---|---|---|
| http.response.status_code | int | Response HTTP status code |
| http.response.body.size | int64 | Response body size |
MCP Protocol Attributes
These are set when an MCP JSON-RPC request is parsed by the MCP parsing
middleware (pkg/mcp/parser.go).
| Attribute | Type | Condition | Description |
|---|---|---|---|
| mcp.method.name | string | Always | MCP JSON-RPC method name |
| rpc.system.name | string | Always | Always "jsonrpc" |
| jsonrpc.protocol.version | string | Always | Always "2.0" |
| jsonrpc.request.id | string | If request has ID | JSON-RPC request ID |
| mcp.resource.uri | string | Resource methods only | Resource URI |
| mcp.server.name | string | Always | MCP server name |
| mcp.is_batch | bool | If batch request | Batch request indicator |
The mcp.resource.uri attribute is set only for the following methods:
resources/read, resources/subscribe, resources/unsubscribe,
notifications/resources/updated.
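As a rough sketch, the method gate for mcp.resource.uri amounts to a set lookup. The names `resourceURIMethods` and `resourceURIAttr` below are hypothetical; only the method list is taken from this document.

```go
package main

import "fmt"

// resourceURIMethods lists the methods for which mcp.resource.uri is set.
var resourceURIMethods = map[string]bool{
	"resources/read":                  true,
	"resources/subscribe":             true,
	"resources/unsubscribe":           true,
	"notifications/resources/updated": true,
}

// resourceURIAttr returns the attribute value and whether it should be set
// for the given MCP method. An empty URI is never emitted (an assumption of
// this sketch).
func resourceURIAttr(method, uri string) (string, bool) {
	if uri == "" || !resourceURIMethods[method] {
		return "", false
	}
	return uri, true
}

func main() {
	fmt.Println(resourceURIAttr("resources/read", "file:///README.md"))
	fmt.Println(resourceURIAttr("tools/call", "file:///README.md"))
}
```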
Tool, Prompt, and Resource Attributes
For tools/call:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.tool.name | string | Tool name |
| gen_ai.operation.name | string | Always "execute_tool" |
| gen_ai.tool.call.arguments | string | Sanitized tool arguments (max 200 chars) |
For prompts/get:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.prompt.name | string | Prompt name |
For initialize:
| Attribute | Type | Description |
|---|---|---|
| mcp.client.name | string | Client name from clientInfo |
Network and Transport Attributes
| Attribute | Type | Description | Values |
|---|---|---|---|
| network.transport | string | Network transport protocol | "tcp" (SSE, streamable-http), "pipe" (stdio) |
| network.protocol.name | string | Application protocol | "http" (SSE, streamable-http), empty (stdio) |
| network.protocol.version | string | HTTP protocol version | "1.1", "2" |
| mcp.backend.protocol.version | string | Backend MCP protocol version | SSE: "1.1" |
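The mapping from backend transport type to the first two attributes in this table can be sketched as follows. `networkAttrs` is a hypothetical helper, and returning empty values for an unrecognized transport is a choice made for this example, not documented behavior.

```go
package main

import "fmt"

// networkAttrs maps a backend transport type to the values of
// network.transport and network.protocol.name.
func networkAttrs(transport string) (networkTransport, protocolName string) {
	switch transport {
	case "stdio":
		return "pipe", "" // stdio has no application protocol
	case "sse", "streamable-http":
		return "tcp", "http"
	default:
		// Assumption: unknown transports yield no attributes.
		return "", ""
	}
}

func main() {
	for _, t := range []string{"stdio", "sse", "streamable-http"} {
		nt, proto := networkAttrs(t)
		fmt.Printf("%s: network.transport=%q network.protocol.name=%q\n", t, nt, proto)
	}
}
```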
Session and Client Attributes
| Attribute | Type | Condition | Description |
|---|---|---|---|
| mcp.session.id | string | Mcp-Session-Id header present | Session identifier |
| mcp.protocol.version | string | MCP-Protocol-Version header present | MCP protocol version |
| client.address | string | Remote address available | Client IP address |
| client.port | int | Port parseable from remote address | Client port |
Error Attributes
| Attribute | Type | Condition | Description |
|---|---|---|---|
| error.type | string | HTTP 5xx errors | HTTP status code as string (e.g., "500") |
Span status behavior:
- HTTP 5xx: Span status set to Error with message "HTTP {code}"
- HTTP 4xx: Span status left as Unset (client errors per OTEL semconv)
- HTTP 2xx/3xx: Span status set to Ok
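The status rules above, together with the error.type attribute, can be sketched as a single mapping from HTTP status code to span outcome. `spanOutcome` and `outcomeFor` are hypothetical names, and the handling of 1xx codes is an assumption because this document does not cover them.

```go
package main

import (
	"fmt"
	"strconv"
)

// spanOutcome is a hypothetical summary of what gets recorded on the span.
type spanOutcome struct {
	Status    string // "Error", "Unset", or "Ok"
	Message   string // status message, only set for Error
	ErrorType string // value of error.type, only set on 5xx
}

func outcomeFor(code int) spanOutcome {
	switch {
	case code >= 500:
		return spanOutcome{
			Status:    "Error",
			Message:   fmt.Sprintf("HTTP %d", code),
			ErrorType: strconv.Itoa(code),
		}
	case code >= 400:
		return spanOutcome{Status: "Unset"} // client errors do not mark the server span as failed
	case code >= 200:
		return spanOutcome{Status: "Ok"}
	default:
		// Assumption: 1xx responses are left Unset.
		return spanOutcome{Status: "Unset"}
	}
}

func main() {
	for _, code := range []int{200, 404, 503} {
		fmt.Printf("%d -> %+v\n", code, outcomeFor(code))
	}
}
```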
Environment and Custom Attributes
Environment variables (--otel-env-vars): Specified host environment
variables are read and added to spans as environment.{VAR_NAME} attributes.
Only variables explicitly listed in the configuration are captured.
Custom resource attributes (--otel-custom-attributes or
OTEL_RESOURCE_ATTRIBUTES): Key-value pairs added as OTEL resource attributes
to all telemetry signals.
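A minimal sketch of both mechanisms, using only the standard library. `envAttributes` and `parseCustomAttributes` are hypothetical helpers. Skipping unset variables and rejecting pairs without `=` are assumptions of this example; only the `environment.{VAR_NAME}` naming and the `key1=value1,key2=value2` format come from this document.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envAttributes reads only the listed variables and returns them keyed as
// environment.{VAR_NAME}. Unset variables are skipped (an assumption).
func envAttributes(names []string) map[string]string {
	attrs := make(map[string]string)
	for _, name := range names {
		if v, ok := os.LookupEnv(name); ok {
			attrs["environment."+name] = v
		}
	}
	return attrs
}

// parseCustomAttributes parses "key1=value1,key2=value2" into a map.
func parseCustomAttributes(s string) (map[string]string, error) {
	attrs := make(map[string]string)
	if strings.TrimSpace(s) == "" {
		return attrs, nil
	}
	for _, pair := range strings.Split(s, ",") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid attribute %q: expected key=value", pair)
		}
		attrs[key] = strings.TrimSpace(value)
	}
	return attrs, nil
}

func main() {
	os.Setenv("NODE_ENV", "production")
	fmt.Println(envAttributes([]string{"NODE_ENV"}))
	fmt.Println(parseCustomAttributes("team=platform,region=eu"))
}
```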
SSE Connection Attributes
SSE connections get a dedicated short-lived span (sse.connection_established)
with:
| Attribute | Type | Description |
|---|---|---|
| sse.event_type | string | Always "connection_established" |
| mcp.server.name | string | MCP server name |
Plus the standard HTTP, network, and transport attributes.
Span Naming Conventions
Span names follow the OTEL MCP semantic conventions:
| Pattern | When | Example |
|---|---|---|
| {mcp.method.name} {target} | MCP request with resource ID | "tools/call fetch" |
| {mcp.method.name} | MCP request without resource ID | "initialize" |
| {HTTP_METHOD} {url.path} | Non-MCP requests (fallback) | "GET /health" |
| sse.connection_established | SSE connection setup | - |
All proxy spans use SpanKindServer.
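The naming rules for request spans reduce to a three-way choice. The `spanName` function below is a hypothetical sketch of that choice and is not taken from pkg/telemetry. It omits the dedicated SSE connection span.

```go
package main

import "fmt"

// spanName picks a span name from the parsed MCP method, the optional
// target (tool name, resource URI, or prompt name), and the HTTP fallback.
func spanName(mcpMethod, target, httpMethod, path string) string {
	switch {
	case mcpMethod != "" && target != "":
		return mcpMethod + " " + target
	case mcpMethod != "":
		return mcpMethod
	default:
		return httpMethod + " " + path
	}
}

func main() {
	fmt.Println(spanName("tools/call", "fetch", "POST", "/mcp"))
	fmt.Println(spanName("initialize", "", "POST", "/mcp"))
	fmt.Println(spanName("", "", "GET", "/health"))
}
```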
Distributed Tracing
Trace Context Propagation
ToolHive supports W3C Trace Context propagation through two mechanisms:
- HTTP headers: Standard traceparent and tracestate headers
- MCP _meta field: Trace context embedded in the JSON-RPC params._meta object, as recommended by the MCP OpenTelemetry specification
Priority: When both are present, _meta trace context takes precedence
over HTTP headers, since _meta is the MCP-specified propagation mechanism.
How It Works
Inbound (client → ToolHive proxy):
The telemetry middleware first extracts trace context from HTTP headers, then
checks for _meta in the parsed MCP request. If _meta contains traceparent
(and optionally tracestate), the middleware extracts the trace context from it,
which overrides the HTTP header context. A child span is then created with the
extracted trace as parent.
{
  "method": "tools/call",
  "params": {
    "name": "fetch",
    "arguments": {"url": "https://example.com"},
    "_meta": {
      "traceparent": "00-abcdef1234567890abcdef1234567890-1234567890abcdef-01",
      "tracestate": "vendor=value"
    }
  }
}
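The inbound precedence can be sketched without the OpenTelemetry SDK. In the example below, `selectTraceparent` is a hypothetical function that returns the traceparent value the middleware would end up using. The regular expression checks only the overall shape of the value, not the full W3C rules (all-zero IDs are not rejected), and falling back to the header when `_meta` holds a malformed value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// traceparentRE matches the version-traceid-parentid-flags layout.
var traceparentRE = regexp.MustCompile(`^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$`)

// selectTraceparent prefers a well-formed traceparent from params._meta and
// falls back to the HTTP header value. It returns "" when neither is usable.
func selectTraceparent(header string, meta map[string]any) string {
	if tp, ok := meta["traceparent"].(string); ok && traceparentRE.MatchString(tp) {
		return tp
	}
	if traceparentRE.MatchString(header) {
		return header
	}
	return ""
}

func main() {
	header := "00-11111111111111111111111111111111-2222222222222222-01"
	meta := map[string]any{
		"traceparent": "00-abcdef1234567890abcdef1234567890-1234567890abcdef-01",
	}
	fmt.Println(selectTraceparent(header, meta)) // the _meta value wins
	fmt.Println(selectTraceparent(header, nil))  // only the header is present
}
```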
Outbound (vMCP → backend):
The InjectMetaTraceContext function (pkg/telemetry/propagation.go) can
inject the current trace context into the _meta field when forwarding requests
to backends, enabling end-to-end distributed tracing across the vMCP
aggregation layer.
Propagators
ToolHive configures the following OTEL propagators globally:
- propagation.TraceContext{}: W3C Trace Context
- propagation.Baggage{}: W3C Baggage
Implementation
The trace context propagation is implemented in pkg/telemetry/propagation.go
using a MetaCarrier that implements propagation.TextMapCarrier for MCP
_meta maps. The MCP _meta field is extracted by the MCP parsing middleware
(pkg/mcp/parser.go) and stored in the request context.
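To show the shape of such a carrier, here is a stand-alone sketch. The `textMapCarrier` interface is declared locally with the same three methods as OpenTelemetry's `propagation.TextMapCarrier` so the example compiles with the standard library alone. `metaCarrier` is a hypothetical stand-in and may differ from the real `MetaCarrier` in pkg/telemetry/propagation.go.

```go
package main

import (
	"fmt"
	"sort"
)

// textMapCarrier mirrors the method set of propagation.TextMapCarrier.
type textMapCarrier interface {
	Get(key string) string
	Set(key, value string)
	Keys() []string
}

// metaCarrier adapts an MCP _meta map for use by a propagator.
type metaCarrier map[string]any

// Get returns the value for key, or "" if it is missing or not a string.
func (c metaCarrier) Get(key string) string {
	if s, ok := c[key].(string); ok {
		return s
	}
	return ""
}

// Set stores a string value, which is how a propagator injects context.
func (c metaCarrier) Set(key, value string) { c[key] = value }

// Keys lists the keys in sorted order for deterministic output.
func (c metaCarrier) Keys() []string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// Compile-time check that metaCarrier satisfies the interface.
var _ textMapCarrier = metaCarrier{}

func main() {
	meta := metaCarrier{"progressToken": 7}
	meta.Set("traceparent", "00-abcdef1234567890abcdef1234567890-1234567890abcdef-01")
	fmt.Println(meta.Keys())
	fmt.Println(meta.Get("traceparent"))
	fmt.Println(meta.Get("progressToken") == "") // non-string values read as empty
}
```

Because `_meta` may hold non-string values, `Get` returns an empty string for them rather than formatting them, which a propagator treats the same as a missing key.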
Legacy Attribute Compatibility
ToolHive supports dual emission of span attributes controlled by the
useLegacyAttributes configuration option. When set to true (the current
default), both legacy and new OTEL semantic convention attribute names are
emitted on every span, allowing existing dashboards to continue working during
migration.
For a complete mapping of legacy to new attribute names and migration instructions, see the Telemetry Migration Guide.
Virtual MCP Server Telemetry
For observability in the Virtual MCP Server (vMCP), including backend request metrics, workflow execution telemetry, and distributed tracing, see the dedicated Virtual MCP Server Observability documentation.