OpenTelemetry Observability
February 5, 2026
The kubernetes-mcp-server supports distributed tracing and metrics via OpenTelemetry (OTEL). Observability is optional and disabled by default.
What Gets Traced
The server automatically traces all operations through middleware without requiring any code changes to individual tools:
- MCP Tool Calls - Every tool invocation with details:
  - Tool name
  - Success/failure status
  - Duration
  - Error details (when applicable)
- HTTP Requests - All HTTP endpoints when running in HTTP mode:
  - Request method and path
  - Response status
  - Client information
  - Duration
Note: When running in STDIO mode, only MCP tool calls are traced because there is no HTTP server.
Metrics
The server collects and exposes metrics through two mechanisms:
- Stats Endpoint (/stats) - JSON endpoint for real-time statistics:
  - Tool call counts by name
  - Tool call errors
  - HTTP request counts by method/path/status
  - Server uptime
- OTLP Export - When an endpoint is configured, metrics are also exported to your OTLP backend every 30 seconds.
Quick Start
1. Run an OTLP Backend Locally
Option A: Jaeger (traces only)
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/jaegertracing/all-in-one:latest
Access the Jaeger UI at http://localhost:16686
Note: Jaeger only supports traces, not metrics. To disable metrics export and avoid warnings about MetricsService being unimplemented, set OTEL_METRICS_EXPORTER=none.
Option B: Grafana LGTM Stack (traces + metrics + logs)
For full observability with metrics support:
docker run -d --name lgtm \
-p 3000:3000 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/grafana/otel-lgtm:latest
Access Grafana at http://localhost:3000 (default credentials: admin/admin)
2. Enable Tracing
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Run the server
npx -y kubernetes-mcp-server@latest
3. View Traces
Make some tool calls through your MCP client, then view the traces in the Jaeger UI (or in Grafana if you started the LGTM stack).
Example Trace
When you call resources_get for a Pod, you'll see a trace like this in Jaeger:
Trace ID: abc123def456789
Duration: 145ms
└─ tools/call resources_get [145ms]
   ├─ mcp.method.name: tools/call
   ├─ gen_ai.tool.name: resources_get
   ├─ gen_ai.operation.name: execute_tool
   ├─ rpc.jsonrpc.version: 2.0
   ├─ network.transport: pipe
   └─ Status: OK
In HTTP mode, the tool call arrives over an HTTP request, so the tool call span is nested under an HTTP request span:
Trace ID: abc123def456789
Duration: 150ms
├─ POST /message [150ms]
│  ├─ http.request.method: POST
│  ├─ url.path: /message
│  ├─ http.response.status_code: 200
│  ├─ client.address: 192.168.1.100
│  │
│  └─ tools/call resources_get [145ms]
│     ├─ mcp.method.name: tools/call
│     ├─ gen_ai.tool.name: resources_get
│     ├─ gen_ai.operation.name: execute_tool
│     ├─ rpc.jsonrpc.version: 2.0
│     ├─ network.transport: tcp
│     └─ Status: OK
Configuration
OpenTelemetry can be configured via TOML config file or environment variables. Environment variables take precedence over TOML config values.
Note: Telemetry is automatically enabled when an endpoint is configured. Use enabled = false in TOML to explicitly disable it.
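The precedence rules above can be sketched in a few lines. This is only an illustration of the described behavior, not the server's actual Go implementation: `resolve_telemetry` and `ENV_FOR_FIELD` are hypothetical names, and treating an empty environment variable as "unset" is an assumption.

```python
import os

# Hypothetical mapping from TOML field names to the environment variables
# listed in the configuration reference.
ENV_FOR_FIELD = {
    "endpoint": "OTEL_EXPORTER_OTLP_ENDPOINT",
    "protocol": "OTEL_EXPORTER_OTLP_PROTOCOL",
    "traces_sampler": "OTEL_TRACES_SAMPLER",
    "traces_sampler_arg": "OTEL_TRACES_SAMPLER_ARG",
}


def resolve_telemetry(toml_section, environ=None):
    """Merge a [telemetry] TOML section (as a dict) with environment variables."""
    environ = os.environ if environ is None else environ
    resolved = dict(toml_section)
    for field, env_name in ENV_FOR_FIELD.items():
        # Assumption: only a non-empty environment value overrides TOML.
        if environ.get(env_name):
            resolved[field] = environ[env_name]
    # An explicit `enabled` in TOML wins; otherwise telemetry is enabled
    # exactly when some endpoint ended up configured.
    if "enabled" not in toml_section:
        resolved["enabled"] = bool(resolved.get("endpoint"))
    return resolved
```

Calling `resolve_telemetry({"endpoint": "http://localhost:4317"}, {})` yields a dict with `enabled` set to `True`, while passing `{"enabled": False, ...}` keeps telemetry off regardless of the endpoint. Values taken from the environment stay strings in this sketch.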
Configuration Reference
| TOML Field | Environment Variable | Description |
|---|---|---|
| enabled | - | Explicit enable/disable (overrides all) |
| endpoint | OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL |
| protocol | OTEL_EXPORTER_OTLP_PROTOCOL | Protocol: grpc or http/protobuf |
| traces_sampler | OTEL_TRACES_SAMPLER | Sampling strategy |
| traces_sampler_arg | OTEL_TRACES_SAMPLER_ARG | Sampling ratio (0.0-1.0) |
TOML Configuration
Add a [telemetry] section to your config file:
[telemetry]
# Optional: explicitly enable/disable (omit to auto-enable when endpoint is set)
enabled = true
endpoint = "http://localhost:4317"
# Protocol: "grpc" (default) or "http/protobuf"
protocol = "grpc"
# Trace sampling strategy
# Options: "always_on", "always_off", "traceidratio", "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio"
traces_sampler = "traceidratio"
# Sampling ratio for ratio-based samplers (0.0 to 1.0)
traces_sampler_arg = 0.1
TOML Examples
Enable with endpoint:
[telemetry]
endpoint = "http://localhost:4317"
Production with sampling:
[telemetry]
endpoint = "http://tempo-distributor:4317"
traces_sampler = "traceidratio"
traces_sampler_arg = 0.05 # 5% sampling
Explicitly disable:
[telemetry]
enabled = false
Environment Variables
Environment variables take precedence over TOML config. This allows you to override config file settings at runtime.
Endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
Note: The server gracefully handles failures. If the endpoint is unreachable, the server logs a warning and continues without tracing.
Optional Variables
# Service name (defaults to "kubernetes-mcp-server")
export OTEL_SERVICE_NAME=kubernetes-mcp-server
# Service version (auto-detected from binary, rarely needs manual override)
export OTEL_SERVICE_VERSION=1.0.0
# Additional resource attributes (useful for multi-environment deployments)
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,team=platform"
Endpoint Protocols
The server supports both gRPC and HTTP/protobuf protocols:
# gRPC (default, port 4317)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# HTTP/protobuf (port 4318)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Secure endpoints (HTTPS/gRPC with TLS)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-secure.example.com:4317
# Custom CA certificate (for self-signed certificates)
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
Sampling Configuration
By default, the server uses ParentBased(AlwaysSample) sampling:
- Root spans (no parent): Always sampled (100%)
- Child spans: Inherit parent's sampling decision
This is ideal for development but may generate high trace volumes in production.
Production Sampling
For production with high traffic, use ratio-based sampling:
# Sample 10% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
Available Samplers
- always_on - Sample everything (default for root spans)
- always_off - Disable tracing entirely
- traceidratio - Sample a percentage (requires OTEL_TRACES_SAMPLER_ARG between 0.0 and 1.0)
- parentbased_always_on - Respect parent span, default to always_on
- parentbased_always_off - Respect parent span, default to always_off
- parentbased_traceidratio - Respect parent span, default to ratio
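To make the difference between the ratio and parent-based samplers concrete, here is a minimal sketch of how such decisions can be made. It follows the general idea behind OpenTelemetry's trace-ID ratio samplers (a deterministic comparison of trace ID bits against a bound), but it is not the exact algorithm of the Go SDK the server uses; `ratio_sampled` and `parent_based` are illustrative names.

```python
TRACE_ID_LOW_BITS = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


def parent_based(parent_sampled, trace_id: int, ratio: float) -> bool:
    """Inherit the parent's decision when there is one, else fall back to the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

Because the decision depends only on the trace ID, every service that applies the same ratio to the same trace reaches the same verdict, which keeps sampled traces complete across services.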
Sampling Examples
# Development: Sample everything
export OTEL_TRACES_SAMPLER=always_on
# Production: 5% sampling (good for high-traffic services)
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
# Temporarily disable tracing
export OTEL_TRACES_SAMPLER=always_off
# Or just unset the endpoint
unset OTEL_EXPORTER_OTLP_ENDPOINT
Deployment Examples
Claude Code (STDIO Mode)
Add the MCP server to your project's .mcp.json or global ~/.claude/settings.json:
{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "kubernetes-mcp-server@latest"],
      "env": {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
        "OTEL_TRACES_SAMPLER": "always_on"
      }
    }
  }
}
For Jaeger (traces only): Add "OTEL_METRICS_EXPORTER": "none" to disable metrics export.
Note: In STDIO mode, only MCP tool calls are traced (no HTTP request spans).
Kubernetes Deployment (HTTP Mode)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-mcp-server
spec:
  template:
    spec:
      containers:
        - name: kubernetes-mcp-server
          image: quay.io/containers/kubernetes_mcp_server:latest
          env:
            # OTLP endpoint (required to enable tracing)
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://tempo-distributor.observability:4317"
            # Sampling (recommended for production)
            - name: OTEL_TRACES_SAMPLER
              value: "traceidratio"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "0.1" # 10% sampling
            # Resource attributes (helps identify this deployment)
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=production,k8s.cluster.name=prod-us-west-2"
            # Kubernetes metadata (optional, helps correlate traces with K8s resources)
            - name: KUBERNETES_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
Note: The Kubernetes metadata environment variables are optional but recommended for production deployments. They help correlate traces with specific pods, namespaces, and nodes.
Docker
docker run \
-e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317 \
-e OTEL_TRACES_SAMPLER=always_on \
quay.io/containers/kubernetes_mcp_server:latest
Trace Attributes
MCP Tool Call Spans
Each tool call creates a span following MCP and OpenTelemetry semantic conventions:
Span Name Format: {mcp.method.name} {target} (e.g., "tools/call resources_get")
Attributes:
- mcp.method.name - MCP protocol method (e.g., "tools/call") [Required]
- gen_ai.tool.name - Name of the tool being called (e.g., "resources_get", "helm_install") [Required for tool calls]
- gen_ai.operation.name - Set to "execute_tool" for tool calls [Recommended]
- rpc.jsonrpc.version - JSON-RPC version (typically "2.0") [Recommended]
- network.transport - Transport protocol: "pipe" for STDIO, "tcp" for HTTP [Recommended]
- error.type - Error classification: "tool_error" for tool failures, "_OTHER" for other errors [Conditional]
HTTP Request Spans
HTTP requests create spans following OpenTelemetry HTTP semantic conventions:
Span Name Format: {METHOD} {path} (e.g., "POST /message")
Attributes:
- http.request.method - Request method (GET, POST, etc.) [Required]
- url.path - URL path [Required]
- url.scheme - URL scheme (http or https) [Required]
- server.address - Server host [Recommended]
- network.protocol.name - Protocol name (http) [Recommended]
- network.protocol.version - Protocol version (HTTP/1.1, HTTP/2) [Recommended]
- client.address - Client IP address [Recommended]
- http.route - Normalized route pattern (when different from path) [Conditional]
- user_agent.original - User agent string (when present) [Conditional]
- http.request.body.size - Request body size (when present) [Conditional]
- http.response.status_code - Response status code [Required]
- error.type - HTTP status code for 4xx/5xx responses [Conditional]
Note: HTTP spans only appear when running in HTTP mode. STDIO mode (Claude Code) only creates MCP tool call spans. The /healthz endpoint is not traced to reduce noise.
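As a rough illustration of the naming and attribute rules in this section, the sketch below builds the span name and a subset of the attributes for each span type. The helper names `tool_call_span` and `http_span` are hypothetical, and the code only mirrors the conventions listed here; it does not reproduce the server's middleware.

```python
def tool_call_span(tool_name, transport="pipe", failed=False):
    """Return (span_name, attributes) for an MCP tool call."""
    attrs = {
        "mcp.method.name": "tools/call",
        "gen_ai.tool.name": tool_name,
        "gen_ai.operation.name": "execute_tool",
        "rpc.jsonrpc.version": "2.0",
        "network.transport": transport,  # "pipe" for STDIO, "tcp" for HTTP
    }
    if failed:
        attrs["error.type"] = "tool_error"
    return f"tools/call {tool_name}", attrs


def http_span(method, path, status_code):
    """Return (span_name, attributes) for an HTTP request."""
    attrs = {
        "http.request.method": method,
        "url.path": path,
        "http.response.status_code": status_code,
    }
    if status_code >= 400:
        # error.type carries the status code for 4xx/5xx responses
        attrs["error.type"] = str(status_code)
    return f"{method} {path}", attrs
```

For example, `tool_call_span("resources_get")` produces the span name "tools/call resources_get", matching the example trace earlier in this document.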
Stats Endpoint
When running in HTTP mode, the server exposes a /stats endpoint that returns real-time statistics as JSON:
curl http://localhost:8080/stats
Example response:
{
  "total_tool_calls": 42,
  "tool_call_errors": 2,
  "tool_calls_by_name": {
    "resources_list": 15,
    "pods_get": 12,
    "helm_list": 10,
    "resources_get": 5
  },
  "total_http_requests": 100,
  "http_requests_by_path": {
    "/mcp": 50,
    "/sse": 30,
    "/message": 20
  },
  "uptime_seconds": 3600.5
}
The stats endpoint is useful for:
- Health monitoring and alerting
- Quick debugging without a full observability stack
- Integration with simple monitoring systems
Note: The /stats endpoint is only available in HTTP mode. In STDIO mode, use OTLP export for metrics.
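For simple alerting without a full observability stack, the JSON can be reduced to a few numbers. The sketch below assumes the response has the fields shown in the example response above; `summarize_stats` is an illustrative helper, not part of the server.

```python
import json


def summarize_stats(stats: dict) -> dict:
    """Derive an error rate and a few headline figures from a /stats payload."""
    total = stats.get("total_tool_calls", 0)
    errors = stats.get("tool_call_errors", 0)
    by_name = stats.get("tool_calls_by_name", {})
    busiest = max(by_name, key=by_name.get) if by_name else None
    return {
        "error_rate": errors / total if total else 0.0,
        "busiest_tool": busiest,
        "uptime_hours": stats.get("uptime_seconds", 0.0) / 3600,
    }


# Abbreviated copy of the example response from this section.
example = json.loads("""
{
  "total_tool_calls": 42,
  "tool_call_errors": 2,
  "tool_calls_by_name": {"resources_list": 15, "pods_get": 12, "helm_list": 10, "resources_get": 5},
  "uptime_seconds": 3600.5
}
""")
print(summarize_stats(example))
```

In practice the dict would come from the live endpoint, for instance via `json.load(urllib.request.urlopen("http://localhost:8080/stats"))`, and a monitoring script could alert when `error_rate` crosses a threshold of your choosing.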
Metrics Endpoint
When running in HTTP mode, the server exposes a /metrics endpoint for Prometheus scraping:
curl http://localhost:8080/metrics
This endpoint returns metrics in OpenMetrics/Prometheus text format, suitable for scraping by Prometheus or compatible systems.
Available Metrics
| Metric | Type | Description |
|---|---|---|
| k8s_mcp_tool_calls_total | Counter | Total MCP tool calls (labeled by tool_name) |
| k8s_mcp_tool_errors_total | Counter | Total MCP tool errors (labeled by tool_name) |
| k8s_mcp_tool_duration_seconds | Histogram | Tool call duration in seconds |
| k8s_mcp_http_requests_total | Counter | HTTP requests (labeled by http_request_method, url_path, http_response_status_class) |
| k8s_mcp_server_info | Gauge | Server info (labeled by version, go_version) |
Prometheus Scrape Configuration
scrape_configs:
  - job_name: 'kubernetes-mcp-server'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics
Kubernetes ServiceMonitor
When deployed in Kubernetes with the Helm chart, enable the ServiceMonitor:
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
Note: The /metrics endpoint is only available in HTTP mode.
Troubleshooting
Tracing not working?
- Check endpoint is set:
  echo $OTEL_EXPORTER_OTLP_ENDPOINT
- Check server logs (increase verbosity):
  # Look for "OpenTelemetry tracing initialized successfully"
  kubernetes-mcp-server -v 2
  If tracing fails to initialize, you'll see:
  Failed to create OTLP exporter, tracing disabled: <error details>
- Verify OTLP collector is reachable:
  # For gRPC endpoint (port 4317)
  telnet localhost 4317
  # For HTTP endpoint (port 4318)
  curl http://localhost:4318/v1/traces
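Where telnet is not installed, the reachability check can be scripted instead. This is a minimal sketch using only the Python standard library; `otlp_port_open` is an illustrative name, and a successful TCP connection only shows that something is listening, not that it speaks OTLP.

```python
import socket


def otlp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `otlp_port_open("localhost", 4317)` and `otlp_port_open("localhost", 4318)` covers the gRPC and HTTP/protobuf ports used throughout this guide.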
No traces appearing in backend?
- Check sampling - you might be sampling at 0% or using always_off:
  echo $OTEL_TRACES_SAMPLER
  echo $OTEL_TRACES_SAMPLER_ARG
- Verify service name:
  echo $OTEL_SERVICE_NAME
  Search for this service name in your tracing UI (defaults to "kubernetes-mcp-server").
- Check backend configuration - ensure your OTLP collector is forwarding to the right backend.
- Verify protocol compatibility:
  - If using HTTP-based backends, ensure you set OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  - Check if you need port 4317 (gRPC) or 4318 (HTTP)
TLS/Certificate Issues
If using HTTPS/secure endpoints:
- Certificate errors:
  # Provide custom CA certificate
  export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
- Self-signed certificates:
  # For testing only - not recommended for production
  export OTEL_EXPORTER_OTLP_INSECURE=true
Performance Impact
Tracing has minimal performance overhead:
- Middleware tracing: Typically 1-2ms per tool call
- Network overhead: Spans are batched and exported every 5 seconds
- Memory: Approximately 1-5MB for span buffers
- CPU: Negligible (<1% for most workloads)
For production deployments with high traffic, use ratio-based sampling to reduce costs while maintaining observability.
Advanced Topics
Resource Detection
The OpenTelemetry SDK automatically detects and adds resource attributes from the environment:
- Host information: hostname, OS, architecture
- Process information: PID, executable name
- Container information: container ID (when running in containers)
- Kubernetes information: pod name, namespace (when K8s env vars are present)
These are merged with any attributes you set via OTEL_RESOURCE_ATTRIBUTES.
Distributed Tracing
When the kubernetes-mcp-server is part of a distributed system:
- Parent spans are automatically detected and respected
- Trace context is propagated via standard W3C Trace Context headers
- Sampling decisions from parent spans are inherited (via ParentBased sampler)
This means traces can span multiple services seamlessly.
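The W3C Trace Context header that carries this information is `traceparent`, with the layout `version-traceid-parentid-flags`. The sketch below parses the version 00 layout and extracts the sampled flag that a parent-based sampler would inherit. It is a simplified illustration (`parse_traceparent` is a hypothetical helper); real services should rely on the OpenTelemetry propagators rather than hand-rolled parsing.

```python
import re

# version-traceid-parentid-flags, all lowercase hex (version 00 layout)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str):
    """Return trace ID, parent span ID, and sampled flag, or None if invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest flag bit = sampled
    }
```

A header such as `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` parses to a sampled parent, so a ParentBased sampler downstream would record its child spans as well.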
Custom Resource Attributes
Add custom attributes to help identify and filter traces:
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=staging,team=platform,region=us-west-2,version=v1.2.3"
These attributes appear on all spans from this service instance and are useful for:
- Filtering traces by environment (prod vs staging)
- Analyzing performance by region or deployment
- Tracking issues to specific versions or teams
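The variable is a comma-separated list of key=value pairs. As a sketch of how such a string maps onto attributes, the helper below splits it into a dict; `parse_resource_attributes` is an illustrative name, and percent-decoding of values is included on the assumption that values follow the OpenTelemetry convention of being percent-encoded.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict:
    """Split 'k1=v1,k2=v2' into a dict, skipping entries without '='."""
    attrs = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        attrs[key.strip()] = unquote(value.strip())
    return attrs
```

Running it on the example above gives four attributes (`deployment.environment`, `team`, `region`, `version`), which is a quick way to sanity-check a value before deploying it.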