Chapter 6: Production Deployment
April 13, 2026
Welcome to Chapter 6: Production Deployment. In this part of MCP Python SDK Tutorial: Building AI Tool Servers, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Deploy MCP servers to production with Docker, monitoring, error handling, and scaling strategies.
Production Deployment Architecture
flowchart TD
SERVER[MCP Python Server] --> T{Transport Mode}
T -->|stdio| LOCAL[Local subprocess\nClaude Desktop / Code]
T -->|HTTP + SSE| DOCKER[Docker Container]
DOCKER --> K8S[Kubernetes / Cloud Run]
K8S --> LB[Load Balancer]
LB --> I1[Instance 1]
LB --> I2[Instance 2]
I1 --> MON[Monitoring: Prometheus + Grafana]
Docker Deployment
Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy server code
COPY . .
# Non-root user
RUN useradd -m mcpuser && chown -R mcpuser:mcpuser /app
USER mcpuser
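# Health check: verify the server port accepts connections
# (assumes the HTTP transport listens on port 8000, as in docker-compose.yml)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import socket; socket.create_connection(('localhost', 8000), timeout=2).close()"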
CMD ["python", "server.py"]
docker-compose.yml
version: '3.8'
services:
  mcp-server:
    build: .
    container_name: mcp-server
    restart: unless-stopped
    environment:
      - LOG_LEVEL=INFO
      - MAX_CONNECTIONS=100
    volumes:
      - ./data:/app/data:ro
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "python", "-c", "import socket; s=socket.socket(); s.connect(('localhost', 8000)); s.close()"]
      interval: 30s
      timeout: 10s
      retries: 3
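The inline one-liner in the healthcheck above is hard to read and to extend. A minimal sketch of the same TCP probe as a small function follows; the function names `port_is_open` and `main` are hypothetical, and port 8000 is taken from the compose file above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def main() -> int:
    # Docker treats exit code 0 as healthy and 1 as unhealthy.
    return 0 if port_is_open() else 1
```

Saved as a script that ends with `sys.exit(main())`, this could replace the `python -c` string in the healthcheck. Note that a successful connect only shows the port is accepting connections, not that tools are working.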
Monitoring
Logging
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

def setup_logging():
    logger = logging.getLogger("mcp")
    logger.setLevel(logging.INFO)

    # RotatingFileHandler does not create missing directories
    Path("logs").mkdir(exist_ok=True)

    # File handler with rotation
    file_handler = RotatingFileHandler(
        "logs/server.log",
        maxBytes=10_000_000,  # 10 MB
        backupCount=5
    )
    file_handler.setFormatter(logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    ))
    logger.addHandler(file_handler)
    return logger

logger = setup_logging()
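Log aggregators usually parse structured lines more reliably than free-form text. As a sketch, the formatter passed to `file_handler.setFormatter` could emit one JSON object per line; the class name `JsonFormatter` and the field names are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback text when logger.exception() was used.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

To try it, pass `JsonFormatter()` to `setFormatter` in place of the `logging.Formatter(...)` call.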
Metrics
from prometheus_client import Counter, Histogram, start_http_server
import time

# Metrics
tool_calls = Counter('mcp_tool_calls_total', 'Total tool calls', ['tool_name'])
tool_duration = Histogram('mcp_tool_duration_seconds', 'Tool execution time', ['tool_name'])

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    # Count every call, including ones that fail
    tool_calls.labels(tool_name=name).inc()
    # perf_counter is monotonic, so durations are not skewed by clock changes
    start_time = time.perf_counter()
    try:
        return await execute_tool(name, arguments)
    finally:
        duration = time.perf_counter() - start_time
        tool_duration.labels(tool_name=name).observe(duration)

# Start Prometheus metrics server
start_http_server(9090)
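The try/finally timing pattern above can also be factored into a decorator, so individual tool functions are instrumented without repeating the boilerplate. The sketch below records into plain dictionaries rather than Prometheus objects so that it runs without extra dependencies; `instrumented`, `CALLS`, `DURATIONS`, and `echo` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

# In-memory stand-ins for the Counter and Histogram (assumption for illustration).
CALLS = defaultdict(int)
DURATIONS = defaultdict(list)


def instrumented(tool_name: str):
    """Decorator that counts calls and records elapsed time per tool."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            CALLS[tool_name] += 1
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too.
                DURATIONS[tool_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("echo")
async def echo(text: str) -> str:
    return text
```

In a real server the two dictionaries would be replaced by the `tool_calls` and `tool_duration` metrics defined above.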
Error Handling
from mcp.types import TextContent

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    try:
        return await execute_tool(name, arguments)
    except ValueError as e:
        logger.warning(f"Validation error in {name}: {e}")
        return [TextContent(type="text", text=f"Invalid input: {e}")]
    except ConnectionError as e:
        logger.error(f"Connection error in {name}: {e}")
        return [TextContent(type="text", text="Service temporarily unavailable")]
    except Exception:
        # logger.exception records the full traceback; the client sees only a generic message
        logger.exception(f"Unexpected error in {name}")
        return [TextContent(type="text", text="Internal server error")]
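Connection errors are often transient, so it can be worth retrying a few times before reporting the service as unavailable. The following is a minimal sketch of retry with exponential backoff and jitter; `with_retries` and its parameters are hypothetical and not part of the MCP SDK.

```python
import asyncio
import random


async def with_retries(operation, attempts: int = 3, base_delay: float = 0.1):
    """Call an async operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of retries: let the caller's handler report the failure.
                raise
            # The delay doubles on each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

A handler could wrap a backend call as `await with_retries(lambda: fetch_data(arguments))`, where `fetch_data` stands for any async function of your own. Retrying is only safe for operations that are idempotent.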
Health Checks
import json
from datetime import datetime

from mcp.types import TextContent

class HealthMonitor:
    def __init__(self):
        self.start_time = datetime.now()
        self.request_count = 0
        self.error_count = 0

    async def check_health(self) -> dict:
        uptime = (datetime.now() - self.start_time).total_seconds()
        error_rate = self.error_count / max(self.request_count, 1)
        return {
            "status": "healthy" if error_rate < 0.1 else "degraded",
            "uptime_seconds": uptime,
            "total_requests": self.request_count,
            "error_rate": error_rate
        }

health = HealthMonitor()

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    health.request_count += 1
    if name == "health":
        status = await health.check_health()
        return [TextContent(type="text", text=json.dumps(status))]
    try:
        # All other tools go through the normal dispatcher
        return await execute_tool(name, arguments)
    except Exception:
        # Count failures so error_rate reflects real traffic
        health.error_count += 1
        raise
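A lifetime error rate reacts slowly: after many successful requests, a burst of failures barely moves it. One possible refinement, sketched here under the assumption that recent behavior matters most, is to compute the rate over a rolling window of the latest outcomes. `RollingErrorRate` is a hypothetical class name.

```python
from collections import deque


class RollingErrorRate:
    """Track the error rate over the most recent `window` requests."""

    def __init__(self, window: int = 100):
        # A deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def status(self, threshold: float = 0.1) -> str:
        # Same 10% threshold as the lifetime check above.
        return "healthy" if self.error_rate() < threshold else "degraded"
```

Calling `record(True)` or `record(False)` after each tool call keeps the window current, and `status()` recovers to "healthy" once the failures age out of the window.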
Scaling Strategies
Horizontal Scaling
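# Use Redis for shared state across instances
import os
import redis.asyncio as redis

class SharedCache:
    def __init__(self):
        # REDIS_URL is an assumed variable name; the localhost default only works
        # when Redis runs on the same host or in the same container as the server
        url = os.environ.get("REDIS_URL", "redis://localhost:6379")
        # decode_responses=True makes get() return str instead of bytes
        self.redis = redis.from_url(url, decode_responses=True)

    async def get(self, key: str):
        return await self.redis.get(key)

    async def set(self, key: str, value: str, ttl: int = 3600):
        await self.redis.setex(key, ttl, value)

cache = SharedCache()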
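A common way to use a shared cache like the one above is the cache-aside pattern: look up the key, and only on a miss compute the value and store it. The sketch below uses a hypothetical `InMemoryCache` stand-in with the same `get`/`set` signatures so it runs without a Redis server; `cached_lookup` is also an illustrative name.

```python
class InMemoryCache:
    """Stand-in with the same get/set interface as SharedCache (TTL is ignored)."""

    def __init__(self):
        self.store = {}

    async def get(self, key: str):
        return self.store.get(key)

    async def set(self, key: str, value: str, ttl: int = 3600):
        self.store[key] = value


async def cached_lookup(cache, key: str, compute):
    """Cache-aside: return the cached value, or compute, store, and return it."""
    hit = await cache.get(key)
    if hit is not None:
        return hit
    value = await compute()
    await cache.set(key, value)
    return value
```

Because every instance reads and writes the same Redis keys, a value computed by one instance becomes a cache hit for the others. With the real Redis client, values come back as bytes unless the client was created with `decode_responses=True`.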
Load Balancing (nginx)
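upstream mcp_servers {
    least_conn;
    server mcp1:8000;
    server mcp2:8000;
    server mcp3:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://mcp_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Keep SSE streams open and unbuffered
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_read_timeout 1h;
    }
}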
Configuration
from pydantic_settings import BaseSettings, SettingsConfigDict

class ProductionConfig(BaseSettings):
    # pydantic-settings v2 style; replaces the deprecated inner `class Config`
    model_config = SettingsConfigDict(env_file=".env.production")

    # Server
    server_name: str = "mcp-prod"
    max_connections: int = 1000
    timeout_seconds: int = 30

    # Security
    api_key_required: bool = True
    allowed_origins: list[str] = ["https://app.example.com"]

    # Monitoring
    log_level: str = "INFO"
    metrics_port: int = 9090

    # Database (required: startup fails with a validation error if DATABASE_URL is unset)
    database_url: str

config = ProductionConfig()
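A setting such as `timeout_seconds` only has an effect if something enforces it. As a sketch, a tool call could be wrapped in `asyncio.wait_for`; the helper name `run_with_timeout` and its `(ok, value)` return shape are assumptions for illustration.

```python
import asyncio


async def run_with_timeout(coro, timeout_seconds: float = 30):
    """Await a coroutine and return (ok, result_or_message)."""
    try:
        return True, await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return False, f"Tool timed out after {timeout_seconds} seconds"
```

In a handler this might be called as `await run_with_timeout(execute_tool(name, arguments), config.timeout_seconds)`, turning a hung backend into a clear error message instead of a stalled connection.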
Deployment Checklist
- Docker image built and tested
- Environment variables configured
- Logging configured with rotation
- Monitoring and metrics enabled
- Health checks implemented
- Error handling comprehensive
- Security hardened (no secrets in code)
- Rate limiting configured
- Backups configured for stateful data
- CI/CD pipeline set up
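The checklist mentions rate limiting, which the chapter does not otherwise show. One common approach is a token bucket; the sketch below is a minimal in-process version with hypothetical names (`TokenBucket`, `allow`) and an injectable clock so it can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A handler could call `allow()` before executing a tool and return an error message when it is False. This bucket is local to one process; with several instances behind a load balancer, a shared limit would need shared state such as Redis.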
Next Steps
Chapter 7 covers client integration with Claude Code, Claude.ai, and custom applications.
Continue to: Chapter 7: Client Integration
Previous: Chapter 5: Authentication & Security
What Problem Does This Solve?
Most teams struggle with production deployment because the hard part is not writing more tool code. It is drawing clear boundaries between the server process, its transport, and the shared state behind it, so that behavior stays predictable as load and complexity grow.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about production deployment as an operating subsystem of your MCP server, with explicit contracts for inputs, state transitions, and outputs.
Use the logging, error-response, and Redis examples above as a checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, a request to a production MCP server usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for `self`.
- Input normalization: shape incoming data so `name` receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through `server`.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
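The stages above can be sketched as a single function. Everything here is illustrative: `handle_request`, the `tools` registry, the `MAX_TEXT_LENGTH` limit, and the payload shape are assumptions, not part of the MCP SDK, and this sketch applies the policy checks before execution rather than after it.

```python
import logging
import time

logger = logging.getLogger("mcp.pipeline")  # hypothetical logger name

MAX_TEXT_LENGTH = 1000  # hypothetical policy limit


def handle_request(name: str, arguments: dict, tools: dict) -> dict:
    """Walk one request through the stages; every name here is illustrative."""
    start = time.perf_counter()
    status = "error"
    try:
        # Input normalization: trim the tool name and copy the arguments.
        name = name.strip().lower()
        arguments = dict(arguments or {})

        # Policy and safety checks: reject unknown tools and oversized input.
        if name not in tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        if len(str(arguments.get("text", ""))) > MAX_TEXT_LENGTH:
            return {"ok": False, "error": "input too large"}

        # Core execution.
        result = tools[name](**arguments)

        # Output composition: one canonical payload shape.
        status = "ok"
        return {"ok": True, "result": result}
    finally:
        # Operational telemetry: always emitted, even on early return.
        logger.info("tool=%s status=%s duration=%.4f", name, status,
                    time.perf_counter() - start)
```

Because each stage either returns a payload or hands off to the next, a failing request can be traced to the first stage whose condition was not met.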
Source Walkthrough
Use the following upstream sources to verify implementation details while reading this chapter:
- MCP Python SDK repository
  Why it matters: authoritative reference on MCP Python SDK repository (github.com).
Suggested trace strategy:
- search upstream code for `self` and `name` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production