Connection Pool Environment Variables
October 8, 2025
Overview
All connection pool settings can be configured via environment variables. This is useful for containerized deployments and CI/CD pipelines.
Environment Variable Naming
Format: LUNAROUTE_<PROVIDER>_<SETTING>=value
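As a quick illustration of the naming format, the sketch below builds a variable name from a provider and a setting. The `lunaroute_var` helper is hypothetical (it is not part of LunaRoute) and only demonstrates the pattern.

```shell
# Build a LunaRoute-style variable name from a provider and a setting.
# lunaroute_var is a hypothetical helper used only for illustration.
lunaroute_var() {
  printf 'LUNAROUTE_%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

lunaroute_var openai pool_max_idle     # prints LUNAROUTE_OPENAI_POOL_MAX_IDLE
lunaroute_var anthropic timeout_secs   # prints LUNAROUTE_ANTHROPIC_TIMEOUT_SECS
```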
OpenAI Provider Pool Settings
# Request timeout (seconds) - default: 600
export LUNAROUTE_OPENAI_TIMEOUT_SECS=600
# Connection timeout (seconds) - default: 10
export LUNAROUTE_OPENAI_CONNECT_TIMEOUT_SECS=10
# Max idle connections per host - default: 32
export LUNAROUTE_OPENAI_POOL_MAX_IDLE=32
# Idle connection timeout (seconds) - default: 90
export LUNAROUTE_OPENAI_POOL_IDLE_TIMEOUT_SECS=90
# TCP keepalive interval (seconds) - default: 60
export LUNAROUTE_OPENAI_TCP_KEEPALIVE_SECS=60
# Max retries for transient errors - default: 3
export LUNAROUTE_OPENAI_MAX_RETRIES=3
# Enable pool metrics - default: true
export LUNAROUTE_OPENAI_ENABLE_POOL_METRICS=true
Anthropic Provider Pool Settings
# Request timeout (seconds) - default: 600 (10 min for extended thinking)
export LUNAROUTE_ANTHROPIC_TIMEOUT_SECS=600
# Connection timeout (seconds) - default: 10
export LUNAROUTE_ANTHROPIC_CONNECT_TIMEOUT_SECS=10
# Max idle connections per host - default: 32
export LUNAROUTE_ANTHROPIC_POOL_MAX_IDLE=32
# Idle connection timeout (seconds) - default: 90
export LUNAROUTE_ANTHROPIC_POOL_IDLE_TIMEOUT_SECS=90
# TCP keepalive interval (seconds) - default: 60
export LUNAROUTE_ANTHROPIC_TCP_KEEPALIVE_SECS=60
# Max retries for transient errors - default: 3
export LUNAROUTE_ANTHROPIC_MAX_RETRIES=3
# Enable pool metrics - default: true
export LUNAROUTE_ANTHROPIC_ENABLE_POOL_METRICS=true
Configuration Precedence
Settings are applied in this order (later overrides earlier):
- Built-in defaults (in code)
- YAML config file (--config config.yaml)
- Environment variables (highest priority)
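The precedence order can be modelled with a small shell function. This is only a sketch of the documented order, not LunaRoute's actual resolution code: `resolve_setting` is a hypothetical helper, and treating an empty value as "not set" is an assumption.

```shell
# resolve_setting DEFAULT YAML_VALUE ENV_VAR_NAME
# Hypothetical helper: prints the value that wins under the documented order
# (built-in default < YAML file < environment variable).
# Assumption: an empty YAML_VALUE or empty variable counts as "not set".
resolve_setting() {
  value=$1
  if [ -n "$2" ]; then value=$2; fi
  # printenv prints nothing (and fails) when the variable is not exported.
  env_value=$(printenv "$3" || true)
  if [ -n "$env_value" ]; then value=$env_value; fi
  printf '%s\n' "$value"
}

unset LUNAROUTE_OPENAI_TIMEOUT_SECS
resolve_setting 600 ""  LUNAROUTE_OPENAI_TIMEOUT_SECS   # prints 600 (default only)
resolve_setting 600 450 LUNAROUTE_OPENAI_TIMEOUT_SECS   # prints 450 (YAML overrides default)
export LUNAROUTE_OPENAI_TIMEOUT_SECS=300
resolve_setting 600 450 LUNAROUTE_OPENAI_TIMEOUT_SECS   # prints 300 (environment wins)
```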
Complete Example
#!/bin/bash
# Production deployment with optimized pool settings
# OpenAI - shorter timeout than the 600s default, larger idle pool
export OPENAI_API_KEY="sk-..."
export LUNAROUTE_OPENAI_TIMEOUT_SECS=300
export LUNAROUTE_OPENAI_POOL_MAX_IDLE=64
export LUNAROUTE_OPENAI_POOL_IDLE_TIMEOUT_SECS=60
# Anthropic - default 600s timeout for extended thinking, smaller idle pool
export ANTHROPIC_API_KEY="sk-ant-..."
export LUNAROUTE_ANTHROPIC_TIMEOUT_SECS=600
export LUNAROUTE_ANTHROPIC_POOL_MAX_IDLE=16
export LUNAROUTE_ANTHROPIC_POOL_IDLE_TIMEOUT_SECS=90
# Session recording
export LUNAROUTE_ENABLE_SESSION_RECORDING=true
export LUNAROUTE_LOG_LEVEL=info
# Start server
./lunaroute-server
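A startup script like the one above can fail in confusing ways if a value is mistyped. The sketch below is one possible pre-flight check that could run before the server starts. `check_pool_env` is a hypothetical helper, and the rule that every numeric setting must be a non-negative integer is an assumption rather than documented LunaRoute behavior.

```shell
# check_pool_env PROVIDER
# Hypothetical pre-flight check: returns non-zero if any numeric pool setting
# for PROVIDER is set to something other than a non-negative integer.
# Unset variables are skipped, since the YAML value or default then applies.
check_pool_env() {
  status=0
  for setting in TIMEOUT_SECS CONNECT_TIMEOUT_SECS POOL_MAX_IDLE \
                 POOL_IDLE_TIMEOUT_SECS TCP_KEEPALIVE_SECS MAX_RETRIES; do
    name="LUNAROUTE_${1}_${setting}"
    value=$(printenv "$name" || true)
    if [ -z "$value" ]; then continue; fi
    case $value in
      *[!0-9]*) echo "invalid $name=$value" >&2; status=1 ;;
    esac
  done
  return $status
}

# Example use before launching the server:
#   check_pool_env OPENAI && check_pool_env ANTHROPIC && exec ./lunaroute-server
```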
Docker Compose Example
version: '3.8'
services:
  lunaroute:
    image: lunaroute:latest
    ports:
      - "8081:8081"
    environment:
      # OpenAI pool config
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      LUNAROUTE_OPENAI_TIMEOUT_SECS: 300
      LUNAROUTE_OPENAI_POOL_MAX_IDLE: 64
      # Anthropic pool config
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      LUNAROUTE_ANTHROPIC_TIMEOUT_SECS: 600
      LUNAROUTE_ANTHROPIC_POOL_MAX_IDLE: 32
      # Logging
      LUNAROUTE_LOG_LEVEL: info
      RUST_LOG: lunaroute=debug
Tuning Guidelines
High-Traffic Scenario
# More connections, faster expiry
export LUNAROUTE_OPENAI_POOL_MAX_IDLE=128
export LUNAROUTE_OPENAI_POOL_IDLE_TIMEOUT_SECS=60
export LUNAROUTE_OPENAI_TCP_KEEPALIVE_SECS=30
Long-Running Requests (Extended Thinking)
# Higher timeout, longer keepalive
export LUNAROUTE_ANTHROPIC_TIMEOUT_SECS=900 # 15 minutes
export LUNAROUTE_ANTHROPIC_TCP_KEEPALIVE_SECS=120
export LUNAROUTE_ANTHROPIC_POOL_IDLE_TIMEOUT_SECS=120
Resource-Constrained Environment
# Fewer connections, quicker cleanup
export LUNAROUTE_OPENAI_POOL_MAX_IDLE=8
export LUNAROUTE_OPENAI_POOL_IDLE_TIMEOUT_SECS=30
export LUNAROUTE_OPENAI_MAX_RETRIES=1
Monitoring
Pool Metrics (Currently Not Populated)
⚠️ IMPORTANT LIMITATION: Pool metrics are defined but not currently populated in production.
The underlying HTTP client (reqwest) doesn't expose connection pool lifecycle events needed to track:
- Connection creation/reuse
- Idle connection counts
- Connection lifetimes
Metrics Infrastructure Ready:
- ✅ Metric definitions: http_pool_connections_created_total, http_pool_connections_reused_total, etc.
- ✅ Recording methods: record_pool_connection_created(), etc.
- ✅ Comprehensive test coverage (6 tests)
- ❌ Production instrumentation: Not possible without reqwest changes
Example metrics (when/if implemented):
# Connection reuse ratio (higher is better)
rate(http_pool_connections_reused_total[5m]) /
(rate(http_pool_connections_reused_total[5m]) +
rate(http_pool_connections_created_total[5m]))
# Connection creation rate (lower is better when traffic is steady)
rate(http_pool_connections_created_total[5m])
# Idle connections gauge
http_pool_connections_idle
# Connection lifetime distribution
histogram_quantile(0.95, rate(http_pool_connection_lifetime_seconds_bucket[5m]))
Options for Future Implementation:
- Wait for reqwest to add pool metrics API (upstream feature request needed)
- Migrate to hyper with custom Connector implementation (major refactoring)
- Switch to different HTTP client that exposes pool metrics (disruptive)
For now, rely on debug logging and application-level metrics (request latency, success rate) to monitor pool health.
Debug Logging
Set RUST_LOG=lunaroute_egress=debug to see connection pool behavior:
export RUST_LOG=lunaroute_egress=debug
./lunaroute-server
Output:
[DEBUG] Creating HTTP client: timeout=300s, pool_max_idle=64, pool_idle_timeout=60s...
[DEBUG] 🔌 Provider 'openai' initiating connection to https://api.openai.com
[DEBUG] ♻️ Provider 'openai' reused connection (2ms) - from pool
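Because pool metrics are not populated, the debug log is the main place to observe connection reuse. The sketch below estimates a reuse ratio from a captured log file. `pool_reuse_ratio` is a hypothetical helper, and it assumes the log lines contain the phrases "initiating connection" and "reused connection" exactly as in the sample output above; adjust the patterns if the real format differs.

```shell
# pool_reuse_ratio LOGFILE
# Hypothetical helper: counts new and reused connections in a debug log and
# prints the share of pool events that were reuses.
pool_reuse_ratio() {
  awk '
    /initiating connection/ { created++ }
    /reused connection/     { reused++ }
    END {
      total = created + reused
      if (total == 0) { print "no pool events found"; exit }
      printf "created=%d reused=%d reuse_ratio=%.2f\n", created, reused, reused / total
    }
  ' "$1"
}

# Example use (the log path is an assumption):
#   ./lunaroute-server 2> lunaroute.log
#   pool_reuse_ratio lunaroute.log
```

A ratio close to 1 under steady traffic suggests the pool is being reused well; a low ratio points at the "High connection creation rate" case under Troubleshooting.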
Troubleshooting
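Symptom: Requests hang or time out
Cause: Pool idle timeout is too long, so the pool hands out connections the server has already closed
Fix: Reduce LUNAROUTE_<PROVIDER>_POOL_IDLE_TIMEOUT_SECS to 60-90s
Symptom: High connection creation rate
Cause: Pool is too small, or the idle timeout is so short that connections expire before they can be reused
Fix: Increase LUNAROUTE_<PROVIDER>_POOL_MAX_IDLE or LUNAROUTE_<PROVIDER>_POOL_IDLE_TIMEOUT_SECS
Symptom: Memory usage growing
Cause: Too many idle connections are kept open
Fix: Reduce LUNAROUTE_<PROVIDER>_POOL_MAX_IDLE or LUNAROUTE_<PROVIDER>_POOL_IDLE_TIMEOUT_SECS
Symptom: "Connection reset by peer" errors
Cause: TCP keepalive probes are too infrequent, so idle connections are likely being dropped (for example by a NAT or load balancer) before the next probe
Fix: Reduce LUNAROUTE_<PROVIDER>_TCP_KEEPALIVE_SECS to 30-60s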
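When working through these symptoms, it helps to first confirm which overrides are actually set in the environment. The sketch below lists them for one provider. `show_pool_env` is a hypothetical helper, and it shows only environment overrides, not values coming from the YAML file or the built-in defaults.

```shell
# show_pool_env PROVIDER
# Hypothetical helper: prints the exported LUNAROUTE_<PROVIDER>_* variables,
# sorted by name. Prints nothing if no overrides are set.
show_pool_env() {
  env | grep "^LUNAROUTE_${1}_" | sort
}

# Example use:
#   show_pool_env OPENAI
#   show_pool_env ANTHROPIC
```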
Implementation Status
✅ Phase 1 Complete:
- All settings configurable via YAML
- All settings configurable via environment variables
- Debug logging for connection behavior
- Prometheus metrics infrastructure ready (definitions, tests)
⚠️ Limitations:
- Pool metrics not populated (reqwest doesn't expose pool events)
- Static pool config metrics could be added (TODO)
🚧 Phase 2 (Future):
- Pool metrics instrumentation (requires upstream changes or HTTP client migration)
- Multi-provider per dialect support
- Per-provider pool override
- Connection rotation strategies
- Advanced health checks