Embedded LLM Proxy
April 12, 2026
agentsh includes an embedded HTTP proxy that intercepts all LLM API requests from AI agents, providing Data Loss Prevention (DLP), usage tracking, and audit logging.
Overview
When enabled, the proxy:
- Starts automatically with each session, binding to a random available port
- Sets environment variables (ANTHROPIC_BASE_URL, OPENAI_BASE_URL) so agents route through the proxy
- Detects the LLM provider (Anthropic, OpenAI API, ChatGPT) from request headers
- Applies DLP redaction to request bodies before forwarding to upstream
- Logs requests and responses with token usage to session storage
- Extracts token usage for cost attribution and monitoring
┌─────────────────────────────────────────────────────────────────┐
│                        AI Agent Session                         │
│  ┌────────────┐    ┌─────────────────┐    ┌─────────────────┐   │
│  │   Agent    │───▶│  Embedded Proxy │───▶│  LLM Provider   │   │
│  │  (Claude,  │    │                 │    │  (Anthropic,    │   │
│  │   Codex,   │    │  • DLP redact   │    │   OpenAI, etc.) │   │
│  │   etc.)    │◀───│  • Log request  │◀───│                 │   │
│  └────────────┘    │  • Track usage  │    └─────────────────┘   │
│                    └─────────────────┘                          │
│                             │                                   │
│                             ▼                                   │
│                  ┌────────────────────┐                         │
│                  │  Session Storage   │                         │
│                  │ llm-requests.jsonl │                         │
│                  └────────────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
Configuration
Proxy Configuration
# In server-config.yaml or session config
proxy:
  mode: embedded   # embedded | disabled
  port: 0          # 0 = random available port
  # Provider base URLs (customize for alternative endpoints)
  providers:
    anthropic: https://api.anthropic.com
    openai: https://api.openai.com
Custom Provider URLs
You can configure custom base URLs to route traffic to alternative LLM endpoints:
proxy:
  mode: embedded
  providers:
    # Use LiteLLM as an OpenAI-compatible proxy
    openai: http://localhost:8000
    # Use a corporate Anthropic gateway
    anthropic: https://llm-gateway.corp.example.com/anthropic
Use cases:
- LiteLLM/vLLM: Route to self-hosted OpenAI-compatible endpoints
- Azure OpenAI: Point to Azure OpenAI Service endpoints
- Corporate gateways: Route through internal proxies for compliance
- Local development: Test against mock LLM servers
ChatGPT login flow: When providers.openai is set to the default URL (https://api.openai.com), OAuth tokens (non sk-* Bearer tokens) are automatically routed to the ChatGPT backend. Custom URLs route all traffic to the configured endpoint.
DLP Configuration
dlp:
  mode: redact   # redact | disabled
  # Built-in patterns (all enabled by default)
  patterns:
    email: true         # user@example.com
    phone: true         # 555-123-4567, (555) 123-4567
    credit_card: true   # 4111-1111-1111-1111
    ssn: true           # 123-45-6789
    api_keys: true      # sk-xxx, api-xxx, key_xxx
  # Custom patterns for organization-specific data
  custom_patterns:
    - name: customer_id      # Internal name (for logs)
      display: identifier    # Display name (shown in redacted output)
      regex: "CUST-[0-9]{8}"
    - name: internal_project
      display: project_code
      regex: "PROJ-[A-Z]{3}-[0-9]{4}"
Storage Configuration
storage:
  store_bodies: false   # Store full request/response bodies (Phase 2)
  retention:
    max_age_days: 30
    max_size_mb: 500
    eviction: oldest_first   # oldest_first | largest_first
Dialect Detection
The proxy automatically detects the LLM provider from request headers:
| Provider | Detection Method |
|---|---|
| Anthropic | x-api-key header present, or anthropic-version header |
| OpenAI | Authorization: Bearer * header present |
Note: ChatGPT OAuth tokens (Bearer tokens without sk- prefix) are automatically routed to the ChatGPT backend when using the default OpenAI URL. When a custom providers.openai URL is configured, all OpenAI-dialect traffic routes to that endpoint.
Requests without recognized auth headers receive a 400 Bad Request response.
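The detection and routing rules above can be sketched in a few lines of Python. This is an illustration only, not the proxy's actual implementation: the function names, the order in which the headers are checked, and the "chatgpt" label for the ChatGPT backend are all assumptions.

```python
from typing import Mapping, Optional

DEFAULT_OPENAI_URL = "https://api.openai.com"


def detect_dialect(headers: Mapping[str, str]) -> Optional[str]:
    """Return "anthropic", "openai", or None if no known auth header is present."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: Anthropic headers are checked first.
    if "x-api-key" in lowered or "anthropic-version" in lowered:
        return "anthropic"
    if lowered.get("authorization", "").startswith("Bearer "):
        return "openai"
    return None  # the proxy answers such requests with 400 Bad Request


def openai_target(authorization: str, configured_url: str) -> str:
    """Pick the upstream for OpenAI-dialect traffic."""
    token = authorization[len("Bearer "):]
    # OAuth tokens (no sk- prefix) go to the ChatGPT backend, but only
    # when providers.openai is left at its default URL.
    if configured_url == DEFAULT_OPENAI_URL and not token.startswith("sk-"):
        return "chatgpt"  # hypothetical label for the ChatGPT backend
    return configured_url
```

With a custom providers.openai URL such as http://localhost:8000, openai_target returns that URL for every token, which matches the note above.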
DLP Redaction
How It Works
- Request body is parsed as JSON
- All string values are scanned against enabled patterns
- Matches are replaced with [REDACTED:pattern_name] (for custom patterns, the display value is used)
- Redaction metadata is logged (field path, pattern type, count)
Example
Original request:
{
  "messages": [{
    "role": "user",
    "content": "Email john@example.com about project CUST-12345678"
  }]
}
After DLP redaction:
{
  "messages": [{
    "role": "user",
    "content": "Email [REDACTED:email] about project [REDACTED:identifier]"
  }]
}
Log entry includes:
{
  "dlp": {
    "redactions": [
      {"field": "messages[0].content", "type": "email", "count": 1},
      {"field": "messages[0].content", "type": "customer_id", "count": 1}
    ]
  }
}
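As a rough sketch of the steps above, the following Python walks a parsed JSON body, redacts every string value, and records one metadata entry per field and pattern. The email regex is a simplified stand-in (the built-in pattern is not shown in this document), and the function and variable names are hypothetical.

```python
import re

# (internal name, display name, regex). The email regex is an assumed,
# simplified stand-in; the customer_id regex comes from the config example.
PATTERNS = [
    ("email", "email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("customer_id", "identifier", re.compile(r"CUST-[0-9]{8}")),
]


def redact(value, path="", redactions=None):
    """Return (redacted_value, redactions) for a parsed JSON value."""
    if redactions is None:
        redactions = []
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            out[key], _ = redact(child, child_path, redactions)
        return out, redactions
    if isinstance(value, list):
        out = [redact(child, f"{path}[{i}]", redactions)[0]
               for i, child in enumerate(value)]
        return out, redactions
    if isinstance(value, str):
        for name, display, regex in PATTERNS:
            # The display name goes into the body, the internal name into the log.
            value, count = regex.subn(f"[REDACTED:{display}]", value)
            if count:
                redactions.append({"field": path, "type": name, "count": count})
    return value, redactions


body = {"messages": [{"role": "user",
                      "content": "Email john@example.com about project CUST-12345678"}]}
clean, log = redact(body)
```

Run on the example request, this yields the redacted content and the two log entries shown above.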
Token Usage Tracking
The proxy extracts token usage from LLM responses and normalizes across providers:
| Provider | Response Format | Normalized |
|---|---|---|
| Anthropic | usage.input_tokens, usage.output_tokens | Same |
| OpenAI | usage.prompt_tokens, usage.completion_tokens | input_tokens, output_tokens |
Usage is logged with each response and aggregated in session reports.
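The normalization in the table amounts to a field rename for OpenAI responses. A minimal sketch, assuming a hypothetical normalize_usage helper and assuming that missing counters default to 0:

```python
def normalize_usage(dialect: str, usage: dict) -> dict:
    """Map a provider's usage object onto input_tokens / output_tokens."""
    if dialect == "openai":
        # OpenAI reports prompt_tokens / completion_tokens.
        return {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        }
    # Anthropic already uses the normalized field names.
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }
```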
Storage Format
Requests and responses are logged to ~/.agentsh/sessions/<session-id>/llm-requests.jsonl:
Request entry:
{
  "id": "req_abc123",
  "session_id": "sess_xyz",
  "timestamp": "2026-01-02T10:30:00Z",
  "dialect": "anthropic",
  "request": {
    "method": "POST",
    "path": "/v1/messages",
    "body_size": 1234,
    "body_hash": "sha256:..."
  },
  "dlp": {
    "redactions": [...]
  }
}
Response entry:
{
  "request_id": "req_abc123",
  "session_id": "sess_xyz",
  "timestamp": "2026-01-02T10:30:01Z",
  "duration_ms": 1500,
  "response": {
    "status": 200,
    "body_size": 2048
  },
  "usage": {
    "input_tokens": 150,
    "output_tokens": 892
  }
}
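Because response entries point back to their request through request_id, a log file can be summarized with a single pass. The sketch below is one way to do that in Python; it assumes each entry is stored as one JSON object per line and that a request entry appears before its response, neither of which is guaranteed by this document. The summarize function is hypothetical.

```python
import json
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Total requests and tokens per dialect from llm-requests.jsonl lines."""
    dialect_by_request: Dict[str, str] = {}
    totals: Dict[str, Dict[str, int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "request" in entry:
            # Request entry: remember which dialect this id used.
            dialect_by_request[entry["id"]] = entry.get("dialect", "unknown")
        elif "response" in entry:
            # Response entry: attribute its usage to the request's dialect.
            dialect = dialect_by_request.get(entry.get("request_id"), "unknown")
            usage = entry.get("usage", {})
            bucket = totals.setdefault(
                dialect, {"requests": 0, "input_tokens": 0, "output_tokens": 0})
            bucket["requests"] += 1
            bucket["input_tokens"] += usage.get("input_tokens", 0)
            bucket["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```

To run it against a real session, pass an open file handle, for example `summarize(open(path))` with the session's llm-requests.jsonl path.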
CLI Commands
Proxy Status
# Status for latest session
agentsh proxy status
# Status for specific session
agentsh proxy status <session-id>
# JSON output
agentsh proxy status --json
Output:
Session: abc123
Proxy: running on 127.0.0.1:54321
Mode: embedded
DLP: redact (5 patterns active)
Requests: 42 (3 with redactions)
Tokens: 15,230 in / 28,456 out
Session Logs with LLM Filter
# Show only LLM events
agentsh session logs <session-id> --type=llm
# Available types: llm, fs, net, exec
Reports with LLM Stats
Session reports automatically include LLM usage when available:
agentsh report <session-id> --level=detailed
Report includes:
## LLM Usage
| Provider | Requests | Input Tokens | Output Tokens |
|----------|----------|--------------|---------------|
| anthropic | 35 | 12,450 | 24,890 |
| openai | 7 | 2,780 | 3,566 |
## DLP Events
| Pattern | Redactions | Affected Requests |
|---------|------------|-------------------|
| email | 12 | 8 |
| api_key | 3 | 2 |
Environment Variables
The proxy sets these environment variables for agent processes:
| Variable | Value | Purpose |
|---|---|---|
| ANTHROPIC_BASE_URL | http://127.0.0.1:<port> | Route Anthropic SDK through proxy |
| OPENAI_BASE_URL | http://127.0.0.1:<port> | Route OpenAI SDK through proxy |
| AGENTSH_SESSION_ID | Session ID | Correlate agent requests with session |
| <NAME>_API_URL (or expose_as) | http://127.0.0.1:<port>/svc/<name>/ | Route child code to a declared http_services upstream; one variable per service. Names must not collide with the three reserved names above. |
Declared HTTP Services
http_services: is a top-level YAML policy block that lets operators declare named HTTP upstream services a child process can reach through the proxy gateway. Each entry gives a service a URL-safe name, binds it to an upstream HTTPS URL, and defines per-method, per-path rules — so an agent can be allowed to read GitHub issues but blocked from creating them, or gated behind an approval prompt for any write.
The gateway exposes each declared service as a path prefix /svc/<name>/. Child processes receive an env var (<NAME>_API_URL by default, or the name set in expose_as) pointing at that prefix — they treat it as the base URL and append their own paths. The proxy strips the prefix, evaluates the remaining path and method against the rules, and forwards to the upstream on allow.
Configuration example
http_services:
  - name: github                      # URL-safe identifier; used in /svc/github/
    upstream: https://api.github.com  # must be https unless allow_direct is set
    expose_as: GITHUB_API_URL         # optional; default is GITHUB_API_URL here too
    aliases: [api.github.com]         # extra hostnames for the fail-closed host check
    allow_direct: false               # if false (default), direct calls to the host are blocked
    default: deny                     # allow | deny; applied when no rule matches
    rules:
      - name: read-issues
        methods: [GET]                # empty or "*" means any method
        paths:
          - /repos/*/*/issues
          - /repos/*/*/issues/*
        decision: allow
        message: "reading issues is allowed"
      - name: create-issue-needs-approval
        methods: [POST]
        paths:
          - /repos/*/*/issues
        decision: approve
        message: "Agent wants to create an issue: approve?"
        timeout: 5m
Env var contract
When the proxy starts, it injects one env var per declared service into the child process environment:
- The name is <NAME>_API_URL, where <NAME> is the uppercased name field.
- If expose_as is set, that exact string is used instead.
- The value is the proxy base URL with the service prefix appended, e.g. http://127.0.0.1:PORT/svc/github/.
- Child code should treat this as the new base URL and append its own path segments, e.g. /repos/owner/repo/issues becomes http://127.0.0.1:PORT/svc/github/repos/owner/repo/issues.
- Env var names must match [A-Za-z_][A-Za-z0-9_]*, must not be ANTHROPIC_BASE_URL, OPENAI_BASE_URL, or AGENTSH_SESSION_ID, and must be unique across all declared services (comparison is case-insensitive on Windows).
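The naming and URL rules in this contract can be illustrated with two small Python helpers. Both env_var_name and service_url are hypothetical names for this sketch; the uniqueness check across services is left out for brevity.

```python
import re
from typing import Optional

RESERVED = {"ANTHROPIC_BASE_URL", "OPENAI_BASE_URL", "AGENTSH_SESSION_ID"}
ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def env_var_name(name: str, expose_as: Optional[str] = None) -> str:
    """Derive the env var for a declared service and validate it."""
    candidate = expose_as if expose_as else f"{name.upper()}_API_URL"
    if not ENV_NAME.fullmatch(candidate):
        raise ValueError(f"invalid env var name: {candidate!r}")
    if candidate in RESERVED:
        raise ValueError(f"reserved env var name: {candidate!r}")
    return candidate


def service_url(base_url: str, path: str) -> str:
    """Append a request path to the base URL the child process received."""
    # Plain concatenation keeps the /svc/<name>/ prefix. urllib.parse.urljoin
    # would drop it when the path starts with a slash.
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

In child code this would typically be called as `service_url(os.environ["GITHUB_API_URL"], "/repos/owner/repo/issues")`.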
Credential substitution fields
When a service entry includes a secret: block, agentsh performs credential substitution
so the agent never sees the real credential. The following fields control this behaviour:
| Field | Required | Description |
|---|---|---|
| secret.ref | Yes (when secret: present) | Secret store URI, e.g. vault://kv/data/github#token. Scheme must match a declared providers: entry. |
| secret.format | Yes (when secret: present) | Fake credential template, e.g. ghp_{rand:36}. Must have {rand:N} with N >= 24. |
| inject.header.name | No | Header to inject the real credential into, e.g. Authorization. Only valid when secret is configured. |
| inject.header.template | With inject.header.name | Template string, must contain {{secret}}. E.g. Bearer {{secret}}. |
| scrub_response | No | Replace real credentials in response bodies with fakes. Defaults to true when secret is present, false otherwise. |
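To make the two templates concrete, here is a small Python sketch of how a fake credential might be generated from secret.format and how the real one might be rendered into the injected header. The helper names and the alphanumeric alphabet used for {rand:N} are assumptions; the document does not specify which characters agentsh generates.

```python
import re
import secrets
import string

RAND = re.compile(r"\{rand:(\d+)\}")


def fake_credential(fmt: str) -> str:
    """Expand a secret.format template such as ghp_{rand:36}."""
    match = RAND.search(fmt)
    if match is None or int(match.group(1)) < 24:
        raise ValueError("format must contain {rand:N} with N >= 24")
    # Assumed alphabet: ASCII letters and digits.
    alphabet = string.ascii_letters + string.digits
    random_part = "".join(
        secrets.choice(alphabet) for _ in range(int(match.group(1))))
    return fmt[:match.start()] + random_part + fmt[match.end():]


def render_header(template: str, secret: str) -> str:
    """Fill inject.header.template, e.g. "Bearer {{secret}}"."""
    if "{{secret}}" not in template:
        raise ValueError("template must contain {{secret}}")
    return template.replace("{{secret}}", secret)
```

The agent would only ever see the value returned by fake_credential, while the gateway sends the output of render_header upstream.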
Decision flow
For each request arriving at /svc/<name>/...:
- The service is looked up by name from the path prefix.
- The remaining path is checked for traversal: //, ., and .. segments are rejected with 403 before any rule runs. A single trailing slash is stripped before matching.
- Rules are evaluated in declaration order. The first rule whose methods and paths both match wins.
- If no rule matches, the service's default applies (deny if not set).
- allow forwards the request to the upstream; deny returns 403; approve gates on the approvals manager; audit logs and forwards.
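The flow above can be sketched as a first-match-wins evaluator. This Python version is illustrative only: the Rule class and function names are hypothetical, and it assumes that * in a path pattern matches exactly one path segment, which this document does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    methods: List[str]
    paths: List[str]
    decision: str  # allow | deny | approve | audit


def path_matches(pattern: str, path: str) -> bool:
    """Assumed glob semantics: "*" matches exactly one path segment."""
    want = pattern.strip("/").split("/")
    have = path.strip("/").split("/")
    return len(want) == len(have) and all(
        w == "*" or w == h for w, h in zip(want, have))


def method_matches(methods: List[str], method: str) -> bool:
    # An empty list or "*" means any method.
    return (not methods or "*" in methods
            or method.upper() in [m.upper() for m in methods])


def decide(rules: List[Rule], method: str, path: str, default: str = "deny") -> str:
    """Return the decision for one request path under /svc/<name>/."""
    # Traversal check runs before any rule and maps to a 403.
    if "//" in path or any(seg in (".", "..") for seg in path.split("/")):
        return "deny"
    # A single trailing slash is stripped before matching.
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    for rule in rules:  # declaration order, first match wins
        if method_matches(rule.methods, method) and any(
                path_matches(p, path) for p in rule.paths):
            return rule.decision
    return default


rules = [
    Rule("read-issues", ["GET"],
         ["/repos/*/*/issues", "/repos/*/*/issues/*"], "allow"),
    Rule("create-issue-needs-approval", ["POST"],
         ["/repos/*/*/issues"], "approve"),
]
```

With the rules from the configuration example, a GET on /repos/owner/repo/issues is allowed, a POST to the same path needs approval, and anything else falls through to the default.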
Fail-closed host enforcement
When a service is declared with allow_direct: false (the default), the netmonitor blocks direct HTTP/HTTPS connections to the upstream hostname and all aliases. The child process can only reach that host through the gateway prefix. This ensures all traffic is subject to the declared rules.
When a direct attempt is blocked, an http_service_denied_direct event is emitted in the audit stream. Setting allow_direct: true opts a single service out of this constraint — use it only as an escape hatch, for example when a third-party SDK cannot be configured to use a custom base URL.
Logging
HTTP service requests are logged to the same JSONL file as LLM requests (~/.agentsh/sessions/<session-id>/llm-requests.jsonl). Log entries carry a service_kind discriminator: "llm" for LLM proxy traffic and "http_service" for declared service traffic. The same storage helpers (StoreRequestBody, StoreResponseBody) and body-hash recording that apply to LLM entries apply here, so requests and responses are stored and retrievable through the same session-log commands.
When to use http_services
Use http_services when you want to expose a specific, audited surface of a third-party API to an agent while blocking everything else on that host. If you only need to let the agent reach a host, without per-method or per-path rules and the audit trail they produce, a network_rules allow is simpler. http_services is the right tool when you need the combination of specific allowed paths, block-everything-else on that host, approval gating for sensitive operations, and a full request/response audit log.
Use http_services with secret: when you want the gateway to manage credentials on
behalf of the agent — the agent never sees the real credential, and the gateway injects
it on allowed requests. This is the recommended pattern for any service where the agent
needs to authenticate but should not hold the credential directly.
Security Considerations
What the Proxy Protects Against
| Threat | Protection |
|---|---|
| PII leakage to LLM | DLP redaction removes sensitive data before it reaches the provider |
| Credential exposure | API key patterns detect and redact secrets in prompts |
| Untracked LLM usage | All requests logged with token counts for cost attribution |
| Shadow AI | Agents are routed through the proxy by default; direct calls that would bypass session controls must be blocked with network rules |
What the Proxy Does NOT Protect Against
| Threat | Reason |
|---|---|
| Encoded/obfuscated PII | Regex patterns only match plain text |
| PII in images/files | Only text content is scanned |
| Malicious agent bypassing proxy | Agent could ignore env vars (defense in depth with network rules) |
| LLM provider data retention | Redacted data still reaches the provider and is subject to its retention policy |
Best Practices
- Enable network rules to block direct LLM API access, forcing agents through the proxy
- Review custom patterns to cover organization-specific sensitive data
- Monitor redaction logs to detect and address data leakage attempts
- Set retention policies appropriate for your compliance requirements
Troubleshooting
Proxy Not Starting
# Check proxy status
agentsh proxy status
# Check session logs for errors
agentsh session logs <session-id> --type=llm
Requests Not Routed Through Proxy
Verify environment variables are set:
echo $ANTHROPIC_BASE_URL
echo $OPENAI_BASE_URL
If either is empty, the proxy may be disabled or may have failed to start.
DLP Not Redacting Expected Patterns
- Verify DLP mode is redact (not disabled)
- Check that the relevant pattern is enabled
- For custom patterns, verify the regex syntax
High Latency
The proxy typically adds less than 10 ms of overhead. If you see high latency:
- Check network connectivity to upstream
- Verify storage disk I/O isn't saturated
- Consider running storage retention eviction more frequently