=^..^= Pounce
June 18, 2026
Pure Python ASGI server for Python 3.14t, with a frozen config model and a low-overhead HTTP/1.1 fast path.
import pounce
pounce.run("myapp:app")
What is Pounce?
Pounce is a Python ASGI server for Python 3.14+, with a worker model designed for free-threaded Python 3.14t. It runs standard ASGI applications, supports streaming responses, and gives you a clear upgrade path from process-based servers such as Uvicorn.
Pounce's built-in HTTP/1.1 parser is optimized for the sync worker hot path, its
ServerConfig object is frozen after construction, and thread-worker reloads
use generational worker swaps with drain behavior.
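To illustrate why a config that is frozen after construction is convenient when threads share it, here is a minimal sketch using a standard-library frozen dataclass. `SketchConfig` and its fields are hypothetical stand-ins chosen for this example; they are not Pounce's actual `ServerConfig` definition.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a frozen server config (not Pounce's class)."""

    host: str = "127.0.0.1"
    port: int = 8000
    workers: int = 1


config = SketchConfig(workers=4)

# Mutating the shared object fails loudly instead of racing between threads.
try:
    config.port = 9000
    mutated = True
except FrozenInstanceError:
    mutated = False

# A change is expressed as a new object; the original stays untouched.
updated = replace(config, port=9000)
```

Because no thread can change the object in place, readers need no lock around it.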
On Python 3.14t, worker threads share one interpreter and one copy of your app. On GIL builds, Pounce falls back to multi-process workers automatically.
Why people pick it:
- ASGI-first — Runs standard ASGI apps with CLI and programmatic entry points
- Free-threading native — True thread parallelism with a frozen shared ServerConfig
- Fast-path parsing — Built-in HTTP/1.1 parser for sync workers with tested smuggling and header-limit checks
- Protocol extras — HTTP/2 and WebSocket are install-gated optional paths; HTTP/3 is optional with limited parity
- Thread-worker reloads — Rolling restart uses generational worker swap with drain behavior on supported worker modes
- Observable surfaces — Typed lifecycle events, optional Prometheus metrics, OpenTelemetry, and Server-Timing headers
- Optional helpers — TLS, compression, static files, middleware, rate limiting, and request queueing stay opt-in
- Migration path — Familiar CLI for teams moving from Uvicorn-style deployments
See docs/design/core-contract.md for the supported core, optional helpers, and proof required for public claims.
Use Pounce For
- Serving ASGI apps — Tunable workers, TLS, graceful shutdown, and deployment controls
- Free-threaded Python deployments — Shared-memory worker threads on Python 3.14t
- Streaming workloads — Server-sent events, streamed HTML, and token-by-token responses
- Teams migrating from Uvicorn — Similar CLI shape with a different worker model
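For the streaming workloads listed above, the following is a minimal sketch of a server-sent events endpoint written against plain ASGI 3.0. The app name `sse_app` and the `collect` driver are illustrative; the driver only stands in for a server so the example runs without one.

```python
import asyncio


async def sse_app(scope, receive, send):
    """Minimal ASGI 3.0 app that streams three server-sent events."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/event-stream")],
    })
    for i in range(3):
        # more_body=True tells the server that further chunks will follow.
        await send({
            "type": "http.response.body",
            "body": f"data: tick {i}\n\n".encode(),
            "more_body": True,
        })
        await asyncio.sleep(0)  # yield to the event loop between chunks
    # A final chunk without more_body ends the response.
    await send({"type": "http.response.body", "body": b""})


async def collect(app):
    """Drive the app with stub receive/send callables and record its output."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/events", "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(collect(sse_app))
```

Saved as a module, an app like this would be served the same way as any other ASGI callable, for example with `pounce serve --app myapp:sse_app`.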
Framework Compatibility
Tested in CI with 48 integration tests across four ASGI frameworks:
| Framework | Status | Features Verified |
|---|---|---|
| FastAPI 0.135+ | Full | Routing, Pydantic validation, dependency injection, middleware, exception handlers, lifespan, WebSocket, streaming, OpenAPI schema |
| Starlette 1.0+ | Full | Routing, BaseHTTPMiddleware, lifespan with state, streaming, WebSocket, background tasks, exception handlers |
| Django 6.0+ | Full | Async views, URL routing, path params, JSON body, query params, middleware, error handling |
| Litestar 2.21+ | Core | Routing, dependency injection, middleware, lifespan, streaming, exception handlers. WebSocket: known routing issue |
Compatibility comes from a correct ASGI 3.0 implementation, with no framework-specific code or workarounds.
Performance
Pounce is designed to make the pure-Python request path competitive while keeping the server core free of C extensions. Treat the numbers below as a benchmark snapshot, not a universal guarantee.
| Scenario | Pounce | Uvicorn | Notes |
|---|---|---|---|
| 1 worker | ~7.2k req/s | ~6.5k req/s | Async event loop, h11 parser |
| 4 workers | ~16k req/s | ~17k req/s | Threads (pounce) vs processes (uvicorn) |
Measured with wrk -t4 -c100 -d10s on macOS Apple Silicon, plain-text "hello world" ASGI app, Python 3.14t. Re-run locally before making deployment decisions.
Run pounce bench --workers 4 --compare to reproduce on your machine.
For release or PR evidence, use
python benchmarks/run_benchmark.py --repeat 5 --artifact-output results.json
so the run carries the metadata required by benchmarks/artifact-schema.json
and grouped variance across samples.
Key optimizations in the sync worker path:
- Fast HTTP/1.1 parser — Direct bytes parsing is benchmarked separately from h11 and covers method validation, header size limits, duplicate Content-Length, and Content-Length/Transfer-Encoding ambiguity
- Keep-alive connections — Connection reuse eliminates TCP handshake overhead
- Shared socket distribution — Single accept queue for thread workers avoids macOS SO_REUSEPORT limitations
Installation
pip install bengal-pounce
Requires Python 3.14+
Optional extras:
pip install bengal-pounce[h2] # HTTP/2 stream multiplexing
pip install bengal-pounce[ws] # WebSocket via wsproto
pip install bengal-pounce[tls] # TLS with truststore
pip install bengal-pounce[h3] # HTTP/3 (QUIC/UDP, requires TLS; limited parity)
pip install bengal-pounce[full] # All protocol extras
Quick Start
| Usage | Command |
|---|---|
| Programmatic | pounce.run("myapp:app") |
| CLI | pounce serve --app myapp:app |
| Multi-worker | pounce serve --app myapp:app --workers 4 |
| TLS | pounce serve --app myapp:app --ssl-certfile cert.pem --ssl-keyfile key.pem |
| HTTP/3 | pounce serve --app myapp:app --http3 --ssl-certfile cert.pem --ssl-keyfile key.pem |
| Dev reload | pounce serve --app myapp:app --reload |
| App factory | pounce serve --app myapp:create_app() |
| Testing | with TestServer(app) as server: ... |
Most settings also live in pounce.toml or pyproject.toml under [tool.pounce].
Run pounce config schema --output-format toml-template for every available field.
Features
| Feature | Description | Docs |
|---|---|---|
| Deployment | Production workers, compression, observability, and shutdown behavior | Deployment → |
| Migration | Move from Uvicorn with similar CLI concepts | Migrate from Uvicorn → |
| HTTP/1.1 | h11 (async) + fast built-in parser (sync) | HTTP/1.1 → |
| HTTP/2 | Optional stream multiplexing via h2 | HTTP/2 → |
| HTTP/3 | Optional QUIC/UDP via bengal-zoomies; limited parity until reload/drain and benchmark proof lands | HTTP/3 → |
| WebSocket | Optional RFC 6455 support via wsproto; WS-over-H2 requires h2 + ws extras | WebSocket → |
| Static Files | Pre-compressed files, ETags, range requests | Static Files → |
| Middleware | ASGI3 middleware stack support | Middleware → |
| OpenTelemetry | Optional distributed tracing (OTLP) | OpenTelemetry → |
| Lifecycle Logging | Structured JSON event logging | Logging → |
| Graceful Shutdown | Mode-scoped connection draining for deploys | Shutdown → |
| Dev Error Pages | Rich tracebacks with syntax highlighting | Errors → |
| TLS | SSL with truststore integration | TLS → |
| Compression | zstd (stdlib PEP 784) + gzip + WS compression | Compression → |
| Workers | Auto-detect: threads (3.14t) or processes (GIL) | Workers → |
| Auto Reload | Graceful restart on file changes | Reload → |
| Rate Limiting | Optional per-IP token bucket with 429 responses | Rate Limiting → |
| Request Queueing | Optional bounded queue with 503 load shedding | Request Queueing → |
| Prometheus | Optional /metrics endpoint | Metrics → |
| Sentry | Optional error tracking and performance monitoring | Sentry → |
| Introspection | Opt-in /_pounce/info endpoint for live config/runtime debugging (loopback-only by default) | Introspection → |
| Testing | TestServer + pytest fixture for integration tests | Testing → |
| Benchmarking | Built-in pounce bench command with comparative analysis | Bench → |
| Lifecycle Events | Public API for typed connection/request events | API → |
📚 Full documentation: lbliii.github.io/pounce | Complete Feature List →
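The per-IP token bucket mentioned in the Rate Limiting row can be sketched in a few lines. This is a generic illustration of the algorithm, not Pounce's implementation; `TokenBucket`, `allow_request`, and the rate and burst values are hypothetical.

```python
import threading
import time


class TokenBucket:
    """Hypothetical token bucket: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()
        self._lock = threading.Lock()  # worker threads may share one bucket

    def allow(self) -> bool:
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the burst size.
            elapsed = now - self.updated
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # a server would answer 429 here


_buckets: dict[str, TokenBucket] = {}
_buckets_lock = threading.Lock()


def allow_request(client_ip: str, clock=time.monotonic) -> bool:
    """Look up (or create) the bucket for one client and spend a token."""
    with _buckets_lock:
        bucket = _buckets.get(client_ip)
        if bucket is None:
            bucket = _buckets[client_ip] = TokenBucket(1.0, 2, clock)
    return bucket.allow()
```

The injectable `clock` parameter exists only so the behaviour can be tested deterministically.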
Usage
Programmatic Configuration — Full control from Python
import pounce

pounce.run(
    "myapp:app",
    host="0.0.0.0",
    port=8000,
    workers=4,
)
How It Works — Adaptive worker model
On Python 3.14t (free-threading): workers are threads. One process, N threads, each with its own asyncio event loop. Shared memory, no fork overhead, no IPC.
On GIL builds: workers are processes. Same API, same config. The supervisor detects the
runtime via sys._is_gil_enabled() and adapts automatically.
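The detection step can be approximated with the standard library alone. In this sketch `gil_enabled` and `pick_worker_mode` are hypothetical helper names, not Pounce's supervisor code; only `sys._is_gil_enabled()` comes from the text above.

```python
import sys


def gil_enabled() -> bool:
    """Return True when the GIL is active in this interpreter."""
    # sys._is_gil_enabled() was added in Python 3.13; older interpreters
    # always run with the GIL, so a missing attribute counts as "enabled".
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else bool(probe())


def pick_worker_mode() -> str:
    """Hypothetical helper: 'thread' on free-threaded builds, else 'process'."""
    return "process" if gil_enabled() else "thread"


mode = pick_worker_mode()
```

Note that a free-threaded build can still run with the GIL re-enabled at runtime, which is why the check asks the running interpreter rather than inspecting the build.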
A request flows through: socket accept -> protocol parser -> ASGI scope
construction -> app(scope, receive, send) -> response serialization -> socket write.
Async workers use h11; sync workers use a fast built-in parser for lower latency.
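The middle stages of that flow can be sketched without any sockets. The code below is an illustration under stated assumptions: `build_scope`, `serialize`, and `handle` are hypothetical helpers that imitate scope construction and response serialization, and the scope carries only a subset of the keys a real server provides.

```python
import asyncio
from http import HTTPStatus


async def app(scope, receive, send):
    """Plain-text ASGI 3.0 app; the kind of callable "myapp:app" points at."""
    body = f"hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def build_scope(method: str, path: str) -> dict:
    """Stage: ASGI scope construction (subset of the spec's keys)."""
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "scheme": "http",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }


def serialize(messages: list[dict]) -> bytes:
    """Stage: response serialization into HTTP/1.1 wire bytes."""
    start, *chunks = messages
    phrase = HTTPStatus(start["status"]).phrase.encode()
    lines = [b"HTTP/1.1 %d %s" % (start["status"], phrase)]
    lines += [name + b": " + value for name, value in start["headers"]]
    head = b"\r\n".join(lines) + b"\r\n\r\n"
    return head + b"".join(chunk.get("body", b"") for chunk in chunks)


async def handle(method: str, path: str) -> bytes:
    """Stage: call app(scope, receive, send) and collect what it sends."""
    sent: list[dict] = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(build_scope(method, path), receive, send)
    return serialize(sent)


wire = asyncio.run(handle("GET", "/"))
```

A real server wraps these stages with the socket accept, the request parser, and the socket write on either side.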
Protocol Extras — Install only what you need
| Protocol | Backend | Install |
|---|---|---|
| HTTP/1.1 | h11 (async) / fast built-in parser (sync) | built-in |
| HTTP/2 | h2 (stream multiplexing, priority signals) | bengal-pounce[h2] |
| WebSocket | wsproto (HTTP/1 WebSocket; WS-over-H2 also requires h2) | bengal-pounce[ws] |
| TLS | stdlib ssl + truststore | bengal-pounce[tls] |
| HTTP/3 | bengal-zoomies (QUIC/UDP) | bengal-pounce[h3] |
| All | Everything above | bengal-pounce[full] |
Compression uses Python 3.14's stdlib compression.zstd — zero external dependencies.
Testing — Real server for integration tests
from pounce.testing import TestServer
import httpx

def test_homepage(my_app):
    with TestServer(my_app) as server:
        resp = httpx.get(f"{server.url}/")
        assert resp.status_code == 200
The pounce_server pytest fixture is auto-registered when pounce is installed:
import httpx

def test_api(pounce_server, my_app):
    server = pounce_server(my_app)
    resp = httpx.get(f"{server.url}/health")
    assert resp.status_code == 200
Key Ideas
- Free-threading first. Threads, not processes. One interpreter, N event loops, and a frozen shared ServerConfig. On GIL builds, falls back to multi-process automatically.
- Pure Python. No Rust, no C extensions in the server core. Debuggable, hackable, readable.
- Typed end-to-end. Frozen config, typed ASGI definitions, and no new type suppressions without review.
- Lean dependencies. Two required runtime deps: h11 for HTTP/1.1 parsing and milo-cli for the CLI. The request hot path depends only on h11; everything else is optional.
- Observable by design. Lifecycle events are public API: from pounce import BufferedCollector, ResponseCompleted. Frameworks build dashboards on typed events, not log parsing.
- Framework tested. Verified against FastAPI, Starlette, Django, and Litestar with 48 integration tests.
- Optional helpers. Static files, middleware, rate limiting, request queueing, Prometheus metrics, Sentry, and OpenTelemetry are available without becoming mandatory request-path dependencies.
Documentation
| Section | Description |
|---|---|
| Get Started | Installation and quickstart |
| Protocols | HTTP/1.1, HTTP/2, WebSocket, HTTP/3 |
| Configuration | Server config, TLS, CLI |
| Deployment | Workers, compression, production |
| Extending | ASGI bridge, custom protocols |
| Tutorials | Uvicorn migration guide |
| Troubleshooting | Common issues and fixes |
| Reference | API documentation |
| About | Architecture, performance, FAQ |
Development
git clone https://github.com/lbliii/pounce.git
cd pounce
uv sync --group dev
pytest
See CONTRIBUTING.md for setup, feedback loops, and recipes (how to add a test, a config field, or an error). Read AGENTS.md for the project's design philosophy and stop-and-ask escape hatches.
The Bengal Ecosystem
A structured reactive stack — every layer written in pure Python for 3.14t free-threading.
| | Project | Role | Docs |
|---|---|---|---|
| ᓚᘏᗢ | Bengal | Static site generator | Docs |
| ∿∿ | Purr | Content runtime | — |
| ⌁⌁ | Chirp | Web framework | Docs |
| =^..^= | Pounce | ASGI server ← You are here | Docs |
| )彡 | Kida | Template engine | Docs |
| ฅᨐฅ | Patitas | Markdown parser | Docs |
| ⌾⌾⌾ | Rosettes | Syntax highlighter | Docs |
Python-native. Free-threading ready. No npm required.
License
MIT