=^..^= Pounce

June 18, 2026

Pure Python ASGI server for Python 3.14t, with a frozen config model and a low-overhead HTTP/1.1 fast path.

import pounce

pounce.run("myapp:app")

What is Pounce?

Pounce is a Python ASGI server for Python 3.14+, with a worker model designed for free-threaded Python 3.14t. It runs standard ASGI applications, supports streaming responses, and gives you a clear upgrade path from process-based servers such as Uvicorn.
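
For readers new to ASGI, here is a minimal sketch of what a target such as `myapp:app` could look like. It uses only the ASGI 3.0 callable interface; the `call_once` helper is a hypothetical test driver added for illustration and is not part of Pounce.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI 3.0 application: answers every HTTP request with plain text."""
    if scope["type"] != "http":
        # This sketch only handles HTTP; lifespan and WebSocket scopes are ignored.
        return
    body = b"hello from an ASGI app\n"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain; charset=utf-8"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(asgi_app):
    """Hypothetical driver: send one fake GET and return the messages the app emitted."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    await asgi_app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_once(app)):
        print(message["type"])
```

Saved as myapp.py, the `app` callable is what `pounce.run("myapp:app")` would point at.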

Pounce's built-in HTTP/1.1 parser is optimized for the sync worker hot path, its ServerConfig object is frozen after construction, and thread-worker reloads use generational worker swaps with drain behavior.
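
To illustrate what "frozen after construction" means in practice, the sketch below uses a frozen dataclass. `ExampleConfig` is a hypothetical stand-in with assumed fields; it is not Pounce's `ServerConfig`, whose real fields are listed by the config schema command.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in for a frozen server config (not Pounce's ServerConfig)."""
    host: str = "127.0.0.1"
    port: int = 8000
    workers: int = 1


def try_mutate(cfg):
    """Return True if the attribute write succeeded, False if the object refused it."""
    try:
        setattr(cfg, "port", 9000)
    except FrozenInstanceError:
        return False
    return True


config = ExampleConfig(workers=4)
mutated = try_mutate(config)

# A changed setting means building a new object; the original stays untouched,
# so threads that already hold `config` never observe a half-updated value.
reloaded = replace(config, port=9000)
```

The design point is that worker threads can share one immutable object without locks, and a reload swaps in a new object rather than editing the old one.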

On Python 3.14t, worker threads share one interpreter and one copy of your app. On GIL builds, Pounce falls back to multi-process workers automatically.

Why people pick it:

ASGI-first — Runs standard ASGI apps with CLI and programmatic entry points
Free-threading native — True thread parallelism with a frozen shared ServerConfig
Fast-path parsing — Built-in HTTP/1.1 parser for sync workers with tested smuggling and header-limit checks
Protocol extras — HTTP/2 and WebSocket are install-gated optional paths; HTTP/3 is optional with limited parity
Thread-worker reloads — Rolling restart uses generational worker swap with drain behavior on supported worker modes
Observable surfaces — Typed lifecycle events, optional Prometheus metrics, OpenTelemetry, and Server-Timing headers
Optional helpers — TLS, compression, static files, middleware, rate limiting, and request queueing stay opt-in
Migration path — Familiar CLI for teams moving from Uvicorn-style deployments

See docs/design/core-contract.md for the supported core, optional helpers, and proof required for public claims.

Use Pounce For

Serving ASGI apps — Tunable workers, TLS, graceful shutdown, and deployment controls
Free-threaded Python deployments — Shared-memory worker threads on Python 3.14t
Streaming workloads — Server-sent events, streamed HTML, and token-by-token responses
Teams migrating from Uvicorn — Similar CLI shape with a different worker model
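
As a rough illustration of the streaming workloads listed above, a server-sent events response in plain ASGI is a sequence of body messages flagged with `more_body`. The app and the `collect` driver below are illustrative names, not Pounce APIs.

```python
import asyncio


async def sse_app(scope, receive, send):
    """Stream three server-sent events, one ASGI body message per event."""
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/event-stream"),
            (b"cache-control", b"no-cache"),
        ],
    })
    for n in range(3):
        await send({
            "type": "http.response.body",
            "body": f"data: tick {n}\n\n".encode("utf-8"),
            "more_body": True,  # tells the server that more chunks follow
        })
        await asyncio.sleep(0)  # placeholder for real work between events
    # A final body message without more_body ends the response.
    await send({"type": "http.response.body", "body": b""})


async def collect(asgi_app):
    """Hypothetical driver that records every message the app sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app({"type": "http", "method": "GET", "path": "/events"}, receive, send)
    return sent
```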

Framework Compatibility

Tested in CI with 48 integration tests across four ASGI frameworks:

Framework	Status	Features Verified
FastAPI 0.135+	Full	Routing, Pydantic validation, dependency injection, middleware, exception handlers, lifespan, WebSocket, streaming, OpenAPI schema
Starlette 1.0+	Full	Routing, BaseHTTPMiddleware, lifespan with state, streaming, WebSocket, background tasks, exception handlers
Django 6.0+	Full	Async views, URL routing, path params, JSON body, query params, middleware, error handling
Litestar 2.21+	Core	Routing, dependency injection, middleware, lifespan, streaming, exception handlers. WebSocket: known routing issue

Pounce achieves compatibility through a correct ASGI 3.0 implementation, with no framework-specific code or workarounds.

Performance

Pounce is designed to make the pure-Python request path competitive while keeping the server core free of C extensions. Treat the numbers below as a benchmark snapshot, not a universal guarantee.

Scenario	Pounce	Uvicorn	Notes
1 worker	~7.2k req/s	~6.5k req/s	Async event loop, h11 parser
4 workers	~16k req/s	~17k req/s	Threads (pounce) vs processes (uvicorn)

Measured with wrk -t4 -c100 -d10s on macOS Apple Silicon, plain-text "hello world" ASGI app, Python 3.14t. Re-run locally before making deployment decisions.

Run pounce bench --workers 4 --compare to reproduce on your machine. For release or PR evidence, use python benchmarks/run_benchmark.py --repeat 5 --artifact-output results.json so the run records the metadata required by benchmarks/artifact-schema.json along with grouped variance across samples.

Key optimizations in the sync worker path:

Fast HTTP/1.1 parser — Direct bytes parsing is benchmarked separately from h11 and covers method validation, header size limits, duplicate Content-Length, and Content-Length/Transfer-Encoding ambiguity
Keep-alive connections — Connection reuse eliminates TCP handshake overhead
Shared socket distribution — Single accept queue for thread workers avoids macOS SO_REUSEPORT limitations
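
The framing checks named in the parser bullet above can be sketched as follows. This is not Pounce's parser: `FramingError`, `check_framing`, and the 8192-byte default are hypothetical, and the sketch takes the strict option of rejecting every ambiguous case.

```python
class FramingError(ValueError):
    """Raised when request framing headers are ambiguous (hypothetical name)."""


def check_framing(headers, max_header_bytes=8192):
    """Validate framing headers given as a list of (name, value) byte pairs.

    Returns ("length", n), ("transfer-encoding", None), or ("none", 0).
    """
    total = sum(len(name) + len(value) for name, value in headers)
    if total > max_header_bytes:
        raise FramingError("header section too large")

    lengths = [v.strip() for n, v in headers if n.lower() == b"content-length"]
    has_te = any(n.lower() == b"transfer-encoding" for n, _ in headers)

    # Content-Length together with Transfer-Encoding is the classic smuggling vector.
    if lengths and has_te:
        raise FramingError("both Content-Length and Transfer-Encoding present")
    if len(lengths) > 1:
        raise FramingError("duplicate Content-Length")
    if lengths:
        if not lengths[0].isdigit():
            raise FramingError("Content-Length is not a non-negative integer")
        return ("length", int(lengths[0]))
    return ("transfer-encoding", None) if has_te else ("none", 0)


def is_rejected(headers):
    """Convenience wrapper: True when check_framing raises."""
    try:
        check_framing(headers)
    except FramingError:
        return True
    return False
```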

Installation

pip install bengal-pounce

Requires Python 3.14+

Optional extras:

pip install bengal-pounce[h2]     # HTTP/2 stream multiplexing
pip install bengal-pounce[ws]     # WebSocket via wsproto
pip install bengal-pounce[tls]    # TLS with truststore
pip install bengal-pounce[h3]     # HTTP/3 (QUIC/UDP, requires TLS; limited parity)
pip install bengal-pounce[full]   # All protocol extras

Quick Start

Usage	Command
Programmatic	`pounce.run("myapp:app")`
CLI	`pounce serve --app myapp:app`
Multi-worker	`pounce serve --app myapp:app --workers 4`
TLS	`pounce serve --app myapp:app --ssl-certfile cert.pem --ssl-keyfile key.pem`
HTTP/3	`pounce serve --app myapp:app --http3 --ssl-certfile cert.pem --ssl-keyfile key.pem`
Dev reload	`pounce serve --app myapp:app --reload`
App factory	`pounce serve --app "myapp:create_app()"`
Testing	`with TestServer(app) as server: ...`

Most settings also live in pounce.toml or pyproject.toml under [tool.pounce]. Run pounce config schema --output-format toml-template for every available field.

Features

Feature	Description	Docs
Deployment	Production workers, compression, observability, and shutdown behavior	Deployment →
Migration	Move from Uvicorn with similar CLI concepts	Migrate from Uvicorn →
HTTP/1.1	h11 (async) + fast built-in parser (sync)	HTTP/1.1 →
HTTP/2	Optional stream multiplexing via h2	HTTP/2 →
HTTP/3	Optional QUIC/UDP via bengal-zoomies; limited parity until reload/drain and benchmark proof lands	HTTP/3 →
WebSocket	Optional RFC 6455 support via wsproto; WS-over-H2 requires h2 + ws extras	WebSocket →
Static Files	Pre-compressed files, ETags, range requests	Static Files →
Middleware	ASGI3 middleware stack support	Middleware →
OpenTelemetry	Optional distributed tracing (OTLP)	OpenTelemetry →
Lifecycle Logging	Structured JSON event logging	Logging →
Graceful Shutdown	Mode-scoped connection draining for deploys	Shutdown →
Dev Error Pages	Rich tracebacks with syntax highlighting	Errors →
TLS	SSL with truststore integration	TLS →
Compression	zstd (stdlib PEP 784) + gzip + WS compression	Compression →
Workers	Auto-detect: threads (3.14t) or processes (GIL)	Workers →
Auto Reload	Graceful restart on file changes	Reload →
Rate Limiting	Optional per-IP token bucket with 429 responses	Rate Limiting →
Request Queueing	Optional bounded queue with 503 load shedding	Request Queueing →
Prometheus	Optional `/metrics` endpoint	Metrics →
Sentry	Optional error tracking and performance monitoring	Sentry →
Introspection	Opt-in `/_pounce/info` endpoint for live config/runtime debugging (loopback-only by default)	Introspection →
Testing	`TestServer` + pytest fixture for integration tests	Testing →
Benchmarking	Built-in `pounce bench` command with comparative analysis	Bench →
Lifecycle Events	Public API for typed connection/request events	API →
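
The "per-IP token bucket with 429 responses" row describes a well-known technique. A generic sketch of it, under the assumption of one bucket per client address, might look like this; the class names and the injectable clock are hypothetical and not Pounce's implementation.

```python
import threading
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.stamp = now

    def take(self, now):
        # Credit tokens for the time elapsed since the last call, then spend one.
        self.tokens = min(self.capacity, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIPLimiter:
    """Maps each client IP to its own bucket; returns the HTTP status to use."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}
        self._lock = threading.Lock()  # worker threads share this state

    def status_for(self, client_ip):
        now = self._clock()
        with self._lock:
            bucket = self._buckets.get(client_ip)
            if bucket is None:
                bucket = TokenBucket(self._rate, self._capacity, now)
                self._buckets[client_ip] = bucket
            return 200 if bucket.take(now) else 429
```

The lock matters with thread workers: buckets are shared memory, so refill and spend must happen as one step.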

📚 Full documentation: lbliii.github.io/pounce | Complete Feature List →

Usage

Programmatic Configuration — Full control from Python

import pounce

pounce.run(
    "myapp:app",
    host="0.0.0.0",
    port=8000,
    workers=4,
)

How It Works — Adaptive worker model

On Python 3.14t (free-threading): workers are threads. One process, N threads, each with its own asyncio event loop. Shared memory, no fork overhead, no IPC.
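
The "one process, N threads, each with its own asyncio event loop" shape can be sketched with the standard library alone. This is a simplified illustration with hypothetical function names; it does not show Pounce's supervisor, socket handling, or drain logic.

```python
import asyncio
import threading


async def worker_main(worker_id, results):
    """Stand-in for a worker's serve loop; records which thread ran it."""
    await asyncio.sleep(0)
    results[worker_id] = threading.current_thread().name


def run_worker(worker_id, results):
    # asyncio.run() creates a fresh event loop owned by the calling thread.
    asyncio.run(worker_main(worker_id, results))


def start_workers(count):
    """Start `count` threads, each running its own event loop, and wait for them."""
    results = {}
    threads = [
        threading.Thread(target=run_worker, args=(i, results), name=f"worker-{i}")
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```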

On GIL builds: workers are processes. Same API, same config. The supervisor detects the runtime via sys._is_gil_enabled() and adapts automatically.
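
A minimal sketch of that detection step, assuming nothing beyond the `sys._is_gil_enabled()` call named above. The helper names and the "thread"/"process" labels are illustrative, not Pounce's internal values.

```python
import sys


def gil_enabled():
    """Return True when the interpreter is running with the GIL."""
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else probe()


def pick_worker_mode():
    """Choose process workers on GIL builds and thread workers on free-threaded builds."""
    return "process" if gil_enabled() else "thread"
```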

A request flows through: socket accept -> protocol parser -> ASGI scope construction -> app(scope, receive, send) -> response serialization -> socket write. Async workers use h11; sync workers use a fast built-in parser for lower latency.
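
The middle of that pipeline (scope construction, the app call, and response serialization) can be sketched as below. It skips socket I/O and parsing, and `build_scope` and `handle` are hypothetical helpers rather than Pounce internals.

```python
import asyncio
from http import HTTPStatus


def build_scope(method, path, headers):
    """Build an ASGI HTTP connection scope from already-parsed request parts."""
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "scheme": "http",
        "path": path,
        "raw_path": path.encode("ascii"),
        "query_string": b"",
        "headers": headers,
    }


async def handle(app, scope, body=b""):
    """Call the app and serialize what it sends into HTTP/1.1 response bytes."""
    out = bytearray()

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        if message["type"] == "http.response.start":
            status = HTTPStatus(message["status"])
            out.extend(f"HTTP/1.1 {status.value} {status.phrase}\r\n".encode("ascii"))
            for name, value in message.get("headers", []):
                out.extend(name + b": " + value + b"\r\n")
            out.extend(b"\r\n")
        elif message["type"] == "http.response.body":
            out.extend(message.get("body", b""))

    await app(scope, receive, send)
    return bytes(out)


async def hello(scope, receive, send):
    """Tiny app used to exercise the sketch."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-length", b"2")]})
    await send({"type": "http.response.body", "body": b"hi"})


wire = asyncio.run(handle(hello, build_scope("GET", "/", [])))
```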

Protocol Extras — Install only what you need

Protocol	Backend	Install
HTTP/1.1	h11 (async) / fast built-in parser (sync)	built-in
HTTP/2	h2 (stream multiplexing, priority signals)	`bengal-pounce[h2]`
WebSocket	wsproto (HTTP/1 WebSocket; WS-over-H2 also requires h2)	`bengal-pounce[ws]`
TLS	stdlib ssl + truststore	`bengal-pounce[tls]`
HTTP/3	bengal-zoomies (QUIC/UDP)	`bengal-pounce[h3]`
All	Everything above	`bengal-pounce[full]`

Compression uses Python 3.14's stdlib compression.zstd — zero external dependencies.
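
A rough sketch of picking between zstd and gzip from an Accept-Encoding header, using only the standard library. The negotiation logic is an assumption for illustration (it ignores q-values), and the import guard only exists so the snippet also runs on interpreters older than 3.14.

```python
import gzip

try:
    from compression import zstd  # stdlib module added in Python 3.14 (PEP 784)
except ImportError:
    zstd = None


def choose_encoding(accept_encoding):
    """Prefer zstd, then gzip, else send the body uncompressed."""
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if zstd is not None and "zstd" in offered:
        return "zstd"
    if "gzip" in offered:
        return "gzip"
    return "identity"


def compress_body(body, encoding):
    """Compress `body` with the chosen encoding; identity returns it unchanged."""
    if encoding == "zstd":
        return zstd.compress(body)
    if encoding == "gzip":
        return gzip.compress(body)
    return body
```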

Testing — Real server for integration tests

from pounce.testing import TestServer
import httpx

def test_homepage(my_app):
    with TestServer(my_app) as server:
        resp = httpx.get(f"{server.url}/")
        assert resp.status_code == 200

The pounce_server pytest fixture is auto-registered when pounce is installed:

import httpx

def test_api(pounce_server, my_app):
    # my_app is your own fixture that returns the ASGI application under test
    server = pounce_server(my_app)
    resp = httpx.get(f"{server.url}/health")
    assert resp.status_code == 200

Key Ideas

Free-threading first. Threads, not processes. One interpreter, N event loops, and a frozen shared ServerConfig. On GIL builds, falls back to multi-process automatically.
Pure Python. No Rust, no C extensions in the server core. Debuggable, hackable, readable.
Typed end-to-end. Frozen config, typed ASGI definitions, and no new type suppressions without review.
Lean dependencies. Two required runtime deps: h11 for HTTP/1.1 parsing and milo-cli for the CLI. The request hot path depends only on h11; everything else is optional.
Observable by design. Lifecycle events are public API — from pounce import BufferedCollector, ResponseCompleted. Frameworks build dashboards on typed events, not log parsing.
Framework tested. Verified against FastAPI, Starlette, Django, and Litestar with 48 integration tests.
Optional helpers. Static files, middleware, rate limiting, request queueing, Prometheus metrics, Sentry, and OpenTelemetry are available without becoming mandatory request-path dependencies.

Documentation

📚 lbliii.github.io/pounce

Section	Description
Get Started	Installation and quickstart
Protocols	HTTP/1.1, HTTP/2, WebSocket, HTTP/3
Configuration	Server config, TLS, CLI
Deployment	Workers, compression, production
Extending	ASGI bridge, custom protocols
Tutorials	Uvicorn migration guide
Troubleshooting	Common issues and fixes
Reference	API documentation
About	Architecture, performance, FAQ

Development

git clone https://github.com/lbliii/pounce.git
cd pounce
uv sync --group dev
pytest

See CONTRIBUTING.md for setup, feedback loops, and recipes (how to add a test, a config field, or an error). Read AGENTS.md for the project's design philosophy and stop-and-ask escape hatches.

The Bengal Ecosystem

A structured reactive stack — every layer written in pure Python for 3.14t free-threading.


ᓚᘏᗢ	Bengal	Static site generator	Docs
∿∿	Purr	Content runtime	—
⌁⌁	Chirp	Web framework	Docs
=^..^=	Pounce	ASGI server ← You are here	Docs
)彡	Kida	Template engine	Docs
ฅᨐฅ	Patitas	Markdown parser	Docs
⌾⌾⌾	Rosettes	Syntax highlighter	Docs

Python-native. Free-threading ready. No npm required.

License

MIT