=^..^= Pounce

June 18, 2026

PyPI version · Build Status · Python 3.14+ · License: MIT · Status: Beta

Pure Python ASGI server for Python 3.14t, with a frozen config model and a low-overhead HTTP/1.1 fast path.

import pounce

pounce.run("myapp:app")

What is Pounce?

Pounce is a Python ASGI server for Python 3.14+, with a worker model designed for free-threaded Python 3.14t. It runs standard ASGI applications, supports streaming responses, and gives you a clear upgrade path from process-based servers such as Uvicorn.

Pounce's built-in HTTP/1.1 parser is optimized for the sync worker hot path, its ServerConfig object is frozen after construction, and thread-worker reloads use generational worker swaps with drain behavior.

On Python 3.14t, worker threads share one interpreter and one copy of your app. On GIL builds, Pounce falls back to multi-process workers automatically.
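For readers new to ASGI, here is a minimal sketch of the kind of module that a "myapp:app" target string refers to. It uses only the ASGI 3.0 calling convention; the file name and greeting text are illustrative assumptions, not anything shipped with Pounce.

```python
# myapp.py: a minimal ASGI 3.0 application (illustrative, not part of Pounce)

async def app(scope, receive, send):
    """Answer every HTTP request with a short plain-text body."""
    if scope["type"] != "http":
        # Other scope types (lifespan, websocket) are ignored in this sketch.
        return
    body = b"hello from myapp\n"
    # First message: status line and headers.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain; charset=utf-8"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    # Second message: the complete body (more_body defaults to False).
    await send({"type": "http.response.body", "body": body})
```

Because the callable depends only on the ASGI interface, the same module can be served by any ASGI server.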

Why people pick it:

  • ASGI-first — Runs standard ASGI apps with CLI and programmatic entry points
  • Free-threading native — True thread parallelism with a frozen shared ServerConfig
  • Fast-path parsing — Built-in HTTP/1.1 parser for sync workers with tested smuggling and header-limit checks
  • Protocol extras — HTTP/2 and WebSocket are install-gated optional paths; HTTP/3 is optional with limited parity
  • Thread-worker reloads — Rolling restart uses generational worker swap with drain behavior on supported worker modes
  • Observable surfaces — Typed lifecycle events, optional Prometheus metrics, OpenTelemetry, and Server-Timing headers
  • Optional helpers — TLS, compression, static files, middleware, rate limiting, and request queueing stay opt-in
  • Migration path — Familiar CLI for teams moving from Uvicorn-style deployments
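The "generational worker swap with drain" idea from the list above can be modelled with plain threads. This is a toy sketch under stated assumptions, not Pounce's supervisor: the `Generation` class, the `swap` helper, and the shared job queue are all hypothetical names chosen for illustration.

```python
import queue
import threading


class Generation:
    """One generation of worker threads pulling jobs from a shared queue."""

    def __init__(self, number: int, jobs: queue.Queue, done: queue.Queue, size: int = 2):
        self.number = number
        self._draining = threading.Event()
        self._threads = [
            threading.Thread(target=self._work, args=(jobs, done), daemon=True)
            for _ in range(size)
        ]
        for thread in self._threads:
            thread.start()

    def _work(self, jobs: queue.Queue, done: queue.Queue) -> None:
        # Keep taking jobs until asked to drain; a job already taken is always finished.
        while not self._draining.is_set():
            try:
                job = jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            done.put((self.number, job))  # record which generation handled the job
            jobs.task_done()

    def drain(self, timeout: float = 2.0) -> None:
        """Stop taking new jobs, then wait for the worker threads to exit."""
        self._draining.set()
        for thread in self._threads:
            thread.join(timeout)


def swap(old: Generation, jobs: queue.Queue, done: queue.Queue) -> Generation:
    """Start the next generation first, then drain the previous one."""
    new = Generation(old.number + 1, jobs, done)
    old.drain()
    return new
```

Starting the new generation before draining the old one means there is never a moment with no workers available, which is the property a rolling restart is after.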

See docs/design/core-contract.md for the supported core, optional helpers, and proof required for public claims.

Use Pounce For

  • Serving ASGI apps — Tunable workers, TLS, graceful shutdown, and deployment controls
  • Free-threaded Python deployments — Shared-memory worker threads on Python 3.14t
  • Streaming workloads — Server-sent events, streamed HTML, and token-by-token responses
  • Teams migrating from Uvicorn — Similar CLI shape with a different worker model
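As an illustration of the streaming workloads listed above, the sketch below sends server-sent events as a sequence of ASGI body messages. It relies only on the ASGI 3.0 message format; the event payloads and the short sleep are placeholders for a real data source.

```python
import asyncio


async def sse_app(scope, receive, send):
    """Stream three server-sent events, one ASGI body message per event."""
    if scope["type"] != "http":
        return  # non-HTTP scopes are ignored in this sketch
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/event-stream"),
            (b"cache-control", b"no-cache"),
        ],
    })
    for n in range(3):
        chunk = f"data: tick {n}\n\n".encode("utf-8")
        # more_body=True tells the server that further chunks will follow.
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
        await asyncio.sleep(0.01)  # stand-in for waiting on real data
    # A final message with more_body=False ends the response.
    await send({"type": "http.response.body", "body": b"", "more_body": False})
```

Each `send` call hands one chunk to the server, so the client can start reading before the response is complete.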

Framework Compatibility

Tested in CI with 48 integration tests across four ASGI frameworks:

| Framework | Status | Features Verified |
| --- | --- | --- |
| FastAPI 0.135+ | Full | Routing, Pydantic validation, dependency injection, middleware, exception handlers, lifespan, WebSocket, streaming, OpenAPI schema |
| Starlette 1.0+ | Full | Routing, BaseHTTPMiddleware, lifespan with state, streaming, WebSocket, background tasks, exception handlers |
| Django 6.0+ | Full | Async views, URL routing, path params, JSON body, query params, middleware, error handling |
| Litestar 2.21+ | Core | Routing, dependency injection, middleware, lifespan, streaming, exception handlers. WebSocket: known routing issue |

Pounce achieves compatibility through a correct ASGI 3.0 implementation, with no framework-specific code or workarounds.


Performance

Pounce is designed to make the pure-Python request path competitive while keeping the server core free of C extensions. Treat the numbers below as a benchmark snapshot, not a universal guarantee.

| Scenario | Pounce | Uvicorn | Notes |
| --- | --- | --- | --- |
| 1 worker | ~7.2k req/s | ~6.5k req/s | Async event loop, h11 parser |
| 4 workers | ~16k req/s | ~17k req/s | Threads (pounce) vs processes (uvicorn) |

Measured with wrk -t4 -c100 -d10s on macOS Apple Silicon, plain-text "hello world" ASGI app, Python 3.14t. Re-run locally before making deployment decisions.

Run pounce bench --workers 4 --compare to reproduce on your machine. For release or PR evidence, use python benchmarks/run_benchmark.py --repeat 5 --artifact-output results.json so the run carries the metadata required by benchmarks/artifact-schema.json and grouped variance across samples.

Key optimizations in the sync worker path:

  • Fast HTTP/1.1 parser — Direct bytes parsing is benchmarked separately from h11 and covers method validation, header size limits, duplicate Content-Length, and Content-Length/Transfer-Encoding ambiguity
  • Keep-alive connections — Connection reuse eliminates TCP handshake overhead
  • Shared socket distribution — Single accept queue for thread workers avoids macOS SO_REUSEPORT limitations
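The request-head checks named in the first bullet (method validation, header size limits, duplicate Content-Length, and Content-Length/Transfer-Encoding ambiguity) can be sketched in a few lines. This is not Pounce's parser: the function name, the 8 KiB limit, and the method allow-list are assumptions made for illustration.

```python
MAX_HEAD_BYTES = 8192  # assumed limit for this sketch
METHODS = {b"GET", b"HEAD", b"POST", b"PUT", b"PATCH", b"DELETE", b"OPTIONS"}


class BadRequest(ValueError):
    """Raised when a request head fails one of the checks."""


def check_request_head(head: bytes) -> dict[bytes, bytes]:
    """Validate a raw HTTP/1.1 request head and return its headers.

    `head` is everything up to, but not including, the terminating blank line.
    """
    if len(head) > MAX_HEAD_BYTES:
        raise BadRequest("request head too large")
    request_line, *header_lines = head.split(b"\r\n")
    parts = request_line.split(b" ")
    if len(parts) != 3 or parts[0] not in METHODS:
        raise BadRequest("malformed request line or unknown method")
    headers: dict[bytes, bytes] = {}
    for line in header_lines:
        name, sep, value = line.partition(b":")
        # Reject lines without a colon, empty names, and whitespace around the name.
        if not sep or not name or name != name.strip():
            raise BadRequest("malformed header line")
        key = name.lower()
        if key == b"content-length" and key in headers:
            raise BadRequest("duplicate Content-Length")
        headers[key] = value.strip()  # other repeated headers simply overwrite here
    if b"content-length" in headers and b"transfer-encoding" in headers:
        raise BadRequest("both Content-Length and Transfer-Encoding present")
    if b"content-length" in headers and not headers[b"content-length"].isdigit():
        raise BadRequest("non-numeric Content-Length")
    return headers
```

Rejecting these cases outright, rather than picking one interpretation, is what closes off the request-smuggling ambiguity between a server and any proxy in front of it.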

Installation

pip install bengal-pounce

Requires Python 3.14+

Optional extras:

pip install "bengal-pounce[h2]"     # HTTP/2 stream multiplexing
pip install "bengal-pounce[ws]"     # WebSocket via wsproto
pip install "bengal-pounce[tls]"    # TLS with truststore
pip install "bengal-pounce[h3]"     # HTTP/3 (QUIC/UDP, requires TLS; limited parity)
pip install "bengal-pounce[full]"   # All protocol extras

Quick Start

| Usage | Command |
| --- | --- |
| Programmatic | pounce.run("myapp:app") |
| CLI | pounce serve --app myapp:app |
| Multi-worker | pounce serve --app myapp:app --workers 4 |
| TLS | pounce serve --app myapp:app --ssl-certfile cert.pem --ssl-keyfile key.pem |
| HTTP/3 | pounce serve --app myapp:app --http3 --ssl-certfile cert.pem --ssl-keyfile key.pem |
| Dev reload | pounce serve --app myapp:app --reload |
| App factory | pounce serve --app "myapp:create_app()" |
| Testing | with TestServer(app) as server: ... |

Most settings also live in pounce.toml or pyproject.toml under [tool.pounce]. Run pounce config schema --output-format toml-template for every available field.


Features

| Feature | Description | Docs |
| --- | --- | --- |
| Deployment | Production workers, compression, observability, and shutdown behavior | Deployment → |
| Migration | Move from Uvicorn with similar CLI concepts | Migrate from Uvicorn → |
| HTTP/1.1 | h11 (async) + fast built-in parser (sync) | HTTP/1.1 → |
| HTTP/2 | Optional stream multiplexing via h2 | HTTP/2 → |
| HTTP/3 | Optional QUIC/UDP via bengal-zoomies; limited parity until reload/drain and benchmark proof lands | HTTP/3 → |
| WebSocket | Optional RFC 6455 support via wsproto; WS-over-H2 requires h2 + ws extras | WebSocket → |
| Static Files | Pre-compressed files, ETags, range requests | Static Files → |
| Middleware | ASGI3 middleware stack support | Middleware → |
| OpenTelemetry | Optional distributed tracing (OTLP) | OpenTelemetry → |
| Lifecycle Logging | Structured JSON event logging | Logging → |
| Graceful Shutdown | Mode-scoped connection draining for deploys | Shutdown → |
| Dev Error Pages | Rich tracebacks with syntax highlighting | Errors → |
| TLS | SSL with truststore integration | TLS → |
| Compression | zstd (stdlib PEP 784) + gzip + WS compression | Compression → |
| Workers | Auto-detect: threads (3.14t) or processes (GIL) | Workers → |
| Auto Reload | Graceful restart on file changes | Reload → |
| Rate Limiting | Optional per-IP token bucket with 429 responses | Rate Limiting → |
| Request Queueing | Optional bounded queue with 503 load shedding | Request Queueing → |
| Prometheus | Optional /metrics endpoint | Metrics → |
| Sentry | Optional error tracking and performance monitoring | Sentry → |
| Introspection | Opt-in /_pounce/info endpoint for live config/runtime debugging (loopback-only by default) | Introspection → |
| Testing | TestServer + pytest fixture for integration tests | Testing → |
| Benchmarking | Built-in pounce bench command with comparative analysis | Bench → |
| Lifecycle Events | Public API for typed connection/request events | API → |
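The Rate Limiting row describes a per-IP token bucket that answers with 429 when a client runs dry. A generic version of that algorithm can be sketched as follows; the class name, parameters, and injectable clock are hypothetical and do not describe Pounce's own limiter.

```python
import threading
import time


class PerIPTokenBucket:
    """Token bucket keyed by client IP: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._lock = threading.Lock()  # worker threads may share one limiter
        self._state: dict[str, tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def allow(self, ip: str) -> bool:
        """Spend one token for `ip` if available; return False when the bucket is empty."""
        with self._lock:
            now = self._clock()
            tokens, last = self._state.get(ip, (float(self.burst), now))
            # Refill in proportion to the elapsed time, capped at the burst size.
            tokens = min(float(self.burst), tokens + (now - last) * self.rate)
            allowed = tokens >= 1.0
            if allowed:
                tokens -= 1.0
            self._state[ip] = (tokens, now)
            return allowed


def status_for(limiter: PerIPTokenBucket, ip: str) -> int:
    """Map the limiter decision to an HTTP status code."""
    return 200 if limiter.allow(ip) else 429
```

Passing the clock in as a parameter keeps the limiter testable without sleeping, and the lock matters because thread workers would share a single instance.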

📚 Full documentation: lbliii.github.io/pounce | Complete Feature List →


Usage

Programmatic Configuration — Full control from Python

import pounce

pounce.run(
    "myapp:app",
    host="0.0.0.0",
    port=8000,
    workers=4,
)

How It Works — Adaptive worker model

On Python 3.14t (free-threading): workers are threads. One process, N threads, each with its own asyncio event loop. Shared memory, no fork overhead, no IPC.

On GIL builds: workers are processes. Same API, same config. The supervisor detects the runtime via sys._is_gil_enabled() and adapts automatically.

A request flows through: socket accept -> protocol parser -> ASGI scope construction -> app(scope, receive, send) -> response serialization -> socket write. Async workers use h11; sync workers use a fast built-in parser for lower latency.
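The middle of that flow, ASGI scope construction followed by the app(scope, receive, send) call, can be sketched independently of any parser. The helper names below are hypothetical, and the scope contains only fields defined by the ASGI HTTP specification; Pounce's real scope may carry more.

```python
import asyncio
from urllib.parse import unquote


def build_http_scope(method: str, target: str, headers: list[tuple[bytes, bytes]],
                     client: tuple[str, int], server: tuple[str, int]) -> dict:
    """Build an ASGI HTTP connection scope from already-parsed request parts."""
    path, _, query = target.partition("?")
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method.upper(),
        "scheme": "http",
        "path": unquote(path),             # percent-decoded path
        "raw_path": path.encode("ascii"),  # path bytes as received, without the query
        "query_string": query.encode("ascii"),
        "root_path": "",
        "headers": [(name.lower(), value) for name, value in headers],
        "client": client,
        "server": server,
    }


async def call_app(app, scope: dict, body: bytes = b"") -> list[dict]:
    """Invoke app(scope, receive, send) once and return the messages it sent."""
    sent: list[dict] = []

    async def receive() -> dict:
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message: dict) -> None:
        sent.append(message)

    await app(scope, receive, send)
    return sent
```

A server would then serialize the collected messages into status line, headers, and body bytes before writing them to the socket.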

Protocol Extras — Install only what you need

| Protocol | Backend | Install |
| --- | --- | --- |
| HTTP/1.1 | h11 (async) / fast built-in parser (sync) | built-in |
| HTTP/2 | h2 (stream multiplexing, priority signals) | bengal-pounce[h2] |
| WebSocket | wsproto (HTTP/1 WebSocket; WS-over-H2 also requires h2) | bengal-pounce[ws] |
| TLS | stdlib ssl + truststore | bengal-pounce[tls] |
| HTTP/3 | bengal-zoomies (QUIC/UDP) | bengal-pounce[h3] |
| All | Everything above | bengal-pounce[full] |

Compression uses Python 3.14's stdlib compression.zstd — zero external dependencies.

Testing — Real server for integration tests

from pounce.testing import TestServer
import httpx

def test_homepage(my_app):
    # Start a real server for the duration of the with-block.
    with TestServer(my_app) as server:
        resp = httpx.get(f"{server.url}/")
        assert resp.status_code == 200

The pounce_server pytest fixture is auto-registered when pounce is installed:

import httpx

def test_api(pounce_server, my_app):
    server = pounce_server(my_app)
    resp = httpx.get(f"{server.url}/health")
    assert resp.status_code == 200

Key Ideas

  • Free-threading first. Threads, not processes. One interpreter, N event loops, and a frozen shared ServerConfig. On GIL builds, falls back to multi-process automatically.
  • Pure Python. No Rust, no C extensions in the server core. Debuggable, hackable, readable.
  • Typed end-to-end. Frozen config, typed ASGI definitions, and no new type suppressions without review.
  • Lean dependencies. Two required runtime deps: h11 for HTTP/1.1 parsing and milo-cli for the CLI. The request hot path depends only on h11; everything else is optional.
  • Observable by design. Lifecycle events are public API — from pounce import BufferedCollector, ResponseCompleted. Frameworks build dashboards on typed events, not log parsing.
  • Framework tested. Verified against FastAPI, Starlette, Django, and Litestar with 48 integration tests.
  • Optional helpers. Static files, middleware, rate limiting, request queueing, Prometheus metrics, Sentry, and OpenTelemetry are available without becoming mandatory request-path dependencies.

Documentation

📚 lbliii.github.io/pounce

| Section | Description |
| --- | --- |
| Get Started | Installation and quickstart |
| Protocols | HTTP/1.1, HTTP/2, WebSocket, HTTP/3 |
| Configuration | Server config, TLS, CLI |
| Deployment | Workers, compression, production |
| Extending | ASGI bridge, custom protocols |
| Tutorials | Uvicorn migration guide |
| Troubleshooting | Common issues and fixes |
| Reference | API documentation |
| About | Architecture, performance, FAQ |

Development

git clone https://github.com/lbliii/pounce.git
cd pounce
uv sync --group dev
pytest

See CONTRIBUTING.md for setup, feedback loops, and recipes (how to add a test, a config field, or an error). Read AGENTS.md for the project's design philosophy and stop-and-ask escape hatches.


The Bengal Ecosystem

A structured reactive stack — every layer written in pure Python for 3.14t free-threading.

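|  | Project | Description | Docs |
| --- | --- | --- | --- |
| ᓚᘏᗢ | Bengal | Static site generator | Docs |
| ∿∿ | Purr | Content runtime |  |
| ⌁⌁ | Chirp | Web framework | Docs |
| =^..^= | Pounce | ASGI server ← You are here | Docs |
| )彡 | Kida | Template engine | Docs |
| ฅᨐฅ | Patitas | Markdown parser | Docs |
| ⌾⌾⌾ | Rosettes | Syntax highlighter | Docs |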

Python-native. Free-threading ready. No npm required.


License

MIT