servo-fetch

June 14, 2026

Python bindings for servo-fetch — fetch, render, and extract web content with an embedded Servo browser engine.

  • No Chromium — single binary, no browser download
  • JavaScript execution — full Servo engine with SpiderMonkey
  • Schema extraction — declarative CSS-selector → structured JSON, no LLM
  • Async-ready — asyncio.to_thread wrappers, AsyncClient with streaming crawl
  • Typed — full .pyi stubs, works with ty / mypy / pyright

Install

pip install servo-fetch

Quick Start

import servo_fetch
page = servo_fetch.fetch("https://example.com")
page.html          # rendered HTML
page.inner_text    # document.body.innerText
page.markdown      # readable Markdown (lazy, cached)
page.title         # str | None

Schema Extraction

from servo_fetch import Schema, Field

schema = Schema(
    base_selector=".product",
    fields=[
        Field(name="title", selector="h2", type="text"),
        Field(name="price", selector=".price", type="text"),
        Field(name="url", selector="a", type="attribute", attribute="href"),
    ],
)

page = servo_fetch.fetch("https://shop.example.com", schema=schema)
page.extracted  # [{"title": "...", "price": "...", "url": "..."}]
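As the comment above shows, page.extracted is a list of dictionaries, one per element matched by base_selector. Downstream code usually needs to normalize those strings. The following is a minimal sketch, assuming prices look like "$1,299.00" and that href values may be relative; Product and to_products are hypothetical names for illustration, not part of servo-fetch:

```python
from dataclasses import dataclass
from decimal import Decimal
from urllib.parse import urljoin
import re


@dataclass(frozen=True)
class Product:  # hypothetical container, not part of servo-fetch
    title: str
    price: Decimal
    url: str


def to_products(rows: list[dict[str, str]], base_url: str) -> list[Product]:
    """Convert rows shaped like page.extracted into typed records."""
    products = []
    for row in rows:
        # Keep digits and the decimal point only: "$1,299.00" -> "1299.00".
        digits = re.sub(r"[^0-9.]", "", row["price"])
        products.append(
            Product(
                title=row["title"].strip(),
                price=Decimal(digits),
                url=urljoin(base_url, row["url"]),  # resolve relative hrefs
            )
        )
    return products


# Sample rows in the shape shown above (illustrative data, not real output).
rows = [{"title": " Widget ", "price": "$1,299.00", "url": "/p/widget"}]
products = to_products(rows, "https://shop.example.com")
```

With the library installed, to_products(page.extracted, page.url) would presumably be the equivalent call on a real page.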

Session Cookies

To fetch authenticated pages, pass a str or os.PathLike path to a Netscape-format cookies.txt via cookies_file:

import servo_fetch

page = servo_fetch.fetch("https://app.example.com/dashboard", cookies_file="cookies.txt")

# Also accepted by Client.fetch / Client.crawl and their async equivalents.
client = servo_fetch.Client(user_agent="MyBot/1.0")
pages = client.crawl("https://app.example.com", cookies_file="cookies.txt", max_pages=20)

A missing or malformed file raises servo_fetch.CookieError. Cookies are scoped to the target's site, so out-of-scope entries in the file are ignored.
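If no browser export is at hand, a Netscape-format file can be produced with Python's standard library. This is a sketch under assumptions: write_cookies_txt is a hypothetical helper, the cookie name and value are placeholders, and whether servo-fetch accepts every field exactly as http.cookiejar writes it has not been verified here:

```python
from http.cookiejar import Cookie, MozillaCookieJar
from pathlib import Path
import tempfile
import time


def write_cookies_txt(path: Path, domain: str, name: str, value: str) -> None:
    """Write one cookie to `path` in Netscape cookies.txt format."""
    jar = MozillaCookieJar(str(path))
    jar.set_cookie(
        Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=domain, domain_specified=True,
            domain_initial_dot=domain.startswith("."),
            path="/", path_specified=True,
            secure=True,
            expires=int(time.time()) + 3600,  # valid for one hour
            discard=False, comment=None, comment_url=None, rest={},
        )
    )
    # Keep the cookie even if the jar would treat it as session-only or expired.
    jar.save(ignore_discard=True, ignore_expires=True)


# Placeholder values; a real session cookie would come from a login flow.
cookies_path = Path(tempfile.mkdtemp()) / "cookies.txt"
write_cookies_txt(cookies_path, "app.example.com", "session", "placeholder-value")
```

Because cookies_file takes a str or os.PathLike, cookies_path can be passed to servo_fetch.fetch directly.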

Custom Headers

Pass a dict[str, str] via headers to add request headers (e.g. API tokens). Accepted by fetch, Client.fetch / Client.crawl / Client.map, and their async equivalents:

import servo_fetch

page = servo_fetch.fetch("https://api.example.com", headers={"Authorization": "Bearer TOKEN"})

Framing headers (Host, Content-Length, …) are rejected, and User-Agent / Cookie have dedicated options; invalid headers raise ValueError.
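When headers come from user input or configuration, checking them before the request can give a clearer error message. The sketch below is a hypothetical pre-check (check_headers and RESERVED are not part of servo-fetch) covering only the four names mentioned above; the library applies its own validation regardless:

```python
# Names the text above says are rejected or handled by dedicated options.
# The list of framing headers there ends in "…", so this set is incomplete.
RESERVED = {"host", "content-length", "user-agent", "cookie"}


def check_headers(headers: dict[str, str]) -> dict[str, str]:
    """Raise ValueError early for header names documented as unsupported."""
    for name in headers:
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.strip().lower() in RESERVED:
            raise ValueError(f"header {name!r} cannot be set via headers=")
    return headers


safe = check_headers({"Authorization": "Bearer TOKEN", "Accept-Language": "en"})
```

The returned dictionary is unchanged, so the result can be passed straight to the headers argument.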

Async

import asyncio

from servo_fetch import fetch_async, AsyncClient

async def main() -> None:
    page = await fetch_async("https://example.com")

    # Stream pages as the crawl discovers them.
    async with AsyncClient(user_agent="MyBot/1.0") as client:
        async for page in client.crawl_stream("https://docs.example.com", max_pages=50):
            print(page.url, page.title)

asyncio.run(main())
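For a known list of URLs rather than a crawl, a common pattern is to run several fetches concurrently while capping how many are in flight. The sketch below is generic asyncio code, not a servo-fetch API: fetch_many is a hypothetical helper that takes any async fetch function, and fake_fetch stands in for fetch_async so the example runs without network access:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def fetch_many(
    urls: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    limit: int = 4,
) -> list[Any]:
    """Run `fetch` over `urls` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> Any:
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_fetch(url: str) -> str:  # stand-in so the sketch runs offline
    await asyncio.sleep(0)
    return url.upper()


results = asyncio.run(fetch_many(["a", "b", "c"], fake_fetch, limit=2))
```

With the library installed, passing fetch_async in place of fake_fetch would presumably give a list of pages in input order.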

Develop

Requires uv.

uv sync --group all              # create venv + install dev deps
uv run maturin develop           # build extension (debug, fast compile)
uv run pytest                    # run tests
uv run ruff check python tests   # lint
uv run ty check python           # type check

Troubleshooting

Linux: "cannot allocate memory in static TLS block"

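Servo's native extension uses a large amount of thread-local storage. On some Linux systems the import fails with the error above; to work around it, set the following in the environment that launches Python, before servo_fetch is imported: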

export GLIBC_TUNABLES=glibc.rtld.optional_static_tls=16384