ฅᨐฅ Patitas

June 15, 2026

A pure-Python, CommonMark-compliant Markdown parser with typed ASTs, frontmatter, directives, and notebook content support.

from patitas import Markdown

md = Markdown()
html = md("# Hello **World**")

What is Patitas?

Patitas is a pure-Python Markdown parser that parses to a typed AST and renders to HTML. It's CommonMark 0.31.2 compliant, has a single runtime dependency (PyYAML, used for frontmatter), and is built for Python 3.14+.

Why people pick it:

  • ReDoS-resistant — hand-written FSM lexer, no regex backtracking; a bounded nesting depth guards against stack exhaustion on adversarial input. See security for the current threat model.
  • Typed AST — Frozen dataclasses (Heading, Paragraph, Strong, etc.) with IDE autocomplete and type checking.
  • CommonMark — Full 0.31.2 spec compliance (652 examples).
  • Incremental parsing — Re-parse only changed blocks; ~170x faster than a full re-parse for small edits (up to ~430x for edits near the start of a document).
  • Free-threading native — Frozen AST, ContextVar config, no shared mutable parser state. Parallel speedups vary by Python build, hardware, and corpus.
  • LLM-safe — render_llm + composable sanitize policies for RAG, retrieval, and safe context.
  • Directives — MyST-style blocks (admonition, dropdown, tabs) plus custom directives.
  • Notebook + frontmatter support — Parse .ipynb content and YAML frontmatter as part of content pipelines.

Use Patitas For

  • Markdown to HTML pipelines — Render docs, blogs, and site content from Python
  • Typed Markdown processing — Analyze or transform documents through a typed AST
  • Secure user-input parsing — Handle untrusted Markdown without regex backtracking risk
  • Content tooling — Frontmatter extraction, excerpts, meta descriptions, and notebook conversion
  • Modern docs stacks — Directives, syntax highlighting, LLM-safe rendering, and incremental parsing

What it does

Function                                 Description
parse(source)                            Parse Markdown to typed AST
parse_frontmatter(content)               Parse YAML frontmatter to (metadata, body)
parse_notebook(content, source_path?)    Parse Jupyter .ipynb to (markdown, metadata)
parse_incremental(new, prev, ...)        Re-parse only the changed region (O(change))
render(doc)                              Render AST to HTML
render_llm(doc)                          Render AST to LLM-friendly plain text (no HTML)
sanitize(doc, policy)                    Strip HTML, dangerous URLs, zero-width chars
lint(source)                             Lint Markdown over the typed AST, returning Diagnostics
extract_text(node)                       Extract plain text from any AST node
extract_excerpt(ast, source, ...)        Structurally correct excerpt from AST (list previews, meta)
extract_meta_description(ast, source)    Meta description from first paragraph/heading
extract_body(content)                    Strip --- delimited frontmatter block (no YAML parse)
Markdown()                               All-in-one parser and renderer

The public compatibility boundary is the top-level patitas package. Prefer from patitas import ... for integrations; direct imports from parser, lexer, and renderer implementation modules are still pre-1.0 internals unless the API reference says otherwise.


More Features

  • Plugins — Tables, footnotes, math, strikethrough, task lists.
  • Linting — lint(source) + a stateless LintRule protocol; a ruff-style linter for Markdown with three starter rules. See linting.
  • Minimal dependencies — PyYAML for frontmatter; core parser is pure Python.

GFM-style features are available through plugins. Measured GFM 0.29 compliance is 654/672 (97.3%) with the GFM plugins enabled (Markdown(plugins=["table", "strikethrough", "task_lists", "autolinks"])); the remaining gap is the tagfilter extension, CommonMark 0.28-vs-0.31.2 emphasis drift, and a few autolink edge cases. See GFM compliance tracking.


Installation

pip install patitas

Requires Python 3.14+

Optional extras:

pip install patitas[syntax]      # Syntax highlighting via Rosettes
pip install patitas[all]         # All optional features

Quick Start

Parse and render

from patitas import parse, render

doc = parse("# Hello **World**")
html = render(doc)
# <h1 id="hello-world">Hello <strong>World</strong></h1>

Frontmatter

Parse YAML frontmatter from Markdown or other content, returning a (metadata, body) tuple:

from patitas import parse_frontmatter, extract_body

content = """---
title: Hello
weight: 10
---
# Body content
"""
metadata, body = parse_frontmatter(content)
# metadata: {"title": "Hello", "weight": 10.0}
# body: "# Body content"

# When YAML is broken, extract_body strips the --- block without parsing
body_only = extract_body(content)
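
extract_body strips the --- delimited block without parsing YAML, which is why it still works when the YAML is broken. A stdlib-only sketch of that idea, assuming the block must open on the first line and close with a line containing only ---; strip_frontmatter is a hypothetical name, and Patitas's handling of edge cases (BOM, CRLF, other closers) may differ.

```python
def strip_frontmatter(content: str) -> str:
    """Return content without a leading '---' ... '---' block; YAML is never parsed."""
    lines = content.splitlines(keepends=True)
    # Frontmatter must open on the very first line
    if not lines or lines[0].rstrip("\r\n") != "---":
        return content
    for index in range(1, len(lines)):
        if lines[index].rstrip("\r\n") == "---":
            # Everything after the closing delimiter is the body
            return "".join(lines[index + 1 :])
    # No closing delimiter: leave the content untouched
    return content


broken = "---\ntitle: [unclosed\n---\n# Body content\n"
print(repr(strip_frontmatter(broken)))  # '# Body content\n'
```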

Notebook support

Parse Jupyter notebooks (.ipynb) to Markdown content and metadata — stdlib JSON only:

from patitas import parse_notebook

with open("demo.ipynb", encoding="utf-8") as f:
    content, metadata = parse_notebook(f.read(), "demo.ipynb")

# content: Markdown string (cells → fenced code, outputs → HTML)
# metadata: title, type, notebook{kernel_name, cell_count}, etc.
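
Since notebooks are plain JSON, the cell-to-Markdown step needs only the stdlib json module. A simplified sketch, assuming the nbformat 4 layout (cells with cell_type and source given as a list of lines or a string). It skips outputs, which Patitas renders to HTML; notebook_to_markdown and its metadata keys are illustrative, not Patitas's exact output.

```python
import json

FENCE = "`" * 3  # a Markdown code fence


def notebook_to_markdown(raw: str) -> tuple[str, dict[str, object]]:
    nb = json.loads(raw)
    nb_meta = nb.get("metadata", {})
    # nbformat 4 keeps the kernel under metadata.kernelspec (may be absent)
    kernel = nb_meta.get("kernelspec", {}).get("name", "")
    language = nb_meta.get("language_info", {}).get("name", "")
    cells = nb.get("cells", [])
    parts: list[str] = []
    for cell in cells:
        source = cell.get("source", "")
        # nbformat allows source as a list of lines or as one string
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            parts.append(text)
        elif cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{language}\n{text}\n{FENCE}")
        # raw cells and code outputs are skipped in this sketch
    metadata = {"kernel_name": kernel, "cell_count": len(cells)}
    return "\n\n".join(parts), metadata
```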

Security

Patitas is designed to resist ReDoS attacks.

Traditional Markdown parsers use regex patterns vulnerable to catastrophic backtracking:

from patitas import parse

# Malicious input that can freeze regex-based parsers
evil = "a](" + "\\)" * 10000

# Patitas: completes in milliseconds (no catastrophic backtracking)
doc = parse(evil)

Patitas uses a hand-written finite state machine lexer:

  • Single-character lookahead — no backtracking, ever
  • Linear time for typical input — a few non-lexer paths remain super-linear on adversarial input; see Known limitations
  • Safe for untrusted input — pair with input-size limits and timeouts; see Recommendations
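
To make "single-character lookahead, no backtracking" concrete, here is a toy state machine. It is not Patitas's lexer: it is a hypothetical two-state scanner that only finds `](...)` link destinations and honors backslash escapes. The index only moves forward, so each call takes at most len(source) loop steps, including on the evil input from above.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()
    DEST = auto()  # inside a "](...)" link destination


def scan_destinations(source: str) -> tuple[list[str], int]:
    """Return closed link destinations and the number of loop steps taken."""
    state = State.TEXT
    found: list[str] = []
    buffer: list[str] = []
    steps = 0
    i, n = 0, len(source)
    while i < n:
        steps += 1
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""  # single-character lookahead
        if state is State.TEXT:
            if ch == "]" and nxt == "(":
                state, buffer = State.DEST, []
                i += 2
                continue
        elif ch == "\\" and nxt:
            buffer.append(nxt)  # escaped char is literal; consume both
            i += 2
            continue
        elif ch == ")":
            found.append("".join(buffer))
            state = State.TEXT
        else:
            buffer.append(ch)
        i += 1  # the index never moves backward: no backtracking
    return found, steps


evil = "a](" + "\\)" * 10000
destinations, steps = scan_destinations(evil)
print(destinations, steps)  # prints: [] 10002
```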

⚠️ render() does not sanitize output. The default renderer is CommonMark-compliant and passes raw HTML and javascript:/data: URLs through verbatim. For untrusted content, sanitize the AST before rendering (or render to plain text with render_llm()):

from patitas import parse, render, sanitize
from patitas.sanitize import web_safe

untrusted_markdown = "<script>alert(1)</script>\n\n[click me](javascript:alert(1))"
doc = parse(untrusted_markdown)
# Strip HTML + disallowed URL schemes, then render HTML
html = render(sanitize(doc, policy=web_safe))
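
Stripping "disallowed URL schemes" comes down to checking each link or image destination against an allowlist. A minimal sketch with urllib.parse, assuming an allowlist of http, https, mailto, and scheme-less relative URLs; is_safe_url and ALLOWED_SCHEMES are hypothetical and are not web_safe's actual rules.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})  # assumed allowlist


def is_safe_url(url: str) -> bool:
    # Browsers ignore whitespace and control chars inside a scheme,
    # so "java\tscript:" must be normalized before the check
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20 and ch != "\x7f")
    scheme = urlsplit(cleaned).scheme.lower()
    # No scheme means a relative URL on the current origin
    return scheme == "" or scheme in ALLOWED_SCHEMES
```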

Learn more about Patitas security →
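
The lexer notes above recommend pairing Patitas with input-size limits and timeouts. One stdlib sketch, assuming a 256 KiB cap and a 2-second budget (both arbitrary); guarded_parse is a hypothetical helper that takes the parse function as an argument. A thread-based timeout stops waiting but cannot kill the worker thread, so use a subprocess if you need a hard kill.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import TypeVar

T = TypeVar("T")

MAX_INPUT_BYTES = 256 * 1024  # assumed cap; tune for your application
_pool = ThreadPoolExecutor(max_workers=4)


def guarded_parse(source: str, parse: Callable[[str], T], timeout_s: float = 2.0) -> T:
    size = len(source.encode("utf-8"))
    # Reject oversized input before any parsing work happens
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input too large: {size} > {MAX_INPUT_BYTES} bytes")
    future = _pool.submit(parse, source)
    try:
        # Stops waiting after timeout_s; the worker thread is not interrupted
        return future.result(timeout=timeout_s)
    except FutureTimeoutError:
        raise TimeoutError(f"parse exceeded {timeout_s}s") from None
```

With Patitas installed, this could wrap the real parser as guarded_parse(untrusted_markdown, parse) after from patitas import parse.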


Performance

  • 652 CommonMark examples — measure on your machine with benchmarks/benchmark_vs_mistune.py; recent Python 3.14.2 free-threaded local runs are in the low tens of milliseconds.

  • Incremental parsing — For a 1-char edit in a ~100KB doc, parse_incremental is roughly 170x faster than full re-parse in the bundled benchmark (and up to ~430x for edits near the start of the document).

  • Parallel scaling — Free-threaded speedups depend on Python build, hardware, corpus size, and optional comparator packages. Run python benchmarks/benchmark_parallel.py to see results on your machine. One recent local run on Python 3.14.2 with 1,000 CommonMark documents:

      Threads    Time      Speedup
      1          0.07s     1.00x
      2          0.05s     1.42x
      4          0.04s     1.64x
      8          0.04s     1.75x
    
# From repo (after uv sync --group dev):
python benchmarks/benchmark_vs_mistune.py
python benchmarks/benchmark_parallel.py   # Free-threading scaling
pytest benchmarks/benchmark_vs_mistune.py benchmarks/benchmark_incremental.py benchmarks/benchmark_directives.py benchmarks/benchmark_scaling.py benchmarks/benchmark_excerpt.py -v --benchmark-only --benchmark-group-by=group

See benchmarks/README.md for the full suite (pipelines, phase-breakdown, CI threshold checks).
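
The speedup from parse_incremental comes from re-parsing only the changed region. The core idea can be sketched with the stdlib: split both versions into blocks, keep parsed nodes for the unchanged prefix and suffix, and re-parse only what lies between. This is a simplification under assumptions: split_blocks treats blank lines as the only block boundary (real CommonMark boundaries such as fenced code and lists are more involved), reparse_changed and parse_block are hypothetical, and Patitas's actual algorithm may differ.

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def split_blocks(source: str) -> list[str]:
    # Simplification: a "block" is any run of text separated by a blank line
    return [block for block in source.split("\n\n") if block.strip()]


def reparse_changed(
    old_blocks: list[str],
    old_nodes: list[T],
    new_source: str,
    parse_block: Callable[[str], T],
) -> tuple[list[str], list[T], int]:
    """Reuse nodes for unchanged leading/trailing blocks; re-parse the rest."""
    new_blocks = split_blocks(new_source)
    limit = min(len(old_blocks), len(new_blocks))
    # Common prefix: blocks before the edit
    start = 0
    while start < limit and old_blocks[start] == new_blocks[start]:
        start += 1
    # Common suffix: blocks after the edit (never overlapping the prefix)
    end = 0
    while end < limit - start and old_blocks[-1 - end] == new_blocks[-1 - end]:
        end += 1
    changed = new_blocks[start : len(new_blocks) - end]
    fresh = [parse_block(block) for block in changed]  # only this region is parsed
    nodes = old_nodes[:start] + fresh + old_nodes[len(old_nodes) - end :]
    return new_blocks, nodes, len(fresh)
```

Finding the prefix and suffix still compares every block, but string comparison is far cheaper than parsing; only the changed region reaches parse_block.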


Usage

Typed AST — IDE autocomplete, catch errors at dev time
from patitas import parse
from patitas.nodes import Heading, Paragraph, Strong

doc = parse("# Hello **World**")
heading = doc.children[0]

# Full type safety
assert isinstance(heading, Heading)
assert heading.level == 1

# IDE knows the types!
for child in heading.children:
    if isinstance(child, Strong):
        print(f"Bold text: {child.children}")

All nodes are @dataclass(frozen=True, slots=True) — immutable and memory-efficient.

Directives — MyST-style blocks
:::{note}
This is a note admonition.
:::

:::{warning}
This is a warning.
:::

:::{dropdown} Click to expand
Hidden content here.
:::

:::{tab-set}

:::{tab-item} Python
Python code here.
:::

:::{tab-item} JavaScript
JavaScript code here.
:::

:::
Custom Directives — Extend with your own
from patitas import Markdown, create_registry_with_defaults
from patitas.directives.decorator import directive

# Define a custom directive with the @directive decorator
@directive("alert")
def render_alert(node, children: str, sb) -> None:
    sb.append(f'<div class="alert">{children}</div>')

# Extend defaults with your directive
builder = create_registry_with_defaults()  # Has admonition, dropdown, tabs
builder.register(render_alert())

# Use it
md = Markdown(directive_registry=builder.build())
html = md(":::{alert} This is important!\n:::")
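
The builder/registry split above can be sketched in plain Python to show how a name-keyed registry dispatches a directive to its renderer. Everything here (RegistryBuilder, the three-argument renderer signature, the list used as an output buffer) is hypothetical and much simpler than patitas.directives; it only mirrors the shape of the example.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass, field
from types import MappingProxyType

# Hypothetical renderer signature: (title, rendered children, output buffer)
Renderer = Callable[[str, str, list[str]], None]


@dataclass
class RegistryBuilder:
    _renderers: dict[str, Renderer] = field(default_factory=dict)

    def register(self, name: str, renderer: Renderer) -> None:
        # Refuse silent overrides so two plugins cannot clobber one name
        if name in self._renderers:
            raise ValueError(f"directive {name!r} is already registered")
        self._renderers[name] = renderer

    def build(self) -> Mapping[str, Renderer]:
        # Read-only snapshot: later registrations on the builder do not leak in
        return MappingProxyType(dict(self._renderers))


def render_alert(title: str, children: str, sb: list[str]) -> None:
    sb.append(f'<div class="alert">{title}{children}</div>')


builder = RegistryBuilder()
builder.register("alert", render_alert)
registry = builder.build()

out: list[str] = []
registry["alert"]("This is important!", "", out)
print("".join(out))  # <div class="alert">This is important!</div>
```
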
Syntax Highlighting

With pip install patitas[syntax]:

from patitas import Markdown

md = Markdown(highlight=True)

html = md("""
```python
def hello():
    print("Highlighted!")
```
""")


Uses Rosettes (https://github.com/lbliii/rosettes) for O(n) highlighting.

Free-Threading — Python 3.14t
from concurrent.futures import ThreadPoolExecutor
from patitas import parse

documents = ["# Doc " + str(i) for i in range(1000)]

with ThreadPoolExecutor() as executor:
    # Safe to parse in parallel — no shared mutable state
    results = list(executor.map(parse, documents))

Patitas is designed for Python 3.14t's free-threading mode (PEP 703).

LLM Safety — Sanitize and render for RAG, retrieval

When sending Markdown to an LLM, sanitize untrusted content and render to plain text:

from patitas import parse, sanitize, render_llm
from patitas.sanitize import llm_safe

doc = parse(user_content)
clean = sanitize(doc, policy=llm_safe)  # Strip HTML, dangerous URLs, zero-width chars
safe_text = render_llm(clean, source=user_content)

Pre-built policies: llm_safe, web_safe (alias), strict. Compose with |.


Migrate from mistune

Same API — swap the import:

from patitas import Markdown
md = Markdown()
html = md(source)

Full migration guide →


The Bengal Ecosystem

A structured reactive stack — every layer written in pure Python for 3.14t free-threading.

ᓚᘏᗢ      Bengal     Static site generator    Docs
∿∿       Purr       Content runtime
⌁⌁       Chirp      Web framework            Docs
=^..^=   Pounce     ASGI server              Docs
)彡      Kida       Template engine          Docs
ฅᨐฅ      Patitas    Markdown parser          Docs  ← You are here
⌾⌾⌾      Rosettes   Syntax highlighter       Docs

Python-native. Free-threading ready. No npm required.


Development

git clone https://github.com/lbliii/patitas.git
cd patitas
uv sync --group dev
pytest

Run benchmarks (after uv sync --group dev):

python benchmarks/benchmark_vs_mistune.py
python benchmarks/benchmark_parallel.py   # Free-threading scaling demo
pytest benchmarks/benchmark_*.py -v --benchmark-only --benchmark-group-by=group   # Full suite

License

MIT License — see LICENSE for details.