html-to-markdown

May 13, 2026

High-performance HTML to Markdown conversion powered by Rust. Ships as native bindings for Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, Elixir, R, C (FFI), and WebAssembly with identical rendering across all runtimes.

Documentation | Live Demo | API Reference

Highlights

  • Rust-native throughput with html5ever parsing
  • 12 language bindings with consistent output across all runtimes
  • Structured result: convert() returns ConversionResult with content, metadata, tables, images, and warnings
  • Metadata extraction — title, headers, links, images, structured data (JSON-LD, Microdata, RDFa)
  • Visitor pattern — custom callbacks for content filtering, URL rewriting, domain-specific dialects
  • Table extraction — extract structured table data (cells, headers, rendered markdown) during conversion
  • Secure by default — built-in HTML sanitization via ammonia
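The visitor API is not shown in this README, so the sketch below does not use it. As a rough illustration of the URL-rewriting use case mentioned above, it post-processes Markdown that has already been converted, resolving relative link and image targets against a base URL with a regular expression. The function name `absolutize_links` and the pattern are hypothetical, and the pattern deliberately covers only simple inline links.

```python
import re
from urllib.parse import urljoin

# Matches simple inline Markdown links and images: [text](url) or ![alt](url).
# Simplification (assumption of this sketch): links with titles, nested
# brackets, or reference-style definitions are left untouched.
_LINK_RE = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)(\))")


def absolutize_links(markdown: str, base_url: str) -> str:
    """Resolve relative link and image targets in Markdown against base_url."""

    def _rewrite(match: re.Match) -> str:
        prefix, url, suffix = match.groups()
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        return f"{prefix}{urljoin(base_url, url)}{suffix}"

    return _LINK_RE.sub(_rewrite, markdown)
```

A visitor callback would presumably do this rewriting during conversion rather than afterwards; string post-processing is used here only because it needs nothing beyond the standard library.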

Quick Start

# Rust
cargo add html-to-markdown-rs

# Python
pip install html-to-markdown

# TypeScript / Node.js
npm install @kreuzberg/html-to-markdown-node

# Ruby
gem install html-to-markdown

# CLI
cargo install html-to-markdown-cli
# or
brew install kreuzberg-dev/tap/html-to-markdown

See the Installation Guide for all languages including PHP, Go, Java, C#, Elixir, R, and WASM.

Usage

convert() is the single entry point. It returns a structured ConversionResult:

# Python
from html_to_markdown import convert

result = convert("<h1>Hello</h1><p>World</p>")
print(result.content)        # # Hello\n\nWorld
print(result.metadata)       # title, links, headings, …

// TypeScript / Node.js
import { convert } from "@kreuzberg/html-to-markdown-node";

const result = convert("<h1>Hello</h1><p>World</p>");
console.log(result.content); // # Hello\n\nWorld
console.log(result.metadata); // title, links, headings, …

// Rust
use html_to_markdown_rs::convert;

let result = convert("<h1>Hello</h1><p>World</p>", None)?;
println!("{}", result.content.unwrap_or_default());
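Because the examples above read the Markdown from `result.content`, a small helper for converting files on disk can be sketched as follows. This is a minimal sketch, not part of the library: `convert_file` and its `converter` parameter are hypothetical names. The converter is passed in so that the helper works with `html_to_markdown.convert` or with a stand-in during testing. Treating a missing `content` as an empty string is an assumption, borrowed from the `unwrap_or_default()` call in the Rust example.

```python
from pathlib import Path
from typing import Any, Callable


def convert_file(src: Path, dest: Path, converter: Callable[[str], Any]) -> int:
    """Convert one HTML file to Markdown; return the number of characters written.

    `converter` is expected to behave like html_to_markdown.convert: it takes
    an HTML string and returns an object with a `content` attribute.
    """
    html = src.read_text(encoding="utf-8")
    result = converter(html)
    # Assumption: `content` may be None (the Rust example unwraps an Option),
    # so fall back to an empty string instead of writing "None".
    markdown = getattr(result, "content", None) or ""
    dest.write_text(markdown, encoding="utf-8")
    return len(markdown)
```

With the package installed, this would be called as `convert_file(Path("page.html"), Path("page.md"), convert)` after `from html_to_markdown import convert`.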

Language Bindings

Language              Package                           Install
Rust                  html-to-markdown-rs               cargo add html-to-markdown-rs
Python                html-to-markdown                  pip install html-to-markdown
TypeScript / Node.js  @kreuzberg/html-to-markdown-node  npm install @kreuzberg/html-to-markdown-node
WebAssembly           @kreuzberg/html-to-markdown-wasm  npm install @kreuzberg/html-to-markdown-wasm
Ruby                  html-to-markdown                  gem install html-to-markdown
PHP                   kreuzberg-dev/html-to-markdown    composer require kreuzberg-dev/html-to-markdown
Go                    htmltomarkdown                    go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3
Java                  dev.kreuzberg:html-to-markdown    Maven / Gradle
C#                    KreuzbergDev.HtmlToMarkdown       dotnet add package KreuzbergDev.HtmlToMarkdown
Elixir                html_to_markdown                  mix deps.get html_to_markdown
R                     htmltomarkdown                    install.packages("htmltomarkdown")
C (FFI)               releases                          Pre-built .so / .dll / .dylib

Part of Kreuzberg.dev

  • Kreuzberg — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
  • Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
  • kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces all per-language bindings.
  • Discord — community, roadmap, announcements.

Contributing

Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines.

License

MIT License — see LICENSE for details.