
March 10, 2026


Markdrop


A Python package for converting PDFs to structured Markdown and interactive HTML, with AI-powered image and table descriptions across six major LLM providers. Available on PyPI.


Features

  • PDF → Markdown conversion with formatting preservation (via Docling)
  • Automatic image extraction using XRef IDs
  • Table detection using Microsoft's Table Transformer
  • Conversion of PDFs directly from URLs
  • AI-powered image and table descriptions — 6 providers: Gemini, OpenAI, Anthropic Claude, Groq, OpenRouter, LiteLLM
  • Interactive HTML output with downloadable Excel tables
  • Customisable image resolution and UI elements
  • Structured logging (never pollutes your app's root logger)
  • Support for DOCX / PPTX input
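The logging claim above describes a standard Python library pattern. The sketch below shows that pattern, plus how an application could tune the package's verbosity without touching its own root logger. It assumes the package logs under a logger named `"markdrop"`; that name and the helper `set_markdrop_level` are hypothetical.

```python
import logging

# Assumption: the package logs under a logger named "markdrop".
LOGGER_NAME = "markdrop"

# Library side: attach a NullHandler to the package's own logger instead of
# calling logging.basicConfig(), which would add handlers to the root logger.
logging.getLogger(LOGGER_NAME).addHandler(logging.NullHandler())


def set_markdrop_level(level: int) -> logging.Logger:
    """Application side: adjust only the package logger, leaving root alone."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    root_before = list(logging.getLogger().handlers)
    set_markdrop_level(logging.ERROR)
    print(logging.getLogger().handlers == root_before)  # True
```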

Installation

Core install (PDF conversion + Gemini/OpenAI):

pip install markdrop

With Anthropic Claude:

pip install "markdrop[anthropic]"

With Groq:

pip install "markdrop[groq]"

With LiteLLM (routes to 100+ providers):

pip install "markdrop[litellm]"

Everything (including local HuggingFace models):

pip install "markdrop[all]"

OpenRouter is accessed through the openai package (already included in core), so no extra install is needed.
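To see which of the optional provider SDKs are present after installing, you can probe for them without importing them. This is a small sketch. It assumes each SDK is importable under its usual top-level package name (`openai`, `anthropic`, `groq`, `litellm`), and `installed_sdks` is a hypothetical helper, not part of markdrop.

```python
from importlib.util import find_spec

# Provider -> top-level module name of its SDK (assumed usual package names).
PROVIDER_SDKS = {
    "openai": "openai",        # core install; also used for OpenRouter
    "anthropic": "anthropic",  # markdrop[anthropic]
    "groq": "groq",            # markdrop[groq]
    "litellm": "litellm",      # markdrop[litellm]
}


def installed_sdks(sdks: dict[str, str] = PROVIDER_SDKS) -> dict[str, bool]:
    """Report which SDK modules are importable, without importing them."""
    return {name: find_spec(module) is not None for name, module in sdks.items()}


if __name__ == "__main__":
    for name, ok in installed_sdks().items():
        print(f"{name:10} {'installed' if ok else 'missing'}")
```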


Supported AI Providers

| Provider | `--ai_provider` | Default model |
|---|---|---|
| Google Gemini | `gemini` | `gemini-3.1-flash-lite` |
| OpenAI | `openai` | `gpt-5.4` |
| Anthropic Claude | `anthropic` | `claude-opus-4-6` |
| Groq | `groq` | `meta-llama/llama-4-maverick-17b-128e-instruct` |
| OpenRouter | `openrouter` | `google/gemini-3.1-flash-lite` (any model) |
| LiteLLM | `litellm` | `openai/gpt-5.4` (configurable) |

All models are configurable — use --model to override for any provider, or set model_name_override in ProcessorConfig.


Quick Start



CLI Usage

1. Convert PDF → Markdown + HTML

markdrop convert <input_path> --output_dir <dir> [--add_tables]
# Example
markdrop convert report.pdf --output_dir out --add_tables
# Also works with URLs:
markdrop convert https://arxiv.org/pdf/1706.03762 --output_dir out
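`convert` accepts either a local path or an http(s) URL. The sketch below shows one way to tell the two apart with the standard library and to choose a local filename for a downloaded PDF. `classify_input` and `local_name_for` are hypothetical helpers; the package's own download logic in `utils.py` may work differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input(source: str) -> str:
    """Return "url" for http(s) links, "file" for everything else."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


def local_name_for(source: str) -> str:
    """Pick a local filename, e.g. for saving a downloaded PDF."""
    if classify_input(source) == "url":
        name = Path(urlparse(source).path).name or "download"
        # arXiv links such as /pdf/1706.03762 have no .pdf suffix.
        return name if name.lower().endswith(".pdf") else f"{name}.pdf"
    return Path(source).name
```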

2. Generate AI Descriptions for Images & Tables

markdrop describe <markdown_file> --ai_provider <provider> [--output_dir <dir>] [--remove_images] [--remove_tables]
| Provider | `--ai_provider` |
|---|---|
| Google Gemini | `gemini` |
| OpenAI | `openai` |
| Anthropic Claude | `anthropic` |
| Groq | `groq` |
| OpenRouter | `openrouter` |
| LiteLLM | `litellm` |
# Gemini (default)
markdrop describe doc.md --ai_provider gemini

# Anthropic Claude
markdrop describe doc.md --ai_provider anthropic --remove_images

# Groq (fastest inference)
markdrop describe doc.md --ai_provider groq

# OpenRouter (any model)
markdrop describe doc.md --ai_provider openrouter

# LiteLLM (unified gateway)
markdrop describe doc.md --ai_provider litellm
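`describe` operates on the Markdown that `convert` produces. Assuming images appear in standard Markdown image syntax (`![alt](path)`), the sketch below locates them and either inserts a description after each image or replaces the image with it. Replacing is one plausible reading of `--remove_images`. `describe` stands in for the LLM call, and all names here are hypothetical; the real engine in `parse.py` may differ.

```python
import re
from collections.abc import Callable

# Standard Markdown image syntax: ![alt text](path) or ![alt](path "title")
IMAGE_RE = re.compile(r'!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)(?:\s+"[^"]*")?\)')


def find_images(markdown: str) -> list[str]:
    """Return image paths referenced in `markdown`, in document order."""
    return [m.group("src") for m in IMAGE_RE.finditer(markdown)]


def describe_images(markdown: str, describe: Callable[[str], str],
                    keep_image: bool = True) -> str:
    """Insert `describe(src)` after each image, or replace the image with it.

    `describe` is a hypothetical stand-in for the provider call.
    """
    def _sub(m: re.Match) -> str:
        text = describe(m.group("src"))
        return f"{m.group(0)}\n\n{text}" if keep_image else text

    return IMAGE_RE.sub(_sub, markdown)
```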

3. Set Up API Keys

markdrop setup <provider>

Keys are stored in <package-root>/.env with 0o600 permissions on POSIX systems.

markdrop setup gemini       # → GEMINI_API_KEY
markdrop setup openai       # → OPENAI_API_KEY
markdrop setup anthropic    # → ANTHROPIC_API_KEY
markdrop setup groq         # → GROQ_API_KEY
markdrop setup openrouter   # → OPENROUTER_API_KEY
markdrop setup litellm      # → LITELLM_API_KEY
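Each `setup` command stores one environment variable. The sketch below shows how such a key can be written to a `.env` file that stays readable only by its owner (0o600) on POSIX. It assumes keys are stored as plain `NAME=value` lines. `save_key` and `ENV_VARS` are hypothetical names; this is not the code in `setup_keys.py`.

```python
import os
from pathlib import Path

# Provider -> environment variable, as listed for `markdrop setup` above.
ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "litellm": "LITELLM_API_KEY",
}


def save_key(env_file: Path, provider: str, key: str) -> None:
    """Add or replace one NAME=value line, keeping the file owner-only."""
    name = ENV_VARS[provider]
    lines: list[str] = []
    if env_file.exists():
        # Drop any previous value for this variable; keep the other keys.
        lines = [ln for ln in env_file.read_text().splitlines()
                 if not ln.startswith(f"{name}=")]
    lines.append(f"{name}={key}")
    # os.open applies the mode only when it creates the file...
    fd = os.open(env_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # ...so tighten an already-existing file explicitly as well.
    if os.name == "posix":
        os.chmod(env_file, 0o600)
```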

4. Analyze Images in a PDF

markdrop analyze report.pdf --output_dir pdf_analysis --save_images

5. Batch Image Description Generation

markdrop generate images/ --output_dir descriptions/ --prompt "Describe in detail." \
  --llm_client gemini openai

Available --llm_client values: qwen, gemini, openai, llama-vision, molmo, pixtral


Python API

PDF Conversion

import logging

from markdrop import markdrop, MarkDropConfig, add_downloadable_tables

config = MarkDropConfig(
    image_resolution_scale=2.0,
    download_button_color='#444444',
    log_level=logging.INFO,
    log_dir='logs',
    excel_dir='markdrop-excel-tables',
)

html_path = markdrop("path/to/input.pdf", "output", config)
downloadable_html = add_downloadable_tables(html_path, config)
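`image_resolution_scale` multiplies the resolution of extracted images. The arithmetic is sketched below. It assumes, as in Docling's examples, that a scale of 1.0 corresponds to 72 DPI (one pixel per PDF point). The helper names are illustrative only.

```python
BASE_DPI = 72.0  # assumption: scale 1.0 == 72 DPI, i.e. one pixel per PDF point


def effective_dpi(scale: float) -> float:
    """DPI of an image rendered at `scale`."""
    return BASE_DPI * scale


def rendered_size(width_pt: float, height_pt: float, scale: float) -> tuple[int, int]:
    """Pixel dimensions of a region measured in PDF points, rendered at `scale`."""
    return round(width_pt * scale), round(height_pt * scale)


# A US Letter page measures 612 x 792 points (8.5 x 11 in at 72 pt/in).
letter_at_2x = rendered_size(612, 792, 2.0)   # (1224, 1584)
```

Pixel count grows with the square of the scale, so 2.0 yields four times as many pixels as 1.0. Raise it only when small figures or table text come out unreadable.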

AI Descriptions

from markdrop import process_markdown, ProcessorConfig, AIProvider, setup_keys

# One-time key setup (writes to .env)
setup_keys('anthropic')

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.ANTHROPIC,       # GEMINI | OPENAI | ANTHROPIC | GROQ | OPENROUTER | LITELLM
    remove_images=False,
    remove_tables=False,
    table_descriptions=True,
    image_descriptions=True,
    max_retries=3,
    retry_delay=2,
    # Override default models (all providers have matching config fields):
    anthropic_model_name="claude-sonnet-4-5",    # faster / cheaper
    anthropic_text_model_name="claude-sonnet-4-5",
)

output_path = process_markdown(config)
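`max_retries` and `retry_delay` control how failed provider calls are retried. This README does not say whether the delay is fixed or grows between attempts. The sketch assumes a fixed delay in seconds and treats `max_retries` as the total number of attempts. `with_retries` is a hypothetical helper, not markdrop's implementation.

```python
import time
from collections.abc import Callable
from typing import Optional, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_retries: int = 3,
                 retry_delay: float = 2.0) -> T:
    """Run `call`, retrying on failure with a fixed delay (assumed semantics)."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return call()
        except Exception as exc:  # a real client would catch only transient errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(retry_delay)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```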

Using OpenRouter to access any model

from markdrop import ProcessorConfig, AIProvider

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.OPENROUTER,
    openrouter_model_name="meta-llama/llama-4-scout",   # any model on openrouter.ai/models
    openrouter_text_model_name="anthropic/claude-sonnet-4-5",
    openrouter_site_url="https://yoursite.com",
    openrouter_site_name="My App",
)
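OpenRouter exposes an OpenAI-compatible API at `https://openrouter.ai/api/v1`. It also documents two optional attribution headers, `HTTP-Referer` and `X-Title`. It is an assumption that `openrouter_site_url` and `openrouter_site_name` map onto these headers. The hypothetical helper below only builds the header dict.

```python
from typing import Optional

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"  # OpenAI-compatible endpoint


def openrouter_headers(site_url: Optional[str] = None,
                       site_name: Optional[str] = None) -> dict[str, str]:
    """Optional attribution headers that OpenRouter reads for app rankings."""
    headers: dict[str, str] = {}
    if site_url:
        headers["HTTP-Referer"] = site_url
    if site_name:
        headers["X-Title"] = site_name
    return headers
```

With the `openai` package, such headers could be passed as `OpenAI(base_url=OPENROUTER_BASE_URL, api_key=..., default_headers=openrouter_headers("https://yoursite.com", "My App"))`.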

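Using LiteLLM to reach 100+ providers

import os
from markdrop import ProcessorConfig, AIProvider

# LiteLLM reads each provider's key from that provider's usual variable;
# the config below uses two providers, so both keys are needed.
os.environ["ANTHROPIC_API_KEY"] = "..."   # for the anthropic/ vision model
os.environ["GROQ_API_KEY"] = "..."        # for the groq/ text model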

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.LITELLM,
    litellm_model_name="anthropic/claude-opus-4-6",
    litellm_text_model_name="groq/llama-3.3-70b-versatile",
)
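LiteLLM routes on the prefix before the first `/` in a model string, and each provider reads its own key variable. The sketch below splits model strings and lists which keys are still unset, which helps catch the case where the vision and text models use different providers. The provider-to-variable table is a small hand-written subset and an assumption, not LiteLLM's own lookup. The helper names are hypothetical.

```python
import os

# Small, hand-maintained subset of provider -> key variable (assumption).
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' at the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"expected provider/model, got {model!r}")
    return provider, name


def missing_keys(*models: str) -> list[str]:
    """Key variables needed by `models` (known providers only) that are unset."""
    needed = {KEY_VARS.get(split_model(m)[0]) for m in models}
    needed.discard(None)
    return sorted(v for v in needed if not os.environ.get(v))
```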

Batch Image Description Generation

from markdrop import generate_descriptions

generate_descriptions(
    input_path='images/',
    output_dir='output/',
    prompt='Give a highly detailed description of this image.',
    llm_client=['gemini', 'llama-vision'],
)
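`generate_descriptions` pairs every image in `input_path` with every client in `llm_client`. The sketch below shows that fan-out with stub describers in place of model calls. The accepted extensions, the one-text-file-per-(image, client) output layout, and the name `fan_out` are assumptions for illustration.

```python
from collections.abc import Callable
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed set


def fan_out(input_dir: Path, output_dir: Path, prompt: str,
            clients: dict[str, Callable[[Path, str], str]]) -> list[Path]:
    """Describe each image with each client; write one .txt per (image, client)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    images = sorted(p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    for image in images:
        for name, describe in clients.items():
            out = output_dir / f"{image.stem}.{name}.txt"
            out.write_text(describe(image, prompt))
            written.append(out)
    return written
```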

API Reference

ProcessorConfig – AI Provider Fields

| Field | Default | Notes |
|---|---|---|
| `gemini_model_name` | `gemini-2.0-flash` | Vision model |
| `gemini_text_model_name` | `gemini-2.0-flash` | Text model |
| `openai_model_name` | `gpt-4o` | Vision + text |
| `openai_text_model_name` | `gpt-4o` | |
| `anthropic_model_name` | `claude-opus-4-6` | Vision |
| `anthropic_text_model_name` | `claude-sonnet-4-5` | Text (cheaper) |
| `groq_model_name` | `meta-llama/llama-4-scout-17b-16e-instruct` | Vision |
| `groq_text_model_name` | `llama-3.3-70b-versatile` | Text |
| `openrouter_model_name` | `google/gemini-2.0-flash-001` | Any model string from openrouter.ai/models |
| `openrouter_text_model_name` | `anthropic/claude-sonnet-4-5` | |
| `litellm_model_name` | `openai/gpt-4o` | `provider/model` format |
| `litellm_text_model_name` | `openai/gpt-4o` | |

MarkDropConfig

| Field | Default | Notes |
|---|---|---|
| `image_resolution_scale` | `2.0` | Scale factor for extracted images |
| `download_button_color` | `'#444444'` | HTML button colour |
| `log_level` | `logging.INFO` | |
| `log_dir` | `'logs'` | |
| `excel_dir` | `'markdrop_excel_tables'` | |

Contributing

We welcome contributions! See CONTRIBUTING.md.

git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -e ".[all]"

Project Structure

markdrop/
├── setup.py
├── requirements.txt
├── README.md
└── markdrop/
    ├── __init__.py
    ├── main.py          ← CLI entry-point
    ├── process.py       ← PDF conversion
    ├── parse.py         ← AI description engine (all 6 providers)
    ├── helper.py        ← PDF image analysis
    ├── utils.py         ← PDF download helpers
    ├── setup_keys.py    ← Interactive API key manager
    ├── ignore_warnings.py
    ├── src/
    │   └── markdrop-logo.png
    └── models/
        ├── img_descriptions.py
        ├── model_loader.py  ← Local HF model loader
        ├── responder.py
        └── logger.py



License

GPL-3.0 — see LICENSE.

Changelog

See CHANGELOG.md.

Support