May 18, 2026

PDF Monster

AI-agent PDF analysis skill and Codex plugin

PDFs in, model-readable evidence out.

Python 3.10+ · PyMuPDF · OCR · Codex Plugin · License: MIT

PDF Monster turns PDFs into model-readable evidence: extracted text, optional OCR text, rendered page images, and embedded image files. It is built for agents that need to inspect PDFs without dumping generated folders into the user's project.

What It Does

  • Extracts per-page text from PDFs
  • Renders pages to PNG when layout or visual inspection matters
  • Runs optional OCR through Tesseract
  • Extracts embedded images for figures and screenshots
  • Emits a structured JSON manifest for agents to read
  • Avoids creating output/, pages/, or similar folders unless explicitly requested

Install

Install As A Codex Plugin

Add this repository to Codex:

codex plugin marketplace add jbaehova/pdf-monster

Then install or enable PDF Monster from Codex's Plugins UI.

For local development, add this checkout directly:

codex plugin marketplace add /absolute/path/to/pdf-monster

This repository is the installable Codex plugin package. Its plugin files follow the same root-level layout used by simple Codex plugins:

.codex-plugin/plugin.json
.claude-plugin/marketplace.json
plugin -> .
assets/pdf-monster.svg
skills/pdf-monster/SKILL.md
skills/pdf-monster/scripts/analyze_pdf.py

plugin is a compatibility symlink to the repository root. It keeps the installable package at the root while giving Codex a non-empty marketplace source path.

After these files are pushed to GitHub, users can add the plugin with codex plugin marketplace add jbaehova/pdf-monster.

Install As A Standalone Skill

Clone this repository, then copy or reference the skill package at skills/pdf-monster:

git clone https://github.com/jbaehova/pdf-monster.git pdf-monster

Python 3.10 or newer is required. On first use, the skill tells the agent to check for PyMuPDF and, if it is missing and pip and network installs are allowed, to install the recommended Python dependency:

python3 -m pip install -r /absolute/path/to/pdf-monster/skills/pdf-monster/requirements.txt

If you run the CLI yourself, install the dependencies once from the repo root:

python3 -m pip install -r skills/pdf-monster/requirements.txt

Optional system tools are not installed automatically:

  • Poppler: pdfinfo, pdftotext, pdftoppm, pdfimages
  • Tesseract: tesseract plus language data such as eng or kor

Use As An Agent Skill

Install the skill folder, not just SKILL.md, because the skill uses scripts/analyze_pdf.py.

Common locations:

Claude Code:   ~/.claude/skills/pdf-monster
Codex:         ~/.codex/skills/pdf-monster
Pi:            ~/.pi/agent/skills/pdf-monster or ~/.agents/skills/pdf-monster
OpenClaw:      ~/.openclaw/skills/pdf-monster or <workspace>/skills/pdf-monster
Hermes:        ~/.hermes/skills/pdf-monster
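
Copying the folder into one of the locations above can be scripted. The following is a minimal sketch, not part of PDF Monster: install_skill is a hypothetical helper, and it assumes the repository has already been cloned so that skills/pdf-monster exists under repo_root.

```python
import shutil
from pathlib import Path


def install_skill(repo_root: Path, skills_dir: Path) -> Path:
    """Copy the whole skills/pdf-monster folder into an agent's skills directory."""
    source = repo_root / "skills" / "pdf-monster"
    # Refuse to continue if the checkout does not look like the skill package.
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"SKILL.md not found under {source}")
    target = skills_dir / "pdf-monster"
    # Copy the full tree (SKILL.md plus scripts/); dirs_exist_ok lets a
    # re-run refresh an existing install instead of failing.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For example, install_skill(Path("pdf-monster"), Path.home() / ".codex" / "skills") would target the Codex location listed above.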

For agents without native skill discovery, point custom instructions at the absolute path to the nested SKILL.md and tell the agent to run:

python3 /absolute/path/to/pdf-monster/skills/pdf-monster/scripts/analyze_pdf.py <file.pdf> --json

CLI Usage

Basic analysis:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --json

Visual or scanned PDFs:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --render-pages all --ocr auto --json

Slide decks or PDFs with repeated logos/icons:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --render-pages all --min-image-area 10000 --dedupe-images --json
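
To illustrate what an area threshold plus deduplication can achieve on a slide deck, here is a hedged sketch. It is not the script's actual implementation: EmbeddedImage and filter_images are hypothetical, and it assumes that area means width times height in pixels and that duplicates are detected by hashing the image bytes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class EmbeddedImage:
    """Hypothetical record for one extracted image."""
    width: int
    height: int
    data: bytes


def filter_images(images, min_area=10000, dedupe=True):
    """Drop small images and, optionally, byte-identical repeats."""
    seen = set()
    kept = []
    for image in images:
        # Assumption: area is measured in pixels as width * height.
        if image.width * image.height < min_area:
            continue
        digest = hashlib.sha256(image.data).hexdigest()
        if dedupe and digest in seen:
            continue
        seen.add(digest)
        kept.append(image)
    return kept
```

With these assumptions, a 50x50 icon is dropped by the threshold, and a logo repeated on every slide is kept only once.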

Korean and English OCR:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --render-pages all --ocr auto --ocr-lang kor+eng --json

Text-only mode:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --render-pages none --no-extract-images --ocr never --json

Selected pages:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --pages 1,3-5 --render-pages all --json
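
A page spec such as 1,3-5 reads as a comma-separated list of single pages and inclusive ranges. The sketch below shows one way to expand such a spec; parse_page_spec is a hypothetical helper, and the 1-based numbering and inclusive ranges are assumptions rather than documented behavior of the script.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec like "1,3-5" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            # Assumption: ranges are inclusive on both ends.
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: pages are numbered from 1.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```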

Persist artifacts when you actually want files kept:

python3 skills/pdf-monster/scripts/analyze_pdf.py file.pdf --save-to ./pdf-monster-artifacts --json
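
For callers that drive the CLI from Python, the commands above can be wrapped in a small helper. This is a minimal sketch, not part of PDF Monster: build_command and analyze are hypothetical names, only flags shown in this README are used, and it assumes the script writes nothing but the JSON manifest to stdout.

```python
import json
import subprocess
import sys
from pathlib import Path

# Path relative to the repo root, as used in the commands above.
SCRIPT = Path("skills/pdf-monster/scripts/analyze_pdf.py")


def build_command(pdf, *, pages=None, render_pages=None, ocr=None,
                  ocr_lang=None, save_to=None):
    """Build an argv list using only flags documented in this README."""
    cmd = [sys.executable, str(SCRIPT), str(pdf)]
    if pages:
        cmd += ["--pages", pages]
    if render_pages:
        cmd += ["--render-pages", render_pages]
    if ocr:
        cmd += ["--ocr", ocr]
    if ocr_lang:
        cmd += ["--ocr-lang", ocr_lang]
    if save_to:
        cmd += ["--save-to", str(save_to)]
    cmd.append("--json")
    return cmd


def analyze(pdf, **options) -> dict:
    """Run the script and parse its JSON output."""
    result = subprocess.run(
        build_command(pdf, **options),
        capture_output=True, text=True, check=True,
    )
    # Assumption: stdout holds only the JSON manifest.
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.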

Output

The script prints JSON with fields such as:

  • page_count
  • selected_pages
  • backend
  • artifact_root
  • cleanup_command
  • pages_needing_visual_review
  • pages[].text
  • pages[].ocr_text
  • pages[].render_path
  • pages[].embedded_images
  • pages[].needs_visual_review
  • pages[].visual_review_reasons
  • pages[].warnings
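
Once the manifest is parsed into a dictionary, an agent can pick out the pages that need a closer look. The sketch below uses only the field names listed above; the value types are assumptions (strings for text and ocr_text, a boolean for needs_visual_review), and pages_to_review and best_text are hypothetical helpers.

```python
def pages_to_review(manifest: dict) -> list[dict]:
    """Return the page entries flagged for visual review."""
    return [
        page for page in manifest.get("pages", [])
        if page.get("needs_visual_review")
    ]


def best_text(page: dict) -> str:
    """Prefer extracted text; fall back to OCR text when it is empty."""
    text = (page.get("text") or "").strip()
    if text:
        return text
    return (page.get("ocr_text") or "").strip()
```

Falling back to ocr_text only when text is empty is a design choice of this sketch, made so that scanned pages still yield something readable.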

If temporary artifacts are created, the JSON includes a cleanup_command. Run it only after the image paths are no longer needed.
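
Running the cleanup step from Python might look like the sketch below. The format of cleanup_command is not specified here, so this assumes it is either a POSIX shell-style string or a list of arguments; run_cleanup is a hypothetical helper.

```python
import shlex
import subprocess


def run_cleanup(manifest: dict) -> bool:
    """Run the manifest's cleanup command, if any. Returns True if it ran."""
    command = manifest.get("cleanup_command")
    if not command:
        return False
    # Assumption: a string is a POSIX shell-style command line;
    # anything else is treated as an argument list.
    argv = shlex.split(command) if isinstance(command, str) else list(command)
    subprocess.run(argv, check=True)
    return True
```

Call it only after every render_path and embedded image path has been read, since the files are gone afterwards.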

Notes

PyMuPDF is the preferred backend. If it is unavailable, PDF Monster falls back to Poppler CLI tools where possible. OCR is optional; when Tesseract is missing, the script reports a warning and continues with text extraction and page rendering.

License

MIT. See LICENSE.