Document Processing Tool
October 19, 2025
The Document Processing tool provides intelligent document conversion capabilities for PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, and JPG files using the powerful Docling library.
Note: This tool is disabled by default. To enable it, set the ENABLE_ADDITIONAL_TOOLS environment variable to include process_document.
Overview
Convert documents to structured Markdown while preserving formatting, extracting tables, images, and metadata. The tool offers processing profiles for different use cases, from simple text extraction to advanced diagram analysis with AI models.
Note: mcp-devtools also provides a PDF extraction tool that is less capable but faster and does not require Docling. See PDF Processing for more details.
This tool is experimental and actively developed.
Features
- Multi-format Support: PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, JPG
- Processing Profiles: Simplified interface with preset configurations
- Intelligent Conversion: Preserves document structure and formatting
- OCR Support: Extract text from scanned documents
- Hardware Acceleration: Supports MPS (macOS), CUDA, and CPU processing
- Caching System: Avoids reprocessing identical documents
- Metadata Extraction: Document metadata (title, author, page count, etc.)
- Table & Image Extraction: Preserves tables and images in markdown
- Diagram Analysis: Advanced diagram detection using vision models
- Mermaid Generation: Convert diagrams to editable Mermaid syntax
- Auto-Save: Automatically saves processed content to files
Quick Start
First, enable the tool by setting the environment variable:
ENABLE_ADDITIONAL_TOOLS="process_document"
Then ensure docling is installed in the environment you'll be running the MCP Server from:
pip install -U pip docling
Usage
You can simply prompt the agent to use the tool, e.g. "Use your document processing tool to convert and save /path/to/document.pdf to markdown". The equivalent tool call is:
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf"
}
}
This uses the default text-and-image profile and saves to /path/to/document.md.
Processing Profiles
basic - Fast Text Extraction
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"profile": "basic"
}
}
- Text extraction only
- Fastest processing
- No image or diagram analysis
- Best for: Simple text documents, quick content extraction
text-and-image - Balanced Processing (Default)
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"profile": "text-and-image"
}
}
- Text and image extraction
- Table processing
- Good balance of speed and features
- Best for: Most document types, general use
scanned - OCR Processing
{
"name": "process_document",
"arguments": {
"source": "/path/to/scanned-document.pdf",
"profile": "scanned"
}
}
- Optimised for scanned documents
- OCR enabled by default
- Best for: Image-based PDFs, scanned documents
llm-smoldocling - Vision Enhancement
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"profile": "llm-smoldocling"
}
}
- Enhanced with SmolDocling vision model
- Diagram detection and description
- Chart data extraction
- No external LLM required
- Best for: Documents with diagrams and charts
llm-external - Advanced Diagram Processing
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"profile": "llm-external"
}
}
- Full diagram-to-Mermaid conversion
- Most advanced processing capabilities
- Best for: Complex documents with many diagrams
- Requires: LLM environment variables (see LLM Configuration below)
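The "Best for" notes above can be condensed into a small chooser. This is only a sketch: the function `choose_profile` and its boolean inputs are hypothetical and not part of the tool. Only the profile names come from this document.

```python
def choose_profile(scanned: bool = False,
                   has_diagrams: bool = False,
                   llm_configured: bool = False,
                   text_only: bool = False) -> str:
    """Map rough document traits to one of the documented profile names."""
    if scanned:
        return "scanned"            # image-based PDFs need OCR
    if has_diagrams:
        # llm-external needs the LLM environment variables to be set
        return "llm-external" if llm_configured else "llm-smoldocling"
    if text_only:
        return "basic"              # fastest, no image or diagram analysis
    return "text-and-image"         # the documented default profile
```

The order of the checks is a design choice for this sketch (OCR need first, then diagrams); adjust it if your documents mix scanned pages and diagrams.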
Output Options
Save to File (Default)
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf"
}
}
- Saves to /path/to/document.md
- Images saved in same directory
- Returns success message with file path
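The default output location appears to be the source path with its extension swapped for .md. A minimal Python sketch of that mapping follows. The helper `default_save_path` is hypothetical, and the behaviour for sources that are not local file paths is not covered here.

```python
from pathlib import PurePosixPath

def default_save_path(source: str) -> str:
    """Swap the source file's extension for .md, keeping the directory."""
    return str(PurePosixPath(source).with_suffix(".md"))

print(default_save_path("/path/to/document.pdf"))  # /path/to/document.md
```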
Custom Save Location
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"save_to": "/custom/path/output.md"
}
}
Return Content Inline
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"return_inline_only": true
}
}
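When calls are generated from a script, the three output variants above can be assembled by one helper. The sketch below is illustrative only: `build_request` is a hypothetical name, it uses just the argument names shown in this document, and it does not check whether the tool accepts `save_to` and `return_inline_only` together.

```python
import json

PROFILES = {"basic", "text-and-image", "scanned", "llm-smoldocling", "llm-external"}

def build_request(source, profile=None, save_to=None, return_inline_only=False):
    """Assemble a process_document call from the documented arguments."""
    if profile is not None and profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    arguments = {"source": source}
    if profile is not None:
        arguments["profile"] = profile
    if save_to is not None:
        arguments["save_to"] = save_to
    if return_inline_only:
        arguments["return_inline_only"] = True
    return {"name": "process_document", "arguments": arguments}

print(json.dumps(build_request("/path/to/document.pdf",
                               save_to="/custom/path/output.md"), indent=2))
```

Omitting unset keys leaves the tool's own defaults (the text-and-image profile, saving next to the source) in effect.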
Setup and Configuration
Prerequisites
- Python 3.10+ (ideally 3.13+)
- Docling (auto-installed if missing)
The tool will attempt to install Docling automatically if not found.
Environment Variables
Python Configuration
DOCLING_PYTHON_PATH="/path/to/python" # Auto-detected if not set
The tool automatically detects Python installations with Docling in the following order:
- DOCLING_PYTHON_PATH environment variable (highest priority)
- .python-version file in current directory or home directory
- Cached Python path from previous detection
- Common Python installation paths
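The priority order above can be pictured as a generator that yields candidates from highest to lowest priority. This is a sketch of the ordering only, not the tool's actual implementation (which is not shown here). The function `candidate_sources`, its labels, and the `cached_path` parameter are hypothetical.

```python
import os
from pathlib import Path

def candidate_sources(env, cwd, home, cached_path=None):
    """Yield (label, value) pairs in the documented priority order."""
    if env.get("DOCLING_PYTHON_PATH"):
        yield "env", env["DOCLING_PYTHON_PATH"]
    for directory in (cwd, home):           # working directory first, then home
        version_file = Path(directory) / ".python-version"
        if version_file.is_file():
            yield "python-version", version_file.read_text().strip()
            break
    if cached_path:
        yield "cache", cached_path
    yield "common-paths", None              # placeholder for the final fallback scan

for label, value in candidate_sources(os.environ, Path.cwd(), Path.home()):
    print(label, value)
```

A caller would try each candidate in turn and stop at the first interpreter that can import Docling.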
.python-version Support:
The tool respects .python-version files (used by pyenv, asdf, and other version managers) for automatic Python version selection:
- Checks current working directory first
- Falls back to home directory if not found in working directory
- Supports version formats like 3.11.5 or 3.11
- Automatically resolves Python paths from:
  - pyenv: ~/.pyenv/versions/
  - asdf: ~/.asdf/installs/python/
  - UV: ~/.local/share/uv/python/
  - System: Homebrew and standard paths
Example .python-version file:
3.11.5
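Because both full (3.11.5) and short (3.11) versions are accepted, the requested version has to be matched against what is installed. The Python sketch below shows one plausible matching rule. The helper `match_version` is hypothetical, and preferring the highest patch release for a short version is an assumption, not documented behaviour.

```python
from typing import List, Optional

def match_version(requested: str, installed: List[str]) -> Optional[str]:
    """Pick an installed version for a .python-version entry like '3.11.5' or '3.11'."""
    requested = requested.strip()
    if requested in installed:
        return requested                      # exact match such as 3.11.5
    prefix = requested + "."
    matches = [v for v in installed if v.startswith(prefix)]

    def numeric_key(version: str):
        # Compare numerically so that 3.11.10 sorts above 3.11.2
        return tuple(int(part) for part in version.split(".") if part.isdigit())

    # Assumption: prefer the highest patch release when only major.minor is given
    return max(matches, key=numeric_key) if matches else None

print(match_version("3.11", ["3.10.14", "3.11.2", "3.11.10"]))  # 3.11.10
```

With pyenv, for example, the matched name would then be looked up as a directory under ~/.pyenv/versions/.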
Cache Configuration
DOCLING_CACHE_DIR="~/.mcp-devtools/docling-cache"
DOCLING_CACHE_ENABLED="true"
Hardware Acceleration
DOCLING_HARDWARE_ACCELERATION="auto" # auto, mps, cuda, cpu
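To preview what the auto setting might resolve to on a given machine, PyTorch can be asked which backends it sees. This is a sketch only: the preference order (CUDA, then MPS, then CPU) is an assumption, and `detect_accelerator` is a hypothetical helper rather than the tool's own detection logic.

```python
def detect_accelerator() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"                      # no PyTorch, so no GPU backend
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)   # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(detect_accelerator())
```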
Processing Configuration
DOCLING_TIMEOUT="300" # Processing timeout in seconds (default: 300 = 5 minutes)
DOCLING_MAX_FILE_SIZE="100" # Maximum file size in MB (default: 100 MB)
DOCLING_MAX_MEMORY_LIMIT="5368709120" # Memory limit in bytes (default: 5GB)
MCP_DEVTOOLS_MEMORY_LIMIT="5368709120" # Go application memory limit in bytes (default: 5GB)
Memory Management
The tool implements memory limits to prevent runaway memory usage during document processing:
- Go Application Limit: Set via MCP_DEVTOOLS_MEMORY_LIMIT (default: 5GB)
  - Soft limit enforced by Go runtime's garbage collector
  - Automatically triggers more aggressive GC when approaching limit
- Python Process Limit: Set via DOCLING_MAX_MEMORY_LIMIT (default: 5GB)
  - Hard limit enforced by OS resource limits
  - Process terminated if limit exceeded
Example configuration for stricter limits:
# Limit to 2GB for both Go and Python
MCP_DEVTOOLS_MEMORY_LIMIT="2147483648"
DOCLING_MAX_MEMORY_LIMIT="2147483648"
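The byte values above are binary gigabytes (5 × 1024³ = 5368709120). The Python sketch below shows how such a limit could be read and applied to a process on Unix. The helper names are hypothetical, and the use of `RLIMIT_AS` is an assumption about what "OS resource limits" means here.

```python
import os

GIB = 1024 ** 3
DEFAULT_LIMIT = 5 * GIB  # 5368709120 bytes, the documented default

def memory_limit_bytes(env=os.environ, name="DOCLING_MAX_MEMORY_LIMIT") -> int:
    """Read a byte limit from the environment, falling back to 5 GiB."""
    raw = env.get(name, "")
    return int(raw) if raw.strip().isdigit() else DEFAULT_LIMIT

def apply_hard_limit(limit: int) -> None:
    """Cap this process's address space (Unix only); intended for a child process."""
    import resource
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

print(memory_limit_bytes({"DOCLING_MAX_MEMORY_LIMIT": str(2 * GIB)}))  # 2147483648
```

`apply_hard_limit` is defined but deliberately not called here, since lowering the limit affects the calling process for the rest of its lifetime.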
OCR Configuration
DOCLING_OCR_LANGUAGES="en,fr,de"
LLM Configuration (for llm-external profile)
DOCLING_VLM_API_URL="http://localhost:11434/v1" # OpenAI-compatible endpoint
DOCLING_VLM_MODEL="granite_docling" # Vision-capable model (default: granite_docling)
DOCLING_VLM_API_KEY="your-api-key-here" # API key
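Before selecting the llm-external profile, it can help to check which of these variables are set. The sketch below only reports unset names. Which of them the tool strictly requires is not stated here (the model has a default, and a local endpoint may not need a key), and `missing_vlm_settings` is a hypothetical helper.

```python
import os

VLM_VARS = ("DOCLING_VLM_API_URL", "DOCLING_VLM_MODEL", "DOCLING_VLM_API_KEY")

def missing_vlm_settings(env=os.environ):
    """List the llm-external variables that are unset or empty."""
    return [name for name in VLM_VARS if not env.get(name, "").strip()]

print(missing_vlm_settings())
```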
Corporate Network Setup
For environments with MITM proxies:
DOCLING_EXTRA_CA_CERTS="/path/to/mitm-ca-bundle.pem"
OCR (Optical Character Recognition)
When to Use OCR
OCR Disabled (Default):
- Best for: Digital documents (native PDFs, Word documents)
- Advantages: Faster, perfect accuracy, preserves formatting
- How it works: Extracts text directly from document structure
OCR Enabled (scanned profile):
- Best for: Scanned documents, image-based PDFs, photos
- Advantages: Processes any document type, handles handwritten text
- How it works: Uses computer vision to recognise text from images
OCR Language Support
{
"name": "process_document",
"arguments": {
"source": "/path/to/scanned-document.pdf",
"profile": "scanned",
"ocr_languages": ["en", "fr", "de", "es"]
}
}
Supported languages: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Dutch (nl), Russian (ru), Chinese (zh), Japanese (ja), Korean (ko), and many others.
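The DOCLING_OCR_LANGUAGES variable takes a comma-separated string, while the ocr_languages argument takes a list. A small conversion between the two could look like the sketch below. `parse_ocr_languages` is a hypothetical helper, and the tool's own parsing may differ.

```python
def parse_ocr_languages(value: str) -> list:
    """Split 'en,fr,de' into ['en', 'fr', 'de'], dropping blanks and duplicates."""
    codes = []
    for code in value.split(","):
        code = code.strip().lower()
        if code and code not in codes:
            codes.append(code)
    return codes

print(parse_ocr_languages("en,fr,de"))  # ['en', 'fr', 'de']
```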
Diagram Analysis and Mermaid Generation
Basic Diagram Analysis
The llm-smoldocling profile uses built-in vision models:
- Automatic diagram detection
- Type classification with confidence scores
- Element extraction
- No external services required
Advanced Mermaid Generation
The llm-external profile converts diagrams to Mermaid syntax:
Supported LLM Providers
- Ollama (local): http://localhost:11434/v1
- LM Studio (local): http://localhost:1234/v1
- OpenAI: https://api.openai.com/v1
- OpenRouter: https://openrouter.ai/api/v1
LLM Configuration
DOCLING_VLM_API_URL="http://localhost:11434/v1"
DOCLING_VLM_MODEL="granite_docling" # Default VLM model; qwen2.5vl:7b-q8_0 or any other vision-capable model also works
DOCLING_VLM_API_KEY="your-api-key"
DOCLING_LLM_MAX_TOKENS="16384"
DOCLING_LLM_TEMPERATURE="0.1"
DOCLING_LLM_TIMEOUT="240"
Diagram Features
- Automatic Detection: Identifies flowcharts, architecture diagrams, charts
- Mermaid Conversion: Generates valid Mermaid syntax
- AWS Colour Coding: Consistent colour schemes for architecture diagrams
- Validation: Validates generated Mermaid syntax
- Fallback Handling: Graceful degradation if LLM unavailable
Response Examples
File Save Response
{
"success": true,
"message": "Content successfully exported to file",
"save_path": "/path/to/document.md",
"source": "/path/to/document.pdf",
"cache_hit": false,
"metadata": {
"file_size": 15420,
"document_title": "Document Title",
"document_author": "Author Name",
"page_count": 10,
"word_count": 1500
},
"processing_info": {
"processing_mode": "advanced",
"processing_method": "advanced+vision:standard",
"hardware_acceleration": "mps",
"ocr_enabled": false,
"processing_time": 2.5,
"timestamp": "2025-07-09T22:12:15+10:00"
}
}
Inline Content Response
{
"source": "/path/to/document.pdf",
"content": "# Document Title\n\nDocument content in markdown...",
"cache_hit": false,
"metadata": {
"title": "Document Title",
"author": "Author Name",
"page_count": 10
},
"images": [
{
"id": "image_1",
"type": "picture",
"caption": "Figure 1",
"file_path": "/path/to/extracted/image_1.png"
}
],
"diagrams": [
{
"id": "diagram_1",
"type": "flowchart",
"description": "Process flow diagram showing...",
"mermaid_code": "flowchart TD\n A[Start] --> B[Process]\n B --> C[End]",
"confidence": 0.95
}
]
}
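A consumer of the inline response will often want just the generated Mermaid code. The sketch below assumes the response arrives as JSON text with the field names shown above. `mermaid_blocks` and its `min_confidence` threshold are hypothetical.

```python
import json

def mermaid_blocks(response_text: str, min_confidence: float = 0.0):
    """Return (id, mermaid_code) pairs for diagrams that carry Mermaid output."""
    response = json.loads(response_text)
    return [
        (diagram["id"], diagram["mermaid_code"])
        for diagram in response.get("diagrams", [])
        if diagram.get("mermaid_code")
        and diagram.get("confidence", 0.0) >= min_confidence
    ]
```

Diagrams without a mermaid_code field (for example, when the LLM was unavailable and processing fell back) are skipped rather than raising an error.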
Performance
Profile Performance (Typical Document)
- basic: 1-3 seconds
- text-and-image: 3-10 seconds
- scanned: 10-30 seconds
- llm-smoldocling: 5-15 seconds
- llm-external: 15-60 seconds
Hardware Impact
- CPU: Baseline performance
- MPS (macOS): 2-5x faster on Apple Silicon
- CUDA: 3-10x faster on NVIDIA GPUs
Caching
Intelligent caching based on:
- Document source and modification time
- Processing parameters and profile
- 24-hour TTL by default
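One way to picture a cache keyed on these inputs is a hash over the source, its modification time, and the processing options, paired with a TTL check. This is a sketch under stated assumptions, not the tool's actual scheme: `cache_key`, `is_fresh`, and the choice of SHA-256 over a JSON payload are all hypothetical.

```python
import hashlib
import json

TTL_SECONDS = 24 * 60 * 60  # the documented 24-hour default

def cache_key(source: str, mtime: float, profile: str, params: dict) -> str:
    """Hash everything that should invalidate a cached result when it changes."""
    payload = json.dumps(
        {"source": source, "mtime": mtime, "profile": profile, "params": params},
        sort_keys=True,   # stable key regardless of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def is_fresh(created_at: float, now: float, ttl: float = TTL_SECONDS) -> bool:
    """True while the entry is younger than the TTL."""
    return now - created_at < ttl
```

Under this scheme, editing the document (new modification time) or switching profile produces a different key, so a stale result is never returned for changed inputs.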
Common Use Cases
Research Document Analysis
{
"name": "process_document",
"arguments": {
"source": "/path/to/research-paper.pdf",
"profile": "llm-smoldocling"
}
}
Scanned Document Digitisation
{
"name": "process_document",
"arguments": {
"source": "/path/to/scanned-invoice.pdf",
"profile": "scanned"
}
}
Architecture Documentation
{
"name": "process_document",
"arguments": {
"source": "/path/to/architecture-doc.pdf",
"profile": "llm-external"
}
}
Quick Text Extraction
{
"name": "process_document",
"arguments": {
"source": "/path/to/simple-doc.pdf",
"profile": "basic"
}
}
Troubleshooting
Common Issues
"Python path is required but not found"
- Install Python 3.10+ and ensure it's in PATH
- Set DOCLING_PYTHON_PATH environment variable
- Or create a .python-version file in your project directory or home directory
- Supported version managers: pyenv, asdf, UV
"Docling not available"
- Install: pip install docling
- Verify: python -c "import docling; print('OK')"
"Processing timeout"
- Increase DOCLING_TIMEOUT environment variable
- Use faster profile (basic instead of llm-external)
"Hardware acceleration not working"
- Install appropriate PyTorch version
- Check: python -c "import torch; print(torch.backends.mps.is_available())"
"LLM external profile not available"
- Set the DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, and DOCLING_VLM_API_KEY environment variables
- Verify LLM endpoint accessibility
- Ensure model supports vision input
Debug Mode
{
"name": "process_document",
"arguments": {
"source": "/path/to/document.pdf",
"debug": true
}
}
For technical implementation details, see the Document Processing source documentation.