Single-File MCP Server
June 6, 2025
A powerful Model Context Protocol (MCP) server that provides intelligent web content extraction using single-file and trafilatura. Perfect for AI agents that need to access and analyze web content from JavaScript-heavy sites.
GitHub Repository: https://github.com/kwinsch/singlefile-mcp
Features
Universal Web Content Access
- JavaScript Support: Handles modern SPA/React/Vue apps that require browser rendering
- Clean Content Extraction: Uses trafilatura to isolate the main text from page boilerplate
- Rich Metadata: Extracts title, author, date, description, and more
- Multiple Output Formats: Raw HTML or clean markdown-like content
Smart Pagination & Token Management
- Flexible Pagination: Offset/limit system like file reading tools
- Token Limits: Configurable max tokens (up to 25,000)
- Smart Truncation: Summary mode shows beginning + end, truncate mode cuts cleanly
- Navigation Hints: Clear guidance on how to continue reading large documents
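As a rough illustration of the two truncation modes, the sketch below works on a character budget rather than real token counts. The function name `shorten`, the marker text, and the even head/tail split are assumptions made for this example, not the server's actual implementation.

```python
def shorten(text: str, max_chars: int, method: str = "truncate") -> str:
    """Fit text into max_chars using one of two hypothetical strategies."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text  # already fits, nothing to cut

    if method == "truncate":
        # Keep only the beginning of the text.
        return text[:max_chars]

    if method == "summary":
        # Keep the beginning and the end, dropping the middle.
        marker = "\n[... content omitted ...]\n"
        if max_chars <= len(marker):
            return text[:max_chars]  # budget too small for a marker
        keep = max_chars - len(marker)
        tail = keep // 2
        head = keep - tail  # head gets the extra character on odd budgets
        return text[:head] + marker + (text[-tail:] if tail else "")

    raise ValueError(f"unknown method: {method!r}")
```

Calling `shorten(page_text, 2000, "summary")` would keep roughly the first and last thousand characters, which is useful when the conclusion of a page matters as much as its introduction.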
Performance & Control
- Selective Loading: Block images/scripts for faster processing
- Content Compression: Optional HTML compression
- Timeout Protection: Configurable timeouts prevent hanging
- Error Handling: Graceful degradation when extraction fails
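Timeout protection around an external capture process can be sketched with the standard library alone. The helper name `run_with_timeout` and its `(ok, message)` return shape are hypothetical; the command passed in is up to the caller, and nothing here reflects how the server invokes single-file internally.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[bool, str]:
    """Run cmd and return (ok, stdout or error message) instead of raising."""
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing hangs.
        return False, f"timed out after {timeout_s} seconds"
    except FileNotFoundError:
        return False, f"command not found: {cmd[0]}"
    if done.returncode != 0:
        return False, done.stderr.strip() or f"exit code {done.returncode}"
    return True, done.stdout


if __name__ == "__main__":
    # Demonstrate with the Python interpreter itself as a stand-in command.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 10))
```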
Installation
Prerequisites
- Python 3.8+
- single-file CLI - Web page capture tool
- Node.js 16+ (for single-file)
- A supported browser (Chromium, Chrome, Edge, Firefox, etc.)
Install single-file CLI
The single-file CLI is essential for this MCP server to work. It uses a real browser engine to accurately capture JavaScript-rendered content.
npm install -g single-file-cli
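To confirm the install worked, a small check like the one below can look for the CLI on your PATH. It assumes the npm package exposes an executable named `single-file`; the helper name `find_single_file` is made up for this example.

```python
import shutil
from typing import Optional


def find_single_file(executable: str = "single-file") -> Optional[str]:
    """Return the full path of the CLI if it is on PATH, otherwise None."""
    # Assumption: the globally installed package provides this executable name.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_single_file()
    if path:
        print(f"single-file found at {path}")
    else:
        print("single-file not found; check the npm global bin directory")
```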
Usage with Claude Code
Quick Install (from PyPI)
claude mcp add singlefile-mcp -s user -- uvx singlefile-mcp
This fetches the package from PyPI and runs it on demand through uvx, much as npx does for the Brave Search MCP server.
Development Install (from local directory)
claude mcp add singlefile-mcp -s user -- uvx --from /path/to/single-file_mcp singlefile-mcp
Remove old server (if upgrading)
claude mcp remove single-file-fetcher --scope user
Optional: Add Brave Search MCP
claude mcp add brave-search -s user -- env BRAVE_API_KEY=YOUR_KEY npx -y @modelcontextprotocol/server-brave-search
API Reference
fetch_webpage
Fetch and process web content with intelligent extraction.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL of the webpage to fetch |
| output_content | boolean | true | Whether to return content in response |
| extract_content | boolean | false | Extract clean text content (recommended) |
| include_metadata | boolean | true | Include page metadata (title, author, etc.) |
| block_images | boolean | false | Block image downloads for faster processing |
| block_scripts | boolean | true | Block JavaScript execution |
| compress_html | boolean | true | Compress HTML output |
| max_tokens | number | 20000 | Maximum tokens in response (max: 25000) |
| truncate_method | string | "truncate" | How to handle large content: "truncate" or "summary" |
| offset | number | 0 | Character offset to start reading from |
| limit | number | null | Maximum characters to return |
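The parameters and defaults above can be mirrored on the client side to catch mistakes before a call is made. The `FetchOptions` class below is a hypothetical helper, not part of the package, and rejecting out-of-range values (rather than clamping them) is an assumption about what is sensible, not a description of server behavior.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TOKENS_CAP = 25000  # upper bound stated in the parameter table


@dataclass
class FetchOptions:
    """Client-side mirror of the documented fetch_webpage parameters."""

    url: str
    output_content: bool = True
    extract_content: bool = False
    include_metadata: bool = True
    block_images: bool = False
    block_scripts: bool = True
    compress_html: bool = True
    max_tokens: int = 20000
    truncate_method: str = "truncate"
    offset: int = 0
    limit: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.url:
            raise ValueError("url is required")
        if self.truncate_method not in ("truncate", "summary"):
            raise ValueError("truncate_method must be 'truncate' or 'summary'")
        if not 0 < self.max_tokens <= MAX_TOKENS_CAP:
            raise ValueError(f"max_tokens must be in 1..{MAX_TOKENS_CAP}")
        if self.offset < 0:
            raise ValueError("offset must be non-negative")
        if self.limit is not None and self.limit < 0:
            raise ValueError("limit must be non-negative")
```

For instance, `FetchOptions(url="https://example.com", extract_content=True)` keeps every other documented default, and `dataclasses.asdict` turns it into a plain dictionary of arguments.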
Examples
Basic content extraction:
fetch_webpage(
    url="https://example.com/article",
    extract_content=True,
    include_metadata=True
)
Paginated reading of large documents:
# Get overview
fetch_webpage(
    url="https://docs.example.com/guide",
    extract_content=True,
    limit=5000
)
# Continue reading from offset
fetch_webpage(
    url="https://docs.example.com/guide",
    extract_content=True,
    offset=5000,
    limit=5000
)
Raw HTML for complex parsing:
fetch_webpage(
    url="https://app.example.com/dashboard",
    extract_content=False,
    block_scripts=False,
    max_tokens=15000
)
Practical Example: Research Workflow
Here's a real-world example combining Brave Search and Single-File MCP:
Step 1: Search for information
# Using Brave Search MCP
brave_web_search(
    query="artificial intelligence history timeline",
    count=5
)
Step 2: Fetch and analyze Wikipedia article
# Using Single-File MCP to extract content
fetch_webpage(
    url="https://en.wikipedia.org/wiki/History_of_artificial_intelligence",
    extract_content=True,
    include_metadata=True,
    limit=5000  # Get first 5000 chars
)
Result:
Successfully fetched webpage: https://en.wikipedia.org/wiki/History_of_artificial_intelligence
## Metadata
**Title:** History of artificial intelligence - Wikipedia
**Description:** The history of artificial intelligence (AI) began in antiquity...
**Site:** wikipedia.org
## Extracted Content (chars 0-5000 of 45000)
*Note: More content available. Use offset=5000 to continue.*
# History of artificial intelligence
The history of artificial intelligence (AI) began in antiquity, with myths,
stories and rumors of artificial beings endowed with intelligence...
[Clean, readable article content follows...]
Step 3: Continue reading with pagination
# Get next section
fetch_webpage(
    url="https://en.wikipedia.org/wiki/History_of_artificial_intelligence",
    extract_content=True,
    offset=5000,
    limit=5000
)
This workflow enables AI agents to:
- Search for current information beyond their training data
- Extract clean, structured content from any webpage
- Process JavaScript-heavy sites that other tools can't handle
- Paginate through long documents intelligently
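The pagination in Step 3 can be automated with a simple loop. In this sketch, `fetch_chunk` is a hypothetical callable standing in for however your agent invokes `fetch_webpage` and pulls out the extracted text; the stopping rule (an empty or short page means the end) is likewise an assumption.

```python
from typing import Callable, Iterator


def read_all(
    url: str,
    fetch_chunk: Callable[[str, int, int], str],
    limit: int = 5000,
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield successive chunks of a document until it is exhausted."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving source cannot loop forever
        chunk = fetch_chunk(url, offset, limit)
        if not chunk:
            break  # nothing left to read
        yield chunk
        if len(chunk) < limit:
            break  # a short page means we reached the end
        offset += limit


# Stand-in for the real tool call: slices a 12,000-character dummy document.
DOCUMENT = "x" * 12000


def fake_fetch(url: str, offset: int, limit: int) -> str:
    return DOCUMENT[offset:offset + limit]


if __name__ == "__main__":
    sizes = [len(page) for page in read_all("https://example.com", fake_fetch)]
    print(sizes)
```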
Output Format
With Content Extraction
Successfully fetched webpage: https://example.com
## Metadata
**Title:** Example Article
**Author:** John Doe
**Date:** 2024-01-15
**Description:** An informative article about...
**Site:** example.com
## Extracted Content (chars 0-5000 of 12000)
*Note: More content available. Use offset=5000 to continue.*
# Article Title
This is the clean, readable content extracted from the webpage...
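A client can read the position header in a response to decide whether more content remains. The sketch below assumes the header always has the `chars START-END of TOTAL` shape shown above; `next_offset` is a made-up helper name.

```python
import re
from typing import Optional

# Matches the range in a header such as "(chars 0-5000 of 12000)".
_RANGE = re.compile(r"chars (\d+)-(\d+) of (\d+)")


def next_offset(response: str) -> Optional[int]:
    """Return the offset to request next, or None if nothing remains."""
    match = _RANGE.search(response)
    if match is None:
        return None  # no pagination header found
    _start, end, total = (int(group) for group in match.groups())
    return end if end < total else None
```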
Pagination Info
When using offset/limit, responses include:
- Current position: chars 1000-6000 of 12000
- Navigation hint: Use offset=6000 to continue
- Total size information
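One way such a response could be produced from the full extracted text is sketched below. The function name `paginate` and the clamping of out-of-range values are assumptions; only the header and hint wording are taken from the examples in this README.

```python
from typing import Optional


def paginate(text: str, offset: int = 0, limit: Optional[int] = None) -> str:
    """Slice text by character offset/limit and prepend position info."""
    total = len(text)
    start = min(max(offset, 0), total)  # clamp offset into the valid range
    end = total if limit is None else min(start + max(limit, 0), total)

    lines = [f"## Extracted Content (chars {start}-{end} of {total})"]
    if end < total:
        # Tell the reader exactly where the next request should begin.
        lines.append(
            f"*Note: More content available. Use offset={end} to continue.*"
        )
    lines.append(text[start:end])
    return "\n".join(lines)
```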
Use Cases
Documentation Analysis
Perfect for reading large technical docs, API references, and guides that span multiple pages.
News & Article Processing
Extract clean article content from news sites, blogs, and publications for analysis.
Research & Data Gathering
Gather structured data from websites, including metadata and clean text content.
AI Agent Integration
Enable AI agents to browse and understand web content, even from JavaScript-heavy applications.
Legal Document Processing
Handle complex legal documents and government sites that require JavaScript rendering.
Technical Details
Content Extraction Pipeline
- single-file: Renders JavaScript and saves complete webpage
- trafilatura: Extracts the main content from the rendered HTML
- Pagination: Applies offset/limit for manageable chunks
- Token Management: Ensures responses fit within LLM context limits
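The order of the four stages can be sketched as a single function with the rendering and extraction steps injected as callables. Everything here is illustrative: `run_pipeline`, the stand-in stages, and the four-characters-per-token estimate are assumptions, and the real server calls single-file and trafilatura where the stubs appear.

```python
import re
from typing import Callable, Optional


def run_pipeline(
    url: str,
    render: Callable[[str], str],
    extract: Callable[[str], str],
    offset: int = 0,
    limit: Optional[int] = None,
    max_tokens: int = 20000,
    chars_per_token: int = 4,  # rough heuristic, not a real tokenizer
) -> str:
    html = render(url)                       # 1. render the page to HTML
    text = extract(html)                     # 2. pull out the main content
    end = len(text) if limit is None else offset + limit
    page = text[offset:end]                  # 3. apply offset/limit
    return page[: max_tokens * chars_per_token]  # 4. enforce the token budget


# Stand-in stages so the sketch runs without a browser or extra packages.
def fake_render(url: str) -> str:
    return "<html><body><p>" + "x" * 100 + "</p></body></html>"


def fake_extract(html: str) -> str:
    return re.sub(r"<[^>]+>", "", html)  # crude tag stripping for the demo


if __name__ == "__main__":
    print(run_pipeline("https://example.com", fake_render, fake_extract, limit=20))
```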
Browser Engine
Uses a browser via single-file for full JavaScript support:
- Works with any supported browser installed on your system
- Waits for network idle before capture
- Removes hidden elements and unused styles
- Handles dynamic content loading
Metadata Extraction
Automatically extracts:
- Page title and description
- Author and publication date
- Site name and language
- Categories and tags (when available)
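The metadata block seen in the output examples could be rendered from a plain dictionary as follows. The dictionary keys (`title`, `author`, `date`, `description`, `sitename`) and the helper name `format_metadata` are assumptions for illustration; missing fields are simply skipped.

```python
from typing import Mapping, Optional

# (dictionary key, label shown in the output); key names are hypothetical.
FIELDS = (
    ("title", "Title"),
    ("author", "Author"),
    ("date", "Date"),
    ("description", "Description"),
    ("sitename", "Site"),
)


def format_metadata(meta: Mapping[str, Optional[str]]) -> str:
    """Render available metadata fields as a markdown-style block."""
    lines = ["## Metadata"]
    for key, label in FIELDS:
        value = meta.get(key)
        if value:  # skip fields that are absent or empty
            lines.append(f"**{label}:** {value}")
    return "\n".join(lines)
```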
Error Handling
- Network Issues: Graceful timeout with informative errors
- JavaScript Errors: Continues processing even if some scripts fail
- Large Content: Automatic truncation with clear indicators
- Invalid URLs: Clear validation error messages
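URL validation with a clear error message might look like the sketch below. Restricting input to `http` and `https` is an assumption about what the server accepts, and `validate_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Return the cleaned URL or raise ValueError with a readable message."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(
            f"Unsupported or missing URL scheme in {url!r}; expected http or https"
        )
    if not parsed.netloc:
        raise ValueError(f"URL {url!r} has no host")
    return parsed.geturl()
```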
Development Setup
- Clone the repository:
git clone https://github.com/kwinsch/singlefile-mcp.git
cd singlefile-mcp
- Install dependencies:
pip install -r requirements.txt
- Install in development mode:
pip install -e .
- Test locally with Claude Code:
claude mcp add singlefile-mcp -s user -- uvx --from . singlefile-mcp
License
MIT License - see LICENSE file for details.
Dependencies
- single-file - Core web page capture tool that handles JavaScript rendering
- trafilatura - Content extraction library that isolates the main text of a page
- mcp - Model Context Protocol for AI integration
Acknowledgments
- single-file by Gildas Lormeau - Excellent web page capture tool
- trafilatura - Robust content extraction library
- Model Context Protocol - Standardized AI integration protocol