flowise-plasmate
April 12, 2026
Flowise custom nodes for Plasmate - the AI browser engine that converts HTML to structured JSON with 10-100x token compression.
Perfect for building LLM workflows that need to process web content efficiently.
Installation
Prerequisites
- Install Plasmate:
  # macOS
  brew install plasmate/tap/plasmate
  # Or build from source
  cargo install plasmate
- Verify installation:
  plasmate --version
Installing in Flowise
- Navigate to your Flowise installation's custom nodes directory:
  cd ~/.flowise/custom-nodes
  # Or for Docker: copy to mounted volume
- Clone or copy this package:
  git clone https://github.com/user/flowise-plasmate.git
  cd flowise-plasmate
  npm install
  npm run build
- Restart Flowise to load the new nodes.
Nodes
Plasmate Web Browser
The main node for fetching and parsing web content.
Inputs:
- URL (required): The webpage URL to fetch
- Format: Output format
  - SOM - Semantic Object Model (structured JSON, maximum compression)
  - Text - Clean readable text extraction
  - Markdown - Markdown formatted output
- CSS Selector (optional): Focus extraction on specific elements
- Timeout: Request timeout in seconds (default: 30)
- Plasmate Path: Path to plasmate binary (default: plasmate in PATH)
- Custom Headers: HTTP headers for authenticated requests
Outputs:
- Document: LangChain-compatible Document with metadata
- Text: Raw string output
Use Cases:
- Load documentation pages for RAG
- Fetch article content for summarization
- Parse structured data from websites
Plasmate Extract Links
Extract all links from a webpage with filtering options.
Inputs:
- URL (required): The webpage URL
- Filter Type: All Links, Internal Only, or External Only
- CSS Selector (optional): Limit extraction to specific page areas
- Timeout: Request timeout in seconds
Outputs:
- Links Array: Array of { href, text, type } objects
- URLs Only: Array of URL strings
Use Cases:
- Build web crawlers
- Extract navigation structures
- Find related content links
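To make the Filter Type options concrete, here is a minimal Python sketch of how links could be classified as internal or external and then filtered. It is an illustration, not the node's actual implementation: the functions `classify_links` and `filter_links`, the `"internal"`/`"external"` values for `type`, and the rule "same host as the page means internal" are all assumptions.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, links):
    """Tag each {href, text} link as 'internal' or 'external' (assumed rule: same host)."""
    page_host = urlparse(page_url).netloc.lower()
    classified = []
    for link in links:
        # Resolve relative hrefs such as "/about" against the page URL.
        absolute = urljoin(page_url, link["href"])
        host = urlparse(absolute).netloc.lower()
        kind = "internal" if host == page_host else "external"
        classified.append({"href": absolute, "text": link["text"], "type": kind})
    return classified


def filter_links(links, filter_type="All Links"):
    """Apply one of the three filter options to already classified links."""
    wanted = {"Internal Only": "internal", "External Only": "external"}.get(filter_type)
    return [link for link in links if wanted is None or link["type"] == wanted]


if __name__ == "__main__":
    found = classify_links(
        "https://example.com/docs/",
        [
            {"href": "/about", "text": "About"},
            {"href": "https://other.org/x", "text": "Other"},
        ],
    )
    # The "URLs Only" output would correspond to just the href values.
    print([link["href"] for link in filter_links(found, "Internal Only")])
```

A crawler built on this node would typically feed the Internal Only URLs back into the next fetch step and ignore the rest.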
Plasmate Text Extract
Simplified node for clean text extraction.
Inputs:
- URL (required): The webpage URL
- CSS Selector (optional): Focus on specific elements
- Include Metadata: Add page title and description
- Max Length: Truncate output to character limit
- Timeout: Request timeout in seconds
Outputs:
- Text: Clean text string
- Document: LangChain Document with metadata
Use Cases:
- Quick text extraction for chat
- Article content for RAG pipelines
- Clean text for embeddings
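The interaction between Include Metadata and Max Length can be sketched in a few lines of Python. This is a hedged illustration only: the function `build_text_output`, the dictionary layout (`page_content`, `metadata`, `source`), and the choice to cut at an exact character count are assumptions, not the node's documented behavior.

```python
def build_text_output(text, url, title=None, description=None,
                      include_metadata=False, max_length=None):
    """Return a Document-like dict with optional truncation and page metadata."""
    # Max Length: keep at most max_length characters; None or 0 means no limit.
    if max_length and len(text) > max_length:
        text = text[:max_length]

    metadata = {"source": url}
    # Include Metadata: add title and description only when requested and present.
    if include_metadata:
        if title:
            metadata["title"] = title
        if description:
            metadata["description"] = description

    return {"page_content": text, "metadata": metadata}


if __name__ == "__main__":
    doc = build_text_output(
        "Plasmate converts HTML to structured JSON.",
        "https://example.com/post",
        title="Example post",
        include_metadata=True,
        max_length=8,
    )
    print(doc)
```

Truncating before embedding keeps each input within a known size, at the cost of dropping anything past the limit.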
Example Chatflows
Basic Web Q&A
[Plasmate Web Browser] --> [Recursive Character Text Splitter] --> [OpenAI Embeddings] --> [In-Memory Vector Store] --> [Conversational Retrieval QA Chain]
Web Research Agent
[Plasmate Extract Links] --> [Loop/Iterator] --> [Plasmate Text Extract] --> [Document Aggregator] --> [LLM Chain]
Documentation Loader
[Plasmate Web Browser (SOM format)] --> [Custom Transform] --> [Vector Store]
Configuration
Custom Plasmate Path
If Plasmate is not in your PATH, specify the full path in the node's advanced settings:
/usr/local/bin/plasmate
# or
/home/user/.cargo/bin/plasmate
Authenticated Requests
For sites requiring authentication, use Custom Headers:
Authorization: Bearer your-token-here
Cookie: session=abc123
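As an illustration of the header format above, the following Python sketch turns one-header-per-line text into a name/value mapping. The function `parse_headers` is hypothetical, and the assumption that the node accepts exactly this line-based format is mine; check the node's field description before relying on it.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict, skipping blank or malformed lines."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    raw = "Authorization: Bearer your-token-here\nCookie: session=abc123"
    print(parse_headers(raw))
```

Splitting on the first colon matters for values such as URLs or timestamps that contain colons themselves.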
Token Compression Comparison
| Source | Raw HTML | Plasmate SOM | Reduction |
|---|---|---|---|
| News article | 50,000 tokens | 3,000 tokens | 94% |
| Documentation | 30,000 tokens | 2,500 tokens | 92% |
| E-commerce page | 80,000 tokens | 5,000 tokens | 94% |
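The Reduction column follows from the two token counts as (raw - SOM) / raw. The short Python check below recomputes it for the three rows and also expresses each row as a compression ratio; the helper names are illustrative only.

```python
def reduction_percent(raw_tokens, som_tokens):
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (raw_tokens - som_tokens) / raw_tokens)


def compression_ratio(raw_tokens, som_tokens):
    """How many raw tokens correspond to one SOM token."""
    return raw_tokens / som_tokens


if __name__ == "__main__":
    rows = {
        "News article": (50_000, 3_000),
        "Documentation": (30_000, 2_500),
        "E-commerce page": (80_000, 5_000),
    }
    for name, (raw, som) in rows.items():
        print(f"{name}: {reduction_percent(raw, som)}% reduction, "
              f"{compression_ratio(raw, som):.1f}x")
```

For these three rows the ratios work out to roughly 12x to 17x.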
Troubleshooting
"Plasmate not found"
Ensure Plasmate is installed and in your PATH:
which plasmate
plasmate --version
Or specify the full path in node settings.
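The same check can be scripted, for example in a deployment health check. The Python sketch below looks up the binary the way `which` does and then runs `plasmate --version`; the function names are illustrative, and the `custom_path` argument merely mirrors the node's Plasmate Path setting.

```python
import shutil
import subprocess


def resolve_plasmate(custom_path=None):
    """Return the executable path for Plasmate, or None if it cannot be found.

    With no custom_path, the bare name "plasmate" is looked up on PATH.
    """
    return shutil.which(custom_path or "plasmate")


def plasmate_version(executable):
    """Run `plasmate --version` and return its trimmed output."""
    completed = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    path = resolve_plasmate()
    if path is None:
        print("Plasmate not found: install it or set the full path in node settings")
    else:
        print(path, plasmate_version(path))
```

Note that the PATH seen by the Flowise process (for example inside a Docker container or a service unit) can differ from the PATH of your interactive shell, so run the check in the same environment as Flowise.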
Timeout Errors
Increase the timeout value for slow-loading pages, or use a CSS selector to limit extraction to the content you need.
Empty Output
Some sites may block automated requests. Try adding a User-Agent header:
User-Agent: Mozilla/5.0 (compatible; Flowise/1.0)
License
MIT