flowise-plasmate

April 12, 2026

Flowise custom nodes for Plasmate - the AI browser engine that converts HTML to structured JSON with 10-100x token compression.

These nodes are intended for LLM workflows that need to process web content efficiently.

Installation

Prerequisites

  1. Install Plasmate:

    # macOS
    brew install plasmate/tap/plasmate
    
    # Or build from source
    cargo install plasmate
    
  2. Verify installation:

    plasmate --version
    

Installing in Flowise

  1. Navigate to your Flowise installation's custom nodes directory:

    cd ~/.flowise/custom-nodes
    # Or for Docker: copy to mounted volume
    
  2. Clone or copy this package:

    git clone https://github.com/user/flowise-plasmate.git
    cd flowise-plasmate
    npm install
    npm run build
    
  3. Restart Flowise to load the new nodes.

Nodes

Plasmate Web Browser

The main node for fetching and parsing web content.

Inputs:

  • URL (required): The webpage URL to fetch
  • Format: Output format
    • SOM - Semantic Object Model (structured JSON, maximum compression)
    • Text - Clean readable text extraction
    • Markdown - Markdown formatted output
  • CSS Selector (optional): Focus extraction on specific elements
  • Timeout: Request timeout in seconds (default: 30)
  • Plasmate Path: Path to plasmate binary (default: plasmate in PATH)
  • Custom Headers: HTTP headers for authenticated requests

Outputs:

  • Document: LangChain-compatible Document with metadata
  • Text: Raw string output

Use Cases:

  • Load documentation pages for RAG
  • Fetch article content for summarization
  • Parse structured data from websites

Plasmate Extract Links

Extracts all links from a webpage, with filtering options.

Inputs:

  • URL (required): The webpage URL
  • Filter Type: All Links, Internal Only, or External Only
  • CSS Selector (optional): Limit extraction to specific page areas
  • Timeout: Request timeout in seconds

Outputs:

  • Links Array: Array of { href, text, type } objects
  • URLs Only: Array of URL strings

Use Cases:

  • Build web crawlers
  • Extract navigation structures
  • Find related content links
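To make the Filter Type option concrete, here is a minimal standalone sketch of how internal and external links could be told apart and filtered. It is not the node's actual implementation: the function names, the "internal"/"external" values for the type field, and the same-host rule are all assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, raw_links):
    """Resolve each href against page_url and label it internal or external.

    Assumption: a link is "internal" when its host equals the page's host.
    """
    page_host = urlparse(page_url).netloc.lower()
    classified = []
    for link in raw_links:
        absolute = urljoin(page_url, link["href"])  # resolves relative hrefs
        host = urlparse(absolute).netloc.lower()
        link_type = "internal" if host == page_host else "external"
        classified.append(
            {"href": absolute, "text": link["text"], "type": link_type}
        )
    return classified


def filter_links(links, filter_type="all"):
    """Return every link, or only those whose type matches filter_type."""
    if filter_type == "all":
        return links
    return [link for link in links if link["type"] == filter_type]


links = classify_links(
    "https://example.com/blog/",
    [
        {"href": "/docs", "text": "Docs"},
        {"href": "post-1", "text": "First post"},
        {"href": "https://other.example.org/page", "text": "Elsewhere"},
    ],
)
# Equivalent of the "URLs Only" output, restricted to internal links
internal_urls = [link["href"] for link in filter_links(links, "internal")]
print(internal_urls)
```

Resolving relative hrefs before comparing hosts keeps paths such as /docs from being misclassified as external.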

Plasmate Text Extract

Simplified node for clean text extraction.

Inputs:

  • URL (required): The webpage URL
  • CSS Selector (optional): Focus on specific elements
  • Include Metadata: Add page title and description
  • Max Length: Truncate output to character limit
  • Timeout: Request timeout in seconds

Outputs:

  • Text: Clean text string
  • Document: LangChain Document with metadata

Use Cases:

  • Quick text extraction for chat
  • Article content for RAG pipelines
  • Clean text for embeddings
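The Max Length and Include Metadata options can be pictured with the short sketch below. It uses a plain dictionary shaped like a LangChain Document (page_content plus metadata) rather than the real class, and the function name and the metadata keys (source, title, description) are assumptions, not the node's documented output.

```python
def build_document(text, url, title=None, description=None, max_length=None):
    """Truncate text to max_length characters and attach optional metadata."""
    if max_length is not None and len(text) > max_length:
        text = text[:max_length]  # plain character cut, no word-boundary logic
    metadata = {"source": url}  # assumed key name
    if title:
        metadata["title"] = title
    if description:
        metadata["description"] = description
    return {"page_content": text, "metadata": metadata}


doc = build_document(
    "Plasmate converts HTML into structured JSON.",
    "https://example.com/article",
    title="Example article",
    max_length=17,
)
print(doc)
```

A character limit is only a rough proxy for a token budget, so a downstream text splitter may still be needed for long pages.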

Example Chatflows

Basic Web Q&A

[Plasmate Web Browser] --> [Recursive Character Text Splitter] --> [OpenAI Embeddings] --> [In-Memory Vector Store] --> [Conversational Retrieval QA Chain]

Web Research Agent

[Plasmate Extract Links] --> [Loop/Iterator] --> [Plasmate Text Extract] --> [Document Aggregator] --> [LLM Chain]
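The data flow of the Web Research Agent chatflow can be sketched outside Flowise as a simple loop. Everything here is hypothetical: research, extract_links, and extract_text are stand-ins for the corresponding nodes, and the fake dictionaries replace real network calls so the sketch runs on its own.

```python
def research(start_url, extract_links, extract_text, max_pages=5):
    """Follow links from start_url, extract text per page, and join the results."""
    documents = []
    # Loop/Iterator step: visit at most max_pages linked pages
    for url in extract_links(start_url)[:max_pages]:
        text = extract_text(url)  # stands in for Plasmate Text Extract
        if text:  # skip pages that returned nothing
            documents.append({"page_content": text, "metadata": {"source": url}})
    # Document Aggregator step: one context string to hand to the LLM chain
    context = "\n\n".join(doc["page_content"] for doc in documents)
    return documents, context


# Stand-in data so the sketch runs without Plasmate or network access
FAKE_LINKS = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
}
FAKE_PAGES = {
    "https://example.com/a": "Page A text.",
    "https://example.com/b": "",
}

documents, context = research(
    "https://example.com",
    lambda url: FAKE_LINKS.get(url, []),
    lambda url: FAKE_PAGES.get(url, ""),
)
print(len(documents), repr(context))
```

Capping the number of pages keeps the aggregated context within whatever limit the downstream LLM imposes.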

Documentation Loader

[Plasmate Web Browser (SOM format)] --> [Custom Transform] --> [Vector Store]

Configuration

Custom Plasmate Path

If Plasmate is not in your PATH, specify the full path in the node's advanced settings:

/usr/local/bin/plasmate
# or
/home/user/.cargo/bin/plasmate

Authenticated Requests

For sites requiring authentication, use Custom Headers:

Authorization: Bearer your-token-here
Cookie: session=abc123
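As an illustration of the one-header-per-line format shown above, the following sketch turns such text into a dictionary. How the node itself parses the Custom Headers field is not documented here, so parse_headers and its rules (split on the first colon, skip malformed lines) are assumptions.

```python
def parse_headers(raw):
    """Turn 'Name: value' lines into a dict, skipping blank or malformed lines."""
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only, so values may contain colons
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # no colon or empty name: not a header line
        headers[name.strip()] = value.strip()
    return headers


raw = """Authorization: Bearer your-token-here
Cookie: session=abc123"""
headers = parse_headers(raw)
print(headers)
```

Tokens and session cookies are credentials, so it is safer to keep them out of shared or exported chatflows.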

Token Compression Comparison

Source            Raw HTML        Plasmate SOM    Reduction
News article      50,000 tokens   3,000 tokens    94%
Documentation     30,000 tokens   2,500 tokens    92%
E-commerce page   80,000 tokens   5,000 tokens    94%
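The reduction percentages follow directly from the token counts in the table above. The sketch below recomputes them and also expresses each row as a compression factor; the function name is illustrative, and the token counts are the ones listed in the table, not new measurements.

```python
def reduction(raw_tokens, som_tokens):
    """Return (percent reduction rounded to an integer, compression factor)."""
    percent = round((1 - som_tokens / raw_tokens) * 100)
    factor = raw_tokens / som_tokens
    return percent, factor


rows = {
    "News article": (50_000, 3_000),
    "Documentation": (30_000, 2_500),
    "E-commerce page": (80_000, 5_000),
}
for name, (raw_tokens, som_tokens) in rows.items():
    percent, factor = reduction(raw_tokens, som_tokens)
    print(f"{name}: {percent}% reduction, {factor:.1f}x compression")
```

For these three rows the factor works out to roughly 12x to 17x, toward the lower end of the 10-100x range quoted in the introduction.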

Troubleshooting

"Plasmate not found"

Ensure Plasmate is installed and in your PATH:

which plasmate
plasmate --version

Or specify the full path in node settings.
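The same check can be scripted, for example when diagnosing a Docker image. This sketch relies only on the Python standard library and on the plasmate --version command shown above; the function names and the 10-second timeout are illustrative choices.

```python
import shutil
import subprocess


def find_plasmate(custom_path=None):
    """Return the resolved path to the binary, or None if it cannot be found.

    shutil.which searches PATH for a bare name and checks a full path directly.
    """
    return shutil.which(custom_path or "plasmate")


def plasmate_version(binary, timeout=10):
    """Run '<binary> --version' and return its standard output."""
    result = subprocess.run(
        [binary, "--version"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()


path = find_plasmate()
if path is None:
    print("plasmate not found on PATH; set Plasmate Path to the full path")
else:
    print(path, plasmate_version(path))
```

Running this as the same user and in the same environment as Flowise matters, because PATH can differ between an interactive shell and the Flowise process.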

Timeout Errors

Increase the Timeout value for slow-loading pages, or use a CSS selector to limit extraction to the content you need.

Empty Output

Some sites block automated requests. Try adding a User-Agent header under Custom Headers:

User-Agent: Mozilla/5.0 (compatible; Flowise/1.0)

License

MIT