Activepieces Plasmate Piece

April 11, 2026

An Activepieces piece for Plasmate, the browser engine for AI agents. It converts HTML to a Semantic Object Model (SOM) that uses about 16x fewer tokens than raw HTML on average.

Features

  • Fetch Page: Convert any web page to structured SOM JSON
  • Extract Text: Get clean, readable text from web pages
  • Extract Links: Extract and categorize all links from a page

Installation

From Activepieces Marketplace

  1. Go to your Activepieces instance
  2. Navigate to Settings > Pieces
  3. Search for "Plasmate"
  4. Click Install

Manual Installation (Self-Hosted)

  1. Clone this repository into your Activepieces pieces directory:
cd /path/to/activepieces/packages/pieces/community
git clone https://github.com/plasmate-labs/activepieces-plasmate piece-plasmate
  2. Install dependencies and build:
cd piece-plasmate
npm install
npm run build
  3. Add the piece to your Activepieces configuration.

Development

# Install dependencies
npm install

# Build
npm run build

# Watch mode for development
npm run dev

Configuration

Authentication

The Plasmate piece supports two modes:

  1. Plasmate Cloud (Recommended): Get an API key from plasmate.app/dashboard
  2. Local CLI: Leave the API key empty to use a locally installed Plasmate CLI (self-hosted Activepieces only)
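
The choice between the two modes can be pictured as a simple check on the API key. The following is a minimal sketch, not the piece's actual code: the names `AuthMode` and `resolveAuthMode` are hypothetical, and treating a whitespace-only key as empty is an assumption.

```typescript
// Hypothetical names for illustration; not taken from the piece's source.
type AuthMode = "cloud" | "local-cli";

// A non-empty API key selects Plasmate Cloud. An empty or missing key
// falls back to the locally installed Plasmate CLI.
// Assumption: a key made only of whitespace counts as empty.
function resolveAuthMode(apiKey?: string | null): AuthMode {
  return apiKey && apiKey.trim().length > 0 ? "cloud" : "local-cli";
}
```

Keeping the decision in one small function makes it easy to reject the local CLI mode early on hosted Activepieces instances, where it is not available.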

Actions

Fetch Page

Fetch a web page and convert it to Semantic Object Model (SOM) or one of the other output formats listed below.

Inputs:

  • URL (required): The URL to fetch
  • Output Format: SOM, Plain Text, or JSON
  • CSS Selector (optional): Extract a specific portion of the page

Output:

{
  "success": true,
  "url": "https://example.com",
  "format": "som",
  "data": {
    "regions": {
      "main": { ... },
      "navigation": { ... }
    },
    "elements": [ ... ]
  }
}
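
A downstream step might want to type this output before reading it. The sketch below mirrors only the fields shown in the sample above for the `som` format; the interface name `FetchPageResult` and the helper `listRegionNames` are hypothetical, and the contents of each region and element are left as `unknown` because the sample elides them.

```typescript
// Hypothetical type mirroring the sample output above (format "som" only).
interface FetchPageResult {
  success: boolean;
  url: string;
  format: string;
  data: {
    regions: Record<string, unknown>; // region contents are elided in the sample
    elements: unknown[];              // element shape is elided in the sample
  };
}

// Return the names of the regions present in a successful result.
function listRegionNames(result: FetchPageResult): string[] {
  if (!result.success) return [];
  return Object.keys(result.data.regions);
}
```

For the sample output, `listRegionNames` would return the keys `main` and `navigation`.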

Extract Text

Extract clean, readable text from a web page.

Inputs:

  • URL (required): The URL to extract text from
  • CSS Selector (optional): Extract text from a specific portion

Output:

{
  "success": true,
  "url": "https://example.com",
  "text": "The extracted text content...",
  "lines": ["Line 1", "Line 2"],
  "stats": {
    "lineCount": 42,
    "wordCount": 350,
    "charCount": 2100
  }
}
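
The `lines` and `stats` fields can be understood as simple derivations from `text`. The sketch below shows one plausible way to compute them; the function name `computeTextStats` is hypothetical, and the counting rules (non-empty trimmed lines, whitespace-separated words, raw character length) are assumptions rather than the piece's documented behavior.

```typescript
interface TextStats {
  lineCount: number;
  wordCount: number;
  charCount: number;
}

// Assumed rules: lines are trimmed and empty ones dropped, words are
// separated by any whitespace, and charCount is the raw string length.
function computeTextStats(text: string): { lines: string[]; stats: TextStats } {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  return {
    lines,
    stats: {
      lineCount: lines.length,
      wordCount: words.length,
      charCount: text.length,
    },
  };
}
```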

Extract Links

Extract all links from a web page with filtering options.

Inputs:

  • URL (required): The URL to extract links from
  • CSS Selector (optional): Extract links from a specific portion
  • URL Filter Pattern (optional): Regex pattern to filter links
  • Include External Links: Toggle external link inclusion
  • Unique Links Only: Remove duplicate URLs

Output:

{
  "success": true,
  "url": "https://example.com",
  "links": [
    { "text": "About Us", "href": "https://example.com/about" },
    { "text": "Contact", "href": "https://example.com/contact" }
  ],
  "stats": {
    "total": 25,
    "internal": 20,
    "external": 5
  },
  "categorized": {
    "internal": [ ... ],
    "external": [ ... ]
  }
}
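
To make the interaction of the filtering inputs concrete, here is a minimal sketch of how a list of links could be filtered, deduplicated, and categorized into the output shape above. The names `processLinks` and `LinkOptions` are hypothetical, and two rules are assumptions: a link counts as internal when its hostname matches the page's hostname, and external links are included unless explicitly disabled.

```typescript
interface Link {
  text: string;
  href: string;
}

// Hypothetical option names standing in for the inputs listed above.
interface LinkOptions {
  filterPattern?: string;    // URL Filter Pattern (regex source)
  includeExternal?: boolean; // Include External Links (assumed default: true)
  uniqueOnly?: boolean;      // Unique Links Only
}

function processLinks(pageUrl: string, links: Link[], opts: LinkOptions = {}) {
  const pageHost = new URL(pageUrl).hostname;
  const pattern = opts.filterPattern ? new RegExp(opts.filterPattern) : null;
  const seen = new Set<string>();
  const kept: Link[] = [];
  const internal: Link[] = [];
  const external: Link[] = [];

  for (const link of links) {
    // Drop links whose URL does not match the filter pattern.
    if (pattern && !pattern.test(link.href)) continue;
    // Drop repeated URLs when uniqueness is requested.
    if (opts.uniqueOnly) {
      if (seen.has(link.href)) continue;
      seen.add(link.href);
    }
    // Assumption: same hostname as the page means internal.
    // Relative hrefs are resolved against the page URL.
    const isInternal = new URL(link.href, pageUrl).hostname === pageHost;
    if (!isInternal && opts.includeExternal === false) continue;
    (isInternal ? internal : external).push(link);
    kept.push(link);
  }

  return {
    links: kept,
    stats: { total: kept.length, internal: internal.length, external: external.length },
    categorized: { internal, external },
  };
}
```

Note that `new URL` throws on an href it cannot parse, so a production version would likely wrap that call and skip malformed links.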

Use Cases

  • Content Monitoring: Track changes on web pages
  • Data Extraction: Extract structured data from websites
  • SEO Analysis: Analyze page content and link structure
  • Research Automation: Gather information from multiple sources
  • AI Workflows: Feed web content to AI models with minimal tokens

Why Plasmate?

  • 16x fewer tokens than raw HTML on average
  • 50x faster than headless browser solutions
  • 30MB memory footprint vs 300MB+ for Chrome
  • Structured output with semantic roles, not tag soup

License

MIT License - see LICENSE for details.