Plasmate Cloudflare Worker

April 11, 2026 ยท View on GitHub

Deploy Plasmate's Semantic Object Model (SOM) compiler to Cloudflare's edge network. Transform HTML into structured, AI-friendly JSON at the edge with minimal latency.

What This Does

This worker runs Plasmate's SOM compiler as a WebAssembly module on Cloudflare's global edge network. It:

  • Fetches any URL and extracts semantic structure
  • Returns clean, structured JSON optimized for AI agents
  • Reduces token usage by up to 16x compared to raw HTML
  • Runs at the edge for low latency worldwide

Quick Deploy

# Install dependencies
npm install

# Deploy to Cloudflare
wrangler deploy

Usage

Fetch and Compile a URL

curl -X POST https://your-worker.workers.dev/fetch \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

With CSS Selector

Extract only specific content:

curl -X POST https://your-worker.workers.dev/fetch \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "selector": "article"}'

Text-Only Output

Get just the readable text:

curl -X POST https://your-worker.workers.dev/fetch \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "text"}'

From JavaScript/TypeScript

const response = await fetch('https://your-worker.workers.dev/fetch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://example.com',
    selector: 'main',
  }),
});

const som = await response.json();
console.log(som.title);
console.log(som.nodes);

API Reference

POST /fetch

Request body:

FieldTypeRequiredDescription
urlstringYesThe URL to fetch and compile
formatstringNoOutput format: som (default) or text
selectorstringNoCSS selector to extract specific content
headersobjectNoCustom headers to include in the fetch request

Response (SOM format):

{
  "title": "Page Title",
  "description": "Meta description",
  "nodes": [
    {
      "tag": "nav",
      "role": "navigation",
      "children": [...]
    },
    {
      "tag": "main",
      "role": "main",
      "children": [
        {
          "tag": "h1",
          "text": "Heading"
        }
      ]
    }
  ],
  "metadata": {
    "url": "https://example.com",
    "fetchedAt": "2024-01-15T10:30:00Z",
    "originalSize": 45000,
    "compressedSize": 2800
  }
}

GET /health

Health check endpoint. Returns:

{
  "status": "ok",
  "service": "plasmate-worker",
  "version": "1.0.0"
}

Customization

Environment Variables

Configure in wrangler.toml:

[vars]
MAX_HTML_SIZE = "10485760"  # 10MB limit

Enable Caching

Uncomment the KV namespace configuration in wrangler.toml and src/index.ts:

[[kv_namespaces]]
binding = "CACHE"
id = "your-kv-namespace-id"

Create a KV namespace:

wrangler kv:namespace create CACHE

Rate Limiting

For production deployments, enable rate limiting using Durable Objects. See the placeholder comments in src/index.ts for implementation guidance.

Development

# Start local development server
npm run dev

# Run type checking
npm run typecheck

# Run tests
npm test

License

MIT