Plasmate Cloudflare Worker
April 11, 2026
Deploy Plasmate's Semantic Object Model (SOM) compiler to Cloudflare's edge network. Transform HTML into structured, AI-friendly JSON at the edge with minimal latency.
What This Does
This worker runs Plasmate's SOM compiler as a WebAssembly module on Cloudflare's global edge network. It:
- Fetches any URL and extracts semantic structure
- Returns clean, structured JSON optimized for AI agents
- Reduces token usage by up to 16x compared to raw HTML
- Runs at the edge for low latency worldwide
Quick Deploy
# Install dependencies
npm install
# Deploy to Cloudflare
wrangler deploy
Usage
Fetch and Compile a URL
curl -X POST https://your-worker.workers.dev/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
With CSS Selector
Extract only specific content:
curl -X POST https://your-worker.workers.dev/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "selector": "article"}'
Text-Only Output
Get just the readable text:
curl -X POST https://your-worker.workers.dev/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "text"}'
From JavaScript/TypeScript
const response = await fetch('https://your-worker.workers.dev/fetch', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
url: 'https://example.com',
selector: 'main',
}),
});
const som = await response.json();
console.log(som.title);
console.log(som.nodes);
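When several callers share the same worker, the request construction above can be factored into a small helper. The following is a minimal sketch: `buildCompileRequest`, `CompileOptions`, and `BuiltRequest` are hypothetical names, and only the endpoint path and field names come from this README.

```typescript
// Hypothetical helper: builds the arguments for fetch() against the worker's
// POST /fetch endpoint. Field names follow the request table in the API Reference.
interface CompileOptions {
  url: string;
  format?: "som" | "text";
  selector?: string;
  headers?: Record<string, string>;
}

interface BuiltRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildCompileRequest(workerBase: string, options: CompileOptions): BuiltRequest {
  // Strip trailing slashes so the endpoint never contains "//fetch".
  const base = workerBase.replace(/\/+$/, "");
  return {
    endpoint: `${base}/fetch`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify omits properties whose value is undefined.
      body: JSON.stringify(options),
    },
  };
}
```

The returned pair can be passed straight to fetch(endpoint, init).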
API Reference
POST /fetch
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to fetch and compile |
| format | string | No | Output format: som (default) or text |
| selector | string | No | CSS selector to extract specific content |
| headers | object | No | Custom headers to include in the fetch request |
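The table above maps naturally onto a TypeScript type plus a client-side check before sending a request. This is a sketch under assumptions: the field names and types come from the table, while the specific rules (http/https URLs only, string-valued headers) and the names `FetchRequestBody`, `validateFetchRequest`, and `isFetchRequestBody` are illustrative and may not match what the worker itself enforces.

```typescript
type OutputFormat = "som" | "text";

interface FetchRequestBody {
  url: string;
  format?: OutputFormat;
  selector?: string;
  headers?: Record<string, string>;
}

// Returns a list of human-readable problems; an empty list means the body looks valid.
function validateFetchRequest(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["body must be a JSON object"];
  }
  const errors: string[] = [];
  const body = input as Record<string, unknown>;
  const url = body["url"];
  const format = body["format"];
  const selector = body["selector"];
  const headers = body["headers"];

  // Assumed rule: only absolute http(s) URLs are accepted.
  if (typeof url !== "string" || !/^https?:\/\//i.test(url)) {
    errors.push("url is required and must start with http:// or https://");
  }
  if (format !== undefined && format !== "som" && format !== "text") {
    errors.push('format must be "som" or "text"');
  }
  if (selector !== undefined && typeof selector !== "string") {
    errors.push("selector must be a string");
  }
  if (headers !== undefined) {
    if (typeof headers !== "object" || headers === null || Array.isArray(headers)) {
      errors.push("headers must be an object");
    } else {
      const record = headers as Record<string, unknown>;
      for (const key of Object.keys(record)) {
        if (typeof record[key] !== "string") errors.push(`headers.${key} must be a string`);
      }
    }
  }
  return errors;
}

// Type guard built on the validator above.
function isFetchRequestBody(input: unknown): input is FetchRequestBody {
  return validateFetchRequest(input).length === 0;
}
```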
Response (SOM format):
{
"title": "Page Title",
"description": "Meta description",
"nodes": [
{
"tag": "nav",
"role": "navigation",
"children": [...]
},
{
"tag": "main",
"role": "main",
"children": [
{
"tag": "h1",
"text": "Heading"
}
]
}
],
"metadata": {
"url": "https://example.com",
"fetchedAt": "2024-01-15T10:30:00Z",
"originalSize": 45000,
"compressedSize": 2800
}
}
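A consumer will typically walk the nodes tree and may want to check how much the compiler shrank the page. The sketch below infers its types from the example response only, so the real schema may carry more fields; `SomNode`, `SomMetadata`, `collectText`, and `compressionRatio` are hypothetical names, and the unit of the two size fields is not stated in this README.

```typescript
interface SomNode {
  tag: string;
  role?: string;
  text?: string;
  children?: SomNode[];
}

interface SomMetadata {
  url: string;
  fetchedAt: string;
  originalSize: number;
  compressedSize: number;
}

// Depth-first walk that gathers every node's text in document order.
function collectText(nodes: SomNode[]): string[] {
  const out: string[] = [];
  for (const node of nodes) {
    if (node.text !== undefined) out.push(node.text);
    if (node.children !== undefined) {
      for (const text of collectText(node.children)) out.push(text);
    }
  }
  return out;
}

// Ratio of original size to compiled size. For the example metadata this is
// 45000 / 2800, roughly 16.07.
function compressionRatio(meta: SomMetadata): number {
  if (meta.compressedSize <= 0) throw new RangeError("compressedSize must be positive");
  return meta.originalSize / meta.compressedSize;
}
```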
GET /health
Health check endpoint. Returns:
{
"status": "ok",
"service": "plasmate-worker",
"version": "1.0.0"
}
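A monitoring script can check this payload before routing traffic to the worker. The guard below is a small sketch based on the three fields shown above; `HealthResponse` and `isHealthy` are hypothetical names, and treating any status other than "ok" as unhealthy is an assumption.

```typescript
interface HealthResponse {
  status: string;
  service: string;
  version: string;
}

// True only when the payload has the documented shape and status is "ok".
function isHealthy(payload: unknown): payload is HealthResponse {
  if (typeof payload !== "object" || payload === null) return false;
  const p = payload as Record<string, unknown>;
  return (
    p["status"] === "ok" &&
    typeof p["service"] === "string" &&
    typeof p["version"] === "string"
  );
}
```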
Customization
Environment Variables
Configure in wrangler.toml:
[vars]
MAX_HTML_SIZE = "10485760" # 10 MiB limit (10 * 1024 * 1024)
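Values under [vars] that are written as quoted strings reach the worker as strings, so the limit has to be parsed before it can be compared with a size. A minimal sketch of that step, assuming the limit is in bytes and that an invalid value falls back to the default shown above; `parseMaxHtmlSize` and `exceedsLimit` are hypothetical helpers, not functions from src/index.ts.

```typescript
// 10 * 1024 * 1024 = 10485760, matching the value in wrangler.toml above.
const DEFAULT_MAX_HTML_SIZE = 10 * 1024 * 1024;

// Parses the string-valued variable, falling back to the default for
// missing, non-numeric, fractional, zero, or negative values.
function parseMaxHtmlSize(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_MAX_HTML_SIZE;
  const parsed = Number(raw);
  if (!isFinite(parsed) || Math.floor(parsed) !== parsed || parsed <= 0) {
    return DEFAULT_MAX_HTML_SIZE;
  }
  return parsed;
}

// True when the fetched HTML is larger than the configured limit.
function exceedsLimit(htmlByteLength: number, env: { MAX_HTML_SIZE?: string }): boolean {
  return htmlByteLength > parseMaxHtmlSize(env.MAX_HTML_SIZE);
}
```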
Enable Caching
Uncomment the KV namespace configuration in wrangler.toml and src/index.ts:
[[kv_namespaces]]
binding = "CACHE"
id = "your-kv-namespace-id"
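Create a KV namespace (recent Wrangler releases spell this command wrangler kv namespace create CACHE, without the colon):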
wrangler kv:namespace create CACHE
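Once the CACHE binding exists, a cache-aside lookup is one way to use it. The sketch below is not the code in src/index.ts: `KvLike` models only the get and put calls of the Workers KV binding, while `cacheKey`, `cachedCompile`, and the one-hour TTL are assumptions made for illustration.

```typescript
// Minimal subset of the Workers KV binding used here: get() resolves to the
// stored string or null, and put() accepts an expirationTtl in seconds.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical key scheme: one entry per (url, format, selector) combination.
// KV keys are limited to 512 bytes, so very long URLs would need hashing.
function cacheKey(url: string, format = "som", selector = ""): string {
  return JSON.stringify([url, format, selector]);
}

// Returns the cached body on a hit; otherwise runs compile(), stores the
// result with a TTL, and returns it.
async function cachedCompile(
  cache: KvLike,
  key: string,
  compile: () => Promise<string>,
  ttlSeconds = 3600, // assumed TTL; KV requires expirationTtl of at least 60
): Promise<{ body: string; hit: boolean }> {
  const cached = await cache.get(key);
  if (cached !== null) return { body: cached, hit: true };
  const body = await compile();
  await cache.put(key, body, { expirationTtl: ttlSeconds });
  return { body, hit: false };
}
```

Inside the worker, env.CACHE would be passed as the cache argument and the SOM compilation as the compile callback.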
Rate Limiting
For production deployments, enable rate limiting using Durable Objects. See the placeholder comments in src/index.ts for implementation guidance.
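The README does not say which algorithm those placeholders assume. As one possibility, a fixed-window counter is sketched below; `FixedWindowLimiter` is a hypothetical class, and a plain in-memory record stands in for the storage a Durable Object would provide.

```typescript
// Fixed-window rate limiter: at most `limit` requests per key in each
// window of `windowMs` milliseconds.
class FixedWindowLimiter {
  private windows: Record<string, { start: number; count: number }> = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    if (limit < 1 || windowMs <= 0) throw new RangeError("limit and windowMs must be positive");
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `now`
  // (milliseconds, e.g. Date.now()).
  allow(key: string, now: number): boolean {
    const current = this.windows[key];
    // No window yet, or the previous one has expired: start a new window.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows[key] = { start: now, count: 1 };
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is simple but lets a client send up to twice the limit across a window boundary; a sliding window or token bucket avoids that at the cost of more state.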
Development
# Start local development server
npm run dev
# Run type checking
npm run typecheck
# Run tests
npm test
Related Projects
- Plasmate - The browser engine for AI agents
- @plasmate/wasm - WebAssembly build of the SOM compiler
License
MIT