prisma-plasmate
April 12, 2026
Prisma integration for Plasmate, the browser engine for AI agents.
Store and query web content with 10-100x token compression using Plasmate's Semantic Object Model (SOM).
Features
- Token Compression: Store web content as SOM JSON with 10-100x fewer tokens than raw HTML
- Type-Safe Queries: Full TypeScript support with Prisma's type safety
- Batch Processing: Efficiently fetch and store multiple URLs with concurrency control
- Full-Text Search: Query stored content by text, with optional PostgreSQL (GIN) or SQLite (FTS5) full-text indexes
- Crawl Sessions: Group related fetches for organized data management
- Link Extraction: Automatically extract and store page relationships
- Caching: Skip refetching recently stored content
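The Caching feature is exposed as the `cacheFor` option of `fetchAndStore` (see Usage). A minimal sketch of the freshness check it implies, assuming `cacheFor` is a duration in milliseconds (as in the `60 * 60 * 1000` example) and that age is measured from the stored `fetchedAt` column. The `isFresh` helper is hypothetical and not part of prisma-plasmate.

```typescript
// Hypothetical freshness check (not prisma-plasmate's code).
// Assumes cacheFor is in milliseconds and age is measured from `fetchedAt`.
interface StoredRow {
  url: string;
  fetchedAt: Date;
}

function isFresh(
  row: StoredRow | null,
  cacheForMs: number | undefined,
  now: Date = new Date(),
): boolean {
  // No stored row, or caching disabled: the page must be fetched.
  if (row === null || cacheForMs === undefined || cacheForMs <= 0) {
    return false;
  }
  const ageMs = now.getTime() - row.fetchedAt.getTime();
  // A negative age (fetchedAt in the future, e.g. clock skew) is treated as stale.
  return ageMs >= 0 && ageMs < cacheForMs;
}

const row = { url: "https://example.com", fetchedAt: new Date("2026-04-12T10:00:00Z") };
console.log(isFresh(row, 60 * 60 * 1000, new Date("2026-04-12T10:30:00Z"))); // true
console.log(isFresh(row, 60 * 60 * 1000, new Date("2026-04-12T11:30:00Z"))); // false
```

Treating future timestamps as stale errs on the side of refetching rather than serving content whose age cannot be trusted.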
Installation
```bash
npm install prisma-plasmate @prisma/client
npm install -D prisma
```
You also need Plasmate installed:
```bash
cargo install plasmate
# or
brew install plasmate
```
Quick Start
1. Add Schema Models
Add the Plasmate models to your `prisma/schema.prisma`:

```prisma
model WebContent {
  id               String        @id @default(cuid())
  url              String
  canonicalUrl     String?
  title            String?
  description      String?
  som              Json
  textContent      String?
  htmlTokens       Int?
  somTokens        Int?
  compressionRatio Float?
  statusCode       Int?
  contentType      String?
  headers          Json?
  fetchedAt        DateTime      @default(now())
  updatedAt        DateTime      @updatedAt

  crawlSession     CrawlSession? @relation(fields: [crawlSessionId], references: [id])
  crawlSessionId   String?

  outboundLinks    Link[]        @relation("SourceLinks")
  inboundLinks     Link[]        @relation("TargetLinks")

  @@unique([url, crawlSessionId])
  @@index([url])
  @@index([fetchedAt])
  @@index([crawlSessionId])
}

model CrawlSession {
  id          String       @id @default(cuid())
  name        String?
  startedAt   DateTime     @default(now())
  completedAt DateTime?
  status      CrawlStatus  @default(RUNNING)
  metadata    Json?
  contents    WebContent[]

  @@index([status])
  @@index([startedAt])
}

model Link {
  id       String      @id @default(cuid())
  href     String
  text     String?
  rel      String?
  source   WebContent  @relation("SourceLinks", fields: [sourceId], references: [id], onDelete: Cascade)
  sourceId String
  target   WebContent? @relation("TargetLinks", fields: [targetId], references: [id], onDelete: SetNull)
  targetId String?

  @@index([sourceId])
  @@index([targetId])
  @@index([href])
}

enum CrawlStatus {
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}
```
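The `Link` model stores the raw `href` plus an optional `targetId` pointing at another stored `WebContent` row (set to null if that row is deleted). To fill `targetId`, a relative href has to be resolved against the source page's URL before it can be matched to a stored `url`. A minimal sketch using the standard WHATWG `URL` class; the normalization rules (dropping the `#fragment`, ignoring non-HTTP schemes) and the `resolveHref`/`matchTargets` helpers are assumptions, not documented prisma-plasmate behavior.

```typescript
// Hypothetical sketch: resolve hrefs so they can be compared with WebContent.url.
// Normalization rules here are assumptions chosen for illustration.
function resolveHref(pageUrl: string, href: string): string | null {
  let resolved: URL;
  try {
    resolved = new URL(href, pageUrl); // handles relative hrefs such as "../guide"
  } catch {
    return null; // malformed href
  }
  if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
    return null; // mailto:, javascript:, etc. can never match a stored page
  }
  resolved.hash = ""; // "#section" anchors refer to the same document
  return resolved.toString();
}

// Map each href to the id of an already-stored page, or null if none matches.
function matchTargets(
  pageUrl: string,
  hrefs: string[],
  storedIdsByUrl: Map<string, string>,
): Array<{ href: string; targetId: string | null }> {
  return hrefs.map((href) => {
    const abs = resolveHref(pageUrl, href);
    return { href, targetId: abs === null ? null : storedIdsByUrl.get(abs) ?? null };
  });
}

console.log(resolveHref("https://docs.example.com/api/intro", "../guide"));
// https://docs.example.com/guide
```

Links whose target has not been stored keep `targetId` null, which matches the optional relation in the schema.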
2. Run Migrations
```bash
npx prisma migrate dev --name add-web-content
```
3. Fetch and Store Content
```typescript
import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Fetch a URL and store it
const result = await client.fetchAndStore('https://example.com');
console.log(`Stored: ${result.title}`);
console.log(`SOM tokens: ${result.somTokens}`);

// Search stored content
const results = await client.search('typescript');
for (const item of results) {
  console.log(`${item.title}: ${item.url}`);
}

await client.disconnect();
```
Usage
PlasmaPrismaClient
The main client class provides high-level methods for web content operations:
```typescript
import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient({
  plasmate: {
    binaryPath: 'plasmate', // Path to the plasmate CLI
    timeout: 30000,         // Request timeout
    defaultHeaders: {       // Headers sent with every request
      'User-Agent': 'MyBot/1.0',
    },
  },
});

// Fetch a single URL
const result = await client.fetchAndStore('https://docs.example.com', {
  headers: { 'Authorization': 'Bearer token' },
  cacheFor: 60 * 60 * 1000, // Don't refetch within 1 hour
});

// Example list of URLs to fetch
const urls = ['https://docs.example.com/intro', 'https://docs.example.com/api'];

// Batch fetch with progress reporting
const batchResult = await client.batchFetchAndStore(urls, {
  concurrency: 5,
  continueOnError: true,
  onProgress: (done, total, url) => {
    console.log(`[${done}/${total}] ${url}`);
  },
});

// Search content
const results = await client.search('react hooks', {
  limit: 20,
  urlPattern: 'reactjs.org',
});

// Get statistics
const stats = await client.getStats();
console.log(`Token savings: ${stats.tokensSaved}`);
```
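`batchFetchAndStore` accepts `concurrency`, `continueOnError` and `onProgress`. One common way to implement options like these is a fixed pool of workers that pull URLs from a shared index. The sketch below is a generic illustration under that assumption, not the library's actual code; `runBatch`, the `task` callback (standing in for a per-URL fetch-and-store) and the `BatchOutcome` shape are hypothetical.

```typescript
// Generic concurrency-limited batch runner (not prisma-plasmate's implementation).
type BatchOutcome<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: unknown };

async function runBatch<T>(
  urls: string[],
  task: (url: string) => Promise<T>,
  opts: {
    concurrency?: number;
    continueOnError?: boolean;
    onProgress?: (done: number, total: number, url: string) => void;
  } = {},
): Promise<BatchOutcome<T>[]> {
  const limit = Math.max(1, opts.concurrency ?? 5);
  const results: BatchOutcome<T>[] = new Array(urls.length);
  let next = 0; // index of the next unclaimed URL
  let done = 0;

  // Each worker claims one index at a time. JavaScript runs this code on one
  // thread, so `next++` between awaits cannot be claimed twice.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        results[i] = { url, ok: true, value: await task(url) };
      } catch (error) {
        // Rejects the batch; workers already in flight are not cancelled.
        if (!opts.continueOnError) throw error;
        results[i] = { url, ok: false, error };
      }
      done++;
      opts.onProgress?.(done, urls.length, url);
    }
  }

  const workers = Array.from({ length: Math.min(limit, urls.length) }, () => worker());
  await Promise.all(workers);
  return results; // same order as the input URLs
}
```

Writing results by index keeps the output aligned with the input list even though tasks finish out of order.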
Prisma Extension
For native Prisma integration, use the extension API:
```typescript
import { PrismaClient } from '@prisma/client';
import { plasmateExtension } from 'prisma-plasmate';

const prisma = new PrismaClient().$extends(plasmateExtension());

// Fetch and store
const result = await prisma.$plasmate.fetch('https://example.com');

// Search
const results = await prisma.$plasmate.search('query');

// Get SOM directly
const som = await prisma.$plasmate.getSom('https://example.com');

// Statistics
const stats = await prisma.$plasmate.getStats();
```
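The README does not define how `getStats()` computes figures such as `tokensSaved`. One plausible definition, assumed here, sums `htmlTokens - somTokens` over rows where both counts are present (both columns are nullable in the schema). `aggregateTokenStats` and every field except `tokensSaved` are hypothetical.

```typescript
// Hypothetical aggregation of token statistics over WebContent rows.
// Only `tokensSaved` appears in the README; the formula and other fields are assumptions.
interface TokenRow {
  htmlTokens: number | null;
  somTokens: number | null;
}

interface TokenStats {
  pages: number;           // rows with both token counts present
  totalHtmlTokens: number;
  totalSomTokens: number;
  tokensSaved: number;     // totalHtmlTokens - totalSomTokens
  overallRatio: number;    // totalHtmlTokens / totalSomTokens, 0 if no SOM tokens
}

function aggregateTokenStats(rows: TokenRow[]): TokenStats {
  let pages = 0;
  let html = 0;
  let som = 0;
  for (const row of rows) {
    // Skip rows where either count is unknown, so both totals cover the same pages.
    if (row.htmlTokens === null || row.somTokens === null) continue;
    pages++;
    html += row.htmlTokens;
    som += row.somTokens;
  }
  return {
    pages,
    totalHtmlTokens: html,
    totalSomTokens: som,
    tokensSaved: html - som,
    overallRatio: som > 0 ? html / som : 0,
  };
}
```

`overallRatio` is a ratio of sums rather than the mean of per-page `compressionRatio` values, so small pages do not dominate the figure. Against the database, comparable sums could come from `client.db.webContent.aggregate({ where: { htmlTokens: { not: null }, somTokens: { not: null } }, _sum: { htmlTokens: true, somTokens: true } })`.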
Crawl Sessions
Group related fetches together:
```typescript
import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Example list of URLs to fetch
const urls = ['https://docs.example.com/intro', 'https://docs.example.com/api'];

// Create session
const session = await client.createSession('docs-crawl', {
  source: 'documentation',
  version: '2.0',
});

// Fetch with session
await client.batchFetchAndStore(urls, {
  crawlSessionId: session.id,
});

// Query session content
const results = await client.search('api', {
  crawlSessionId: session.id,
});

// Complete session
await client.completeSession(session.id, 'COMPLETED');
```
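A session starts as `RUNNING` (the schema default) and `completeSession` takes a final status. The README does not say whether a finished session can be reopened. The sketch below assumes `COMPLETED`, `FAILED` and `CANCELLED` are terminal, reachable only from `RUNNING`, and that `completedAt` is set on that transition; `finishSession` and the `COMPLETED` default are hypothetical.

```typescript
// Mirrors the CrawlStatus enum from the schema.
type CrawlStatus = "RUNNING" | "COMPLETED" | "FAILED" | "CANCELLED";

interface SessionState {
  status: CrawlStatus;
  completedAt: Date | null;
}

// Hypothetical transition guard: RUNNING is assumed to be the only
// non-terminal status, and a session may leave it exactly once.
function finishSession(
  session: SessionState,
  status: Exclude<CrawlStatus, "RUNNING"> = "COMPLETED",
  now: Date = new Date(),
): SessionState {
  if (session.status !== "RUNNING") {
    throw new Error(`Session already finished with status ${session.status}`);
  }
  return { status, completedAt: now };
}
```

Typing the target status as `Exclude<CrawlStatus, "RUNNING">` lets the compiler reject an attempt to "complete" a session back into `RUNNING`.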
Direct Prisma Queries
Access the underlying Prisma client for custom queries:
```typescript
import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Get content with high compression
const efficient = await client.db.webContent.findMany({
  where: {
    compressionRatio: { gte: 20 },
  },
  orderBy: { compressionRatio: 'desc' },
  take: 10,
});

// Find pages with specific links
const pages = await client.db.webContent.findMany({
  where: {
    outboundLinks: {
      some: {
        href: { contains: 'github.com' },
      },
    },
  },
  include: {
    outboundLinks: true,
  },
});
```
Schema Helpers
Generate schema programmatically:
```typescript
import { generateSchema, PostgresFullTextIndex } from 'prisma-plasmate';

// Generate complete schema
const schema = generateSchema({
  provider: 'postgresql',
  includeLinks: true,
  includeSessions: true,
});

// Get PostgreSQL full-text search SQL
console.log(PostgresFullTextIndex);
```
Type Safety
All operations are fully typed:
```typescript
import type {
  SOMResponse,
  FetchResult,
  SearchResult,
  ContentStats,
} from 'prisma-plasmate';

async function processContent(result: FetchResult) {
  console.log(result.somTokens); // number
  console.log(result.title);     // string | undefined
}
```
Token Compression
Plasmate converts HTML to a Semantic Object Model (SOM), reducing token usage by 10-100x:
```typescript
const result = await client.fetchAndStore('https://docs.example.com/api');

// Illustrative values; actual counts depend on the page
console.log(`HTML tokens: ${result.htmlTokens}`);       // e.g. ~50,000
console.log(`SOM tokens: ${result.somTokens}`);         // e.g. ~2,500
console.log(`Compression: ${result.compressionRatio}x`); // e.g. 20x
```
This makes it practical to store and query web content for AI applications without exceeding context limits.
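The illustrative numbers above are consistent with `compressionRatio = htmlTokens / somTokens` (50,000 / 2,500 = 20). The README does not say how tokens are counted, so the sketch uses the common rough heuristic of about four characters per token for English text as a stand-in for a real tokenizer. Both helper names are hypothetical.

```typescript
// Rough token estimate: ~4 characters per token is a rule of thumb for English
// text, used here only as an assumption. Real counts depend on the tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Assumed definition: compressionRatio = htmlTokens / somTokens.
// Returns null when the ratio is undefined (the column is nullable in the schema).
function compressionRatio(htmlTokens: number, somTokens: number): number | null {
  if (htmlTokens <= 0 || somTokens <= 0) return null;
  return htmlTokens / somTokens;
}

console.log(compressionRatio(50000, 2500)); // 20
```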
Full-Text Search
PostgreSQL
Enable PostgreSQL full-text search:
```sql
-- Add GIN index
CREATE INDEX web_content_text_search_idx
  ON "WebContent"
  USING GIN (to_tsvector('english', coalesce("textContent", '')));
```
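PostgreSQL only uses an expression index when a query repeats the indexed expression exactly, including the `'english'` text-search configuration. A minimal sketch of a search query that does so, with the search term passed as a bind parameter. `buildFullTextQuery`, the selected columns and the limit cap of 100 are illustrative choices, not part of prisma-plasmate. The result could be run with Prisma's `prisma.$queryRawUnsafe(sql, ...params)`.

```typescript
// Illustrative helper (not part of prisma-plasmate).
// The WHERE clause repeats the indexed expression verbatim so the planner can use
// web_content_text_search_idx. User input goes in as $1, never into the SQL text.
interface RawQuery {
  sql: string;
  params: unknown[];
}

const TSV = `to_tsvector('english', coalesce("textContent", ''))`;

function buildFullTextQuery(term: string, limit = 20): RawQuery {
  // Sanitize the limit before inlining it: integer in [1, 100], default 20.
  const n = Number.isFinite(limit) ? Math.floor(limit) : 20;
  const safeLimit = Math.min(Math.max(n, 1), 100);
  const sql = [
    `SELECT id, url, title,`,
    `       ts_rank(${TSV}, plainto_tsquery('english', $1)) AS rank`,
    `FROM "WebContent"`,
    `WHERE ${TSV} @@ plainto_tsquery('english', $1)`,
    `ORDER BY rank DESC`,
    `LIMIT ${safeLimit}`,
  ].join("\n");
  return { sql, params: [term] };
}

const { sql, params } = buildFullTextQuery("react hooks");
console.log(sql, params);
```

`plainto_tsquery` treats the input as plain words joined with AND, so user-typed punctuation cannot produce a tsquery syntax error.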
SQLite
For SQLite, use FTS5:
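```typescript
import { PrismaClient } from '@prisma/client';
import { SqliteFullTextIndex } from 'prisma-plasmate';

const prisma = new PrismaClient();

// Run the SQL to set up FTS
await prisma.$executeRawUnsafe(SqliteFullTextIndex);
```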
API Reference
PlasmaPrismaClient
| Method | Description |
|---|---|
| `fetchAndStore(url, options?)` | Fetch URL and store SOM |
| `batchFetchAndStore(urls, options?)` | Batch fetch with concurrency |
| `search(query, options?)` | Search stored content |
| `getByUrl(url, sessionId?)` | Get content by URL |
| `getSom(url, sessionId?)` | Get raw SOM for URL |
| `createSession(name?, metadata?)` | Create crawl session |
| `completeSession(id, status?)` | Mark session complete |
| `getStats(sessionId?)` | Get token statistics |
| `pruneOldContent(olderThan)` | Delete old content |
| `disconnect()` | Close database connection |
Prisma Extension ($plasmate)
| Method | Description |
|---|---|
| `fetch(url, options?)` | Fetch and store URL |
| `search(query, options?)` | Search stored content |
| `getSom(url)` | Get SOM for URL |
| `getStats()` | Get statistics |
| `delete(url)` | Delete content by URL |
License
MIT