prisma-plasmate

April 12, 2026

Prisma integration for Plasmate - the browser engine for AI agents.

Store and query web content with 10-100x token compression using Plasmate's Semantic Object Model (SOM).

Features

Token Compression: Store web content as SOM JSON with 10-100x fewer tokens than raw HTML
Type-Safe Queries: Full TypeScript support with Prisma's type safety
Batch Processing: Efficiently fetch and store multiple URLs with concurrency control
Full-Text Search: Query stored content by keyword, with optional PostgreSQL or SQLite full-text indexes
Crawl Sessions: Group related fetches for organized data management
Link Extraction: Automatically extract and store page relationships
Caching: Skip refetching recently stored content

Installation

npm install prisma-plasmate @prisma/client
npm install -D prisma

You also need Plasmate installed:

cargo install plasmate
# or
brew install plasmate

Quick Start

1. Add Schema Models

Add the Plasmate models to your prisma/schema.prisma:

model WebContent {
  id             String   @id @default(cuid())
  url            String
  canonicalUrl   String?
  title          String?
  description    String?
  som            Json
  textContent    String?
  htmlTokens     Int?
  somTokens      Int?
  compressionRatio Float?
  statusCode     Int?
  contentType    String?
  headers        Json?
  fetchedAt      DateTime @default(now())
  updatedAt      DateTime @updatedAt

  crawlSession   CrawlSession? @relation(fields: [crawlSessionId], references: [id])
  crawlSessionId String?
  outboundLinks  Link[] @relation("SourceLinks")
  inboundLinks   Link[] @relation("TargetLinks")

  @@unique([url, crawlSessionId])
  @@index([url])
  @@index([fetchedAt])
  @@index([crawlSessionId])
}

model CrawlSession {
  id          String      @id @default(cuid())
  name        String?
  startedAt   DateTime    @default(now())
  completedAt DateTime?
  status      CrawlStatus @default(RUNNING)
  metadata    Json?
  contents    WebContent[]

  @@index([status])
  @@index([startedAt])
}

model Link {
  id       String      @id @default(cuid())
  href     String
  text     String?
  rel      String?
  source   WebContent  @relation("SourceLinks", fields: [sourceId], references: [id], onDelete: Cascade)
  sourceId String
  target   WebContent? @relation("TargetLinks", fields: [targetId], references: [id], onDelete: SetNull)
  targetId String?

  @@index([sourceId])
  @@index([targetId])
  @@index([href])
}

enum CrawlStatus {
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}

2. Run Migrations

npx prisma migrate dev --name add-web-content

3. Fetch and Store Content

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Fetch a URL and store it
const result = await client.fetchAndStore('https://example.com');
console.log(`Stored: ${result.title}`);
console.log(`SOM tokens: ${result.somTokens}`);

// Search stored content
const results = await client.search('typescript');
for (const item of results) {
  console.log(`${item.title}: ${item.url}`);
}

await client.disconnect();

Usage

PlasmaPrismaClient

The main client class provides high-level methods for web content operations:

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient({
  plasmate: {
    binaryPath: 'plasmate',  // Path to plasmate CLI
    timeout: 30000,          // Request timeout
    defaultHeaders: {        // Headers for all requests
      'User-Agent': 'MyBot/1.0',
    },
  },
});

// Fetch single URL
const result = await client.fetchAndStore('https://docs.example.com', {
  headers: { 'Authorization': 'Bearer token' },
  cacheFor: 60 * 60 * 1000, // Don't refetch within 1 hour
});

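// Batch fetch with progress
const urls = ['https://docs.example.com/guide', 'https://docs.example.com/api']; // placeholder URLs
const batchResult = await client.batchFetchAndStore(urls, {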
  concurrency: 5,
  continueOnError: true,
  onProgress: (done, total, url) => {
    console.log(`[${done}/${total}] ${url}`);
  },
});

// Search content
const results = await client.search('react hooks', {
  limit: 20,
  urlPattern: 'reactjs.org',
});

// Get statistics
const stats = await client.getStats();
console.log(`Token savings: ${stats.tokensSaved}`);
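
To make the `concurrency`, `continueOnError`, and `onProgress` options more concrete, here is a minimal sketch of a concurrency-limited batch runner. It is not the library's implementation: `runBatch`, `BatchOutcome`, and the `worker` callback are hypothetical names introduced for illustration, and the real `batchFetchAndStore` may behave differently (for example in how it reports failures).

```typescript
// Hypothetical helper, not part of prisma-plasmate: runs `worker` over `items`
// with at most `concurrency` calls in flight at any time.
type BatchOutcome<T> = {
  succeeded: { item: string; value: T }[];
  failed: { item: string; error: unknown }[];
};

async function runBatch<T>(
  items: string[],
  worker: (item: string) => Promise<T>,
  options: {
    concurrency?: number;
    continueOnError?: boolean;
    onProgress?: (done: number, total: number, item: string) => void;
  } = {},
): Promise<BatchOutcome<T>> {
  const { concurrency = 5, continueOnError = true, onProgress } = options;
  const outcome: BatchOutcome<T> = { succeeded: [], failed: [] };
  let next = 0; // index of the next unclaimed item
  let done = 0; // number of finished items (success or failure)
  let aborted = false; // set after a failure when continueOnError is false

  // Each lane claims the next unclaimed item until none remain or the batch aborts.
  async function lane(): Promise<void> {
    while (!aborted && next < items.length) {
      const item = items[next++];
      try {
        outcome.succeeded.push({ item, value: await worker(item) });
      } catch (error) {
        outcome.failed.push({ item, error });
        if (!continueOnError) aborted = true; // in-flight items still finish
      }
      done += 1;
      onProgress?.(done, items.length, item);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return outcome;
}
```

In this sketch the worker would be something like `(url) => client.fetchAndStore(url)`. Using a fixed number of lanes rather than chunking the list keeps all lanes busy even when some URLs are much slower than others.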

Prisma Extension

For native Prisma integration, use the extension API:

import { PrismaClient } from '@prisma/client';
import { plasmateExtension } from 'prisma-plasmate';

const prisma = new PrismaClient().$extends(plasmateExtension());

// Fetch and store
const result = await prisma.$plasmate.fetch('https://example.com');

// Search
const results = await prisma.$plasmate.search('query');

// Get SOM directly
const som = await prisma.$plasmate.getSom('https://example.com');

// Statistics
const stats = await prisma.$plasmate.getStats();

Crawl Sessions

Group related fetches together:

const client = createPlasmaPrismaClient();

// Create session
const session = await client.createSession('docs-crawl', {
  source: 'documentation',
  version: '2.0',
});

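// Fetch with session
const urls = ['https://docs.example.com/guide', 'https://docs.example.com/api']; // placeholder URLs
await client.batchFetchAndStore(urls, {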
  crawlSessionId: session.id,
});

// Query session content
const results = await client.search('api', {
  crawlSessionId: session.id,
});

// Complete session
await client.completeSession(session.id, 'COMPLETED');
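
Because the schema declares `@@unique([url, crawlSessionId])`, the same URL can be stored once per session. The following in-memory sketch illustrates that keying. `PageStore`, `StoredPage`, and `pageKey` are hypothetical names used only here; the real constraint is enforced by the database, not by code like this.

```typescript
// Hypothetical in-memory model of the (url, crawlSessionId) key.
type StoredPage = { url: string; crawlSessionId: string | null; title: string };

function pageKey(url: string, crawlSessionId: string | null): string {
  // JSON.stringify keeps the two parts unambiguous whatever characters the URL contains.
  return JSON.stringify([url, crawlSessionId]);
}

class PageStore {
  private pages = new Map<string, StoredPage>();

  // Insert the page, or overwrite the existing entry for the same URL and session.
  upsert(page: StoredPage): void {
    this.pages.set(pageKey(page.url, page.crawlSessionId), page);
  }

  // Look up a page by URL within one session (null means "no session").
  getByUrl(url: string, crawlSessionId: string | null = null): StoredPage | undefined {
    return this.pages.get(pageKey(url, crawlSessionId));
  }

  get size(): number {
    return this.pages.size;
  }
}
```

One difference from the sketch is worth noting: PostgreSQL and SQLite treat NULL values as distinct in unique indexes, so rows with no `crawlSessionId` are not deduplicated by the constraint alone.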

Direct Prisma Queries

Access the underlying Prisma client for custom queries:

const client = createPlasmaPrismaClient();

// Get content with high compression
const efficient = await client.db.webContent.findMany({
  where: {
    compressionRatio: { gte: 20 },
  },
  orderBy: { compressionRatio: 'desc' },
  take: 10,
});

// Find pages with specific links
const pages = await client.db.webContent.findMany({
  where: {
    outboundLinks: {
      some: {
        href: { contains: 'github.com' },
      },
    },
  },
  include: {
    outboundLinks: true,
  },
});
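
Note that `contains` is a substring match on the stored `href`, so the filter above also matches links such as `https://notgithub.com/` or a query string that mentions `github.com`. If that matters, the results can be narrowed in application code. The sketch below assumes that stored `href` values may be relative and resolves them against the page URL; `linksToDomain` is a hypothetical helper, not part of the library.

```typescript
// Hypothetical helper: true when `href`, resolved against the page it was
// found on, points at `domain` or one of its subdomains.
function linksToDomain(href: string, pageUrl: string, domain: string): boolean {
  let host: string;
  try {
    host = new URL(href, pageUrl).hostname.toLowerCase();
  } catch {
    return false; // href could not be parsed as a URL
  }
  const wanted = domain.toLowerCase();
  return host === wanted || host.endsWith(`.${wanted}`);
}
```

Applied to the query result, this would mean keeping only the `outboundLinks` entries for which `linksToDomain(link.href, page.url, 'github.com')` is true.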

Schema Helpers

Generate schema programmatically:

import { generateSchema, PostgresFullTextIndex } from 'prisma-plasmate';

// Generate complete schema
const schema = generateSchema({
  provider: 'postgresql',
  includeLinks: true,
  includeSessions: true,
});

// Get PostgreSQL full-text search SQL
console.log(PostgresFullTextIndex);

Type Safety

All operations are fully typed:

import type {
  SOMResponse,
  FetchResult,
  SearchResult,
  ContentStats,
} from 'prisma-plasmate';

async function processContent(result: FetchResult) {
  console.log(result.somTokens); // number
  console.log(result.title);     // string | undefined
}

Token Compression

Plasmate converts HTML to a Semantic Object Model (SOM), reducing token usage by 10-100x:

const result = await client.fetchAndStore('https://docs.example.com/api');

console.log(`HTML tokens: ${result.htmlTokens}`);     // ~50,000
console.log(`SOM tokens: ${result.somTokens}`);       // ~2,500
console.log(`Compression: ${result.compressionRatio}x`); // 20x

This makes it practical to store and query web content for AI applications without exceeding context limits.
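
The figures above are consistent with the ratio being HTML tokens divided by SOM tokens (50,000 / 2,500 = 20). The sketch below works through that arithmetic for a set of pages. It assumes that definition of the ratio and that "tokens saved" means the plain difference; `TokenCounts`, `compressionRatio`, and `summarize` are hypothetical names, not library exports.

```typescript
// Hypothetical types and helpers illustrating the arithmetic only.
type TokenCounts = { htmlTokens: number; somTokens: number };

function compressionRatio({ htmlTokens, somTokens }: TokenCounts): number | null {
  // Guard against division by zero for pages that produced an empty SOM.
  return somTokens > 0 ? htmlTokens / somTokens : null;
}

function summarize(pages: TokenCounts[]) {
  const htmlTokens = pages.reduce((sum, p) => sum + p.htmlTokens, 0);
  const somTokens = pages.reduce((sum, p) => sum + p.somTokens, 0);
  return {
    htmlTokens,
    somTokens,
    tokensSaved: htmlTokens - somTokens, // assumed meaning of "tokens saved"
    overallRatio: compressionRatio({ htmlTokens, somTokens }),
  };
}
```

The overall ratio is computed from the totals rather than by averaging per-page ratios, so a few very small pages do not skew the result.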

Full-Text Search

PostgreSQL

Enable PostgreSQL full-text search:

-- Add GIN index
CREATE INDEX web_content_text_search_idx
ON "WebContent"
USING GIN (to_tsvector('english', coalesce("textContent", '')));
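
PostgreSQL can only use an expression index when the query contains the same expression, so a search query should repeat `to_tsvector('english', coalesce("textContent", ''))` exactly. The sketch below builds such a parameterized query. `buildSearchQuery` is a hypothetical helper, and the selected columns and ranking are assumptions; the library's own `search` method may issue a different query.

```typescript
// Must match the expression used in the CREATE INDEX statement exactly.
const TEXT_VECTOR = `to_tsvector('english', coalesce("textContent", ''))`;

// Hypothetical helper: returns SQL with $1/$2 placeholders plus the parameter values.
function buildSearchQuery(
  term: string,
  limit = 20,
): { sql: string; params: [string, number] } {
  const sql = [
    `SELECT "id", "url", "title",`,
    `       ts_rank(${TEXT_VECTOR}, plainto_tsquery('english', $1)) AS rank`,
    `FROM "WebContent"`,
    `WHERE ${TEXT_VECTOR} @@ plainto_tsquery('english', $1)`,
    `ORDER BY rank DESC`,
    `LIMIT $2::int`,
  ].join("\n");
  // The search term is passed as a parameter, never interpolated into the SQL text.
  return { sql, params: [term, limit] };
}
```

With a Prisma client, the result could be executed as `prisma.$queryRawUnsafe(sql, ...params)`. The table name assumes the `WebContent` model is not remapped with `@@map`.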

SQLite

For SQLite, use FTS5:

import { PrismaClient } from '@prisma/client';
import { SqliteFullTextIndex } from 'prisma-plasmate';

const prisma = new PrismaClient();

// Run the SQL to set up FTS
await prisma.$executeRawUnsafe(SqliteFullTextIndex);

API Reference

PlasmaPrismaClient

| Method | Description |
| --- | --- |
| `fetchAndStore(url, options?)` | Fetch URL and store SOM |
| `batchFetchAndStore(urls, options?)` | Batch fetch with concurrency |
| `search(query, options?)` | Search stored content |
| `getByUrl(url, sessionId?)` | Get content by URL |
| `getSom(url, sessionId?)` | Get raw SOM for URL |
| `createSession(name?, metadata?)` | Create crawl session |
| `completeSession(id, status?)` | Mark session complete |
| `getStats(sessionId?)` | Get token statistics |
| `pruneOldContent(olderThan)` | Delete old content |
| `disconnect()` | Close database connection |

Prisma Extension ($plasmate)

| Method | Description |
| --- | --- |
| `fetch(url, options?)` | Fetch and store URL |
| `search(query, options?)` | Search stored content |
| `getSom(url)` | Get SOM for URL |
| `getStats()` | Get statistics |
| `delete(url)` | Delete content by URL |

License

MIT