Graphiti-Plasmate

April 12, 2026

Build knowledge graphs from web pages with 10-100x token compression relative to raw HTML.

This integration combines Plasmate's efficient HTML-to-SOM (Semantic Object Model) conversion with Graphiti's real-time knowledge graph capabilities for AI agents.

Features

Efficient Web Loading: Uses Plasmate CLI for 10-100x token compression vs raw HTML
Entity Extraction: Automatically extracts organizations, people, topics, and more
Relationship Mapping: Captures links, hierarchies, and semantic connections
Episodic Memory: Tracks browsing sessions and supports temporal queries
Graphiti Compatible: Exports directly to the Graphiti knowledge graph format
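
To sanity-check the compression figure on your own pages, a rough measurement like the one below may help. It is only a sketch: `compression_ratio` is a hypothetical helper, not part of this package, and it compares character counts as a crude proxy for tokens. For real numbers, swap in the tokenizer your model uses.

```python
import json


def compression_ratio(raw_html: str, som: dict) -> float:
    """Approximate size reduction of a SOM relative to the raw HTML.

    Character counts stand in for token counts here, which is an assumption.
    """
    # Compact separators so whitespace in the JSON does not inflate the SOM size.
    som_text = json.dumps(som, separators=(",", ":"))
    return len(raw_html) / max(len(som_text), 1)


# Toy input: a bulky page that reduces to a tiny SOM-like dict.
html = "<html><body>" + "<div class='wrapper'></div>" * 50 + "<h1>Hello</h1></body></html>"
print(f"approximate ratio: {compression_ratio(html, {'title': 'Hello'}):.1f}x")
```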

Installation

pip install graphiti-plasmate

Prerequisites

Python 3.10+
Plasmate CLI installed and in PATH
Graphiti (graphiti-core)
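
Before creating a loader, it can be useful to confirm that the CLI is actually reachable. The sketch below uses only the standard library. `find_plasmate` is a hypothetical helper, and the binary name `plasmate` is an assumption based on the prerequisite above.

```python
import shutil
from typing import Optional


def find_plasmate(binary_name: str = "plasmate") -> Optional[str]:
    """Return the absolute path of the CLI if it is on PATH, else None."""
    return shutil.which(binary_name)


path = find_plasmate()
if path is None:
    print("plasmate not found on PATH; install it or pass plasmate_path explicitly")
else:
    print(f"plasmate found at {path}")
```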

Quick Start

import asyncio
from graphiti_plasmate import PlasmateGraphLoader

async def main():
    # Initialize loader
    loader = PlasmateGraphLoader()
    
    # Load a single URL
    result = await loader.load("https://example.com")
    
    print(f"Found {len(result.entities)} entities")
    print(f"Found {len(result.relationships)} relationships")
    
    # Access the SOM
    print(f"Page title: {result.som.get('title')}")

asyncio.run(main())

Loading Multiple Pages

from graphiti_plasmate import PlasmateGraphLoader

loader = PlasmateGraphLoader()

# Load multiple URLs concurrently
result = loader.load_batch_sync([
    "https://example.com",
    "https://example.org",
    "https://httpbin.org/html",
], max_concurrent=5)

print(f"Loaded {len(result.successful)} pages")
print(f"Total entities: {len(result.entities)}")
print(f"Total relationships: {len(result.relationships)}")
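
For intuition about what `max_concurrent` limits, here is a minimal sketch of bounded concurrency with `asyncio.Semaphore`. It is not the loader's actual implementation: `fetch_stub` and `load_all` are hypothetical names, and the stub only simulates latency instead of loading a page.

```python
import asyncio


async def fetch_stub(url: str) -> str:
    # Stand-in for a real page load; it only simulates I/O latency.
    await asyncio.sleep(0.01)
    return f"loaded:{url}"


async def load_all(urls, max_concurrent=5):
    # At most max_concurrent coroutines may hold the semaphore at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with semaphore:
            return await fetch_stub(url)

    # return_exceptions=True returns a failed URL as an exception object
    # in the result list instead of raising out of the whole batch.
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)


results = asyncio.run(load_all(["https://example.com", "https://example.org"], max_concurrent=1))
print(results)
```

`asyncio.gather` returns results in the order the URLs were given, regardless of which load finishes first.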

Entity Extraction

The extractor identifies:

WebPage: Each loaded URL becomes a page entity
Topic: Extracted from headings (h1-h6)
Organization: Detected via pattern matching and structured data
Person: From JSON-LD schema.org data
Article: From structured article metadata
Email: Email addresses found in content

from graphiti_plasmate import extract_entities_from_som

# Extract entities from a SOM, e.g. som_data = result.som and source_url = result.url
entities = extract_entities_from_som(som_data, source_url)

for entity in entities:
    print(f"[{entity.entity_type}] {entity.name}")
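
As an illustration of the simplest entity type above, the sketch below pulls email addresses out of plain text with a regular expression. It is not the package's extractor: `SketchEntity` and `extract_emails` are hypothetical, and the pattern is deliberately simple rather than RFC-complete.

```python
import re
from dataclasses import dataclass

# Pragmatic pattern: local part, "@", domain labels, and a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class SketchEntity:
    entity_type: str
    name: str


def extract_emails(text: str) -> list:
    """Return one Email entity per distinct address, in order of first appearance."""
    seen = set()
    entities = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()  # normalize case so duplicates collapse
        if email not in seen:
            seen.add(email)
            entities.append(SketchEntity("Email", email))
    return entities


for entity in extract_emails("Contact Info@Example.com or sales@example.org."):
    print(f"[{entity.entity_type}] {entity.name}")
```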

Relationship Types

LINKS_TO: A hyperlink from one page to another
CONTAINS: A page contains a topic or other entity
HIERARCHY: Parent-child nesting of sections and headings
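
One plausible way to derive HIERARCHY edges from headings is a stack of open ancestors, sketched below. This is an assumption about the approach, not the package's code. `heading_hierarchy` is a hypothetical function, and its input is a plain list of `(level, text)` pairs rather than a real SOM.

```python
def heading_hierarchy(headings):
    """Return (parent_text, child_text) pairs for headings given in document order.

    Each heading is a (level, text) tuple, where level 1 corresponds to h1.
    """
    edges = []
    stack = []  # currently open ancestors as (level, text)
    for level, text in headings:
        # Headings at the same or a deeper level cannot be parents of this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            edges.append((stack[-1][1], text))
        stack.append((level, text))
    return edges


outline = [(1, "Guide"), (2, "Install"), (3, "Linux"), (2, "Usage")]
for parent, child in heading_hierarchy(outline):
    print(f"{parent} -[HIERARCHY]-> {child}")
```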

Episodic Memory

Track browsing sessions for temporal queries:

from pathlib import Path
from graphiti_plasmate import EpisodicMemory, PlasmateGraphLoader

# Initialize with persistent storage
memory = EpisodicMemory(storage_path=Path("./browsing_history.json"))
loader = PlasmateGraphLoader()

# Start a session
session_id = memory.start_session()

# Load and record pages (load_sync avoids using await outside an async function)
result = loader.load_sync("https://example.com")
episode = memory.record_episode(
    session_id=session_id,
    url=result.url,
    som=result.som,
    entities=result.entities,
    relationships=result.relationships,
)

# Query recent history
recent = memory.query_recent(hours=24)

# Query by domain
github_pages = memory.query_by_domain("github.com")

# Query by entity
pages_with_entity = memory.query_by_entity("Anthropic", entity_type="Organization")

# End session
memory.end_session(session_id)
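
To make the query semantics concrete, here is a self-contained sketch of domain and recency filtering over an in-memory list. It does not use `EpisodicMemory`. `SketchEpisode`, `by_domain`, and `recent` are hypothetical names, and treating subdomains as matches is an assumption the real `query_by_domain` may not share.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


@dataclass
class SketchEpisode:
    url: str
    visited_at: datetime


def by_domain(episodes, domain):
    """Keep episodes whose host equals the domain or is a subdomain of it."""
    domain = domain.lower()
    matches = []
    for ep in episodes:
        host = (urlparse(ep.url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(ep)
    return matches


def recent(episodes, hours, now=None):
    """Keep episodes visited within the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [ep for ep in episodes if ep.visited_at >= cutoff]
```

Passing `now` explicitly keeps the recency filter deterministic, which is convenient in tests.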

Graphiti Integration

Export to Graphiti-compatible format:

from graphiti_plasmate import build_knowledge_graph

# Build knowledge graph
graph = build_knowledge_graph(entities, relationships)

# Access Graphiti-formatted data
nodes = graph["graphiti_nodes"]
edges = graph["graphiti_edges"]

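# Use with Graphiti
import asyncio
from graphiti_core import Graphiti

async def export_nodes(nodes):
    # neo4j_uri, neo4j_user, and neo4j_password are your own connection settings
    graphiti = Graphiti(neo4j_uri, neo4j_user, neo4j_password)

    # Add nodes to Graphiti (await is only valid inside an async function).
    # Confirm that add_entity is available in your installed graphiti-core version.
    for node in nodes:
        await graphiti.add_entity(node)

asyncio.run(export_nodes(nodes))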

Graph Visualization

Export to NetworkX

import networkx as nx

def to_networkx(entities, relationships):
    G = nx.DiGraph()
    
    for entity in entities:
        G.add_node(entity.id, **{
            "name": entity.name,
            "type": entity.entity_type,
        })
    
    for rel in relationships:
        G.add_edge(rel.source_id, rel.target_id, **{
            "type": rel.relationship_type,
        })
    
    return G

# Visualize with matplotlib
import matplotlib.pyplot as plt

G = to_networkx(result.entities, result.relationships)
nx.draw(G, with_labels=True, node_color='lightblue')
plt.show()

Export to JSON for D3.js

import json

def to_d3_json(entities, relationships):
    return {
        "nodes": [{"id": e.id, "name": e.name, "group": e.entity_type} for e in entities],
        "links": [{"source": r.source_id, "target": r.target_id, "type": r.relationship_type} for r in relationships],
    }

d3_data = to_d3_json(result.entities, result.relationships)
with open("graph.json", "w") as f:
    json.dump(d3_data, f)

Configuration

Custom Plasmate Path

loader = PlasmateGraphLoader(
    plasmate_path="/path/to/plasmate",
    timeout=60.0,
)

Default Headers

loader = PlasmateGraphLoader(
    headers={
        "Authorization": "Bearer token",
        "User-Agent": "MyBot/1.0",
    }
)

Per-Request Headers

# Run inside an async function, or use load_sync for synchronous code
result = await loader.load(
    "https://api.example.com/page",
    headers={"X-Custom-Header": "value"}
)
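
How default and per-request headers combine is not specified here. A common convention, sketched below, is that per-request values override defaults and names compare case-insensitively. `merge_headers` is a hypothetical helper, and the override order is an assumption rather than documented loader behavior.

```python
def merge_headers(defaults, per_request=None):
    """Combine two header dicts; per-request values win on conflicts."""
    merged = {}
    for source in (defaults or {}, per_request or {}):
        for name, value in source.items():
            # HTTP header names are case-insensitive, so key on the lowercased name
            # while remembering the most recent spelling.
            merged[name.lower()] = (name, value)
    return {name: value for name, value in merged.values()}


headers = merge_headers(
    {"Authorization": "Bearer token", "User-Agent": "MyBot/1.0"},
    {"X-Custom-Header": "value"},
)
print(headers)
```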

API Reference

PlasmateGraphLoader

Method	Description
`load(url, headers)`	Load single URL asynchronously
`load_batch(urls, headers, max_concurrent)`	Load multiple URLs concurrently
`load_sync(url, headers)`	Synchronous single URL load
`load_batch_sync(urls, headers, max_concurrent)`	Synchronous batch load

EpisodicMemory

Method	Description
`start_session(metadata)`	Start a new browsing session
`end_session(session_id)`	End a session
`record_episode(...)`	Record a page visit
`query_recent(hours, limit)`	Query recent episodes
`query_by_domain(domain, limit)`	Filter by domain
`query_by_entity(name, type)`	Find pages with entity
`query_temporal_range(start, end)`	Query time range
`export_to_graphiti()`	Export full graph

License

MIT