Graphiti-Plasmate

April 12, 2026

Build knowledge graphs from web pages with 10-100x token compression.

This integration combines Plasmate's efficient HTML-to-SOM (Semantic Object Model) conversion with Graphiti's real-time knowledge graph capabilities for AI agents.

Features

  • Efficient Web Loading: Uses the Plasmate CLI for 10-100x token compression compared with raw HTML
  • Entity Extraction: Automatically extracts organizations, people, topics, and more
  • Relationship Mapping: Captures links, hierarchies, and semantic connections
  • Episodic Memory: Track browsing sessions with temporal queries
  • Graphiti Compatible: Direct export to Graphiti knowledge graph format

Installation

pip install graphiti-plasmate

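Prerequisites

  • The Plasmate CLI (if the loader does not find it, pass plasmate_path; see Configuration)
  • For the Graphiti integration: graphiti-core and a Neo4j database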

Quick Start

import asyncio
from graphiti_plasmate import PlasmateGraphLoader

async def main():
    # Initialize loader
    loader = PlasmateGraphLoader()
    
    # Load a single URL
    result = await loader.load("https://example.com")
    
    print(f"Found {len(result.entities)} entities")
    print(f"Found {len(result.relationships)} relationships")
    
    # Access the SOM
    print(f"Page title: {result.som.get('title')}")

asyncio.run(main())

Loading Multiple Pages

from graphiti_plasmate import PlasmateGraphLoader

loader = PlasmateGraphLoader()

# Load multiple URLs concurrently
result = loader.load_batch_sync([
    "https://example.com",
    "https://example.org",
    "https://httpbin.org/html",
], max_concurrent=5)

print(f"Loaded {len(result.successful)} pages")
print(f"Total entities: {len(result.entities)}")
print(f"Total relationships: {len(result.relationships)}")
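The package's internals are not shown here, but a concurrency cap like max_concurrent is commonly implemented with a semaphore. The sketch below is an assumption about the general technique, not graphiti-plasmate's code: `load_all` and the `fetch` callable are hypothetical, and failures are collected separately instead of aborting the whole batch.

```python
import asyncio

async def load_all(urls, fetch, max_concurrent=5):
    """Run fetch(url) for every URL with at most max_concurrent calls in flight.

    Returns (successful, failed): dicts mapping url -> result and url -> exception.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        async with semaphore:  # waits while max_concurrent loads are already running
            return await fetch(url)

    outcomes = await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
    successful, failed = {}, {}
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, Exception):
            failed[url] = outcome
        else:
            successful[url] = outcome
    return successful, failed

async def demo():
    in_flight = 0
    peak = 0

    async def fake_fetch(url):  # stand-in for a real page load
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        if "bad" in url:
            raise ValueError("simulated failure")
        return len(url)

    urls = [f"https://example.com/{i}" for i in range(10)] + ["https://bad.example"]
    ok, failed = await load_all(urls, fake_fetch, max_concurrent=3)
    return peak, len(ok), len(failed)

print(asyncio.run(demo()))  # (3, 10, 1)
```

Gathering with return_exceptions=True means one failing URL does not cancel the rest, which matches the idea of reporting successful pages separately.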

Entity Extraction

The extractor identifies:

  • WebPage: Each loaded URL becomes a page entity
  • Topic: Extracted from headings (h1-h6)
  • Organization: Detected via pattern matching and structured data
  • Person: From JSON-LD schema.org data
  • Article: From structured article metadata
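  • Email: Email addresses found in content

from graphiti_plasmate import PlasmateGraphLoader, extract_entities_from_som

# Load a page, then extract entities from its SOM
result = PlasmateGraphLoader().load_sync("https://example.com")
entities = extract_entities_from_som(result.som, result.url)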

for entity in entities:
    print(f"[{entity.entity_type}] {entity.name}")
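To make the Topic and Email rules above concrete, here is a minimal sketch over a SOM-like dict. The SOM shape (`elements`, `role`, `level`, `text`) and the `Entity` class are hypothetical stand-ins, not Plasmate's actual schema or this package's entity type.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified SOM shape; Plasmate's real SOM schema may differ.
EXAMPLE_SOM = {
    "title": "Example Domain",
    "elements": [
        {"role": "heading", "level": 1, "text": "Example Domain"},
        {"role": "paragraph", "text": "Contact us at info@example.com or sales@example.org."},
        {"role": "heading", "level": 2, "text": "More information"},
    ],
}

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

@dataclass(frozen=True)
class Entity:  # hypothetical stand-in for the package's entity objects
    name: str
    entity_type: str

def extract_simple_entities(som, source_url):
    """Extract a WebPage entity, Topic entities from headings and Email entities from text."""
    entities = [Entity(som.get("title") or source_url, "WebPage")]
    seen = set()
    for element in som.get("elements", []):
        text = element.get("text", "")
        if element.get("role") == "heading":
            candidates = [(text.strip(), "Topic")]
        else:
            candidates = [(match, "Email") for match in EMAIL_RE.findall(text)]
        for name, kind in candidates:
            if name and (name, kind) not in seen:  # skip repeated mentions
                seen.add((name, kind))
                entities.append(Entity(name, kind))
    return entities

for e in extract_simple_entities(EXAMPLE_SOM, "https://example.com"):
    print(f"[{e.entity_type}] {e.name}")
```

Pattern matching like this is cheap but noisy; structured data such as JSON-LD, which the list above mentions for Person and Article, is the more reliable source when a page provides it.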

Relationship Types

  • LINKS_TO: Hyperlink relationships between pages
  • CONTAINS: Page contains topic/entity relationships
  • HIERARCHY: Section/heading hierarchy relationships
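The README does not say how HIERARCHY edges are derived. One common approach, sketched here as an assumption, links each heading to the nearest preceding heading of a higher rank (smaller level number) using a stack; `heading_hierarchy` is a hypothetical helper.

```python
def heading_hierarchy(headings):
    """Return (parent, child) pairs for headings given as (level, text) in document order.

    Each heading's parent is the nearest preceding heading with a smaller level.
    """
    edges = []
    stack = []  # (level, text) for the currently open ancestor headings
    for level, text in headings:
        while stack and stack[-1][0] >= level:
            stack.pop()  # close sections at the same or a deeper level
        if stack:
            edges.append((stack[-1][1], text))
        stack.append((level, text))
    return edges

doc = [(1, "Guide"), (2, "Install"), (3, "Prerequisites"), (2, "Usage"), (3, "Batch loading")]
for parent, child in heading_hierarchy(doc):
    print(f"{parent} -HIERARCHY-> {child}")
```

Top-level headings have no parent edge here; attaching them to the page with a CONTAINS edge would be a natural complement.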

Episodic Memory

Track browsing sessions for temporal queries:

from pathlib import Path
from graphiti_plasmate import EpisodicMemory, PlasmateGraphLoader

# Initialize with persistent storage
memory = EpisodicMemory(storage_path=Path("./browsing_history.json"))
loader = PlasmateGraphLoader()

# Start a session
session_id = memory.start_session()

# Load and record pages (inside async code, use: result = await loader.load(...))
result = loader.load_sync("https://example.com")
episode = memory.record_episode(
    session_id=session_id,
    url=result.url,
    som=result.som,
    entities=result.entities,
    relationships=result.relationships,
)

# Query recent history
recent = memory.query_recent(hours=24)

# Query by domain
github_pages = memory.query_by_domain("github.com")

# Query by entity
pages_with_entity = memory.query_by_entity("Anthropic", entity_type="Organization")

# End session
memory.end_session(session_id)
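EpisodicMemory's storage format is not documented here. As an illustration of what the time- and domain-based queries typically compute, this sketch filters an in-memory list of hypothetical `Episode` records. The field names and the rule that subdomains count as part of a domain are assumptions, not the package's documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

@dataclass
class Episode:  # hypothetical record; the real EpisodicMemory schema may differ
    url: str
    visited_at: datetime  # timezone-aware UTC timestamp

def query_recent(episodes, hours, now=None):
    """Episodes visited within the last `hours` hours, relative to `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [e for e in episodes if e.visited_at >= cutoff]

def query_by_domain(episodes, domain):
    """Episodes whose host is `domain` or one of its subdomains."""
    domain = domain.lower()

    def matches(url):
        host = (urlparse(url).hostname or "").lower()
        return host == domain or host.endswith("." + domain)

    return [e for e in episodes if matches(e.url)]

def query_temporal_range(episodes, start, end):
    """Episodes visited between `start` and `end`, inclusive."""
    return [e for e in episodes if start <= e.visited_at <= end]

now = datetime(2026, 4, 12, 12, 0, tzinfo=timezone.utc)
history = [
    Episode("https://github.com/example/repo", now - timedelta(hours=2)),
    Episode("https://gist.github.com/abc", now - timedelta(hours=30)),
    Episode("https://example.com", now - timedelta(minutes=5)),
]
print([e.url for e in query_recent(history, hours=24, now=now)])
# ['https://github.com/example/repo', 'https://example.com']
print([e.url for e in query_by_domain(history, "github.com")])
# ['https://github.com/example/repo', 'https://gist.github.com/abc']
```

Matching on the parsed hostname rather than a substring keeps a host such as notgithub.com from matching github.com.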

Graphiti Integration

Export to Graphiti-compatible format:

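import asyncio
import json
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from graphiti_plasmate import PlasmateGraphLoader, build_knowledge_graph

async def main():
    result = await PlasmateGraphLoader().load("https://example.com")

    # Build knowledge graph
    graph = build_knowledge_graph(result.entities, result.relationships)

    # Access Graphiti-formatted data
    nodes = graph["graphiti_nodes"]
    edges = graph["graphiti_edges"]

    # Use with Graphiti (replace with your Neo4j connection details)
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    try:
        # graphiti_core ingests data as episodes, so send the nodes and edges as one JSON episode
        await graphiti.add_episode(
            name=result.url,
            episode_body=json.dumps({"nodes": nodes, "edges": edges}, default=str),
            source=EpisodeType.json,
            source_description="graphiti-plasmate export",
            reference_time=datetime.now(timezone.utc),
        )
    finally:
        await graphiti.close()

asyncio.run(main())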

Graph Visualization

Export to NetworkX

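import matplotlib.pyplot as plt
import networkx as nx
from graphiti_plasmate import PlasmateGraphLoader

def to_networkx(entities, relationships):
    """Build a directed graph with one node per entity and one edge per relationship."""
    G = nx.DiGraph()

    for entity in entities:
        G.add_node(entity.id, name=entity.name, type=entity.entity_type)

    for rel in relationships:
        G.add_edge(rel.source_id, rel.target_id, type=rel.relationship_type)

    return G

# Visualize with matplotlib, labelling nodes by entity name instead of id
result = PlasmateGraphLoader().load_sync("https://example.com")
G = to_networkx(result.entities, result.relationships)
nx.draw(G, labels=nx.get_node_attributes(G, "name"), with_labels=True, node_color="lightblue")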
plt.show()
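Because to_networkx stores the relationship type on every edge, you can inspect one relationship type at a time. This sketch uses a tiny hand-made graph (the ids and names are hypothetical) and `nx.subgraph_view` with an edge filter to keep only LINKS_TO edges, without copying the graph.

```python
import networkx as nx

# Hypothetical miniature graph shaped like the output of to_networkx
G = nx.DiGraph()
G.add_node("page:a", name="example.com", type="WebPage")
G.add_node("page:b", name="example.org", type="WebPage")
G.add_node("topic:1", name="Example Domain", type="Topic")
G.add_edge("page:a", "page:b", type="LINKS_TO")
G.add_edge("page:a", "topic:1", type="CONTAINS")

def edges_of_type(graph, rel_type):
    """Read-only view of `graph` that keeps every node but only edges of `rel_type`."""
    return nx.subgraph_view(graph, filter_edge=lambda u, v: graph[u][v].get("type") == rel_type)

links = edges_of_type(G, "LINKS_TO")
print(list(links.edges()))  # [('page:a', 'page:b')]
```

The view reflects later changes to G, so it can be created once and reused after more pages are loaded.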

Export to JSON for D3.js

import json
from graphiti_plasmate import PlasmateGraphLoader

def to_d3_json(entities, relationships):
    return {
        "nodes": [{"id": e.id, "name": e.name, "group": e.entity_type} for e in entities],
        "links": [{"source": r.source_id, "target": r.target_id, "type": r.relationship_type} for r in relationships],
    }

result = PlasmateGraphLoader().load_sync("https://example.com")
d3_data = to_d3_json(result.entities, result.relationships)
with open("graph.json", "w") as f:
    json.dump(d3_data, f)
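d3-force's forceLink throws a "node not found" error when a link refers to an id that is not in the node list. If a relationship can ever point at an id with no matching entity (whether that happens depends on the extractor, so treat it as a precaution), a filter like this hypothetical `drop_dangling_links` helper keeps the exported JSON safe to load.

```python
def drop_dangling_links(d3_data):
    """Keep only links whose source and target both appear in d3_data["nodes"]."""
    ids = {node["id"] for node in d3_data["nodes"]}
    kept = [
        link for link in d3_data["links"]
        if link["source"] in ids and link["target"] in ids
    ]
    return {"nodes": d3_data["nodes"], "links": kept}

example = {
    "nodes": [{"id": "a", "name": "A", "group": "WebPage"}, {"id": "b", "name": "B", "group": "Topic"}],
    "links": [
        {"source": "a", "target": "b", "type": "CONTAINS"},
        {"source": "a", "target": "missing", "type": "LINKS_TO"},
    ],
}
print(len(drop_dangling_links(example)["links"]))  # 1
```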

Configuration

Custom Plasmate Path

loader = PlasmateGraphLoader(
    plasmate_path="/path/to/plasmate",
    timeout=60.0,
)
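When the loader cannot find the CLI, a pre-flight check makes the problem obvious before any page is loaded. This sketch resolves the binary from an explicit path or from PATH with `shutil.which`. The helper `resolve_plasmate` and the binary name `plasmate` are assumptions, not part of graphiti-plasmate.

```python
import os
import shutil

def resolve_plasmate(plasmate_path=None, binary_name="plasmate"):
    """Return an executable path for the Plasmate CLI or raise FileNotFoundError.

    An explicit path wins; otherwise PATH is searched for `binary_name`
    (assumed here to be "plasmate").
    """
    if plasmate_path:
        if os.path.isfile(plasmate_path) and os.access(plasmate_path, os.X_OK):
            return plasmate_path
        raise FileNotFoundError(f"Plasmate CLI is not an executable file: {plasmate_path}")
    found = shutil.which(binary_name)
    if found is None:
        raise FileNotFoundError(f"{binary_name!r} not found on PATH; pass plasmate_path explicitly")
    return found

try:
    print(resolve_plasmate())
except FileNotFoundError as err:
    print(f"Setup problem: {err}")
```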

Default Headers

loader = PlasmateGraphLoader(
    headers={
        "Authorization": "Bearer token",
        "User-Agent": "MyBot/1.0",
    }
)

Per-Request Headers

# Inside an async function; from synchronous code, call loader.load_sync(...) instead
result = await loader.load(
    "https://api.example.com/page",
    headers={"X-Custom-Header": "value"}
)
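The README does not say how per-request headers combine with the defaults. A common convention, assumed in this sketch, is that per-request values override defaults and that header names compare case-insensitively, as HTTP requires. `merge_headers` is a hypothetical helper.

```python
def merge_headers(defaults, overrides=None):
    """Combine default and per-request headers; per-request values win.

    Names are compared case-insensitively, and the per-request spelling
    is kept when it replaces a default.
    """
    merged = {}
    for source in (defaults or {}, overrides or {}):
        for name, value in source.items():
            # Remove any earlier header with the same name in a different case
            for existing in [k for k in merged if k.lower() == name.lower()]:
                del merged[existing]
            merged[name] = value
    return merged

defaults = {"Authorization": "Bearer token", "User-Agent": "MyBot/1.0"}
print(merge_headers(defaults, {"user-agent": "MyBot/2.0", "X-Custom-Header": "value"}))
# {'Authorization': 'Bearer token', 'user-agent': 'MyBot/2.0', 'X-Custom-Header': 'value'}
```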

API Reference

PlasmateGraphLoader

Method                                            Description
load(url, headers)                                Load single URL asynchronously
load_batch(urls, headers, max_concurrent)         Load multiple URLs concurrently
load_sync(url, headers)                           Synchronous single URL load
load_batch_sync(urls, headers, max_concurrent)    Synchronous batch load

EpisodicMemory

Method                              Description
start_session(metadata)             Start a new browsing session
end_session(session_id)             End a session
record_episode(...)                 Record a page visit
query_recent(hours, limit)          Query recent episodes
query_by_domain(domain, limit)      Filter by domain
query_by_entity(name, type)         Find pages with entity
query_temporal_range(start, end)    Query time range
export_to_graphiti()                Export full graph

License

MIT