Graphiti-Plasmate
April 12, 2026
Build knowledge graphs from web pages with 10-100x token compression.
This integration combines Plasmate's efficient HTML-to-SOM (Semantic Object Model) conversion with Graphiti's real-time knowledge graph capabilities for AI agents.
Features
- Efficient Web Loading: Uses the Plasmate CLI to convert pages to SOM, with 10-100x token compression compared with raw HTML
- Entity Extraction: Automatically extracts organizations, people, topics, and more
- Relationship Mapping: Captures links, hierarchies, and semantic connections
- Episodic Memory: Tracks browsing sessions and supports temporal queries
- Graphiti Compatible: Exports directly to the Graphiti knowledge graph format
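To sanity-check the compression figure on your own pages, you can compare a rough token count for the raw HTML against the serialized SOM. The sketch below is a hypothetical helper, not part of graphiti-plasmate: it approximates tokens with a simple regex, so a real model tokenizer will report different absolute numbers, and the `som` dict is an illustrative stand-in rather than Plasmate's actual schema.

```python
import json
import re

# Crude token approximation: runs of word characters, or any single symbol.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    """Approximate a token count without a model-specific tokenizer."""
    return len(_TOKEN_RE.findall(text))


def compression_ratio(raw_html: str, som: dict) -> float:
    """Raw-HTML tokens divided by compact-JSON SOM tokens (hypothetical helper)."""
    som_tokens = approx_tokens(json.dumps(som, separators=(",", ":")))
    return approx_tokens(raw_html) / max(som_tokens, 1)


html = '<html><head><style>.x{color:red}</style></head><body><div class="wrap"><h1>Hello</h1></div></body></html>'
som = {"title": "Hello"}
print(f"~{compression_ratio(html, som):.1f}x fewer tokens")
```

Markup-heavy pages (scripts, styles, nested wrappers) push the ratio up; text-heavy pages keep it closer to 1.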
Installation
```bash
pip install graphiti-plasmate
```
Prerequisites
- Python 3.10+
- Plasmate CLI installed and in PATH
- Graphiti (graphiti-core)
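A quick way to confirm these prerequisites before loading pages is a small check script. This is a sketch that assumes the Plasmate executable is named `plasmate`; `check_prerequisites` is a hypothetical helper, not part of the package.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_prerequisites(cli_name: str = "plasmate") -> list[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    # shutil.which searches PATH the same way a shell would
    if shutil.which(cli_name) is None:
        problems.append(f"'{cli_name}' executable not found on PATH")
    # find_spec checks that the package is importable without importing it
    if importlib.util.find_spec("graphiti_core") is None:
        problems.append("graphiti-core is not installed (pip install graphiti-core)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

If the CLI lives outside PATH, pass its location via `plasmate_path` as shown under Configuration.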
Quick Start
```python
import asyncio

from graphiti_plasmate import PlasmateGraphLoader


async def main():
    # Initialize loader
    loader = PlasmateGraphLoader()

    # Load a single URL
    result = await loader.load("https://example.com")
    print(f"Found {len(result.entities)} entities")
    print(f"Found {len(result.relationships)} relationships")

    # Access the SOM
    print(f"Page title: {result.som.get('title')}")


asyncio.run(main())
```
Loading Multiple Pages
```python
from graphiti_plasmate import PlasmateGraphLoader

loader = PlasmateGraphLoader()

# Load multiple URLs concurrently
result = loader.load_batch_sync([
    "https://example.com",
    "https://example.org",
    "https://httpbin.org/html",
], max_concurrent=5)

print(f"Loaded {len(result.successful)} pages")
print(f"Total entities: {len(result.entities)}")
print(f"Total relationships: {len(result.relationships)}")
```
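The `max_concurrent` argument caps how many pages are fetched at once. A common way to implement such a cap is an `asyncio.Semaphore`; the sketch below shows that pattern with a stand-in `fetch` coroutine. It illustrates the technique only and is not graphiti-plasmate's actual implementation; `gather_bounded` and `fake_fetch` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_bounded(urls, fetch: Callable[[str], Awaitable[dict]], max_concurrent: int = 5):
    """Run fetch(url) for every URL with at most max_concurrent calls in flight.

    Returns (successful_results, [(url, exception), ...])."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(url):
        async with semaphore:  # waits while max_concurrent fetches are running
            return await fetch(url)

    # return_exceptions=True stops one failing page from aborting the whole batch
    outcomes = await asyncio.gather(*(run_one(u) for u in urls), return_exceptions=True)
    successful, failed = [], []
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, BaseException):
            failed.append((url, outcome))
        else:
            successful.append(outcome)
    return successful, failed


async def demo():
    in_flight = peak = 0

    async def fake_fetch(url):
        # Stand-in for a real page load; records how many calls overlap
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        if "bad" in url:
            raise ValueError(f"could not load {url}")
        return {"url": url}

    urls = [f"https://site{i}.test" for i in range(10)] + ["https://bad.test"]
    ok, failed = await gather_bounded(urls, fake_fetch, max_concurrent=3)
    return ok, failed, peak
```

Splitting results into successes and failures mirrors the `result.successful` attribute used above, so one unreachable URL does not discard the rest of the batch.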
Entity Extraction
The extractor identifies:
- WebPage: Each loaded URL becomes a page entity
- Topic: Extracted from headings (h1-h6)
- Organization: Detected via pattern matching and structured data
- Person: From JSON-LD schema.org data
- Article: From structured article metadata
- Email: Email addresses found in content
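```python
from graphiti_plasmate import PlasmateGraphLoader, extract_entities_from_som

# Load a page synchronously to get its SOM and URL
result = PlasmateGraphLoader().load_sync("https://example.com")

# Extract entities from the SOM
entities = extract_entities_from_som(result.som, result.url)
for entity in entities:
    print(f"[{entity.entity_type}] {entity.name}")
```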
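To make the extraction rules above concrete, here is a minimal sketch of how such an extractor could work. It assumes a hypothetical SOM shape (`title`, `headings` as `{"level", "text"}` dicts, `json_ld` as a list of schema.org objects, and `text` for body content); Plasmate's real SOM schema and graphiti-plasmate's pattern-based organization detection may differ, and this sketch only handles organizations found in JSON-LD.

```python
import re
from dataclasses import dataclass

# Simple email pattern: local part, "@", then one or more dot-separated labels
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str


def extract_entities(som: dict, source_url: str) -> list[Entity]:
    """Hypothetical extractor over an assumed SOM shape (not the package's API)."""
    entities = [Entity(som.get("title") or source_url, "WebPage")]
    # Topic: one entity per heading, h1-h6
    for heading in som.get("headings", []):
        if 1 <= heading.get("level", 0) <= 6 and heading.get("text"):
            entities.append(Entity(heading["text"].strip(), "Topic"))
    # Person / Organization / Article: taken from the JSON-LD "@type"
    wanted = {"Person", "Organization", "Article"}
    for item in som.get("json_ld", []):
        kind = item.get("@type")
        name = item.get("name") or item.get("headline")
        if kind in wanted and name:
            entities.append(Entity(name, kind))
    # Email: regex scan of page text, de-duplicated in order of appearance
    for email in dict.fromkeys(EMAIL_RE.findall(som.get("text", ""))):
        entities.append(Entity(email, "Email"))
    return entities
```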
Relationship Types
- LINKS_TO: A hyperlink from one page to another
- CONTAINS: A page contains a topic or other entity
- HIERARCHY: Parent-child relationships between sections and headings
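One way to derive these three relationship types from a page is sketched below. HIERARCHY edges come from a stack of open sections: each heading's parent is the nearest preceding heading with a smaller level. The heading IDs (`page#h0`, ...), the parent-to-child edge direction, and the helper names are assumptions for illustration, not the package's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    relationship_type: str


def heading_relationships(page_id: str, headings: list[tuple[int, str]]) -> list[Relationship]:
    """CONTAINS edges from the page to each heading, plus HIERARCHY edges
    from each heading's nearest enclosing heading to the heading itself."""
    rels = []
    stack: list[tuple[int, str]] = []  # (level, heading_id) of currently open sections
    for index, (level, _text) in enumerate(headings):
        heading_id = f"{page_id}#h{index}"
        rels.append(Relationship(page_id, heading_id, "CONTAINS"))
        # Close any open sections at the same or a deeper level than this heading
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            rels.append(Relationship(stack[-1][1], heading_id, "HIERARCHY"))
        stack.append((level, heading_id))
    return rels


def link_relationships(page_id: str, hrefs: list[str]) -> list[Relationship]:
    """LINKS_TO edges, one per distinct outgoing link, in first-seen order."""
    return [Relationship(page_id, href, "LINKS_TO") for href in dict.fromkeys(hrefs)]
```

The stack keeps the pass linear in the number of headings and handles skipped levels (an h4 directly under an h2) without special cases.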
Episodic Memory
Track browsing sessions so you can query them by time, domain, or entity:
```python
import asyncio
from pathlib import Path

from graphiti_plasmate import EpisodicMemory, PlasmateGraphLoader


async def main():
    # Initialize with persistent storage
    memory = EpisodicMemory(storage_path=Path("./browsing_history.json"))
    loader = PlasmateGraphLoader()

    # Start a session
    session_id = memory.start_session()

    # Load and record pages
    result = await loader.load("https://example.com")
    episode = memory.record_episode(
        session_id=session_id,
        url=result.url,
        som=result.som,
        entities=result.entities,
        relationships=result.relationships,
    )

    # Query recent history (last 24 hours)
    recent = memory.query_recent(hours=24)

    # Query by domain
    github_pages = memory.query_by_domain("github.com")

    # Query by entity
    pages_with_entity = memory.query_by_entity("Anthropic", entity_type="Organization")

    # End the session
    memory.end_session(session_id)


asyncio.run(main())
```
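The temporal and domain queries can be understood as simple filters over timestamped episodes. The in-memory sketch below is a hypothetical model of that behavior: the `Episode` fields, the subdomain-matching rule, and the half-open time range are assumptions, and `EpisodicMemory`'s actual storage and matching rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


@dataclass
class Episode:
    """Hypothetical record of one page visit."""
    url: str
    visited_at: datetime  # timezone-aware
    entity_names: set[str] = field(default_factory=set)


def query_recent(episodes, hours: float, now: datetime | None = None):
    """Episodes visited within the last `hours`, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    recent = [e for e in episodes if e.visited_at >= cutoff]
    return sorted(recent, key=lambda e: e.visited_at, reverse=True)


def query_by_domain(episodes, domain: str):
    """Match the domain and its subdomains, so github.com also matches www.github.com."""
    domain = domain.lower()

    def matches(url: str) -> bool:
        host = urlparse(url).hostname or ""  # hostname is already lower-cased
        return host == domain or host.endswith("." + domain)

    return [e for e in episodes if matches(e.url)]


def query_by_entity(episodes, name: str):
    """Episodes whose extracted entities include `name`."""
    return [e for e in episodes if name in e.entity_names]


def query_temporal_range(episodes, start: datetime, end: datetime):
    """Episodes with start <= visited_at < end (half-open interval)."""
    return [e for e in episodes if start <= e.visited_at < end]
```

Matching on the parsed hostname with a leading dot avoids false positives such as `notgithub.com` matching `github.com`.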
Graphiti Integration
Export to Graphiti-compatible format:
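```python
import asyncio
import json
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from graphiti_plasmate import PlasmateGraphLoader, build_knowledge_graph


async def main():
    result = await PlasmateGraphLoader().load("https://example.com")

    # Build knowledge graph and access the Graphiti-formatted data
    graph = build_knowledge_graph(result.entities, result.relationships)
    nodes = graph["graphiti_nodes"]
    edges = graph["graphiti_edges"]

    # Use with Graphiti (replace the Neo4j connection details with your own)
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    # graphiti_core ingests data as episodes; send the exported graph as one JSON episode
    await graphiti.add_episode(
        name=result.url,
        episode_body=json.dumps({"nodes": nodes, "edges": edges}, default=str),
        source=EpisodeType.json,
        source_description="graphiti-plasmate export",
        reference_time=datetime.now(timezone.utc),
    )
    await graphiti.close()


asyncio.run(main())
```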
Graph Visualization
Export to NetworkX
```python
import matplotlib.pyplot as plt
import networkx as nx


def to_networkx(entities, relationships):
    """Build a directed graph with one node per entity and one edge per relationship."""
    G = nx.DiGraph()
    for entity in entities:
        G.add_node(entity.id, name=entity.name, type=entity.entity_type)
    for rel in relationships:
        G.add_edge(rel.source_id, rel.target_id, type=rel.relationship_type)
    return G


# Visualize with matplotlib (`result` comes from one of the loaders above)
G = to_networkx(result.entities, result.relationships)
nx.draw(G, with_labels=True, labels=nx.get_node_attributes(G, "name"), node_color="lightblue")
plt.show()
```
Export to JSON for D3.js
```python
import json


def to_d3_json(entities, relationships):
    return {
        "nodes": [{"id": e.id, "name": e.name, "group": e.entity_type} for e in entities],
        "links": [
            {"source": r.source_id, "target": r.target_id, "type": r.relationship_type}
            for r in relationships
        ],
    }


d3_data = to_d3_json(result.entities, result.relationships)
with open("graph.json", "w") as f:
    json.dump(d3_data, f)
```
Configuration
Custom Plasmate Path
```python
loader = PlasmateGraphLoader(
    plasmate_path="/path/to/plasmate",
    timeout=60.0,
)
```
Default Headers
```python
loader = PlasmateGraphLoader(
    headers={
        "Authorization": "Bearer token",
        "User-Agent": "MyBot/1.0",
    }
)
```
Per-Request Headers
```python
# Inside an async function
result = await loader.load(
    "https://api.example.com/page",
    headers={"X-Custom-Header": "value"},
)
```
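When default and per-request headers are both set, a natural rule is that per-request values override defaults with the same name. The sketch below shows that rule with case-insensitive matching, since HTTP header names are case-insensitive. Whether graphiti-plasmate merges headers exactly this way is an assumption, and `merge_headers` is a hypothetical helper.

```python
from typing import Mapping, Optional


def merge_headers(defaults: Mapping[str, str], overrides: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Combine default and per-request headers; per-request values win.

    Names are compared case-insensitively, and the spelling from the
    last mapping that set a header is kept."""
    merged: dict[str, tuple[str, str]] = {}
    for source in (defaults, overrides or {}):
        for name, value in source.items():
            merged[name.lower()] = (name, value)
    return dict(merged.values())
```

For example, a default `User-Agent` is replaced by a per-request `user-agent`, while an unrelated default such as `Authorization` is kept.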
API Reference
PlasmateGraphLoader
| Method | Description |
|---|---|
| `load(url, headers)` | Load a single URL asynchronously |
| `load_batch(urls, headers, max_concurrent)` | Load multiple URLs concurrently |
| `load_sync(url, headers)` | Synchronous single URL load |
| `load_batch_sync(urls, headers, max_concurrent)` | Synchronous batch load |
EpisodicMemory
| Method | Description |
|---|---|
| `start_session(metadata)` | Start a new browsing session |
| `end_session(session_id)` | End a session |
| `record_episode(...)` | Record a page visit |
| `query_recent(hours, limit)` | Query recent episodes |
| `query_by_domain(domain, limit)` | Filter by domain |
| `query_by_entity(name, type)` | Find pages with an entity |
| `query_temporal_range(start, end)` | Query a time range |
| `export_to_graphiti()` | Export the full graph |
License
MIT