cognee_py
June 26, 2026
Python bindings for the cognee-rs pipeline engine, built with PyO3.
Cognee transforms raw text, files, and URLs into a persistent, queryable knowledge graph via a three-stage pipeline: add (ingest) → cognify (extract) → search (retrieve).
Installation
cognee-py is not yet published to PyPI — build it from this repository
with maturin:
cd python
maturin develop # builds the native extension into the active venv
A pip install cognee-py step will be documented here once the wheel is published.
Quick start
import asyncio
import json
from cognee_py import Cognee, SearchType

async def main():
    cognee = Cognee()  # optionally pass json.dumps(settings) to override defaults
    await cognee.warm()  # build engines and resolve the default user
    await cognee.add(
        {"type": "text", "text": "Cognee turns data into a knowledge graph."},
        "main_dataset",  # dataset_name is required
    )
    await cognee.cognify("main_dataset")  # dataset_name is required
    result = await cognee.search(
        "What does cognee do?",
        {"search_type": SearchType.GRAPH_COMPLETION},
    )
    print(result)

asyncio.run(main())
Set environment variables before running:
export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true # skip ONNX model download for quick tests
Upstream-compatible module-level API
An upstream-compatible API is available via cognee_py.compat for
drop-in replacement of the Python cognee SDK:
import asyncio
from cognee_py import compat as cognee, SearchType

async def main():
    await cognee.add("Cognee turns data into a knowledge graph.", "main_dataset")
    await cognee.cognify("main_dataset")
    result = await cognee.search("What does cognee do?", SearchType.GRAPH_COMPLETION)
    print(result)

asyncio.run(main())
The compat module above is the supported drop-in alias. The
cognee-py distribution intentionally does not install a top-level
cognee package, so it never shadows the upstream Python cognee package. If
you want import cognee to work directly, vendor the shim shipped in the repo
(python/cognee/) into your project.
Examples
Runnable example scripts are in the examples/ directory. Each
script validates required env vars up front and exits 0 with a clear SKIP
message when they are absent, so all examples are safe to run in CI without
credentials.
| Script | Run command | What it covers |
|---|---|---|
| add_cognify_search.py | python examples/add_cognify_search.py | Core add → cognify → search pipeline |
| memify_recall.py | python examples/memify_recall.py | Triplet embeddings (memify) + session recall |
| datasets.py | python examples/datasets.py | Dataset listing, status, deletion |
| sessions.py | python examples/sessions.py | QA history, feedback, graph-context snapshots |
| config.py | python examples/config.py | Programmatic config (LLM / embedding / DBs) |
| visualize.py | python examples/visualize.py | Render knowledge graph to HTML |
All examples read LLM credentials from the environment. Set MOCK_EMBEDDING=true
to skip the ONNX model download and use mock embeddings (fast, no GPU required):
export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true
cd python && python examples/add_cognify_search.py
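The skip-when-unconfigured behaviour described above can be approximated with a small guard. This is a sketch only, not the code the example scripts actually contain: the helper names missing_env and skip_if_missing are hypothetical, and only the variable names OPENAI_URL and OPENAI_TOKEN come from this README. It also assumes that an empty value counts as unset.

```python
import os
import sys

# Variable names taken from this README; treating them as the full
# required set is an assumption.
REQUIRED_VARS = ("OPENAI_URL", "OPENAI_TOKEN")


def missing_env(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def skip_if_missing(required=REQUIRED_VARS, env=None):
    """Exit with status 0 and a SKIP message when credentials are absent."""
    missing = missing_env(required, env)
    if missing:
        print(f"SKIP: missing environment variables: {', '.join(missing)}")
        sys.exit(0)
```

Calling skip_if_missing() as the first statement of a script keeps a credential-less CI run green while still making the skip visible in the log.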
Configuration
Programmatic config
Pass a JSON settings string to the Cognee constructor to override env-derived
defaults. The overlay order is: compiled-in defaults < env vars < constructor argument.
import json
from cognee_py import Cognee

cognee = Cognee(json.dumps({
    "llm_endpoint": "https://api.openai.com/v1",
    "llm_api_key": "sk-...",
    "llm_model": "gpt-4o-mini",
    "embedding_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
}))
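The overlay order (compiled-in defaults < env vars < constructor argument) can be pictured as a layered dictionary merge in which later layers win. The sketch below is plain Python for illustration; the real merge happens inside the native extension. The function resolve_settings, the DEFAULTS values, and the mapping from environment variables to settings keys are all assumptions inferred from the tables in this README.

```python
import json
import os

# Illustrative default only; the real compiled-in defaults live in the engine.
DEFAULTS = {"llm_model": "gpt-4o-mini"}

# Assumed mapping from environment variable to settings key.
ENV_TO_KEY = {
    "OPENAI_URL": "llm_endpoint",
    "OPENAI_TOKEN": "llm_api_key",
    "OPENAI_MODEL": "llm_model",
}


def resolve_settings(settings_json=None, env=None):
    """Merge defaults < env vars < constructor argument, later layers winning."""
    env = os.environ if env is None else env
    merged = dict(DEFAULTS)
    for var, key in ENV_TO_KEY.items():
        if env.get(var):
            merged[key] = env[var]
    if settings_json:
        merged.update(json.loads(settings_json))
    return merged
```

With OPENAI_MODEL=gpt-4o exported and {"llm_model": "custom"} passed to the constructor, this model resolves llm_model to "custom", which is the precedence the paragraph above describes.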
Generic key/value setters and config read-back via cognee.config:
# Set individual keys (typed value, or explicit string form):
cognee.config.set("llm_model", "gpt-4o")
cognee.config.set_str("llm_api_key", "sk-...")
# Read back the current config (secrets are redacted):
cfg = cognee.config.get()
print(cfg)
Environment variables
| Variable | Purpose |
|---|---|
| OPENAI_URL | LLM API base URL (OpenAI-compatible endpoint). |
| OPENAI_TOKEN | LLM API key. |
| OPENAI_MODEL | LLM model name (default: gpt-4o-mini). |
| EMBEDDING_PROVIDER | Embedding provider: openai, ollama, onnx, mock. |
| EMBEDDING_MODEL | Embedding model name. |
| EMBEDDING_DIMENSIONS | Embedding vector dimensions. |
| EMBEDDING_ENDPOINT | Embedding API base URL (falls back to OPENAI_URL). |
| EMBEDDING_API_KEY | Embedding API key (falls back to OPENAI_TOKEN). |
| MOCK_EMBEDDING | Set true to use zero-vector mock embeddings (no model download). |
| COGNEE_BINDING_SUPPRESS_LOGS | Suppress the auto-installed pyo3-log bridge. |
| COGNEE_HOST_SDK | Set by an upstream/host cognee SDK to suppress this binding's analytics emission (avoids double-counting). |
| RUST_LOG, LOG_LEVEL | Standard tracing-subscriber env-filter level overrides. |
| COGNEE_LOG_*, LOG_FILE_NAME | Consumed by setup_logging(); see docs/configuration.md (Logging section). |
| OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, OTEL_SERVICE_NAME and other OTEL_* vars | Consumed by setup_telemetry(). |
| TELEMETRY_DISABLED, ENV | Honoured by setup_telemetry_analytics() via cognee_telemetry::env::is_disabled. |
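The two fallback rows in the table (EMBEDDING_ENDPOINT to OPENAI_URL, EMBEDDING_API_KEY to OPENAI_TOKEN) amount to a "first non-empty value wins" lookup. A minimal sketch, assuming empty strings are treated as unset; the function name embedding_credentials is hypothetical and the binding's own resolution may differ in detail:

```python
import os


def embedding_credentials(env=None):
    """Resolve the embedding endpoint and key, falling back to the LLM values."""
    env = os.environ if env is None else env
    endpoint = env.get("EMBEDDING_ENDPOINT") or env.get("OPENAI_URL")
    api_key = env.get("EMBEDDING_API_KEY") or env.get("OPENAI_TOKEN")
    return endpoint, api_key
```

In practice this means a single OPENAI_URL / OPENAI_TOKEN pair is enough when the LLM and the embedding model are served from the same endpoint.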
Operations reference
Pipeline operations
add
Ingest one or more data items into a named dataset.
# Text
await cognee.add({"type": "text", "text": "…"}, "my-dataset")
# File
await cognee.add({"type": "file", "path": "/abs/path/to/doc.txt"}, "my-dataset")
# URL
await cognee.add({"type": "url", "url": "https://example.com/article"}, "my-dataset")
# Multiple items at once
await cognee.add([
    {"type": "text", "text": "First document"},
    {"type": "file", "path": "/abs/path/two.txt"},
], "my-dataset")
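Since every data item is a small dict with a type discriminator, a few constructor helpers can keep call sites free of typos in the key names. The helpers below are hypothetical conveniences and not part of cognee_py; only the dict shapes come from the examples above. Resolving file paths to absolute form mirrors those examples, and whether relative paths are also accepted is not stated here.

```python
from pathlib import Path


def text_item(text):
    """Build a text data item."""
    return {"type": "text", "text": text}


def file_item(path):
    """Build a file data item with an absolute path, as in the README examples."""
    return {"type": "file", "path": str(Path(path).resolve())}


def url_item(url):
    """Build a URL data item."""
    return {"type": "url", "url": url}
```

A batch call then reads as await cognee.add([text_item("First document"), file_item("two.txt")], "my-dataset").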
cognify
Extract entities and relationships into the knowledge graph.
await cognee.cognify("my-dataset")
add_and_cognify
Ingest and extract in a single call.
result = await cognee.add_and_cognify(
    {"type": "text", "text": "…"},
    "my-dataset",
)
Search and recall
search
Query the knowledge graph. Defaults to GRAPH_COMPLETION.
result = await cognee.search("What is the capital of France?")
# With options
result = await cognee.search(
    "summarise recent events",
    {"search_type": SearchType.SUMMARIES, "top_k": 5, "datasets": ["news"]},
)
All 15 search types are supported (via SearchType enum):
GRAPH_COMPLETION, SUMMARIES, CHUNKS, RAG_COMPLETION, TRIPLET_COMPLETION,
GRAPH_SUMMARY_COMPLETION, CYPHER, NATURAL_LANGUAGE, GRAPH_COMPLETION_COT,
GRAPH_COMPLETION_CONTEXT_EXTENSION, FEELING_LUCKY, FEEDBACK, TEMPORAL,
CODING_RULES, CHUNKS_LEXICAL.
recall
Session-first routing: checks session QA history before falling back to graph search.
result = await cognee.recall(
    "What did we discuss?",
    {"session_id": "session-uuid", "scope": "auto"},
)
Memory operations
remember
Composite add + cognify with an optional improvement pass.
await cognee.remember(
    {"type": "text", "text": "…"},
    "my-dataset",
    {"self_improvement": True},
)
memify
Index triplet embeddings from the existing knowledge graph.
Enables TRIPLET_COMPLETION search. Idempotent.
await cognee.memify()
remember_entry
Store a single typed memory entry (QA pair, execution trace, or feedback record).
# QA entry
await cognee.remember_entry(
    {"type": "qa", "question": "What?", "answer": "This."},
    "my-dataset",
    "session-uuid",
)
# Trace entry
await cognee.remember_entry(
    {
        "type": "trace",
        "originFunction": "search",
        "status": "ok",
        "memoryQuery": "What?",
    },
    "my-dataset",
    "session-uuid",
)
# Feedback entry
await cognee.remember_entry(
    {"type": "feedback", "qaId": "qa-uuid", "feedbackText": "Accurate.", "feedbackScore": 5},
    "my-dataset",
    "session-uuid",
)
Entry field names are camelCase (the entry dict is passed through verbatim, so snake_case keys are not recognised). Supported entry types:
- qa: question, answer, context, feedbackText, feedbackScore, usedGraphElementIds (all optional except type).
- trace: originFunction (required), status, methodParams, methodReturnValue, memoryQuery, memoryContext, errorMessage, generateFeedbackWithLlm.
- feedback: qaId (required), feedbackText, feedbackScore.
Unknown type values raise CogneeValidationError.
Optional opts dict supports a tenant key.
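Because snake_case keys are not recognised, code that builds entries from Python-style dicts may want to convert keys before calling remember_entry. The helpers below are a hypothetical convenience, not part of cognee_py. They convert only the top-level keys of a flat entry dict, which is all the entry types listed above appear to need.

```python
def to_camel(key):
    """Convert a snake_case key to camelCase, leaving other keys unchanged."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_entry(entry):
    """Return a copy of a flat entry dict with top-level keys in camelCase."""
    return {to_camel(key): value for key, value in entry.items()}
```

For example, camelize_entry({"type": "feedback", "qa_id": "qa-uuid", "feedback_score": 5}) yields the qaId and feedbackScore keys shown in the feedback example.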
improve
Run the four-stage session-graph bridge pipeline.
await cognee.improve({
    "dataset_name": "my-dataset",
    "session_ids": ["session-uuid"],
})
Datasets
datasets = await cognee.datasets.list()
items = await cognee.datasets.list_data(dataset_id)
has_content = await cognee.datasets.has(dataset_id)
statuses = await cognee.datasets.status([id1, id2])
await cognee.datasets.empty(dataset_id)
await cognee.datasets.delete_data(dataset_id, data_id)
await cognee.datasets.delete_all()
Sessions
entries = await cognee.sessions.get("session-uuid", {"last_n": 10})
await cognee.sessions.add_feedback("session-uuid", "qa-uuid", "Great answer!", 5)
await cognee.sessions.delete_feedback("session-uuid", "qa-uuid")
ctx = await cognee.sessions.get_graph_context("session-uuid")
await cognee.sessions.set_graph_context("session-uuid", "new context")
Notebooks
notebooks = await cognee.notebooks.list()
for nb in notebooks:
    print(nb["id"], nb["name"])
# Create a new notebook (cells defaults to empty list; deletable is always True)
nb = await cognee.notebooks.create("My Notebook")
nb = await cognee.notebooks.create("My Notebook", cells=[...])
# Update a notebook's name and/or cells
nb = await cognee.notebooks.update(notebook_id, {"name": "New Name"})
nb = await cognee.notebooks.update(notebook_id, {"cells": [...]})
# Delete by UUID — returns True if deleted, False if not found
deleted = await cognee.notebooks.delete(notebook_id)
cognee.notebooks is a CogneeNotebooks sub-object accessible on every Cognee
instance. notebook_id is a UUID string.
Users and pipeline-run admin
# Get or create the default user account
user = await cognee.get_or_create_default_user()
print(user["id"], user["email"])
# Reset the pipeline run status for a specific pipeline within a dataset
await cognee.reset_pipeline_run_status(dataset_id, pipeline_name)
# Reset all pipeline run statuses for a dataset
await cognee.reset_dataset_pipeline_run_status(dataset_id)
dataset_id is a UUID string. pipeline_name is the pipeline name string.
Both reset methods return None on success and raise CogneeValidationError if
dataset_id is not a valid UUID.
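Since a malformed dataset_id surfaces as CogneeValidationError only at call time, it can be convenient to check identifiers earlier, for example when parsing CLI arguments. A minimal sketch using the standard library; is_uuid is a hypothetical helper, and Python's parser may accept or reject edge-case formats differently from the binding:

```python
import uuid


def is_uuid(value):
    """Return True when value parses as a UUID string."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True
```

A dataset name such as "my-dataset" fails this check, which is a useful reminder that the reset methods take the dataset's UUID rather than its name.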
Data lifecycle
# Forget a single item
await cognee.forget({"kind": "item", "data_id": "uuid", "dataset": {"name": "my-dataset"}})
# Forget an entire dataset
await cognee.forget({"kind": "dataset", "dataset": {"name": "my-dataset"}})
# Forget everything
await cognee.forget({"kind": "all"})
# Replace a data item (delete → re-add → re-cognify)
await cognee.update("old-data-uuid", {"type": "text", "text": "updated content"}, "my-dataset")
# Remove all files from storage (metadata DB untouched)
await cognee.prune_data()
# Wipe graph, vector, metadata, and/or cache backends
await cognee.prune_system({"prune_graph": True, "prune_vector": True})
Visualisation
# Get the HTML string
html = await cognee.visualize()
# Write to a file (returns the absolute path)
path = await cognee.visualize_to_file({"destination_path": "/tmp/graph.html"})
Requires the visualization feature compiled into the native extension.
Cloud serve / disconnect are provided by the closed cognee-py-cloud package, not the OSS cognee_py package.
Initialisation and observability
import cognee_py
# Optional: file logging (reads COGNEE_LOG_*, LOG_FILE_NAME, LOG_LEVEL).
cognee_py.setup_logging()
# Optional: OTLP trace export (reads OTEL_* env vars).
cognee_py.setup_telemetry()
# Optional: product-analytics emission (returns True if armed).
armed = cognee_py.setup_telemetry_analytics()
print(f"analytics armed: {armed}")
When cognee_py is imported, a minimal default subscriber is installed:
a pyo3-log bridge that forwards every Rust tracing::* event into Python's
standard logging module under the cognee.* logger tree. The host's
logging.basicConfig / logging.dictConfig controls level, format, and handlers.
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("cognee").setLevel(logging.DEBUG)
import cognee_py # Rust spans now flow into Python logging
Set COGNEE_BINDING_SUPPRESS_LOGS=1 before importing cognee_py to
skip the default subscriber. The host then owns all subscriber setup.
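For hosts that prefer logging.dictConfig over basicConfig, the same effect can be had by giving the cognee logger its own level and handler. This sketch uses only the standard library and runs without cognee_py installed; the child logger name cognee.pipeline is hypothetical and chosen only to show that loggers under cognee.* inherit the configured level.

```python
import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        # Everything under the cognee.* tree inherits this level.
        "cognee": {"level": "DEBUG", "handlers": ["console"], "propagate": False},
    },
    "root": {"level": "WARNING", "handlers": ["console"]},
})

# Hypothetical child logger name, used only for illustration.
child = logging.getLogger("cognee.pipeline")
```

Setting propagate to False on the cognee logger avoids duplicate lines when the root logger writes to the same handler.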
Analytics defaults
Analytics emission is ON by default, for parity with the Python SDK. Analytics are
armed automatically on import (no explicit setup_telemetry_analytics() call
required); the function is still exposed and returns the effective state.
| Condition | Behaviour |
|---|---|
| No opt-out env set | Armed. setup_telemetry_analytics() returns True. |
| TELEMETRY_DISABLED=<non-empty> | Suppressed. Returns False. |
| ENV is test / dev | Suppressed. Returns False. |
| COGNEE_HOST_SDK=<any non-empty> | Suppressed (defers to the host SDK). Returns False. |
Important — if you run this binding underneath another cognee SDK that is
already the canonical sender of send_telemetry events: set
COGNEE_HOST_SDK=<name> so this binding defers and you avoid double-counting.
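The decision table can be restated as a small predicate, which is handy for asserting in a test suite that a given environment will not emit analytics. This is a model of the documented behaviour, not the binding's implementation (that lives in cognee_telemetry::env::is_disabled). The function name analytics_armed is hypothetical, and matching ENV exactly against test and dev, without case folding, is an assumption.

```python
import os


def analytics_armed(env=None):
    """Model the decision table: any opt-out signal suppresses analytics."""
    env = os.environ if env is None else env
    if env.get("TELEMETRY_DISABLED"):
        return False
    if env.get("ENV") in {"test", "dev"}:
        return False
    if env.get("COGNEE_HOST_SDK"):
        return False
    return True
```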
Error handling
All async ops raise subclasses of CogneeError:
| Exception | Meaning |
|---|---|
| CogneeValidationError | Invalid input (bad data descriptor, unknown config key, malformed settings JSON). |
| CogneeComponentError | Component initialisation failed (DB connection, embedding model load). |
| CogneeServiceBuildError | Service warm-up failed (engine could not be constructed). |
| CogneeUserBootstrapError | Default user resolution failed during warm(). |
| CogneeRuntimeError | Pipeline or search runtime failure. |
| CogneeUnsupportedError | Operation not available for the current backend configuration. |
| CogneeFeatureNotBuiltError | Feature was not compiled into this build (e.g. visualization). |
| CogneeUnknownConfigKeyError | Unknown config key passed to config.set_* or constructor. |
| CogneeConfigTypeMismatchError | Wrong type for a config value. |
| CogneeError | Base class; catch this to handle any cognee error. |
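from cognee_py import Cognee, CogneeError, CogneeValidationError

# Run inside an async function, with cognee an already-warmed Cognee instance.
# Catch the specific subclass first, then the CogneeError base class.
try:
    result = await cognee.search("query", {"search_type": "INVALID_TYPE"})
except CogneeValidationError as e:
    print(f"Bad input: {e}")
except CogneeError as e:
    print(f"Cognee error: {e}")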
Advanced: low-level pipeline engine
The original pipeline engine API is available directly from cognee_py
for advanced orchestration use-cases that do not need the high-level SDK:
import cognee_py
p = cognee_py.Pipeline("my pipeline")
# add_task accepts a plain callable directly (sync, async, generator, or async
# generator — auto-detected). The callable receives each input value.
p.add_task(lambda val: val, name="echo")
ctx = cognee_py.TaskContext.mock()
# execute() takes plain Python values and returns the last task's outputs.
# It is awaited, so call it from inside an async function.
[result] = await p.execute(["hello"], ctx)
All pipeline-engine symbols (Pipeline, TaskContext, PipelineRunHandle,
CancellationHandle, CancellationToken, cancellation_pair, ProgressToken,
Watcher) are available at the top level of cognee_py.
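The four callable kinds that add_task is described as auto-detecting correspond to distinctions the standard inspect module can make. The sketch below shows one way such detection could be written in Python. It is illustrative only: the binding performs its detection inside the native extension and may use different rules, and callable_kind and the echo functions are hypothetical names.

```python
import inspect


def callable_kind(fn):
    """Classify a callable as sync, async, generator, or async generator."""
    if inspect.isasyncgenfunction(fn):
        return "async generator"
    if inspect.iscoroutinefunction(fn):
        return "async"
    if inspect.isgeneratorfunction(fn):
        return "generator"
    return "sync"


def echo(val):
    return val


async def echo_async(val):
    return val


def echo_gen(val):
    yield val


async def echo_async_gen(val):
    yield val
```

A lambda such as the one passed to add_task in the example above falls into the sync category under this classification.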
Troubleshooting
- ImportError on cognee_py: run maturin develop first (or install from PyPI once the wheel is published).
- Embedding model download on first run: set MOCK_EMBEDDING=true to skip it in tests.
- OPENAI_URL / OPENAI_TOKEN not set: all examples exit 0 with a SKIP message when these are absent; export them before running.
- Analytics sent twice: analytics are ON by default; if you run this binding underneath another cognee SDK that already emits send_telemetry, set COGNEE_HOST_SDK=<name> so this binding defers.
References
- Examples: examples/
- Observability: docs/observability/opentelemetry.md, docs/observability/send_telemetry.md
- C API bindings: capi/README.md
- JS/TS bindings: ts/README.md
- cognee-rs workspace: README.md