cognee_py

June 26, 2026

Python bindings for the cognee-rs pipeline engine, built with PyO3.

Cognee transforms raw text, files, and URLs into a persistent, queryable knowledge graph via a three-stage pipeline: add (ingest) → cognify (extract) → search (retrieve).

Installation

cognee-py is not yet published to PyPI — build it from this repository with maturin:

cd python
maturin develop          # builds the native extension into the active venv

A pip install cognee-py step will be documented here once the wheel is published.

Quick start

import asyncio
import json
from cognee_py import Cognee, SearchType

async def main():
    cognee = Cognee()           # optionally pass json.dumps(settings) to override defaults
    await cognee.warm()         # build engines and resolve the default user
    await cognee.add(
        {"type": "text", "text": "Cognee turns data into a knowledge graph."},
        "main_dataset",         # dataset_name is required
    )
    await cognee.cognify("main_dataset")   # dataset_name is required
    result = await cognee.search(
        "What does cognee do?",
        {"search_type": SearchType.GRAPH_COMPLETION},
    )
    print(result)

asyncio.run(main())

Set environment variables before running:

export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true   # skip ONNX model download for quick tests

Upstream-compatible module-level API

An upstream-compatible API is available via cognee_py.compat for drop-in replacement of the Python cognee SDK:

import asyncio
from cognee_py import compat as cognee, SearchType

async def main():
    await cognee.add("Cognee turns data into a knowledge graph.", "main_dataset")
    await cognee.cognify("main_dataset")
    result = await cognee.search("What does cognee do?", SearchType.GRAPH_COMPLETION)
    print(result)

asyncio.run(main())

The compat module above is the supported drop-in alias. The cognee-py distribution intentionally does not install a top-level cognee package, so it never shadows the upstream Python cognee package. If you want import cognee to work directly, vendor the shim shipped in the repo (python/cognee/) into your project.

Examples

Runnable example scripts are in the examples/ directory. Each script validates required env vars up front and exits 0 with a clear SKIP message when they are absent, so all examples are safe to run in CI without credentials.

Script	Run command	What it covers
`add_cognify_search.py`	`python examples/add_cognify_search.py`	Core add → cognify → search pipeline
`memify_recall.py`	`python examples/memify_recall.py`	Triplet embeddings (memify) + session recall
`datasets.py`	`python examples/datasets.py`	Dataset listing, status, deletion
`sessions.py`	`python examples/sessions.py`	QA history, feedback, graph-context snapshots
`config.py`	`python examples/config.py`	Programmatic config (LLM / embedding / DBs)
`visualize.py`	`python examples/visualize.py`	Render knowledge graph to HTML

All examples read LLM credentials from the environment. Set MOCK_EMBEDDING=true to skip the ONNX model download and use mock embeddings (fast, no GPU required):

export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true
cd python && python examples/add_cognify_search.py
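
The SKIP behaviour described above can be approximated with a small guard at the top of a script. This is a minimal sketch, not the code the shipped examples use: the helper names `missing_env` and `skip_if_unconfigured` are hypothetical, and treating an empty value as "absent" is an assumption.

```python
import os
import sys

# Variables the examples are documented to require.
REQUIRED_VARS = ("OPENAI_URL", "OPENAI_TOKEN")


def missing_env(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty (assumption)."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def skip_if_unconfigured(environ=None):
    """Print a SKIP message and exit with status 0 when credentials are absent."""
    missing = missing_env(environ=environ)
    if missing:
        print(f"SKIP: missing environment variables: {', '.join(missing)}")
        sys.exit(0)
```

Calling `skip_if_unconfigured()` before any pipeline work keeps a credential-less CI run green while still making the reason visible in the log.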

Configuration

Programmatic config

Pass a JSON settings string to the Cognee constructor to override env-derived defaults. The overlay order is: compiled-in defaults < env vars < constructor argument.

import json
from cognee_py import Cognee

cognee = Cognee(json.dumps({
    "llm_endpoint": "https://api.openai.com/v1",
    "llm_api_key": "sk-...",
    "llm_model": "gpt-4o-mini",
    "embedding_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
}))
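
The overlay order can be pictured as a chain of dictionary merges in which later layers win. The sketch below only illustrates that precedence rule; `resolve_settings` is a hypothetical helper, the values are illustrative rather than the binding's real defaults, and the actual merge happens inside the native extension.

```python
def resolve_settings(defaults, env_settings, constructor_settings):
    """Merge three layers; on a key conflict the later layer wins."""
    merged = dict(defaults)              # lowest precedence
    merged.update(env_settings)          # env vars override defaults
    merged.update(constructor_settings)  # constructor argument overrides both
    return merged


# Illustrative values only (not the binding's real defaults).
defaults = {"llm_model": "gpt-4o-mini", "embedding_provider": "mock"}
env_settings = {"llm_model": "gpt-4o", "embedding_provider": "ollama"}
constructor_settings = {"embedding_provider": "openai"}

resolved = resolve_settings(defaults, env_settings, constructor_settings)
print(resolved)
```

Here `llm_model` ends up as the env value because the constructor did not set it, while `embedding_provider` takes the constructor value.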

Generic key/value setters and config read-back are available via cognee.config:

# Set individual keys (typed value, or explicit string form):
cognee.config.set("llm_model", "gpt-4o")
cognee.config.set_str("llm_api_key", "sk-...")

# Read back the current config (secrets are redacted):
cfg = cognee.config.get()
print(cfg)

Environment variables

Variable	Purpose
`OPENAI_URL`	LLM API base URL (OpenAI-compatible endpoint).
`OPENAI_TOKEN`	LLM API key.
`OPENAI_MODEL`	LLM model name (default: `gpt-4o-mini`).
`EMBEDDING_PROVIDER`	Embedding provider: `openai`, `ollama`, `onnx`, `mock`.
`EMBEDDING_MODEL`	Embedding model name.
`EMBEDDING_DIMENSIONS`	Embedding vector dimensions.
`EMBEDDING_ENDPOINT`	Embedding API base URL (falls back to `OPENAI_URL`).
`EMBEDDING_API_KEY`	Embedding API key (falls back to `OPENAI_TOKEN`).
`MOCK_EMBEDDING`	Set `true` to use zero-vector mock embeddings (no model download).
`COGNEE_BINDING_SUPPRESS_LOGS`	Suppress the auto-installed `pyo3-log` bridge.
`COGNEE_HOST_SDK`	Set by an upstream/host `cognee` SDK to suppress this binding's analytics emission (avoids double-counting).
`RUST_LOG`, `LOG_LEVEL`	Standard `tracing-subscriber` env-filter level overrides.
`COGNEE_LOG_*`, `LOG_FILE_NAME`	Consumed by `setup_logging()` — see docs/configuration.md (Logging section).
`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_SERVICE_NAME` and other `OTEL_*` vars	Consumed by `setup_telemetry()`.
`TELEMETRY_DISABLED`, `ENV`	Honoured by `setup_telemetry_analytics()` via `cognee_telemetry::env::is_disabled`.
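
The two fallback rules in the table (`EMBEDDING_ENDPOINT` to `OPENAI_URL`, `EMBEDDING_API_KEY` to `OPENAI_TOKEN`) can be sketched in plain Python. `resolve_embedding_credentials` is a hypothetical helper for reasoning about which values will be used, and treating an empty string as unset is an assumption; the binding resolves these itself.

```python
import os


def resolve_embedding_credentials(environ=None):
    """Return (endpoint, api_key), applying the documented fallbacks."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value is treated the same as an unset one.
    endpoint = env.get("EMBEDDING_ENDPOINT") or env.get("OPENAI_URL")
    api_key = env.get("EMBEDDING_API_KEY") or env.get("OPENAI_TOKEN")
    return endpoint, api_key
```

With only the `OPENAI_*` pair exported, both embedding values resolve to the LLM ones; setting either `EMBEDDING_*` variable overrides just that value.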

Operations reference

Pipeline operations

add

Ingest one or more data items into a named dataset.

# Text
await cognee.add({"type": "text", "text": "…"}, "my-dataset")

# File
await cognee.add({"type": "file", "path": "/abs/path/to/doc.txt"}, "my-dataset")

# URL
await cognee.add({"type": "url", "url": "https://example.com/article"}, "my-dataset")

# Multiple items at once
await cognee.add([
    {"type": "text", "text": "First document"},
    {"type": "file", "path": "/abs/path/two.txt"},
], "my-dataset")
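
When descriptors are built in many places, small constructor functions help avoid typos in the `type` and payload keys. The helpers below are hypothetical conveniences, not part of cognee_py. The file helper normalises to an absolute path because the examples above use absolute paths; whether relative paths are accepted is not stated here.

```python
import os


def text_item(text):
    """Descriptor for inline text."""
    return {"type": "text", "text": text}


def file_item(path):
    """Descriptor for a local file, normalised to an absolute path."""
    return {"type": "file", "path": os.path.abspath(path)}


def url_item(url):
    """Descriptor for a remote URL."""
    return {"type": "url", "url": url}


# A batch in the same shape as the multi-item call above.
batch = [text_item("First document"), file_item("two.txt")]
```

The resulting `batch` list can be passed as the first argument to `cognee.add`, with the dataset name as the second.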

cognify

Extract entities and relationships into the knowledge graph.

await cognee.cognify("my-dataset")

add_and_cognify

Ingest and extract in a single call.

result = await cognee.add_and_cognify(
    {"type": "text", "text": "…"},
    "my-dataset",
)

Search and recall

search

Query the knowledge graph. Defaults to GRAPH_COMPLETION.

result = await cognee.search("What is the capital of France?")

# With options
result = await cognee.search(
    "summarise recent events",
    {"search_type": SearchType.SUMMARIES, "top_k": 5, "datasets": ["news"]},
)

All 15 search types are supported (via SearchType enum): GRAPH_COMPLETION, SUMMARIES, CHUNKS, RAG_COMPLETION, TRIPLET_COMPLETION, GRAPH_SUMMARY_COMPLETION, CYPHER, NATURAL_LANGUAGE, GRAPH_COMPLETION_COT, GRAPH_COMPLETION_CONTEXT_EXTENSION, FEELING_LUCKY, FEEDBACK, TEMPORAL, CODING_RULES, CHUNKS_LEXICAL.

recall

Session-first routing: checks session QA history before falling back to graph search.

result = await cognee.recall(
    "What did we discuss?",
    {"session_id": "session-uuid", "scope": "auto"},
)

Memory operations

remember

Composite add + cognify with an optional improvement pass.

await cognee.remember(
    {"type": "text", "text": "…"},
    "my-dataset",
    {"self_improvement": True},
)

memify

Index triplet embeddings from the existing knowledge graph. This enables TRIPLET_COMPLETION search. The operation is idempotent.

await cognee.memify()

remember_entry

Store a single typed memory entry (QA pair, execution trace, or feedback record).

# QA entry
await cognee.remember_entry(
    {"type": "qa", "question": "What?", "answer": "This."},
    "my-dataset",
    "session-uuid",
)

# Trace entry
await cognee.remember_entry(
    {
        "type": "trace",
        "originFunction": "search",
        "status": "ok",
        "memoryQuery": "What?",
    },
    "my-dataset",
    "session-uuid",
)

# Feedback entry
await cognee.remember_entry(
    {"type": "feedback", "qaId": "qa-uuid", "feedbackText": "Accurate.", "feedbackScore": 5},
    "my-dataset",
    "session-uuid",
)

Entry field names are camelCase (the entry dict is passed through verbatim, so snake_case keys are not recognised). Supported entry types:

qa: question, answer, context, feedbackText, feedbackScore, usedGraphElementIds (all optional except type).
trace: originFunction (required), status, methodParams, methodReturnValue, memoryQuery, memoryContext, errorMessage, generateFeedbackWithLlm.
feedback: qaId (required), feedbackText, feedbackScore.

Unknown type values raise CogneeValidationError. Optional opts dict supports a tenant key.
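
Because snake_case keys are not recognised, code that builds entries from Python-style dicts may want to convert the keys first. The following is a minimal sketch of such a conversion; `to_camel` and `camelize_entry` are hypothetical helpers, not part of cognee_py, and only top-level keys are converted.

```python
def to_camel(key):
    """Convert a snake_case key to camelCase; other keys pass through unchanged."""
    head, *tail = key.split("_")
    return head + "".join(part.capitalize() for part in tail)


def camelize_entry(entry):
    """Return a copy of `entry` with top-level keys in camelCase (values untouched)."""
    return {to_camel(key): value for key, value in entry.items()}


entry = camelize_entry({
    "type": "feedback",
    "qa_id": "qa-uuid",
    "feedback_text": "Accurate.",
    "feedback_score": 5,
})
print(entry)
```

The converted dict matches the feedback entry shown earlier and can be passed to `remember_entry` as its first argument.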

improve

Run the four-stage session-graph bridge pipeline.

await cognee.improve({
    "dataset_name": "my-dataset",
    "session_ids": ["session-uuid"],
})

Datasets

datasets      = await cognee.datasets.list()
items         = await cognee.datasets.list_data(dataset_id)
has_content   = await cognee.datasets.has(dataset_id)
statuses      = await cognee.datasets.status([id1, id2])

await cognee.datasets.empty(dataset_id)
await cognee.datasets.delete_data(dataset_id, data_id)
await cognee.datasets.delete_all()

Sessions

entries = await cognee.sessions.get("session-uuid", {"last_n": 10})

await cognee.sessions.add_feedback("session-uuid", "qa-uuid", "Great answer!", 5)
await cognee.sessions.delete_feedback("session-uuid", "qa-uuid")

ctx = await cognee.sessions.get_graph_context("session-uuid")
await cognee.sessions.set_graph_context("session-uuid", "new context")

Notebooks

notebooks = await cognee.notebooks.list()
for nb in notebooks:
    print(nb["id"], nb["name"])

# Create a new notebook (cells defaults to empty list; deletable is always True)
nb = await cognee.notebooks.create("My Notebook")
nb = await cognee.notebooks.create("My Notebook", cells=[...])

# Update a notebook's name and/or cells
nb = await cognee.notebooks.update(notebook_id, {"name": "New Name"})
nb = await cognee.notebooks.update(notebook_id, {"cells": [...]})

# Delete by UUID — returns True if deleted, False if not found
deleted = await cognee.notebooks.delete(notebook_id)

cognee.notebooks is a CogneeNotebooks sub-object accessible on every Cognee instance. notebook_id is a UUID string.

Users and pipeline-run admin

# Get or create the default user account
user = await cognee.get_or_create_default_user()
print(user["id"], user["email"])

# Reset the pipeline run status for a specific pipeline within a dataset
await cognee.reset_pipeline_run_status(dataset_id, pipeline_name)

# Reset all pipeline run statuses for a dataset
await cognee.reset_dataset_pipeline_run_status(dataset_id)

dataset_id is a UUID string. pipeline_name is the pipeline name string. Both reset methods return None on success and raise CogneeValidationError if dataset_id is not a valid UUID.

Data lifecycle

# Forget a single item
await cognee.forget({"kind": "item", "data_id": "uuid", "dataset": {"name": "my-dataset"}})

# Forget an entire dataset
await cognee.forget({"kind": "dataset", "dataset": {"name": "my-dataset"}})

# Forget everything
await cognee.forget({"kind": "all"})

# Replace a data item (delete → re-add → re-cognify)
await cognee.update("old-data-uuid", {"type": "text", "text": "updated content"}, "my-dataset")

# Remove all files from storage (metadata DB untouched)
await cognee.prune_data()

# Wipe graph, vector, metadata, and/or cache backends
await cognee.prune_system({"prune_graph": True, "prune_vector": True})

Visualisation

# Get the HTML string
html = await cognee.visualize()

# Write to a file (returns the absolute path)
path = await cognee.visualize_to_file({"destination_path": "/tmp/graph.html"})

Requires the visualization feature compiled into the native extension.

Cloud serve / disconnect are provided by the closed-source cognee-py-cloud package, not the OSS cognee_py package.

Initialisation and observability

import cognee_py

# Optional: file logging (reads COGNEE_LOG_*, LOG_FILE_NAME, LOG_LEVEL).
cognee_py.setup_logging()

# Optional: OTLP trace export (reads OTEL_* env vars).
cognee_py.setup_telemetry()

# Optional: product-analytics emission (returns True if armed).
armed = cognee_py.setup_telemetry_analytics()
print(f"analytics armed: {armed}")

When cognee_py is imported, a minimal default subscriber is installed: a pyo3-log bridge that forwards every Rust tracing::* event into Python's standard logging module under the cognee.* logger tree. The host's logging.basicConfig / logging.dictConfig controls level, format, and handlers.

import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("cognee").setLevel(logging.DEBUG)

import cognee_py  # Rust spans now flow into Python logging

Set COGNEE_BINDING_SUPPRESS_LOGS=1 before importing cognee_py to skip the default subscriber. The host then owns all subscriber setup.
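
To see how configuration on the cognee logger reaches everything beneath it, the sketch below uses only the standard logging module and does not import cognee_py. The child logger name `cognee.pipeline` is hypothetical and stands in for whatever names the bridge actually emits under the cognee.* tree.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
cognee_logger = logging.getLogger("cognee")
cognee_logger.addHandler(handler)
cognee_logger.setLevel(logging.DEBUG)

# Hypothetical child logger; it inherits the level set on "cognee"
# and propagates its records up to the handler attached there.
child = logging.getLogger("cognee.pipeline")
child.debug("stage started")

child_messages = [record.getMessage() for record in handler.records]
```

One handler and one level on the parent logger are enough to capture events from any logger below it, which is why host-side configuration of `cognee` controls the forwarded Rust events.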

Analytics defaults

Analytics emission is ON by default, for parity with the upstream Python SDK. Analytics are armed automatically on import, so no explicit setup_telemetry_analytics() call is required; the function is still exposed and returns the effective state.

Condition	Behaviour
No opt-out env set	Armed. `setup_telemetry_analytics()` returns `True`.
`TELEMETRY_DISABLED=<non-empty>`	Suppressed. Returns `False`.
`ENV` is `test` / `dev`	Suppressed. Returns `False`.
`COGNEE_HOST_SDK=<any non-empty>`	Suppressed (defers to the host SDK). Returns `False`.

Important — if you run this binding underneath another cognee SDK that is already the canonical sender of send_telemetry events: set COGNEE_HOST_SDK=<name> so this binding defers and you avoid double-counting.
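
The conditions in the table can be expressed as a small predicate, which is handy for asserting in a test suite or deployment check that analytics will stay off. This sketch mirrors the table only and is not the binding's implementation (that lives in `cognee_telemetry::env::is_disabled`). The function name is hypothetical, and the exact, case-sensitive match on `ENV` is an assumption.

```python
import os


def analytics_would_be_armed(environ=None):
    """Predict the armed state from the documented opt-out variables."""
    env = os.environ if environ is None else environ
    if env.get("TELEMETRY_DISABLED"):      # any non-empty value suppresses
        return False
    if env.get("ENV") in ("test", "dev"):  # assumption: exact, case-sensitive match
        return False
    if env.get("COGNEE_HOST_SDK"):         # defer to the host SDK
        return False
    return True
```

For the authoritative answer at runtime, rely on the return value of `setup_telemetry_analytics()` instead.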

Error handling

All async ops raise subclasses of CogneeError:

Exception	Meaning
`CogneeValidationError`	Invalid input (bad data descriptor, unknown config key, malformed settings JSON).
`CogneeComponentError`	Component initialisation failed (DB connection, embedding model load).
`CogneeServiceBuildError`	Service warm-up failed (engine could not be constructed).
`CogneeUserBootstrapError`	Default user resolution failed during `warm()`.
`CogneeRuntimeError`	Pipeline or search runtime failure.
`CogneeUnsupportedError`	Operation not available for the current backend configuration.
`CogneeFeatureNotBuiltError`	Feature was not compiled into this build (e.g. `visualization`).
`CogneeUnknownConfigKeyError`	Unknown config key passed to `config.set_*` or constructor.
`CogneeConfigTypeMismatchError`	Wrong type for a config value.
`CogneeError`	Base class — catch this to handle any cognee error.

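import asyncio
from cognee_py import Cognee, CogneeError, CogneeValidationError

async def main():
    cognee = Cognee()
    await cognee.warm()
    try:
        result = await cognee.search("query", {"search_type": "INVALID_TYPE"})
    except CogneeValidationError as e:
        # Most specific class first; the base class below catches everything else.
        print(f"Bad input: {e}")
    except CogneeError as e:
        print(f"Cognee error: {e}")

asyncio.run(main())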

Advanced: low-level pipeline engine

The original pipeline engine API is available directly from cognee_py for advanced orchestration use-cases that do not need the high-level SDK:

import asyncio
import cognee_py

async def main():
    p = cognee_py.Pipeline("my pipeline")
    # add_task accepts a plain callable directly (sync, async, generator, or async
    # generator; auto-detected). The callable receives each input value.
    p.add_task(lambda val: val, name="echo")

    ctx = cognee_py.TaskContext.mock()
    # execute() takes plain Python values and returns the last task's outputs.
    [result] = await p.execute(["hello"], ctx)
    print(result)

asyncio.run(main())

All pipeline-engine symbols (Pipeline, TaskContext, PipelineRunHandle, CancellationHandle, CancellationToken, cancellation_pair, ProgressToken, Watcher) are available at the top level of cognee_py.
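
The four callable kinds that add_task is described as auto-detecting can be told apart in pure Python with the standard inspect module. The sketch below shows one way to do that classification; `classify_callable` is a hypothetical helper, and the binding's own detection may work differently.

```python
import inspect


def classify_callable(fn):
    """Return which of the four task kinds `fn` is."""
    if inspect.isasyncgenfunction(fn):
        return "async generator"
    if inspect.iscoroutinefunction(fn):
        return "async"
    if inspect.isgeneratorfunction(fn):
        return "generator"
    return "sync"


def echo(val):
    return val


async def echo_async(val):
    return val


def echo_gen(val):
    yield val


async def echo_agen(val):
    yield val


kinds = [classify_callable(f) for f in (echo, echo_async, echo_gen, echo_agen)]
print(kinds)
```

A classifier like this is useful when wrapping tasks in your own decorators, since a wrapper that is a plain function hides an async or generator body from this kind of inspection.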

Troubleshooting

ImportError on cognee_py: run maturin develop from the python/ directory first (the package is not yet published to PyPI).
Embedding model download on first run: set MOCK_EMBEDDING=true to skip it in tests.
OPENAI_URL / OPENAI_TOKEN not set: all examples exit 0 with a SKIP message when these are absent; export them before running.
Analytics doubly-sent: analytics are ON by default; if you run this binding underneath another cognee SDK that already emits send_telemetry, set COGNEE_HOST_SDK=<name> so this binding defers.

References

Examples: examples/
Observability: docs/observability/opentelemetry.md, docs/observability/send_telemetry.md
C API bindings: capi/README.md
JS/TS bindings: ts/README.md
cognee-rs workspace: README.md