cognee_py

June 26, 2026

Python bindings for the cognee-rs pipeline engine, built with PyO3.

Cognee transforms raw text, files, and URLs into a persistent, queryable knowledge graph via a three-stage pipeline: add (ingest) → cognify (extract) → search (retrieve).

Installation

cognee-py is not yet published to PyPI — build it from this repository with maturin:

cd python
maturin develop          # builds the native extension into the active venv

A pip install cognee-py step will be documented here once the wheel is published.

Quick start

import asyncio
import json
from cognee_py import Cognee, SearchType

async def main():
    cognee = Cognee()           # optionally pass json.dumps(settings) to override defaults
    await cognee.warm()         # build engines and resolve the default user
    await cognee.add(
        {"type": "text", "text": "Cognee turns data into a knowledge graph."},
        "main_dataset",         # dataset_name is required
    )
    await cognee.cognify("main_dataset")   # dataset_name is required
    result = await cognee.search(
        "What does cognee do?",
        {"search_type": SearchType.GRAPH_COMPLETION},
    )
    print(result)

asyncio.run(main())

Set environment variables before running:

export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true   # skip ONNX model download for quick tests

Upstream-compatible module-level API

An upstream-compatible API is available via cognee_py.compat for drop-in replacement of the Python cognee SDK:

import asyncio
from cognee_py import compat as cognee, SearchType

async def main():
    await cognee.add("Cognee turns data into a knowledge graph.", "main_dataset")
    await cognee.cognify("main_dataset")
    result = await cognee.search("What does cognee do?", SearchType.GRAPH_COMPLETION)
    print(result)

asyncio.run(main())

The compat module above is the supported drop-in alias. The cognee-py distribution intentionally does not install a top-level cognee package, so it never shadows the upstream Python cognee package. If you want import cognee to work directly, vendor the shim shipped in the repo (python/cognee/) into your project.

Examples

Runnable example scripts are in the examples/ directory. Each script validates required env vars up front and exits 0 with a clear SKIP message when they are absent, so all examples are safe to run in CI without credentials.

| Script | Run command | What it covers |
| --- | --- | --- |
| add_cognify_search.py | python examples/add_cognify_search.py | Core add → cognify → search pipeline |
| memify_recall.py | python examples/memify_recall.py | Triplet embeddings (memify) + session recall |
| datasets.py | python examples/datasets.py | Dataset listing, status, deletion |
| sessions.py | python examples/sessions.py | QA history, feedback, graph-context snapshots |
| config.py | python examples/config.py | Programmatic config (LLM / embedding / DBs) |
| visualize.py | python examples/visualize.py | Render knowledge graph to HTML |

All examples read LLM credentials from the environment. Set MOCK_EMBEDDING=true to skip the ONNX model download and use mock embeddings (fast, no GPU required):

export OPENAI_URL=https://api.openai.com/v1
export OPENAI_TOKEN=sk-...
export MOCK_EMBEDDING=true
cd python && python examples/add_cognify_search.py
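
The skip-when-unconfigured behaviour described above can be sketched as follows. This is a minimal illustration of the pattern, not the code the shipped examples use: the helper names missing_env and run_or_skip are hypothetical, and the required variable list is assumed from the export lines above.

```python
import os
from typing import Callable, Mapping, Sequence

# Assumed set of required variables, taken from the export lines above.
REQUIRED_VARS = ("OPENAI_URL", "OPENAI_TOKEN")


def missing_env(required: Sequence[str], environ: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def run_or_skip(body: Callable[[], None], environ: Mapping[str, str] = os.environ) -> int:
    """Run `body` when credentials are present; otherwise print SKIP and return 0."""
    missing = missing_env(REQUIRED_VARS, environ)
    if missing:
        print(f"SKIP: missing environment variables: {', '.join(missing)}")
        return 0  # exit code 0 keeps CI green when credentials are absent
    body()
    return 0
```

A script following this pattern would end with raise SystemExit(run_or_skip(main)), so a missing credential is reported but never fails the job.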

Configuration

Programmatic config

Pass a JSON settings string to the Cognee constructor to override env-derived defaults. The overlay order is: compiled-in defaults < env vars < constructor argument.

import json
from cognee_py import Cognee

cognee = Cognee(json.dumps({
    "llm_endpoint": "https://api.openai.com/v1",
    "llm_api_key": "sk-...",
    "llm_model": "gpt-4o-mini",
    "embedding_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
}))
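
The overlay order can be pictured as three layered dictionaries in which later layers win. The sketch below is an illustration only and does not call cognee_py: resolve_settings is a hypothetical helper, and the mapping from environment variables to settings keys is an assumption inferred from the names used in this README.

```python
from typing import Any, Mapping

# Hypothetical compiled-in defaults; only the llm_model default is stated in this README.
DEFAULTS: dict[str, Any] = {"llm_model": "gpt-4o-mini"}

# Assumed mapping from environment variables to settings keys.
ENV_TO_KEY = {
    "OPENAI_URL": "llm_endpoint",
    "OPENAI_TOKEN": "llm_api_key",
    "OPENAI_MODEL": "llm_model",
}


def resolve_settings(environ: Mapping[str, str], overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Layer defaults < env vars < constructor argument; later layers win."""
    settings = dict(DEFAULTS)
    for var, key in ENV_TO_KEY.items():
        if var in environ:
            settings[key] = environ[var]
    settings.update(overrides)
    return settings


resolved = resolve_settings(
    {"OPENAI_MODEL": "gpt-4o", "OPENAI_URL": "https://api.openai.com/v1"},
    {"llm_model": "gpt-4o-mini"},
)
print(resolved["llm_model"])     # the constructor argument beats the env var
print(resolved["llm_endpoint"])  # the env var is used where no override is given
```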

Generic key/value setters and config read-back are available via cognee.config:

# Set individual keys (typed value, or explicit string form):
cognee.config.set("llm_model", "gpt-4o")
cognee.config.set_str("llm_api_key", "sk-...")

# Read back the current config (secrets are redacted):
cfg = cognee.config.get()
print(cfg)

Environment variables

| Variable | Purpose |
| --- | --- |
| OPENAI_URL | LLM API base URL (OpenAI-compatible endpoint). |
| OPENAI_TOKEN | LLM API key. |
| OPENAI_MODEL | LLM model name (default: gpt-4o-mini). |
| EMBEDDING_PROVIDER | Embedding provider: openai, ollama, onnx, mock. |
| EMBEDDING_MODEL | Embedding model name. |
| EMBEDDING_DIMENSIONS | Embedding vector dimensions. |
| EMBEDDING_ENDPOINT | Embedding API base URL (falls back to OPENAI_URL). |
| EMBEDDING_API_KEY | Embedding API key (falls back to OPENAI_TOKEN). |
| MOCK_EMBEDDING | Set true to use zero-vector mock embeddings (no model download). |
| COGNEE_BINDING_SUPPRESS_LOGS | Suppress the auto-installed pyo3-log bridge. |
| COGNEE_HOST_SDK | Set by an upstream/host cognee SDK to suppress this binding's analytics emission (avoids double-counting). |
| RUST_LOG, LOG_LEVEL | Standard tracing-subscriber env-filter level overrides. |
| COGNEE_LOG_*, LOG_FILE_NAME | Consumed by setup_logging(); see docs/configuration.md (Logging section). |
| OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, OTEL_SERVICE_NAME and other OTEL_* vars | Consumed by setup_telemetry(). |
| TELEMETRY_DISABLED, ENV | Honoured by setup_telemetry_analytics() via cognee_telemetry::env::is_disabled. |
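
The two embedding fallbacks listed above (EMBEDDING_ENDPOINT to OPENAI_URL, EMBEDDING_API_KEY to OPENAI_TOKEN) can be sketched as a first-match lookup. This is an illustration of the documented fallback, not the binding's resolver: first_set and embedding_credentials are hypothetical names, and treating an empty value as unset is an assumption.

```python
from typing import Mapping, Optional


def first_set(environ: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = environ.get(name)
        if value:  # assumption: an empty string counts as unset
            return value
    return None


def embedding_credentials(environ: Mapping[str, str]) -> dict:
    """Resolve the embedding endpoint and key, falling back to the LLM variables."""
    return {
        "endpoint": first_set(environ, "EMBEDDING_ENDPOINT", "OPENAI_URL"),
        "api_key": first_set(environ, "EMBEDDING_API_KEY", "OPENAI_TOKEN"),
    }
```

Passing os.environ gives the values for the current process; passing a plain dict makes the fallback easy to check in a unit test.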

Operations reference

Pipeline operations

add

Ingest one or more data items into a named dataset.

# Text
await cognee.add({"type": "text", "text": "…"}, "my-dataset")

# File
await cognee.add({"type": "file", "path": "/abs/path/to/doc.txt"}, "my-dataset")

# URL
await cognee.add({"type": "url", "url": "https://example.com/article"}, "my-dataset")

# Multiple items at once
await cognee.add([
    {"type": "text", "text": "First document"},
    {"type": "file", "path": "/abs/path/two.txt"},
], "my-dataset")
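
When inputs arrive as plain strings, a small helper can wrap each one in one of the three descriptor shapes shown above. The helper below is hypothetical and not part of cognee_py; its classification rules (http/https URLs, existing absolute file paths, everything else as text) are assumptions chosen for the sketch.

```python
import os
from urllib.parse import urlparse


def to_data_item(value: str) -> dict:
    """Wrap a raw string in a text, file, or url descriptor (heuristic)."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return {"type": "url", "url": value}
    # Only absolute paths to existing files are treated as files,
    # matching the absolute paths used in the examples above.
    if os.path.isabs(value) and os.path.isfile(value):
        return {"type": "file", "path": value}
    return {"type": "text", "text": value}


items = [to_data_item(v) for v in ("First document", "https://example.com/article")]
print(items)
```

The resulting list can be passed as the first argument to add, in the same way as the multi-item call above.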

cognify

Extract entities and relationships into the knowledge graph.

await cognee.cognify("my-dataset")

add_and_cognify

Ingest and extract in a single call.

result = await cognee.add_and_cognify(
    {"type": "text", "text": "…"},
    "my-dataset",
)

Search and recall

search queries the knowledge graph. The search type defaults to GRAPH_COMPLETION.

result = await cognee.search("What is the capital of France?")

# With options
result = await cognee.search(
    "summarise recent events",
    {"search_type": SearchType.SUMMARIES, "top_k": 5, "datasets": ["news"]},
)

All 15 search types are supported (via SearchType enum): GRAPH_COMPLETION, SUMMARIES, CHUNKS, RAG_COMPLETION, TRIPLET_COMPLETION, GRAPH_SUMMARY_COMPLETION, CYPHER, NATURAL_LANGUAGE, GRAPH_COMPLETION_COT, GRAPH_COMPLETION_CONTEXT_EXTENSION, FEELING_LUCKY, FEEDBACK, TEMPORAL, CODING_RULES, CHUNKS_LEXICAL.

recall

Session-first routing: checks session QA history before falling back to graph search.

result = await cognee.recall(
    "What did we discuss?",
    {"session_id": "session-uuid", "scope": "auto"},
)
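
Session-first routing can be sketched with two stand-in lookups. This only illustrates the control flow named above and is not how the binding implements recall: session_first and both lookup functions are hypothetical, and the exact-match history lookup is an assumption made to keep the example small.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Lookup = Callable[[str], Awaitable[Optional[str]]]


async def session_first(query: str, from_session: Lookup, from_graph: Lookup) -> Optional[str]:
    """Try the session history first; fall back to graph search on a miss."""
    answer = await from_session(query)
    if answer is not None:
        return answer
    return await from_graph(query)


async def demo() -> list:
    history = {"What did we discuss?": "Pipelines."}  # stand-in session QA history

    async def from_session(q: str) -> Optional[str]:
        return history.get(q)

    async def from_graph(q: str) -> Optional[str]:
        return f"graph result for {q!r}"  # stand-in for a graph search

    return [
        await session_first("What did we discuss?", from_session, from_graph),
        await session_first("Something new", from_session, from_graph),
    ]


print(asyncio.run(demo()))
```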

Memory operations

remember

Composite add + cognify with an optional improvement pass.

await cognee.remember(
    {"type": "text", "text": "…"},
    "my-dataset",
    {"self_improvement": True},
)

memify

Index triplet embeddings from the existing knowledge graph. Enables TRIPLET_COMPLETION search. Idempotent.

await cognee.memify()

remember_entry

Store a single typed memory entry (QA pair, execution trace, or feedback record).

# QA entry
await cognee.remember_entry(
    {"type": "qa", "question": "What?", "answer": "This."},
    "my-dataset",
    "session-uuid",
)

# Trace entry
await cognee.remember_entry(
    {
        "type": "trace",
        "originFunction": "search",
        "status": "ok",
        "memoryQuery": "What?",
    },
    "my-dataset",
    "session-uuid",
)

# Feedback entry
await cognee.remember_entry(
    {"type": "feedback", "qaId": "qa-uuid", "feedbackText": "Accurate.", "feedbackScore": 5},
    "my-dataset",
    "session-uuid",
)

Entry field names are camelCase (the entry dict is passed through verbatim, so snake_case keys are not recognised). Supported entry types:

  • qa: question, answer, context, feedbackText, feedbackScore, usedGraphElementIds (all optional except type).
  • trace: originFunction (required), status, methodParams, methodReturnValue, memoryQuery, memoryContext, errorMessage, generateFeedbackWithLlm.
  • feedback: qaId (required), feedbackText, feedbackScore.

Unknown type values raise CogneeValidationError. The optional opts dict supports a tenant key.
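
Because snake_case keys are not recognised, it can help to check an entry dict on the client side before sending it. The checker below is a hypothetical sketch built only from the field lists above; check_entry is not part of cognee_py, and how the binding itself treats an unrecognised key is not something this README states.

```python
# Required and optional camelCase fields per entry type, as listed above.
ENTRY_FIELDS = {
    "qa": (set(), {"question", "answer", "context", "feedbackText",
                   "feedbackScore", "usedGraphElementIds"}),
    "trace": ({"originFunction"}, {"status", "methodParams", "methodReturnValue",
                                   "memoryQuery", "memoryContext", "errorMessage",
                                   "generateFeedbackWithLlm"}),
    "feedback": ({"qaId"}, {"feedbackText", "feedbackScore"}),
}


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems found in a remember_entry dict (empty if none)."""
    kind = entry.get("type")
    if kind not in ENTRY_FIELDS:
        return [f"unknown type: {kind!r}"]
    required, optional = ENTRY_FIELDS[kind]
    keys = set(entry) - {"type"}
    problems = [f"missing required field: {name}" for name in sorted(required - keys)]
    problems += [f"unrecognised field: {name}" for name in sorted(keys - required - optional)]
    return problems


print(check_entry({"type": "feedback", "qaId": "qa-uuid", "feedback_text": "Accurate."}))
```

The call at the end flags feedback_text, the snake_case spelling of feedbackText.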

improve

Run the four-stage session-graph bridge pipeline.

await cognee.improve({
    "dataset_name": "my-dataset",
    "session_ids": ["session-uuid"],
})

Datasets

datasets      = await cognee.datasets.list()
items         = await cognee.datasets.list_data(dataset_id)
has_content   = await cognee.datasets.has(dataset_id)
statuses      = await cognee.datasets.status([id1, id2])

await cognee.datasets.empty(dataset_id)
await cognee.datasets.delete_data(dataset_id, data_id)
await cognee.datasets.delete_all()

Sessions

entries = await cognee.sessions.get("session-uuid", {"last_n": 10})

await cognee.sessions.add_feedback("session-uuid", "qa-uuid", "Great answer!", 5)
await cognee.sessions.delete_feedback("session-uuid", "qa-uuid")

ctx = await cognee.sessions.get_graph_context("session-uuid")
await cognee.sessions.set_graph_context("session-uuid", "new context")

Notebooks

notebooks = await cognee.notebooks.list()
for nb in notebooks:
    print(nb["id"], nb["name"])

# Create a new notebook (cells defaults to empty list; deletable is always True)
nb = await cognee.notebooks.create("My Notebook")
nb = await cognee.notebooks.create("My Notebook", cells=[...])

# Update a notebook's name and/or cells
nb = await cognee.notebooks.update(notebook_id, {"name": "New Name"})
nb = await cognee.notebooks.update(notebook_id, {"cells": [...]})

# Delete by UUID — returns True if deleted, False if not found
deleted = await cognee.notebooks.delete(notebook_id)

cognee.notebooks is a CogneeNotebooks sub-object accessible on every Cognee instance. notebook_id is a UUID string.

Users and pipeline-run admin

# Get or create the default user account
user = await cognee.get_or_create_default_user()
print(user["id"], user["email"])

# Reset the pipeline run status for a specific pipeline within a dataset
await cognee.reset_pipeline_run_status(dataset_id, pipeline_name)

# Reset all pipeline run statuses for a dataset
await cognee.reset_dataset_pipeline_run_status(dataset_id)

dataset_id is a UUID string. pipeline_name is the pipeline name string. Both reset methods return None on success and raise CogneeValidationError if dataset_id is not a valid UUID.
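
To catch a malformed dataset_id before making the call, the string can be checked with the standard library first. This is a sketch, assuming Python's uuid.UUID parser is an acceptable stand-in: it also accepts forms such as braces or missing hyphens, so the binding's own validation may be stricter. The helper name is_uuid is hypothetical.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID string."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


print(is_uuid("123e4567-e89b-12d3-a456-426614174000"), is_uuid("my-dataset"))
```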

Data lifecycle

# Forget a single item
await cognee.forget({"kind": "item", "data_id": "uuid", "dataset": {"name": "my-dataset"}})

# Forget an entire dataset
await cognee.forget({"kind": "dataset", "dataset": {"name": "my-dataset"}})

# Forget everything
await cognee.forget({"kind": "all"})

# Replace a data item (delete → re-add → re-cognify)
await cognee.update("old-data-uuid", {"type": "text", "text": "updated content"}, "my-dataset")

# Remove all files from storage (metadata DB untouched)
await cognee.prune_data()

# Wipe graph, vector, metadata, and/or cache backends
await cognee.prune_system({"prune_graph": True, "prune_vector": True})

Visualisation

# Get the HTML string
html = await cognee.visualize()

# Write to a file (returns the absolute path)
path = await cognee.visualize_to_file({"destination_path": "/tmp/graph.html"})

Both calls require the visualization feature to be compiled into the native extension (see CogneeFeatureNotBuiltError under Error handling).

Cloud serve / disconnect are provided by the closed cognee-py-cloud package, not the OSS cognee_py package.

Initialisation and observability

import cognee_py

# Optional: file logging (reads COGNEE_LOG_*, LOG_FILE_NAME, LOG_LEVEL).
cognee_py.setup_logging()

# Optional: OTLP trace export (reads OTEL_* env vars).
cognee_py.setup_telemetry()

# Optional: product-analytics emission (returns True if armed).
armed = cognee_py.setup_telemetry_analytics()
print(f"analytics armed: {armed}")

When cognee_py is imported, a minimal default subscriber is installed: a pyo3-log bridge that forwards every Rust tracing::* event into Python's standard logging module under the cognee.* logger tree. The host's logging.basicConfig / logging.dictConfig controls level, format, and handlers.

import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("cognee").setLevel(logging.DEBUG)

import cognee_py  # Rust spans now flow into Python logging

Set COGNEE_BINDING_SUPPRESS_LOGS=1 before importing cognee_py to skip the default subscriber. The host then owns all subscriber setup.
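
For hosts that configure logging with logging.dictConfig rather than basicConfig, the same cognee.* logger tree can be targeted by name. The sketch below uses only the standard library; the formatter string and the child logger name cognee.pipeline are illustrative assumptions, not names the binding is known to emit.

```python
import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        # Events forwarded by the bridge arrive under this logger tree.
        "cognee": {"level": "DEBUG", "handlers": ["console"], "propagate": False},
    },
    "root": {"level": "WARNING", "handlers": ["console"]},
})

# "cognee.pipeline" is a hypothetical child logger used only for illustration.
child = logging.getLogger("cognee.pipeline")
print(child.getEffectiveLevel() == logging.DEBUG)  # children inherit the cognee level
```

As with the basicConfig example, run this configuration before importing cognee_py.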

Analytics defaults

Analytics emission is ON by default, for parity with the upstream Python SDK. Analytics are armed automatically on import, so no explicit setup_telemetry_analytics() call is required; the function is still exposed and returns the effective state.

| Condition | Behaviour |
| --- | --- |
| No opt-out env set | Armed. setup_telemetry_analytics() returns True. |
| TELEMETRY_DISABLED=<non-empty> | Suppressed. Returns False. |
| ENV is test / dev | Suppressed. Returns False. |
| COGNEE_HOST_SDK=<any non-empty> | Suppressed (defers to the host SDK). Returns False. |

Important — if you run this binding underneath another cognee SDK that is already the canonical sender of send_telemetry events: set COGNEE_HOST_SDK=<name> so this binding defers and you avoid double-counting.
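
The opt-out conditions above can be written as a small predicate, which is handy for asserting the expected state in a deployment check. This is a sketch of the documented table, not the logic in cognee_telemetry::env::is_disabled: analytics_armed is a hypothetical name, and exact, case-sensitive matching of ENV against test and dev is an assumption.

```python
from typing import Mapping


def analytics_armed(environ: Mapping[str, str]) -> bool:
    """Mirror the table above: armed unless an opt-out variable applies."""
    if environ.get("TELEMETRY_DISABLED"):  # any non-empty value suppresses
        return False
    if environ.get("ENV") in ("test", "dev"):  # assumed exact match
        return False
    if environ.get("COGNEE_HOST_SDK"):  # defer to the host SDK
        return False
    return True


print(analytics_armed({}), analytics_armed({"ENV": "test"}))
```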

Error handling

All async ops raise subclasses of CogneeError:

| Exception | Meaning |
| --- | --- |
| CogneeValidationError | Invalid input (bad data descriptor, unknown config key, malformed settings JSON). |
| CogneeComponentError | Component initialisation failed (DB connection, embedding model load). |
| CogneeServiceBuildError | Service warm-up failed (engine could not be constructed). |
| CogneeUserBootstrapError | Default user resolution failed during warm(). |
| CogneeRuntimeError | Pipeline or search runtime failure. |
| CogneeUnsupportedError | Operation not available for the current backend configuration. |
| CogneeFeatureNotBuiltError | Feature was not compiled into this build (e.g. visualization). |
| CogneeUnknownConfigKeyError | Unknown config key passed to config.set_* or constructor. |
| CogneeConfigTypeMismatchError | Wrong type for a config value. |
| CogneeError | Base class; catch this to handle any cognee error. |

from cognee_py import Cognee, CogneeError, CogneeValidationError

try:
    result = await cognee.search("query", {"search_type": "INVALID_TYPE"})
except CogneeValidationError as e:
    print(f"Bad input: {e}")
except CogneeError as e:
    print(f"Cognee error: {e}")

Advanced: low-level pipeline engine

The original pipeline engine API is available directly from cognee_py for advanced orchestration use-cases that do not need the high-level SDK:

import cognee_py

p = cognee_py.Pipeline("my pipeline")
# add_task accepts a plain callable directly (sync, async, generator, or async
# generator — auto-detected). The callable receives each input value.
p.add_task(lambda val: val, name="echo")

ctx = cognee_py.TaskContext.mock()
# execute() takes plain Python values and returns the last task's outputs.
[result] = await p.execute(["hello"], ctx)

All pipeline-engine symbols (Pipeline, TaskContext, PipelineRunHandle, CancellationHandle, CancellationToken, cancellation_pair, ProgressToken, Watcher) are available at the top level of cognee_py.
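
The four callable kinds that add_task is said to auto-detect can be told apart in plain Python with the inspect module. The sketch below only illustrates that distinction and makes no claim about how the binding performs its detection; task_kind and the three sample functions are hypothetical.

```python
import inspect
from typing import Callable


def task_kind(fn: Callable) -> str:
    """Classify a callable as async_generator, async, generator, or sync."""
    if inspect.isasyncgenfunction(fn):
        return "async_generator"
    if inspect.iscoroutinefunction(fn):
        return "async"
    if inspect.isgeneratorfunction(fn):
        return "generator"
    return "sync"


async def fetch(val):
    return val


def split(val):
    yield from val


async def stream(val):
    for item in val:
        yield item


print([task_kind(f) for f in (lambda v: v, fetch, split, stream)])
# ['sync', 'async', 'generator', 'async_generator']
```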

Troubleshooting

  • ImportError on cognee_py: run maturin develop first (the package is not yet published to PyPI).
  • Embedding model download on first run: set MOCK_EMBEDDING=true to skip it in tests.
  • OPENAI_URL / OPENAI_TOKEN not set: all examples exit 0 with a SKIP message when these are absent; export them before running.
  • Analytics sent twice: analytics are ON by default; if you run this binding underneath another cognee SDK that already emits send_telemetry, set COGNEE_HOST_SDK=<name> so this binding defers.

References