πŸš€ GraphRAG Local Ollama - Knowledge Graph

🀝 Contributing

We welcome contributions from the community to help enhance GraphRAG Local Ollama! Please see our Contributing Guidelines for more details on how to get involved.

In particular, we are looking for help with llama integration.

πŸš€ GraphRAG Local Ollama - Knowledge Graph

Welcome to GraphRAG Local Ollama! This repository is an exciting adaptation of Microsoft's GraphRAG, tailored to support local models downloaded via Ollama. Say goodbye to costly OpenAI models and hello to efficient, cost-effective local inference using Ollama!

πŸ“„ Research Paper

For more details on the GraphRAG implementation, please refer to the GraphRAG paper (arXiv:2404.16130).

Paper Abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers.

🌟 Features

  • Local Model Support: Leverage local models with Ollama for LLM and embeddings.
  • Cost-Effective: Eliminate dependency on costly OpenAI models.
  • Easy Setup: Simple and straightforward setup process.
  • Web UI: Browser-based interface for indexing, querying, and graph visualization.
  • 5 Query Modes: Global, Local, DRIFT (iterative reasoning), Basic (fast vector search), Lazy.
  • LazyGraphRAG Mode: Skip community summarization at index time β€” ~99% faster indexing, on-demand summarization at query time.
  • DRIFT Search: Dynamic Reasoning with Iterative Feedback and Tracking β€” iterative knowledge-graph exploration that generates and follows up sub-questions for deeper answers.
  • Basic Search: Lightweight vector-similarity RAG β€” no community reports needed, fast answers from entity embeddings.
  • JSON/JSONL Input: Index JSON and JSONL document collections in addition to .txt and .csv.
  • Multilingual: Full UTF-8 support with CJK-aware text chunking (Chinese, Japanese, Korean, Arabic, Cyrillic).
  • Docker: One-command setup with docker-compose up.

🐳 Quick Start (Docker)

The fastest way to get started β€” no Python environment needed.

# 1. Copy the example environment file
cp .env.example .env

# 2. Start Ollama + GraphRAG
docker-compose up --build

# 3. Drop your .txt documents into ./input/
#    Indexing runs automatically on container start.

# 4. Run a query
docker-compose run graphrag python -m graphrag.query \
  --root /app --method global "What are the main themes?"

GPU support: Add deploy.resources.reservations.devices to the ollama service in docker-compose.yml for GPU acceleration.
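
For example, a minimal compose override for NVIDIA GPUs (this assumes the service is named ollama, as in this repo's docker-compose.yml, and that the NVIDIA Container Toolkit is installed on the host):

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # requires the NVIDIA Container Toolkit
              count: all            # reserve every visible GPU
              capabilities: [gpu]
```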


πŸ–₯️ Web UI

A browser-based interface for the full workflow (indexing β†’ querying β†’ visualization).

pip install -r requirements-ui.txt
python app.py
# Open http://localhost:7860

Tabs:

| Tab | What it does |
| --- | --- |
| 📂 Index | Upload .txt files, run indexing, see live log output |
| 🔍 Query | Ask questions using Global / Local / DRIFT / Basic / Lazy search |
| 🗺️ Graph | Interactive knowledge-graph visualizer (requires pyvis) |
| ⚙️ Settings | Edit model names, chunk size, LazyGraphRAG toggle |

⚑ LazyGraphRAG Mode

Inspired by Microsoft's LazyGraphRAG, this mode skips community summarization at index time and generates summaries on-the-fly during queries. Result: indexing is ~99% faster.

Enable in settings.yaml:

lazy_graph_rag: true

Query with the lazy method:

python -m graphrag.query --root ./ragtest --method lazy "What is machine learning?"

Trade-off: First-query responses are slightly slower than standard global search because summaries are computed at query time. For large datasets this is still dramatically cheaper overall.
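
Conceptually, lazy mode defers the expensive map step of global search to query time. A minimal sketch of the idea, where every helper name is hypothetical rather than this repo's actual API:

```python
def lazy_global_query(question, communities, llm, top_k=10):
    # Standard GraphRAG pre-summarizes every community at index time.
    # Lazy mode ranks communities against the question and summarizes
    # only the relevant ones, on demand.
    relevant = rank_by_relevance(question, communities)[:top_k]  # hypothetical ranker
    summaries = [llm.summarize(c) for c in relevant]          # on-demand summaries
    partials = [llm.answer(question, s) for s in summaries]   # map step
    return llm.reduce(question, partials)                     # reduce step
```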


🔄 DRIFT Search

Inspired by Microsoft GraphRAG v0.4+'s DRIFT (Dynamic Reasoning with Iterative Feedback and Tracking), this mode performs iterative knowledge-graph exploration:

  1. Primer phase β€” decomposes your question into scored sub-questions using community reports as context.
  2. Search loop β€” answers each sub-question using the entity graph, then generates follow-up questions from the answers (priority-queue, depth-limited).
  3. Reduce phase β€” synthesises all intermediate answers into a final comprehensive response.

DRIFT produces deeper, more exploratory answers than Global or Local search β€” especially useful for open-ended research questions.
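
To make the control flow concrete, here is a minimal sketch of that loop (llm and graph are hypothetical stand-ins, not this repo's API; the priority queue orders sub-questions by score and the depth limit bounds follow-ups):

```python
import heapq

def drift_search(question, graph, reports, llm, max_depth=2):
    # Primer phase: decompose the question into scored sub-questions,
    # using community reports as context.
    queue = [(-score, 0, q) for q, score in llm.decompose(question, reports)]
    heapq.heapify(queue)
    intermediate = []
    # Search loop: answer sub-questions, then push scored follow-ups
    # (priority-queue ordering, depth-limited).
    while queue:
        neg_score, depth, sub_q = heapq.heappop(queue)
        answer = llm.answer(sub_q, graph.local_context(sub_q))
        intermediate.append(answer)
        if depth < max_depth:
            for follow_up, score in llm.follow_ups(sub_q, answer):
                heapq.heappush(queue, (-score, depth + 1, follow_up))
    # Reduce phase: synthesize all intermediate answers into one response.
    return llm.reduce(question, intermediate)
```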

Query with DRIFT:

python -m graphrag.query --root ./ragtest --method drift "What are the main causes of the conflict?"

Note: DRIFT works best when community reports are available. If lazy_graph_rag: true was used during indexing, the primer step will have limited context but the search loop still operates normally.


🔎 Basic Search

A lightweight vector-similarity RAG mode — no community reports required. Works immediately after a minimal index run.

  1. Embeds the query using your Ollama embedding model.
  2. Retrieves the top-10 most similar entities from the vector store.
  3. Builds context from those entities, their relationships, and source text units.
  4. Answers with the LLM.

Use Basic search for fast, factual lookups when you don't need the full graph-traversal reasoning.
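
In code, the flow looks roughly like this (embedder, store, and llm are hypothetical stand-ins for the configured Ollama embedding model, vector store, and LLM):

```python
def basic_search(query, embedder, store, llm, k=10):
    q_vec = embedder.embed(query)                  # 1. embed the query
    entities = store.similarity_search(q_vec, k)   # 2. top-k similar entities
    # 3. build context from entities, relationships, and source text units
    context = "\n\n".join(
        e.description + "\n"
        + "\n".join(r.text for r in e.relationships)
        + "\n".join(u.text for u in e.text_units)
        for e in entities
    )
    # 4. answer with the LLM
    return llm.generate(f"Context:\n{context}\n\nQuestion: {query}")
```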

Query with Basic search:

python -m graphrag.query --root ./ragtest --method basic "Who is Alan Turing?"

πŸ“„ JSON / JSONL Input

In addition to .txt and .csv, you can now index JSON and JSONL document collections.

JSON array format (documents.json):

[
  {"id": "doc1", "title": "Introduction", "text": "This is the document text..."},
  {"id": "doc2", "text": "Another document..."}
]

JSONL format (documents.jsonl, one JSON object per line):

{"id": "doc1", "title": "Chapter 1", "text": "Text of chapter 1..."}
{"id": "doc2", "title": "Chapter 2", "text": "Text of chapter 2..."}

Enable in settings.yaml:

input:
  type: file
  file_type: json          # was: text or csv
  base_dir: "input"
  file_pattern: ".*\\.(json|jsonl)$"

Required field: text. Optional fields: id, title, source.
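
If your documents already live in Python, writing a valid JSONL file takes one json.dumps call per record; for example:

```python
import json

docs = [
    {"id": "doc1", "title": "Chapter 1", "text": "Text of chapter 1..."},
    {"id": "doc2", "title": "Chapter 2", "text": "Text of chapter 2..."},
]

# One JSON object per line; ensure_ascii=False preserves non-English text.
with open("input/documents.jsonl", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc, ensure_ascii=False) + "\n")
```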


🌐 Non-English Text

UTF-8 is enforced throughout the pipeline. CJK (Chinese, Japanese, Korean), Arabic, and Cyrillic documents are supported:

  • Text files are read with encoding="utf-8" (with graceful replacement for undecodable bytes).
  • CSVs default to utf-8 (was latin-1 β€” fixed in this release).
  • The text splitter automatically switches to character-count chunking for CJK-dominant text, avoiding BPE tokenization artifacts.

No configuration change needed β€” just drop your non-English .txt files in input/ and run normally.
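
The CJK switch boils down to a dominance heuristic; a minimal sketch of the idea (illustrative only, not the repo's exact implementation):

```python
import re

# Han, Hiragana/Katakana, and Hangul ranges
CJK = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

def is_cjk_dominant(text: str, threshold: float = 0.5) -> bool:
    # CJK-dominant when at least `threshold` of non-whitespace chars are CJK.
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and sum(bool(CJK.match(c)) for c in chars) / len(chars) >= threshold

def chunk(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    # Character-count chunking, used instead of BPE token counting
    # when the text is CJK-dominant (avoids tokenization artifacts).
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]
```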


πŸ“¦ Installation and Setup

Follow these steps to set up this repository and use GraphRAG with local models provided by Ollama:

  1. Create and activate a new conda environment (stick to the given Python version, 3.10, to avoid errors):

    conda create -n graphrag-ollama-local python=3.10
    conda activate graphrag-ollama-local
    
  2. Install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh  # Ollama for Linux
    pip install ollama
    
  3. Download the required models using Ollama. You can choose from LLMs such as mistral, gemma2, or qwen2, and any embedding model provided by Ollama:

    ollama pull mistral  # LLM
    ollama pull nomic-embed-text  # embedding model
    
  4. Clone the repository:

    git clone https://github.com/TheAiSingularity/graphrag-local-ollama.git
    
  5. Navigate to the repository directory:

    cd graphrag-local-ollama/
    
  6. Install the graphrag package (this is the most important step):

    pip install -e .
    
  7. Create the required input directory. This is where the experiment data and results will be stored (./ragtest):

    mkdir -p ./ragtest/input
    
  8. Copy the sample data from input/ to ./ragtest/input. The input/ folder contains sample data for a test run; you can add your own data here in .txt format.

    cp input/* ./ragtest/input
    
  9. Initialize the ./ragtest folder to create the required files:

    python -m graphrag.index --init --root ./ragtest
    
  10. Copy the settings.yaml file; this is the main predefined config file, preconfigured for local Ollama models:

    cp settings.yaml ./ragtest
    

Users can experiment by changing the models. The llm section expects language models such as llama3, mistral, or phi3, and the embeddings section expects embedding models such as mxbai-embed-large or nomic-embed-text, all provided by Ollama. You can find the complete list of models that Ollama can deploy locally at https://ollama.com/library. Both LLM and embeddings use the OpenAI-compatible endpoint http://localhost:11434/v1. No API key is required; the config defaults to ollama as a placeholder.

LLM and Embedding Configuration
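
These are the sections of the generated settings.yaml that point GraphRAG at Ollama. A sketch of what they look like (key names follow GraphRAG's settings.yaml layout and can vary slightly between versions; swap in whichever models you pulled):

```yaml
llm:
  api_key: ollama                        # placeholder; no real key needed
  type: openai_chat
  model: mistral
  api_base: http://localhost:11434/v1

embeddings:
  llm:
    api_key: ollama
    type: openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1
```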

  1. Run the indexing, which creates a graph:

    python -m graphrag.index --root ./ragtest
    
  2. Run a query (five methods available):

    # Global β€” broad synthesis across the whole corpus (uses community reports)
    python -m graphrag.query --root ./ragtest --method global "What is machine learning?"
    
    # Local β€” entity-focused, uses knowledge graph + text chunks
    python -m graphrag.query --root ./ragtest --method local "What is machine learning?"
    
    # DRIFT β€” iterative graph reasoning (decomposes β†’ explores β†’ synthesises)
    python -m graphrag.query --root ./ragtest --method drift "What are the main themes?"
    
    # Basic β€” fast vector-similarity search (no community reports needed)
    python -m graphrag.query --root ./ragtest --method basic "Who invented the transformer?"
    
    # Lazy β€” on-demand community summarisation (use with lazy_graph_rag: true)
    python -m graphrag.query --root ./ragtest --method lazy "What is machine learning?"
    

Graphs can be saved for later visualization by setting graphml to true in settings.yaml:

snapshots:
  graphml: true

To visualize the generated .graphml files, you can use Gephi (https://gephi.org/users/download/) or the visualize-graphml.py script provided in the repo.

Pass the path to your .graphml file to the following line in visualize-graphml.py:

graph = nx.read_graphml('output/20240708-161630/artifacts/summarized_graph.graphml') 

3. Visualize the .graphml file:

```bash
python visualize-graphml.py
```
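
Alternatively, if pyvis is installed (the same dependency the Web UI's Graph tab uses), a minimal self-contained sketch renders an interactive HTML view; the GraphML path below is the example from above, so substitute your own run's output:

```python
import networkx as nx
from pyvis.network import Network

# Load the GraphML snapshot produced by indexing.
graph = nx.read_graphml("output/20240708-161630/artifacts/summarized_graph.graphml")

net = Network(height="800px", width="100%")
net.from_nx(graph)            # import the networkx graph into pyvis
net.write_html("graph.html")  # open the resulting file in a browser
```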

Citations

  • Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130.
  • Microsoft GraphRAG: https://github.com/microsoft/graphrag
  • Ollama: https://github.com/ollama/ollama

By following the above steps, you can set up and use local models with GraphRAG, making the process more cost-effective and efficient.