
May 12, 2025

👾 DIGIMON: Deep Analysis of Graph-Based Retrieval-Augmented Generation (RAG) Systems

GraphRAG is a popular 🔥🔥🔥 and powerful 💪💪💪 RAG system! 🚀💡 Inspired by systems like Microsoft's, graph-based RAG is unlocking endless possibilities in AI.

Our project focuses on modularizing and decoupling these methods 🧩 to unveil the mystery 🕵️‍♂️🔍✨ behind them and share fun and valuable insights! 🤩💫 Our project 🔨 is included in Awesome Graph-based RAG.

Workflow of GraphRAG



Quick Start 🚀

From Source

# Clone the repository from GitHub
git clone https://github.com/JayLZhou/GraphRAG.git
cd GraphRAG

Run a Method

You can run different GraphRAG methods by specifying the corresponding configuration file (.yaml).

Example: Running RAPTOR

python main.py -opt Option/Method/RAPTOR.yaml -dataset_name your_dataset

Available Methods:

The following methods are available, and each can be run using the same command format:

python main.py -opt Option/Method/<METHOD>.yaml -dataset_name your_dataset

Replace <METHOD> with one of the following:

  • Dalk
  • GR
  • LGraphRAG (Local search in GraphRAG)
  • GGraphRAG (Global search in GraphRAG)
  • HippoRAG
  • KGP
  • LightRAG
  • RAPTOR
  • ToG

For example, to run the local search of GraphRAG:

python main.py -opt Option/Method/LGraphRAG.yaml -dataset_name your_dataset
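
To run several methods on the same dataset, the command format above can be generated from a short script. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helper names, and it assumes every method in the list has a matching `.yaml` file under `Option/Method/`. By default it only prints the commands.

```python
import subprocess
import sys

# Method names taken from the list above; each is assumed to map to
# Option/Method/<METHOD>.yaml.
METHODS = ["Dalk", "GR", "LGraphRAG", "GGraphRAG", "HippoRAG",
           "KGP", "LightRAG", "RAPTOR", "ToG"]


def build_command(method: str, dataset_name: str) -> list[str]:
    """Return the argument list for running one method on one dataset."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return [sys.executable, "main.py",
            "-opt", f"Option/Method/{method}.yaml",
            "-dataset_name", dataset_name]


def run_all(dataset_name: str, dry_run: bool = True) -> list[list[str]]:
    """Build the command for every method and optionally execute it."""
    commands = [build_command(m, dataset_name) for m in METHODS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing run instead of silently continuing.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all("your_dataset")
```

Pass `dry_run=False` from the repository root to actually launch the runs one after another.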

Dependencies

Create the conda environment with the required dependencies (the default environment name is digimon):

conda env create -f experiment.yml -n your_experiment_name

Supported LLM Backends

GraphRAG supports both cloud-based and local deployment of LLMs:

  • Cloud-based models: OpenAI (e.g., gpt-4, gpt-3.5-turbo)
  • Locally deployed models: Ollama and LlamaFactory

To use a local model, set api_type to open_llm in the configuration file.

Example Configuration (config.yaml):

llm:
  api_type: "openai"  # Options: "openai" or "open_llm" (for Ollama and LlamaFactory)
  model: "YOUR_MODEL_NAME"
  base_url: "YOUR_BASE_URL"  # Change this for local models
  api_key: "YOUR_API_KEY"  # Not required for local models

For LlamaFactory or Ollama, ensure the model is correctly installed and running in your local environment.

For setup details, refer to the README of LlamaFactory. A configuration for a local model looks like this:

llm:
  api_type: "open_llm"  # Options: "openai" or "open_llm" (For Ollama and LlamaFactory) 
  model: "YOUR_LOCAL_MODEL_NAME"
  base_url: "YOUR_LOCAL_URL"  # Change this for local models
  api_key: "ANY_THING_IS_OKAY"  # Not required for local models
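
A quick sanity check of the `llm` section before launching a long run can catch typos early. The sketch below is illustrative only: `validate_llm_config` is a hypothetical helper, not a function from this repository, and the rules it enforces (a `base_url` is needed for local models, an `api_key` is needed for OpenAI) are assumptions inferred from the comments in the configurations above.

```python
# The two api_type values named in the configuration comments above.
ALLOWED_API_TYPES = {"openai", "open_llm"}


def validate_llm_config(llm: dict) -> list[str]:
    """Return a list of human-readable problems found in an `llm` section."""
    problems = []
    api_type = llm.get("api_type")
    if api_type not in ALLOWED_API_TYPES:
        problems.append(
            f"api_type must be one of {sorted(ALLOWED_API_TYPES)}, got {api_type!r}"
        )
    if not llm.get("model"):
        problems.append("model is missing")
    # Assumption: a local server (Ollama, LlamaFactory) needs an explicit URL.
    if api_type == "open_llm" and not llm.get("base_url"):
        problems.append("base_url is required for a local model")
    # Assumption: the key is only mandatory for the cloud backend.
    if api_type == "openai" and not llm.get("api_key"):
        problems.append("api_key is required for OpenAI")
    return problems


if __name__ == "__main__":
    local = {"api_type": "open_llm", "model": "YOUR_LOCAL_MODEL_NAME",
             "base_url": "YOUR_LOCAL_URL", "api_key": "ANY_THING_IS_OKAY"}
    print(validate_llm_config(local))
```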

Representative Methods

We select the following Graph RAG methods:

| Method | Description | Link | Graph Type |
| --- | --- | --- | --- |
| RAPTOR | ICLR 2024 | arXiv, GitHub | Tree |
| KGP | AAAI 2024 | arXiv, GitHub | Passage Graph |
| DALK | EMNLP 2024 | arXiv, GitHub | KG |
| HippoRAG | NeurIPS 2024 | arXiv, GitHub | KG |
| G-retriever | NeurIPS 2024 | arXiv, GitHub | KG |
| ToG | ICLR 2024 | arXiv, GitHub | KG |
| MS GraphRAG | Microsoft Project | arXiv, GitHub | TKG |
| FastGraphRAG | CircleMind Project | GitHub | TKG |
| LightRAG | High Star Project | arXiv, GitHub | RKG |

Graph Types

Based on the information attached to entities and relations, we categorize graphs into the following types:

  • Chunk Tree: A tree structure formed from document content and its summaries.
  • Passage Graph: A relational network composed of passages, tables, and other elements within documents.
  • KG: A knowledge graph (KG) is constructed by extracting entities and relationships from each chunk. It contains only entities and relations and is commonly represented as triples.
  • TKG: A textual knowledge graph (TKG) is a specialized KG (built with the same construction step as a KG) that enriches entities with detailed descriptions and type information.
  • RKG: A rich knowledge graph (RKG) further incorporates keywords associated with relations.

The criteria for the classification of graph types are as follows:

| Graph Attributes | Chunk Tree | Passage Graph | KG | TKG | RKG |
| --- | --- | --- | --- | --- | --- |
| Original Content | ✅ | ✅ | ❌ | ❌ | ❌ |
| Entity Name | ❌ | ❌ | ✅ | ✅ | ✅ |
| Entity Type | ❌ | ❌ | ❌ | ✅ | ✅ |
| Entity Description | ❌ | ❌ | ❌ | ✅ | ✅ |
| Relation Name | ❌ | ❌ | ✅ | ❌ | ✅ |
| Relation Keyword | ❌ | ❌ | ❌ | ❌ | ✅ |
| Relation Description | ❌ | ❌ | ❌ | ✅ | ✅ |
| Edge Weight | ❌ | ❌ | ✅ | ✅ | ✅ |
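
The classification criteria can also be read as data. The sketch below transcribes the table above into a dictionary and adds two small lookup helpers. The names `GRAPH_TYPES`, `matching_graph_types`, and `extra_attributes` are illustrative and do not come from the repository.

```python
# Attributes marked as present for each graph type in the table above.
GRAPH_TYPES = {
    "Chunk Tree": {"Original Content"},
    "Passage Graph": {"Original Content"},
    "KG": {"Entity Name", "Relation Name", "Edge Weight"},
    "TKG": {"Entity Name", "Entity Type", "Entity Description",
            "Relation Description", "Edge Weight"},
    "RKG": {"Entity Name", "Entity Type", "Entity Description",
            "Relation Name", "Relation Keyword", "Relation Description",
            "Edge Weight"},
}


def matching_graph_types(attributes: set[str]) -> list[str]:
    """Return every graph type whose attribute set equals `attributes`."""
    return [name for name, attrs in GRAPH_TYPES.items() if attrs == attributes]


def extra_attributes(richer: str, poorer: str) -> list[str]:
    """Return the attributes that `richer` stores and `poorer` does not."""
    return sorted(GRAPH_TYPES[richer] - GRAPH_TYPES[poorer])


if __name__ == "__main__":
    print(matching_graph_types({"Entity Name", "Relation Name", "Edge Weight"}))
    print(extra_attributes("RKG", "TKG"))
```

Note that Chunk Tree and Passage Graph share the same attribute row, so attributes alone cannot tell them apart; they differ in structure, as described in the list of graph types.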

Operators in the Retrieve Stage

The retrieval stage plays the key role ‼️ in the entire GraphRAG process. ✨ The goal is to identify query-relevant content that supports the generation phase, enabling the LLM to provide more accurate responses.

💡💡💡 After thoroughly reviewing all implementations, we've distilled them into a set of 16 operators 🧩🧩. Each method then constructs its retrieval module by combining one or more of these operators 🧩.

Five Types of Operators

We classify the operators into five categories, each offering a different way to retrieve and structure relevant information from graph-based data.

⭕️ Entity Operators

Retrieve entities (e.g., people, places, organizations) that are most relevant to the given query.

| Name | Description | Example Methods |
| --- | --- | --- |
| VDB | Select top-k nodes from the vector database | G-retriever, RAPTOR, KGP |
| RelNode | Extract nodes from given relationships | LightRAG |
| PPR | Run PPR on the graph, return top-k nodes with PPR scores | FastGraphRAG |
| Agent | Utilizes LLM to find the useful entities | ToG |
| Onehop | Selects the one-hop neighbor entities of the given entities | LightRAG |
| Link | Return top-1 similar entity for each given entity | HippoRAG |
| TF-IDF | Rank entities based on the TF-IDF matrix | KGP |
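
To make the PPR operator concrete, here is a minimal pure-Python sketch. It assumes that PPR stands for Personalized PageRank and approximates it by power iteration; it is not the implementation used by FastGraphRAG or by this repository. The function names, the adjacency-list format, and the default `damping` and `iterations` values are all illustrative choices.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Approximate Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of its neighbors; every
           neighbor must also appear as a key.
    seeds: non-empty set of query-relevant nodes that receive the
           restart probability.
    """
    if not seeds or not set(seeds) <= set(graph):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if neighbors:
                # Spread this node's score evenly over its neighbors.
                share = damping * scores[node] / len(neighbors)
                for neighbor in neighbors:
                    nxt[neighbor] += share
            else:
                # Dangling node: hand its score back to the seeds.
                for seed in seeds:
                    nxt[seed] += damping * scores[node] / len(seeds)
        scores = nxt
    return scores


def top_k_nodes(scores, k):
    """Return the k highest-scoring (node, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
    print(top_k_nodes(personalized_pagerank(toy_graph, {"a"}), 2))
```

Seeding the walk with the entities matched to the query biases the scores toward their neighborhood, which is what lets the operator return query-relevant nodes rather than globally central ones.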

➡️ Relationship Operators

Extract useful relationships for the given query.

| Name | Description | Example Methods |
| --- | --- | --- |
| VDB | Retrieve relationships from the vector database | LightRAG, G-retriever |
| Onehop | Selects relationships linked by one-hop neighbors of the given selected entities | Local Search for MS GraphRAG |
| Aggregator | Compute relationship scores from entity PPR matrix, return top-k | FastGraphRAG |
| Agent | Utilizes LLM to find the useful relationships | ToG |

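
The Onehop operators are the simplest to picture. The sketch below assumes the graph is stored as a list of `(source, relation, target)` triples, which is an illustrative representation rather than the repository's data model, and the function names are hypothetical.

```python
# (source entity, relation name, target entity)
Edge = tuple[str, str, str]


def onehop_relationships(edges: list[Edge], entities: set[str]) -> list[Edge]:
    """Return every relationship that touches at least one given entity."""
    return [e for e in edges if e[0] in entities or e[2] in entities]


def onehop_entities(edges: list[Edge], entities: set[str]) -> set[str]:
    """Return the one-hop neighbors of the given entities (excluding them)."""
    neighbors = set()
    for source, _, target in edges:
        if source in entities:
            neighbors.add(target)
        if target in entities:
            neighbors.add(source)
    return neighbors - entities


if __name__ == "__main__":
    toy_edges = [("Alice", "works_at", "Acme"),
                 ("Bob", "works_at", "Acme"),
                 ("Bob", "lives_in", "Paris")]
    print(onehop_relationships(toy_edges, {"Alice"}))
    print(onehop_entities(toy_edges, {"Acme"}))
```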
📄 Chunk Operators

Retrieve the most relevant text segments (chunks) related to the query.

| Name | Description | Example Methods |
| --- | --- | --- |
| Aggregator | Use the relationship scores and the relationship-chunk interactions to select the top-k chunks | HippoRAG |
| FromRel | Return chunks containing given relationships | LightRAG |
| Occurrence | Rank top-k chunks based on occurrence of both entities in relationships | Local Search for MS GraphRAG |
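
As an illustration of the Occurrence operator, the sketch below scores each chunk by how many of the given relationships have both endpoint entities mentioned in the chunk text. The case-insensitive substring matching and the tie-break by chunk id are simplifying assumptions made for this example, not the behavior of MS GraphRAG, and `rank_chunks_by_occurrence` is a hypothetical name.

```python
def rank_chunks_by_occurrence(
    chunks: dict[str, str],
    relationships: list[tuple[str, str]],
    k: int,
) -> list[tuple[str, int]]:
    """Rank chunks by the number of relationships they fully mention.

    chunks: chunk id -> chunk text.
    relationships: (source entity, target entity) pairs.
    Returns up to k (chunk id, count) pairs, highest count first.
    """
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        count = sum(
            1 for source, target in relationships
            if source.lower() in lowered and target.lower() in lowered
        )
        if count:
            scored.append((chunk_id, count))
    # Highest count first; ties are broken by chunk id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    toy_chunks = {"c1": "Alice met Bob in Paris.",
                  "c2": "Bob lives in Paris.",
                  "c3": "Nothing relevant here."}
    print(rank_chunks_by_occurrence(
        toy_chunks, [("Alice", "Bob"), ("Bob", "Paris")], k=2))
```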

📈 Subgraph Operators

Extract a relevant subgraph for the given query.

| Name | Description | Example Methods |
| --- | --- | --- |
| KhopPath | Find k-hop paths with start and endpoints in the given entity set | DALK |
| Steiner | Compute Steiner tree based on given entities and relationships | G-retriever |
| AgentPath | Identify the most relevant k-hop paths to a given question, by using LLM to filter out the irrelevant paths | ToG |
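
The KhopPath operator can be sketched as a bounded depth-first search. The version below is an illustration under stated assumptions, not DALK's implementation: it treats "k-hop" as "at most k edges", only returns simple paths (no repeated nodes), and takes the graph as an adjacency dictionary. `khop_paths` is a hypothetical name.

```python
def khop_paths(graph, entities, k):
    """Enumerate simple paths of at most k edges between given entities.

    graph: dict mapping each node to a list of its neighbors.
    entities: set of nodes; every returned path starts and ends in it.
    Each undirected path is reported once per direction.
    """
    paths = []

    def extend(path):
        last = path[-1]
        if len(path) > 1 and last in entities:
            paths.append(list(path))
        if len(path) - 1 == k:
            return  # hop budget exhausted
        for neighbor in graph.get(last, []):
            if neighbor not in path:  # keep the path simple
                path.append(neighbor)
                extend(path)
                path.pop()

    for start in sorted(entities):
        if start in graph:
            extend([start])
    return paths


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(khop_paths(toy_graph, {"a", "c"}, k=2))
```

Enumerating all simple paths grows quickly with k, so small values of k are the practical choice for a sketch like this.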

🔗 Community Operators

Identify high-level information. These operators are only used by MS GraphRAG.

| Name | Description | Example Methods |
| --- | --- | --- |
| Entity | Detects communities containing specified entities | Local Search for MS GraphRAG |
| Layer | Returns all communities below a required layer | Global Search for MS GraphRAG |
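
Both community operators reduce to simple filters once communities are materialized. The sketch below uses a hypothetical `Community` record and assumes that "below a required layer" means a layer index less than or equal to the requested one; the actual layer convention in MS GraphRAG may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    """Illustrative community record; field names are assumptions."""
    community_id: str
    layer: int
    entities: frozenset


def communities_with_entities(communities, entities):
    """Entity operator: communities containing any of the given entities."""
    wanted = set(entities)
    return [c for c in communities if c.entities & wanted]


def communities_below_layer(communities, max_layer):
    """Layer operator: communities whose layer index is <= max_layer."""
    return [c for c in communities if c.layer <= max_layer]


if __name__ == "__main__":
    toy = [Community("c0", 0, frozenset({"Alice", "Bob", "Acme"})),
           Community("c1", 1, frozenset({"Alice", "Acme"})),
           Community("c2", 1, frozenset({"Bob"}))]
    print([c.community_id for c in communities_with_entities(toy, {"Bob"})])
    print([c.community_id for c in communities_below_layer(toy, 0)])
```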

You can freely 🪽 combine those operators 🧩 to create more and more GraphRAG methods.

🌰 Examples

Below, we present some examples illustrating how existing algorithms leverage these operators.

| Name | Operators |
| --- | --- |
| HippoRAG | Chunk (Aggregator) |
| LightRAG | Chunk (FromRel) + Entity (RelNode) + Relationship (VDB) |
| FastGraphRAG | Chunk (Aggregator) + Entity (PPR) + Relationship (Aggregator) |
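
The operator combinations above are regular enough to be parsed mechanically, which is handy when comparing methods. The following sketch copies the three rows verbatim and splits each one into `(operator type, operator name)` pairs. `METHOD_OPERATORS`, `parse_operators`, and `methods_using` are illustrative names and not part of the repository's API.

```python
import re

# Operator combinations copied from the examples above.
METHOD_OPERATORS = {
    "HippoRAG": "Chunk (Aggregator)",
    "LightRAG": "Chunk (FromRel) + Entity (RelNode) + Relationship (VDB)",
    "FastGraphRAG": "Chunk (Aggregator) + Entity (PPR) + Relationship (Aggregator)",
}


def parse_operators(spec: str) -> list[tuple[str, str]]:
    """Split 'Type (Name) + Type (Name)' into (type, name) pairs."""
    pairs = []
    for part in spec.split("+"):
        match = re.fullmatch(r"\s*(\w+)\s*\((\w+)\)\s*", part)
        if match is None:
            raise ValueError(f"cannot parse operator: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def methods_using(operator_type: str, operator_name: str) -> list[str]:
    """Return the methods whose combination includes the given operator."""
    return sorted(
        name for name, spec in METHOD_OPERATORS.items()
        if (operator_type, operator_name) in parse_operators(spec)
    )


if __name__ == "__main__":
    print(parse_operators(METHOD_OPERATORS["LightRAG"]))
    print(methods_using("Chunk", "Aggregator"))
```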

🏹 Our future plans

  • Detailed readme
  • Support RoG, PathRAG, etc.
  • Provide a docker image for easy deployment.
  • Support more LLMs, such as AZURE.

🧭 Cite Our Paper

If you find this work useful, please consider citing our papers:

In-depth Analysis of Graph-based RAG in a Unified Framework

@article{zhou2025depth,
  title={In-depth Analysis of Graph-based RAG in a Unified Framework},
  author={Zhou, Yingli and Su, Yaodong and Sun, Youran and Wang, Shu and Wang, Taotao and He, Runyuan and Zhang, Yongwei and Liang, Sicong and Liu, Xilin and Ma, Yuchi and others},
  journal={arXiv preprint arXiv:2503.04338},
  year={2025}
}