README.md

November 4, 2025

English | Simplified Chinese

Arxiv

🔥 Introduction

ROGRAG enhances LLM performance on specialized topics using a robust GraphRAG approach. It features a two-stage retrieval mechanism (dual-level and logic-form methods) that improves accuracy without extra computation cost. ROGRAG achieves a 15% score boost on SeedBench, outperforming mainstream methods.

Key Highlights:

  • Two-stage retrieval for robustness
  • Incremental database construction
  • Enhanced fuzzy matching and structured reasoning

| Method | QA-1 (Accuracy) | QA-2 (F1) | QA-3 (Rouge) | QA-4 (Rouge) |
| --- | --- | --- | --- | --- |
| vanilla (w/o RAG) | 0.57 | 0.71 | 0.16 | 0.35 |
| LangChain | 0.68 | 0.68 | 0.15 | 0.04 |
| BM25 | 0.65 | 0.69 | 0.23 | 0.03 |
| RQ-RAG | 0.59 | 0.62 | 0.17 | 0.33 |
| ROGRAG (Ours) | 0.75 | 0.79 | 0.36 | 0.38 |
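The two-stage retrieval idea can be sketched as follows. This is an illustrative toy, not the actual ROGRAG API: the function names, the tiny in-memory "knowledge base", and the fallback order (structured logic-form match first, looser dual-level lookup as fallback) are all assumptions for exposition; see the technical report for the real design.

```python
# Toy sketch of a two-stage retrieval pipeline. All names and the fallback
# order are assumptions for exposition, not the actual ROGRAG implementation.

KNOWLEDGE = [
    "Rice variety A is tolerant to drought.",
    "Rice variety B matures in 110 days.",
]

def dual_level_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for dual-level retrieval: a loose keyword lookup
    # that matches a passage if it shares any query word.
    return [p for p in KNOWLEDGE
            if any(w in p.lower() for w in query.lower().split())]

def logic_form_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for a logic-form query over the graph: a stricter
    # match that requires every query word to appear.
    return [p for p in KNOWLEDGE
            if all(w in p.lower() for w in query.lower().split())]

def two_stage_retrieve(query: str) -> list[str]:
    # Prefer the structured path; fall back to the looser one for robustness.
    results = logic_form_retrieve(query)
    return results if results else dual_level_retrieve(query)

print(two_stage_retrieve("rice drought"))
```

The point of the second stage is robustness: when the strict structured query misses, the looser retrieval still returns candidates instead of an empty answer.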

Deployed on an online research platform, ROGRAG is ready for integration. Here is the technical report.

If you find it useful, please give it a star ⭐

💡 Updates

  • [2025/11] Added multi-database support; refactored the Gradio UI and server

📖 Documentation

🔆 Version Description

Compared to HuixiangDou, this repo improves accuracy:

  1. Graph schema. Dense retrieval is used only to query similar entities and relationships.

  2. Ported and merged multiple open-source implementations, with a code difference of nearly 18k lines:

    • Data. Curated a set of real domain knowledge that LLMs have not fully seen, for testing (GPT accuracy < 0.6)
    • Ablation. Confirmed the impact of different stages and parameters on accuracy
  3. The API remains compatible, so the WeChat/Lark/Web integrations from v1 are still accessible.

    # v1 API https://github.com/InternLM/HuixiangDou/blob/main/huixiangdou/service/parallel_pipeline.py#L290
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Tuple[str]] = [],
                       language: str = 'zh',
                       enable_web_search: bool = True,
                       enable_code_search: bool = True):

    # v2 API https://github.com/tpoisonooo/HuixiangDou2/blob/main/huixiangdou/pipeline/parallel.py#L135
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Pair] = [],
                       request_id: str = 'default',
                       language: str = 'zh_cn'):


๐Ÿ€ Acknowledgements

  • SiliconCloud Abundant LLM API, some models are free
  • KAG Graph retrieval based on reasoning
  • DB-GPT LLM tool collection
  • LightRAG Simple and efficient graph retrieval solution
  • SeedBench A multi-task benchmark for evaluating LLMs in seed science
  • kimi-cli AI coding assistant by kimi

๐Ÿ“ Citation

Note: the impact of open source varies across fields and industries. Due to licensing restrictions, we can only release the code and test conclusions; the test data cannot be provided.

@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024labelingsupervisedfinetuningdata,
      title={Labeling supervised fine-tuning data with the scaling law}, 
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.02817}, 
}

@misc{kong2025huixiangdou2robustlyoptimizedgraphrag,
      title={HuixiangDou2: A Robustly Optimized GraphRAG Approach}, 
      author={Huanjun Kong and Zhefan Wang and Chenyang Wang and Zhe Ma and Nanqing Dong},
      year={2025},
      eprint={2503.06474},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2503.06474}, 
}