README.md

November 4, 2025

English | Simplified Chinese

Arxiv

🔥 Introduction

ROGRAG enhances LLM performance on specialized topics using a robust GraphRAG approach. It features a two-stage retrieval mechanism (dual-level and logic-form methods) that improves accuracy without extra computation cost. ROGRAG achieves a 15% score boost on SeedBench, outperforming mainstream methods.

Key Highlights:

  • Two-stage retrieval for robustness
  • Incremental database construction
  • Enhanced fuzzy matching and structured reasoning

| Method | QA-1 (Accuracy) | QA-2 (F1) | QA-3 (Rouge) | QA-4 (Rouge) |
| --- | --- | --- | --- | --- |
| vanilla (w/o RAG) | 0.57 | 0.71 | 0.16 | 0.35 |
| LangChain | 0.68 | 0.68 | 0.15 | 0.04 |
| BM25 | 0.65 | 0.69 | 0.23 | 0.03 |
| RQ-RAG | 0.59 | 0.62 | 0.17 | 0.33 |
| ROGRAG (Ours) | 0.75 | 0.79 | 0.36 | 0.38 |
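The two-stage retrieval idea can be sketched as follows. This is an illustrative toy, not the actual ROGRAG API: the function names, the tiny in-memory "knowledge base", and the fallback order (structured logic-form match first, looser dual-level lookup as fallback) are all assumptions for exposition; see the technical report for the real design.

```python
# Toy sketch of a two-stage retrieval pipeline. All names and the fallback
# order are assumptions for exposition, not the actual ROGRAG implementation.

KNOWLEDGE = [
    "Rice variety A is tolerant to drought.",
    "Rice variety B matures in 110 days.",
]

def dual_level_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for dual-level retrieval: a loose keyword lookup
    # that matches a passage if it shares any query word.
    return [p for p in KNOWLEDGE
            if any(w in p.lower() for w in query.lower().split())]

def logic_form_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for a logic-form query over the graph: a stricter
    # match that requires every query word to appear.
    return [p for p in KNOWLEDGE
            if all(w in p.lower() for w in query.lower().split())]

def two_stage_retrieve(query: str) -> list[str]:
    # Prefer the structured path; fall back to the looser one for robustness.
    results = logic_form_retrieve(query)
    return results if results else dual_level_retrieve(query)

print(two_stage_retrieve("rice drought"))
```

The point of the second stage is robustness: when the strict structured query misses, the looser retrieval still returns candidates instead of an empty answer.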

Deployed on an online research platform, ROGRAG is ready for integration. Here is the technical report.

If you find it useful, please give it a star ⭐

💡 Updates

  • [2025/11] Added multi-database support; refactored the Gradio UI and server

📖 Documentation

🔆 Version Description

Compared to HuixiangDou, this repo improves accuracy:

  1. Graph schema. Dense retrieval is used only to query similar entities and relationships.

  2. Ported and merged multiple open-source implementations, with a code difference of nearly 18k lines:

    • Data. Curated a set of real domain knowledge that LLMs have not fully seen, for testing (GPT accuracy < 0.6)
    • Ablation. Confirmed the impact of different stages and parameters on accuracy
  3. The API remains compatible, so the WeChat/Lark/Web integrations from v1 are still accessible.

    # v1 API https://github.com/InternLM/HuixiangDou/blob/main/huixiangdou/service/parallel_pipeline.py#L290
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Tuple[str]] = [],
                       language: str = 'zh',
                       enable_web_search: bool = True,
                       enable_code_search: bool = True):

    # v2 API https://github.com/tpoisonooo/HuixiangDou2/blob/main/huixiangdou/pipeline/parallel.py#L135
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Pair] = [],
                       request_id: str = 'default',
                       language: str = 'zh_cn'):


๐Ÿ€ Acknowledgements

  • SiliconCloud Abundant LLM API, some models are free
  • KAG Graph retrieval based on reasoning
  • DB-GPT LLM tool collection
  • LightRAG Simple and efficient graph retrieval solution
  • SeedBench A multi-task benchmark for evaluating LLMs in seed science
  • kimi-cli AI coding assistant by kimi

๐Ÿ“ Citation

Note: the impact of open source varies across fields and industries. Due to licensing restrictions, we can only release the code and test conclusions; the test data cannot be provided.

@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024labelingsupervisedfinetuningdata,
      title={Labeling supervised fine-tuning data with the scaling law}, 
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.02817}, 
}

@misc{kong2025huixiangdou2robustlyoptimizedgraphrag,
      title={HuixiangDou2: A Robustly Optimized GraphRAG Approach}, 
      author={Huanjun Kong and Zhefan Wang and Chenyang Wang and Zhe Ma and Nanqing Dong},
      year={2025},
      eprint={2503.06474},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2503.06474}, 
}