🤖 Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning 🚀

June 29, 2026

Introduction 🌟

Agentic RAG‑R1 is an open‑source initiative to build an Agentic Retrieval‑Augmented Generation (RAG) system by endowing a base language model with autonomous search & reasoning skills through reinforcement learning (currently using the GRPO algorithm).

Chinese Language Version:

Chinese version results

English Language Version:

English version results

What is Agentic RAG? 💡

Agentic RAG combines two powerful concepts:

Retrieval‑Augmented Generation (RAG): Combines generative power with on‑the‑fly retrieval from external knowledge bases, ensuring factual and up‑to‑date answers.
Agentic AI: Gives the model the ability to decide when to retrieve, what to retrieve, and how to weave the retrieved evidence into its reasoning.
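
The two ideas above can be sketched as a small control loop. This is only an illustrative sketch, not the project's implementation: the names `agentic_answer`, `decide`, `retrieve`, and `generate`, and the toy knowledge base, are all assumptions introduced here. In the real system the decision of when and what to retrieve is made by the trained language model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory knowledge base standing in for a real search tool.
KB: Dict[str, List[str]] = {
    "capital of france": ["Paris is the capital of France."],
}


def agentic_answer(
    question: str,
    decide: Callable[[str, List[str]], Optional[str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> str:
    """Let the policy choose when and what to retrieve before answering."""
    evidence: List[str] = []
    for _ in range(max_steps):
        query = decide(question, evidence)  # None means "stop retrieving"
        if query is None:
            break
        evidence.extend(retrieve(query))
    return generate(question, evidence)


# Toy stand-ins for the model's decisions (assumptions for illustration only).
def toy_decide(question: str, evidence: List[str]) -> Optional[str]:
    return None if evidence else question.lower().rstrip("?")


def toy_retrieve(query: str) -> List[str]:
    return KB.get(query, [])


def toy_generate(question: str, evidence: List[str]) -> str:
    return evidence[0] if evidence else "I don't know."


if __name__ == "__main__":
    print(agentic_answer("Capital of France?", toy_decide, toy_retrieve, toy_generate))
```

The `max_steps` bound keeps the loop finite even when retrieval never returns anything useful.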

Agentic RAG concept

Architecture 🏗️

Our architecture is inspired by TC‑RAG and features an agent memory stack that orchestrates the full deliberation loop, supporting the following actions:

Plan (❌)
Reasoning (✅)
Backtrack (✅)
Summary (✅)
Tool Observation – wiki/document/knowledge‑graph search, etc. (✅)
Conclusion (✅)
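
A minimal sketch of how such a memory stack might handle the supported actions is shown below. The class and field names (`MemoryStack`, `Step`, `apply`) are hypothetical, and the semantics chosen here (backtrack pops the latest step, summary collapses the stack into one entry) are assumptions for illustration rather than the behavior of this repository or of TC‑RAG. Plan is left out because it is marked as not yet supported.

```python
from dataclasses import dataclass, field
from typing import List

# Actions marked as supported in the list above (Plan is excluded).
ACTIONS = {"reasoning", "backtrack", "summary", "tool_observation", "conclusion"}


@dataclass
class Step:
    action: str
    content: str


@dataclass
class MemoryStack:
    steps: List[Step] = field(default_factory=list)

    def apply(self, action: str, content: str = "") -> None:
        if action not in ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        if action == "backtrack":
            # Assumed semantics: discard the most recent step, if any.
            if self.steps:
                self.steps.pop()
        elif action == "summary":
            # Assumed semantics: collapse the stack into a single summary entry.
            self.steps = [Step("summary", content)]
        else:
            self.steps.append(Step(action, content))

    @property
    def done(self) -> bool:
        # The loop ends once a conclusion sits on top of the stack.
        return bool(self.steps) and self.steps[-1].action == "conclusion"


if __name__ == "__main__":
    memory = MemoryStack()
    memory.apply("reasoning", "Need the drug's mechanism of action.")
    memory.apply("tool_observation", "Irrelevant search result.")
    memory.apply("backtrack")
    memory.apply("conclusion", "Answer: B")
    print([step.action for step in memory.steps], memory.done)
```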

Architecture diagram

Training Strategy 🧠

Motivated by DeepSeek-R1, we apply GRPO (Group Relative Policy Optimization) to reinforce the agent's choice of reasoning steps and retrieval actions, improving both search depth and answer quality.
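
The core of GRPO is that several rollouts are sampled for the same prompt and each rollout's reward is normalized against its own group, so no separate value network is needed. The sketch below shows only that normalization step. The function name, the use of the population standard deviation, and the `eps` value are assumptions made here; the training code in this repository may differ.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against the other rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when every rollout receives the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one question, scored by some reward function.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Rollouts that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. When all rollouts score the same, every advantage is zero and the group contributes no learning signal.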

Training strategy diagram

Rollout Generation 🔄

Rollout generation diagram

Installation 🛠️

We use conda to manage the environment. Follow these steps to set up:

conda create -n AgenticRAG python=3.11 -y
conda activate AgenticRAG 
pip install -r requirements.txt

Tools Environment (Optional) 🧰

We provide ArtSearch, our search tool repository, as the search engine; it supports retrieving information from Wikipedia. To deploy a local instance of the search system, follow the instructions in that repository.

Folder Structure 📁

.
├── ArtSearch                 # Search tool integration
├── checkpoints               # Model checkpoints
├── examples                  # Example use cases
├── experiments
│   ├── evaluation            # Evaluation scripts and results
│   └── training              # Training configurations
├── README.md
├── requirements.txt
├── script
│   ├── evaluation            # Evaluation scripts
│   ├── run_server.sh         # Server deployment script
│   └── training              # Training scripts
├── service
│   ├── chat_client.py        # Client for interacting with the model
│   └── chat_server.py        # Server for hosting the model
├── src
│   ├── config                # Configuration files
│   ├── data                  # Data processing utilities
│   ├── evaluation            # Evaluation metrics and tools
│   ├── models                # Model definitions
│   ├── train.py              # Main training script
│   └── utils                 # Utility functions

Quick Start ⚡

Follow the steps below to get up and running with Agentic RAG‑R1.

Before you start, rename the file ".env_format" to ".env" and fill in the required environment variables.

Training

Zero‑2 Mode

./script/training/train_zero2.sh

Zero‑3 Mode

./script/training/train_zero3.sh

Inference

Example Mode

Coming soon.

Server Mode

Launch the chat server:

./script/run_server.sh

Features ✨

LoRA Tuning Support 🔧: Fine-tune efficiently with Low-Rank Adaptation
Model Quantization Support 💻: Supports quantizing the model to NF4 and ..
Custom Agent Tools 🛠️: Integrate your own tools and personal RAG datasets
Distributed Training 🌐: Supports DeepSpeed ZeRO Stage 2 and ZeRO Stage 3
Efficient Resource Usage 💻: Support for models up to 32B parameters using only 2 A100 GPUs
Tool Calling Reward 🎯: Enhanced reward model that includes:
- Accuracy reward
- Format reward
- RAG accuracy reward using the RAGAS framework
The total reward is calculated as:

$r_{total} = r_{accuracy} + r_{format} + r_{rag}$
TCRAG Integration 🔗: Use TCRAG as the rollout generator
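
The total reward described above is a plain sum of three terms, which can be sketched as follows. Everything specific in this sketch is an assumption for illustration: the `<answer>...</answer>` output format, the exact-match accuracy check, and the function names are hypothetical, and the RAG term is passed in as a precomputed score instead of calling RAGAS.

```python
import re
from dataclasses import dataclass

# Hypothetical output format: the final answer is wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


@dataclass
class RewardBreakdown:
    accuracy: float
    format: float
    rag: float

    @property
    def total(self) -> float:
        # r_total = r_accuracy + r_format + r_rag
        return self.accuracy + self.format + self.rag


def accuracy_reward(prediction: str, reference: str) -> float:
    """1.0 on a case-insensitive exact match, else 0.0 (assumed scoring rule)."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def format_reward(completion: str) -> float:
    """1.0 if the completion contains a well-formed answer tag, else 0.0."""
    return 1.0 if ANSWER_PATTERN.search(completion) else 0.0


def score(completion: str, reference: str, rag_score: float) -> RewardBreakdown:
    """Combine the three terms; rag_score stands in for a RAGAS-derived value."""
    match = ANSWER_PATTERN.search(completion)
    prediction = match.group(1) if match else ""
    return RewardBreakdown(
        accuracy=accuracy_reward(prediction, reference),
        format=format_reward(completion),
        rag=rag_score,
    )


if __name__ == "__main__":
    result = score("Reasoning... <answer> B </answer>", "b", rag_score=0.5)
    print(result, result.total)
```

Keeping the three terms separate before summing makes it easy to log each component during training and to see which one drives a change in the total.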

Results 📊

Experiment Log on Qwen 2.5-7B-Instruct

Experiment log

We have made our training logs publicly available at: SwanLab Training Log

Results on MedQA Test Set 🏥

Our Qwen 2.5-7B-Instruct model was evaluated on the MedQA test set using Qwen‑2.5‑72B as the judge:

Configuration	Format Accuracy	Answer Accuracy
Before fine-tuning	39%	84%
Before fine-tuning + search	56%	79%
After fine-tuning (200 steps) + search	92%	87%

Roadmap 🗺️

Add more tools
[Additional planned features]

Acknowledgements 🙏

The concept of Agentic-RAG-R1 is inspired by DeepSeek-R1 and TC-RAG. We sincerely thank these teams for their contributions to open-source research and development. This work was developed concurrently with Search-R1 and ReSearch.

Contributors 📝

Supervisors: Junfeng Zhao, Xu Chu, Yasha Wang

Affiliation: Key Laboratory of High Confidence Software Technologies (Peking University), School of Computer Science, Peking University, China

Citation 📝

If you use this work in your research, please cite:

@misc{Agentic_RAG_R1,
  title       = {Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning},
  author      = {Xinke Jiang and Jiaran Gao and Rihong Qiu and Zhixin Zhang and Wentao Zhang and Yue Fang and Hongxin Ding},
  year        = {2025},
  howpublished= {\url{https://github.com/jiangxinke/Agentic-RAG-R1}},
  note        = {GitHub repository},
}

🌟 Star History

License 📄

This project is licensed under the Apache License. See the LICENSE file for details.