Research Vault

January 8, 2026

An agentic AI research assistant that extracts structured patterns from papers and enables cross-document synthesis through natural language queries. Built with LLM orchestration, RAG (Retrieval-Augmented Generation), and vector search.

Overview

Research Vault transforms unstructured research content (PDFs, articles, notes) into a queryable knowledge base. Unlike traditional RAG systems that chunk and retrieve raw text, Research Vault uses structured extraction to pull discrete patterns with metadata, enabling true cross-paper synthesis.

Key differentiator: Instead of retrieving text chunks, Research Vault extracts findings (Claim → Evidence → Context) and synthesizes answers across your entire library with citations.
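To make the Claim → Evidence → Context shape concrete, here is a minimal sketch of how one extracted finding might be represented. The class name `Finding`, the `source_title` and `tags` fields, and both helper functions are assumptions for illustration; the project's actual schema is described in its Domain Model doc and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One extracted pattern: a claim, its supporting evidence, and its context."""

    claim: str
    evidence: str
    context: str
    source_title: str  # hypothetical field: the paper this finding came from
    tags: tuple[str, ...] = ()  # hypothetical field: topics used for filtering

    def cite(self) -> str:
        """Render the claim followed by its source in square brackets."""
        return f"{self.claim} [{self.source_title}]"


def findings_by_tag(findings: list[Finding], tag: str) -> list[Finding]:
    """Return the findings that carry the given tag, in their original order."""
    return [f for f in findings if tag in f.tags]
```

Because each finding is a discrete record rather than a text chunk, it can be filtered, compared, and cited independently of the paper it came from.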

Why I Built This

I was drowning in research papers. I'd read something important, forget where I saw it, and spend hours searching through PDFs.

Existing tools didn't cut it. Note-taking apps require too much manual effort. Tools like NotebookLM are great for Q&A but don't extract structured patterns I can query later. Traditional RAG systems just return text chunks without real synthesis.

I didn't want to manually organize my research—I wanted a system that does it for me. So I built Research Vault. It automatically extracts structured insights from papers, stores them locally, and lets me ask questions like "where do authors disagree about agent memory?" and get synthesized answers with citations.

This is an open-source version of my personal tool, simplified for others facing the same problem.

Core Features

Ingest research papers (PDF or text)
Extract structured patterns (Claim, Evidence, Context)
Store in hybrid database (relational + vector)
Query with natural language across your entire library
Synthesize cross-document patterns with source attribution
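The hybrid storage idea in the list above can be sketched in a few lines: structured rows live in a relational table, embeddings live in a vector index, and a shared ID joins the two at query time. This is only an illustration under stated assumptions. It uses the standard-library `sqlite3` module and a plain in-memory dictionary with cosine similarity as a stand-in for Qdrant, and the `HybridStore` class, the table layout, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HybridStore:
    """Relational rows in SQLite plus an in-memory vector index keyed by row ID."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE findings (id INTEGER PRIMARY KEY, paper TEXT, claim TEXT)"
        )
        # Stand-in for a vector database: row ID -> embedding.
        self.vectors: dict[int, list[float]] = {}

    def add(self, paper: str, claim: str, embedding: list[float]) -> int:
        """Insert the structured row, then index its embedding under the same ID."""
        cur = self.db.execute(
            "INSERT INTO findings (paper, claim) VALUES (?, ?)", (paper, claim)
        )
        self.db.commit()
        self.vectors[cur.lastrowid] = embedding
        return cur.lastrowid

    def search(
        self, query_embedding: list[float], top_k: int = 3
    ) -> list[tuple[str, str, float]]:
        """Rank by vector similarity, then fetch the matching rows from SQLite."""
        scored = [(fid, cosine(query_embedding, vec)) for fid, vec in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results = []
        for fid, score in scored[:top_k]:
            paper, claim = self.db.execute(
                "SELECT paper, claim FROM findings WHERE id = ?", (fid,)
            ).fetchone()
            results.append((paper, claim, score))
        return results
```

In a real deployment the embeddings would come from an embedding model rather than being written by hand, and the vector lookup would be delegated to the vector database.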

Tech Stack

Layer	Technology
Backend	Python 3.12, FastAPI
LLM Orchestration	LangGraph, LangChain
Frontend	Next.js 16, React 19, TypeScript
Database	SQLite (MVP) → PostgreSQL
Vector Database	Qdrant (semantic search)
LLM	Claude (Anthropic API)
Embeddings	OpenAI text-embedding-3-small

Architecture highlights:

Agentic workflow with multi-step LLM orchestration
Hybrid RAG combining relational + vector storage
3-pass extraction pipeline with citation verification
Async Python throughout (aiosqlite, async clients)
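This README does not spell out what the three extraction passes do or how citation verification is implemented. As one plausible illustration of the idea, the sketch below checks that each evidence quote produced by an extraction step actually occurs in the source text, after normalizing case and whitespace. The function names and the exact-substring matching rule are assumptions, not the project's method.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citations(evidence_quotes: list[str], source_text: str) -> dict[str, bool]:
    """Map each quote to True if it appears in the source text, False otherwise."""
    haystack = normalize(source_text)
    return {quote: normalize(quote) in haystack for quote in evidence_quotes}
```

A quote that maps to `False` is a candidate hallucination and could be dropped or flagged for manual review before the finding is stored.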

Status

✅ Production-ready for personal use - Beta for multi-user features

619 backend tests + 23 frontend tests
7 comprehensive docs (Architecture, API, Operations, etc.)
Docker deployment with health checks
CI/CD pipeline (linting, testing, building)

Contributions welcome! See CONTRIBUTING.md

Design Notes

Single-user system - Designed for individual researchers
Local-first - All data stored on your machine, no cloud dependency
Extensible - Fork and customize for your research needs

Quick Start

See Getting Started for detailed setup instructions.

TL;DR:

git clone https://github.com/aakashsharan/research-vault.git
cd research-vault
cp .env.example .env  # Add your API keys
docker compose up --build

Open http://localhost:3000

What You'll See

Upload page - Drop a PDF or paste text
Review page - Approve/reject extracted patterns
Library page - Browse your knowledge base
Query page - Ask questions, get synthesized answers

Production Features

Research Vault isn't just a demo—it's built for daily use:

✅ Partial success handling - Paper saved even if extraction fails
✅ Graceful degradation - Continues working if vector DB is unavailable
✅ Retry logic - LLM calls retry with exponential backoff
✅ Error boundaries - Frontend handles API failures gracefully
✅ Health checks - /health endpoint monitors all dependencies
✅ Comprehensive logging - Debug production issues easily
✅ Integration tests - Full pipeline tested end-to-end
✅ Security scanning - Pre-commit hooks prevent secret leaks
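As an illustration of the retry behavior listed above, here is a minimal async sketch of exponential backoff with a small amount of jitter. The function name `retry_with_backoff`, its parameters, and the default values are assumptions; the project's real retry policy (which exceptions it retries, how many attempts, what delays) is not stated in this README.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await call(); on failure wait base_delay * 2**attempt (capped), then retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2**attempt)
            # Add up to 10% jitter so concurrent callers do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("unreachable")
```

A production version would usually catch only transient errors (timeouts, rate limits) rather than every `Exception`, so that programming errors fail fast.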

See OPERATIONS.md for deployment and troubleshooting.

Documentation

Requirements - Functional and non-functional specs
Architecture - System design and flows
Domain Model - Entity schemas
Operations - Deployment and troubleshooting
Contributing - How to contribute

Community & Support

🐛 Bug reports: GitHub Issues
📖 Documentation: docs/
🤝 Contributing: See CONTRIBUTING.md

Built by @aakashsharan - Applying 12+ years of distributed systems experience to agentic AI challenges.

Research Vault builds on ideas from:

ReAct (Yao et al.) - Interleaving reasoning and action
NotebookLM - Excellent Q&A over documents, different approach to synthesis
Obsidian/Roam - Graph-based note-taking, manual organization
Zotero/Mendeley - Reference management, no LLM synthesis

See SRAL Framework GitHub and SRAL Framework Paper for the evaluation framework that inspired this work.

License

MIT License - see LICENSE for details.
