Research Vault
January 8, 2026
An agentic AI research assistant that extracts structured patterns from papers and enables cross-document synthesis through natural language queries. Built with LLM orchestration, RAG (Retrieval-Augmented Generation), and vector search.
Overview
Research Vault transforms unstructured research content (PDFs, articles, notes) into a queryable knowledge base. Unlike traditional RAG systems that chunk and retrieve raw text, Research Vault uses structured extraction to pull out discrete patterns with metadata, so a query can compare and combine findings across papers rather than just return matching passages.
Key differentiator: Instead of retrieving text chunks, Research Vault extracts findings (Claim → Evidence → Context) and synthesizes answers across your entire library with citations.
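A minimal sketch of what one extracted pattern could look like, assuming a frozen dataclass. The `Pattern` fields and the `evidence_is_grounded` check are hypothetical illustrations, not Research Vault's actual schema or verification logic. Requiring the evidence to be a verbatim quote from the source is one simple way to reject claims the LLM invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """One extracted finding (hypothetical field names, not the project's schema)."""

    claim: str     # what the paper asserts
    evidence: str  # verbatim quote from the paper that supports the claim
    context: str   # scope or conditions under which the claim holds
    paper_id: str  # source attribution, used later for citations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks don't defeat matching."""
    return " ".join(text.split()).lower()


def evidence_is_grounded(pattern: Pattern, source_text: str) -> bool:
    """Accept a pattern only if its evidence quote actually appears in the source."""
    return normalize(pattern.evidence) in normalize(source_text)


# Made-up sample text, for illustration only.
source = """Agents that wrote summaries to a persistent memory store
recalled earlier user preferences in later sessions."""

candidates = [
    Pattern(
        claim="Persistent memory helps agents recall preferences across sessions",
        evidence="recalled earlier user preferences in later sessions",
        context="Multi-session agent interactions",
        paper_id="paper-001",
    ),
    Pattern(
        claim="Persistent memory makes agents faster",
        evidence="memory made agents twice as fast",  # not in the source text
        context="Unspecified",
        paper_id="paper-001",
    ),
]

kept = [p for p in candidates if evidence_is_grounded(p, source)]
print([p.claim for p in kept])
# ['Persistent memory helps agents recall preferences across sessions']
```

Exact substring matching is deliberately strict; a real pipeline would likely need fuzzier matching, since PDF text extraction often mangles hyphenation and ligatures.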
Why I Built This
I was drowning in research papers. I'd read something important, forget where I saw it, and spend hours searching through PDFs.
Existing tools didn't cut it. Note-taking apps require too much manual effort. Tools like NotebookLM are great for Q&A but don't extract structured patterns I can query later. Traditional RAG systems just return text chunks without real synthesis.
I didn't want to manually organize my research—I wanted a system that does it for me. So I built Research Vault. It automatically extracts structured insights from papers, stores them locally, and lets me ask questions like "where do authors disagree about agent memory?" and get synthesized answers with citations.
This is an open-source version of my personal tool, simplified for others facing the same problem.
Core Features
- Ingest research papers (PDF or text)
- Extract structured patterns (Claim, Evidence, Context)
- Store in hybrid database (relational + vector)
- Query with natural language across your entire library
- Synthesize cross-document patterns with source attribution
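One way to keep source attribution intact through synthesis, sketched under assumptions: retrieved findings are numbered per paper before they reach the LLM, so every `[n]` marker in the answer maps back to a specific paper. `RetrievedPattern`, `build_synthesis_prompt`, and the prompt wording are hypothetical, not Research Vault's actual prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedPattern:
    """Hypothetical shape of a finding returned by retrieval."""

    paper_title: str
    claim: str
    evidence: str


def build_synthesis_prompt(question: str, patterns: list[RetrievedPattern]) -> str:
    """Number sources per paper so each [n] the model cites maps to one paper."""
    # One citation number per distinct paper, in first-seen order.
    numbers: dict[str, int] = {}
    for p in patterns:
        numbers.setdefault(p.paper_title, len(numbers) + 1)

    lines = [f"Question: {question}", "", "Findings:"]
    for p in patterns:
        n = numbers[p.paper_title]
        lines.append(f'[{n}] Claim: {p.claim} | Evidence: "{p.evidence}"')
    lines += ["", "Sources:"]
    lines += [f"[{n}] {title}" for title, n in numbers.items()]
    lines += [
        "",
        "Answer using only the findings above. Cite every statement as [n], "
        "and point out where sources disagree.",
    ]
    return "\n".join(lines)


# Placeholder papers and quotes, for illustration only.
findings = [
    RetrievedPattern("Paper A", "Summarized memory preserves long-term context", "quote from A"),
    RetrievedPattern("Paper B", "Summaries lose details agents later need", "quote from B"),
    RetrievedPattern("Paper A", "Raw transcripts exceed context limits", "another quote from A"),
]
print(build_synthesis_prompt("Where do authors disagree about agent memory?", findings))
```

Numbering by paper rather than by finding keeps the citation list short when one paper contributes several patterns.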
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI |
| LLM Orchestration | LangGraph, LangChain |
| Frontend | Next.js 16, React 19, TypeScript |
| Database | SQLite (MVP) → PostgreSQL |
| Vector Database | Qdrant (semantic search) |
| LLM | Claude (Anthropic API) |
| Embeddings | OpenAI text-embedding-3-small |
Architecture highlights:
- Agentic workflow with multi-step LLM orchestration
- Hybrid RAG combining relational + vector storage
- 3-pass extraction pipeline with citation verification
- Async Python throughout (aiosqlite, async clients)
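A toy sketch of the hybrid layout using only the standard library: SQLite holds the pattern rows, a plain dict stands in for Qdrant, and a hashed bag-of-words function stands in for text-embedding-3-small. `HybridStore`, `toy_embed`, and the keyword fallback are hypothetical illustrations of the idea, not the project's implementation; the fallback also shows one way a system could keep answering queries when the vector database is unavailable.

```python
from __future__ import annotations

import math
import sqlite3
import zlib


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class HybridStore:
    """SQLite holds pattern rows; a dict of vectors stands in for the vector DB."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE patterns (id INTEGER PRIMARY KEY, paper TEXT, claim TEXT)"
        )
        # Set to None to simulate the vector DB being unavailable.
        self.vectors: dict[int, list[float]] | None = {}

    def add(self, paper: str, claim: str) -> int:
        # The relational row is always written, so data survives a vector-DB outage.
        cur = self.db.execute(
            "INSERT INTO patterns (paper, claim) VALUES (?, ?)", (paper, claim)
        )
        pattern_id = cur.lastrowid
        if self.vectors is not None:
            self.vectors[pattern_id] = toy_embed(claim)
        return pattern_id

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        words = query.lower().split()
        if not words:
            return []
        if self.vectors is not None:
            # Semantic path: vectors are unit-length, so the dot product is cosine similarity.
            q = toy_embed(query)
            ranked = sorted(
                self.vectors.items(),
                key=lambda item: sum(a * b for a, b in zip(q, item[1])),
                reverse=True,
            )
            ids = [pattern_id for pattern_id, _ in ranked[:k]]
        else:
            # Degraded path: keyword match on the relational side only.
            # Only fixed "?" placeholders go into the f-string; values are bound.
            where = " OR ".join("lower(claim) LIKE ?" for _ in words)
            rows = self.db.execute(
                f"SELECT id FROM patterns WHERE {where} ORDER BY id LIMIT ?",
                [f"%{w}%" for w in words] + [k],
            ).fetchall()
            ids = [row[0] for row in rows]
        # Join back to the relational rows so each hit carries its paper attribution.
        return [
            self.db.execute(
                "SELECT paper, claim FROM patterns WHERE id = ?", (pattern_id,)
            ).fetchone()
            for pattern_id in ids
        ]


store = HybridStore()
store.add("Paper A", "persistent memory improves agent recall")
store.add("Paper B", "tool use reduces hallucination in agents")
print(store.search("persistent memory improves agent recall", k=1))
```

Keeping the relational database as the source of truth means the vector index can always be rebuilt from it, which is what makes the degraded path possible.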
Status
✅ Production-ready for personal use - Beta for multi-user features
- 619 backend tests + 23 frontend tests
- 7 comprehensive docs (Architecture, API, Operations, etc.)
- Docker deployment with health checks
- CI/CD pipeline (linting, testing, building)
Contributions welcome! See CONTRIBUTING.md
Design Notes
- Single-user system - Designed for individual researchers
- Local-first - All data stored on your machine, no cloud dependency
- Extensible - Fork and customize for your research needs
Quick Start
See Getting Started for detailed setup instructions.
TL;DR:
```bash
git clone https://github.com/aakashsharan/research-vault.git
cd research-vault
cp .env.example .env  # Add your API keys
docker compose up --build
```
What You'll See
- Upload page - Drop a PDF or paste text
- Review page - Approve/reject extracted patterns
- Library page - Browse your knowledge base
- Query page - Ask questions, get synthesized answers
Production Features
Research Vault isn't just a demo—it's built for daily use:
- ✅ Partial success handling - Paper saved even if extraction fails
- ✅ Graceful degradation - Continues working if vector DB is unavailable
- ✅ Retry logic - LLM calls retry with exponential backoff
- ✅ Error boundaries - Frontend handles API failures gracefully
- ✅ Health checks - /health endpoint monitors all dependencies
- ✅ Comprehensive logging - Debug production issues easily
- ✅ Integration tests - Full pipeline tested end-to-end
- ✅ Security scanning - Pre-commit hooks prevent secret leaks
See OPERATIONS.md for deployment and troubleshooting.
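The retry behavior listed above can be sketched with plain `asyncio`. This is an illustration under assumptions, not the project's code: `retry_with_backoff`, its defaults, and the "full jitter" choice are hypothetical, and `flaky_llm_call` simulates an LLM client rather than calling the Anthropic API.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Await call(), retrying failures with exponentially growing, jittered waits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads out retries.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")


async def main() -> None:
    attempts = 0

    async def flaky_llm_call() -> str:
        # Simulated client: fails twice with a transient error, then succeeds.
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise TimeoutError("simulated timeout")
        return "extracted patterns"

    result = await retry_with_backoff(
        flaky_llm_call, base_delay=0.01, retry_on=(TimeoutError,)
    )
    print(result, attempts)  # extracted patterns 3


if __name__ == "__main__":
    asyncio.run(main())
```

In practice `retry_on` would be narrowed to transient errors such as timeouts and rate limits, so malformed requests fail fast instead of being retried.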
Documentation
- Requirements - Functional and non-functional specs
- Architecture - System design and flows
- Domain Model - Entity schemas
- Operations - Deployment and troubleshooting
- Contributing - How to contribute
Community & Support
- 🐛 Bug reports: GitHub Issues
- 📖 Documentation: docs/
- 🤝 Contributing: See CONTRIBUTING.md
Built by @aakashsharan - Applying 12+ years of distributed systems experience to agentic AI challenges.
Related Work
Research Vault builds on ideas from:
- ReAct (Yao et al.) - Interleaving reasoning and action
- NotebookLM - Excellent Q&A over documents, different approach to synthesis
- Obsidian/Roam - Graph-based note-taking, manual organization
- Zotero/Mendeley - Reference management, no LLM synthesis
See SRAL Framework GitHub and SRAL Framework Paper for the evaluation framework that inspired this work.
License
MIT License - see LICENSE for details.
Keywords: agentic ai, llm rag, rag system, vector database, semantic search, knowledge base, research tool, paper management, structured extraction, langchain, langgraph, ai agent, llm orchestration, python fastapi, anthropic claude