May 12, 2026
DOI Citation Verifier MCP Server
A Model Context Protocol (MCP) server that guards against citation hallucination by checking academic citations against 9 scholarly databases. It lets AI assistants verify each citation against real publications before citing it.
Quick Install
npx -y github:tfscharff/doi-mcp
Or add to your Claude Desktop config:
{
  "mcpServers": {
    "doi-mcp": {
      "command": "npx",
      "args": ["-y", "github:tfscharff/doi-mcp"]
    }
  }
}
The Problem This Solves
Large language models sometimes "hallucinate" academic citations: citing papers that don't exist, attributing real titles to the wrong authors, or mixing up publication details. This MCP server addresses the problem with:
- 9-database verification: Checks citations across CrossRef, OpenAlex, PubMed, zbMATH, ERIC, HAL, INSPIRE-HEP, Semantic Scholar, and DBLP
- Parallel search: Queries all databases simultaneously for fast results (~1 second)
- Comprehensive coverage: 600+ million indexed records (with overlap between databases) across all disciplines including STEM, humanities, social sciences, and education
- DOI-backed citations: Every verified citation includes a valid, clickable DOI
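As a rough illustration of what a "clickable DOI" involves, the sketch below turns a DOI string into a https://doi.org/ link. The function name doiToUrl is hypothetical and not part of this server. The pattern is a widely used shape for modern DOIs and checks syntax only, so it cannot tell whether a DOI actually resolves to a publication.

```typescript
// Hypothetical helper, not taken from the doi-mcp source.
// Syntax check only: a well-formed DOI may still point to nothing.
const DOI_PATTERN = /^10\.\d{4,9}\/[-._;()\/:a-z0-9]+$/i;

function doiToUrl(input: string): string | null {
  // Strip common prefixes such as "doi:" or an existing doi.org URL.
  const doi = input
    .trim()
    .replace(/^(?:https?:\/\/(?:dx\.)?doi\.org\/|doi:\s*)/i, "");
  return DOI_PATTERN.test(doi) ? `https://doi.org/${doi}` : null;
}

// Example with a placeholder DOI (10.1000 is used here purely for illustration).
console.log(doiToUrl("doi:10.1000/xyz123"));
console.log(doiToUrl("not-a-doi"));
```

Syntactic validation like this is only a cheap first filter; the database lookup is what establishes that the citation exists.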
Features
- 9 Database Search: CrossRef, OpenAlex, PubMed, zbMATH, ERIC, HAL, INSPIRE-HEP, Semantic Scholar, DBLP
- Verify Citations: Check if a paper with specific details actually exists across all databases
- Find Verified Papers: Search for real papers on a topic and get only verified citations
- Parallel Processing: All database queries run simultaneously for maximum speed
- LRU Caching: Repeated queries return instantly (5-minute TTL)
- Early Exit: High-confidence matches (score ≥ 8) return immediately without waiting for all databases
- Source Selection: Search all databases or target specific sources
- Citation Formatting: Returns properly formatted citations with DOIs
- Zero Configuration: All databases work out-of-the-box with no API keys required
- Fully Tested: 41 tests covering scoring, caching, database adapters, and tool integration
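The LRU caching feature can be pictured with a small sketch. This is not the code in src/cache.ts: the class name TtlLruCache, its constructor parameters, and the injectable clock are assumptions made for illustration. It relies on the fact that a JavaScript Map iterates in insertion order.

```typescript
// Hypothetical sketch of an LRU cache with a time-to-live per entry.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which makes expiry easy to test
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // entry is older than the TTL
      return undefined;
    }
    // Re-insert so this key becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// A 5-minute TTL, matching the figure quoted in the feature list; the size is arbitrary.
const cache = new TtlLruCache<string>(100, 5 * 60 * 1000);
cache.set("crispr gene editing", "cached result");
```

In this sketch a read refreshes recency but not the expiry time, so an entry still expires 5 minutes after it was written.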
How It Works
When an AI assistant is asked about research or for citations:
- Without this MCP: The assistant might cite "According to Smith et al. (2023) in Nature..." referencing a paper that doesn't exist
- With this MCP: The assistant uses verifyCitation first, which searches across 9 databases in parallel and returns:
  - Verified match with full DOI → Can be cited
  - No match found → Cannot cite; must search for real papers instead
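The parallel search with early exit described here and in the feature list could look roughly like the sketch below. The Match shape, the Search type, and the function name searchWithEarlyExit are assumptions for illustration; only the threshold of 8 comes from this README, and the real orchestrator in src/databases/index.ts may work differently.

```typescript
// Hypothetical shapes: each "search" stands in for one database adapter.
interface Match {
  source: string;
  score: number;
}
type Search = () => Promise<Match | null>;

// Start every search at once. Resolve as soon as one match reaches the
// threshold; otherwise wait for all of them and return the best match seen.
function searchWithEarlyExit(searches: Search[], threshold = 8): Promise<Match | null> {
  return new Promise((resolve) => {
    let best: Match | null = null;
    let pending = searches.length;
    if (pending === 0) {
      resolve(null);
      return;
    }
    for (const search of searches) {
      search()
        .catch(() => null) // a failing database counts as "no match"
        .then((match) => {
          if (match !== null && (best === null || match.score > best.score)) {
            best = match;
          }
          if (match !== null && match.score >= threshold) {
            resolve(match); // early exit: later resolve() calls are ignored
          }
          pending -= 1;
          if (pending === 0) resolve(best);
        });
    }
  });
}
```

A promise settles only once, so calling resolve again after an early exit has no effect. In this sketch the slower requests are not cancelled; they simply finish in the background.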
Tools
verifyCitation
Primary anti-hallucination tool: checks multiple databases to confirm that a citation exists before the assistant mentions it.
Input:
- title (string, optional): Paper title (partial matches accepted)
- authors (array, optional): Author names (last names sufficient)
- year (number, optional): Publication year
- doi (string, optional): DOI if known
- journal (string, optional): Journal name
Returns JSON with:
- verified: true/false
- If verified=true: DOI, title, authors, year, journal, URL, source database
- If verified=false: Warning message that no matching publication was found
- Match quality indicators for transparency
Example successful verification:
{
  "verified": true,
  "doi": "10.1038/s41586-024-07487-w",
  "title": "Accurate structure prediction of biomolecular interactions...",
  "authors": ["Josh Abramson", "Jonas Adler", "..."],
  "year": 2024,
  "journal": "Nature",
  "url": "https://doi.org/10.1038/s41586-024-07487-w",
  "source": "crossref",
  "message": "✓ Citation verified"
}
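One way a client might consume this result is sketched below. The type names and the toInlineCitation helper are hypothetical; the field names are inferred from the example response, and the assumption that a failed verification carries its warning in a message field is a guess, not something the server documents here.

```typescript
// Assumed shapes, inferred from the example response; the real server may differ.
interface VerifiedCitation {
  verified: true;
  doi: string;
  title: string;
  authors: string[];
  year: number;
  journal: string;
  url: string;
  source: string;
}
interface UnverifiedCitation {
  verified: false;
  message: string; // assumed field name for the warning text
}
type VerifyResult = VerifiedCitation | UnverifiedCitation;

// Returns a short citation string, or null when there is nothing safe to cite.
function toInlineCitation(result: VerifyResult): string | null {
  if (!result.verified) return null;
  const [first, ...rest] = result.authors;
  const lastName = (first ?? "Unknown").trim().split(/\s+/).pop() ?? "Unknown";
  const who = rest.length > 0 ? `${lastName} et al.` : lastName;
  return `${who} (${result.year}), ${result.journal}. ${result.url}`;
}
```

Modelling the result as a union keyed on verified means the compiler refuses access to doi or url until the unverified case has been handled.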
findVerifiedPapers
Search for real papers on a topic and return only verified citations with DOIs from multiple databases.
Input:
- query (string): Search query (topic, keywords, author names)
- source (string, optional): Which database to search - "all" (default), "crossref", "openalex", "pubmed", "zbmath", "eric", "hal", "inspirehep", "semanticscholar", or "dblp"
- limit (number, optional): Number of results per source (1-20, default: 5)
- yearFrom (number, optional): Minimum publication year
- yearTo (number, optional): Maximum publication year
Returns: Array of verified papers from the specified database(s) with complete citation information including source
Example:
// Search all 9 databases
findVerifiedPapers({ query: "CRISPR gene editing", limit: 5 })
// Search only PubMed for biomedical papers
findVerifiedPapers({ query: "cancer immunotherapy", source: "pubmed", limit: 10 })
// Search zbMATH for mathematics papers
findVerifiedPapers({ query: "algebraic topology", source: "zbmath" })
// Search DBLP for computer science papers
findVerifiedPapers({ query: "neural networks", source: "dblp", yearFrom: 2020 })
// Search ERIC for education research
findVerifiedPapers({ query: "active learning pedagogy", source: "eric" })
// Search HAL for French/European humanities research
findVerifiedPapers({ query: "phenomenology Husserl", source: "hal" })
// Search INSPIRE-HEP for high-energy physics papers
findVerifiedPapers({ query: "Higgs boson", source: "inspirehep" })
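The input rules above (allowed sources, a limit of 1-20 defaulting to 5, an optional year range) can be expressed as a small normalization step. This is a sketch under assumptions: normalizeFindInput is a hypothetical name, and clamping an out-of-range limit rather than rejecting it is a design choice made here, not documented server behavior.

```typescript
// Source names as listed in the findVerifiedPapers input description.
const SOURCES = [
  "all", "crossref", "openalex", "pubmed", "zbmath",
  "eric", "hal", "inspirehep", "semanticscholar", "dblp",
] as const;
type Source = (typeof SOURCES)[number];

interface FindInput {
  query: string;
  source?: string;
  limit?: number;
  yearFrom?: number;
  yearTo?: number;
}
interface NormalizedFindInput {
  query: string;
  source: Source;
  limit: number;
  yearFrom: number | null;
  yearTo: number | null;
}

function isSource(value: string): value is Source {
  return (SOURCES as readonly string[]).includes(value);
}

// Hypothetical helper: applies defaults and rejects inputs that cannot be searched.
function normalizeFindInput(input: FindInput): NormalizedFindInput {
  const query = input.query.trim();
  if (query === "") throw new Error("query must not be empty");

  const source = (input.source ?? "all").toLowerCase();
  if (!isSource(source)) throw new Error(`unknown source: ${source}`);

  // Assumption: out-of-range limits are clamped to 1-20 instead of rejected.
  const rawLimit = input.limit ?? 5;
  const limit = Number.isFinite(rawLimit)
    ? Math.min(20, Math.max(1, Math.trunc(rawLimit)))
    : 5;

  const yearFrom = input.yearFrom ?? null;
  const yearTo = input.yearTo ?? null;
  if (yearFrom !== null && yearTo !== null && yearFrom > yearTo) {
    throw new Error("yearFrom must not be later than yearTo");
  }
  return { query, source, limit, yearFrom, yearTo };
}
```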
Installation
Add to your Claude Desktop config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "doi-mcp": {
      "command": "npx",
      "args": ["-y", "github:tfscharff/doi-mcp"]
    }
  }
}
Restart Claude Desktop and the server will be available.
Alternative: Global Install
npm install -g github:tfscharff/doi-mcp
Then use this config:
{
  "mcpServers": {
    "doi-mcp": {
      "command": "doi-mcp"
    }
  }
}
Alternative: Clone Locally
git clone https://github.com/tfscharff/doi-mcp.git
cd doi-mcp
npm install
npm run build
Config for local install:
{
  "mcpServers": {
    "doi-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/doi-mcp/dist/index.js"]
    }
  }
}
Troubleshooting
Server not connecting
- Check Node.js is installed: node --version (requires v18+)
- Check Claude Desktop logs:
  - Windows: %APPDATA%\Claude\logs\
  - macOS: ~/Library/Logs/Claude/
  - Linux: ~/.config/Claude/logs/
npx command fails
npm cache clean --force
Testing locally
npx @modelcontextprotocol/inspector node dist/index.js
Development
# Install dependencies
npm install
# Build
npm run build
# Development with watch mode
npm run dev
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
Architecture
Version 4.0 uses a modular architecture for maintainability and testability:
src/
├── index.ts          # Entry point
├── server.ts         # MCP server setup
├── types.ts          # Shared interfaces
├── scoring.ts        # Match scoring algorithm
├── cache.ts          # LRU cache (5-min TTL)
├── http.ts           # Fetch utilities
├── tools/            # Tool handlers
│   ├── verifyCitation.ts
│   ├── batchVerifyCitations.ts
│   └── findVerifiedPapers.ts
└── databases/        # Database adapters
    ├── index.ts      # Parallel query orchestrator
    ├── crossref.ts
    ├── openalex.ts
    ├── pubmed.ts
    └── ... (9 adapters total)
Adding a new database:
- Create src/databases/newdb.ts with config, search(), and normalize()
- Import and add to src/databases/index.ts
- Add tests in tests/databases/newdb.test.ts
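To make the first step more concrete, here is a minimal sketch of what such an adapter might look like. Everything beyond the three member names config, search(), and normalize() is an assumption: the DatabaseAdapter and Paper interfaces, the NewDbRecord response shape, and the api.newdb.example endpoint are invented for illustration, so check src/types.ts and an existing adapter for the real contracts.

```typescript
// Hypothetical shared types; the project's own definitions live in src/types.ts.
interface Paper {
  title: string;
  authors: string[];
  year: number | null;
  doi: string;
  source: string;
}
interface DatabaseAdapter<Raw> {
  config: { name: string; baseUrl: string };
  search(query: string, limit: number): Promise<Raw[]>;
  normalize(raw: Raw): Paper | null;
}

// Invented response record for an imaginary "newdb" API.
interface NewDbRecord {
  title?: string;
  creators?: string[];
  published?: string; // e.g. "2021-05-01"
  doi?: string;
}

const newdb: DatabaseAdapter<NewDbRecord> = {
  config: { name: "newdb", baseUrl: "https://api.newdb.example/search" },

  // Fetch raw records; the query parameters are placeholders.
  async search(query, limit) {
    const url = `${this.config.baseUrl}?q=${encodeURIComponent(query)}&rows=${limit}`;
    const res = await fetch(url);
    if (!res.ok) return [];
    const body = (await res.json()) as { results?: NewDbRecord[] };
    return body.results ?? [];
  },

  // Map one raw record to the shared shape, dropping records without a DOI.
  normalize(raw) {
    if (!raw.title || !raw.doi) return null;
    const parsedYear = Number.parseInt((raw.published ?? "").slice(0, 4), 10);
    return {
      title: raw.title.trim(),
      authors: raw.creators ?? [],
      year: Number.isNaN(parsedYear) ? null : parsedYear,
      doi: raw.doi.toLowerCase(), // DOIs are case-insensitive
      source: "newdb",
    };
  },
};
```

Keeping normalize() free of network calls means the tests in tests/databases/newdb.test.ts can exercise it with fixture records alone.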
Example Usage
Before this MCP (citation hallucination):
User: "Tell me about recent AlphaFold research"
Assistant: "According to Johnson et al. (2024) in Science, AlphaFold3 achieved..."
✗ This paper doesn't exist
After this MCP (verified citations only):
User: "Tell me about recent AlphaFold research"
Assistant: [Uses findVerifiedPapers tool]
"According to Abramson et al. (2024) in Nature (DOI: 10.1038/s41586-024-07487-w),
AlphaFold3 achieved..."
✓ Real paper with valid DOI verified across databases
Verification catches fake citations:
User: "Can you verify this citation: Smith et al. (2024), 'Quantum AI', Nature"
Assistant: [Uses verifyCitation tool - searches all 9 databases in parallel]
"✗ I cannot verify this citation - no matching publication found in
any of the 9 databases. This citation may be incorrect."
Database Coverage
All databases are queried in parallel for maximum speed (~1 second total):
General Databases
- CrossRef: 150+ million scholarly publications across all disciplines
- OpenAlex: 250+ million scholarly works across all disciplines
- Semantic Scholar: 200+ million papers with AI-powered search
Specialized Databases
- PubMed: 35+ million biomedical and life sciences publications
- zbMATH: 4+ million mathematics publications
- DBLP: Comprehensive computer science bibliography (journals and conferences)
- ERIC: 1.7+ million education research publications
- HAL: 4.4+ million French/European scholarly documents (2.5 million in English)
- INSPIRE-HEP: 1.7+ million high-energy physics publications
Total Coverage
600+ million indexed records in total across all academic disciplines, with specialized depth in STEM, computer science, biomedical sciences, mathematics, and education research. The databases overlap, so the number of unique publications is lower than the sum.
License
MIT
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Related
API Documentation
- CrossRef API
- OpenAlex API
- PubMed API
- zbMATH API
- ERIC API
- HAL API
- INSPIRE-HEP API
- Semantic Scholar API
- DBLP API