MS MARCO
April 23, 2026
If you like our framework, don't hesitate to ⭐ star this repository ⭐. This helps us improve the framework and scale it to more models and methods.
A modular and efficient retrieval, reranking, and RAG framework designed to work with state-of-the-art models for retrieval, ranking, and RAG tasks.
Demo
To run the demo locally:
# Make sure Rankify is installed
pip install streamlit
# Then run the demo
streamlit run demo.py
https://github.com/user-attachments/assets/13184943-55db-4f0c-b509-fde920b809bc
:link: Navigation
- Features
- Roadmap
- Installation
- Quick Start
- Indexing
- Retrievers
- Re-Rankers
- Generators
- Evaluation
- Documentation
- Community Contributing
- Contributing
- License
- Acknowledgments
- Citation
News
- [2026-02-16] Huge thanks to @JamieHoldcroft for integrating 15+ new dense retrievers, including SOTA LLM-based bi-encoders (SFR, E5, GritLM) and reasoning-augmented models (RaDeR, ReasonIR, ReasonEmbed, BGE-Reasoner).
- [2025-10-14] Updated installation with optional extras: retriever, reranking, rag, and all.
- [2025-10-14] New CLI (rankify-index) syntax & examples for BM25, DPR, ANCE, Contriever, ColBERT, BGE.
- [2025-06-11] Many thanks to @tobias124 for implementing Indexing for Custom Dataset.
- [2025-06-01] Many thanks to @aherzinger for implementing and refactoring the Generator and RAG models.
- [2025-05-30] Huge thanks to @baraayusry for implementing the Online Retriever using CrawAI and ReACT.
- [2025-02-10] Released reranking-datasets and reranking-datasets-light on Hugging Face.
- [2025-02-04] Our paper is released on arXiv.
Installation
Set up the virtual environment
First, create and activate a conda environment with Python 3.10:
conda create -n rankify python=3.10
conda activate rankify
Install PyTorch 2.5.1
We recommend using Rankify with PyTorch 2.5.1. Refer to the PyTorch installation page for platform-specific installation commands.
If you have access to GPUs, we recommend installing a CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.
To install PyTorch 2.5.1 with CUDA 12.4, run:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
Basic Installation
To install Rankify, simply use pip (requires Python 3.10+):
pip install rankify
Recommended Installation
For full functionality, we recommend installing Rankify with all dependencies:
pip install "rankify[all]"
This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.
Optional Dependencies
If you prefer to install only specific components, choose from the following:
# Retrieval stack (BM25, dense retrievers, web tools)
pip install "rankify[retriever]"
# Base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`
pip install "rankify[reranking]"
# RAG endpoints (OpenAI, LiteLLM, vLLM clients)
pip install "rankify[rag]"
Or, to install from GitHub for the latest development version:
git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .
# For full functionality we recommend installing Rankify with all dependencies:
pip install -e ".[all]"
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install -e ".[retriever]"
# Base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`
pip install -e ".[reranking]"
# RAG endpoints (OpenAI, LiteLLM, vLLM clients)
pip install -e ".[rag]"
Using ColBERT Retriever
If you want to use ColBERT Retriever, follow these additional setup steps:
# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng
# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH
# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
:rocket: Quick Start
One-Line Pipeline API (Recommended)
The simplest way to use Rankify - HuggingFace-style one-line interface:
from rankify import pipeline
# Create a RAG pipeline with intelligent defaults
rag = pipeline("rag")
answers = rag("What is machine learning?", documents)
# Or customize your configuration
rag = pipeline(
    "rag",
    retriever="bge",        # State-of-the-art dense retriever
    reranker="flashrank",   # Ultra-fast reranker
    generator="basic-rag",
)
Available Pipeline Types:
- pipeline("search") - Document retrieval only
- pipeline("rerank") - Retrieve + rerank
- pipeline("rag") - Full RAG pipeline (retrieve + rerank + generate)
Pipeline API Documentation
RankifyAgent - AI-Powered Model Selection
Let AI help you choose the best models for your use case:
from rankify.agent import RankifyAgent, recommend
# Quick recommendation
result = recommend(task="qa", gpu=True)
print(f"Best Retriever: {result.retriever.name}")
print(f"Best Reranker: {result.reranker.name}")
# Conversational agent
agent = RankifyAgent(backend="azure") # or "openai", "litellm", "local"
response = agent.chat("I need a fast search system for production")
print(response.message)
print(response.code_snippet) # Ready-to-use code
RankifyAgent Documentation
Rankify Server - Deploy as REST API
Start a production-ready server in one command:
# CLI
rankify serve --port 8000 --retriever bge --reranker flashrank
# Or in Python
from rankify.server import RankifyServer
server = RankifyServer(retriever="bge", reranker="flashrank")
server.start(port=8000)
API Endpoints:
- POST /retrieve - Document retrieval
- POST /rerank - Rerank documents
- POST /rag - Full RAG generation
- GET /health - Health check
# Example API call
curl -X POST http://localhost:8000/rag \
-H "Content-Type: application/json" \
-d '{"query": "What is AI?", "n_contexts": 5}'
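The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper name build_rag_request is made up for this example, and the base URL assumes the server was started locally on port 8000 as shown earlier. The response format is not documented here, so the sketch stops short of parsing it.

```python
import json
import urllib.request


def build_rag_request(query: str, n_contexts: int = 5,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /rag endpoint."""
    payload = json.dumps({"query": query, "n_contexts": n_contexts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rag",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rag_request("What is AI?")
print(request.get_method(), request.full_url)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes the payload easy to inspect before any server is involved.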
Server Documentation
Integrations - Use with Your Stack
Seamlessly integrate with LangChain, LlamaIndex, and more:
# LangChain
from rankify.integrations import LangChainRetriever
from langchain.chains import RetrievalQA
retriever = LangChainRetriever(method="bge", reranker="flashrank")
chain = RetrievalQA.from_chain_type(llm=your_llm, retriever=retriever)
# LlamaIndex
from rankify.integrations import LlamaIndexRetriever
retriever = LlamaIndexRetriever(method="colbert", reranker="monot5")
Integrations Documentation
Web Playground - Interactive UI
Launch an interactive Gradio interface:
from rankify.ui import launch_playground
launch_playground(port=7860)
Try models, compare results, and export code - all in your browser!
1️⃣ Traditional Workflow (For Advanced Users)
Pre-retrieved Datasets
We provide 40+ benchmark datasets with 1,000 pre-retrieved documents each:
Hugging Face Dataset Repository
Dataset Format
[
  {
    "question": "...",
    "answers": ["...", "...", ...],
    "ctxs": [
      {
        "id": "...",         // Passage ID
        "score": "...",      // Retriever score
        "has_answer": true|false
      }
    ]
  }
]
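As a rough illustration of this layout, the sketch below parses one record with the standard json module and counts how many of its contexts are flagged as containing an answer. The sample record and the helper name count_answer_contexts are invented for the example; real files come from the dataset repository.

```python
import json

# A made-up record that follows the documented layout (values are illustrative).
SAMPLE = """
[
  {
    "question": "Who wrote Hamlet?",
    "answers": ["Shakespeare"],
    "ctxs": [
      {"id": "101", "score": "12.5", "has_answer": true},
      {"id": "202", "score": "11.9", "has_answer": false},
      {"id": "303", "score": "10.2", "has_answer": true}
    ]
  }
]
"""


def count_answer_contexts(record: dict) -> int:
    """Count the contexts of one record that are flagged as containing an answer."""
    return sum(1 for ctx in record["ctxs"] if ctx["has_answer"])


records = json.loads(SAMPLE)
for record in records:
    hits = count_answer_contexts(record)
    print(f"{record['question']!r}: {hits}/{len(record['ctxs'])} contexts contain an answer")
```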
List Available Datasets
from rankify.dataset.dataset import Dataset
Dataset.available_dataset()  # List the available datasets
Download Datasets
from rankify.dataset.dataset import Dataset
# Download BM25-retrieved documents
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Load from file
documents = Dataset.load_dataset('./path/to/dataset.json', n_docs=100)
Indexing via CLI
The CLI entrypoint is rankify-index with a subcommand index.
Common flags
- corpus_path (positional): path to JSONL corpus.
- --retriever {bm25,dpr,ance,contriever,colbert,bge}.
- --output PATH (default: rankify_indices).
- --index_type {wiki,msmarco} (default: wiki).
- --threads INT (default: 32, sparse & some dense prep).
- --device {cpu,cuda} (default: retriever-specific, typically cuda).
- --batch_size INT (dense encoders / Faiss add batches).
- --encoder MODEL (dense encoders only; sensible defaults used if omitted).
Index layout
- BM25 → <output>/<stem>/bm25_index
- DPR → <output>/<stem>/dpr_index_<index_type>
- ANCE → <output>/<stem>/ance_index_<index_type>
- BGE → <output>/<stem>/bge_index_<index_type>
- Contriever → <output>/<stem>/contriever_index_<index_type>
- ColBERT → <output>/<stem>/colbert_index_<index_type>
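To make the layout concrete, here is a small sketch that computes where an index would be expected for a given corpus and retriever. The function expected_index_dir is hypothetical and not part of Rankify, and it assumes that <stem> is the corpus file name without its extension.

```python
from pathlib import Path


def expected_index_dir(corpus_path: str, retriever: str,
                       output: str = "rankify_indices",
                       index_type: str = "wiki") -> Path:
    """Return the directory where the layout above would place the index."""
    # Assumption: <stem> is the corpus file name without its extension.
    stem = Path(corpus_path).stem
    base = Path(output) / stem
    if retriever == "bm25":
        # BM25 is the only entry in the layout without an index_type suffix.
        return base / "bm25_index"
    return base / f"{retriever}_index_{index_type}"


print(expected_index_dir("data/wikipedia_10k.jsonl", "bm25", output="./indices"))
print(expected_index_dir("data/msmarco_100.jsonl", "dpr", output="./indices", index_type="msmarco"))
```

A helper like this can be handy for checking whether an index already exists before re-running rankify-index.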
BM25
rankify-index index data/wikipedia_10k.jsonl \
--retriever bm25 \
--output ./indices
DPR (single-encoder by default)
# Wikipedia style
rankify-index index data/wikipedia_100.jsonl \
--retriever dpr \
--encoder facebook/dpr-ctx_encoder-single-nq-base \
--batch_size 16 --device cuda \
--output ./indices
# MS MARCO
rankify-index index data/msmarco_100.jsonl \
--retriever dpr --index_type msmarco \
--encoder facebook/dpr-ctx_encoder-single-nq-base \
--batch_size 16 --device cuda \
--output ./indices
ANCE
rankify-index index data/wikipedia_100.jsonl \
--retriever ance \
--encoder castorini/ance-dpr-context-multi \
--batch_size 16 --device cuda \
--output ./indices
Contriever
rankify-index index data/wikipedia_100.jsonl \
--retriever contriever \
--encoder facebook/contriever-msmarco \
--batch_size 16 --device cuda \
--output ./indices
ColBERT
rankify-index index data/wikipedia_100.jsonl \
--retriever colbert \
--batch_size 32 --device cuda \
--output ./indices
BGE
rankify-index index data/wikipedia_100.jsonl \
--retriever bge \
--encoder BAAI/bge-large-en-v1.5 \
--batch_size 16 --device cuda \
--output ./indices
2️⃣ Running Retrieval
Rankify supports a range of retrieval methods, including BM25, DPR, ANCE, Contriever, ColBERT, BGE, Sbert, Nomic, Instructor, DiverRetriever, SFR, E5, RaDeR, M2, GritLM, ReasonEmbed, ReasonIR, and BGEReasoner.
Step 1: Setup example queries
Example: Running Retrieval on Sample Queries
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever
# Sample Documents
documents = [
Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
"Jai Courtney",
"Sebastian Koch",
"Radivoje Bukvić",
"Yuliya Snigir",
"Sergei Kolesnikov",
"Mary Elizabeth Winstead",
"Bruce Willis"
]), contexts=[]),
Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]
Step 2: Choose Retrieval Option
Option A: Retrieval with pre-built indices.
Set index_type (e.g., "wiki" or "msmarco") to load the corresponding pre-computed index.
# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")
# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")
# DPR (multi-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")
# DPR (multi-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")
# DPR (single-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")
# DPR (single-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")
# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")
# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")
# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")
# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")
# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")
# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")
# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")
# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")
# HyDE retrieval on Wikipedia (OPENAI_API_KEY must hold your OpenAI API key)
hyde_retriever_wiki = Retriever(method="hyde", n_docs=5, index_type="wiki", api_key=OPENAI_API_KEY)
# HyDE retrieval on MS MARCO
hyde_retriever_msmarco = Retriever(method="hyde", n_docs=5, index_type="msmarco", api_key=OPENAI_API_KEY)
Option B: Retrieval with custom datasets and automated caching.
The models below, which include some of the latest 7B+ parameter models, are intended only for use with custom datasets.
Pass a .jsonl file to corpus_path and make sure each entry provides the required id and text fields. The model embeds the corpus and caches the result locally on the first run.
# Bi-encoders as implemented in the diver framework (11 configurable models, specified by model_id)
bge_large_retriever = Retriever(method="diver-dense", model_id="bge", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
sbert_retriever = Retriever(method="diver-dense", model_id="sbert", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
inst_l_retriever = Retriever(method="diver-dense", model_id="inst-l", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
inst_xl_retriever = Retriever(method="diver-dense", model_id="inst-xl", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
sfr_retriever = Retriever(method="diver-dense", model_id="sf", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
e5_retriever = Retriever(method="diver-dense", model_id="e5", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
contriever_retriever = Retriever(method="diver-dense", model_id="contriever", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
m2_retriever = Retriever(method="diver-dense", model_id="m2", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
grit_retriever = Retriever(method="diver-dense", model_id="grit", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
rader_retriever = Retriever(method="diver-dense", model_id="rader", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
nomic_retriever = Retriever(method="diver-dense", model_id="nomic", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)
diver_retriever = Retriever(method="diver-dense", model_id="diver", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)
# ReasonIR retrieval
reasonir_retriever = Retriever(method="reasonir", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)
# ReasonEmbed retrieval (3 configurable models specified by model_id)
reasonembed_qwen8b_retriever = Retriever(method="reason-embed", model_id="qwen3-8b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
reasonembed_qwen4b_retriever = Retriever(method="reason-embed", model_id="qwen3-4b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
reasonembed_llama8b_retriever = Retriever(method="reason-embed", model_id="llama-8b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
# BGE-Reasoner retrieval
bge_reasoner_retriever = Retriever(method="bge-reasoner-embed", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
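Before pointing corpus_path at a custom file, it can help to check that every line carries the id and text fields mentioned above. The sketch below writes a tiny JSONL corpus to a temporary directory and reports lines that miss a required field. The helper names and the sample passages are invented for illustration, and the check is independent of Rankify's own loading code.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("id", "text")  # field names stated in the text above


def write_corpus(passages: list[dict], path: Path) -> None:
    """Write one JSON object per line (the JSONL convention)."""
    with path.open("w", encoding="utf-8") as f:
        for passage in passages:
            f.write(json.dumps(passage, ensure_ascii=False) + "\n")


def find_invalid_lines(path: Path) -> list[int]:
    """Return the 1-based line numbers of entries missing a required field."""
    bad = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            entry = json.loads(line)
            if any(field not in entry for field in REQUIRED_FIELDS):
                bad.append(lineno)
    return bad


corpus_file = Path(tempfile.mkdtemp()) / "my_corpus.jsonl"
write_corpus(
    [
        {"id": "d1", "text": "Thomas Edison invented the light bulb in 1879."},
        {"id": "d2", "text": "Berlin is the capital of Germany."},
        {"id": "d3", "contents": "This entry uses the wrong field name."},
    ],
    corpus_file,
)
print("Lines missing a required field:", find_invalid_lines(corpus_file))
```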
Retrieval Example: ReasonIR on the BRIGHT Benchmark (Biology queries)
This example demonstrates how to evaluate the reasonir/ReasonIR-8B model on the reasoning-intensive BRIGHT benchmark.
from datasets import load_dataset
from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever
corpus_path = "bright_biology_corpus.jsonl" # .jsonl corpus for retrieval
docs = load_dataset("xlangai/BRIGHT", "documents", split="biology")
docs.to_json(corpus_path, force_ascii=False)
queries = load_dataset("xlangai/BRIGHT", "examples", split="biology")
documents = []
for item in queries:
    doc = Document(id=item["id"],
                   question=Question(question=item["query"]),
                   answers=Answer(answers=item.get("gold_ids", [])))
    documents.append(doc)
retriever = Retriever(
    method="reasonir",        # Use ReasonIR retriever
    n_docs=3,                 # Retrieve top 3 documents per query
    corpus_path=corpus_path,  # Path to the JSONL we just created
    text_field="content",     # BRIGHT uses 'content' instead of 'text'
    batch_size=4,
)
results = retriever.retrieve(documents)
Step 3: Execute and View Results
Running Retrieval
After defining the retriever, you can retrieve documents using:
retrieved_documents = bm25_retriever_wiki.retrieve(documents)
for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
3️⃣ Running Reranking
Rankify provides support for multiple reranking models. Below are examples of how to use each model.
Example: Reranking a Document
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking
# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
Context(text="Lightning strike at Seoul National University", id=1),
Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
Context(text="Coffee is good for diet", id=3),
Context(text="Thomas Edison invented the light bulb in 1879", id=4),
Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)
# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
# Apply reranking
reranker.rank([document])
# Print reordered contexts
for context in document.reorder_contexts:
    print(f" - {context.text}")
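To show what reordering means without downloading a model, the sketch below reranks the same five passages with a toy word-overlap score. This is only a stand-in for illustration: it is not how MonoT5 or any Rankify reranker scores passages, and the function names are invented.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    q_terms = tokens(query)
    return len(q_terms & tokens(passage)) / len(q_terms) if q_terms else 0.0


def rerank(query: str, passages: list[str]) -> list[str]:
    """Return the passages sorted by descending toy score (ties keep input order)."""
    return sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)


query = "When did Thomas Edison invent the light bulb?"
passages = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for cars but failed",
    "Coffee is good for diet",
    "Thomas Edison invented the light bulb in 1879",
    "Thomas Edison worked with electricity",
]
for passage in rerank(query, passages):
    print(f" - {passage}")
```

A neural reranker replaces overlap_score with a learned relevance model, but the surrounding sort-by-score step looks much the same.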
Examples of Using Different Reranking Models
# UPR
model = Reranking(method='upr', model_name='t5-base')
# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')
# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')
# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')
# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')
# First Ranker
model = Reranking(method='first_ranker', model_name='base')
# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')
# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')
# InRanker
model = Reranking(method='inranker', model_name='inranker-small')
# ListT5
model = Reranking(method='listt5', model_name='listt5-base')
# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')
# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')
# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')
# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')
# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')
# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')
# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')
# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")
# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')
# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')
# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')
# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')
# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')
# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')
# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')
# DuoT5 (pairwise T5-based reranker)
model = Reranking(method='duot5', model_name='duot5-base-msmarco')
# RankLLaMA (LLaMA-based passage reranker)
model = Reranking(method='rankllama', model_name='rankllama-v1-7b-lora-passage')
# DeAR (Decoder-only Autoregressive Reranker)
model = Reranking(method='dear_reranker', model_name='dear-3b-reranker-ce-v1')
# TART (Task-Aware Reranker with Instructions)
model = Reranking(method='tart', model_name='tart-full-flan-t5-xl')
# PRP (Pairwise Ranking Prompting), local LLM
model = Reranking(method='prp', model_name='llamav3.1-8b')
# PRP, API-based LLM
model = Reranking(method='prp-api', model_name='gpt-4', api_key="gpt-api-key")
# RankGemma (Gemma-based listwise reranker)
model = Reranking(method='rankgemma', model_name='gemma-2-2b')
# RankMistral (Mistral-based listwise reranker)
model = Reranking(method='rankmistral', model_name='mistral-7b')
4️⃣ Using Generator Module
Rankify provides a Generator Module for retrieval-augmented generation (RAG) that combines retrieved documents with generative models served through OpenAI, LiteLLM, vLLM, and Hugging Face. Its modular design makes it easy to add new RAG methods and endpoints and to experiment with approaches such as zero-shot RAG, chain-of-thought RAG, and FiD-based RAG. Below are examples of how to use different RAG methods and LLM endpoints.
Note that API-based endpoints (OpenAI, LiteLLM) require an API key. The examples below show how to provide one.
Examples of Using Different RAG methods and backends
# Zero-shot with Huggingface endpoint
generator = Generator(method="zero-shot", model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', backend="huggingface")
# Basic RAG with LiteLLM endpoint
generator = Generator(method="basic-rag", model_name='ollama/mistral', backend="litellm", api_key=api_key)
# Chain-of-Thought RAG with vLLM endpoint
generator = Generator(method="chain-of-thought-rag", model_name='mistralai/Mistral-7B-v0.1', backend="vllm")
# In-context-RALM with OpenAI endpoint
generator = Generator(method="in-context-ralm", model_name='gpt-3.5-turbo', backend="openai", api_keys=[api_key])
Usage example without API-inference
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
# Define question and answer
question = Question("What is the capital of France?")
answers = Answer([""])
contexts = [
Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]
# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)
# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="basic-rag", model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', backend="huggingface")
# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers) # Output: ["Paris"]
Usage example with API-inference
If you save your API keys in a .env.local file, you can load them with the helper functions shown below:
# in .env.local:
OPENAI_API_KEY=your-api-key
LITELLM_API_KEY=your-api-key
Usage
# load LiteLLM api-key
api_key = get_litellm_api_key()
# load OpenAI api-key
api_key = get_openai_api_key()
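For readers curious about what reading such a file involves, here is a minimal standard-library sketch that parses KEY=value lines. It is not Rankify's implementation of get_litellm_api_key or get_openai_api_key; the function read_env_file and the temporary file are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


env_file = Path(tempfile.mkdtemp()) / ".env.local"
env_file.write_text(
    "# in .env.local:\nOPENAI_API_KEY=your-api-key\nLITELLM_API_KEY=your-api-key\n",
    encoding="utf-8",
)
keys = read_env_file(env_file)
print(sorted(keys))
```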
Full example using LiteLLM:
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
from rankify.utils.models.rank_llm.rerank.api_keys import get_litellm_api_key
# Define question and answer
question = Question("What is the capital of France?")
answers = Answer([""])
contexts = [
Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]
# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)
# Load the API key
api_key = get_litellm_api_key()
# Initialize Generator (e.g., Mistral served through LiteLLM)
generator = Generator(method="basic-rag", model_name='ollama/mistral', backend="litellm", api_key=api_key)
# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers) # Output: ["Paris"]
5️⃣ Evaluating with Metrics
Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.
Evaluating Generated Answers
You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.
from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator
# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)
# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')
# Generate answers
generated_answers = generator.generate(documents)
# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
Evaluating Retrieval Performance
# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)
print(before_ranking_metrics)
Evaluating Reranked Results
# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
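The idea behind comparing metrics before and after reranking can be sketched with a plain top-k accuracy: the share of questions that have at least one answer-bearing context among the first k results. The code below is a generic illustration on made-up has_answer flags; it is not claimed to match what calculate_retrieval_metrics computes internally.

```python
def top_k_accuracy(has_answer_lists: list[list[bool]], k: int) -> float:
    """Fraction of questions with at least one answer-bearing context in the top k."""
    if not has_answer_lists:
        return 0.0
    hits = sum(1 for flags in has_answer_lists if any(flags[:k]))
    return hits / len(has_answer_lists)


# Made-up relevance flags for three questions, listed in ranked order.
before = [[False, False, True], [True, False, False], [False, False, False]]
# The same contexts after a hypothetical reranker moved relevant ones up.
after = [[True, False, False], [True, False, False], [False, False, False]]

for k in (1, 3):
    print(f"top-{k}: before={top_k_accuracy(before, k):.2f} after={top_k_accuracy(after, k):.2f}")
```

Reranking can only improve top-k accuracy for k smaller than the number of retrieved contexts, since it reorders the candidate set without adding to it.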
BEIR & TREC DL19/DL20 with BM25
Rankify ships convenient hooks to run BM25 baselines on BEIR tasks and TREC DL'19/20, and to evaluate with TREC-style metrics (nDCG, MAP, MRR).
Quick start (single dataset)
from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics
# Download pre-retrieved BM25 results (top-k per query)
docs = Dataset('bm25', 'dl19', n_docs=1000).download(force_download=False)
# Evaluate with TREC metrics (nDCG@10 and nDCG@100 shown here)
metrics = Metrics(docs)
print(metrics.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False))
Notes
- Supported names include dl19, dl20, and BEIR tasks with the beir- prefix, e.g.: beir-arguana, beir-covid, beir-dbpedia, beir-fever, beir-fiqa, beir-news, beir-nfc, beir-quora, beir-robust04, beir-scidocs, beir-scifact, beir-signal, beir-touche.
- If you need explicit qrels selection, pass qrel=name.replace("beir-", "") to calculate_trec_metrics.
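For readers new to nDCG, the sketch below computes nDCG@k for a single ranked list using linear gain and a log2 rank discount. It is a simplified illustration with made-up relevance grades: the ideal ranking is taken from the retrieved list only, whereas a qrels-based evaluation considers all judged documents, so values from calculate_trec_metrics may differ.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up graded relevance labels for one query, in ranked order.
ranking = [0, 2, 1]
print(f"nDCG@3 = {ndcg_at_k(ranking, 3):.4f}")
```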
Batch over BEIR & DL datasets
from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics
BEIR_TASKS = [
"beir-arguana", "beir-covid", "beir-dbpedia", "beir-fever", "beir-fiqa", "beir-news",
"beir-nfc", "beir-quora", "beir-robust04", "beir-scidocs", "beir-scifact",
"beir-signal", "beir-touche",
]
for name in ["dl19", "dl20", *BEIR_TASKS]:
    docs = Dataset('bm25', name, n_docs=100).download(force_download=False)
    m = Metrics(docs)
    res = m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False)
    print(name, res)
(Optional) Add a reranker, then evaluate
from rankify.models.reranking import Reranking
from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics
name = "beir-arguana"
docs = Dataset('bm25', name, n_docs=100).download(force_download=False)
reranker = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
reranker.rank(docs)
m = Metrics(docs)
print("Before:", m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False))
print("After :", m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=True))
Evaluating RAG with RAGAS
Rankify ships a thin wrapper around ragas to make quality evaluation of generated answers simple and flexible, whether you judge with a local HF model or a hosted API like OpenAI. You can run fast defaults, pick specific metrics, or simulate predictions when compute is tight.
Install
# core Rankify RAG deps
pip install bert-score
pip install ragas
pip install langchain_huggingface
pip install rouge-score
import torch
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
from rankify.metrics.generator_metrics import GeneratorMetrics
from rankify.metrics.ragas_bridge import RagasModels
# 1) Build a tiny document
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5),
]
doc = Document(question=question, answers=answers, contexts=contexts)
# 2) Generate an answer (or skip and provide your own predictions list)
generator = Generator(method="basic-rag",
model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
backend="huggingface",
torch_dtype=torch.float16)
predictions = generator.generate([doc])
print("Generated:", predictions)
# 3) Evaluate with RAGAS (HF judge)
gen_metrics = GeneratorMetrics([doc])
ragas_hf = RagasModels(
llm_kind="hf",
llm_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
embeddings_kind="hf",
embeddings_name="sentence-transformers/all-MiniLM-L6-v2",
torch_dtype="float16",
max_new_tokens=256, # shorter outputs = faster + cheaper
timeout=180, # seconds per metric call
max_retries=1,
max_workers=2, # keep small on limited hardware
)
# (A) Fast defaults
scores_fast = gen_metrics.all(predictions, ragas_models=ragas_hf)
print("RAGAS (fast):", scores_fast)
# (B) Pick specific metrics
scores_specific = gen_metrics.ragas_generator(
predictions,
judge=ragas_hf,
metrics=["faithfulness", "response_relevancy", "context_precision", "context_recall"],
)
print("RAGAS (specific):", scores_specific)
# (C) OpenAI judge (much faster if you have an API key)
ragas_openai = RagasModels(llm_kind="openai", llm_name="gpt-4o-mini", timeout=30)
scores_openai = gen_metrics.all(predictions, ragas_models=ragas_openai)
print("RAGAS (OpenAI):", {k: v for k, v in scores_openai.items() if k.startswith("ragas_")})
Supported Models
1️⃣ Index
- โ Wikipedia
- โ MS-MARCO
- ๐ Online Search
1️⃣ Retrievers
- โ BM25
- โ DPR
- โ ColBERT
- โ ANCE
- โ BGE
- โ Contriever
- โ BPR
- โ HYDE
- โ SFR
- โ E5
- โ GritLM
- โ M2
- โ Nomic
- โ Instructor
- โ RaDeR
- โ ReasonIR
- โ BGE-Reasoner
- โ ReasonEmbed
- โ DiverRetriever
- ๐ RepLlama
- ๐ coCondenser
- ๐ Spar
- ๐ Dragon
- ๐ Hybrid
- โ TAS-B
- โ UniCOIL
- โ SPLADE-v2
- โ OpenAI Embedding Retriever
- โ Cohere Embedding Retriever
- โ Voyage AI Retriever
2️⃣ Rerankers
- ✅ Cross-Encoders
- ✅ RankGPT
- ✅ RankGPT-API
- ✅ MonoT5
- ✅ MonoBert
- ✅ RankT5
- ✅ ListT5
- ✅ LiT5Score
- ✅ LiT5Dist
- ✅ Vicuna Reranker
- ✅ Zephyr Reranker
- ✅ Sentence Transformer-based
- ✅ FlashRank Models
- ✅ API-Based Rerankers
- ✅ ColBERT Reranker
- ✅ LLM Layerwise Ranker
- ✅ Splade Reranker
- ✅ UPR Reranker
- ✅ Inranker Reranker
- ✅ Transformer Reranker
- ✅ FIRST Reranker
- ✅ Blender Reranker
- ✅ LLM2VEC Reranker
- ✅ ECHO Reranker
- ✅ Incontext Reranker
- ✅ DuoT5
- ✅ RankLLaMA
- ✅ DeAR
- 🕒 DynRank
- 🕒 ASRank
- ✅ PRP (Pairwise Ranking Prompting)
- ✅ RankMistral
- ✅ RankGemma
- 🕒 SetRank
- 🕒 Cohere Rerank API
- ✅ TART
- 🕒 PolyEncoder
3️⃣ Generator
RAG-Methods
- ✅ Zero-shot
- ✅ Basic-RAG
- ✅ Chain-of-Thought-RAG
- ✅ Fusion-in-Decoder (FiD) with T5
- ✅ In-Context Learning RALM
- 🕒 Self-Consistency RAG
- 🕒 Retrieval Chain-of-Thought
LLM-Endpoints
- ✅ Hugging Face
- ✅ vLLM
- ✅ LiteLLM
- ✅ OpenAI
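As a rough illustration of what a basic RAG method does before calling one of the endpoints above, the sketch below assembles retrieved passages and a question into a single prompt. The function name `build_basic_rag_prompt` and the prompt template are hypothetical. They are not taken from Rankify's implementation, which may format its prompts differently.

```python
from typing import List, Tuple


def build_basic_rag_prompt(
    question: str, passages: List[Tuple[str, str]], top_k: int = 2
) -> str:
    """Place the top_k (title, text) passages ahead of the question."""
    lines = ["Answer the question using the passages below.", ""]
    for rank, (title, text) in enumerate(passages[:top_k], start=1):
        lines.append(f"[{rank}] {title}: {text}")
    lines.extend(["", f"Question: {question}", "Answer:"])
    return "\n".join(lines)


# Illustrative passages only; in practice these come from a retriever or reranker.
passages = [
    ("France", "The capital of France is Paris."),
    ("Germany", "Berlin is the capital of Germany."),
]
prompt = build_basic_rag_prompt("What is the capital of France?", passages)
print(prompt)
```

The resulting string can then be sent to whichever backend you use. Limiting the prompt to `top_k` passages is one simple way to keep it within the model's context window.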
✨ Features
- Unified Framework: Combines retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular toolkit.
- Rich Dataset Support: Includes 40+ benchmark datasets with pre-retrieved documents for seamless experimentation.
- Diverse Retrieval Methods: Supports BM25, DPR, ANCE, BPR, ColBERT, BGE, Contriever, SFR, E5, GritLM, M2, Nomic, Instructor, RaDeR, ReasonIR, BGE-Reasoner, and ReasonEmbed for flexible retrieval strategies.
- Powerful Re-Ranking: Implements 28 advanced models with 44 sub-methods to optimize ranking performance.
- Prebuilt Indices: Provides prebuilt indices for the Wikipedia and MS MARCO corpora, so you can skip indexing and start retrieving right away.
- Seamless RAG Integration: Works with backends such as Hugging Face, OpenAI, vLLM, and LiteLLM to run models like GPT, LLaMA, and T5, and supports Fusion-in-Decoder (FiD) among several retrieval-augmented generation methods.
- Extensible & Modular: Easily integrates custom datasets, retrievers, ranking models, and RAG pipelines.
- Built-in Evaluation Suite: Includes retrieval, ranking, and RAG metrics for robust benchmarking.
- User-Friendly Documentation: Access detailed online docs, example notebooks, and tutorials for easy adoption.
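To make the evaluation bullet concrete, here is a minimal plain-Python sketch of two metrics commonly used in this setting: top-k retrieval accuracy (does any of the first k passages contain a gold answer?) and exact match for generated answers. The helper names and the normalization rule (lowercasing, stripping ASCII punctuation, collapsing whitespace) are assumptions for illustration. Rankify's own implementations may differ.

```python
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def top_k_hit(passages: List[str], answers: List[str], k: int) -> bool:
    """True if any of the first k passages contains a gold answer."""
    gold = [normalize(a) for a in answers]
    return any(g in normalize(p) for p in passages[:k] for g in gold)


def exact_match(prediction: str, answers: List[str]) -> bool:
    """True if the normalized prediction equals a normalized gold answer."""
    return normalize(prediction) in {normalize(a) for a in answers}


passages = ["Berlin is the capital of Germany.", "The capital of France is Paris."]
answers = ["Paris"]
print(top_k_hit(passages, answers, k=1))  # False: the answer is only in passage 2
print(top_k_hit(passages, answers, k=2))  # True
print(exact_match("paris.", answers))     # True after normalization
```

Averaging these booleans over a dataset gives top-k accuracy and exact-match rates, which makes it easy to see how much a reranker improves the ordering at small k.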
Roadmap
Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.
Documentation
For full API documentation, visit the Rankify Docs.
Contributing
Follow these steps to get involved:
- Fork this repository to your GitHub account.
- Create a new branch for your feature or fix:
  git checkout -b feature/YourFeatureName
- Make your changes and commit them:
  git commit -m "Add YourFeatureName"
- Push the changes to your branch:
  git push origin feature/YourFeatureName
- Submit a Pull Request to propose your changes.
Thank you for helping make this project better!
Community Contributions
Chinese community resources available!
Special thanks to Xiumao for writing two exceptional Chinese blog posts about Rankify.
These articles were crafted with high-traffic optimization in mind and are widely recommended in Chinese academic and developer circles.
We updated the Chinese version (中文版本) to reflect these blog contributions while keeping the original content intact. Thank you, Xiumao, for your continued support!
:bookmark: License
Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.
Acknowledgments
We would like to express our gratitude to the following libraries, which have greatly contributed to the development of Rankify:
- Diver: For the reference implementation of the dense retriever routing and caching logic used to integrate various bi-encoders.
  GitHub Repository
- Rerankers: A powerful Python library for integrating various reranking methods.
  GitHub Repository
- Pyserini: A toolkit for supporting BM25-based retrieval and integration with sparse/dense retrievers.
  GitHub Repository
- FlashRAG: A modular framework for Retrieval-Augmented Generation (RAG) research.
  GitHub Repository
:star2: Citation
Please cite our paper if it helps your research:
@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}