MS MARCO

April 23, 2026

[ English | 中文]

🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥

If you like our framework, don't hesitate to ⭐ star this repository ⭐. This helps us make the framework better and more scalable to different models and methods 🤗.

A modular and efficient retrieval, re-ranking, and RAG framework designed to work with state-of-the-art models for retrieval, ranking, and RAG tasks.

🚀 Demo

To run the demo locally:

# Make sure Rankify is installed
pip install streamlit

# Then run the demo
streamlit run demo.py

https://github.com/user-attachments/assets/13184943-55db-4f0c-b509-fde920b809bc

Features
Roadmap
Installation
Quick Start
Indexing
Retrievers
Re-Rankers
Generators
Evaluation
Documentation
Community Contributing
Contributing
License
Acknowledgments
Citation

🎉News

[2026-02-16] Huge thanks to @JamieHoldcroft for integrating 15+ new dense retrievers, including SOTA LLM-based bi-encoders (SFR, E5, GritLM) and reasoning-augmented models (RaDeR, ReasonIR, ReasonEmbed, BGE-Reasoner).
[2025-10-14] Updated installation with optional extras: retriever, reranking, rag, and all.
[2025-10-14] New CLI (rankify-index) syntax & examples for BM25, DPR, ANCE, Contriever, ColBERT, BGE.
[2025-06-11] Many thanks to @tobias124 for implementing Indexing for Custom Dataset.
[2025-06-01] Many thanks to @aherzinger for implementing and refactoring the Generator and RAG models.
[2025-05-30] Huge thanks to @baraayusry for implementing the Online Retriever using CrawAI and ReACT.
[2025-02-10] Released reranking-datasets and reranking-datasets-light on Hugging Face.
[2025-02-04] Our paper is released on arXiv.

🔧 Installation

Set up the virtual environment

First, create and activate a conda environment with Python 3.10:

conda create -n rankify python=3.10
conda activate rankify

Install PyTorch 2.5.1

We recommend using Rankify with PyTorch 2.5.1. Refer to the PyTorch installation page for platform-specific installation commands.

If you have access to GPUs, we recommend installing a CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.

To install PyTorch 2.5.1 with CUDA 12.4, run the following command:

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

Basic Installation

To install Rankify, simply use pip (requires Python 3.10+):

pip install rankify

Recommended Installation

For full functionality, we recommend installing Rankify with all dependencies:

pip install "rankify[all]"

This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.

Optional Dependencies

If you prefer to install only specific components, choose from the following:

# Retrieval stack (BM25, dense retrievers, web tools)
pip install "rankify[retriever]"

# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install "rankify[reranking]"

# RAG endpoints (OpenAI, LiteLLM, vLLM clients)
pip install "rankify[rag]"

Or, to install from GitHub for the latest development version:

git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .
# For full functionality we recommend installing Rankify with all dependencies:
pip install -e ".[all]"
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install -e ".[retriever]"
# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install -e ".[reranking]"
# RAG endpoints (OpenAI, LiteLLM, vLLM clients)
pip install -e ".[rag]"

Using ColBERT Retriever

If you want to use ColBERT Retriever, follow these additional setup steps:

# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng

# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*

🚀 Quick Start

🚀 One-Line Pipeline API (Recommended)

The simplest way to use Rankify is a Hugging Face-style one-line interface:

from rankify import pipeline

# Create a RAG pipeline with intelligent defaults
rag = pipeline("rag")
answers = rag("What is machine learning?", documents)

# Or customize your configuration
rag = pipeline(
    "rag",
    retriever="bge",           # State-of-the-art dense retriever
    reranker="flashrank",      # Ultra-fast reranker
    generator="basic-rag"
)

Available Pipeline Types:

pipeline("search") - Document retrieval only
pipeline("rerank") - Retrieve + rerank
pipeline("rag") - Full RAG pipeline (retrieve + rerank + generate)

📖 Pipeline API Documentation

🤖 RankifyAgent - AI-Powered Model Selection

Let AI help you choose the best models for your use case:

from rankify.agent import RankifyAgent, recommend

# Quick recommendation
result = recommend(task="qa", gpu=True)
print(f"Best Retriever: {result.retriever.name}")
print(f"Best Reranker: {result.reranker.name}")

# Conversational agent
agent = RankifyAgent(backend="azure")  # or "openai", "litellm", "local"
response = agent.chat("I need a fast search system for production")
print(response.message)
print(response.code_snippet)  # Ready-to-use code

📖 RankifyAgent Documentation

🌐 Rankify Server - Deploy as REST API

Start a production-ready server in one command:

# CLI
rankify serve --port 8000 --retriever bge --reranker flashrank

# Or in Python
from rankify.server import RankifyServer
server = RankifyServer(retriever="bge", reranker="flashrank")
server.start(port=8000)

API Endpoints:

POST /retrieve - Document retrieval
POST /rerank - Rerank documents
POST /rag - Full RAG generation
GET /health - Health check

# Example API call
curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AI?", "n_contexts": 5}'
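The same request can be issued from Python. The sketch below uses only the standard library and assumes a server is already listening on localhost:8000, as in the curl call above. The helper names build_rag_request and call_rag are illustrative and not part of Rankify, and because the response schema is not described here, the body is simply returned as parsed JSON.

```python
import json
import urllib.request


def build_rag_request(query, n_contexts=5, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /rag endpoint."""
    payload = json.dumps({"query": query, "n_contexts": n_contexts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rag",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_rag(query, n_contexts=5, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the decoded JSON body (needs a running server)."""
    request = build_rag_request(query, n_contexts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request works offline; call_rag("What is AI?") needs the server.
request = build_rag_request("What is AI?")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.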

📖 Server Documentation

🔌 Integrations - Use with Your Stack

Seamlessly integrate with LangChain, LlamaIndex, and more:

# LangChain
from rankify.integrations import LangChainRetriever
from langchain.chains import RetrievalQA

retriever = LangChainRetriever(method="bge", reranker="flashrank")
chain = RetrievalQA.from_chain_type(llm=your_llm, retriever=retriever)

# LlamaIndex
from rankify.integrations import LlamaIndexRetriever
retriever = LlamaIndexRetriever(method="colbert", reranker="monot5")

📖 Integrations Documentation

🎨 Web Playground - Interactive UI

Launch an interactive Gradio interface:

from rankify.ui import launch_playground
launch_playground(port=7860)

Try models, compare results, and export code - all in your browser!

1️⃣ Traditional Workflow (For Advanced Users)

Pre-retrieved Datasets

We provide 40+ benchmark datasets with 1,000 pre-retrieved documents each:

🔗 Hugging Face Dataset Repository

Dataset Format

[
    {
        "question": "...",
        "answers": ["...", "...", ...],
        "ctxs": [
            {
                "id": "...",         // Passage ID
                "score": "...",      // Retriever score
                "has_answer": true|false
            }
        ]
    }
]
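To make the format concrete, here is a minimal sketch that parses one hand-written record and finds the rank of the first passage marked has_answer. The record values are made up, the helper first_hit_rank is hypothetical, and the sketch assumes that ctxs is stored in rank order.

```python
import json

# A tiny hand-written record in the format above (all values are invented).
raw = """
[
  {
    "question": "Who wrote Hamlet?",
    "answers": ["Shakespeare"],
    "ctxs": [
      {"id": "101", "score": "12.5", "has_answer": false},
      {"id": "102", "score": "11.0", "has_answer": true},
      {"id": "103", "score": "9.7", "has_answer": false}
    ]
  }
]
"""


def first_hit_rank(record):
    """Return the 1-based rank of the first passage with has_answer, or None."""
    for rank, ctx in enumerate(record["ctxs"], start=1):
        if ctx["has_answer"]:
            return rank
    return None


records = json.loads(raw)
for record in records:
    print(record["question"], "-> first answer-bearing passage at rank", first_hit_rank(record))
```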

List Available Datasets

from rankify.dataset.dataset import Dataset 
Dataset.available_dataset()

Download Datasets

from rankify.dataset.dataset import Dataset

# Download BM25-retrieved documents
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Load from file
documents = Dataset.load_dataset('./path/to/dataset.json', n_docs=100)

🧱 Indexing via CLI

The CLI entrypoint is rankify-index with a subcommand index.

Common flags

corpus_path (positional): path to JSONL corpus.
--retriever {bm25,dpr,ance,contriever,colbert,bge}.
--output PATH (default: rankify_indices).
--index_type {wiki,msmarco} (default: wiki).
--threads INT (default: 32, sparse & some dense prep).
--device {cpu,cuda} (default: retriever‑specific, typically cuda).
--batch_size INT (dense encoders / Faiss add batches).
--encoder MODEL (dense encoders only; sensible defaults used if omitted).

Index layout

BM25 → <output>/<stem>/bm25_index

DPR → <output>/<stem>/dpr_index_<index_type>

ANCE → <output>/<stem>/ance_index_<index_type>

BGE → <output>/<stem>/bge_index_<index_type>

Contriever → <output>/<stem>/contriever_index_<index_type>

ColBERT → <output>/<stem>/colbert_index_<index_type>
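The layout rules above can be sketched as a small path helper, which is handy when a script needs to check whether an index already exists. This is an illustration only: expected_index_dir is a hypothetical function, not part of the Rankify CLI, and it assumes that <stem> is the corpus file name without its extension.

```python
from pathlib import Path


def expected_index_dir(output, corpus_path, retriever, index_type="wiki"):
    """Mirror the layout list: BM25 has no index_type suffix, the others do."""
    stem = Path(corpus_path).stem  # assumption: <stem> = corpus file name without extension
    base = Path(output) / stem
    if retriever == "bm25":
        return base / "bm25_index"
    if retriever in {"dpr", "ance", "bge", "contriever", "colbert"}:
        return base / f"{retriever}_index_{index_type}"
    raise ValueError(f"unknown retriever: {retriever}")


print(expected_index_dir("./indices", "data/wikipedia_10k.jsonl", "bm25"))
print(expected_index_dir("./indices", "data/msmarco_100.jsonl", "dpr", "msmarco"))
```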

BM25

rankify-index index data/wikipedia_10k.jsonl \
  --retriever bm25 \
  --output ./indices

DPR (single‑encoder by default)

# Wikipedia style
rankify-index index data/wikipedia_100.jsonl \
  --retriever dpr \
  --encoder facebook/dpr-ctx_encoder-single-nq-base \
  --batch_size 16 --device cuda \
  --output ./indices

# MS MARCO
rankify-index index data/msmarco_100.jsonl \
  --retriever dpr --index_type msmarco \
  --encoder facebook/dpr-ctx_encoder-single-nq-base \
  --batch_size 16 --device cuda \
  --output ./indices

ANCE

rankify-index index data/wikipedia_100.jsonl \
  --retriever ance \
  --encoder castorini/ance-dpr-context-multi \
  --batch_size 16 --device cuda \
  --output ./indices

Contriever

rankify-index index data/wikipedia_100.jsonl \
  --retriever contriever \
  --encoder facebook/contriever-msmarco \
  --batch_size 16 --device cuda \
  --output ./indices

ColBERT

rankify-index index data/wikipedia_100.jsonl \
  --retriever colbert \
  --batch_size 32 --device cuda \
  --output ./indices

BGE

rankify-index index data/wikipedia_100.jsonl \
  --retriever bge \
  --encoder BAAI/bge-large-en-v1.5 \
  --batch_size 16 --device cuda \
  --output ./indices

2️⃣ Running Retrieval

To perform retrieval using Rankify, you can choose from various retrieval methods such as BM25, DPR, ANCE, Contriever, ColBERT, BGE, SBERT, Nomic, Instructor, DiverRetriever, SFR, E5, RaDeR, M2, GritLM, ReasonEmbed, ReasonIR, and BGE-Reasoner.

Step 1: Set up example queries

Example: Running Retrieval on Sample Queries

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever

# Sample Documents
documents = [
    Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
            "Jai Courtney",
            "Sebastian Koch",
            "Radivoje Bukvić",
            "Yuliya Snigir",
            "Sergei Kolesnikov",
            "Mary Elizabeth Winstead",
            "Bruce Willis"
        ]), contexts=[]),
    Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]

Step 2: Choose Retrieval Option

Option A: Retrieval with pre-built indices. Pass index_type (e.g., "wiki", "msmarco") to load pre-computed FAISS indices.

# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")


# DPR (multi-encoder) retrieval on Wikipedia
dpr_multi_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_multi_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")


# DPR (single-encoder) retrieval on Wikipedia
dpr_single_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_single_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")


# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")


# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")


# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")


# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")


# HyDE retrieval on Wikipedia (OPENAI_API_KEY must hold your OpenAI API key)
hyde_retriever_wiki = Retriever(method="hyde", n_docs=5, index_type="wiki", api_key=OPENAI_API_KEY)

# HyDE retrieval on MS MARCO
hyde_retriever_msmarco = Retriever(method="hyde", n_docs=5, index_type="msmarco", api_key=OPENAI_API_KEY)

Option B: Retrieval with custom datasets and automated caching.

The models below, which include some of the latest 7B+ parameter models, are intended only for use with custom datasets.

Pass a .jsonl file to corpus_path and make sure each record provides the required id and text fields. The model embeds the corpus and caches the result locally on the first run.

# Bi-encoders as implemented in the diver framework (12 configurable models, specified by model_id)
bge_large_retriever = Retriever(method="diver-dense", model_id="bge", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

sbert_retriever = Retriever(method="diver-dense", model_id="sbert", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

inst_l_retriever = Retriever(method="diver-dense", model_id="inst-l", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

inst_xl_retriever = Retriever(method="diver-dense", model_id="inst-xl", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

sfr_retriever = Retriever(method="diver-dense", model_id="sf", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

e5_retriever = Retriever(method="diver-dense", model_id="e5", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

contriever_retriever = Retriever(method="diver-dense", model_id="contriever", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

m2_retriever = Retriever(method="diver-dense", model_id="m2", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

grit_retriever = Retriever(method="diver-dense", model_id="grit", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

rader_retriever = Retriever(method="diver-dense", model_id="rader", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

nomic_retriever = Retriever(method="diver-dense", model_id="nomic", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)

diver_retriever = Retriever(method="diver-dense", model_id="diver", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)


# ReasonIR retrieval
reasonir_retriever = Retriever(method="reasonir", corpus_path="data/my_corpus.jsonl", encode_batch_size=4, n_docs=5)


# ReasonEmbed retrieval (3 configurable models specified by model_id)
reasonembed_qwen8b_retriever = Retriever(method="reason-embed", model_id="qwen3-8b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

reasonembed_qwen4b_retriever = Retriever(method="reason-embed", model_id="qwen3-4b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)

reasonembed_llama8b_retriever = Retriever(method="reason-embed", model_id="llama-8b", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)


# BGE-Reasoner retrieval
bge_reasoner_retriever = Retriever(method="bge-reasoner-embed", corpus_path="data/my_corpus.jsonl", encode_batch_size=8, n_docs=5)
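Before running any of the Option B retrievers, the corpus file has to exist in the expected shape. The following sketch writes a two-passage JSONL corpus with id and text fields and checks that every line carries both. write_corpus and check_corpus are hypothetical helpers written for this example, and the file is placed in a temporary directory rather than data/my_corpus.jsonl.

```python
import json
import tempfile
from pathlib import Path


def write_corpus(passages, path):
    """Write (id, text) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for passage_id, text in passages:
            handle.write(json.dumps({"id": passage_id, "text": text}, ensure_ascii=False) + "\n")


def check_corpus(path, required=("id", "text")):
    """Return the number of records; raise if a line lacks a required field."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [field for field in required if field not in record]
            if missing:
                raise ValueError(f"line {line_number} is missing {missing}")
            count += 1
    return count


corpus_file = Path(tempfile.mkdtemp()) / "my_corpus.jsonl"
write_corpus(
    [("d1", "Paris is the capital of France."),
     ("d2", "Berlin is the capital of Germany.")],
    corpus_file,
)
print(check_corpus(corpus_file), "records written to", corpus_file)
```

If your data uses a different field name for the passage text, a check like this makes the mismatch visible before a long embedding run starts.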

Retrieval Example: ReasonIR on the BRIGHT Benchmark (Biology queries)

This example demonstrates how to evaluate the reasonir/ReasonIR-8B model on the reasoning-intensive BRIGHT benchmark.

from datasets import load_dataset
from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever

corpus_path = "bright_biology_corpus.jsonl"       # .jsonl corpus for retrieval

docs = load_dataset("xlangai/BRIGHT", "documents", split="biology")
docs.to_json(corpus_path, force_ascii=False)  

queries = load_dataset("xlangai/BRIGHT", "examples", split="biology")
    
documents = []
for item in queries:
    doc = Document(id=item["id"], 
                   question=Question(question=item["query"]), 
                   answers=Answer(answers=item.get("gold_ids", [])))
    documents.append(doc)
    
retriever = Retriever(
    method="reasonir",            # Use ReasonIR retriever
    n_docs=3,                     # Retrieve top 3 documents per query
    corpus_path=corpus_path,      # Path to the JSONL we just created
    text_field="content",         # BRIGHT uses 'content' instead of 'text'
    batch_size=4,
)

results = retriever.retrieve(documents)

Step 3: Execute and View Results

Running Retrieval

After defining the retriever, you can retrieve documents using:

retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)

3️⃣ Running Reranking

Rankify provides support for multiple reranking models. Below are examples of how to use each model.

Example: Reranking a Document

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f"  - {context.text}")
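To illustrate what a reranker does conceptually, here is a toy sketch that scores each passage against the query and sorts by that score. It deliberately swaps in a simple token-overlap score for the neural model, so it is not how MonoT5 or any other Rankify reranker computes relevance. The function names are illustrative.

```python
import re


def overlap_score(query, passage):
    """Fraction of query tokens that also occur in the passage (toy relevance score)."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    passage_tokens = set(re.findall(r"\w+", passage.lower()))
    if not query_tokens:
        return 0.0
    return len(query_tokens & passage_tokens) / len(query_tokens)


def rerank(query, passages):
    """Return passages sorted by descending score; ties keep their original order."""
    return sorted(passages, key=lambda passage: overlap_score(query, passage), reverse=True)


query = "When did Thomas Edison invent the light bulb?"
passages = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for cars but failed",
    "Coffee is good for diet",
    "Thomas Edison invented the light bulb in 1879",
    "Thomas Edison worked with electricity",
]
for passage in rerank(query, passages):
    print(f"{overlap_score(query, passage):.2f}  {passage}")
```

A learned reranker replaces overlap_score with a model that reads the query and passage together, which lets it rank by meaning rather than by shared words.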

Examples of Using Different Reranking Models

# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Score-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')

# DuoT5 (pairwise T5-based reranker)
model = Reranking(method='duot5', model_name='duot5-base-msmarco')

# RankLLaMA (LLaMA-based passage reranker)
model = Reranking(method='rankllama', model_name='rankllama-v1-7b-lora-passage')

# DeAR (Decoder-only Autoregressive Reranker)
model = Reranking(method='dear_reranker', model_name='dear-3b-reranker-ce-v1')

# TART (Task-Aware Reranker with Instructions)
model = Reranking(method='tart', model_name='tart-full-flan-t5-xl')

# PRP (Pairwise Ranking Prompting) — local LLM
model = Reranking(method='prp', model_name='llamav3.1-8b')

# PRP — API-based LLM
model = Reranking(method='prp-api', model_name='gpt-4', api_key="gpt-api-key")

# RankGemma (Gemma-based listwise reranker)
model = Reranking(method='rankgemma', model_name='gemma-2-2b')

# RankMistral (Mistral-based listwise reranker)
model = Reranking(method='rankmistral', model_name='mistral-7b')

4️⃣ Using Generator Module

Rankify provides a Generator Module for retrieval-augmented generation (RAG), integrating retrieved documents with generative models served through OpenAI, LiteLLM, vLLM, and Hugging Face. Its modular design makes it easy to add new RAG methods and endpoints and to experiment with approaches such as zero-shot RAG, chain-of-thought RAG, and FiD-based RAG. Below are examples of how to use different RAG methods and LLM endpoints.

To use API-based endpoints (OpenAI, LiteLLM), you need to provide an API key. The examples below show how to do this.

Examples of Using Different RAG methods and backends

# Zero-shot with Huggingface endpoint
generator = Generator(method="zero-shot", model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', backend="huggingface")

# Basic RAG with LiteLLM endpoint
generator = Generator(method="basic-rag", model_name='ollama/mistral', backend="litellm", api_key=api_key)

# Chain-of-Thought RAG with vLLM endpoint
generator = Generator(method="chain-of-thought-rag", model_name='mistralai/Mistral-7B-v0.1', backend="vllm")

# In-context-RALM with OpenAI endpoint
generator = Generator(method="in-context-ralm", model_name='gpt-3.5-turbo', backend="openai", api_keys=[api_key])

Usage example without API-inference

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer([""])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="basic-rag", model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', backend="huggingface")

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]

Usage example with API-inference

If you save your API keys in a .env.local file, you can load them with the helper functions shown below:

# in .env.local:
OPENAI_API_KEY=your-api-key
LITELLM_API_KEY=your-api-key

Usage

# Load the LiteLLM API key
api_key = get_litellm_api_key()
# Or load the OpenAI API key
api_key = get_openai_api_key()

Full example using LiteLLM:

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
from rankify.utils.models.rank_llm.rerank.api_keys import get_litellm_api_key

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer([""])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Load the API key
api_key = get_litellm_api_key()

# Initialize Generator (here: Mistral served through LiteLLM)
generator = Generator(method="basic-rag", model_name='ollama/mistral', backend="litellm", api_key=api_key)

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]

5️⃣ Evaluating with Metrics

Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.

Evaluating Generated Answers

You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.

from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
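Which metrics calculate_generation_metrics reports is not listed here. For orientation, the sketch below implements two measures commonly used to compare a generated answer with ground-truth answers: exact match and token-level F1 with SQuAD-style normalization. It is a stand-alone illustration and does not call Rankify, so its numbers need not match Rankify's output.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(gold) for gold in gold_answers))


def token_f1(prediction, gold_answers):
    """Best token-level F1 of the prediction against any gold answer."""
    best = 0.0
    pred_tokens = normalize(prediction).split()
    for gold in gold_answers:
        gold_tokens = normalize(gold).split()
        common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if common == 0:
            continue
        precision = common / len(pred_tokens)
        recall = common / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


print(exact_match("Paris", ["Paris"]), token_f1("The city of Paris", ["Paris"]))
```

Exact match is strict and penalizes verbose answers, while token F1 gives partial credit, which is why the two are usually reported together.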

Evaluating Retrieval Performance

# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)

print(before_ranking_metrics)

Evaluating Reranked Results

# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
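The ks argument lists the cutoffs at which the metrics are computed. As a rough illustration of a cutoff-based retrieval metric, the sketch below computes top-k accuracy (the share of queries with at least one answer-bearing passage in the top k) from has_answer flags. It is a stand-alone example and an assumption about the kind of metric involved, not Rankify's implementation.

```python
def top_k_accuracy(ranked_flags_per_query, ks):
    """ranked_flags_per_query: one list of has_answer booleans per query, in rank order."""
    total = len(ranked_flags_per_query)
    if total == 0:
        return {k: 0.0 for k in ks}
    return {
        k: sum(any(flags[:k]) for flags in ranked_flags_per_query) / total
        for k in ks
    }


# Three toy queries: the answer first appears at rank 1, at rank 3, and never.
flags = [
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, False],
]
print(top_k_accuracy(flags, ks=[1, 3]))
```

Running the same computation on the original and on the reranked order shows whether reranking moved answer-bearing passages toward the top.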

🧪 BEIR & TREC DL19/DL20 with BM25

Rankify ships convenient hooks to run BM25 baselines on BEIR tasks and TREC DL'19/20, and to evaluate with TREC-style metrics (nDCG, MAP, MRR).

Quick start (single dataset)

from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics

# Download pre-retrieved BM25 results (top-k per query)
docs = Dataset('bm25', 'dl19', n_docs=1000).download(force_download=False)

# Evaluate with TREC metrics (nDCG@10 and nDCG@100 shown here)
metrics = Metrics(docs)
print(metrics.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False))
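For readers unfamiliar with nDCG, the sketch below shows the usual linear-gain, log2-discount form on a single ranked list of graded relevance labels. It is a simplified illustration: the ideal ranking is built only from the returned list, whereas TREC-style evaluation normalizes against all judged documents in the qrels, so values can differ from what calculate_trec_metrics reports.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel / log2(rank + 1) over the top k."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """relevances: graded labels of the returned documents, in rank order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


ranked_relevances = [3, 0, 2, 1]  # toy labels for one query
print(round(ndcg_at_k(ranked_relevances, k=4), 4))
```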

Notes

Supported names include dl19, dl20, and BEIR tasks with the beir- prefix, e.g.: beir-arguana, beir-covid, beir-dbpedia, beir-fever, beir-fiqa, beir-news, beir-nfc, beir-quora, beir-robust04, beir-scidocs, beir-scifact, beir-signal, beir-touche.

If you need explicit qrels selection, pass qrel=name.replace("beir-", "") to calculate_trec_metrics.

Batch over BEIR & DL datasets

from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics

BEIR_TASKS = [
    "beir-arguana", "beir-covid", "beir-dbpedia", "beir-fever", "beir-fiqa", "beir-news",
    "beir-nfc", "beir-quora", "beir-robust04", "beir-scidocs", "beir-scifact",
    "beir-signal", "beir-touche",
]

for name in ["dl19", "dl20", *BEIR_TASKS]:
    docs = Dataset('bm25', name, n_docs=100).download(force_download=False)
    m = Metrics(docs)
    res = m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False)
    print(name, res)

(Optional) Add a reranker, then evaluate

from rankify.models.reranking import Reranking
from rankify.dataset.dataset import Dataset
from rankify.metrics.metrics import Metrics

name = "beir-arguana"
docs = Dataset('bm25', name, n_docs=100).download(force_download=False)
reranker = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
reranker.rank(docs)

m = Metrics(docs)
print("Before:", m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=False))
print("After :", m.calculate_trec_metrics(ndcg_cuts=[10, 100], use_reordered=True))

📏 Evaluating RAG with RAGAS

Rankify ships a thin wrapper around ragas to make quality evaluation of generated answers simple and flexible—whether you judge with a local HF model or a hosted API like OpenAI. You can run fast defaults, pick specific metrics, or simulate predictions when compute is tight.

✅ Install

# core Rankify RAG deps
pip install bert-score
pip install ragas
pip install langchain_huggingface
pip install rouge-score

import torch
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
from rankify.metrics.generator_metrics import GeneratorMetrics
from rankify.metrics.ragas_bridge import RagasModels

# 1) Build a tiny document
question = Question("What is the capital of France?")
answers  = Answer(["Paris"])
contexts = [
    Context(id=1, title="France",   text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany",  text="Berlin is the capital of Germany.", score=0.5),
]
doc = Document(question=question, answers=answers, contexts=contexts)

# 2) Generate an answer (or skip and provide your own predictions list)
generator   = Generator(method="basic-rag",
                        model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
                        backend="huggingface",
                        torch_dtype=torch.float16)
predictions = generator.generate([doc])
print("Generated:", predictions)

# 3) Evaluate with RAGAS (HF judge)
gen_metrics = GeneratorMetrics([doc])

ragas_hf = RagasModels(
    llm_kind="hf",
    llm_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    embeddings_kind="hf",
    embeddings_name="sentence-transformers/all-MiniLM-L6-v2",
    torch_dtype="float16",
    max_new_tokens=256,  # shorter outputs = faster + cheaper
    timeout=180,         # seconds per metric call
    max_retries=1,
    max_workers=2,       # keep small on limited hardware
)

# (A) Fast defaults
scores_fast = gen_metrics.all(predictions, ragas_models=ragas_hf)
print("RAGAS (fast):", scores_fast)

# (B) Pick specific metrics
scores_specific = gen_metrics.ragas_generator(
    predictions,
    judge=ragas_hf,
    metrics=["faithfulness", "response_relevancy", "context_precision", "context_recall"],
)
print("RAGAS (specific):", scores_specific)

# (C) OpenAI judge (much faster if you have an API key)
ragas_openai = RagasModels(llm_kind="openai", llm_name="gpt-4o-mini", timeout=30)
scores_openai = gen_metrics.all(predictions, ragas_models=ragas_openai)
print("RAGAS (OpenAI):", {k: v for k, v in scores_openai.items() if k.startswith("ragas_")})
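
The `startswith("ragas_")` filter in example (C) can be generalized into a small helper that separates RAGAS scores from any other metrics in the returned dictionary. The sketch below is self-contained and does not call Rankify. The `split_scores` helper and the sample dictionary (its keys and values) are hypothetical placeholders, not measured results; only the `ragas_` prefix convention comes from the example above, and it assumes the scores are a flat mapping of metric name to number.

```python
from typing import Dict, Tuple


def split_scores(
    scores: Dict[str, float], prefix: str = "ragas_"
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split a flat metric dict into (prefixed, other) groups."""
    prefixed = {k: v for k, v in scores.items() if k.startswith(prefix)}
    other = {k: v for k, v in scores.items() if not k.startswith(prefix)}
    return prefixed, other


# Placeholder values for illustration only; these are not real results.
example_scores = {
    "exact_match": 1.0,
    "ragas_faithfulness": 0.5,
    "ragas_context_recall": 0.25,
}

ragas_only, classic = split_scores(example_scores)
for name, value in sorted(ragas_only.items()):
    print(f"{name}: {value:.2f}")
```

In practice you would pass the dictionary returned by `gen_metrics.all(...)` instead of `example_scores`.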

📜 Supported Models

1️⃣ Index

✅ Wikipedia
✅ MS-MARCO
🕒 Online Search

1️⃣ Retrievers

✅ BM25
✅ DPR
✅ ColBERT
✅ ANCE
✅ BGE
✅ Contriever
✅ BPR
✅ HYDE
✅ SFR
✅ E5
✅ GritLM
✅ M2
✅ Nomic
✅ Instructor
✅ RaDeR
✅ ReasonIR
✅ BGE-Reasoner
✅ ReasonEmbed
✅ DiverRetriever
🕒 RepLlama
🕒 coCondenser
🕒 Spar
🕒 Dragon
🕒 Hybrid
✅ TAS-B
✅ UniCOIL
✅ SPLADE-v2
✅ OpenAI Embedding Retriever
✅ Cohere Embedding Retriever
✅ Voyage AI Retriever

2️⃣ Rerankers

✅ Cross-Encoders
✅ RankGPT
✅ RankGPT-API
✅ MonoT5
✅ MonoBert
✅ RankT5
✅ ListT5
✅ LiT5Score
✅ LiT5Dist
✅ Vicuna Reranker
✅ Zephyr Reranker
✅ Sentence Transformer-based
✅ FlashRank Models
✅ API-Based Rerankers
✅ ColBERT Reranker
✅ LLM Layerwise Ranker
✅ Splade Reranker
✅ UPR Reranker
✅ Inranker Reranker
✅ Transformer Reranker
✅ FIRST Reranker
✅ Blender Reranker
✅ LLM2VEC Reranker
✅ ECHO Reranker
✅ Incontext Reranker
✅ DuoT5
✅ RankLLaMA
✅ DeAR
🕒 DynRank
🕒 ASRank
✅ PRP (Pairwise Ranking Prompting)
✅ RankMistral
✅ RankGemma
🕒 SetRank
🕒 Cohere Rerank API
✅ TART
🕒 PolyEncoder

3️⃣ Generator

RAG-Methods

✅ Zero-shot
✅ Basic-RAG
✅ Chain-of-Thought-RAG
✅ Fusion-in-Decoder (FiD) with T5
✅ In-Context Learning RALM
🕒 Self-Consistency RAG
🕒 Retrieval Chain-of-Thought

LLM-Endpoints

✅ Hugging Face
✅ vLLM
✅ LiteLLM
✅ OpenAI
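
As a rough illustration of what the Basic-RAG method listed above does conceptually, the sketch below joins retrieved passages and the question into a single prompt, which would then be sent to one of the LLM endpoints. It is plain Python and does not use Rankify. The `build_basic_rag_prompt` function, its `max_passages` parameter, and the prompt wording are hypothetical, and Rankify's actual template may differ.

```python
from typing import List, Tuple


def build_basic_rag_prompt(
    question: str, passages: List[Tuple[str, str]], max_passages: int = 3
) -> str:
    """Format (title, text) passages and a question into one prompt string."""
    lines = ["Answer the question using only the passages below.", ""]
    # Number each passage so the model can tell them apart.
    for i, (title, text) in enumerate(passages[:max_passages], start=1):
        lines.append(f"[{i}] {title}: {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


prompt = build_basic_rag_prompt(
    "What is the capital of France?",
    [
        ("France", "The capital of France is Paris."),
        ("Germany", "Berlin is the capital of Germany."),
    ],
)
print(prompt)
```

Truncating to `max_passages` is one simple way to keep the prompt within a model's context window; a real pipeline would more likely budget by token count.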

✨ Features

🔥 Unified Framework: Combines retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular toolkit.
📚 Rich Dataset Support: Includes 40+ benchmark datasets with pre-retrieved documents for seamless experimentation.
🧲 Diverse Retrieval Methods: Supports BM25, DPR, ANCE, BPR, ColBERT, BGE, Contriever, SFR, E5, GritLM, M2, Nomic, Instructor, RaDeR, ReasonIR, BGE-Reasoner and ReasonEmbed for flexible retrieval strategies.
🎯 Powerful Re-Ranking: Implements 28 advanced models with 44 sub-methods to optimize ranking performance.
🏗️ Prebuilt Indices: Provides Wikipedia and MS MARCO corpora, eliminating indexing overhead and speeding up retrieval.
🔮 Seamless RAG Integration: Works with backends such as Hugging Face, OpenAI, vLLM, and LiteLLM to run inference with models such as GPT, LLaMA, T5, and Fusion-in-Decoder (FiD) across multiple retrieval-augmented generation methods.
🛠 Extensible & Modular: Easily integrates custom datasets, retrievers, ranking models, and RAG pipelines.
📊 Built-in Evaluation Suite: Includes retrieval, ranking, and RAG metrics for robust benchmarking.
📖 User-Friendly Documentation: Access detailed 📖 online docs, example notebooks, and tutorials for easy adoption.

🔍 Roadmap

Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.

📖 Documentation

For full API documentation, visit the Rankify Docs.

💡 Contributing

Follow these steps to get involved:

Fork this repository to your GitHub account.

Create a new branch for your feature or fix:

git checkout -b feature/YourFeatureName

Make your changes and commit them:
```
git commit -m "Add YourFeatureName"
```

Push the changes to your branch:

git push origin feature/YourFeatureName

Submit a Pull Request to propose your changes.

Thank you for helping make this project better!

🌐 Community Contributions

Chinese community resources available!

Special thanks to Xiumao for writing two exceptional Chinese blog posts about Rankify:

📘 Introduction to Rankify

📘 Deep dive into re-ranking models in Rankify

These articles were crafted with high-traffic optimization in mind and are widely recommended in Chinese academic and developer circles.

We updated the Chinese version (中文版本) to reflect these blog contributions while keeping the original content intact. Thank you, Xiumao, for your continued support!

🔖 License

Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.

🙏 Acknowledgments

We would like to express our gratitude to the following libraries, which have greatly contributed to the development of Rankify:

Diver – For the reference implementation of the dense retriever routing and caching logic used to integrate various bi-encoders.
🔗 GitHub Repository
Rerankers – A powerful Python library for integrating various reranking methods.
🔗 GitHub Repository
Pyserini – A toolkit for supporting BM25-based retrieval and integration with sparse/dense retrievers.
🔗 GitHub Repository
FlashRAG – A modular framework for Retrieval-Augmented Generation (RAG) research.
🔗 GitHub Repository

🌟 Citation

Please cite our paper if it helps your research:

@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}