DDRO: Direct Document Relevance Optimization for Generative Information Retrieval
May 21, 2026
Official implementation of our SIGIR 2025 paper: Lightweight and Direct Document Relevance Optimization for Generative IR
Table of Contents
- Motivation
- Method
- Project Structure
- Setup
- Data Preparation
- Training
- Evaluation
- Datasets and Checkpoints
- Acknowledgments
- Citation
Motivation
Generative IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens. While effective for language modeling, this objective optimizes token-level generation, not document-level ranking, which is the core requirement in IR systems.
DDRO addresses this misalignment by directly optimizing the model for document-level ranking using pairwise preference learning, without reinforcement learning or reward modeling.
Method
DDRO trains in two phases:
Phase 1 — Supervised Fine-Tuning (SFT)
The model learns to generate the correct docid sequence for a given query via autoregressive next-token prediction across three stages:
- Pretraining: document content to docid (doc → docid)
- Search pretraining: pseudo queries to docid (pseudoquery → docid)
- Fine-tuning: real queries to docid using qrels supervision (query → docid)
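The three stages differ only in what the encoder sees; the decoder target is always the docid. A minimal sketch of that idea follows. The `Doc` record, the `sft_pairs` helper, and the stage names used as dictionary keys are hypothetical illustrations, not the repository's data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical document record, used only for this illustration."""
    docid: str                                   # target docid string
    text: str                                    # document content
    pseudo_queries: List[str] = field(default_factory=list)


def sft_pairs(docs: List[Doc], qrels: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Build (source, target) pairs for each SFT stage.

    `qrels` maps a real query to the docid of its relevant document.
    """
    known_ids = {d.docid for d in docs}
    return {
        # Stage 1: document content -> docid
        "pretrain": [(d.text, d.docid) for d in docs],
        # Stage 2: pseudo query -> docid
        "search_pretrain": [(pq, d.docid) for d in docs for pq in d.pseudo_queries],
        # Stage 3: real query -> docid, skipping qrels that point outside the corpus
        "finetune": [(q, docid) for q, docid in qrels.items() if docid in known_ids],
    }
```

In all three stages the model is trained with the same next-token objective; only the source side of each pair changes.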
Phase 2 — Pairwise Ranking Optimization (DDRO Loss)
The model is fine-tuned with a pairwise learning-to-rank objective inspired by Direct Preference Optimization (Rafailov et al., 2023), adapted for structured docid generation under beam decoding constraints.
The DDRO loss trains the model to prefer relevant documents (docid+) over non-relevant ones (docid-) relative to a frozen SFT reference policy:
| Symbol | Description |
|---|---|
| docid+ | Relevant document for query q |
| docid- | Non-relevant document |
| π_θ | Current model being optimized |
| π_ref | Frozen SFT reference model |
| β | Temperature controlling preference sensitivity |
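The loss equation itself does not appear in this text. Assuming it keeps the standard DPO form of Rafailov et al. (2023) with the symbols defined above, it would read as follows. Here σ (the logistic sigmoid) and D (the set of training triples of a query with one relevant and one non-relevant docid) are notation introduced for this sketch.

```latex
\mathcal{L}_{\mathrm{DDRO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\mathbb{E}_{(q,\, \mathrm{docid}^{+},\, \mathrm{docid}^{-}) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(\mathrm{docid}^{+} \mid q)}{\pi_{\mathrm{ref}}(\mathrm{docid}^{+} \mid q)}
- \beta \log \frac{\pi_\theta(\mathrm{docid}^{-} \mid q)}{\pi_{\mathrm{ref}}(\mathrm{docid}^{-} \mid q)}
\right) \right]
```

Under this form the loss falls when the model raises the likelihood of docid+ relative to the reference more than it raises that of docid-.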
Why DDRO differs from standard DPO
| | DPO | DDRO |
|---|---|---|
| Architecture | Decoder-only | Encoder-decoder |
| Output | Free-form text | Structured docid sequences |
| Decoding | Greedy/sampling | Constrained beam search |
| Objective | Open-ended preference | Document-level ranking |
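Constrained decoding is what keeps every generated sequence a valid docid. One common way to do this is a prefix trie over the tokenized docids that exposes the allowed next tokens at each step. The sketch below shows that idea in plain Python; the `DocidTrie` class and the toy token ids are hypothetical and are not taken from this repository's `utils/` code.

```python
from typing import Dict, Iterable, List


class DocidTrie:
    """Prefix trie over docid token sequences (illustrative only)."""

    def __init__(self, sequences: Iterable[List[int]]) -> None:
        self.root: Dict[int, dict] = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                # Create the child node on first visit, then descend into it.
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: List[int]) -> List[int]:
        """Return the token ids that can follow `prefix` in some valid docid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # prefix is not part of any known docid
            node = node[tok]
        return sorted(node)


# Toy docids as token-id sequences; 1 stands in for an end-of-sequence token.
trie = DocidTrie([[5, 7, 1], [5, 8, 1], [6, 7, 1]])
print(trie.allowed_next([]))   # tokens that may start a docid
print(trie.allowed_next([5]))  # continuations after token 5
```

In HuggingFace Transformers, a lookup like this is typically supplied to `generate` through its `prefix_allowed_tokens_fn` argument so that beam search only expands valid prefixes. Whether this repository wires it up that way is not stated here.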
Project Structure
src/
├── data/ # Downloading, preprocessing, and docid instance generation
├── pretrain/ # Model training and evaluation
├── scripts/ # Shell scripts for SFT, DDRO, BM25, and preprocessing
└── utils/ # Tokenization, trie, metrics, trainers
ddro_env.yml # Conda environment for training
pyserini.yml # Conda environment for BM25 retrieval
requirements.txt # Python dependencies
Each subdirectory contains a detailed README.md with further instructions.
Setup
1. Install environment
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
2. Download datasets and pretrained model
We use MS MARCO (top-300K) and Natural Questions (NQ-320K), plus a pretrained T5-base model.
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
See src/data/download/README.md for details.
Data Preparation
MS MARCO — sample top-300K subset
bash scripts/preprocess/sample_top_docs.sh
Output: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz
Expected directory structure
resources/
├── datasets/
│ ├── raw/
│ │ ├── msmarco-data/
│ │ └── nq-data/
│ └── processed/
└── transformer_models/
└── t5-base/
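Before starting preprocessing or training, it can help to confirm that the layout above is in place. The following is a small optional check, not part of the repository; the `missing_dirs` helper is hypothetical and the directory names are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Directories taken from the expected layout shown above.
EXPECTED_DIRS = [
    "datasets/raw/msmarco-data",
    "datasets/raw/nq-data",
    "datasets/processed",
    "transformer_models/t5-base",
]


def missing_dirs(root: str = "resources") -> List[str]:
    """Return the expected sub-directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("layout OK" if not missing else f"missing: {missing}")
```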
For full preprocessing instructions (docid generation, training/eval instance creation), see src/data/data_prep/README.md.
Training
Phase 1 — SFT
Run all three SFT stages with a single command:
bash src/scripts/sft/launch_SFT_training.sh
The --encoding flag controls the docid format (pq or url_title).
Phase 2 — DDRO
After SFT, run pairwise ranking optimization:
bash scripts/ddro/slurm_submit_ddro_training.sh
DDRO is implemented using a custom version of HuggingFace's DPOTrainer.
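To make the pairwise objective concrete, here is a rough sketch of the loss for a single (query, docid+, docid-) triple, assuming it follows the standard DPO form. It is not the repository's trainer: the function name `ddro_pair_loss` and the default `beta=0.1` are assumptions for illustration, and the inputs are taken to be sequence log-probabilities (sums of token log-probabilities over the docid).

```python
import math


def ddro_pair_loss(
    policy_logp_pos: float,
    policy_logp_neg: float,
    ref_logp_pos: float,
    ref_logp_neg: float,
    beta: float = 0.1,  # assumed value, not taken from the source
) -> float:
    """DPO-style loss for one (query, docid+, docid-) triple.

    Each argument is log pi(docid | q) under the policy or the frozen
    reference model.
    """
    # How much more the policy favors docid+ over docid- than the reference does.
    margin = beta * (
        (policy_logp_pos - ref_logp_pos) - (policy_logp_neg - ref_logp_neg)
    )
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When policy and reference agree, the margin is 0 and the loss is log(2).
print(ddro_pair_loss(-1.0, -1.0, -1.0, -1.0))
```

The loss drops below log(2) once the policy prefers docid+ over docid- more strongly than the reference model does, and rises above it in the opposite case.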
Evaluation
Option A — Quick evaluation via launcher (recommended)
# SLURM
sbatch src/pretrain/hf_eval/slurm_submit_hf_eval.sh
# Direct
python src/pretrain/hf_eval/launch_hf_eval_from_config.py \
--dataset msmarco \
--encoding pq \
--scale top_300k \
--hf_docids_repo kiyam/ddro-docids \
--hf_tests_repo kiyam/ddro-testsets
Option B — Manual evaluation with HF URIs
NQ + Title+URL
python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
--pretrain_model_path kiyam/ddro-nq-tu \
--docid_path "hf:dataset:kiyam/ddro-docids:tu_nq_docids.txt" \
--test_file_path "hf:dataset:kiyam/ddro-testsets:nq/test_data/query_dev.t5_128_1.url_title_nq.json" \
--dataset_script_dir src/data/data_scripts \
--dataset_cache_dir ./cache \
--num_beams 50 --add_doc_num 6144 \
--max_seq_length 64 --max_docid_length 100 \
--use_docid_rank True --docid_format nq \
--lookup_fallback True --device cuda:0
MS MARCO + PQ
python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
--pretrain_model_path kiyam/ddro-msmarco-pq \
--docid_path "hf:dataset:kiyam/ddro-docids:pq_msmarco_docids.txt" \
--test_file_path "hf:dataset:kiyam/ddro-testsets:msmarco/test_data_top_300k/query_dev.t5_128_1.pq.top_300k.json" \
--dataset_script_dir src/data/data_scripts \
--dataset_cache_dir ./cache \
--num_beams 80 --add_doc_num 6144 \
--max_seq_length 64 --max_docid_length 24 \
--use_docid_rank True --docid_format msmarco \
--lookup_fallback True --device cuda:0
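The `--docid_path` and `--test_file_path` values in both commands share the shape `hf:<repo_type>:<repo_id>:<filename>`. The sketch below shows one way such a string could be split and resolved to a local file. It is inferred from the examples only: `HfUri`, `parse_hf_uri`, and `fetch` are hypothetical names, and the evaluation script may handle these URIs differently. The only real library call is `huggingface_hub.hf_hub_download`.

```python
from typing import NamedTuple


class HfUri(NamedTuple):
    repo_type: str  # e.g. "dataset"
    repo_id: str    # e.g. "kiyam/ddro-docids"
    filename: str   # path of the file inside the repo


def parse_hf_uri(uri: str) -> HfUri:
    """Split 'hf:<repo_type>:<repo_id>:<filename>' into its parts."""
    parts = uri.split(":", 3)
    if len(parts) != 4 or parts[0] != "hf" or not all(parts[1:]):
        raise ValueError(f"not an hf: URI: {uri!r}")
    return HfUri(*parts[1:])


def fetch(uri: str) -> str:
    """Download the referenced file and return its local cache path."""
    # Imported lazily so that parsing works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    u = parse_hf_uri(uri)
    return hf_hub_download(
        repo_id=u.repo_id, filename=u.filename, repo_type=u.repo_type
    )


print(parse_hf_uri("hf:dataset:kiyam/ddro-docids:pq_msmarco_docids.txt"))
```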
Results
| Dataset | Docid | Model | MRR@10 | R@10 |
|---|---|---|---|---|
| MS MARCO | PQ | kiyam/ddro-msmarco-pq | 45.76 | 73.02 |
| MS MARCO | TU | kiyam/ddro-msmarco-tu | 50.07 | 74.01 |
| NQ | PQ | kiyam/ddro-nq-pq | 55.51 | 67.31 |
| NQ | TU | kiyam/ddro-nq-tu | 45.99 | 55.98 |
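For reference, the two reported metrics can be computed from ranked docid lists as sketched below, assuming the usual definitions of MRR@10 and Recall@10. The function names and the toy run are hypothetical, and this is not the repository's metrics code. The functions return values in [0, 1], whereas the table appears to report them multiplied by 100.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant docid within the top k, else 0."""
    for rank, docid in enumerate(ranked[:k], start=1):
        if docid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant docids that appear within the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10
) -> Dict[str, float]:
    """Average both metrics over all queries that have relevance labels."""
    queries = list(qrels)
    if not queries:
        return {"mrr": 0.0, "recall": 0.0}
    n = len(queries)
    return {
        "mrr": sum(mrr_at_k(runs.get(q, []), qrels[q], k) for q in queries) / n,
        "recall": sum(recall_at_k(runs.get(q, []), qrels[q], k) for q in queries) / n,
    }


# Toy run: q1 finds its relevant docid at rank 2, q2 misses entirely.
toy_runs = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
toy_qrels = {"q1": {"b"}, "q2": {"z"}}
print(evaluate(toy_runs, toy_qrels))
```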
Notes
- Recommended: transformers==4.37.2, tokenizers==0.15.2
- NQ-PQ uses canonical integer docids; NQ-TU uses lowercased url_title strings. Do not mix assets across sources.
- Default beam counts: NQ-PQ (100), NQ-TU (50), MS MARCO-PQ (80)
- Logs saved to logs/<dataset>/dpo_*.log and logs/<dataset>/dpo_*.csv
Datasets and Checkpoints
All preprocessed datasets, docid encodings, and model checkpoints are available in the DDRO Generative IR Collection on Hugging Face.
| Resource | Link |
|---|---|
| MS MARCO Top-300K dataset | kiyam/ddro-ms-dataset |
| NQ-320K dataset | kiyam/ddro-nq-dataset |
| DocID tables | kiyam/ddro-docids |
| Eval test sets | kiyam/ddro-testsets |
Acknowledgments
License
This project is licensed under the Apache 2.0 License.
Citation
@inproceedings{mekonnen2025lightweight,
title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={1327--1338},
year={2025}
}
For questions, please open an issue.
© 2025 Kidist Amde Mekonnen — IRLab, University of Amsterdam