🌈 Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
October 12, 2025
Official PyTorch implementation of Rainbow Padding, a simple yet powerful strategy that resolves <eos> overflow in diffusion language models (dLLMs).
For details, see our project page and the arXiv paper. This repository provides a step-by-step pipeline for LoRA SFT training and evaluation with Rainbow Padding.
If you have any questions, please contact the authors.
Demo
Panels: Rainbow Padding | LLaDA Instruct
1. Setup
1️⃣ Create the Conda Environment
conda env create -f environment.yaml
2️⃣ Activate the Environment
conda activate rainbow
2. Dataset Preparation
We follow the curation recipe introduced in Dream (arXiv:2508.15487).
The training corpus consists of 0.5M examples curated from public sources.
Details are provided in Appendix C.1 of the paper.
⚠️ Note: Specific SFT configurations for both Dream and LLaDA were not publicly released (to the best of our knowledge).
Download pre-tokenized data (recommended)
You can directly download our preprocessed datasets from Google Drive:
# The same data, tokenized separately for each model type.
# LLaDA SFT data
gdown --folder 1U8kVGYiWRsqWCDRsHUjeKTiDPrL0FsMp
# Dream SFT data
gdown --folder 1-oei1KRTFADMRljPqX5rPuTGcJ7fpHdI
3. LoRA SFT Training
We use 🤗 Accelerate for multi-GPU training.
Key Arguments for main.py
| Argument | Description |
|---|---|
| batch_size | Batch size per GPU. Control the total batch size using gradient_accumulation_steps in ./method/sft.py. |
| pad_num | Number of cyclic padding tokens. Use 0 for <eos> padding or any positive integer (e.g., 3, 7) for Rainbow Padding. |
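To make the pad_num argument concrete, here is a minimal sketch of how a response could be padded to a fixed length. It assumes a layout in which a single <eos> marks the true end of the response and the remaining positions cycle through pad_num distinct padding tokens, falling back to repeated <eos> when pad_num is 0. The token IDs and the function name pad_response are hypothetical and exist only for illustration; the repository's training code is the reference and may differ.

```python
from typing import List

EOS_ID = 2            # hypothetical <eos> token id
PAD_BASE_ID = 1000    # hypothetical id of the first cyclic padding token


def pad_response(tokens: List[int], max_len: int, pad_num: int) -> List[int]:
    """Pad `tokens` to `max_len` (assumed layout, for illustration only)."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    n_fill = max_len - len(tokens)
    if pad_num == 0:
        # Baseline: every padding position is filled with <eos>.
        return tokens + [EOS_ID] * n_fill
    # Rainbow Padding: one <eos>, then pad_0, pad_1, ..., pad_{K-1}, pad_0, ...
    cycle = [PAD_BASE_ID + (i % pad_num) for i in range(n_fill - 1)]
    return tokens + [EOS_ID] + cycle


print(pad_response([11, 12, 13], max_len=10, pad_num=3))
# [11, 12, 13, 2, 1000, 1001, 1002, 1000, 1001, 1002]
print(pad_response([11, 12, 13], max_len=10, pad_num=0))
# [11, 12, 13, 2, 2, 2, 2, 2, 2, 2]
```

With pad_num=0 the tail is dominated by a single token, which is the situation Rainbow Padding is meant to avoid; with a positive pad_num the padding positions are spread over several distinct tokens.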
Example: Training with 4 GPUs and 7 Rainbow Padding Tokens
1️⃣ Initial Training
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7
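Because batch_size is a per-GPU value, the number of examples consumed per optimizer step in data-parallel training is the product of the per-GPU batch size, the number of processes, and gradient_accumulation_steps. The short calculation below illustrates this for the 4-GPU launch above; the values 2 and 8 are placeholders, not the repository's defaults.

```python
def effective_batch_size(per_gpu_batch: int, num_processes: int, grad_accum_steps: int) -> int:
    """Examples consumed per optimizer step across all GPUs."""
    return per_gpu_batch * num_processes * grad_accum_steps


# Placeholder values: batch_size=2 and gradient_accumulation_steps=8 on 4 GPUs.
print(effective_batch_size(per_gpu_batch=2, num_processes=4, grad_accum_steps=8))  # 64
```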
2️⃣ Continue Training from a Checkpoint
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7 --resume_dir model/llada_base/sft_5e-05_lora_epoch3_rank32_pad7
4. Evaluation
Our checkpoint is available on Hugging Face: quasar529/rainbow-padding-llada.
We evaluate with the widely used LM-Eval-Harness library; our evaluation script is adapted from LLaDA's eval script.
You can find the evaluation script in eval/eval_llada_instruct.py.
⚠️ Dependency Notice
To run evaluation, you must install specific versions of datasets and lm-eval due to dependency constraints:
pip install datasets==3.6.0 lm-eval==0.4.9.1
- lm-eval==0.4.9.1 requires datasets>=2.16.0,<4.0.
- However, the latest datasets releases (≥4.0.0) fall outside this range and are incompatible.
- Therefore, you need to downgrade datasets to 3.6.0, which satisfies lm-eval's requirements and ensures stable evaluation. If you skip this step, evaluation scripts may still run but can break unexpectedly due to mismatched APIs.
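As a quick sanity check before running evaluation, the sketch below reads the installed datasets version with the standard library and tests it against the range quoted above (>=2.16.0,<4.0). The helper names are hypothetical, and the version parsing is deliberately simplified: it looks only at the numeric release segments and does not implement full PEP 440 ordering.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile
from typing import Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn '3.6.0' into (3, 6, 0); keeps only leading digits of each part."""
    nums = []
    for piece in v.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        nums.append(int(digits))
    # Pad to three components so '2.16' compares equal to '2.16.0'.
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def datasets_version_ok(v: str) -> bool:
    """True if `v` falls within datasets>=2.16.0,<4.0."""
    return (2, 16, 0) <= parse_release(v) < (4, 0, 0)


def check_installed() -> str:
    """Report whether the installed datasets package is in range."""
    try:
        found = version("datasets")
    except PackageNotFoundError:
        return "datasets is not installed"
    status = "OK" if datasets_version_ok(found) else "outside >=2.16.0,<4.0"
    return f"datasets {found}: {status}"


print(check_installed())
```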
Example Command
# Example: humaneval_instruct
# To log to wandb, set wandb_log, wandb_project, and wandb_entity in --model_args.
accelerate launch --num_processes=1 eval/eval_llada_instruct.py \
--tasks humaneval_instruct \
--model llada_dist \
--batch_size 1 \
--log_samples \
--output_path "/home/quasar529/rainbow-padding/eval/output" \
--confirm_run_unsafe_code \
--model_args model_path='GSAI-ML/LLaDA-8B-Base',steps=1024,gen_length=1024,block_length=1024,lora_path='quasar529/rainbow-padding-llada',device='cuda',wandb_log=True,wandb_project='llada-eval',wandb_entity='your-wandb-entity'
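The --model_args value is a single comma-separated string of key=value pairs, which is easy to mistype by hand. The sketch below builds that string from a dictionary. The helper names are hypothetical. The divisibility check is an assumption modeled on block-wise diffusion sampling, where the generation length is split into equal blocks and the steps are shared evenly among them; it has not been confirmed against this repository's evaluation script.

```python
from typing import Dict, Union

Value = Union[str, int, bool]


def build_model_args(args: Dict[str, Value]) -> str:
    """Join key=value pairs with commas, as expected by --model_args."""
    return ",".join(f"{key}={value}" for key, value in args.items())


def lengths_consistent(steps: int, gen_length: int, block_length: int) -> bool:
    """Assumed constraint: equal-sized blocks, steps split evenly across them."""
    if gen_length % block_length != 0:
        return False
    num_blocks = gen_length // block_length
    return steps % num_blocks == 0


args = {
    "model_path": "GSAI-ML/LLaDA-8B-Base",
    "steps": 1024,
    "gen_length": 1024,
    "block_length": 1024,
    "lora_path": "quasar529/rainbow-padding-llada",
    "device": "cuda",
}
assert lengths_consistent(args["steps"], args["gen_length"], args["block_length"])
print(build_model_args(args))
```

The wandb keys are omitted here; add wandb_log, wandb_project, and wandb_entity to the dictionary to enable logging.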
To reproduce all evaluation tasks from the paper at once, run the provided shell script:
sh eval/eval.sh
5. Citation
If you find this work useful, please cite:
@article{kim2025rainbow,
title={Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs},
author={Kim, Bumjun and Jeon, Dongjae and Kim, Dueun and Jeung, Wonje and No, Albert},
journal={arXiv preprint arXiv:2510.03680},
year={2025}
}
6. Acknowledgements
This code builds upon the open-source implementations of Dream and LLaDA.
We thank the authors for releasing their resources and inspiring this work.