🌈 Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs

October 12, 2025

Official PyTorch implementation of Rainbow Padding, a simple yet powerful strategy that resolves <eos> overflow in diffusion language models (dLLMs).

For details, see our project page and the arXiv paper. This repository provides a step-by-step pipeline for LoRA SFT training and evaluation with Rainbow Padding.

If you have any questions, please contact the authors.

Demo

Rainbow Padding

LLaDA Instruct

1. Setup

1️⃣ Create the Conda Environment

conda env create -f environment.yaml

2️⃣ Activate the Environment

conda activate rainbow

2. Dataset Preparation

We follow the curation recipe introduced in Dream (arXiv:2508.15487).
The training corpus consists of 0.5M curated public examples. Details are provided in Appendix C.1 of the paper.

⚠️ Note: Specific SFT configurations for both Dream and LLaDA were not publicly released (to the best of our knowledge).

Download pre-tokenized data (recommended)

You can directly download our preprocessed datasets from Google Drive:

# The same data, tokenized separately for each model type.

# LLaDA SFT data
gdown --folder 1U8kVGYiWRsqWCDRsHUjeKTiDPrL0FsMp

# Dream SFT data
gdown --folder 1-oei1KRTFADMRljPqX5rPuTGcJ7fpHdI

3. LoRA SFT Training

We use 🤗 Accelerate for multi-GPU training.

Key Arguments for `main.py`

Argument	Description
`batch_size`	Batch size per GPU. Control the total batch size using `gradient_accumulation_steps` in `./method/sft.py`.
`pad_num`	Number of cyclic padding tokens. Use `0` for `<eos>` padding or any positive integer (e.g., `3`, `7`) for Rainbow Padding.
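
As a rough illustration of what `pad_num` controls, the sketch below pads a response either with repeated `<eos>` tokens (`pad_num=0`) or with one `<eos>` followed by a repeating cycle of distinct padding tokens. The function name, the token IDs, and the choice to place a single `<eos>` before the cycle are assumptions made for illustration; the actual logic lives in this repository's training code and may differ.

```python
from typing import List


def pad_response(
    response_ids: List[int],
    target_len: int,
    eos_id: int,
    rainbow_pad_ids: List[int],
) -> List[int]:
    """Pad response_ids up to target_len (hypothetical helper).

    An empty rainbow_pad_ids list plays the role of pad_num=0: the tail is
    filled with eos_id. Otherwise the tail is one eos_id followed by the
    padding IDs repeated cyclically.
    """
    if len(response_ids) > target_len:
        raise ValueError("response is longer than target_len")
    n_fill = target_len - len(response_ids)
    if n_fill == 0:
        return list(response_ids)
    if not rainbow_pad_ids:
        # Conventional padding: every remaining slot holds <eos>.
        return list(response_ids) + [eos_id] * n_fill
    k = len(rainbow_pad_ids)
    # One <eos> marks the end; the rest cycles pad_0, pad_1, ..., pad_{k-1}.
    cycle = [rainbow_pad_ids[i % k] for i in range(n_fill - 1)]
    return list(response_ids) + [eos_id] + cycle


# Made-up token IDs: 2 stands in for <eos>, 100..102 for three padding tokens.
print(pad_response([11, 12, 13], 10, eos_id=2, rainbow_pad_ids=[100, 101, 102]))
# [11, 12, 13, 2, 100, 101, 102, 100, 101, 102]
```

With the cyclic variant, no single token fills the whole tail of the sequence, which is the property the `pad_num` description above relies on.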

Example: Training with 4 GPUs and 7 Rainbow Padding Tokens

1️⃣ Initial Training

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7

2️⃣ Continue Training from a Checkpoint

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7 --resume_dir model/llada_base/sft_5e-05_lora_epoch3_rank32_pad7
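
Because `batch_size` is set per GPU, the total number of examples per optimizer step also depends on the number of processes and on `gradient_accumulation_steps`. The small sketch below shows the usual data-parallel arithmetic; the numbers are placeholders, not the values used in the paper, and the helper name is hypothetical.

```python
def effective_batch_size(
    per_gpu_batch_size: int,
    num_processes: int,
    gradient_accumulation_steps: int,
) -> int:
    """Examples consumed per optimizer step under standard data parallelism."""
    return per_gpu_batch_size * num_processes * gradient_accumulation_steps


# Placeholder values: 4 examples per GPU, 4 GPUs, 8 accumulation steps.
print(effective_batch_size(4, 4, 8))  # 4 * 4 * 8 = 128
```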

4. Evaluation

Our checkpoint is available on Hugging Face: quasar529/rainbow-padding-llada.

Evaluation uses the widely used LM-Eval-Harness library, with an evaluation script adapted from LLaDA's eval script. You can find it in eval/eval_llada_instruct.py.

⚠️ Dependency Notice

To run evaluation, you must install specific versions of datasets and lm-eval due to dependency constraints:

pip install datasets==3.6.0 lm-eval==0.4.9.1

lm-eval==0.4.9.1 requires datasets>=2.16.0,<4.0, so the latest datasets releases (≥4.0.0) are incompatible.
Pin datasets to 3.6.0, which satisfies lm-eval's requirements and ensures stable evaluation. If you skip this step, the evaluation scripts may still run but can break unexpectedly due to mismatched APIs.
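
To confirm the environment before a long evaluation run, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is deliberately simplified (it reads only the leading numeric parts, so pre-release suffixes are ignored).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(v: str) -> Tuple[int, int, int]:
    """Turn '3.6.0' into (3, 6, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def datasets_is_compatible(v: str) -> bool:
    """True if v falls in the range stated above: >=2.16.0 and <4.0."""
    return (2, 16, 0) <= parse_release(v) < (4, 0, 0)


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


for pkg in ("datasets", "lm-eval"):
    print(pkg, installed_version(pkg) or "not installed")

found = installed_version("datasets")
if found is not None and not datasets_is_compatible(found):
    print("datasets", found, "is outside >=2.16.0,<4.0; consider pinning 3.6.0")
```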

Example Command

# Example: humaneval_instruct
# To log to wandb, set wandb_log, wandb_project, and wandb_entity in --model_args.
# (Keep comments outside the command: a comment between backslash-continued lines cuts the command short.)
accelerate launch --num_processes=1 eval/eval_llada_instruct.py \
  --tasks humaneval_instruct \
  --model llada_dist \
  --batch_size 1 \
  --log_samples \
  --output_path "/home/quasar529/rainbow-padding/eval/output" \
  --confirm_run_unsafe_code \
  --model_args model_path='GSAI-ML/LLaDA-8B-Base',steps=1024,gen_length=1024,block_length=1024,lora_path='quasar529/rainbow-padding-llada',device='cuda',wandb_log=True,wandb_project='llada-eval',wandb_entity='your-wandb-entity'
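
The `--model_args` value is a single comma-separated list of `key=value` pairs, which is easy to mistype by hand. The sketch below assembles that string from a dictionary and appends the wandb keys only when requested. The helper `build_model_args` is hypothetical and not part of this repository or of lm-eval; treating the wandb keys as optional is an assumption based on the comment in the example command.

```python
from typing import Dict, Optional


def build_model_args(
    base: Dict[str, object],
    wandb: Optional[Dict[str, str]] = None,
) -> str:
    """Join key=value pairs with commas, optionally adding wandb settings."""
    args = dict(base)
    if wandb:
        args["wandb_log"] = True
        args["wandb_project"] = wandb["project"]
        args["wandb_entity"] = wandb["entity"]
    for key, value in args.items():
        # A comma inside a value would be read as a pair separator.
        if "," in str(value):
            raise ValueError(f"value for {key!r} must not contain a comma")
    return ",".join(f"{key}={value}" for key, value in args.items())


# Values copied from the example command above.
base_args = {
    "model_path": "GSAI-ML/LLaDA-8B-Base",
    "steps": 1024,
    "gen_length": 1024,
    "block_length": 1024,
    "lora_path": "quasar529/rainbow-padding-llada",
    "device": "cuda",
}
print(build_model_args(base_args))
print(build_model_args(base_args, wandb={"project": "llada-eval", "entity": "your-wandb-entity"}))
```

The printed string can be passed as the `--model_args` value; quote it in the shell if any value contains characters the shell would interpret.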

If you want to reproduce all evaluation tasks performed in the paper at once, you can simply run the provided shell script:

sh eval/eval.sh

5. Citation

If you find this work useful, please cite:

@article{kim2025rainbow,
  title={Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs},
  author={Kim, Bumjun and Jeon, Dongjae and Kim, Dueun and Jeung, Wonje and No, Albert},
  journal={arXiv preprint arXiv:2510.03680},
  year={2025}
}

6. Acknowledgements

This code builds upon the open-source implementations of Dream and LLaDA.
We thank the authors for releasing their resources and inspiring this work.