TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

May 13, 2026

[Figure: TAD results]

1. Environment Setup

conda create -n tad python=3.10 -y
conda activate tad

pip install -r requirements.txt

2. Data Preparation

2.1 Generate Math Trajectories

For LLaDA:

python prepare/llada_gen_math_traj.py \
    --model_name <your_llada_model_path> \
    --output_path data/llada_math_traj.jsonl \
    --dataset_name gsm8k \
    --dataset_split train \
    --max_new_tokens 256 \
    --steps 256 \
    --block_length 32 \
    --num_samples 1 \
    --limit 0
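The `--max_new_tokens`, `--steps`, and `--block_length` flags depend on each other. Assuming the script follows the block-wise (semi-autoregressive) schedule of the reference LLaDA sampler, the answer is split into equal blocks that are decoded left to right, and the step budget is shared evenly across those blocks. The sketch below works out what the settings above imply. `llada_schedule` is a hypothetical helper, not part of this repository.

```python
def llada_schedule(max_new_tokens: int, steps: int, block_length: int) -> dict:
    """Hypothetical helper: derive the block-wise decoding schedule.

    Assumes the reference LLaDA sampler's constraints: the generation length
    must split into whole blocks, and the total step budget must split evenly
    across those blocks.
    """
    if max_new_tokens % block_length != 0:
        raise ValueError("max_new_tokens must be a multiple of block_length")
    num_blocks = max_new_tokens // block_length
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    # Average number of masked tokens committed per denoising step within a block.
    tokens_per_step = block_length / steps_per_block
    return {
        "num_blocks": num_blocks,
        "steps_per_block": steps_per_block,
        "tokens_per_step": tokens_per_step,
    }


print(llada_schedule(max_new_tokens=256, steps=256, block_length=32))
# {'num_blocks': 8, 'steps_per_block': 32, 'tokens_per_step': 1.0}
```

With the settings above, each of the 8 blocks is denoised in 32 steps, committing one token per step. Halving `--steps` to 128 would instead give 16 steps per block and two tokens per step.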

For Dream:

python prepare/dream_gen_math_traj.py \
    --model_name <your_dream_model_path> \
    --output_path data/dream_math_traj.jsonl \
    --dataset_name gsm8k \
    --dataset_split train \
    --max_new_tokens 256 \
    --block_length 32 \
    --top_p 0.95 \
    --alg entropy \
    --num_samples 1 \
    --limit 0
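`--alg entropy` selects an entropy-based unmasking order, assuming the flag is passed through to Dream's `alg` option. At each step the model predicts a distribution for every still-masked position, and the positions with the lowest predictive entropy (the most confident ones) are unmasked first. The sketch below shows that selection rule on toy distributions. It is a simplification, not Dream's implementation: it uses greedy argmax where the script samples with top-p (`--top_p 0.95`), and `entropy_unmask` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_unmask(
    tokens: List[Optional[int]],
    probs: Dict[int, List[float]],
    k: int,
) -> List[Optional[int]]:
    """Hypothetical helper: unmask the k most confident masked positions.

    tokens: current sequence; None marks a masked position.
    probs:  position -> predicted distribution over a toy vocabulary.
    Greedy argmax stands in for the top-p sampling the real script uses.
    """
    masked = [i for i, t in enumerate(tokens) if t is None]
    # Lowest entropy first, i.e. the positions the model is most certain about.
    chosen = sorted(masked, key=lambda i: entropy(probs[i]))[:k]
    out = list(tokens)
    for i in chosen:
        out[i] = max(range(len(probs[i])), key=probs[i].__getitem__)
    return out


seq = [7, None, None, None]
preds = {
    1: [0.34, 0.33, 0.33],  # nearly uniform: high entropy
    2: [0.05, 0.90, 0.05],  # peaked: low entropy
    3: [0.10, 0.10, 0.80],
}
print(entropy_unmask(seq, preds, k=2))  # [7, None, 1, 2]
```

Position 1 stays masked because its prediction is the least certain; it would be filled in a later step, once more context is fixed.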

2.2 Generate Code Trajectories

For LLaDA:

python prepare/llada_gen_code_traj.py \
    --model_name <your_llada_model_path> \
    --parquet_path <path_to_kodcode_parquet> \
    --output_path data/llada_code_traj.jsonl \
    --max_new_tokens 256 \
    --steps 256 \
    --block_length 32 \
    --num_samples 1

For Dream:

python prepare/dream_gen_code_traj.py \
    --model_name <your_dream_model_path> \
    --parquet_path <path_to_kodcode_parquet> \
    --output_path data/dream_code_traj.jsonl \
    --max_new_tokens 256 \
    --block_length 32 \
    --top_p 0.95 \
    --alg entropy \
    --num_samples 1
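Each script writes one JSON object per line. The record schema is defined by the scripts and is not documented here, so the sketch below assumes nothing about field names. It only counts records and reports which top-level keys appear, which is a quick way to confirm that a run finished and to compare the LLaDA and Dream outputs. `summarize_jsonl` is a hypothetical helper.

```python
import json
import sys
from collections import Counter
from typing import Tuple


def summarize_jsonl(path: str) -> Tuple[int, Counter]:
    """Hypothetical helper: count records and top-level keys in a JSONL file."""
    n_records = 0
    key_counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, key_counts


if __name__ == "__main__":
    for p in sys.argv[1:]:
        count, keys = summarize_jsonl(p)
        print(f"{p}: {count} records; keys: {dict(keys)}")
```

Saved as a script, it can be pointed at any of the output files, e.g. `data/llada_math_traj.jsonl data/llada_code_traj.jsonl`.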

3. Training

3.1 Configure

Edit the config file to set your model path and training hyperparameters:

LLaDA: LLaDA/configs/config_llada.yaml
Dream: Dream/configs/config_dream.yaml

3.2 Launch Training

Train LLaDA:

cd LLaDA
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 deepspeed train_llada.py

Train Dream:

cd Dream
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 deepspeed train_dream.py
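Both launch commands expose eight GPUs through `CUDA_VISIBLE_DEVICES`. Assuming the training scripts use plain data parallelism, the effective batch per optimizer step follows DeepSpeed's relation `train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. Running on fewer GPUs therefore shrinks the effective batch unless gradient accumulation is raised. The sketch below does that arithmetic with illustrative numbers; they are not this repository's settings, and the corresponding keys in the YAML configs may be named differently.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples per optimizer step under data parallelism (DeepSpeed's train_batch_size)."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


def grad_accum_for(target_batch: int, micro_batch_per_gpu: int, num_gpus: int) -> int:
    """Gradient-accumulation steps needed to reach target_batch exactly on num_gpus."""
    per_step = micro_batch_per_gpu * num_gpus
    if target_batch % per_step != 0:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


# Illustrative numbers only: micro-batch 4, accumulation 2, eight GPUs.
target = global_batch_size(micro_batch_per_gpu=4, grad_accum_steps=2, num_gpus=8)
print(target)  # 64
# Same effective batch on four GPUs (CUDA_VISIBLE_DEVICES=0,1,2,3):
print(grad_accum_for(target, micro_batch_per_gpu=4, num_gpus=4))  # 4
```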

4. Evaluation

4.1 Merge LoRA Weights

After training, merge the LoRA adapter into the base model. Edit the paths in merge_lora.py and run:

For LLaDA:

cd LLaDA
# Edit merge_lora.py:
#   - name = "<your_base_model_path>"
#   - PeftModel.from_pretrained(base_model, "<your_checkpoint_path>")
#   - merged_model.save_pretrained("<save_path>")
python merge_lora.py
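Merging folds each adapter's low-rank update back into the frozen weight it modifies, so the merged model can be loaded and evaluated without PEFT. For a standard LoRA layer with rank `r` and scaling factor `lora_alpha` (PEFT's default scaling), the merged weight is `W + (lora_alpha / r) * B @ A`. The sketch below checks that identity on a toy layer with plain Python lists. It illustrates the arithmetic only and is not the code in `merge_lora.py`, which presumably delegates to PEFT (where this step is `merge_and_unload()`). `merge_lora` here is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Return W + (lora_alpha / r) * B @ A, the merged weight of a LoRA layer.

    Shapes: w is (out, in), b is (out, r), a is (r, in).
    """
    scale = lora_alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


# Toy layer: 2 outputs, 3 inputs, rank 1.
lora_alpha, r = 2.0, 1
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]  # (r=1, in=3)
B = [[0.5], [0.0]]     # (out=2, r=1)
merged = merge_lora(W, A, B, lora_alpha=lora_alpha, r=r)
print(merged)  # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]

# The merged layer reproduces base output plus the adapter branch on an input x.
x = [[1.0], [1.0], [1.0]]  # column vector
base_plus_adapter = [
    [bi[0] + (lora_alpha / r) * di[0]]
    for bi, di in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
assert matmul(merged, x) == base_plus_adapter
```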

4.2 Run Evaluation

Edit the model_path variable in the evaluation script to point to your merged model, then run:

Evaluate LLaDA:

cd eval
# Edit eval_llada.sh: set model_path="<your_merged_model_path>"
bash eval_llada.sh

The evaluation scripts cover the following benchmarks:

Math: GSM8K, MATH (Minerva)
Code: HumanEval, MBPP

Results will be saved under evals_results/.
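To compare runs, it helps to collect the scores from `evals_results/` into one table. The sketch below assumes the evaluation scripts write lm-evaluation-harness-style JSON files, in which a top-level `results` object maps each task name to its metrics. The directory layout, file names, and metric names depend on how `eval_llada.sh` invokes the evaluation and are assumptions here; `collect_results` is a hypothetical helper.

```python
import json
import sys
from pathlib import Path
from typing import Dict


def collect_results(root: str) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: gather numeric metrics from harness-style JSON files.

    Assumes each file has a top-level "results" object of the form
    {task_name: {metric_name: value, ...}}; other JSON files are skipped.
    """
    base = Path(root)
    table: Dict[str, Dict[str, float]] = {}
    if not base.is_dir():
        return table
    for path in sorted(base.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not a results file
        results = data.get("results") if isinstance(data, dict) else None
        if not isinstance(results, dict):
            continue
        for task, metrics in results.items():
            if not isinstance(metrics, dict):
                continue
            # Keep numeric metrics only (the harness also stores e.g. an "alias" string).
            numeric = {
                k: float(v)
                for k, v in metrics.items()
                if isinstance(v, (int, float)) and not isinstance(v, bool)
            }
            # Files are visited in sorted path order; later ones overwrite earlier ones per task.
            table.setdefault(task, {}).update(numeric)
    return table


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "evals_results"
    table = collect_results(root)
    if not table:
        print(f"no harness-style results found under {root}")
    for task, metrics in table.items():
        for name, value in sorted(metrics.items()):
            print(f"{task:<20} {name:<40} {value:.4f}")
```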

Acknowledgments

This project builds upon excellent open-source work:

LLaDA - Large Language Diffusion Models
Dream - Diffusion Large Language Models
Fast-dLLM - Training-free acceleration
D2F - Discrete diffusion forcing
dParallel - Distilled dLLMs
lm-evaluation-harness - Evaluation framework
d3LLM - Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation