ReDi: Rectified Discrete Flow

May 11, 2026

Jaehoon Yoo, Wonjung Kim, Seunghoon Hong

KAIST

[Project Page] | [Paper] | [Checkpoint]

TL;DR

We introduce ReDi, a novel iterative method that reduces factorization error by rectifying the coupling between the source and target distributions.

Overview

Discrete Flow-based Models (DFMs) are powerful generative models for high-quality discrete data, but they typically suffer from slow sampling because they rely on iterative decoding. This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we analyze the factorization approximation error using Conditional Total Correlation (TC) and reveal its dependence on the coupling. To address the challenge of efficient few-step generation, we propose Rectified Discrete Flow (ReDi), a novel iterative method that reduces the underlying factorization error (measured as Conditional TC) by rectifying the coupling between the source and target distributions. We theoretically prove that each ReDi step guarantees a monotonically decreasing Conditional TC, ensuring its convergence. Empirically, ReDi significantly reduces Conditional TC and enables few-step generation. Moreover, we demonstrate that the rectified couplings are well-suited for training efficient one-step models on image generation. ReDi offers a simple and theoretically grounded approach for tackling the few-step challenge, providing a new perspective on efficient discrete data synthesis.
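To make the dependence of Conditional TC on the coupling concrete, here is a minimal NumPy sketch on a toy problem. It is not the repository's `compute_tc.py`; the function names, the two-token target, and the choice to condition on the source state x0 (rather than on an intermediate state) are all illustrative assumptions. Total correlation is computed as the sum of per-token marginal entropies minus the joint entropy, and Conditional TC averages it over the conditioning state.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())


def total_correlation(joint):
    """TC = sum of per-dimension marginal entropies minus the joint entropy."""
    joint = np.asarray(joint, dtype=float)
    dims = range(joint.ndim)
    marginal_entropies = sum(
        entropy(joint.sum(axis=tuple(j for j in dims if j != i))) for i in dims
    )
    return marginal_entropies - entropy(joint)


def conditional_tc(coupling):
    """E_{x0}[ TC(p(x1 | x0)) ] where coupling[x0, a, b] = pi(x0, x1=(a, b))."""
    coupling = np.asarray(coupling, dtype=float)
    total = 0.0
    for joint_with_x0 in coupling:  # one slice per source state x0
        p_x0 = joint_with_x0.sum()
        if p_x0 > 0:
            total += p_x0 * total_correlation(joint_with_x0 / p_x0)
    return total


# Toy target over two binary tokens that always agree: p(0,0) = p(1,1) = 0.5.
target = np.array([[0.5, 0.0], [0.0, 0.5]])

# Independent coupling: four equally likely source states, x1 drawn independently of x0.
independent = np.stack([0.25 * target] * 4)

# Deterministic coupling with the same marginals:
# x0 in {0, 1} -> (0, 0), x0 in {2, 3} -> (1, 1).
deterministic = np.zeros((4, 2, 2))
deterministic[[0, 1], 0, 0] = 0.25
deterministic[[2, 3], 1, 1] = 0.25

print(conditional_tc(independent))    # log 2: a factorized prediction loses the token dependency
print(conditional_tc(deterministic))  # 0: each x0 pins down x1, so factorization is exact
```

Both couplings share the same source and target marginals, yet only the independent one leaves a factorization error. This is the quantity that rectifying the coupling is meant to reduce.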

Project Structure

  ├ image/                                        <- MaskGIT-based ReDi-image code
  |    ├── Metrics/                               <- Evaluation tool
  |    |      ├── inception_metrics.py                  
  |    |      └── sample_and_eval.py
  |    |    
  |    ├── Network/                             
  |    |      ├── Taming/                         <- VQGAN architecture   
  |    |      ├── reweight_mlp.py                 <- Embedding architecture  
  |    |      └── transformer.py                  <- Transformer architecture  
  |    |
  |    ├── Trainer/                               <- Main class for training
  |    |      ├── trainer.py                      <- Abstract trainer     
  |    |      └── vit.py                          <- Trainer of MaskGIT
  |    |
  |    ├── Scripts/                               <- Shell scripts for training/evaluation.
  |    |      ├── create_rectified_dataset.sh     <- Rectify the model 
  |    |      ├── finetune_model.sh               <- Fine-tune the model (from the original MaskGIT)
  |    |      ├── test_model.sh                   <- Test the model
  |    |      └── train_model.sh                  <- Train the model
  |    |
  |    ├── compute_tc.py                          <- Compute the TC
  |    ├── download_models.py                     <- Download the pretrained models
  |    ├── LICENSE.txt                            <- MIT license
  |    ├── requirements.txt                       <- Dependencies for setting up the environment
  |    ├── README.md                              
  |    └── main.py                                <- Main
  |
  ├ text/                                         <- DUO-based ReDi-text code
  |    ├── config/                                <- Config files for datasets/denoising networks/noise schedules/LR schedules.
  |    |      └── config.yaml                     <- Main config file
  |    |
  |    ├── integral/
  |    |    
  |    ├── models/                                <- Denoising network architectures. Supports [DiT](https://arxiv.org/abs/2212.09748) and AR transformer.
  |    |      ├── dit.py                          <- DiT structure
  |    |      ├── ema.py                          <- EMA model
  |    |      └── unit_test_attention.py          <- Attention module
  |    |
  |    ├── scripts/                               <- Shell scripts for training/evaluation.
  |    |      ├── distil_*                        <- Distill the model
  |    |      ├── eval_*                          <- Evaluate the model     
  |    |      ├── gen_ppl_*                       <- Measure the generation perplexity     
  |    |      ├── gen_ppl_tc_*                    <- Measure the generation perplexity and total correlation score
  |    |      ├── rectifi_*                       <- Rectify the model
  |    |      ├── train_*                         <- Train the model 
  |    |      └── zero_shot_*                     
  |    |
  |    ├── algo.py                                <- Main model structures: Algorithms such as DUO, MDLM, AR, SEDD, D3PM, ReDi.
  |    ├── dataloader.py                          <- Dataloader and tokenizer module
  |    ├── LICENSE                                <- Apache License 2.0
  |    ├── main.py                                <- Main
  |    ├── metrics.py                             <- Metrics module
  |    ├── README.md                              
  |    ├── requirements.txt                       <- Dependencies for setting up the environment
  |    ├── trainer_base.py                        <- Boilerplate trainer using PyTorch Lightning.
  |    └── utils.py                               <- LR scheduler, logging, `fsspec` handling.

Usage

To get started, follow the "Usage" section of the README in each of the image and text directories.

Experimental Results

Image

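| Step | Model | FID (↓) | IS (↑) | Prec. (↑) | Rec. (↑) | Den. (↑) | Cov. (↑) |
|------|-------|---------|--------|-----------|----------|----------|----------|
| 1 | MaskGIT | 95.16 | 12 | 0.26 | 0.12 | 0.17 | 0.35 |
| 1 | SDTT | 90.40 | 14 | 0.31 | 0.13 | 0.21 | 0.34 |
| 1 | Di4C | 90.32 | 13 | 0.26 | 0.24 | 0.17 | 0.33 |
| 1 | ReDi¹ | 37.43 | 49 | 0.63 | 0.51 | 0.78 | 0.86 |
| 1 | ReDi² | 21.80 | 90 | 0.74 | 0.52 | 1.05 | 0.93 |
| 1 | ReDi³-distill | 11.68 | 182 | 0.83 | 0.44 | 1.25 | 0.96 |
| 4 | MaskGIT | 10.90 | 184 | 0.83 | 0.46 | 1.18 | 0.96 |
| 4 | SDTT | 8.97 | 205 | 0.88 | 0.41 | 1.43 | 0.97 |
| 4 | Di4C | 6.20 | 216 | 0.87 | 0.52 | 1.33 | 0.98 |
| 4 | ReDi¹ | 7.58 | 228 | 0.87 | 0.46 | 1.33 | 0.98 |
| 4 | ReDi² | 7.86 | 240 | 0.87 | 0.44 | 1.31 | 0.97 |
| 8 | MaskGIT | 6.51 | 227 | 0.89 | 0.48 | 1.38 | 0.98 |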

Text

License

ReDi-image (the Halton-MaskGIT-based ReDi) is licensed under the MIT License, and ReDi-text (the DUO-based ReDi) is licensed under the Apache License 2.0. Accordingly, this project as a whole is licensed under the Apache License 2.0.

Acknowledgments

This repository builds on Halton-MaskGIT and DUO as the structural foundation for the image and text models, respectively, which together demonstrate the ReDi method.

The pretrained ImageNet VQGAN comes from the official Halton-MaskGIT and LlamaGen repositories.

Citation

Cite our paper using:

@misc{yoo2025redirectifieddiscreteflow,
      title={ReDi: Rectified Discrete Flow}, 
      author={Yoo, Jaehoon and Kim, Wonjung and Hong, Seunghoon},
      year={2025},
      eprint={2507.15897},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.15897}, 
}