RELAY: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

March 20, 2026 · View on GitHub

arXiv EACL 2026

This is the official implementation of the EACL 2026 paper: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning.

Abstract

Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers, standard Transformers with a cross-block parameter-sharing architecture, possess remarkable length-generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of CoT reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. We therefore use this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which are then used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model.
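The core idea is to apply one shared block for several iterations and attach a loss to every iteration's output, pairing iteration i with CoT step i. Below is a minimal toy sketch of that training signal in PyTorch; `LoopedModel`, `loop_aligned_loss`, and all shapes are illustrative assumptions, not the paper's actual architecture (the real implementation lives in files like model_align_flash.py and train_mix_align_flash.py).

```python
import torch
import torch.nn as nn


class LoopedModel(nn.Module):
    """Toy looped network: ONE block with shared weights, applied
    repeatedly, plus a readout head used after every iteration.
    (Illustrative stand-in for a looped Transformer.)"""

    def __init__(self, dim: int, vocab: int):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())  # shared across iterations
        self.head = nn.Linear(dim, vocab)

    def forward(self, x: torch.Tensor, n_loops: int) -> list[torch.Tensor]:
        logits_per_step = []
        for _ in range(n_loops):
            x = self.block(x)                 # same parameters every iteration
            logits_per_step.append(self.head(x))
        return logits_per_step                # one prediction per loop iteration


def loop_aligned_loss(model: LoopedModel, x: torch.Tensor,
                      step_targets: list[torch.Tensor]) -> torch.Tensor:
    """Iteration-wise supervision: the output of loop iteration i is
    trained against CoT step i, not just the final answer."""
    criterion = nn.CrossEntropyLoss()
    logits_per_step = model(x, n_loops=len(step_targets))
    losses = [criterion(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
              for logits, tgt in zip(logits_per_step, step_targets)]
    return torch.stack(losses).mean()
```

Because every iteration reuses the same parameters, the trained model can be unrolled for more iterations than seen in training, which is what lets it emit reasoning chains for longer problems.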

Repository Structure

The codebase is organized as follows:

  • arithmetic/, ED/, LIS/: Task-specific data generation and logic for Arithmetic, Edit Distance, and Longest Increasing Subsequence.
  • general/: General utilities and shared functions.
  • scripts/: Contains shell scripts for data generation, training, and evaluation.
    • gen_data/: Scripts to generate task data (e.g., get_ari_data_left1st.sh).
    • train/: Scripts to train models (e.g., train_mix_loop_align_cot.sh).
    • eval/: Scripts to evaluate models (e.g., test_mix_loop_align_cot_rope.sh).
  • model_*.py: Core model implementations (e.g., model_align_flash.py, model_flash.py).
  • train_*.py: Training loops and setups (e.g., train_mix_align_flash.py).
  • infer_*.py & test_mix.py: Inference and evaluation scripts.

How to Start

  1. Clone the repository:

    git clone https://github.com/qifanyu/RELAY.git
    cd RELAY
    
  2. Environment Setup: We provide a setup.sh script to install the required dependencies (requires Python 3).

    bash setup.sh
    

    This will install necessary packages like torch and xformers.

  3. Generate Data: Run the scripts in scripts/gen_data/ to generate the required datasets for different tasks.

    bash scripts/gen_data/get_ari_data_left1st.sh
    bash scripts/gen_data/get_ed_data_under.sh
    bash scripts/gen_data/get_lis_data_align10.sh
    
  4. Training: To train the RELAY model, use the training scripts provided in scripts/train/. For example, to train the loop-aligned model:

    bash scripts/train/train_mix_loop_align_cot.sh
    

    Other baseline training scripts like train_mix_cot.sh and train_mix_loop.sh are also available.

  5. Evaluation: To evaluate the trained models, use the scripts in scripts/eval/. For example:

    bash scripts/eval/test_mix_loop_align_cot_rope.sh
    

Citation

If you find this code or our paper useful, please cite our work:

@misc{yu2025enhancingautoregressivechainofthoughtloopaligned, 
      title={Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning}, 
      author={Qifan Yu and Zhenyu He and Sijie Li and Xun Zhou and Jun Zhang and Jingjing Xu and Di He}, 
      year={2025}, 
      eprint={2502.08482}, 
      archivePrefix={arXiv}, 
      primaryClass={cs.CL}, 
      url={https://arxiv.org/abs/2502.08482}
}