RELAY: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
March 20, 2026
This is the official implementation of the EACL 2026 paper: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning.
Abstract
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers, standard Transformers with parameters shared across blocks, possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of Chain-of-Thought (CoT) reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. We therefore use this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which are then used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model.
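To make the loop-alignment idea concrete, below is a minimal, hypothetical PyTorch sketch of iteration-wise supervision: a single shared Transformer block is applied for several loop iterations, and each iteration's hidden state is decoded and supervised against the corresponding CoT step. All class names, dimensions, and the loss formulation here are illustrative assumptions, not taken from the RELAY codebase.

```python
import torch
import torch.nn as nn

class TinyLoopedTransformer(nn.Module):
    """Illustrative looped Transformer: one block, reused every iteration."""

    def __init__(self, vocab_size=32, d_model=64, n_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single block whose parameters are shared across all loop iterations.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.head = nn.Linear(d_model, vocab_size)
        self.n_loops = n_loops

    def forward(self, tokens):
        h = self.embed(tokens)
        logits_per_loop = []
        for _ in range(self.n_loops):
            h = self.block(h)                    # same weights every iteration
            logits_per_loop.append(self.head(h)) # decode each intermediate state
        return logits_per_loop                   # one prediction per reasoning step

def loop_aligned_loss(logits_per_loop, step_targets):
    """Average cross-entropy over loops, aligning loop i with CoT step i.

    step_targets: (n_loops, batch, seq_len) token ids, one row per CoT step.
    (A hypothetical alignment scheme for illustration.)
    """
    ce = nn.CrossEntropyLoss()
    losses = [
        ce(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
        for logits, tgt in zip(logits_per_loop, step_targets)
    ]
    return torch.stack(losses).mean()

model = TinyLoopedTransformer()
tokens = torch.randint(0, 32, (2, 8))        # (batch, seq_len)
targets = torch.randint(0, 32, (4, 2, 8))    # one target sequence per loop
loss = loop_aligned_loss(model(tokens), targets)
loss.backward()                              # supervises all iterations jointly
```

Because every iteration receives its own loss term, the shared block is pushed to compute one recognizable reasoning step per pass, which is what lets the trained model emit step-by-step chains at inference time.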
Repository Structure
The codebase is organized as follows:
- `arithmetic/`, `ED/`, `LIS/`: Task-specific data generation and logic for Arithmetic, Edit Distance, and Longest Increasing Subsequence.
- `general/`: General utilities and shared functions.
- `scripts/`: Shell scripts for data generation, training, and evaluation.
  - `gen_data/`: Scripts to generate task data (e.g., `get_ari_data_left1st.sh`).
  - `train/`: Scripts to train models (e.g., `train_mix_loop_align_cot.sh`).
  - `eval/`: Scripts to evaluate models (e.g., `test_mix_loop_align_cot_rope.sh`).
- `model_*.py`: Core model implementations (e.g., `model_align_flash.py`, `model_flash.py`).
- `train_*.py`: Training loops and setups (e.g., `train_mix_align_flash.py`).
- `infer_*.py` & `test_mix.py`: Inference and evaluation scripts.
How to Start
1. Clone the repository:

   ```bash
   git clone https://github.com/qifanyu/RELAY.git
   cd RELAY
   ```

2. Environment Setup: We provide a `setup.sh` script to install the required dependencies (requires Python 3):

   ```bash
   bash setup.sh
   ```

   This will install the necessary packages, such as `torch` and `xformers`.

3. Generate Data: Run the scripts in `scripts/gen_data/` to generate the required datasets for the different tasks:

   ```bash
   bash scripts/gen_data/get_ari_data_left1st.sh
   bash scripts/gen_data/get_ed_data_under.sh
   bash scripts/gen_data/get_lis_data_align10.sh
   ```

4. Training: To train the RELAY model, use the training scripts provided in `scripts/train/`. For example, to train the loop-aligned model:

   ```bash
   bash scripts/train/train_mix_loop_align_cot.sh
   ```

   Baseline training scripts such as `train_mix_cot.sh` and `train_mix_loop.sh` are also available.

5. Evaluation: To evaluate the trained models, use the scripts in `scripts/eval/`. For example:

   ```bash
   bash scripts/eval/test_mix_loop_align_cot_rope.sh
   ```
Citation
If you find this code or our paper useful, please cite our work:
```bibtex
@misc{yu2025enhancingautoregressivechainofthoughtloopaligned,
      title={Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning},
      author={Qifan Yu and Zhenyu He and Sijie Li and Xun Zhou and Jun Zhang and Jingjing Xu and Di He},
      year={2025},
      eprint={2502.08482},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08482}
}
```