
# CoT-Valve: Length-Compressible Chain-of-Thought Tuning


After length-compressible CoT tuning, the reasoning model can generate reasoning paths ranging from long to short, using LoRA as a "valve".

Xinyin Ma*, Guangnian Wan*, Runpeng Yu, Gongfan Fang, Xinchao Wang
Learning and Vision Lab, National University of Singapore
🥯[arXiv] 🎄[Dataset] 🤖[Models] (coming soon)
\* Equal Contribution

## Introduction

We propose a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths.

- We identify a direction in the parameter space that, when manipulated, effectively controls the length of the generated CoT (see the sketch after this list).
- We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach.
- CoT-Valve enables controllable and compressible reasoning chains and outperforms prompt-based length control.
- Applied to QwQ-32B-Preview, CoT-Valve reduces reasoning chains on GSM8K from 741 to 225 tokens with a minor performance drop (95.07% to 94.92%), and on AIME from 6827 to 4629 tokens with only one additional incorrect answer.
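The "valve" above is, conceptually, a single direction Δθ in weight space (realized as a LoRA update) whose scaling factor α trades chain length against the base model's behavior. Below is a minimal PyTorch sketch of that interpolation, assuming the LoRA update has already been merged into a state-dict-shaped `delta_state`; the names are illustrative, not the released API:

```python
import torch

def apply_valve(base_state: dict, delta_state: dict, alpha: float) -> dict:
    """Move along the length-controlling direction in parameter space:
    theta(alpha) = theta_base + alpha * delta_theta.

    `delta_state` stands in for the merged LoRA update; rescaling it
    interpolates between longer and shorter reasoning chains.
    """
    return {
        name: weight + alpha * delta_state.get(name, torch.zeros_like(weight))
        for name, weight in base_state.items()
    }

# Hypothetical usage: sweep the valve from "off" (base model) to full
# strength, generating once per setting and measuring CoT length.
# base = {k: v.clone() for k, v in model.state_dict().items()}
# for alpha in (0.0, 0.4, 0.8, 1.0):
#     model.load_state_dict(apply_valve(base, lora_delta, alpha))
```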

## TODO

- Release the dataset
- Release the model
- Release the training code

## 🤗 Datasets

We release the following datasets on Hugging Face:

| Dataset Name | Link | Description |
| --- | --- | --- |
| MixChain-Z-GSM8K | Link | 6,863 samples; each sample contains five different solutions. |
| MixChain-Z-PRM12K | Link | 12,000 samples (unfiltered); each sample contains five different solutions. |
| MixChain-C-LIMO | Link | Two distinct solutions for each question from the LIMO dataset; the solution sets differ in the number of samples and in average CoT length. |
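Once the links above are live, loading should follow the standard 🤗 Datasets pattern. A short sketch, with the repo ID as a placeholder rather than a confirmed path:

```python
from datasets import load_dataset

# "<org>/MixChain-Z-GSM8K" is a placeholder -- substitute the actual
# Hugging Face repo ID from the table above.
ds = load_dataset("<org>/MixChain-Z-GSM8K", split="train")

# Per the description above, each sample should carry a question plus
# five solutions of different CoT lengths.
print(ds[0].keys())
```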

## Training Code

To be released

## Models

To be released