May 21, 2025
Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought
| [NeurIPS (Oral)] | [ArXiv-RBF] | [ArXiv-RBF++] | [🤗 HuggingFace] |
Any contributions via PRs, issues, emails, or other methods are greatly appreciated.
🔥 News
- We have updated our work to RBF++ and introduced BigGSM++ to quantify the reasoning boundary in multimodal and long chain-of-thought reasoning scenarios (data is available on Google Drive).
- Our work has been accepted by NeurIPS 2024 (Oral).
- 🔥 We have released the benchmark on [🤗 HuggingFace].
- 🔥 The paper is also available on [ArXiv].
💡 Motivation
Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs) on complex reasoning tasks. Recently, a series of studies have attempted to explain the mechanisms underlying CoT, aiming to deepen understanding of it and enhance its efficacy. Nevertheless, existing research faces two major challenges:
- (1) A lack of quantitative metrics to assess CoT capabilities.
- (2) A dearth of guidance on optimizing CoT performance.
Motivated by this, we introduce a novel reasoning granularity (RG) framework to address these challenges. To address the lack of quantification, we first define an RG to quantify the upper bound of CoT and establish a combination law for RGs, enabling a practical quantitative approach applicable to various real-world CoT tasks. To address the lack of optimization guidance, we propose three categories of RGs. We further optimize these categories with combination laws focused on RG promotion and reasoning path optimization for CoT improvement. Through extensive experiments on 25 models and 4 tasks, we validate the existence and rationality of the proposed framework. Furthermore, the framework explains the effectiveness of 10 CoT strategies and guides optimization from two perspectives.
We hope this work can provide a comprehensive understanding of the boundaries and optimization strategies for reasoning in LLMs.
🎯 Installation
1. Dataset Preparation
Load the dataset from Hugging Face:
import datasets
dataset = datasets.load_dataset("LightChen2333/BigGSM")
2. Install from git
Our code requires Python >= 3.10.
git clone https://github.com/LightChen233/reasoning-boundary.git && cd reasoning-boundary/
pip install -r requirements.txt
3. Evaluation for reproduction
python evaluate.py --data_split CoT
where --data_split is one of [CoT, Tool-Usage, PoT, Complex-CoT, LtM, MARP, PoT-MARP, gpt-4o, gpt-4o-MARP, o1-preview].
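To reproduce every split in one go, the command above can be wrapped in a small driver. The following is only a sketch: it assumes it is run from the repository root, that `evaluate.py` accepts `--data_split` exactly as shown above, and the helper names `build_command` and `run_all` are hypothetical (they are not part of this repository).

```python
import subprocess
import sys

# Split names copied from the list above.
SPLITS = [
    "CoT", "Tool-Usage", "PoT", "Complex-CoT", "LtM",
    "MARP", "PoT-MARP", "gpt-4o", "gpt-4o-MARP", "o1-preview",
]


def build_command(split: str) -> list[str]:
    """Build the evaluation command for one split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown data split: {split!r}")
    # Assumption: evaluate.py sits in the current working directory.
    return [sys.executable, "evaluate.py", "--data_split", split]


def run_all(dry_run: bool = True) -> list[list[str]]:
    """Print every command, or execute them one by one when dry_run is False."""
    commands = [build_command(split) for split in SPLITS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing evaluation.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` actually launches the evaluations.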
4. Evaluation for your results
python evaluate.py --data_split custom \
--K 0.301 \
--K2 0.92 \
--mode nl \
--result_path [PREDICTION_PATH]
PREDICTION_PATH points to the model's predictions, saved in JSONL format. Each line of the file must follow this format:
{
  "index": "str",
  "pred": [
    {
      "role": "user",
      "content": [{"type": "text", "text": "str"}]
    },
    {
      "role": "assistant",
      "content": [{"type": "text", "text": "str"}]
    }
  ],
  "origin": {
    "index": "str",
    "question": "str",
    "answer": "str"
  }
}
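A prediction file in this format could be produced along the following lines. This is a minimal sketch, not the repository's own tooling: `make_record` and `to_jsonl` are hypothetical helper names, and using the question as the user message text is an assumption (the format above only requires a string there).

```python
import json


def make_record(index, question, answer, model_output):
    """Build one prediction record with the keys listed above (hypothetical helper)."""
    return {
        "index": str(index),
        "pred": [
            # Assumption: the user turn carries the question text.
            {"role": "user", "content": [{"type": "text", "text": question}]},
            {"role": "assistant", "content": [{"type": "text", "text": model_output}]},
        ],
        "origin": {
            "index": str(index),
            "question": question,
            "answer": str(answer),
        },
    }


def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    lines = []
    for record in records:
        missing = {"index", "pred", "origin"} - record.keys()
        if missing:
            raise ValueError(f"record is missing keys: {sorted(missing)}")
        # json.dumps never emits trailing commas or raw newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file, for example `open("predictions.jsonl", "w", encoding="utf-8").write(to_jsonl(records))`, gives a file that can be passed as `--result_path`; the file name here is only an example.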
🖨️ File Structure
root
├── data                    # Data folder where the BigGSM dataset is loaded
├── experiment              # All experimental data
│   ├── RBF                 # Experimental results for RBF
│   └── RBF++               # Experimental results for RBF++
├── utils                   # Utility library folder
│   ├── data.py             # Dataset loading class
│   ├── request_tool.py     # API request tool
│   └── tools.py            # Commonly used tools
├── draw_bound_*.py         # Scripts for drawing reasoning boundaries
└── evaluate_*.py           # Evaluation scripts
✒️ Reference
If you find this project useful for your research, please kindly consider citing the following paper:
@inproceedings{chen-etal-2024-rg,
    title = "Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought",
    author = "Chen, Qiguang and
      Qin, Libo and
      Wang, Jiaqi and
      Zhou, Jinxuan and
      Che, Wanxiang",
    booktitle = "Proc. of NeurIPS",
    year = "2024",
}
📲 Contact
Please create GitHub issues here or email Qiguang Chen if you have any questions or suggestions.