May 21, 2025

Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought

| [NeurIPS (Oral)] | [ArXiv-RBF] | [ArXiv-RBF++] | [🤗HuggingFace] |

🌟 Any contributions via PRs, issues, emails or other methods are greatly appreciated.

🔥 News

  • 🎖️ We have extended our work to RBF++ and introduced BigGSM++ to quantify the reasoning boundary in multimodal and long chain-of-thought reasoning scenarios (data is available on Google Drive).
  • 🎖️ Our work has been accepted to NeurIPS 2024 (Oral).
  • 🔥 We have released the benchmark on [🤗HuggingFace].
  • 🔥 The paper is also available on [ArXiv].

💡 Motivation

Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs) on complex reasoning tasks. Recently, a series of studies have attempted to explain the mechanisms underlying CoT, aiming to deepen our understanding of it and enhance its efficacy. Nevertheless, existing research faces two major challenges:

  • (1) A lack of quantitative metrics to assess CoT capabilities.
  • (2) A dearth of guidance on optimizing CoT performance.

Motivated by these challenges, we introduce a novel reasoning granularity (RG) methodological framework. To address the lack of quantification, we first define RG to quantify the upper bound of CoT and establish a combination law for RG, enabling a practical quantitative approach applicable to various real-world CoT tasks. To address the lack of optimization guidance, we propose three categories of RGs and further optimize them with combination laws focused on RG promotion and reasoning path optimization for CoT improvement. Through extensive experiments on 25 models and 4 tasks, we validate the existence and rationality of the proposed framework. Furthermore, the framework explains the effectiveness of 10 CoT strategies and guides optimization from two perspectives.

We hope this work can provide a comprehensive understanding of the boundaries and optimization strategies for reasoning in LLMs.

🎯 Installation

1. Dataset Preparation

Load the dataset from Hugging Face:

import datasets
dataset = datasets.load_dataset("LightChen2333/BigGSM")

2. Install from git

Our code requires Python >= 3.10.

git clone https://github.com/LightChen233/reasoning-boundary.git && cd reasoning-boundary/
pip install -r requirements.txt

3. Evaluation for reproduction

python evaluate.py --data_split CoT

where --data_split can be any of [CoT, Tool-Usage, PoT, Complex-CoT, LtM, MARP, PoT-MARP, gpt-4o, gpt-4o-MARP, o1-preview].
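To reproduce every setting in one pass, the splits listed above can be looped over. The following is a minimal sketch, assuming it is saved in the repository root next to evaluate.py (for example as a hypothetical run_reproduction.py); build_command and run_all are hypothetical helpers, not part of the repository. It only prints the commands unless --run is passed.

```python
import shlex
import subprocess
import sys

# Values copied from the --data_split list above.
REPRODUCTION_SPLITS = [
    "CoT", "Tool-Usage", "PoT", "Complex-CoT", "LtM",
    "MARP", "PoT-MARP", "gpt-4o", "gpt-4o-MARP", "o1-preview",
]


def build_command(split):
    """Return the argv for one reproduction run of evaluate.py (hypothetical helper)."""
    if split not in REPRODUCTION_SPLITS:
        raise ValueError(f"unknown data split: {split!r}")
    # sys.executable reuses the interpreter (and environment) running this script.
    return [sys.executable, "evaluate.py", "--data_split", split]


def run_all(dry_run=True):
    """Print the command for every split; execute them only when dry_run is False."""
    commands = [build_command(split) for split in REPRODUCTION_SPLITS]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing split.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default; pass --run to actually launch the evaluations.
    run_all(dry_run="--run" not in sys.argv[1:])
```

Keeping the dry run as the default lets you check the ten commands before spending time or API budget on them.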

4. Evaluation of your own results

python evaluate.py --data_split custom \
                   --K 0.301 \
                   --K2 0.92 \
                   --mode nl \
                   --result_path [PREDICTION_PATH]

PREDICTION_PATH points to the model's predictions, saved in JSONL format. Each line of the file must follow this format:

{
    "index": "str",
    "pred": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "str"}]
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "str"}]
        }
    ],
    "origin": {
        "index": "str",
        "question": "str",
        "answer": "str"
    }
}
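Because the evaluator reads this file line by line, it can help to build and check each line programmatically before running evaluate.py. The sketch below uses only the standard library; make_record, validate_record, to_jsonl and write_predictions are hypothetical helpers, not part of the repository. It assumes the user turn holds the prompt sent to the model (defaulting to the raw question) and that the top-level index equals origin's index; adjust both if your pipeline differs.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Optional

ORIGIN_KEYS = ("index", "question", "answer")


def _text_turn(role: str, text: str) -> dict:
    """One chat turn whose content is a single text part."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def make_record(index: str, question: str, answer: str, model_output: str,
                prompt: Optional[str] = None) -> dict:
    """Build one prediction line (hypothetical helper).

    Assumption: the user turn holds the prompt sent to the model (the raw
    question when prompt is None) and both index fields share one value.
    """
    return {
        "index": index,
        "pred": [
            _text_turn("user", question if prompt is None else prompt),
            _text_turn("assistant", model_output),
        ],
        "origin": {"index": index, "question": question, "answer": answer},
    }


def validate_record(rec: dict) -> None:
    """Raise ValueError if rec does not match the documented line format."""
    if not isinstance(rec.get("index"), str):
        raise ValueError("'index' must be a string")
    pred = rec.get("pred")
    if not isinstance(pred, list) or not pred:
        raise ValueError("'pred' must be a non-empty list of turns")
    for turn in pred:
        if not isinstance(turn, dict) or turn.get("role") not in ("user", "assistant"):
            raise ValueError("each turn needs role 'user' or 'assistant'")
        parts = turn.get("content")
        if not isinstance(parts, list) or not all(
            isinstance(p, dict) and p.get("type") == "text" and isinstance(p.get("text"), str)
            for p in parts
        ):
            raise ValueError("each turn's content must be a list of text parts")
    origin = rec.get("origin")
    if not isinstance(origin, dict) or not all(
        isinstance(origin.get(k), str) for k in ORIGIN_KEYS
    ):
        raise ValueError("'origin' must hold string 'index', 'question' and 'answer'")


def to_jsonl(records: Iterable[dict]) -> str:
    """Validate each record and serialize it as one JSON object per line."""
    lines = []
    for rec in records:
        validate_record(rec)
        lines.append(json.dumps(rec, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


def write_predictions(records: Iterable[dict], path: str | Path) -> None:
    """Write the JSONL file to pass to evaluate.py via --result_path."""
    Path(path).write_text(to_jsonl(records), encoding="utf-8")
```

For example, write_predictions(records, "my_preds.jsonl") produces a file you can then pass as --result_path my_preds.jsonl, keeping the other flags from step 4. Validating while serializing means a malformed line fails early instead of partway through an evaluation run.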

🖨️ File Structure

root
├── data                            # Data folder where the BigGSM dataset is loaded
├── experiment                      # All experimental data
│   ├── RBF                         # Experimental results for RBF
│   └── RBF++                       # Experimental results for RBF++
├── utils                           # Tool library folder
│   ├── data.py                     # Dataset loading class
│   ├── request_tool.py             # API request tool
│   └── tools.py                    # Commonly used tools
├── draw_bound_*.py                 # Scripts for drawing reasoning boundaries
└── evaluate_*.py                   # Evaluation scripts

✒️ Reference

If you find this project useful for your research, please kindly consider citing the following paper:

@inproceedings{chen-etal-2024-rg,
    title = "Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought",
    author = "Chen, Qiguang and
      Qin, Libo and
      Wang, Jiaqi and
      Zhou, Jinxuan and
      Che, Wanxiang",
    booktitle = "Proc. of NeurIPS",
    year = "2024",
}

📲 Contact

Please create GitHub issues here or email Qiguang Chen if you have any questions or suggestions.