Chain of Unconscious Thought (CoUT)

May 27, 2025 ยท View on GitHub

Overview

Large Reasoning Models (LRMs) achieve promising performance but compromise token efficiency due to verbose reasoning processes. Unconscious Thought Theory (UTT) posits that complex problems can be solved more efficiently through internalized cognitive processes.

Inspired by UTT, we propose a new reasoning paradigm, termed Chain of Unconscious Thought (CoUT), to improve the token efficiency of LRMs by guiding them to mimic human unconscious thought and internalize reasoning processes.

Concretely, we first prompt the model to internalize the reasoning by thinking in the hidden layer. Then, we design a bag of token-efficient strategies to further help models reduce unnecessary tokens yet preserve the performance. Our work reveals that models may possess beneficial unconscious thought, enabling improved efficiency without sacrificing performance.

Extensive experiments demonstrate the effectiveness of CoUT. Remarkably, it surpasses CoT by reducing token usage by 47.62% while maintaining comparable accuracy.

๐Ÿš€ Key Features

  • Token Efficiency: Reduces token usage by up to 47.62% compared to Chain-of-Thought (CoT)
  • Maintained Performance: Preserves accuracy while significantly improving efficiency
  • Multiple Datasets: Supports evaluation on 4 mathematical reasoning datasets
  • Flexible Baselines: Easy modification of baseline prompts for comparative studies
  • Zero-Shot Approach: Implements zero-shot methodology for practical real-world scenarios

๐Ÿ“Š Supported Datasets

Our framework supports evaluation on four mathematical reasoning datasets:

  1. GSM8K: Grade School Math 8K - Elementary mathematical word problems
  2. AQUA: Algebraic Question Answering - Multiple choice algebraic reasoning
  3. SVAMP: Simple Variations on Arithmetic Math word Problems
  4. MathQA: Mathematical reasoning with multiple choice questions

๐Ÿ› ๏ธ Installation

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key (for GPT models)
  • Anthropic API key (for Claude models)

Environment Setup

  1. Clone the repository:
git clone https://github.com/Rohan-GRH/CoUT.git
cd CoUT
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up API keys:
# For OpenAI models
export OPENAI_API_KEY="your-openai-api-key"

# For Anthropic models  
export ANTHROPIC_API_KEY="your-anthropic-api-key"

๐Ÿ“‹ Requirements

The project requires the following Python packages:

  • anthropic==0.42.0 - Anthropic Claude API client
  • datasets==3.2.0 - Hugging Face datasets library
  • names-dataset==3.1.0 - Name datasets for evaluation
  • openai==1.59.4 - OpenAI API client
  • PyYAML==6.0.2 - YAML configuration file parsing
  • tqdm==4.67.1 - Progress bars for long-running processes

๐ŸŽฏ Usage

Basic Evaluation

Run evaluation on a specific dataset with CoUT method:

python evaluate.py --task gsm8k --model gpt-4o --prompt CoUT --shot 0

Available Options

  • Tasks: gsm8k, aqua, svamp, mathqa
  • Models: gpt-4o, sonnet, o3-mini, qwen-qwq-32b
  • Prompting Methods: baseline, cot, cod, CoUT, tale_ep
  • Shot: Always use 0 (zero-shot approach)

Example Commands


python evaluate.py --task gsm8k --model gpt-4o --prompt CoUT --shot 0


python evaluate.py --task gsm8k --model sonnet --prompt cot --shot 0 --max_samples 100 --api-key

python evaluate.py --task mathqa --model o3-mini --prompt tale_ep --shot 0 --max_samples max --url https://api.openai.com/v1/ --api-key



## ๐Ÿ”ง Customizing Prompts

You can easily modify baseline prompts by editing the configuration files in the `configs/` directory:

### Configuration Structure

Each dataset has separate configuration files for different methods:
- `{dataset}_baseline.yaml` - Basic prompting
- `{dataset}_cot.yaml` - Chain-of-Thought prompting  
- `{dataset}_cod.yaml` - Chain-of-Draft prompting
- `{dataset}_CoUT.yaml` - Our Chain of Unconscious Thought method
- `{dataset}_tale_ep.yaml` - TALE-EP method

### Example Configuration

```yaml
# configs/gsm8k_CoUT.yaml
system_prompt: |
  TOKEN CONSERVATION MODE ACTIVE. Use symbols/abbreviations when clear (e.g., &, w/, =, โ†’). 
  Omit articles (a, an, the) when meaning remains clear. Strip all non-essential words 
  including greetings, acknowledgments, and explanations. Each saved token equals +1 
  efficiency point while each accuracy error costs -100 efficiency points. Focus 
  exclusively on maximum precision with minimum verbosity.
  Return the answer at the end of the response after a separator ####.

format: |
  Q: {question}
  A: {answer}

fewshot: 
  - question: "Sample question"
    answer: "Sample answer"

๐Ÿ” TALE-EP Method Special Notes

Token Usage Characteristics of TALE-EP Method:

The TALE-EP method employs a two-stage query strategy:

  1. First Query: Ask the large language model how many tokens might be needed to solve the problem
  2. Second Query: Include the estimated token count in the prompt and query the model again for actual problem solving

Due to this dual-query mechanism, the TALE-EP method consumes two times the tokens in total.

Calculating Average Tokens for Second Query Only

If you want to know the average token usage for the second query only in the TALE-EP method, use the add_avg_tokens.py tool:

python add_avg_tokens.py <path_to_generated_json_file>

Example:

python add_avg_tokens.py ./results/20241201/gsm8k/20241201_143022_gsm8k-gpt-4o-tale_ep_detailed.json

This tool will automatically add an avg_second_query_tokens field to the results JSON file, showing the average token consumption for the second query.

๐Ÿ“ˆ Results Analysis

The framework automatically generates comprehensive results:

Output Files

Results are saved in ./results/{date}/{task}/ directory:

  • {timestamp}_{task}-{model}-{prompt}.csv - Summary statistics
  • {timestamp}_{task}-{model}-{prompt}_detailed.json - Detailed results

Metrics Reported

  • Accuracy: Percentage of correct answers
  • Average Token Usage: Mean tokens per query
  • Latency Statistics: P90, P95, P99 latency percentiles
  • Detailed Per-Sample Results: Individual question analysis

๐ŸŽฏ Zero-Shot Methodology

We adopt a zero-shot approach (--shot 0) for all evaluations because:

  1. Real-world Practicality: Few-shot examples are rarely available in practical applications
  2. Fair Comparison: Eliminates bias from example selection
  3. Efficiency Focus: Aligns with our goal of reducing token usage
  4. Generalization: Tests true model reasoning capabilities

๐Ÿ“ Project Structure

CoUT/
โ”œโ”€โ”€ configs/                 # Configuration files for different methods
โ”‚   โ”œโ”€โ”€ {dataset}_{method}.yaml
โ”œโ”€โ”€ tasks/                   # Dataset-specific task implementations
โ”‚   โ”œโ”€โ”€ base.py             # Base task class
โ”‚   โ”œโ”€โ”€ gsm8k.py            # GSM8K dataset handler
โ”‚   โ”œโ”€โ”€ aqua.py             # AQUA dataset handler
โ”‚   โ”œโ”€โ”€ svamp.py            # SVAMP dataset handler
โ”‚   โ””โ”€โ”€ mathqa.py           # MathQA dataset handler
โ”œโ”€โ”€ results/                # Generated result files
โ”œโ”€โ”€ evaluate.py             # Main evaluation script
โ”œโ”€โ”€ llm_client.py           # LLM API client
โ”œโ”€โ”€ utils.py                # Utility functions
โ””โ”€โ”€ requirements.txt        # Python dependencies

๐Ÿ“š Citation

If you find this work useful for your research, please consider citing our paper:

@misc{gong2025efficientreasoningchainunconscious, 
       title={Efficient Reasoning via Chain of Unconscious Thought},  
       author={Ruihan Gong and Yue Liu and Wenjie Qu and Mingzhe Du and Yufei He and Yingwei Ma and Yulin Chen and Xiang Liu and Yi Wen and Xinfeng Li and Ruidong Wang and Xinzhong Zhu and Bryan Hooi and Jiaheng Zhang}, 
       year={2025}, 
       eprint={2505.19756}, 
       archivePrefix={arXiv}, 
       primaryClass={cs.CL}, 
       url={https://arxiv.org/abs/2505.19756}
}

๐Ÿค Contributing

We welcome contributions! Please feel free to:

  1. Report bugs or issues
  2. Suggest new features or improvements
  3. Submit pull requests with enhancements
  4. Add support for new datasets or models

๐Ÿ™ Acknowledgments

  • Inspired by Unconscious Thought Theory (UTT) from cognitive psychology
  • Built upon the Chain-of-Thought reasoning paradigm
  • Thanks to the open-source community for the foundational tools and datasets

Note: This framework is designed for research purposes and requires appropriate API keys for the language models. Please ensure you comply with the respective API usage terms and conditions.