Efficient Reasoning with Balanced Thinking
May 4, 2026
1 Harbin Institute of Technology (Shenzhen) 2 Huawei Noah's Ark Lab 3 Tsinghua University
4 The Chinese University of Hong Kong 5 Zhongguancun Academy 6 Shenzhen Loop Area Institute
†Equal Contribution *Corresponding Author
🏆 Why ReBalance
- Balanced Thinking Unlocks Smarter Reasoning. Given a question asking for which real values of x an inequality holds, the model first derives the solution intervals and then verifies whether a boundary value is included. However, the model redundantly checks irrelevant values even after correctly validating the boundary, causing overthinking. Current mitigation methods overly suppress necessary reflection, leading to underthinking. Our ReBalance dynamically controls the reasoning state, effectively balancing these two extremes.
- Superior Performance. ReBalance outperforms previous state-of-the-art methods across multiple mathematical reasoning datasets and model scales (0.5B–32B), reducing reasoning length while simultaneously improving accuracy.
🎯 Motivation
- Effects of overthinking mitigation on reasoning modes. We compare the distributions of reasoning lengths for correct and incorrect predictions before and after applying overthinking mitigation methods. The reduction in reasoning lengths for correct and incorrect predictions indicates the degree to which overthinking is mitigated and underthinking is introduced, respectively. Existing methods introduce substantial underthinking, whereas our ReBalance achieves a balanced reduction of both.
- Correlation between confidence and reasoning modes. We observe that the overthinking samples exhibit higher confidence variance compared to normal samples, while underthinking samples show persistently high confidence levels.
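These observations suggest a simple heuristic for telling the modes apart from per-step confidence statistics. The sketch below is purely illustrative; the thresholds and the exact definition of the confidence signal are made-up assumptions, not the paper's criteria:

```python
import numpy as np

def classify_reasoning_mode(step_confidences, var_hi=0.02, mean_hi=0.9):
    # Heuristic: overthinking shows high confidence variance across steps;
    # underthinking stays persistently confident. Thresholds are made up.
    conf = np.asarray(step_confidences, dtype=float)
    if conf.var() > var_hi:
        return "overthinking"   # confidence oscillates across steps
    if conf.mean() > mean_hi:
        return "underthinking"  # persistently high confidence
    return "normal"

print(classify_reasoning_mode([0.95, 0.4, 0.9, 0.3, 0.85]))  # overthinking
print(classify_reasoning_mode([0.96, 0.97, 0.95, 0.98]))     # underthinking
```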
🌈 Method
- One-Pass Data Collection. We first perform offline one-pass data collection on a small-scale seen dataset. At each step, the steering vector is extracted from the first token at the specified layer based on confidence, and a dynamic function is fitted to the observed model behaviors.
- Inference with Dynamic Steering. During deployment, the dynamic function outputs steering weights online from the model's real-time confidence, thus balancing between overthinking and underthinking.
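The two stages above can be sketched in a few lines. This is a simplified, numpy-only illustration: the control function's form, its coefficients, and the confidence features are assumptions, not the fitted function shipped with the code.

```python
import numpy as np

def control_fn(conf_mean, conf_var, a=-2.0, b=1.5):
    # Hypothetical dynamic control function: push harder against
    # overthinking when confidence is high and stable. The functional
    # form and the coefficients are illustrative, not the fitted surface.
    return a * conf_mean + b * conf_var

def apply_dynamic_steering(hidden, steer_vector, conf_mean, conf_var):
    # Shift the hidden state along the steering direction with a weight
    # chosen online from real-time confidence signals.
    w = control_fn(conf_mean, conf_var)
    return hidden + w * steer_vector

rng = np.random.default_rng(0)
h = rng.normal(size=4096)   # hidden state at the steering layer
v = rng.normal(size=4096)   # pre-extracted steering vector
steered = apply_dynamic_steering(h, v, conf_mean=0.9, conf_var=0.01)
print(steered.shape)
```

Because the weight is recomputed at every step, the same vector can suppress redundant reflection when confidence is stable and leave reasoning untouched otherwise.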
🎨 Interactive Demo
The fitted, behavior-based dynamic control function is visualized as a 3D surface above. As confidence signals evolve, the control function adaptively adjusts the steering weight, which in turn shifts the model between overthinking mitigation and underthinking prevention.
We warmly welcome you to try our interactive demo, where you can manipulate different confidence signals and directly observe how the control function's steering behavior changes and ultimately affects the model's reasoning state.
🎉 News
- [2026.05.04] We release the code and steering vectors for Qwen3-14B. Please enjoy~
- [2026.03.19] We release an interactive demo to intuitively showcase how our dynamic control function adjusts steering weights based on real-time model reasoning states. Try it out!
- [2026.03.12] We release the code and steering vectors for DeepSeek-R1-Distill-Qwen (1.5B, 7B), QwQ-32B, and openPangu-Embedded-7B-V1.1. Happy coding!
- [2026.01.26] Our paper has been accepted to ICLR 2026 🎖️.
🔥 TODO
- Initialize Project.
- Release the interactive demo.
- Release the code and steering vectors for Qwen3-14B.
🚀 Quick Start
Easy Reproduction with Released Vectors
To facilitate quick deployment and reproducibility, we have released our pre-extracted steering vectors on Hugging Face 🤗.
Step 1. Download vectors from Hugging Face
Option 1: clone the full vector repository
git lfs install
git clone https://huggingface.co/Yulin-Li/ReBalance
Option 2: download only vectors/ with huggingface_hub
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="Yulin-Li/ReBalance",
repo_type="model",
allow_patterns="vectors/*",
local_dir="."
)
Then place the downloaded vectors/ folder under your local project root as:
ReBalance/
├── ...
├── transformer_inference_steer_dp.py
└── vectors/
├── DeepSeek-R1-Distill-Qwen-1.5B/
│ └── steer_vector_layer19_conf_mixed.pt
├── DeepSeek-R1-Distill-Qwen-7B/
│ └── steer_vector_layer22_conf_mixed.pt
├── Qwen3-14B/
│ └── steer_vector_layer34_conf_mixed.pt
└── QwQ-32B/
└── steer_vector_layer58_conf_mixed.pt
Step 2. Inference with dynamic steering
For Qwen3 models, use transformer_inference_steer_dp_qwen3.py instead.
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./vectors/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
--steer_coef -1
Step 3. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
--base 'steer_temp0.7_maxlen16000'
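Conceptually, the merge step just concatenates the per-GPU jsonl shards into one {base}.merged.jsonl file. The sketch below illustrates the idea; the shard naming scheme is an assumption, not merge_shards.py itself:

```python
import glob
import json
import os
import tempfile

def merge_jsonl_shards(dir_path, base):
    # Concatenate shard files matching {base}*.jsonl (naming scheme assumed)
    # into a single {base}.merged.jsonl, skipping any previous merge output.
    merged_path = os.path.join(dir_path, f"{base}.merged.jsonl")
    records = []
    for shard in sorted(glob.glob(os.path.join(dir_path, f"{base}*.jsonl"))):
        if shard == merged_path:
            continue
        with open(shard) as f:
            records.extend(json.loads(line) for line in f if line.strip())
    with open(merged_path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return merged_path, len(records)

# Demo on two tiny shards in a temporary directory.
tmp = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmp, f"steer_temp0.7_maxlen16000_rank{i}.jsonl"), "w") as f:
        f.write(json.dumps({"idx": i}) + "\n")
path, n = merge_jsonl_shards(tmp, "steer_temp0.7_maxlen16000")
print(n)  # 2
```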
Step 4. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
--generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
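The evaluation step compares each merged generation against its reference answer. A minimal stand-in for the idea is shown below; the \boxed{} answer convention and the record field names are assumptions, not check.py's actual format:

```python
import json
import re

def extract_boxed(text):
    # Pull the last \boxed{...} answer from a generation -- a common
    # convention for math reasoning models (assumed here).
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy(records):
    # records: dicts with assumed 'generation' and 'answer' fields.
    correct = sum(extract_boxed(r["generation"]) == str(r["answer"]).strip()
                  for r in records)
    return correct / max(len(records), 1)

recs = [
    {"generation": "... so the answer is \\boxed{204}.", "answer": 204},
    {"generation": "... hence \\boxed{33}.", "answer": 32},
]
print(accuracy(recs))  # 0.5
```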
Extract Steering Vectors Yourself
To better understand the underlying mechanisms of ReBalance or apply it to a broader range of models, you can conveniently obtain a lightweight steering vector (e.g., only 22 KB for QwQ-32B) in a single pass over a small-scale seen dataset.
Step 1. Extract hidden states and model confidence signals
python transformer_inference_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-7B' \
--dataset_dir "./Data" \
--dataset "Math_Train" \
--output_path "./outputs" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--trust_remote_code
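The confidence signal recorded alongside hidden states is typically derived from the next-token distribution. One common proxy, shown below, is the softmax probability of the sampled token; this is an assumption, not necessarily the exact signal the script logs:

```python
import numpy as np

def token_confidence(logits, token_id):
    # Softmax probability of the chosen token -- a common confidence proxy.
    z = np.asarray(logits, dtype=float)
    z = z - z.max()               # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p[token_id]

print(token_confidence([2.0, 0.5, 0.1], 0))
```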
Step 2. Automated best-layer selection for confidence modeling
python hidden_config_ridge.py \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/ \
--layers all \
--max_files 500 \
--expected_offset 1 \
--alpha 1.0 \
--pca_components 64 \
--test_size 0.2 \
--random_state 42
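Conceptually, this step fits a per-layer ridge regressor from hidden states to the confidence signal and keeps the layer with the best held-out fit. The numpy-only sketch below shows that selection loop; the features, targets, and scoring are assumptions, and the PCA stage is omitted for brevity:

```python
import numpy as np

def ridge_r2(X_train, y_train, X_test, y_test, alpha=1.0):
    # Closed-form ridge regression with a bias column; returns held-out R^2.
    Xb = np.hstack([X_train, np.ones((len(X_train), 1))])
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y_train)
    pred = np.hstack([X_test, np.ones((len(X_test), 1))]) @ w
    return 1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

def best_layer(hidden_by_layer, conf, test_size=0.2, seed=42, alpha=1.0):
    # Score every layer and keep the one whose hidden states best
    # predict the confidence signal on a held-out split.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(conf))
    split = int(len(conf) * (1 - test_size))
    tr, te = idx[:split], idx[split:]
    scores = {layer: ridge_r2(X[tr], conf[tr], X[te], conf[te], alpha)
              for layer, X in hidden_by_layer.items()}
    return max(scores, key=scores.get), scores

# Toy data: layer 1 carries the confidence signal, layer 0 is pure noise.
rng = np.random.default_rng(0)
conf = rng.uniform(0.0, 1.0, 200)
hidden = {0: rng.normal(size=(200, 8)),
          1: np.hstack([conf[:, None], 0.1 * rng.normal(size=(200, 7))])}
layer, scores = best_layer(hidden, conf)
print(layer)  # 1
```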
Step 3. Extract steering vectors with automatic calibration
python hidden_analysis_auto.py \
--layer_id 19 \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train \
--save_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--max_files 500 \
--expected_offset 1
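A common way to obtain such a steering direction is a difference of means between high- and low-confidence steps. The sketch below is a hypothetical stand-in for what hidden_analysis_auto.py produces; the thresholds and the unit-norm normalization are assumptions:

```python
import numpy as np

def extract_steering_vector(hidden_states, confidences, hi=0.8, lo=0.5):
    # Difference-of-means direction between high- and low-confidence
    # reasoning steps (illustrative; thresholds are made up).
    H = np.asarray(hidden_states)
    c = np.asarray(confidences)
    v = H[c >= hi].mean(axis=0) - H[c <= lo].mean(axis=0)
    return v / np.linalg.norm(v)  # unit norm so steer_coef sets the scale

rng = np.random.default_rng(0)
H = rng.normal(size=(500, 16))
c = rng.uniform(0.0, 1.0, 500)
H[c >= 0.8] += 2.0   # plant a confidence-linked offset in the toy data
v = extract_steering_vector(H, c)
print(v.shape)
```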
Step 4. Dynamic steering with your extracted vectors
For Qwen3 models, use transformer_inference_steer_dp_qwen3.py instead.
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
--steer_coef -1
Step 5. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
--base 'steer_temp0.7_maxlen16000'
Step 6. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
--generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
❤️ Acknowledgements
Our work builds upon the codebase of SEAL, DeepSeek-R1-Distill-Qwen, Qwen3, QwQ, and openPangu. We sincerely thank the authors for their remarkable contributions.
🙏 Citation
If you find ReBalance useful in your research, please cite our paper:
@inproceedings{li2026efficient,
title={Efficient Reasoning with Balanced Thinking},
author={Li, Yulin and Tu, Tengyao and Ding, Li and Wang, Junjie and Zhen, Huiling and Chen, Yixin and Li, Yong and Tian, Zhuotao},
booktitle={Proceedings of the 14th International Conference on Learning Representations},
year={2026}
}