Activation-Steered Compression (ASC)

January 30, 2026

Activation-Steered Compression (ASC) is a training-free method that compresses verbose reasoning in Large Language Models (LLMs) at inference time by manipulating internal activations. It achieves substantial reductions in Chain-of-Thought (CoT) length while preserving, or even improving, answer accuracy, which enables faster, more efficient, and more cost-effective deployment of reasoning models.

📄 This repository accompanies our paper:
Activation Steering for Chain-of-Thought Compression

🚀 Overview

Chain-of-Thought prompting improves reasoning but often leads to:

  • Verbose explanations
  • Redundant reasoning steps
  • Increased token usage and latency

ASC addresses this inefficiency by:

  • Extracting a steering vector from paired verbose vs. concise rationales
  • Injecting it into the model's residual stream at inference time
  • Compressing CoTs without retraining or fine-tuning
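The extract-then-inject idea can be sketched on toy data. This is a minimal illustration, not the repository's implementation: it assumes the steering vector is the difference between the mean activations of concise and verbose rationales (a common choice in activation steering, not confirmed by this README), and the helper names and the `strength` parameter are hypothetical. The KL-bounded scaling mentioned under Key Features is not modeled here.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equally sized activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(verbose_acts, concise_acts):
    """Assumed difference-of-means vector pointing from verbose to concise."""
    mu_verbose = mean_vector(verbose_acts)
    mu_concise = mean_vector(concise_acts)
    return [c - v for c, v in zip(mu_concise, mu_verbose)]


def steer(hidden, vector, strength):
    """Add the scaled steering vector to a single residual-stream state."""
    return [h + strength * s for h, s in zip(hidden, vector)]


# Toy 3-dimensional activations, made up purely for illustration.
verbose_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
concise_acts = [[0.0, 4.0, 1.0], [2.0, 4.0, 1.0]]

v = steering_vector(verbose_acts, concise_acts)
steered = steer([1.0, 1.0, 1.0], v, strength=0.5)
print(v)        # [-1.0, 2.0, 0.0]
print(steered)  # [0.5, 2.0, 1.0]
```

In a real model the same addition would be applied to the hidden states of a chosen layer during generation, with activations collected from the paired rationales rather than written by hand.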

🧠 Key Features

  • ⚙️ Training-free: Works on any model without parameter updates
  • 💡 Concise reasoning: Reduces CoT length by up to 67%
  • ⚡ Efficient inference: Up to 2.73× speedup in wall-clock time
  • 🧪 Model-agnostic: Works across 7B, 8B, and 32B parameter models
  • 📐 Theoretical guarantees: KL-bounded scaling ensures safe intervention

📊 Results Summary

Performance Comparison: CoT vs. ASC

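| Model | Method | MATH500 Acc. (%) | MATH500 Tokens | GSM8K Acc. (%) | GSM8K Tokens |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B | CoT | 88.8 | 3984 | 88.6 | 1080 |
| | ASC | 89.0 | 1543 | 88.6 | 536 |
| DeepSeek-R1-Distill-Llama-8B | CoT | 89.2 | 3554 | 89.1 | 2610 |
| | ASC | 89.2 | 2353 | 89.3 | 850 |
| QwQ-32B | CoT | 93.8 | 4508 | 96.5 | 1530 |
| | ASC | 94.2 | 2222 | 96.4 | 830 |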
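The relative token savings implied by the table can be checked with a few lines of arithmetic. The sketch below copies the token counts from the table above and computes the percentage reduction for each model and benchmark; the dictionary layout and function name are illustrative choices, not part of the repository.

```python
# (CoT tokens, ASC tokens) copied from the results table above.
token_counts = {
    ("DeepSeek-R1-Distill-Qwen-7B", "MATH500"): (3984, 1543),
    ("DeepSeek-R1-Distill-Qwen-7B", "GSM8K"): (1080, 536),
    ("DeepSeek-R1-Distill-Llama-8B", "MATH500"): (3554, 2353),
    ("DeepSeek-R1-Distill-Llama-8B", "GSM8K"): (2610, 850),
    ("QwQ-32B", "MATH500"): (4508, 2222),
    ("QwQ-32B", "GSM8K"): (1530, 830),
}


def reduction_pct(cot_tokens, asc_tokens):
    """Percentage of tokens saved by ASC relative to plain CoT."""
    return 100.0 * (1.0 - asc_tokens / cot_tokens)


reductions = {
    key: round(reduction_pct(*pair), 1) for key, pair in token_counts.items()
}
for (model, benchmark), pct in reductions.items():
    print(f"{model:30s} {benchmark:8s} {pct:5.1f}% fewer tokens")

# The largest saving is the source of the "up to 67%" figure.
best = max(reductions, key=reductions.get)
print(best, reductions[best])
```

The largest reduction, about 67.4%, comes from DeepSeek-R1-Distill-Llama-8B on GSM8K; the smallest, about 33.8%, is the same model on MATH500.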

🛠️ Setup

git clone https://github.com/ArminAzizi98/ASC.git
cd ASC
pip install -r requirements.txt

🧪 Inference Example

The easiest way to use ASC during inference is with the --steering flag:

python -u generate.py \
  --model_name "Qwen/QwQ-32B" \
  --problem '''Define
\[p = \sum_{k = 1}^\infty \frac{1}{k^2} \quad \text{and} \quad q = \sum_{k = 1}^\infty \frac{1}{k^3}.\]
Find a way to write
\[\sum_{j = 1}^\infty \sum_{k = 1}^\infty \frac{1}{(j + k)^3}\]
in terms of $p$ and $q.$''' \
  --steering

You may provide any math problem as the --problem argument.
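Because LaTeX problems contain backslashes, dollar signs, and quotes, it can be easier to launch the script from Python and pass the problem as a single argument with no shell quoting. The following is a minimal sketch that uses only the flags shown above; the helper names and the sample problem are hypothetical, and it assumes that omitting --steering runs the model without ASC.

```python
import shlex
import subprocess
import sys


def build_command(problem, model_name="Qwen/QwQ-32B", steering=True):
    """Return the argv list for generate.py, using only the flags shown above."""
    cmd = [
        sys.executable, "-u", "generate.py",
        "--model_name", model_name,
        "--problem", problem,
    ]
    if steering:
        # Assumption: leaving this flag out disables ASC.
        cmd.append("--steering")
    return cmd


def run_generation(cmd):
    """Run generate.py; call this from inside a checkout of the repository."""
    subprocess.run(cmd, check=True)


# A raw string keeps the LaTeX backslashes intact (illustrative problem).
problem = r"Find $\sum_{k = 1}^\infty \frac{1}{k^2}$."
cmd = build_command(problem)
print(shlex.join(cmd))  # shell-safe rendering, handy for logging
```

Passing the arguments as a list means the problem text reaches the script unchanged, whatever characters it contains.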

🧭 Creating Steering Vectors

To generate a steering vector for a new model or domain, follow these steps:

  1. Generate Concise CoTs using GPT-4o
    Requires access to the OpenAI API. This script prompts GPT-4o to produce math-centric, minimal-English rationales.
    python generate_short_cots.py

  2. Generate Verbose CoTs using the Target Model
    This script generates standard chain-of-thought (CoT) outputs from your chosen reasoning model.
    python generate_long_cots.py

  3. Extract the Steering Vector
    This script computes the activation-space vector that maps verbose to concise reasoning, based on the CoT pairs.
    python extract_steering_vector.py
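The three steps can also be chained so that a failure in one stops the rest. This is a sketch only: it assumes the scripts are run from the repository root, take no command-line arguments (as in the commands above), and read any settings such as API credentials or the target model from wherever the repository expects them. The `run_pipeline` helper and its `runner` parameter are hypothetical.

```python
import subprocess
import sys

# Script names as listed in the steps above, in the listed order.
STEPS = [
    "generate_short_cots.py",
    "generate_long_cots.py",
    "extract_steering_vector.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order; check=True raises on the first failure."""
    completed = []
    for script in steps:
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed


# Usage, from the repository root:
# run_pipeline()
```

The `runner` argument exists only so the sequence can be dry-run or tested without launching the real scripts.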
    
    

✅ Supported Models

The following models have been tested and are currently supported by ASC:

  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Qwen/QwQ-32B

ℹ️ More models will be added soon. Contributions are welcome!