Semantic Similarity Based Dynamic Pruning (SSDP) for Tree-of-Thought Reasoning

October 21, 2025

Python 3.8+ PyTorch

This repository contains the official implementation of Semantic Similarity Based Dynamic Pruning (SSDP) for Tree-of-Thought reasoning. SSDP improves the efficiency of large language model reasoning by dynamically pruning semantically similar nodes in the reasoning tree, reducing computational overhead while maintaining reasoning quality.

This implementation is built on top of the Dynamic Parallel Tree Search (DPTS) framework with significant enhancements for semantic similarity-based pruning.

Original DPTS Paper:

Dynamic Parallel Tree Search for Efficient LLM Reasoning
Authors: Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, and Dacheng Tao
arXiv: 2502.16235

Our SSDP Paper: 📄 Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning

🚀 Key Features

  • Semantic Clustering: Uses embedding models to identify and cluster semantically similar reasoning paths
  • Dynamic Pruning: Intelligently prunes redundant nodes based on similarity thresholds (see the sketch after this list)
  • Multi-GPU Support: Efficient distributed inference across multiple GPUs
  • Flexible Configuration: Easy-to-use JSON configuration system
  • Multiple Datasets: Support for GSM8K, MATH, and other reasoning benchmarks
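
The clustering and pruning described above can be illustrated with a short sketch. This is not the repository's actual API: the embedding model name, the function, and the greedy keep-one-representative loop are assumptions chosen to show threshold-based semantic pruning.

# Illustrative sketch only -- not the repository's actual API. The model
# name and the greedy keep/prune loop are assumptions.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

def prune_similar_steps(steps, threshold=0.75, model_name="all-MiniLM-L6-v2"):
    """Greedily keep one representative per cluster of similar steps."""
    model = SentenceTransformer(model_name)
    embeddings = model.encode(steps, convert_to_tensor=True)
    kept = []  # indices of surviving representative steps
    for i in range(len(steps)):
        # Prune step i if it is too similar to an already-kept step.
        if all(cos_sim(embeddings[i], embeddings[j]).item() < threshold
               for j in kept):
            kept.append(i)
    return [steps[i] for i in kept]

candidates = [
    "Add 3 and 4 to get 7, then multiply by 2 to get 14.",
    "First compute 3 + 4 = 7; doubling gives 14.",  # near-duplicate, pruned
    "Double each number first: 6 + 8 = 14.",        # distinct path, kept
]
print(prune_similar_steps(candidates))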

📋 Table of Contents

  • Installation
  • Quick Start
  • Configuration
  • Usage Examples
  • Dataset Support
  • Results and Output
  • Citation
  • Troubleshooting
  • License
  • Acknowledgments

🛠️ Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (recommended)
  • 16GB+ RAM (for large models)

Setup

  1. Clone the repository:

    git clone https://github.com/your-username/SSDP.git
    cd SSDP
    
  2. Run the setup script:

    bash setup.sh
    

    This will:

    • Upgrade pip and install build tools
    • Install PyTorch with CUDA support
    • Install all required dependencies
  3. Verify installation:

    python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
    

🚀 Quick Start

Single GPU Inference

# Navigate to scripts directory and run single GPU script
cd scripts
bash single_run.sh

Multi-GPU Inference

# Navigate to scripts directory and run multi-GPU script
cd scripts
bash multi_gpu_run.sh

Basic Example

# Example with the GSM8K dataset (gsm8k is the default DATASET_NAME in single_run.sh)
cd scripts
bash single_run.sh

⚙️ Configuration

SSDP uses JSON configuration files to control behavior. Key parameters include:

Core SSDP Parameters

{
    "config": {
        "enable_clustering": true,
        "clustering_threshold": 0.75,
        "clustering_method": "cosine_similarity",
        "embedding_model": "sentence-transformers",
        "tree_width": 4,
        "tree_depth": 16,
        "max_rollout": 20
    }
}
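
main.py consumes such a file via the --config flag. As a minimal sketch of reading the same structure yourself (the loader below is illustrative, not the repository's own code; key names follow the example above):

# Minimal config-loading sketch; the repository's loader in main.py may
# differ. Key names follow the SSDP.json example above.
import json

with open("configs/inference/SSDP.json") as f:
    cfg = json.load(f)["config"]

assert 0.0 <= cfg["clustering_threshold"] <= 1.0, "threshold must be in [0, 1]"
print(f"clustering: {cfg['enable_clustering']}, "
      f"threshold: {cfg['clustering_threshold']}, "
      f"tree: {cfg['tree_width']}x{cfg['tree_depth']}")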

Available Configuration Files

  • configs/inference/SSDP.json - Full SSDP configuration with clustering
  • configs/inference/DPTS.json - Base DPTS configuration without clustering

Key Parameters

Parameter             Description                                  Default
enable_clustering     Enable semantic clustering                   true
clustering_threshold  Similarity threshold for pruning (0.0-1.0)   0.75
clustering_method     Clustering algorithm                         "cosine_similarity"
tree_width            Maximum tree width                           4
tree_depth            Maximum tree depth                           16
max_rollout           Maximum reasoning steps                      20

📊 Usage Examples

Configuring the Scripts

Before running experiments, you need to configure the model and dataset parameters in the bash scripts:

Single GPU Script (single_run.sh)

Edit the following variables in single_run.sh:

# Dataset configuration
DATASET_NAME=gsm8k                    # Options: gsm8k, math, gsm8ktoy, mathtoy, math100, gsm8k100, gsm8k500
MODEL_NAME=qwen-1.5b                  # Model name (e.g., qwen-1.5b, qwen-7b, etc.)
REWARD_MODEL=mistral_prm-7b          # Reward model for evaluation

# Experiment configuration
work_dir=./results                    # Output directory
exp_name=test                        # Experiment name

# Config file (change the --config argument below to your desired config)
python3 main.py \
    --config configs/inference/SSDP.json \
    --work-dir $work_dir \
    --exp-name $exp_name \
    --data $DATASET_NAME \
    --model $MODEL_NAME \
    --reward_model $REWARD_MODEL \
    --dtype bfloat16 \
    --flash-attn \
    --debug

Multi-GPU Script (multi_gpu_run.sh)

The multi-GPU script accepts command-line arguments:

# Usage: bash multi_gpu_run.sh [DATASET_NAME] [MODEL_NAME] [REWARD_MODEL] [EXP_NAME] [WORK_DIR] [CONFIG_FILE]

# Example with custom parameters
bash multi_gpu_run.sh gsm8k qwen-1.5b mistral_prm-7b my_experiment ./outputs configs/inference/SSDP.json

Running on Different Datasets

# GSM8K mathematical reasoning (single GPU)
# Edit single_run.sh: DATASET_NAME=gsm8k
cd scripts
bash single_run.sh

# MATH competition problems (multi-GPU)
cd scripts
bash multi_gpu_run.sh math qwen-1.5b mistral_prm-7b math_experiment

# Small toy datasets for testing
cd scripts
bash multi_gpu_run.sh gsm8ktoy qwen-1.5b mistral_prm-7b toy_test

Custom Configuration Files

Create your own configuration file based on the existing ones:

For Single GPU Script:

# Copy and modify existing config
cp configs/inference/SSDP.json configs/inference/my_config.json

# Edit my_config.json with your parameters
# Then edit single_run.sh and change the --config line:
# --config configs/inference/my_config.json \

cd scripts
bash single_run.sh

For Multi-GPU Script:

# Copy and modify existing config
cp configs/inference/SSDP.json configs/inference/my_config.json

# Edit my_config.json with your parameters
# Then use it with multi-GPU script
cd scripts
bash multi_gpu_run.sh gsm8k qwen-1.5b mistral_prm-7b my_experiment ./outputs configs/inference/my_config.json

๐Ÿ“ Dataset Support

SSDP supports multiple reasoning datasets:

  • GSM8K: Grade school math problems
  • MATH: Competition-level mathematics
  • Custom: User-defined datasets

Dataset Configuration

Datasets are loaded automatically based on the --data parameter (a hypothetical custom-dataset sketch follows this list). Each dataset includes:

  • Problem statements
  • Ground truth solutions
  • Evaluation metrics
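
For the Custom option, the repository's dataset loaders define the actual schema. As a starting point, the sketch below writes a plausible JSONL file; the field names ("question", "answer") and the data/ path are hypothetical, so check the existing loaders for the real format.

# Hypothetical custom-dataset writer; the field names and output path are
# assumptions -- check the repository's dataset loaders for the schema it
# actually expects.
import json

examples = [
    {"question": "A store sells pens in packs of 12. How many pens are in 5 packs?",
     "answer": "60"},
]

with open("data/my_dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")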

📈 Results and Output

SSDP generates comprehensive output including:

  • Results: Detailed reasoning paths and final answers
  • Metrics: Accuracy, efficiency, and clustering statistics
  • Logs: Detailed execution logs and performance metrics
  • Configurations: Complete experiment configuration

Output Structure

outputs/
├── config.json              # Experiment configuration
├── results-*.json           # Detailed results
├── evaluation_results.json  # Accuracy metrics
├── inference_metrics.json   # Performance metrics
└── clustering_metrics.json  # Clustering efficiency
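
A short post-processing sketch for these files; the JSON keys read below ("accuracy", "total_nodes", "pruned_nodes") are assumptions, so inspect your own output files for the actual fields.

# Sketch of summarizing a run from its output files. The key names used
# here are assumptions; inspect evaluation_results.json and
# clustering_metrics.json for the actual fields.
import json

with open("outputs/evaluation_results.json") as f:
    eval_results = json.load(f)
with open("outputs/clustering_metrics.json") as f:
    clustering = json.load(f)

print("accuracy:", eval_results.get("accuracy"))
total, pruned = clustering.get("total_nodes"), clustering.get("pruned_nodes")
if total and pruned is not None:
    print(f"pruned {pruned}/{total} nodes ({100 * pruned / total:.1f}%)")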

🔬 Citation

If you use SSDP in your research, please cite both our paper and the original DPTS work:

Our SSDP Paper:

@article{ssdp2025,
  title={Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning},
  author={Kim, Joongho and Huang, Xirui and Reza, Zarreen and Grand, Gabriel and Zhu, Kevin and Lagasse, Ryan},
  journal={NeurIPS 2025 Workshop on Efficient Reasoning},
  year={2025}
}

Original DPTS Paper (Please also cite):

@article{ding2025dynamic,
  title={Dynamic Parallel Tree Search for Efficient {LLM} Reasoning},
  author={Ding, Yifu and Jiang, Wentao and Liu, Shunyu and Jing, Yongcheng and Guo, Jinyang and Wang, Yingjie and Zhang, Jing and Wang, Zengmao and Liu, Ziwei and Du, Bo and Liu, Xianglong and Tao, Dacheng},
  journal={arXiv preprint arXiv:2502.16235},
  year={2025}
}

๐Ÿ› Troubleshooting

Common Issues

CUDA Out of Memory:

# Reduce memory pressure: use a smaller model, or lower
# tree_width and tree_depth in your config file

Installation Issues:

# Clean installation
pip uninstall torch torchaudio torchvision -y
pip install torch==2.1.2 torchaudio==2.1.2 torchvision==0.16.2
pip install -r requirements.txt

Model Loading Issues:

  • Ensure model paths are correct
  • Check model compatibility with your hardware
  • Verify sufficient disk space for model weights

Performance Optimization

  • Adjust clustering_threshold to balance efficiency and accuracy (a sweep sketch follows this list)
  • Use multiple GPUs with multi_gpu_run.sh for faster processing
  • Reduce tree_width and tree_depth in config for faster experimentation
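
A hypothetical sweep over clustering_threshold, combining the config-copying and multi-GPU invocation patterns shown earlier; paths and positional arguments mirror the examples above and may need adapting to your setup.

# Hypothetical threshold sweep: writes one config per threshold and runs
# the multi-GPU script for each. Adapt paths and arguments to your setup.
import json
import subprocess

with open("configs/inference/SSDP.json") as f:
    base = json.load(f)

for threshold in (0.65, 0.75, 0.85):
    base["config"]["clustering_threshold"] = threshold
    cfg_path = f"configs/inference/SSDP_t{threshold}.json"
    with open(cfg_path, "w") as f:
        json.dump(base, f, indent=4)
    subprocess.run(
        ["bash", "multi_gpu_run.sh", "gsm8k", "qwen-1.5b", "mistral_prm-7b",
         f"sweep_t{threshold}", "./outputs", cfg_path],
        cwd="scripts", check=True,
    )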

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

The Apache License 2.0 provides:

  • ✅ Permissive licensing - Allows commercial and non-commercial use
  • ✅ Patent protection - Explicit patent grant from contributors
  • ✅ Attribution required - Must include copyright notice
  • ✅ Modification allowed - Can create derivative works
  • ✅ Distribution allowed - Can redistribute with or without changes

๐Ÿ™ Acknowledgments

We gratefully acknowledge the original Dynamic Parallel Tree Search (DPTS) team for their foundational work. This implementation extends their framework with semantic similarity-based pruning capabilities. Please refer to the original DPTS paper for the theoretical foundations and baseline implementation.

Original DPTS Repository: https://github.com/yifu-ding/DPTS
Original DPTS Paper: Dynamic Parallel Tree Search for Efficient LLM Reasoning (arXiv:2502.16235)