Chain-of-Thought Tokens are Computer Program Variables

May 9, 2025 · View on GitHub

Banner by @solitaryzero

This is the repository for paper Chain-of-Thought Tokens are Computer Program Variables with code and scripts.

Setup

Before running the code, you should install the dependencies by:

pip install -e requirements.txt

We use Qwen-2.5-1.5B for our experiments. You can download the model by executing:

python ./src/preprocess/download_model.py

which will download the Qwen-2.5-1.5B model and save it to ./models/Qwen-2.5-1.5B.

You can also manually download the model and put it under ./models.

Training & Evaluation

The scripts for experiments in the paper are stored in ./scripts:

1. Preparation

Execute sh ./scripts/{task}/dataset.sh to generate synthetic datasets, where {task} belongs to {multiplication, dynaprog, full_dynaprog}

Notice: Steps 2-4 will take a long time. You may instead execute the corresponding scripts in ./scripts/multiplication and ./scripts/dynaprog for 4*4 multiplication and 5*5 DP results, which we will use for subsequent experiments.

2. Section 3.1

Execute

sh ./scripts/full_multiplication/run_plain.sh
sh ./scripts/full_multiplication/run_full_cot.sh

for multiplication task results, and execute

sh ./scripts/full_dynaprog/run_plain.sh
sh ./scripts/full_dynaprog/run_full_cot.sh

for DP task results.

3. Section 3.2

Execute

sh ./scripts/full_multiplication/run_compressed_cot.sh

for multiplication task results.

4. Section 3.3

Execute

sh ./scripts/full_multiplication/run_latent.sh
sh ./scripts/full_dynaprog/run_latent.sh

for multiplication and DP task results.

5. Draw figures for Section 3 (Optional)

After executing steps 2-4, you may run

sh ./scripts/full_multiplication/draw_results.sh
sh ./scripts/full_dynaprog/draw_results.sh

to obtain accuracy heatmaps.

The figures will be generated in ./results/full_multiplication/accuracy and ./results/full_dynaprog/accuracy.

6. Section 4.1

You should obtain 4*4 multiplication and 5*5 DP results first. Then, execute

sh ./scripts/{task}/intervened_dataset.sh
sh ./scripts/{task}/run_intervention.sh

to perform intervention experiments, where {task} belongs to {multiplication, dynaprog}

Run sh ./scripts/misc/draw_intervention.sh to generate intervention success rate figures.

Run sh ./scripts/multiplication/error_analysis.sh to generate error breakdown figures.

The figures will be generated as ./results/misc/intervene_accuracy.pdf and ./results/multiplication/error/error_breakdown.pdf.

7. Section 4.2

You should obtain 5*5 DP results first. Then, execute

sh ./scripts/dynaprog/probe_dataset.sh
sh ./scripts/dynaprog/probe.sh
sh ./scripts/dynaprog/draw_probe.sh
sh ./scripts/dynaprog/probe_breakdown.sh

for probing results.

The figures will be generated in ./results/dynaprog/probe_figure and ./results/dynaprog/probe_breakdown.

How to Run Experiments with Custom Models

You may use your custom model with two steps:

Put the model under ./models;
Change the --base_model parameter in different scripts to the new model path.