SWAP: Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model

April 15, 2026

This repository contains the code for the ACL 2025 (main conference) paper "Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model".

Overview

SWAP introduces a structure-aware planning framework for deliberate multi-step reasoning in language models. The framework consists of two core components:

  • Generator
  • Discriminator

Within this framework, the generator is repurposed to serve three roles:

  • Policy model ($M_{\pi}$)
  • World model ($M_{\text{wm}}$)
  • Controller ($M_{\text{c}}$)

Given a goal $G$ and an initial state $(s_0, g_0)$, SWAP operates as follows:

  1. Planning: The policy model $M_{\pi}$ generates an optimized plan $H$.
  2. Action generation: Using $G$, $H$, and the current state $(s_t, g_t)$, the policy model proposes the next action $a_t$ through deliberate planning.
  3. State prediction: The world model $M_{\text{wm}}$ predicts the next state $s_{t+1}$ and updates the entailment graph $g_{t+1}$.
  4. Control: Based on $G$ and the updated state $(s_{t+1}, g_{t+1})$, the controller $M_{\text{c}}$ decides whether to continue the reasoning process or output the final answer.
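
The four steps above can be read as a simple control loop. The following is a minimal sketch under stated assumptions, not the repository's implementation: `swap_rollout`, `plan_fn`, `act_fn`, `world_model`, `controller`, and `max_steps` are hypothetical names introduced for illustration (`plan_fn` and `act_fn` both stand in for the policy model), and the toy functions replace real language-model calls.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a state is (s_t, g_t), i.e. a textual state plus an
# entailment graph stored here as a list of (premise, conclusion) edges.
Graph = List[Tuple[str, str]]
State = Tuple[str, Graph]


def swap_rollout(
    goal: str,
    init_state: State,
    plan_fn: Callable[[str, State], str],
    act_fn: Callable[[str, str, State], str],
    world_model: Callable[[State, str], State],
    controller: Callable[[str, State], bool],
    max_steps: int = 8,
) -> Tuple[str, State, List[str]]:
    """Run one trajectory: plan once, then act / predict / control in a loop."""
    plan = plan_fn(goal, init_state)            # step 1: planning
    state, actions = init_state, []
    for _ in range(max_steps):
        action = act_fn(goal, plan, state)      # step 2: action generation
        state = world_model(state, action)      # step 3: state prediction
        actions.append(action)
        if controller(goal, state):             # step 4: control (True = stop)
            break
    return plan, state, actions


# Toy stand-ins (assumptions, not model calls): count up to a target number.
def toy_plan(goal, state):
    return "add 1 until the target is reached"


def toy_act(goal, plan, state):
    return "add 1"


def toy_world_model(state, action):
    s, g = state
    nxt = str(int(s) + 1)
    return nxt, g + [(s, nxt)]                  # new edge: s leads to nxt


def toy_controller(goal, state):
    return state[0] == goal


plan, final_state, actions = swap_rollout(
    "3", ("0", []), toy_plan, toy_act, toy_world_model, toy_controller
)
print(final_state)
```

Passing the three roles in as callables mirrors the fact that a single generator serves all of them; in practice each callable would wrap a differently prompted call to the same model.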

During this process, the generator explores multiple candidate actions, and the discriminator evaluates the resulting partial trajectories to determine which trajectory should be continued.
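
The explore-then-select pattern described here can be sketched as follows. This is an illustrative assumption rather than the released code: `expand_and_select`, `propose`, and `score` are hypothetical names, and reducing the discriminator to a scalar `score` function is a simplification; the actual discriminator may compare trajectories differently.

```python
from typing import Callable, List, Sequence


def expand_and_select(
    trajectory: List[int],
    propose: Callable[[List[int]], Sequence[int]],
    score: Callable[[List[int]], float],
) -> List[int]:
    """Extend a partial trajectory with each candidate action and keep the best."""
    candidates = [trajectory + [a] for a in propose(trajectory)]
    if not candidates:
        return trajectory                       # nothing to explore
    return max(candidates, key=score)           # discriminator picks one


# Toy stand-ins (assumptions): actions are integers, and the "discriminator"
# prefers trajectories whose sum is closest to a target of 10.
def toy_propose(trajectory):
    return [1, 2, 5]


def toy_score(trajectory):
    return -abs(10 - sum(trajectory))


traj: List[int] = []
for _ in range(3):
    traj = expand_and_select(traj, toy_propose, toy_score)
print(traj)
```

Only one partial trajectory is kept per step in this sketch; keeping the top few instead would turn it into a beam search.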

Figure: SWAP performs multi-step reasoning through structure-aware planning in tasks such as FOLIO (left) and MATH (right). At each step, given the current state (represented as a graph) and an action, the world model predicts the next state as an updated graph. The policy model is guided by this graph to propose the next action.
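
To make the idea of a state represented as a graph concrete, here is a small sketch of one possible data structure. It is an assumption for illustration only: `GraphState`, `supports`, and `add` are hypothetical names, and the repository may encode states differently (for example as text).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GraphState:
    """Hypothetical graph state: each statement maps to the statements supporting it."""

    supports: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, statement: str, premises: List[str]) -> "GraphState":
        """Return a new state in which `statement` is derived from `premises`."""
        missing = [p for p in premises if p not in self.supports]
        if missing:
            # A derived statement may only depend on statements already in the graph.
            raise ValueError(f"unknown premises: {missing}")
        updated = {k: list(v) for k, v in self.supports.items()}
        updated[statement] = list(premises)
        return GraphState(updated)


# Given facts have no premises; a derived statement points back to its support.
s0 = GraphState({"x = 2": [], "y = 3": []})
s1 = s0.add("x + y = 5", ["x = 2", "y = 3"])
print(s1.supports["x + y = 5"])
```

Returning a new object instead of mutating in place keeps earlier states intact, which is convenient when several candidate actions are explored from the same state.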

Quick Start

Directory structure

SWAP/
├── model_weights/
├── output/
├── script/
└── src/

Setup

git clone https://github.com/xiongsiheng/SWAP.git
cd SWAP

# Create and activate the training environment
conda create -n swap_train python=3.10 -y
conda activate swap_train

# Install training dependencies
pip install -r requirements_train.txt

# Create and activate the evaluation environment
# vLLM is used to substantially accelerate evaluation
conda create -n swap_eval python=3.10 -y
conda activate swap_eval

# Install evaluation dependencies
pip install -r requirements_eval.txt

Training

# Train the generator
bash script/train_sft_generator_gsm8k.sh

# Train the discriminator
bash script/train_sft_discriminator_gsm8k.sh

# Optional: distributed training
bash script/train_sft_discriminator_gsm8k_dist.sh

# Optional: DPO training
bash script/train_dpo_discrimintor_gsm8k.sh

Evaluation

# Evaluate the generator (without planning)
bash script/eval_generator_gsm8k.sh

# Evaluate the full system
bash script/eval_system_gsm8k.sh

# Optional: distributed evaluation
CUDA_VISIBLE_DEVICES=0 NUM_SHARDS=4 SHARD_INDEX=0 bash script/eval_system_gsm8k.sh
CUDA_VISIBLE_DEVICES=1 NUM_SHARDS=4 SHARD_INDEX=1 bash script/eval_system_gsm8k.sh
CUDA_VISIBLE_DEVICES=2 NUM_SHARDS=4 SHARD_INDEX=2 bash script/eval_system_gsm8k.sh
CUDA_VISIBLE_DEVICES=3 NUM_SHARDS=4 SHARD_INDEX=3 bash script/eval_system_gsm8k.sh

# Optional: download our checkpoints

For detailed descriptions of the available arguments and configuration options, please refer to the source code.
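
The four sharded evaluation commands above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below only reproduces the environment variables and script path shown above; `shard_envs` and `launch_all` are hypothetical helpers, and mapping GPU `i` to shard `i` is an assumption taken from those commands. How the script itself partitions the data is not shown here.

```python
import os
import subprocess
from typing import Dict, List

SCRIPT = "script/eval_system_gsm8k.sh"  # script path taken from the commands above


def shard_envs(num_shards: int) -> List[Dict[str, str]]:
    """Build one set of environment overrides per shard (GPU i runs shard i)."""
    return [
        {
            "CUDA_VISIBLE_DEVICES": str(i),
            "NUM_SHARDS": str(num_shards),
            "SHARD_INDEX": str(i),
        }
        for i in range(num_shards)
    ]


def launch_all(num_shards: int) -> List[int]:
    """Start every shard in parallel and return the exit codes (not called here)."""
    procs = [
        subprocess.Popen(["bash", SCRIPT], env={**os.environ, **env})
        for env in shard_envs(num_shards)
    ]
    return [p.wait() for p in procs]


# Print the equivalent shell commands instead of launching anything.
for env in shard_envs(4):
    print(" ".join(f"{k}={v}" for k, v in env.items()), "bash", SCRIPT)
```

Calling `launch_all(4)` from the repository root would start the four shards concurrently, assuming four GPUs are available.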

Datasets & Checkpoints

All datasets used in SWAP (GSM8K, MATH, FOLIO, ReClor, HumanEval, MBPP) with trajectory and process supervision are available here:

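from datasets import load_dataset

# Load the GSM8K trajectory configuration of the SWAP dataset from the Hugging Face Hub
dataset = load_dataset("sxiong/SWAP", "gsm8k_trajectory")
print(dataset)  # shows the available splits and their sizes
split = dataset["train"]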

We also provide the corresponding checkpoints.

In addition, we release an updated version of the datasets, along with the corresponding checkpoints.

Citation

@inproceedings{xiong2025deliberate,
  title={Deliberate reasoning in language models as structure-aware planning with an accurate world model},
  author={Xiong, Siheng and Payani, Ali and Yang, Yuan and Fekri, Faramarz},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={31900--31931},
  year={2025}
}