May 30, 2025
SPA-RL-Agent
This repository contains the code for our paper "SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution".
The paper introduces Stepwise Progress Attribution (SPA), a novel reward redistribution framework that provides fine-grained intermediate rewards by decomposing delayed rewards into incremental, per-step contributions, enabling more effective reinforcement learning for complex, multi-step agent tasks (WebShop, ALFWorld, and VirtualHome).
Key advantages include:
- Fine-grained Reward Redistribution: Effectively decomposes delayed rewards into intermediate rewards, reflecting incremental progress toward task completion.
- Effective Multi-turn RL Training: Utilizes these intermediate rewards in PPO, achieving outstanding performance on complex long-horizon tasks.
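As an illustrative sketch only (not the repository's implementation), the redistribution idea can be pictured as scaling per-step progress estimates so that they sum exactly to the delayed terminal reward; the function name and inputs below are hypothetical:

```python
def redistribute_reward(progress_scores, final_reward):
    """Split a delayed terminal reward into per-step contributions.

    progress_scores: estimated incremental progress for each step
    final_reward: the delayed reward observed at the end of the episode
    Returns per-step rewards that sum to final_reward.
    """
    total = sum(progress_scores)
    if total == 0:
        # No signal from the estimator: spread the reward uniformly.
        return [final_reward / len(progress_scores)] * len(progress_scores)
    return [final_reward * s / total for s in progress_scores]

# Example: a 4-step trajectory that eventually succeeds (terminal reward 1.0);
# each step receives a reward proportional to its estimated progress.
step_rewards = redistribute_reward([0.1, 0.4, 0.2, 0.3], 1.0)
```

Because the per-step rewards sum to the original delayed reward, total return is preserved while every step gains a learning signal.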
News
- [2025.05.28] Released our paper on arXiv. See here.
- [2025.05.26] SPA-RL-Agent repo launched!
Contents
Setup
# Create virtual environment for agentic evaluation
conda create -n SPA python=3.9
conda activate SPA
pip install -r requirements.txt
# Install the WebShop environment
cd envs/webshop
pip install -e .
python -m spacy download en_core_web_lg
conda install -y -c conda-forge openjdk=11
# Download data for WebShop environment
gdown https://drive.google.com/uc?id=1G_0ccLWn5kZE5rpeyAdh_YuoNzvBUjT9
gdown https://drive.google.com/uc?id=11zOUDkJSgGhYin9NxQtG8PVpDsika86y
unzip data.zip
mkdir search_index
unzip indexes.zip -d search_index/
# Download data for ALFWorld environment
cd ../..
cd eval_agent/data/alfworld
gdown https://drive.google.com/uc?id=1y7Vqeo0_xm9d3I07vZaP6qbPFtyuJ6kI
unzip alfworld_data.zip
# Download data for VirtualHome environment
cd ../..
gdown https://drive.google.com/uc?id=1kZKWkWhtJ-DneqfS1Nb_FybR1RBxPeef
unzip virtualhome_master.zip
# Download expert trajectories for Webshop, ALFWorld and VirtualHome environment
cd ../..
gdown https://drive.google.com/uc?id=1_tBMDixZcIjKuv-LExNllha-YIRxhKIq
unzip data.zip
Create another virtual environment for RL training due to package conflicts:
conda create -n RL_train python=3.10
conda activate RL_train
pip install -r ppo/requirements.txt
Usage
Base Agent SFT Training
conda activate SPA
cd sft
# For ALFWorld environment
bash alfworld_llama3b.sh
# For Webshop environment
bash webshop_llama3b.sh
# For VirtualHome environment
bash virtualhome_llama3b.sh
Note: the bash scripts contain the hyperparameters used to reproduce our results. Adjust settings such as the model path to match your own environment.
Explored Trajectories Collection
cd ..
# For ALFWorld environment
bash exploration/alfworld/my_generate_response.sh
# For WebShop environment
bash exploration/webshop/my_generate_response_webshop.sh
Progress Estimator Training
To organize the exploration data for progress estimator training, first run the following script:
python prm/data_org.py
Then run the following script to train the progress estimator:
deepspeed --include=localhost:0,1,2,3 prm/train_our_progress_model.py
Stepwise Progress Prediction
python prm/inference_prm.py
RL Training
To organize the inference data for RL training, please run the following script first.
python prm/rl_data_org.py
Next, execute the following script to perform reinforcement learning training using LoRA:
conda activate RL_train
bash ppo/train_ppo.sh
Evaluation
Before evaluation, we need to merge the LoRA weights with the original LLM weights to obtain the final model:
python ppo/merge.py
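Conceptually, merging a LoRA adapter folds the low-rank update back into the base weight, W' = W + (alpha/r) * B A. A minimal numpy illustration of this identity (a sketch only; the repository's merge.py operates on the actual model checkpoints):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # toy dimensions and LoRA hyperparameters

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA down-projection
B = rng.normal(size=(d_out, r))      # LoRA up-projection

# Merging folds the low-rank update into the base matrix, so inference
# afterwards needs no extra adapter computation.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
# The merged matrix reproduces the base-plus-adapter output exactly.
adapter_out = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(W_merged @ x, adapter_out)
```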
Then run the evaluation scripts:
conda activate SPA
# For ALFWorld environment
bash eval/llama3_2_3b_eval_alfworld.sh
# For WebShop environment
bash eval/llama3_2_3b_eval_webshop.sh
# For VirtualHome environment
bash eval/llama3_2_3b_eval_virtualhome.sh
Running Baselines
TODO
Acknowledgement
Our implementation builds on ETO and steptool; we thank the authors for their great work.
Many thanks also to my wonderful co-authors: Chak Tou Leong, Jiashuo Wang, Jian Wang, and Wenjie Li.
Citation
If you find our work useful in your research, please consider citing our paper:
@article{wang2025spa,
  title={SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution},
  author={Wang, Hanlin and Leong, Chak Tou and Wang, Jiashuo and Wang, Jian and Li, Wenjie},
  journal={arXiv preprint arXiv:2505.20732},
  year={2025}
}