BesiegeField

December 9, 2025

The First Framework for LLM-Driven Machine Design in Besiege

Paper: Agentic Design of Compositional Machines

📋 Table of Contents

Overview
Installation
- Besiege Environment Setup
- AgenticFlow Installation
Quick Start
Fine-tuning
Performance Leaderboard
RL Fine-tuning Results
License
Acknowledgement
Citation

BesiegeField is a framework that lets Large Language Models (LLMs) autonomously design and build machines in Besiege, a physics-based construction game. It connects LLM reasoning with compositional engineering tasks such as building catapults and cars.

🚀 Installation

1. Besiege Environment Setup

📦 System Requirements

Component	Version
Besiege	Linux v1.60-22044
Ubuntu	22.04
GLIBC	2.33 – 2.35
Mono	≥ 6.8.0.105
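The GLIBC requirement in the table above can be checked before installing anything. The following is a minimal sketch using only Python's standard library (`platform.libc_ver()`); the helper names `parse_version` and `glibc_supported` are illustrative and not part of BesiegeField.

```python
import platform

# Supported GLIBC range, taken from the requirements table above.
GLIBC_MIN = (2, 33)
GLIBC_MAX = (2, 35)


def parse_version(text):
    """Turn a dotted version string such as '2.35' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_supported(version_text):
    """Return True if the GLIBC version lies within the supported range."""
    version = parse_version(version_text)[:2]
    return GLIBC_MIN <= version <= GLIBC_MAX


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not detect GLIBC on this system.")
    else:
        status = "supported" if glibc_supported(version) else "outside the listed range"
        print(f"GLIBC {version}: {status}")
```

A version outside the range does not necessarily mean the game will fail; it only means the setup differs from the one listed above.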

🎯 Obtain the Game

Step 1: Purchase the official copy on Steam

Step 2: Download DepotDownloader

Step 3: Download Besiege v1.60-22044

./DepotDownloader -app 346010 -depot 346016 -manifest 2732248020700221971 \
  -username <steam_user> -password <password>

Step 4: Download v1.20-17395 executables (required for headless operation)

./DepotDownloader -app 346010 -depot 346016 -manifest 5506301120812842666 \
  -username <steam_user> -password <password>

💡 Tip: Find other manifests on SteamDB if needed.
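Since both downloads differ only in the manifest ID, the two commands can be generated from one template. This is a small sketch, assuming DepotDownloader sits in the current directory as in the commands above; `depot_command` is a hypothetical helper, and the placeholders must be replaced with your own Steam credentials.

```python
import shlex

APP_ID = "346010"
DEPOT_ID = "346016"

# Manifest IDs copied from Steps 3 and 4 above.
MANIFESTS = {
    "v1.60-22044": "2732248020700221971",
    "v1.20-17395": "5506301120812842666",
}


def depot_command(version, username, password):
    """Build the DepotDownloader argument list for one Besiege version."""
    return [
        "./DepotDownloader",
        "-app", APP_ID,
        "-depot", DEPOT_ID,
        "-manifest", MANIFESTS[version],
        "-username", username,
        "-password", password,
    ]


if __name__ == "__main__":
    for version in MANIFESTS:
        # Placeholders are printed as-is; substitute real credentials yourself.
        command = depot_command(version, "<steam_user>", "<password>")
        print(f"# {version}")
        print(shlex.join(command))
```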

🔌 Download the Plugin

📥 BesiegeField Plugin (Google Drive)

🛠️ Install Dependencies

Standard Installation:

sudo apt install mono-complete xvfb  # xvfb only for headless workstation
mono --version  # Verify ≥ 6.8.0.105
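The version check can also be automated. The sketch below assumes that `mono --version` prints a first line of the form `Mono JIT compiler version X.Y.Z.W (...)`; the function names are illustrative only.

```python
import re
import subprocess

# Minimum Mono version from the requirements table.
MIN_MONO = (6, 8, 0, 105)


def mono_version_from_output(output):
    """Extract the version tuple from the text printed by `mono --version`."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def mono_is_recent_enough(output):
    """Return True if the reported version is at least MIN_MONO."""
    version = mono_version_from_output(output)
    return version is not None and version >= MIN_MONO


def check_installed_mono():
    """Run `mono --version`; return False if Mono is missing or too old."""
    try:
        result = subprocess.run(
            ["mono", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    return mono_is_recent_enough(result.stdout)
```

Calling `check_installed_mono()` after installation gives a quick yes/no answer without reading the version string by hand.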

📦 Offline/Manual Installation

If apt is unavailable, install Mono and xvfb manually from offline archives:

# Install mono
cd /path/to/tar
tar -xzf mono-complete-offline.tar.gz
for deb in *.deb; do dpkg -x "$deb" .; done

export PATH="/path/to/mono/usr/bin:$PATH"
export LD_LIBRARY_PATH="/path/to/mono/usr/lib:$LD_LIBRARY_PATH"
export PKG_CONFIG_PATH="/path/to/mono/usr/lib/pkgconfig:$PKG_CONFIG_PATH"

# Make permanent
cat >> ~/.bashrc <<EOF
export PATH="/path/to/mono/usr/bin:\$PATH"
export LD_LIBRARY_PATH="/path/to/mono/usr/lib:\$LD_LIBRARY_PATH"
export PKG_CONFIG_PATH="/path/to/mono/usr/lib/pkgconfig:\$PKG_CONFIG_PATH"
EOF
source ~/.bashrc

# Install xvfb
cd /path/to/xvfb
tar -xzf xvfb-offline.tar.gz
dpkg -i *.deb

⚙️ Install BesiegeField Plugin

Step 1: Extract the plugin archive and copy all files into the v1.60-22044 game folder

Step 2: Copy Besiege.x86 & Besiege.x86_64 from v1.20-17395 into v1.60-22044, overwriting the originals

⚠️ Warning: This enables headless/code control but makes launching v1.60 with its normal GUI unstable. Keep a backup of the original game folder if you want to run v1.60 visually.
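Step 2 can be scripted so that the original v1.60 executables are saved before they are overwritten. This is a sketch under assumptions: `swap_executables` and its three directory arguments are hypothetical, and it backs up only the two executables, not the whole game folder that Step 4 expects.

```python
import shutil
from pathlib import Path

# The two executables named in Step 2.
EXECUTABLES = ("Besiege.x86", "Besiege.x86_64")


def swap_executables(v120_dir, v160_dir, backup_dir):
    """Back up the v1.60 executables, then overwrite them with the v1.20 ones."""
    v120_dir, v160_dir, backup_dir = Path(v120_dir), Path(v160_dir), Path(backup_dir)

    # Check every source first so a missing file cannot leave a half-done swap.
    missing = [name for name in EXECUTABLES if not (v120_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"v1.20 executables not found: {missing}")

    backup_dir.mkdir(parents=True, exist_ok=True)
    for name in EXECUTABLES:
        target = v160_dir / name
        if target.is_file():
            shutil.copy2(target, backup_dir / name)  # keep the original v1.60 binary
        shutil.copy2(v120_dir / name, target)  # overwrite with the v1.20 binary
```

For a full visual-launch backup, copying the entire v1.60 folder before running this remains the safer option.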

Step 3: Set permissions

chmod -R 777 /path/to/Besiege

Step 4: Test the vanilla game (use backup copy)

cd /path/to/backup/Besiege && ./run.sh

2. AgenticFlow Installation

🐍 Create Conda Environment

conda env create -f environment_inferenceonly.yaml
conda activate <env_name>

📂 Path Configuration

Folder Structure:

your-project/
├── Besiege/                  # Game installation
└── AgenticCodes/             # Framework code

Edit AgenticCodes/config.py:

Parameter	Description
`APIPATH`	Path to the file that stores the LLM type, API key, etc. You must create and fill in this file yourself.
`DEFAULT_SAVE_ROOT`	Root directory for LLM outputs
`SCRIPT_PATH`	Must point to `Besiege/run_besiegefield.sh`
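As an illustration, the three settings might look like the sketch below. Only the parameter names and the `run_besiegefield.sh` file name come from the table; every path, the `api.json` file name, the `outputs` directory, and the `config_problems` helper are assumptions, and the real `config.py` may store these values differently.

```python
from pathlib import Path

# Hypothetical example values; replace them with paths on your own machine.
PROJECT_ROOT = Path("/path/to/your-project")

APIPATH = str(PROJECT_ROOT / "AgenticCodes" / "api.json")  # file name is an assumption
DEFAULT_SAVE_ROOT = str(PROJECT_ROOT / "outputs")  # directory name is an assumption
SCRIPT_PATH = str(PROJECT_ROOT / "Besiege" / "run_besiegefield.sh")


def config_problems(apipath, save_root, script_path):
    """Return a list of human-readable problems found in the three settings."""
    problems = []
    if not Path(apipath).is_file():
        problems.append(f"APIPATH does not point to a file: {apipath}")
    if Path(script_path).name != "run_besiegefield.sh":
        problems.append("SCRIPT_PATH must point to Besiege/run_besiegefield.sh")
    elif not Path(script_path).is_file():
        problems.append(f"SCRIPT_PATH does not exist: {script_path}")
    if Path(save_root).exists() and not Path(save_root).is_dir():
        problems.append(f"DEFAULT_SAVE_ROOT is not a directory: {save_root}")
    return problems
```

Running `config_problems(APIPATH, DEFAULT_SAVE_ROOT, SCRIPT_PATH)` before the first experiment catches path typos early; an empty list means no problem was detected.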

🎯 Quick Start

🏹 Catapult Task

Design a machine to throw projectiles:

python main.py \
  -use_model deepseek-chat \
  -task catapult/catapult_level1 \
  -env_num 2 \
  -user_input "Design a machine to throw a boulder (type id 36) in a parabolic trajectory."

🚗 Car Task

Design a machine to move forward:

python main.py \
  -use_model deepseek-chat \
  -task car/car_level1 \
  -env_num 2 \
  -user_input "Design a machine to move forward on a straight road."
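The two commands above share the same flags, so they can be launched from a small wrapper when running several tasks in a row. This is a sketch, assuming it is executed from the directory that contains `main.py`; `build_command` and `run_task` are hypothetical helpers, and only the four flags shown above are used.

```python
import subprocess
import sys

# Task names and prompts copied from the Quick Start commands above.
TASKS = {
    "catapult/catapult_level1": "Design a machine to throw a boulder (type id 36) in a parabolic trajectory.",
    "car/car_level1": "Design a machine to move forward on a straight road.",
}


def build_command(task, user_input, model="deepseek-chat", env_num=2):
    """Build the argument list for one main.py run."""
    return [
        sys.executable, "main.py",
        "-use_model", model,
        "-task", task,
        "-env_num", str(env_num),
        "-user_input", user_input,
    ]


def run_task(task, user_input, **options):
    """Run one task and return the exit code of main.py."""
    return subprocess.run(build_command(task, user_input, **options)).returncode
```

For example, `for task, prompt in TASKS.items(): run_task(task, prompt)` would run both tasks sequentially with the default model.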

📝 Available Tasks

Explore all available tasks in environments/env_files/level_menus.json
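To get a quick overview of that file without opening it in an editor, a short script can print its top-level entries. The layout of `level_menus.json` is not documented here, so this sketch deliberately assumes nothing about it beyond being valid JSON; `describe_menu` and `load_menu` are illustrative names.

```python
import json
from pathlib import Path


def describe_menu(menu):
    """Return one summary line per top-level entry, whatever the JSON shape is."""
    if isinstance(menu, dict):
        return [f"{key}: {type(value).__name__}" for key, value in menu.items()]
    if isinstance(menu, list):
        return [f"[{index}]: {type(item).__name__}" for index, item in enumerate(menu)]
    return [f"value: {type(menu).__name__}"]


def load_menu(path="environments/env_files/level_menus.json"):
    """Load the task menu; the default path is relative to the framework code."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

Printing `"\n".join(describe_menu(load_menu()))` lists the top-level keys and their types, which is usually enough to see how tasks are grouped.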

🎮 Testing Your Designs

1. Generated .bsg machine files appear in DEFAULT_SAVE_ROOT
2. Copy them to Besiege/Besiege_Data/SavedMachines
3. Run ./run.sh to launch the game
4. Inspect and test your AI-designed machines in-game!
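The copy step can be automated with a few lines of Python. This is a minimal sketch, assuming the .bsg files may sit in subfolders of DEFAULT_SAVE_ROOT; `install_machines` is a hypothetical helper, and files that share a name overwrite one another in the target folder.

```python
import shutil
from pathlib import Path


def install_machines(save_root, besiege_dir):
    """Copy every .bsg file under save_root into the game's SavedMachines folder."""
    target_dir = Path(besiege_dir) / "Besiege_Data" / "SavedMachines"
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # rglob searches save_root recursively; sorting keeps the order predictable.
    for machine in sorted(Path(save_root).rglob("*.bsg")):
        destination = target_dir / machine.name
        shutil.copy2(machine, destination)  # same-named files are overwritten
        copied.append(destination)
    return copied
```

The returned list shows which machines are now available in the game's load menu.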

🔧 LLM Fine-tuning

📦 Install Training Environment

Add training-related packages:

conda activate <env_name>
pip install -r requirements_rl.txt

❄️ Cold Start Training

Step 1: Run cold start with Orthogonal Finetuning (the dataset is downloaded from Hugging Face automatically)

cd PostTraining/ColdStart
./run_cold_start.sh <model_path>

To run cold start with the human dataset instead (not recommended), pass true as the second argument:

cd PostTraining/ColdStart
./run_cold_start.sh <model_path> true

Step 2: Merge Checkpoints

Fill in the paths in merge_ckpts.py before running:

python merge_ckpts.py

🎓 Reinforcement Learning

Configure rl_config.yaml with your settings (important!), then run:

cd PostTraining/RL
./rl_single_agent_light.sh

📊 Performance Leaderboard

🎯 Catapult Task

Mean, maximum, and standard deviation of scores for each model under the single-agent, iterative editing, and hierarchical design methods:

Model	Single-agent Mean	Single-agent Max	Single-agent Std	Iterative Editing Mean	Iterative Editing Max	Iterative Editing Std	Hierarchical Design Mean	Hierarchical Design Max	Hierarchical Design Std
Gemini 2.5 Pro	2.30	9.00	3.86	4.67	21.95	8.68	9.83	18.19	8.35
OpenAI o3	2.87	5.22	1.96	9.14	14.01	3.71	2.00	11.11	3.98
Qwen3-Coder-480B-A35B	1.75	9.24	3.17	5.10	12.02	5.54	3.90	6.52	2.54
Doubao Seed 1.6-250615	3.18	8.20	2.99	4.82	9.10	3.41	1.73	4.76	2.39
Claude Opus 4-20250514	1.19	4.82	2.21	1.18	4.91	2.18	2.27	9.32	4.22
DeepSeek-V3	3.50	4.86	2.17	3.07	5.24	2.55	2.41	4.93	2.58
Kimi K2-0711-preview	2.57	9.05	3.72	2.82	11.39	5.23	5.39	12.02	5.16
Llama 4 Scout 17B 16E	3.18	5.64	1.95	1.28	5.94	2.41	3.59	11.83	4.15

🚗 Car Task

Mean, maximum, and standard deviation of scores for each model under the single-agent, iterative editing, and hierarchical design methods:

Model	Single-agent Mean	Single-agent Max	Single-agent Std	Iterative Editing Mean	Iterative Editing Max	Iterative Editing Std	Hierarchical Design Mean	Hierarchical Design Max	Hierarchical Design Std
Gemini 2.5 Pro	33.96	40.85	6.73	34.34	41.66	13.96	29.96	41.52	7.78
OpenAI o3	15.28	32.08	8.97	14.34	35.08	11.79	28.39	36.18	11.01
Qwen3-Coder-480B-A35B	8.87	11.50	4.46	15.24	28.95	13.12	12.59	34.05	10.78
Doubao Seed 1.6-250615	3.51	9.40	4.85	8.11	10.04	3.58	18.75	26.02	4.38
Claude Opus 4-20250514	9.83	12.98	1.28	8.07	28.04	12.48	14.56	38.67	20.69
DeepSeek-V3	9.06	10.53	3.68	8.23	18.84	7.12	17.92	31.94	12.85
Kimi K2-0711-preview	1.75	8.09	2.80	14.36	28.34	9.47	1.94	14.99	5.48
Llama 4 Scout 17B 16E	0.02	0.03	0.01	3.04	12.76	5.23	1.55	2.00	0.32

🎓 RL-Finetuned LLM Results

Performance of Qwen2.5-14B-Instruct under different training strategies:

Model	Catapult Validity Ratio	Catapult Mean Score	Catapult Max Score	Car Validity Ratio	Car Mean Score	Car Max Score
Qwen2.5-14B-Instruct	11/50	0.06	2.41	46/50	4.97	19.10
Qwen2.5-14B-Instruct + Cold-Start	9/50	0.11	5.54	40/50	4.67	20.23
Qwen2.5-14B-Instruct + RL	12/50	0.13	5.92	41/50	3.72	24.08
Qwen2.5-14B-Instruct + Cold-Start + RL	11/50	0.14	7.14	42/50	5.05	45.72

📚 Citation

If you find this repository useful for your research or projects, please consider citing our work:

@article{zhang2025besiegefield,
  title={Agentic Design of Compositional Machines},
  author={Zhang, Wenqian and Liu, Weiyang and Liu, Zhen},
  journal={arXiv preprint arXiv:2510.14980},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👏 Acknowledgement

We’d like to thank the developers of Besiege for creating such an inspiring game and for nurturing such a vibrant player community; without them, this project wouldn’t exist.

Big thanks also to the BepInEx team, whose modding framework let us push the boundaries of what’s possible in Besiege.

⭐ Star History

If you find this project helpful, please consider giving it a star! ⭐

📄 License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License - see the LICENSE file for details.