Unintended Misalignment from Agentic Fine-Tuning

August 28, 2025

Before You Start

Before getting started, set the PING_HOME environment variable by running the following commands, replacing /path/to/current/directory with the absolute path to the current directory:

echo "export PING_HOME=/path/to/current/directory" >> ~/.bashrc
source ~/.bashrc
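If a Python script needs this variable, it can help to fail early with a clear message when PING_HOME is missing or points somewhere unexpected. The following is a minimal sketch, not code from this repository: the helper name `resolve_ping_home` and its error messages are hypothetical, and it assumes only that PING_HOME should name an existing directory.

```python
import os
from pathlib import Path


def resolve_ping_home(env=None):
    """Return PING_HOME as an absolute Path, or raise if it is unusable.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. This helper is hypothetical, not part of the repository.
    """
    env = os.environ if env is None else env
    value = env.get("PING_HOME")
    if not value:
        raise RuntimeError(
            "PING_HOME is not set; export it and re-source ~/.bashrc first."
        )
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise RuntimeError(f"PING_HOME does not point to a directory: {path}")
    return path
```

A script could call `resolve_ping_home()` once at startup and build every other path relative to the returned directory.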

Once PING_HOME is set, follow the sections below for the next steps.

Fine-tuning

For detailed instructions on fine-tuning LLMs in the agentic domain, please refer to the Fine-tuning Guide.

Prefix Optimization

Evaluation

Each benchmark requires its own setup and configuration. Please refer to the manual for each benchmark to evaluate the agent.

Analysis

To train a linear probe, please follow the instructions in the manual to install the required dependencies and download the dataset.
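For readers unfamiliar with the term, a linear probe is commonly a linear classifier (for example, logistic regression) trained on fixed feature vectors such as a model's hidden states. The sketch below illustrates that general idea only and is not the procedure from the manual: the data is synthetic, and the function names (`train_linear_probe`, `probe_accuracy`, `make_synthetic_activations`) and hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression (weights and bias) by full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        logit = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((1 if logit > 0 else 0) == y)
    return correct / len(labels)


def make_synthetic_activations(n, dim, seed=0):
    """Hypothetical stand-in for hidden states: classes differ along dimension 0."""
    rng = random.Random(seed)
    features, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 3.0 if y == 1 else -3.0
        features.append(x)
        labels.append(y)
    return features, labels


if __name__ == "__main__":
    train_x, train_y = make_synthetic_activations(200, 8, seed=0)
    test_x, test_y = make_synthetic_activations(100, 8, seed=1)
    w, b = train_linear_probe(train_x, train_y)
    print("held-out accuracy:", probe_accuracy(w, b, test_x, test_y))
```

In practice the synthetic features would be replaced by activations extracted from the model under study, and a library implementation of logistic regression would typically be used instead of this hand-written loop.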