Safety at One Shot: Patching Fine-Tuned LLMs with a Single Instance

January 5, 2026

Our experiments require at least four A100 or H100 80GB GPUs because we perform full-parameter fine-tuning with a batch size of 64. They can also use up to 8 CPU cores and 256GB of RAM.
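As a quick preflight before launching a run, the GPU requirement above can be checked with a short script. This is a minimal sketch and not part of the repository: the helper names (`parse_gpu_query`, `check_hardware`, `query_gpus`) are hypothetical, and the 75,000 MiB threshold is an assumed, deliberately loose cutoff for "80GB-class" cards, since the memory reported by `nvidia-smi` varies somewhat by model and driver.

```python
import shutil
import subprocess

MIN_GPUS = 4              # minimum GPU count stated above
MIN_GPU_MEM_MIB = 75_000  # assumption: loose cutoff for 80GB-class GPUs


def parse_gpu_query(csv_text):
    """Parse 'name, memory.total' rows into (name, mem_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the name is preserved.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def check_hardware(gpus):
    """Return a list of problems; an empty list means the GPU check passed."""
    large = [g for g in gpus if g[1] >= MIN_GPU_MEM_MIB]
    if len(large) < MIN_GPUS:
        return [
            f"found {len(large)} GPU(s) with >= {MIN_GPU_MEM_MIB} MiB, "
            f"need at least {MIN_GPUS}"
        ]
    return []


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory (MiB); [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    for message in check_hardware(query_gpus()) or ["GPU requirement met"]:
        print(message)
```

The script only checks the GPU minimum. The CPU and RAM figures are described as upper bounds on what the experiments can use, so they are not treated as hard requirements here.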


Demo

An end-to-end demo in notebooks/demo.ipynb illustrates our key observations: benign fine-tuning can compromise safety, and patching with a single instance can restore it. The notebook also includes the corresponding attack success rate (ASR) and task performance evaluations.


Full Experiment Suite (Optional)

To run the full experiment suite, we recommend using Python 3.10. Install the required dependencies using one of the following methods:

With pip:

pip install -r requirements.txt

With conda:

conda env create -f environment.yml
conda activate oneshot-alignment
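To confirm the environment matches the recommendation before running any scripts, a small check like the following may help. It is a sketch under stated assumptions: `environment_warnings` is a hypothetical helper, not something the repository provides, and it relies on `CONDA_DEFAULT_ENV`, the variable that `conda activate` sets to the active environment name. With a pip-only setup that variable is normally unset, so the conda check is skipped.

```python
import os
import sys

RECOMMENDED_PYTHON = (3, 10)          # version recommended above
CONDA_ENV_NAME = "oneshot-alignment"  # environment name from the conda commands


def environment_warnings(version_info=sys.version_info, conda_env=None):
    """Return warnings about the interpreter and the active conda environment."""
    warnings = []
    major_minor = tuple(version_info[:2])
    if major_minor != RECOMMENDED_PYTHON:
        warnings.append(
            f"Python {major_minor[0]}.{major_minor[1]} detected; "
            f"{RECOMMENDED_PYTHON[0]}.{RECOMMENDED_PYTHON[1]} is recommended"
        )
    # Only check the conda environment when one is active.
    if conda_env is not None and conda_env != CONDA_ENV_NAME:
        warnings.append(
            f"active conda env is '{conda_env}', expected '{CONDA_ENV_NAME}'"
        )
    return warnings


if __name__ == "__main__":
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    for message in environment_warnings(conda_env=active_env) or ["environment OK"]:
        print(message)
```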

After installation, see the detailed instructions in src/README.md for script descriptions and execution commands.