[NeurIPS 2025 Spotlight] Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
March 16, 2026
Paper: arXiv
BDS (Bayesian Data Scheduler) is an adaptive defense framework against harmful fine-tuning for Large Language Models (LLMs). It implements a novel data scheduling approach that enhances safety during the fine-tuning process.
The pipeline of BDS is shown below. A brief workflow is illustrated here.

Representative experimental results:
- Figure 1: Data scheduling dynamics under low and high harmful ratios.
- Figure 6: Weight distributions under low and high harmful ratios.

Installation
Environment Setup
- Install
  conda env create -f /content/environment.yml
  conda activate bds
  pip install -e ./OpenRLHFBase/
  pip install -e .
- Datasets
  Dataset JSON files are provided in ./run/scripts/datasets. Alternatively, you can construct the datasets with the scripts in run/sst2, run/gsm8k, run/agnews, and run/alpaca.
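Before training, it can help to confirm that a dataset file loads and to see which fields it contains. The following is a minimal sketch, not part of the repository: the helper name `summarize_dataset` is hypothetical, and it assumes each file holds a JSON list of objects, which this README does not state.

```python
import json
from pathlib import Path


def summarize_dataset(json_path):
    """Return (number of records, sorted field names of the first record)."""
    with Path(json_path).open("r", encoding="utf-8") as f:
        records = json.load(f)
    # Assumption: the file holds a JSON list of objects; adjust if it does not.
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON list, got {type(records).__name__}")
    first = records[0] if records else {}
    fields = sorted(first) if isinstance(first, dict) else []
    return len(records), fields
```

Call it with the path of any JSON file under ./run/scripts/datasets to get the record count and the field names of the first record.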
Configuration
Edit Configuration
Edit ./run/scripts/config.sh with your actual values:
# API Keys and Tokens
export HUGGINGFACE_TOKEN="your_huggingface_token_here"
export WANDB_API_KEY="your_wandb_api_key_here"
export WANDB_PROJECT="your_project_name"
# Paths
export PREFIX_DIR="/path/to/your/bds/project"
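A forgotten placeholder in config.sh typically only surfaces once a download or a logging call fails. As a hedged sketch (the helper `find_unset_vars` is hypothetical and not part of the repository), the four variables above can be checked from Python after sourcing the file:

```python
import os

# Variable names and placeholder values are taken from the config.sh template above.
REQUIRED_VARS = ("HUGGINGFACE_TOKEN", "WANDB_API_KEY", "WANDB_PROJECT", "PREFIX_DIR")
PLACEHOLDERS = {
    "your_huggingface_token_here",
    "your_wandb_api_key_here",
    "your_project_name",
    "/path/to/your/bds/project",
}


def find_unset_vars(env=None):
    """Return names of required variables that are missing, empty, or still placeholders."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env[name] in PLACEHOLDERS
    ]
```

An empty list from `find_unset_vars()` means all four variables are set to non-placeholder values. It does not verify that the tokens themselves are valid.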
Quick Start
Training and Evaluation Template
Run the main script (including training, visualization, and evaluation):
bash ./run/scripts/0_train_eval_dbs.sh
Project Structure
bds/
├── analysis/                      # Analysis and visualization tools
│   ├── mountain_range_plotter.py  # Mountain Range visualization
│   ├── score_analyzer.py          # Score analysis
│   └── llama2guard_analyzer.py    # LlamaGuard analysis
├── bds/                           # Core BDS package
│   ├── datasets/                  # Dataset classes
│   ├── models/                    # Model definitions
│   ├── trainer/                   # Training logic
│   └── utils/                     # Utilities
├── run/                           # Run scripts and datasets
│   ├── scripts/                   # Training scripts
│   │   ├── config.sh              # Configuration template
│   │   ├── 0_train_eval_dbs.sh    # Main training+eval script
│   │   └── datasets/              # Dataset files
│   ├── sst2/                      # SST-2 dataset scripts
│   ├── alpaca/                    # Alpaca dataset scripts
│   ├── gsm8k/                     # GSM8K dataset scripts
│   ├── agnews/                    # AG News dataset scripts
│   └── poison/                    # Poison evaluation scripts
├── environment.yml                # Python dependencies
└── setup.py                       # Package setup
Analysis Tools
1. Score Analyzer (analysis/score_analyzer.py)
Analyzes and processes scoring data from training checkpoints.
Usage:
python analysis/score_analyzer.py --path /path/to/checkpoint --transformation softmax
Parameters:
- --path: Path to the checkpoint directory
- --transformation: Transformation type (softmax, linear, etc.)
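This README does not define the transformations, so the sketch below only illustrates what the two named options could mean when raw per-sample scores are turned into weights. `softmax` is the standard definition. The `linear` function is an assumption (shift by the minimum, then normalize to sum to 1) and may differ from what score_analyzer.py actually does.

```python
import math


def softmax(scores):
    """Numerically stable softmax: exponentiate shifted scores, then normalize."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(scores):
    """Hypothetical 'linear' option: shift by the minimum, then normalize to sum to 1."""
    low = min(scores)
    shifted = [s - low for s in scores]
    total = sum(shifted)
    if total == 0:  # all scores equal: fall back to uniform weights
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]
```

Both functions preserve the ordering of the scores. Softmax exaggerates gaps between large scores, while this linear variant keeps the gaps proportional and assigns weight 0 to the lowest score.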
2. Mountain Range Plotter (analysis/mountain_range_plotter.py)
Creates Mountain Range style visualizations of training progress.
Usage:
python analysis/mountain_range_plotter.py --path /path/to/checkpoint --step 100 --flag all
Parameters:
- --path: Path to the checkpoint directory
- --step: Step size for visualization
- --flag: Data filter (all, ft, harmful)
- --transformation: Transformation type
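To show how the four documented parameters fit together, here is an illustrative `argparse` declaration. It is a sketch, not the repository's parser: the function name `build_parser`, the default values, and the choice to make `--path` required are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Mountain Range plot options (illustrative)")
    parser.add_argument("--path", required=True, help="Path to the checkpoint directory")
    parser.add_argument("--step", type=int, default=100, help="Step size for visualization")
    parser.add_argument("--flag", choices=["all", "ft", "harmful"], default="all",
                        help="Data filter")
    parser.add_argument("--transformation", default="softmax", help="Transformation type")
    return parser
```

With this declaration, an unsupported `--flag` value is rejected at parse time, before any checkpoint is read.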
3. LlamaGuard Analyzer (analysis/llama2guard_analyzer.py) (Optional)
Analyzes model outputs using LlamaGuard for safety evaluation.
Usage:
python analysis/llama2guard_analyzer.py