LARF: Layer-Aware Representation Filtering

July 17, 2025

Overview

LARF identifies the safety-sensitive layers of an LLM and uses the bi-representation of each data sample to detect the samples in a fine-tuning dataset that degrade safety.

Installation

Clone the repository:

git clone https://github.com/LLLeoLi/LARF.git
cd LARF

Install dependencies:

conda create --name LARF python=3.10
conda activate LARF
pip install -r requirements.txt

Usage

1. Identify the Safety-Sensitive Layers

bash scripts/scaling_llama.sh
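
The README does not describe what scripts/scaling_llama.sh measures. Judging only from the script name, one plausible reading is that each layer is scaled in turn and a safety metric is recorded for each variant. Under that assumption, picking the sensitive layers reduces to ranking layers by how far the metric moves from the unmodified model. The function name, the refusal-count metric, and the input format below are all hypothetical and not taken from the repository.

```python
def rank_layers_by_sensitivity(baseline_refusals, refusals_after_scaling):
    """Return layer indices ordered from most to least safety-sensitive.

    baseline_refusals: refusal count of the unmodified model
        (hypothetical metric, not confirmed by the repository).
    refusals_after_scaling: dict mapping layer index -> refusal count
        measured after scaling only that layer (hypothetical format).
    """
    shifts = {
        layer: abs(count - baseline_refusals)
        for layer, count in refusals_after_scaling.items()
    }
    # Largest shift first; ties go to the lower layer index for determinism.
    return sorted(shifts, key=lambda layer: (-shifts[layer], layer))


# Toy numbers, made up purely to show the call; they are not results.
toy_counts = {0: 21, 1: 40, 2: 33, 3: 19}
ranking = rank_layers_by_sensitivity(20, toy_counts)
print(ranking)  # [1, 2, 0, 3]
```

The top-ranked indices would then be the natural candidates for the layer range passed to the filtering step.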

2. Filter Your Dataset with LARF

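Compute the bi-representations with get_bi_rep_llama.py, passing the layer range to use via --layer_num_start and --layer_num_end:

python get_bi_rep_llama.py --layer_num_start 10 --layer_num_end 11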
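
The README does not spell out how a bi-representation is turned into a filtering decision. The sketch below assumes one possible scheme: each sample has a representation vector taken from a safety-sensitive layer, it is compared against a safe and an unsafe reference vector, and the highest-scoring samples are dropped. The reference vectors, the cosine-difference score, and the function names are illustrative assumptions, not the repository's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def degrading_score(rep, safe_ref, unsafe_ref):
    """Assumed score: higher means closer to the unsafe reference."""
    return cosine(rep, unsafe_ref) - cosine(rep, safe_ref)


def filter_dataset(samples, reps, safe_ref, unsafe_ref, drop_k):
    """Drop the drop_k highest-scoring samples, keeping the original order."""
    scores = [degrading_score(r, safe_ref, unsafe_ref) for r in reps]
    by_score = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    dropped = set(by_score[:drop_k])
    return [s for i, s in enumerate(samples) if i not in dropped]


# Toy 2-D vectors, made up for illustration only.
samples = ["a", "b", "c"]
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kept = filter_dataset(samples, reps, safe_ref=[1.0, 0.0],
                      unsafe_ref=[0.0, 1.0], drop_k=1)
print(kept)  # ['a', 'c']
```

In practice the vectors would come from the model's hidden states at the selected layers, and drop_k would be chosen from the size of the dataset.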

3. Fine-tuning with LoRA

Train the model with custom datasets:

bash scripts/train_multipule_llama.sh

4. Safety Evaluation

Evaluate model outputs for safety:

bash scripts/llama_guard.sh
python llama_guard.py
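
Llama Guard answers with a verdict whose first line is either safe or unsafe, followed by the violated categories when the verdict is unsafe. A minimal sketch of turning such verdicts into a single unsafe rate is shown below. It assumes the raw verdict strings have already been collected into a Python list; the output format of llama_guard.py itself is not documented here, and the helper names are hypothetical.

```python
def is_unsafe(verdict):
    """True if the first non-empty line of a Llama Guard verdict is 'unsafe'."""
    stripped = verdict.strip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0].strip().lower()
    return first_line == "unsafe"


def unsafe_rate(verdicts):
    """Fraction of verdicts judged unsafe (0.0 for an empty list)."""
    if not verdicts:
        return 0.0
    return sum(is_unsafe(v) for v in verdicts) / len(verdicts)


# Toy verdicts, made up for illustration only.
example = ["safe", "unsafe\nS1", "safe", "\n\nunsafe\nS2"]
print(unsafe_rate(example))  # 0.5
```

Comparing this rate between a model fine-tuned on the original dataset and one fine-tuned on the filtered dataset gives a direct read on what the filtering changed.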

5. Analysis

Open the Jupyter notebook for analysis:

jupyter notebook analysis.ipynb