LARF: Layer-Aware Representation Filtering

July 17, 2025

Overview

LARF identifies the safety-sensitive layers of an LLM and uses bi-representations of the data at those layers to detect samples in a fine-tuning dataset that would degrade the model's safety alignment.
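As a rough illustration only (this is not LARF's actual scoring rule), one way to score samples from layer representations is to compare each sample's hidden-state vector at a chosen layer against the centroids of two hypothetical reference sets, one harmful and one safe. The vectors, the reference sets, and the function names below are assumptions made for this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def safety_degradation_scores(
    samples: Dict[str, Sequence[float]],
    harmful_refs: List[Sequence[float]],
    safe_refs: List[Sequence[float]],
) -> Dict[str, float]:
    """Score = similarity to the harmful centroid minus similarity to the safe centroid.

    Hypothetical scoring: a higher score means the sample's layer representation
    sits closer to the harmful references than to the safe ones.
    """
    harmful_c = mean_vector(harmful_refs)
    safe_c = mean_vector(safe_refs)
    return {
        sample_id: cosine(rep, harmful_c) - cosine(rep, safe_c)
        for sample_id, rep in samples.items()
    }
```

Using the difference of two similarities keeps the score relative: a sample only ranks as risky if it is closer to the harmful side than to the safe side.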

Installation

  1. Clone the repository:

    git clone https://github.com/LLLeoLi/LARF.git
    cd LARF
    
  2. Install dependencies:

    conda create --name LARF python=3.10
    conda activate LARF
    pip install -r requirements.txt
    

Usage

1. Identify the Safety-Sensitive Layers

bash scripts/scaling_llama.sh
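The script name suggests, but this README does not confirm, that layers are located by scaling each layer's parameters and observing how the model's safety behavior changes. Under that assumption, a minimal sketch of the selection step ranks layers by how much the refusal rate drops when only that layer is scaled. The function name, the refusal-rate inputs, and `top_k` are hypothetical.

```python
from typing import Dict, List


def rank_safety_sensitive_layers(
    baseline_refusal: float,
    scaled_refusal: Dict[int, float],
    top_k: int = 2,
) -> List[int]:
    """Return the top_k layers whose scaling causes the largest refusal-rate drop.

    scaled_refusal maps a layer index to the refusal rate measured after
    scaling only that layer's weights (hypothetical measurements).
    """
    drops = {layer: baseline_refusal - rate for layer, rate in scaled_refusal.items()}
    # Largest drop first; ties are broken by the lower layer index.
    ranked = sorted(drops, key=lambda layer: (-drops[layer], layer))
    return ranked[:top_k]
```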

2. Filter Your Dataset with LARF

python get_bi_rep_llama.py --layer_num_start 10 --layer_num_end 11
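Once every sample has a score, filtering amounts to removing the highest-scoring samples. The sketch below assumes each sample is a dict with a hypothetical `"id"` key and that a fixed fraction is dropped; how `get_bi_rep_llama.py` actually chooses the cutoff is not described here.

```python
from typing import Dict, List


def filter_dataset(
    samples: List[dict],
    scores: Dict[str, float],
    drop_fraction: float = 0.1,
) -> List[dict]:
    """Drop the drop_fraction of samples with the highest safety-degradation scores.

    Each sample is assumed to be a dict whose hypothetical "id" key indexes scores.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    n_drop = int(len(samples) * drop_fraction)
    # Highest scores first: these are the samples flagged as most risky.
    ranked = sorted(samples, key=lambda s: scores[s["id"]], reverse=True)
    flagged = {s["id"] for s in ranked[:n_drop]}
    # Keep the remaining samples in their original order.
    return [s for s in samples if s["id"] not in flagged]
```

Preserving the original order of the kept samples avoids silently reshuffling the dataset before fine-tuning.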

3. Fine-tuning with LoRA

Fine-tune the model on custom datasets:

bash scripts/train_multipule_llama.sh
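For reference, LoRA freezes the pretrained weight W and learns a low-rank update, so an adapted linear layer computes h = W x + (alpha / r) B A x. The pure-Python sketch below shows only this standard forward pass; it is not the repository's training code, and the rank and scaling used by `train_multipule_llama.sh` are not stated in this README.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(
    W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int
) -> Vector:
    """Standard LoRA-adapted linear layer: h = W x + (alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: r x d_in and B: d_out x r are trainable.
    """
    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # low-rank update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

Because B is conventionally initialized to zeros, the adapted layer starts out identical to the frozen one, and only the small A and B matrices are updated during fine-tuning.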

4. Safety Evaluation

Evaluate the safety of model outputs with Llama Guard:

bash scripts/llama_guard.sh
python llama_guard.py
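A common summary of such an evaluation is the fraction of responses judged unsafe. The sketch below assumes each verdict is a Llama Guard generation whose first non-empty line is `safe` or `unsafe`, with violated categories possibly listed on later lines; the format actually written by `llama_guard.py` is not described here, and `unsafe_rate` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def unsafe_rate(verdicts: Iterable[str]) -> Tuple[float, int]:
    """Return (fraction of responses judged unsafe, number of responses).

    Assumes the first non-empty line of each verdict is "safe" or "unsafe".
    """
    verdicts = list(verdicts)
    if not verdicts:
        return 0.0, 0
    unsafe = 0
    for text in verdicts:
        lines = [ln.strip().lower() for ln in text.splitlines() if ln.strip()]
        # Only the first line carries the verdict; later lines list categories.
        if lines and lines[0] == "unsafe":
            unsafe += 1
    return unsafe / len(verdicts), len(verdicts)
```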

5. Analysis

Open the Jupyter notebook for analysis:

jupyter notebook analysis.ipynb