LARF: Layer-Aware Representation Filtering
July 17, 2025
Overview
LARF identifies safety-sensitive layers within an LLM and leverages data bi-representation to detect samples in the fine-tuning dataset that degrade safety.
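To make the filtering idea concrete, here is a toy sketch in pure Python. It rests on an assumption: that "bi-representation" can be read as comparing each sample's hidden representation against two reference directions, one averaged from safe examples and one from unsafe examples. The function names, the cosine-difference score, the threshold, and the 2-D vectors are all hypothetical and chosen for illustration; this is not the repository's implementation, which works on representations taken from the model's safety-sensitive layers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def degradation_score(sample_rep, safe_refs, unsafe_refs):
    """Hypothetical score: positive when the sample sits closer to the
    unsafe reference direction than to the safe one."""
    unsafe_sim = cosine(sample_rep, mean_vector(unsafe_refs))
    safe_sim = cosine(sample_rep, mean_vector(safe_refs))
    return unsafe_sim - safe_sim


def filter_dataset(samples, safe_refs, unsafe_refs, threshold=0.0):
    """Keep the names of samples whose score does not exceed the threshold."""
    return [
        name
        for name, rep in samples
        if degradation_score(rep, safe_refs, unsafe_refs) <= threshold
    ]


# Toy 2-D "representations" (assumed values, for illustration only).
SAFE_REFS = [[1.0, 0.0], [0.9, 0.1]]
UNSAFE_REFS = [[0.0, 1.0], [0.1, 0.9]]
SAMPLES = [("benign", [1.0, 0.1]), ("risky", [0.1, 1.0])]

kept = filter_dataset(SAMPLES, SAFE_REFS, UNSAFE_REFS)
print(kept)
```

In this toy setup the "risky" sample points along the unsafe direction and is dropped, while "benign" is kept. A real run would replace the hand-written vectors with hidden states extracted from the model.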
Installation
- Clone the repository:
  git clone https://github.com/LLLeoLi/LARF.git
  cd LARF
- Install dependencies:
  conda create --name LARF python=3.10
  conda activate LARF
  pip install -r requirements.txt
Usage
1. Identify the Safety-Sensitive Layers
bash scripts/scaling_llama.sh
2. Filter Your Dataset with LARF
python get_bi_rep_llama.py --layer_num_start 10 --layer_num_end 11
3. Fine-tuning with LoRA
Train the model with custom datasets:
bash scripts/train_multipule_llama.sh
4. Safety Evaluation
Evaluate model outputs for safety:
bash scripts/llama_guard.sh
python llama_guard.py
5. Analysis
Open the Jupyter notebook for analysis:
jupyter notebook analysis.ipynb
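Steps 1 through 4 above can be chained from a small driver script. The sketch below is a hypothetical helper, not part of the repository: `build_pipeline` and `run_pipeline` are illustrative names, and the script assumes it is run from the repository root inside the `LARF` conda environment. It also assumes that both commands listed under step 4 are meant to run in sequence. The layer range is passed in by hand, since the values to use come from inspecting the results of step 1.

```python
import subprocess


def build_pipeline(layer_start=10, layer_end=11):
    """Return the README commands for steps 1-4, in order, as argument lists."""
    return [
        # Step 1: identify the safety-sensitive layers.
        ["bash", "scripts/scaling_llama.sh"],
        # Step 2: filter the dataset using the chosen layer range.
        ["python", "get_bi_rep_llama.py",
         "--layer_num_start", str(layer_start),
         "--layer_num_end", str(layer_end)],
        # Step 3: fine-tune with LoRA.
        ["bash", "scripts/train_multipule_llama.sh"],
        # Step 4: safety evaluation (assumed to be two sequential commands).
        ["bash", "scripts/llama_guard.sh"],
        ["python", "llama_guard.py"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Stops at the first failing command because check=True raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    handled = []
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    # Dry run by default: shows what would be executed without running it.
    run_pipeline(build_pipeline(layer_start=10, layer_end=11), dry_run=True)
```

Running with `dry_run=True` first is a cheap way to confirm the command order and layer arguments before starting the long-running training and evaluation steps.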