UniDetox
April 14, 2025
Code for UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation.
Authors:
Huimin Lu¹, Masaru Isonuma¹,²,³, Junichiro Mori¹,⁴, and Ichiro Sakata¹
¹ University of Tokyo
² University of Edinburgh
³ National Institute of Informatics (NII)
⁴ RIKEN AIP
0. Reproduce the Environment
conda env create --name unidetox -f environment.yml
conda activate unidetox
Potential GLIBCXX Issue
On some Linux systems, you may see an error saying that GLIBCXX_3.4.29 was not found.
This happens when the system's library paths take precedence over conda's newer libstdc++.so.6.
To give the conda environment's libraries priority, run:
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
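One caveat with the export above: if LD_LIBRARY_PATH is unset or empty, the result ends in a colon, and the dynamic loader treats an empty entry as the current directory. The following is a minimal sketch of a variant that avoids this and is safe to run more than once. The helper name `prepend_lib_path` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of UniDetox): put a directory at the front of
# LD_LIBRARY_PATH without leaving a trailing colon when the variable is empty.
prepend_lib_path() {
    dir="$1"
    case "${LD_LIBRARY_PATH:-}" in
        "$dir" | "$dir":*) ;;   # already the first entry: nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Only act inside an activated conda environment.
if [ -n "${CONDA_PREFIX:-}" ]; then
    prepend_lib_path "$CONDA_PREFIX/lib"
fi
```

If binutils is installed, running `strings "$CONDA_PREFIX/lib/libstdc++.so.6" | grep GLIBCXX_3.4.29` should show whether the conda copy of the library actually provides the missing symbol version.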
1+2. Obtain a Toxic Model and Distill Detoxifying Data
python -m unidetox.toxic_gpt2_finetune_and_distill \
--base_model_name gpt2-xl \
--output_dir ./toxic_model \
--auth_token "enter_your_huggingface_auth_token_here_to_load_DGHS_dataset" \
--epochs 3 \
--lr 1e-5 \
--batch_size 4
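To keep the Hugging Face token out of scripts that get shared or committed, the command above can read it from an environment variable instead. This is a minimal sketch: `HF_TOKEN` and `run_distill` are names assumed here for illustration, and the flags are exactly those shown above.

```shell
# Hypothetical wrapper: same command and flags as above, with the token taken
# from an environment variable (HF_TOKEN is an assumed name).
run_distill() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export your Hugging Face token first" >&2
        return 1
    fi
    python -m unidetox.toxic_gpt2_finetune_and_distill \
        --base_model_name gpt2-xl \
        --output_dir ./toxic_model \
        --auth_token "$HF_TOKEN" \
        --epochs 3 \
        --lr 1e-5 \
        --batch_size 4
}
```

After exporting `HF_TOKEN` in the current shell, calling `run_distill` launches the same job. The wrapper refuses to start when the variable is missing.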
3. Fine-tune Model(s) for Detoxification
python -m unidetox.unidetox \
--mode finetune \
--auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
--target_model "gpt2-xl"
Reproduce Our Evaluation Results for GPT-2 XL
python -m unidetox.unidetox \
--mode evaluate \
--auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
--target_model "gpt2-xl"
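Assuming evaluation relies on the model produced by the fine-tuning step, the two commands can be chained so that evaluation is skipped when fine-tuning fails. This is a sketch under that assumption. `finetune_then_evaluate` is a hypothetical helper, and it uses only the flags shown above.

```shell
# Hypothetical helper: run fine-tuning, then evaluation, for one target model.
# Arguments: $1 = target model name, $2 = Hugging Face auth token.
finetune_then_evaluate() {
    model="$1"
    token="$2"
    for mode in finetune evaluate; do
        echo "== $mode: $model ==" >&2
        # Stop at the first failing stage so evaluation never runs
        # after a failed fine-tune.
        python -m unidetox.unidetox \
            --mode "$mode" \
            --auth_token "$token" \
            --target_model "$model" || return 1
    done
}
```

A call such as `finetune_then_evaluate "gpt2-xl" "your_huggingface_auth_token"` then runs both stages in order and returns a non-zero status if either one fails.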
If you find our work helpful, please cite our paper:
@inproceedings{
lu2025unidetox,
title={UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation},
author={Huimin LU and Masaru Isonuma and Junichiro Mori and Ichiro Sakata},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=eLLBILFRsA}
}