ProFit
January 17, 2026
Official code for paper "ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection"
[Paper] • [GitHub]
Quick Start
The training code is built on LLaMA-Factory, and we use OpenCompass for evaluation. Both are excellent projects with great frameworks and clean code — you can find nearly everything you need there.
Environment
```shell
git clone https://github.com/Utaotao/ProFit
cd ProFit
pip install -e ".[torch,metrics]"
pip install torch==2.9.1 transformers==4.57.1 deepspeed==0.16.9
```
Please refer to LLaMA Factory for more details.
Training Data
We use the Shadow 2K dataset and save it at data/Shadow_2k.parquet.
You can download via this link.
For custom datasets, remember to register them in data/dataset_info.json.
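A minimal entry might look like the following (the field names follow LLaMA-Factory's dataset_info.json format; the dataset name and column mapping here are illustrative — adjust them to your data):

```json
{
  "my_dataset": {
    "file_name": "my_dataset.parquet",
    "columns": {
      "prompt": "instruction",
      "response": "output"
    }
  }
}
```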
For Training
Set BASE_MODEL, BASE_OUTPUT_DIR, and other parameters in run.sh, then:
bash run.sh
Set BASE_MODEL to a Hugging Face model ID (e.g., "Qwen/Qwen3-0.6B-Base") to download the model from the Hub rather than using a local file.
The training script will automatically:
- Create output directories
- Start training with ProFit loss function
- Log training progress to training_log.log
- Save model checkpoints at specified intervals
Training Parameters
Core ProFit Parameters
- prob_threshold: Probability threshold(s) for sample selection
  - Single value, e.g. 0.1 (for the higher, lower, and random strategies)
  - Two values, e.g. [0.3, 0.7] (for the middle strategy)
- threshold_direction: Sample selection strategy
  - "higher": Train on tokens with prediction probability > threshold (core expression tokens)
  - "lower": Train on tokens with prediction probability < threshold (non-core expression tokens)
  - "middle": Train on tokens with probability within the threshold range
  - "random": Randomly select tokens for training
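The selection strategies above can be sketched as a boolean mask over the per-token probabilities of the ground-truth tokens. This is an illustrative sketch only, not the repository's exact implementation; the function name is ours:

```python
import torch

def select_token_mask(probs, threshold, direction):
    """Illustrative token-selection mask (not the repo's exact code).

    probs: tensor of the model's predicted probabilities for the
           ground-truth tokens; threshold: float, or a (low, high)
           pair for the "middle" strategy.
    """
    if direction == "higher":
        return probs > threshold              # core expression tokens
    if direction == "lower":
        return probs < threshold              # non-core expression tokens
    if direction == "middle":
        lo, hi = threshold                    # expects two values
        return (probs >= lo) & (probs <= hi)
    if direction == "random":
        # keep roughly `threshold` fraction of tokens at random
        return torch.rand_like(probs) < threshold
    raise ValueError(f"unknown direction: {direction}")
```

Only the tokens where the mask is True contribute to the training loss.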
Example Configurations
```shell
# Train on core expression tokens (recommended)
--prob_threshold 0.1 --threshold_direction "higher"

# Train on non-core expression tokens
--prob_threshold 0.8 --threshold_direction "lower"

# Train on medium-probability tokens
--prob_threshold 0.3,0.7 --threshold_direction "middle"

# Random sampling (30% of tokens)
--prob_threshold 0.3 --threshold_direction "random"
```
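Conceptually, the selected tokens feed into a masked SFT cross-entropy. The simplified sketch below shows the "higher"/"lower" cases only; it is our illustration under stated assumptions, not the repository's actual loss code:

```python
import torch
import torch.nn.functional as F

def masked_sft_loss(logits, labels, threshold=0.1, direction="higher"):
    """Simplified sketch of probability-guided masking in the SFT loss.

    logits: (batch, seq, vocab); labels: (batch, seq).
    """
    # probability the model assigns to each ground-truth token
    probs = F.softmax(logits, dim=-1)
    tok_probs = probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

    if direction == "higher":
        mask = tok_probs > threshold
    else:  # "lower"
        mask = tok_probs < threshold

    # per-token cross-entropy, then average over the selected tokens only
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="none"
    ).view_as(labels)
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```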
Training Script Example
```shell
BASE_MODEL="Qwen/Qwen3-0.6B-Base"
BASE_OUTPUT_DIR="./output"
DATASET="shadow_2k"
LEARNING_RATE=0.00001
prob_threshold="0.1"
threshold_direction="higher"

llamafactory-cli train \
    --model_name_or_path "$BASE_MODEL" \
    --stage sft \
    --do_train true \
    --finetuning_type full \
    --prob_threshold "$prob_threshold" \
    --threshold_direction "$threshold_direction" \
    --dataset "$DATASET" \
    --cutoff_len 8192 \
    --max_samples 4000 \
    --output_dir "$BASE_OUTPUT_DIR" \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate "$LEARNING_RATE" \
    --num_train_epochs 1 \
    --logging_steps 1 \
    --save_steps 200 \
    --bf16 true \
    --flash_attn fa2
```
For Evaluation
Please use OpenCompass for evaluation; see its repository for more details.
Future Plan
- Introduce evaluation scripts in this repo.
- Add more threshold strategies.
- Support for multi-modal models.
License
We use the Apache-2.0 license. Please also comply with the licenses of any upstream models and datasets.
Citation
If you find this repository helpful, please consider citing our paper:
@article{liu2026profit,
title={ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection},
author={Liu, Tao and Wu, Taiqiang and Yang, Runming and Sun, Shaoning and Wang, Junjie and Yang, Yujiu},
journal={arXiv preprint arXiv:2601.09195},
year={2026}
}
For any questions, feel free to open an issue or email liu-t25@mails.tsinghua.edu.cn.