ProFit

January 17, 2026

Official code for the paper "ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection"

[📜 Paper] • [🐱 GitHub]

This repo contains the code for our paper: ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection.

Quick Start

The training code is built on LLaMA-Factory, and we use OpenCompass for evaluation. Both are tremendous projects, and you can find nearly everything you need there, thanks to their great frameworks and clean code.

Environment

git clone https://github.com/Utaotao/ProFit
cd ProFit
pip install -e ".[torch,metrics]"
pip install torch==2.9.1 transformers==4.57.1 deepspeed==0.16.9

Please refer to LLaMA Factory for more details.

Training Data

We use the Shadow 2K dataset, stored at data/Shadow_2k.parquet. You can download it via this link.

For custom datasets, remember to register them in data/dataset_info.json.
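As a rough illustration of registering a custom dataset, the sketch below adds one entry to a dataset_info.json file. The helper name register_dataset and the example dataset name are hypothetical. Only the "file_name" key is written here; your data may need further keys, so check the LLaMA-Factory data documentation for the layout that matches your files.

```python
import json
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add or overwrite one dataset entry and return the updated registry.

    Hypothetical helper, not part of this repo or of LLaMA-Factory.
    """
    path = Path(info_path)
    # Start from the existing registry if the file is already there.
    if path.exists():
        info = json.loads(path.read_text(encoding="utf-8"))
    else:
        info = {}
    # "file_name" points at a local data file; other keys (for example
    # "formatting" or "columns") depend on your data and are omitted here.
    info[name] = {"file_name": file_name}
    path.write_text(
        json.dumps(info, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return info
```

For example, register_dataset("data/dataset_info.json", "my_dataset", "my_dataset.parquet") would make "my_dataset" available as a value for the dataset argument, assuming the file sits in the data directory.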

For Training

Set BASE_MODEL, BASE_OUTPUT_DIR, and other parameters in run.sh, then:

bash run.sh

Set BASE_MODEL="" to download the model from Hugging Face rather than using a local file.

The training script will automatically:

  • Create output directories
  • Start training with the ProFit loss function
  • Log training progress to training_log.log
  • Save model checkpoints at specified intervals
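The loss itself is not spelled out in this README. One plausible reading, sketched below purely as an assumption, is ordinary token-level cross-entropy averaged over the selected tokens only. The function name masked_nll, the averaging over kept tokens, and the zero loss for an empty selection are all illustrative choices, not the repo's implementation.

```python
import math


def masked_nll(target_probs, keep_mask):
    """Mean negative log-likelihood over the kept tokens only.

    target_probs: model probability of each ground-truth token.
    keep_mask:    True where the token should contribute to the loss.
    """
    kept = [p for p, keep in zip(target_probs, keep_mask) if keep]
    if not kept:
        # Assumption: an empty selection contributes zero loss.
        return 0.0
    # Dropped tokens add nothing to the sum or to the normalizer.
    return -sum(math.log(p) for p in kept) / len(kept)
```

With target_probs = [0.5, 0.25] and keep_mask = [True, False], only the first token counts, so the result equals -log(0.5).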

Training Parameters

Core ProFit Parameters

  • prob_threshold: Probability threshold(s) for token selection

    • Single value: 0.1 (for higher, lower, random strategies)
    • Two values: [0.3, 0.7] (for middle strategy)
  • threshold_direction: Token selection strategy

    • "higher": Train on tokens with prediction probability > threshold (core expression tokens)
    • "lower": Train on tokens with prediction probability < threshold (non-core expression tokens)
    • "middle": Train on tokens with probability within threshold range
    • "random": Randomly select tokens for training
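The four strategies above can be sketched as a function that turns per-token probabilities into a keep mask. This is a minimal illustration, not the repo's code: the name select_tokens is hypothetical, the "middle" bounds are assumed to be exclusive, and "random" is assumed to keep each token independently with probability equal to the threshold.

```python
import random


def select_tokens(probs, threshold, direction, seed=None):
    """Return a list of booleans: True where a token is kept for training."""
    if direction == "higher":
        return [p > threshold for p in probs]
    if direction == "lower":
        return [p < threshold for p in probs]
    if direction == "middle":
        low, high = threshold  # two values, e.g. (0.3, 0.7)
        # Assumption: both bounds are exclusive.
        return [low < p < high for p in probs]
    if direction == "random":
        # Assumption: independent per-token draw, threshold is the keep rate.
        rng = random.Random(seed)
        return [rng.random() < threshold for _ in probs]
    raise ValueError(f"unknown threshold_direction: {direction!r}")


probs = [0.05, 0.2, 0.5, 0.9]
print(select_tokens(probs, 0.1, "higher"))         # [False, True, True, True]
print(select_tokens(probs, 0.8, "lower"))          # [True, True, True, False]
print(select_tokens(probs, (0.3, 0.7), "middle"))  # [False, False, True, False]
```

Under these assumptions, the recommended "higher" setting with threshold 0.1 drops only the tokens the model finds least likely.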

Example Configurations

# Train on core expression tokens (recommended)
--prob_threshold 0.1 --threshold_direction "higher"

# Train on non-core expression tokens
--prob_threshold 0.8 --threshold_direction "lower"

# Train on medium probability tokens
--prob_threshold 0.3,0.7 --threshold_direction "middle"

# Random sampling (30% of tokens)
--prob_threshold 0.3 --threshold_direction "random"
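To catch mismatched flag combinations before launching a run, a small check like the one below may help. It is a hypothetical helper (parse_prob_threshold is not part of the repo) that assumes the comma-separated form shown above and assumes thresholds lie in [0, 1].

```python
def parse_prob_threshold(raw, direction):
    """Parse a threshold string such as "0.1" or "0.3,0.7" and validate it.

    Returns a float for single-value strategies and a two-element list
    for "middle". Hypothetical helper for illustration only.
    """
    if direction not in ("higher", "lower", "middle", "random"):
        raise ValueError(f"unknown threshold_direction: {direction!r}")
    values = [float(part) for part in raw.split(",")]
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("thresholds are assumed to lie in [0, 1]")
    if direction == "middle":
        # "middle" needs a lower and an upper bound, in that order.
        if len(values) != 2 or values[0] >= values[1]:
            raise ValueError('"middle" expects two increasing values, e.g. "0.3,0.7"')
        return values
    if len(values) != 1:
        raise ValueError(f"{direction!r} expects a single value, e.g. \"0.1\"")
    return values[0]
```

For instance, parse_prob_threshold("0.3,0.7", "middle") returns [0.3, 0.7], while passing "0.3" with "middle" raises a ValueError.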

Training Script Example

BASE_MODEL="Qwen/Qwen3-0.6B-Base"
BASE_OUTPUT_DIR="./output"
DATASET="shadow_2k"
LEARNING_RATE=0.00001
prob_threshold="0.1"
threshold_direction="higher"

llamafactory-cli train \
    --model_name_or_path "$BASE_MODEL" \
    --stage sft \
    --do_train true \
    --finetuning_type full \
    --prob_threshold $prob_threshold \
    --threshold_direction "$threshold_direction" \
    --dataset "$DATASET" \
    --cutoff_len 8192 \
    --max_samples 4000 \
    --output_dir "$BASE_OUTPUT_DIR" \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate "$LEARNING_RATE" \
    --num_train_epochs 1 \
    --logging_steps 1 \
    --save_steps 200 \
    --bf16 true \
    --flash_attn fa2

For Evaluation

Please refer to OpenCompass for evaluation. You may find more details at this repo.

Future Plan

  • Introduce evaluation scripts in this repo.
  • Add more threshold strategies.
  • Support for multi-modal models.

License

We use the Apache‑2.0 license. Please also comply with the licenses of any upstream models and datasets.

☕️ Citation

If you find this repository helpful, please consider citing our paper:

@article{liu2026profit,
  title={ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection},
  author={Liu, Tao and Wu, Taiqiang and Yang, Runming and Sun, Shaoning and Wang, Junjie and Yang, Yujiu},
  journal={arXiv preprint arXiv:2601.09195},
  year={2026}
}

For any questions, feel free to open an issue or email liu-t25@mails.tsinghua.edu.cn.