
May 2, 2026

OpenVul: An Open-Source Post-Training Framework for LLM-Based Vulnerability Detection


💥 News

  • The corresponding paper, “From SFT to RL: Demystifying the Post-Training Pipeline for LLM-Based Vulnerability Detection,” is available on arXiv.

πŸ› οΈ Environment Setup

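```bash
git clone https://github.com/youpengl/OpenVul.git
cd OpenVul

# Create and activate a Python 3.11.13 virtual environment with uv
pip install uv
uv python install 3.11.13
uv venv --python 3.11.13
source .venv/bin/activate

# Install dependencies; flash-attn is built without build isolation so it sees the installed torch
uv pip install -r requirements.txt
uv pip install flash-attn==2.8.1 --no-build-isolation

# Credentials for Hugging Face and Weights & Biases (no spaces around "=" in bash)
export HF_TOKEN=""
export WANDB_API_KEY=""
```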
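Before submitting any Slurm job, it can help to confirm that the environment is complete. The following is a minimal sketch, not part of OpenVul: the helper names (`missing_env`, `missing_packages`, `python_ok`) and the package list are assumptions, since the authoritative dependencies are whatever `requirements.txt` pins. Note that leaving `HF_TOKEN=""` as written above counts as missing.

```python
"""Hypothetical pre-flight check for the OpenVul environment (not part of the repo)."""
import importlib.util
import os
import sys

# Assumed requirements: Python >= 3.11 and the two credentials exported above.
REQUIRED_PYTHON = (3, 11)
REQUIRED_ENV = ("HF_TOKEN", "WANDB_API_KEY")
# Assumed import names; requirements.txt is the authoritative list.
REQUIRED_PKGS = ("torch", "transformers", "trl", "vllm", "flash_attn")


def missing_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def missing_packages(pkgs=REQUIRED_PKGS):
    """Return the top-level packages that cannot be located on the import path."""
    return [pkg for pkg in pkgs if importlib.util.find_spec(pkg) is None]


def python_ok(version=None):
    """True if the (major, minor) version meets the assumed minimum."""
    version = sys.version_info[:2] if version is None else version
    return tuple(version) >= REQUIRED_PYTHON


if __name__ == "__main__":
    problems = []
    if not python_ok():
        problems.append(f"Python {sys.version.split()[0]} is older than 3.11")
    problems += [f"unset or empty: {name}" for name in missing_env()]
    problems += [f"not installed: {pkg}" for pkg in missing_packages()]
    print("\n".join(problems) if problems else "environment looks complete")
```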

βš™οΈ Post-training Framework

We have developed the first post-training framework for LLM-based vulnerability detection (VD), built on the Hugging Face TRL library. It currently supports supervised fine-tuning (SFT), preference optimization (e.g., DPO and ORPO), and on-policy reinforcement learning (e.g., GRPO) for VD LLMs. We plan to keep integrating post-training algorithms specialized for VD.
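To make the first of these stages concrete: cold-start SFT minimizes the next-token negative log-likelihood of the target responses. The pure-Python sketch below assumes, as is common, that only response tokens count toward the loss (prompt tokens are masked out); whether OpenVul masks the prompt this way is an assumption, and `sft_loss` is a hypothetical helper, not TRL's implementation.

```python
import math


def sft_loss(token_logprobs, completion_mask):
    """Mean negative log-likelihood over completion tokens only.

    token_logprobs: log p(token_t | tokens_<t) for each position.
    completion_mask: 1 for response tokens, 0 for prompt tokens (ignored).
    """
    if len(token_logprobs) != len(completion_mask):
        raise ValueError("token_logprobs and completion_mask must align")
    kept = [lp for lp, m in zip(token_logprobs, completion_mask) if m]
    if not kept:
        raise ValueError("no completion tokens to train on")
    return -sum(kept) / len(kept)


# Two prompt tokens (masked out) followed by a three-token response.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25), math.log(1.0)]
mask = [0, 0, 1, 1, 1]
print(round(sft_loss(logprobs, mask), 4))  # prints 0.6931
```

The masked prompt tokens have no effect on the result: only the three response tokens are averaged.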

πŸ“Š Leaderboard

πŸƒπŸ» Running Details

```bash
# Train

## Cold Start Stage
sbatch sft.slurm


## Preference Optimization
sbatch dpo.slurm
sbatch orpo.slurm
```
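The two preference-optimization jobs optimize different objectives. As a rough guide, the sketch below writes out the per-example DPO and ORPO losses as formulated in their original papers, given log-probabilities for a preferred (`chosen`) and a dispreferred (`rejected`) response, for instance a correct versus an incorrect vulnerability verdict. It is not OpenVul's or TRL's code; the function names and the `beta`/`lam` defaults of 0.1 are assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: favor `chosen` over `rejected` more than a frozen reference model does.

    Arguments are summed token log-probabilities of each full response under the
    trainable policy (pi_*) and the reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def log_odds(avg_logp):
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """ORPO: NLL on `chosen` plus a weighted odds-ratio penalty; no reference model.

    Arguments are length-normalized (mean per-token) log-probabilities.
    """
    nll = -avg_logp_chosen
    odds_ratio_term = -log_sigmoid(log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected))
    return nll + lam * odds_ratio_term


# Policy identical to the reference: the DPO loss equals log(2).
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # prints 0.6931
# Policy now favors the chosen verdict more than the reference does: loss drops.
print(dpo_loss(-10.0, -20.0, -12.0, -15.0) < math.log(2))  # prints True
```

The practical difference shows in the signatures: DPO needs reference-model log-probabilities, while ORPO folds the SFT term into the same loss and drops the reference model.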


```bash
## RL Stage

### Step 1: Run the judge server and the vLLM server
sbatch judge_server.slurm
sbatch vllm_server.slurm
```
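Step 2 should start only after both servers accept connections. One way to gate the GRPO job is a small TCP port check. This is a hypothetical helper, not part of OpenVul; the hosts and ports it would be called with are placeholders, because the addresses used by `judge_server.slurm` and `vllm_server.slurm` are not documented here.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not up yet: refused, unresolvable, or timed out
        if time.monotonic() >= deadline:
            return False
        # Sleep for one interval, but never past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

For example, calling `wait_for_port(host, port)` once for each server (with the real node addresses substituted) before `sbatch grpo.slurm` avoids starting training while rollouts or judge requests would still fail. A plain TCP check only shows that something is listening; a model may still be loading behind it.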

```bash
### Step 2: GRPO Training
#### To change the reward system, select one of ['detection', 'prediction', 'reasoning', 'specification'] in `grpo.sh`.
#### For the specification-based reward, also set: --reward_weights 1.0 1.0 1.0 1.0
#### We recommend the reasoning-based reward by default, as it balances model performance and training stability.
sbatch grpo.slurm
```
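Conceptually, the reward system selected in `grpo.sh` yields one or more scalar rewards per sampled completion. When several reward functions are active (as in the specification-based setting), `--reward_weights` scales each one before they are summed. GRPO then converts the totals into advantages by standardizing them within the group of completions sampled for the same prompt. The sketch below illustrates that computation in plain Python; the four reward components and their values are made up, and the `eps` constant and sample standard deviation follow common GRPO implementations rather than confirmed OpenVul settings.

```python
import statistics


def combine_rewards(per_func_rewards, weights):
    """Weighted sum of several reward functions for one completion."""
    if len(per_func_rewards) != len(weights):
        raise ValueError("need one weight per reward function")
    return sum(w * r for w, r in zip(weights, per_func_rewards))


def group_advantages(group_rewards, eps=1e-4):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mean) / (std + eps) for r in group_rewards]


# Four completions for one code sample, each scored by four hypothetical reward
# components, weighted 1.0 each as with --reward_weights 1.0 1.0 1.0 1.0.
weights = [1.0, 1.0, 1.0, 1.0]
scores = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
rewards = [combine_rewards(s, weights) for s in scores]
advantages = group_advantages(rewards)
print(rewards)  # prints [4.0, 2.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])
```

One consequence of group normalization: if every completion for a prompt gets the same total reward, all advantages are zero and that prompt contributes no learning signal.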


```bash
# Test

## LLM Inference via vLLM
sbatch vllm_inference.slurm


## Output Judge
python LLM_judge_for_vulnerability_detection.py --gpu [input your gpu node ip] --name [input your model name]
```
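When several checkpoints need to be judged, it may be convenient to drive the judge script from Python. The sketch below only assembles, and optionally runs, the command shown above using its two documented flags, `--gpu` and `--name`; the helper names, IP address, and model names in the example are placeholders.

```python
import shlex
import subprocess
import sys


def judge_command(gpu_ip, model_name, python=sys.executable):
    """Build the argument list for LLM_judge_for_vulnerability_detection.py."""
    return [
        python,
        "LLM_judge_for_vulnerability_detection.py",
        "--gpu", gpu_ip,
        "--name", model_name,
    ]


def run_judges(gpu_ip, model_names, dry_run=True):
    """Judge each model in turn; with dry_run=True, only print the commands."""
    for name in model_names:
        cmd = judge_command(gpu_ip, name)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing judge run


# Placeholder node IP and model names.
run_judges("10.0.0.1", ["my-sft-model", "my-grpo-model"])
```

Passing an argument list rather than a shell string avoids quoting problems with model names.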


```bash
## Metric Calculation
python calculate_metrics.py
```
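The exact metrics reported by `calculate_metrics.py` are not listed here. As an illustration only, the sketch below computes standard binary-classification metrics for vulnerability detection from judged predictions, treating "vulnerable" as the positive class. The function name and the 0/1 label encoding are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 = vulnerable, 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


labels = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0]
print(binary_metrics(labels, preds))
```

Reporting precision and recall alongside accuracy matters for VD, where a model that labels everything "vulnerable" can look reasonable on accuracy alone.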

πŸ’» GPU Requirements

| Stage | Purpose | Hardware (A100 80GB) | Estimated Duration |
|---|---|---|---|
| Cold Start | SFT | 4x GPUs | < 1 day |
| Preference Optimization | DPO / ORPO | 4x GPUs | < 1 day |
| RL Stage (Training) | GRPO Training | 8x GPUs | 3-5 days |
| RL Stage (Judge Server) | Reward Model / LLM-as-a-Judge | 4x GPUs | Synchronous (runs alongside GRPO training) |
| RL Stage (vLLM Server) | Rollout / Inference | 2x GPUs | Synchronous (runs alongside GRPO training) |
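Because the judge and vLLM servers run synchronously with GRPO training, the RL stage needs all three allocations at once. Reading "Synchronous" as "up for the whole training run" (an assumption), a quick back-of-the-envelope calculation from the table gives the peak GPU count and the A100-hour range for the RL stage:

```python
# GPU allocations for the RL stage, taken from the table above (A100 80GB).
RL_ALLOCATIONS = {"grpo_training": 8, "judge_server": 4, "vllm_server": 2}
GRPO_DAYS = (3, 5)  # estimated duration range of GRPO training


def rl_stage_budget(allocations, days_range, hours_per_day=24):
    """Peak concurrent GPUs and (min, max) GPU-hours, assuming both servers
    stay up for the entire training run."""
    peak = sum(allocations.values())
    low, high = (peak * days * hours_per_day for days in days_range)
    return peak, (low, high)


peak, (low, high) = rl_stage_budget(RL_ALLOCATIONS, GRPO_DAYS)
print(f"{peak} GPUs concurrently, {low}-{high} A100-hours")  # prints 14 GPUs concurrently, 1008-1680 A100-hours
```

This excludes the cold-start and preference-optimization runs, each of which uses 4 GPUs for under a day.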

πŸ—‚οΈ Overview of the Datasets Released on Hugging Face

🧠 Overview of the Models Released on Hugging Face

πŸ“š Citation

```bibtex
@misc{li2026sftrldemystifyingposttraining,
      title={From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection}, 
      author={Youpeng Li and Fuxun Yu and Xinda Wang},
      year={2026},
      eprint={2602.14012},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2602.14012},
}
```

πŸ“¬ Contact

Feel free to contact me via youpeng [dot] li [dot] utdallas [dot] edu