Flexible Realignment of Language Models

June 23, 2025

♻️ ReAligner

arXiv Paper   Homepage   Models

We propose a flexible realignment framework that quantitatively controls alignment during both training and inference, combining Training-time Realignment (TrRa) and Inference-time Realignment (InRa).

Training-time Realignment (TrRa)

- To realign your own model, see the training script in this file.

- For instance, we realign the DeepSeek-R1-Distill-Qwen-1.5B model using DeepScaleR-1.5B-Preview, reducing token usage without performance loss. We provide our realigned models as follows:

| Model Name | Token Reduction (%) | Performance Degradation | Download Link |
| --- | --- | --- | --- |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=0.5 | 17.42 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=1.5 | 37.48 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=2.0 | 42.11 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=5.0 | 47.83 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=10.0 | 50.83 | No | Link |
| 🤗 DeepSeek-R1-TrRa-iter1-1.5B-λ=2.0 | 54.63 | No | Link |
| 🤗 DeepSeek-R1-TrRa-iter2-1.5B-λ=2.0 | 60.48 | Yes | Link |
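At its core, TrRa controls realignment by fusing the output distributions of a reference model and an aligned model with a strength λ. The following is a minimal sketch under that assumption; the function name, the λ convention (λ=0 → reference, λ=1 → aligned, λ>1 extrapolates further), and the NumPy shapes are illustrative, not the released implementation:

```python
import numpy as np

def fuse_logits(ref_logits, aligned_logits, lam):
    """Blend per-token logits of a reference and an aligned model.

    lam = 0 recovers the reference model, lam = 1 the aligned model,
    and lam > 1 extrapolates further toward the aligned behavior
    (matching the lambda values in the table above).
    """
    ref = np.asarray(ref_logits, dtype=float)
    aligned = np.asarray(aligned_logits, dtype=float)
    return (1.0 - lam) * ref + lam * aligned
```

The fused distribution can then serve as the supervision target when distilling the realigned student, with larger λ pushing the student further toward the aligned model's behavior.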



Inference-time Realignment (InRa)

- To endow your model with thinking-control ability, see the training script in this file.

- For instance, we endow DeepSeek-R1-Distill-Qwen-1.5B with thinking-control ability.


We provide our hybrid models as follows:

| Model Name | Download Link |
| --- | --- |
| 🤗 DeepSeek-R1-Distill-Qwen-1.5B-InRa | Link |
| 🤗 DeepSeek-R1-Distill-Qwen-7B-InRa | Link |

Installation

```shell
conda create -n InRa python=3.10
conda activate InRa
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
```

Serve the hybrid model with vLLM:

```shell
VLLM_USE_V1=0 vllm serve wh-zhu/DeepSeek-R1-Distill-Qwen-7B-InRa
```

Inference

The hyperparameter `control` determines the degree of "thinking" the model performs:

- When `0 < control < 1`, the model operates between the "thinking" and "non-thinking" modes.
- When `control > 1`, the model engages in shorter thinking.
- When `control < 0`, the model engages in longer thinking.
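The regimes above can be summarized in a small helper; `thinking_mode` is purely illustrative and not part of the released code:

```python
def thinking_mode(control: float) -> str:
    """Map the `control` hyperparameter to the behavior it induces."""
    if control < 0:
        return "longer thinking"
    if 0 < control < 1:
        return "between thinking and non-thinking"
    if control > 1:
        return "shorter thinking"
    # control == 0 or control == 1: the two pure endpoint modes
    return "pure thinking or non-thinking mode"
```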

For example, querying the served model through vLLM's OpenAI-compatible API:

```python
from openai import OpenAI

# The local vLLM server does not check credentials, so any key works.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="wh-zhu/DeepSeek-R1-Distill-Qwen-7B-InRa",
    messages=[
        {
            "role": "user",
            "content": "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.",
        }
    ],
    extra_body={
        "stop_token_ids": [151643, 151645],  # Qwen end-of-text / end-of-turn token IDs
        "skip_special_tokens": False,
        "control": 0.1,  # 0 < control < 1: between thinking and non-thinking modes
    },
    temperature=0.7,
    top_p=0.95,
    max_tokens=4096 * 4,
    n=1,
)

print(chat_response.choices[0].message.content)
```
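Since DeepSeek-R1-style models emit their chain of thought before a closing think tag, the completion can be split into reasoning and final answer for downstream use. A minimal sketch, assuming the output contains a `</think>` delimiter (the helper name is ours):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style completion into (reasoning, answer).

    Returns an empty reasoning string if no '</think>' marker is present,
    e.g. when `control` suppressed thinking entirely.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", text.strip()
```

Note that `skip_special_tokens` is set to `False` in the request above precisely so that markers like this survive in the returned text.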

Citation

```bibtex
@article{zhu2025flexible,
  title={Flexible Realignment of Language Models},
  author={Zhu, Wenhong and Xie, Ruobing and Zhang, Weinan and Wang, Rui},
  journal={arXiv preprint arXiv:2506.12704},
  year={2025}
}
```