Flexible Realignment of Language Models

June 23, 2025

♻️ ReAligner

arXiv Paper   Homepage   Models

We propose a flexible realignment framework that quantitatively controls alignment during both training and inference, combining Training-time Realignment (TrRa) and Inference-time Realignment (InRa).

Training-time Realignment (TrRa)

- To realign your own model, see the training script in this file.

- For instance, we realign the DeepSeek-R1-Distill-Qwen-1.5B model using DeepScaleR-1.5B-Preview, reducing token usage without performance loss. We provide our realigned models as follows:

| Model Name | Token Reduction (%) | Performance Degradation | Download Link |
| --- | --- | --- | --- |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=0.5 | 17.42 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=1.5 | 37.48 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=2.0 | 42.11 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=5.0 | 47.83 | No | Link |
| 🤗 DeepSeek-R1-TrRa-1.5B-λ=10.0 | 50.83 | No | Link |
| 🤗 DeepSeek-R1-TrRa-iter1-1.5B-λ=2.0 | 54.63 | No | Link |
| 🤗 DeepSeek-R1-TrRa-iter2-1.5B-λ=2.0 | 60.48 | Yes | Link |
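At its core, TrRa controls realignment by fusing the output distributions of a reference model and an aligned model with a strength λ. The following is a minimal sketch under that assumption; the function name, the λ convention (λ=0 → reference, λ=1 → aligned, λ>1 extrapolates further), and the NumPy shapes are illustrative, not the released implementation:

```python
import numpy as np

def fuse_logits(ref_logits, aligned_logits, lam):
    """Blend per-token logits of a reference and an aligned model.

    lam = 0 recovers the reference model, lam = 1 the aligned model,
    and lam > 1 extrapolates further toward the aligned behavior
    (matching the lambda values in the table above).
    """
    ref = np.asarray(ref_logits, dtype=float)
    aligned = np.asarray(aligned_logits, dtype=float)
    return (1.0 - lam) * ref + lam * aligned
```

The fused distribution can then serve as the supervision target when distilling the realigned student, with larger λ pushing the student further toward the aligned model's behavior.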



Inference-time Realignment (InRa)

- To endow your model with thinking-control ability, see the training script in this file.

- For instance, we endow DeepSeek-R1-Distill-Qwen-1.5B with thinking-control ability.


We provide our hybrid models as follows:

| Model Name | Download Link |
| --- | --- |
| 🤗 DeepSeek-R1-Distill-Qwen-1.5B-InRa | Link |
| 🤗 DeepSeek-R1-Distill-Qwen-7B-InRa | Link |

Installation

```shell
conda create -n InRa python=3.10
conda activate InRa
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
```

Serve the hybrid model with vLLM:

```shell
VLLM_USE_V1=0 vllm serve wh-zhu/DeepSeek-R1-Distill-Qwen-7B-InRa
```

Inference

The hyperparameter `control` determines the degree of "thinking" the model performs:

- When `0 < control < 1`, the model operates between the "thinking" and "non-thinking" modes.
- When `control > 1`, the model engages in shorter thinking.
- When `control < 0`, the model engages in longer thinking.
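The regimes above can be summarized in a small helper; `thinking_mode` is purely illustrative and not part of the released code:

```python
def thinking_mode(control: float) -> str:
    """Map the `control` hyperparameter to the behavior it induces."""
    if control < 0:
        return "longer thinking"
    if 0 < control < 1:
        return "between thinking and non-thinking"
    if control > 1:
        return "shorter thinking"
    # control == 0 or control == 1: the two pure endpoint modes
    return "pure thinking or non-thinking mode"
```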

For example, querying the served model through vLLM's OpenAI-compatible API:

```python
from openai import OpenAI

# The local vLLM server does not check credentials, so any key works.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="wh-zhu/DeepSeek-R1-Distill-Qwen-7B-InRa",
    messages=[
        {
            "role": "user",
            "content": "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.",
        }
    ],
    extra_body={
        "stop_token_ids": [151643, 151645],  # Qwen end-of-text / end-of-turn token IDs
        "skip_special_tokens": False,
        "control": 0.1,  # 0 < control < 1: between thinking and non-thinking modes
    },
    temperature=0.7,
    top_p=0.95,
    max_tokens=4096 * 4,
    n=1,
)

print(chat_response.choices[0].message.content)
```
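Since DeepSeek-R1-style models emit their chain of thought before a closing think tag, the completion can be split into reasoning and final answer for downstream use. A minimal sketch, assuming the output contains a `</think>` delimiter (the helper name is ours):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style completion into (reasoning, answer).

    Returns an empty reasoning string if no '</think>' marker is present,
    e.g. when `control` suppressed thinking entirely.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", text.strip()
```

Note that `skip_special_tokens` is set to `False` in the request above precisely so that markers like this survive in the returned text.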

Citation

```bibtex
@article{zhu2025flexible,
  title={Flexible Realignment of Language Models},
  author={Zhu, Wenhong and Xie, Ruobing and Zhang, Weinan and Wang, Rui},
  journal={arXiv preprint arXiv:2506.12704},
  year={2025}
}
```