
June 5, 2026

Twinkle: Training workbench to make your model glow


English Documentation   |   Chinese Documentation   |   Twinkle Web

✨ What is Twinkle?

Twinkle✨ is a lightweight client-server training framework built around modular, high-cohesion interfaces. Whether you run locally with torchrun or scale training across Ray clusters, Twinkle✨ reduces infrastructure friction by encapsulating training logic behind standardized APIs. Beyond abstraction, Twinkle✨ serves as a backend and gateway for serverless Training-as-a-Service (TaaS). Its interfaces form a superset of the Tinker APIs, so a Twinkle✨ training service can be accessed through either the Tinker client or the native Twinkle✨ client, which offers more functionality.

🧩 Decoupled Architecture: Standardized interfaces, backward compatible with Tinker APIs.
🚀 Multiple Runtime Modes: torchrun / Ray / HTTP.
🔌 Versatile Backends: Transformers / Megatron.
👥 Multi-Tenancy Training Service: Train multiple LoRAs that share one base model deployment.

Discord Group  |  Twinkle WeChat Group

Installation

Install the package with pip:

pip install 'twinkle-kit'

Install from Source:

git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .

Use our docker image:

modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:twinkle-0.2.1

If you only need the Twinkle client, use the one-click installation script:

# Mac or Linux
sh INSTALL_CLIENT.sh
# Windows: run in PowerShell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
.\INSTALL_CLIENT.ps1

This script downloads conda if necessary, or uses an existing installation, to create a virtual environment named twinkle-client that can be used directly for remote training.

If you need to install Megatron-related dependencies, you can use the following script:

sh INSTALL_MEGATRON.sh

Tutorials

Training Type | Model Framework | Cookbook Path
FSDP finetuning | transformers | Script
EP FSDP2 LoRA finetuning | transformers | Script
SP FSDP finetuning | transformers | Script
pp/tp/cp finetuning | megatron | Script
pp/tp/cp MoE finetuning | megatron | Script
Multimodal FSDP finetuning | transformers | Script
GRPO RL training | megatron | Script
GRPO Multimodal RL training | megatron | Script
GRPO Math RL training | megatron | Script
DPO full-parameter training | transformers | Script
DPO LoRA training | transformers | Script
DPO multi-LoRA training | transformers | Script
GKD on-policy distillation | megatron | Script
GKD off-policy distillation | megatron | Script
Tinker client finetuning (self-host) | transformers | Script
Tinker client finetuning (ModelScope) | transformers | Script
Twinkle client finetuning (self-host) | transformers | Script
Twinkle client finetuning (ModelScope) | transformers | Script
Server startup scripts | transformers/megatron | Script

Changelog

  • 🎉2026-05-20 Support DeepSeek-V4-Flash and DeepSeek-V4-Pro models.
  • 🎉2026-05-20 Multi-turn rollout and tool calling in RL are now supported. The cookbook is still being written; in the meantime, import MultiTurnRollout or APIMultiTurnRollout from twinkle_agentic.rollout to run multi-turn rollouts directly.
  • 🎉2026-05-20 IM message alerting on training job failure is now supported. Usage: import twinkle; twinkle.initialize(..., notifier=DingNotifier(...)).
  • 🎉2026-04-27 Support the padding_free operation for sft/dpo/grpo/gkd. Call set_processor('InputProcessor', padding_free=True) to train with it.
  • 🎉2026-04-22 The ModelScope service now runs Qwen/Qwen3.6-27B as its base model, with new release 0.2.1.
  • 🎉2026-04-14 The ModelScope service now runs Qwen/Qwen3.6-35B-A3B as its base model, with new release 0.2.0.
  • 🎉2026-03-28 Support DPO training with both Transformers and Megatron backends. See dpo_full.py and dpo_lora.py.
  • 🎉2026-03-24 Twinkle Web site is now live at https://modelscope.github.io/twinkle-web/
  • 🎉2026-03-19 Support GKD training, please refer to this cookbook.
  • 🎉2026-02-13 Initial version of Twinkle✨ released, including SFT/PT/RL support for text models.

Training as a Service on ModelScope

We are rolling out a training service built on Twinkle✨ on ModelScope. You can train via the API endpoint base_url=https://www.modelscope.cn/twinkle. For more details, please refer to our documentation.

Supported Hardware

Hardware Environment | Notes
Nvidia GPUs | ✅ Support for BF16/Flash-Attn may be incomplete on older GPUs
Ascend NPU | ✅ Some operators may not be supported
PPU |
CPU | Supports partial components like dataset, dataloader

Supported Models

We will add support for more models as they are released. The following table lists the models currently supported by the Twinkle✨ framework.

Note

The serverless training service at base_url=https://www.modelscope.cn/twinkle is currently provided via the Tinker-compatible APIs. We will be rolling out services that support both the Tinker APIs and the full-fledged Twinkle✨ native APIs. The serverless endpoint is backed by one base model at a time; currently it is Qwen3.6-27B.

Model Type | Model ID on ModelScope | Model Size | Requires | Support Megatron | HF Model ID
qwen3 series | Qwen/Qwen3-14B-Base | 0.6B/1.7B/4B/8B/14B | transformers>=4.51 | | Qwen/Qwen3-14B-Base
 | Qwen/Qwen3-32B | 0.6B/1.7B/4B/8B/14B/32B | transformers>=4.51 | | Qwen/Qwen3-32B
qwen3_moe series | Qwen/Qwen3-30B-A3B-Base | 30B-A3B/A3B-Base,235B-A22B | transformers>=4.51 | | Qwen/Qwen3-30B-A3B-Base
qwen3.5 moe series | Qwen/Qwen3.5-35B-A3B | 35B-A3B,122B-A10B, etc. | transformers>=5.2.0 | | Qwen/Qwen3.5-35B-A3B
qwen3.5 series | Qwen/Qwen3.5-9B | 2B ~ 27B | transformers>=5.2.0 | | Qwen/Qwen3.5-9B
qwen2 series | Qwen/Qwen2-0.5B-Instruct | 0.5B/1.5B/7B/72B | transformers>=4.37 | | Qwen/Qwen2-0.5B-Instruct
 | Qwen/Qwen2-1.5B | 0.5B/1.5B/7B/72B | transformers>=4.37 | | Qwen/Qwen2-1.5B
 | Qwen/Qwen2.5-1.5B-Instruct | 0.5B/1.5B/3B/7B/14B/32B/72B | transformers>=4.37 | | Qwen/Qwen2.5-1.5B-Instruct
 | Qwen/Qwen2.5-0.5B | 0.5B/1.5B/3B/7B/14B/32B | transformers>=4.37 | | Qwen/Qwen2.5-0.5B
qwen2_moe series | Qwen/Qwen1.5-MoE-A2.7B-Chat | - | transformers>=4.40 | | Qwen/Qwen1.5-MoE-A2.7B-Chat
 | Qwen/Qwen1.5-MoE-A2.7B | - | transformers>=4.40 | | Qwen/Qwen1.5-MoE-A2.7B
chatglm3 series | ZhipuAI/chatglm3-6b | 6b/6b-base/6b-32k/6b-128k | transformers<4.42 | | zai-org/chatglm3-6b
chatglm4 series | ZhipuAI/glm-4-9b-chat | glm-4-9b/glm-4-9b-chat/glm-4-9b-chat-1m | transformers>=4.42 | | zai-org/glm-4-9b-chat
 | ZhipuAI/LongWriter-glm4-9b | - | transformers>=4.42 | | zai-org/LongWriter-glm4-9b
glm_edge series | ZhipuAI/glm-edge-1.5b-chat | 1.5b-chat/4b-chat | transformers>=4.46 | | zai-org/glm-edge-1.5b-chat
internlm2 series | Shanghai_AI_Laboratory/internlm2-1_8b | 1_8b/chat-1_8b-sft/base-7b/7b/chat-7b/ | transformers>=4.38 | | internlm/internlm2-1_8b
deepseek_v1 | deepseek-ai/DeepSeek-V2-Lite | V2/V2-Lite/V2-Chat/V2-Lite-Chat/V2.5 | transformers>=4.39.3 | | deepseek-ai/DeepSeek-V2-Lite
 | deepseek-ai/DeepSeek-Prover-V2-7B | - | transformers>=4.39.3 | | deepseek-ai/DeepSeek-Prover-V2-7B
 | deepseek-ai/DeepSeek-R1 | - | transformers>=4.39.3 | | deepseek-ai/DeepSeek-R1
deepSeek-r1-distill | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 1.5B/7B/14B/32B | transformers>=4.37 | | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
DeepSeek V4 full series | deepseek-ai/DeepSeek-V4-Flash | 284B | transformers>=5.8.0 | | deepseek-ai/DeepSeek-V4-Flash
 | deepseek-ai/DeepSeek-V4-Pro | 1.6T | transformers>=5.8.0 | | deepseek-ai/DeepSeek-V4-Pro
Gemma4 full series | google/gemma-4-E2B | 2.3B effective (5.1B with embeddings) | transformers>=5.8.0 | | google/gemma-4-E2B
 | google/gemma-4-E4B | 4.5B effective (8B with embeddings) | transformers>=5.8.0 | | google/gemma-4-E4B
 | google/gemma-4-12B | 11.95B | transformers>=5.10.1 | | google/gemma-4-12B
 | google/gemma-4-31B | 30.7B | transformers>=5.8.0 | | google/gemma-4-31B
 | google/gemma-4-26B-A4B | 25.2B (Active 3.8B) | transformers>=5.8.0 | | google/gemma-4-26B-A4B

Sample Code

Below are some of the capabilities demonstrated in the example code. For a complete introduction to training capabilities, please refer to Quick Start and cookbook.

Train with Ray

from peft import LoraConfig
import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor

device_group = [DeviceGroup(name='default',ranks=8,device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
# Use mode='local' when launching with torchrun
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)


def train():
    # to load model from Hugging Face, use 'hf://...'
    base_model = 'ms://Qwen/Qwen3.6-27B'
    # 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set template to prepare encoding
    dataset.set_template('Qwen3_5Template', model_id=base_model)
    # Preprocess the dataset to standard format
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    # Encode dataset
    dataset.encode()
    # Global batch size = 8 across 8 GPUs, so 1 sample per GPU
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Use a TransformersModel
    model = TransformersModel(model_id=base_model, remote_group='default')

    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules='all-linear'
    )

    # Add a lora to model, with name `default`
    # Comment this to use full-parameter training
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    # Add Optimizer for lora `default`
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    # Add LRScheduler for lora `default`
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5,
                           num_training_steps=len(dataloader))
    for step, batch in enumerate(dataloader):
        # Do forward and backward
        model.forward_backward(inputs=batch)
        # Step
        model.clip_grad_and_step()
        if step % 20 == 0:
            # Print metric
            metric = model.calculate_metric(is_training=True)
            print(f'Current is step {step} of {len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')


if __name__ == '__main__':
    train()
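
The Ray example above configures a scheduler named CosineWarmupScheduler with num_warmup_steps=5. As a rough illustration of what a schedule of that shape typically does, here is a minimal, framework-independent sketch. It is not Twinkle's implementation: the function name cosine_warmup_lr is hypothetical, it assumes linear warmup followed by cosine decay to zero, and it assumes len(dataloader) is 125 (1000 samples at a global batch size of 8).

```python
import math


def cosine_warmup_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Hypothetical helper for illustration; not part of the Twinkle API.
    """
    if step < num_warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps
        return base_lr * step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    progress = min(1.0, (step - num_warmup_steps) / decay_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count: 1000 samples at global batch size 8
num_steps = 1000 // 8
schedule = [cosine_warmup_lr(s, 1e-4, 5, num_steps) for s in range(num_steps + 1)]
```

Under these assumptions the learning rate starts at zero, reaches the configured 1e-4 at step 5, and decays smoothly to zero by the final step.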

Access the Serverless Training Services via Tinker-compatible API

import os
from tqdm import tqdm
from tinker import types
from twinkle import init_tinker_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3.6-27B'
base_url='your-base-url'
api_key='your-api-key'

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Qwen3_5Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'ModelScope Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

# Initialize Tinker client before importing ServiceClient
init_tinker_client()
from tinker import ServiceClient

service_client = ServiceClient(base_url=base_url, api_key=api_key)
training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)

# Training loop: use input_feature_to_datum to transfer the input format
for epoch in range(3):
    for step, batch in tqdm(enumerate(dataloader)):
        input_datum = [input_feature_to_datum(input_feature) for input_feature in batch]

        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))

        fwdbwd_result = fwdbwd_future.result()
        optim_result = optim_future.result()

    training_client.save_state(f"twinkle-lora-{epoch}").result()
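
The loop above submits forward_backward and optim_step back to back and only then calls result() on each future, so the second request is already queued while the first is running. The sketch below imitates that submit-then-wait pattern with the standard library. ToyTrainingClient is a hypothetical stand-in, not the Tinker client, and it assumes requests are processed in submission order.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTrainingClient:
    """Hypothetical stand-in that returns futures, like the client above."""

    def __init__(self):
        # A single worker processes requests in the order they were submitted
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.log = []

    def _run(self, name):
        self.log.append(name)
        return name

    def forward_backward(self, batch):
        # Returns immediately with a future; work happens in the background
        return self._pool.submit(self._run, f"fwdbwd:{len(batch)}")

    def optim_step(self):
        return self._pool.submit(self._run, "optim")

    def close(self):
        self._pool.shutdown(wait=True)


client = ToyTrainingClient()
# Submit both requests first, then wait for the results
fwdbwd_future = client.forward_backward([1, 2, 3])
optim_future = client.optim_step()
fwdbwd_result = fwdbwd_future.result()
optim_result = optim_future.result()
client.close()
```

Calling result() right after each submission would also work, but it would leave the queue idle between the two requests.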

Architecture Design

Twinkle✨ features a decoupled Client-Server architecture designed for maximum flexibility. The client-side provides two distinct integration paths:

  • Twinkle✨ Native: A conforming API that mirrors the server-side interface for seamless end-to-end integration.
  • Tinker Compatibility: Full support for the native Tinker API, enabling developers to use Twinkle✨’s backend through the Tinker client.

This dual-path design ensures access to Twinkle✨’s training services using Tinker API, with a simple modification of the Tinker base URL.
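
Because switching between a self-hosted server and the ModelScope service comes down to the base URL, one option is to resolve the connection settings from the environment. The following is a minimal sketch under stated assumptions: the helper resolve_client_kwargs and the variable names TWINKLE_BASE_URL and TWINKLE_API_KEY are hypothetical and not defined by Twinkle or Tinker.

```python
import os

# Endpoint named in this README for the ModelScope-hosted service
MODELSCOPE_URL = "https://www.modelscope.cn/twinkle"


def resolve_client_kwargs(env=None):
    """Build base_url/api_key keyword arguments from environment variables.

    Hypothetical helper; the variable names are assumptions for illustration.
    """
    env = os.environ if env is None else env
    # Fall back to the ModelScope endpoint when no self-hosted URL is set
    base_url = env.get("TWINKLE_BASE_URL", MODELSCOPE_URL).rstrip("/")
    api_key = env.get("TWINKLE_API_KEY")
    if not api_key:
        raise ValueError("TWINKLE_API_KEY is not set")
    return {"base_url": base_url, "api_key": api_key}


# Usage with the client from the sample above (not executed here):
#   service_client = ServiceClient(**resolve_client_kwargs())
```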

Multi-Tenancy

Twinkle✨ supports simultaneous multi-tenant training on a shared base model. Using a LoRA Pool + Tenant Application architecture, Twinkle✨ lets up to N tenants train in parallel with complete isolation. From the model's perspective, each tenant's session is distinct, so tenants can use heterogeneous configurations, including their own data padding strategies, optimizers, and loss functions, all running concurrently on the same base model.

Note: This feature is currently optimized for LoRA.

For example:

  • Tenant A: Load a private dataset locally, LoRA rank=8, using base model for SFT
  • Tenant B: Load an open-source dataset from the Hub, LoRA rank=32, using base model for PT
  • Tenant C: Use base model for GRPO loss calculation, using Sampler for sampling
  • Tenant D: Use base model for logps inference

These processes are executed concurrently on a single base model because the Model and Sampler are integrated as task-agnostic components within the Twinkle✨ ecosystem. Upon completion, checkpoints are automatically pushed to ModelScope or HuggingFace repositories (private by default). On the server side, Twinkle✨ provides a robust multi-tenant suite featuring automated cluster management and dynamic scaling, making it the foundation for building customizable, enterprise-grade training services.
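
To make the LoRA Pool idea concrete, here is a toy bookkeeping model of tenants sharing one base model. It is a conceptual sketch only, not Twinkle's server code: the classes TenantConfig and LoraPool, their fields, and the capacity of 2 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class TenantConfig:
    """Per-tenant settings that may differ while the base model is shared."""
    lora_rank: int
    optimizer: str
    loss: str


@dataclass
class LoraPool:
    """Hypothetical registry: one base model, a bounded number of LoRA slots."""
    base_model: str
    capacity: int
    tenants: Dict[str, TenantConfig] = field(default_factory=dict)

    def register(self, tenant_id: str, config: TenantConfig) -> None:
        if tenant_id in self.tenants:
            raise KeyError(f"tenant {tenant_id!r} already registered")
        if len(self.tenants) >= self.capacity:
            raise RuntimeError("LoRA pool is full")
        self.tenants[tenant_id] = config

    def release(self, tenant_id: str) -> TenantConfig:
        # Free the slot so another tenant can claim it
        return self.tenants.pop(tenant_id)


pool = LoraPool(base_model="Qwen/Qwen3.6-27B", capacity=2)
pool.register("tenant-a", TenantConfig(lora_rank=8, optimizer="AdamW", loss="cross_entropy"))
pool.register("tenant-b", TenantConfig(lora_rank=32, optimizer="AdamW", loss="cross_entropy"))
```

Each tenant keeps its own rank, optimizer, and loss, while the pool bounds how many adapters are attached to the base model at once.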

As a modular framework, Twinkle✨ also supports temporary, exclusive remote training, i.e., training in full-parameter mode.

🛠️ Twinkle✨ Modular Ecosystem

Dataset
Data loading and preprocessing

Template
Encoding and decoding

DataLoader
Data distribution and batching

Preprocessor
Data ETL

InputProcessor
Task-specific input processing

Model
Large models, supports multiple frameworks

Sampler
Sampler logic

Loss
Loss functions

Metric
Training metrics collection

Reward
Reward function

Advantage
Advantage function

CheckpointEngine
Weight synchronization

Patch
Patches for model fixes

Module
Components, e.g., Optimizer

Kernel
Operators

Server
Start backend cluster

Client
Client code

Infra
Isolate ray and torchrun differences

Plugin
Use hub components

Hub
Interface with HF/MS libraries

Community Components

Component Type | Component Link | Component Function | Author
Patch | qwen3_moe_transformers4_patch | Fixes Qwen3 MoE model hang issue during FSDP2 training, effective for transformers==4.x | ModelScope Official

Contributions

Twinkle✨ is designed, developed, and maintained by an Open Workshop composed of members from various open-source technology teams. We welcome more developers passionate about large model training to join us in building and improving this framework.

The core members of the workshop currently come from:

We are grateful to the open-source community, particularly the projects that inspired us, including Transformers, MS-SWIFT, veRL, Tinker, and many others.

We welcome open contributions via issues and pull requests.