
October 31, 2025




English | 简体中文

🚀 Speed Benchmark

🎉 News

  • [2025/09] XTuner V1 Released! A Next-Generation Training Engine Built for Ultra-Large MoE Models

📖 XTuner V1

XTuner V1 is a next-generation LLM training engine specifically designed for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for the mainstream MoE training scenarios prevalent in today's academic research.

Key Features

📊 Dropless Training

  • Scalable without complexity: Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism
  • Optimized parallelism strategy: Smaller expert parallelism dimension compared to traditional 3D approaches, enabling more efficient Dropless training
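Dropless routing means every token is always dispatched to its top-k experts, so expert buckets are variable-sized instead of being capped by a capacity factor that drops overflow tokens. The following NumPy sketch illustrates the idea only; it is not XTuner's implementation:

```python
import numpy as np

def dropless_topk_route(logits: np.ndarray, k: int = 2):
    """Illustrative dropless top-k routing: every token is dispatched to its
    top-k experts, with no capacity limit and therefore no dropped tokens."""
    num_tokens, num_experts = logits.shape
    # Indices of each token's top-k experts (order within the k is irrelevant
    # for dispatch, so argpartition is enough).
    topk = np.argpartition(logits, -k, axis=1)[:, -k:]
    # Softmax over the selected logits gives the combine weights.
    sel = np.take_along_axis(logits, topk, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Per-expert token lists; bucket sizes vary with load, nothing is dropped.
    buckets = [np.where((topk == e).any(axis=1))[0] for e in range(num_experts)]
    return topk, w, buckets

rng = np.random.default_rng(0)
topk, w, buckets = dropless_topk_route(rng.normal(size=(8, 4)), k=2)
# Every token reaches exactly k experts, regardless of load imbalance.
assert sum(len(b) for b in buckets) == 8 * 2
```

Because bucket sizes are unbounded, dropless training is easiest when the expert-parallel dimension is small, which is exactly the regime the bullet points above describe.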

๐Ÿ“ Long Sequence Support

  • Memory-efficient design: Train 200B MoE models on 64k sequence lengths without sequence parallelism through advanced memory optimization techniques
  • Flexible scaling: Full support for DeepSpeed Ulysses sequence parallelism with linearly scalable maximum sequence length
  • Robust performance: Maintains stability despite expert load imbalance during long sequence training
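For intuition, DeepSpeed Ulysses shards the sequence dimension across ranks and uses an all-to-all to re-shard along the attention-head dimension, so each rank attends over the full sequence for a subset of heads. This NumPy sketch simulates that exchange on P in-process "ranks"; it shows only the layout change, not the actual communication code:

```python
import numpy as np

def ulysses_all_to_all(shards, num_heads):
    """Simulate the Ulysses all-to-all on P simulated ranks.

    Input : shards[p] has shape (seq/P, num_heads, dim)   -- sequence-sharded
    Output: out[p]    has shape (seq,   num_heads/P, dim) -- head-sharded,
    so each rank can run full-sequence attention on its slice of heads."""
    P = len(shards)
    h_per_rank = num_heads // P
    out = []
    for p in range(P):
        # Rank p gathers head-slice p from every rank's sequence shard.
        pieces = [s[:, p * h_per_rank:(p + 1) * h_per_rank, :] for s in shards]
        out.append(np.concatenate(pieces, axis=0))
    return out

seq, heads, dim, P = 8, 4, 3, 2
x = np.arange(seq * heads * dim, dtype=float).reshape(seq, heads, dim)
shards = np.split(x, P, axis=0)           # sequence-parallel layout
full = ulysses_all_to_all(shards, heads)  # head-parallel layout
assert full[0].shape == (seq, heads // P, dim)
```

Since each rank always holds seq/P tokens before attention, the maximum trainable sequence length scales linearly with the number of ranks, as the bullet points above state.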

⚡ Superior Efficiency

  • Massive scale: Supports MoE training up to 1T parameters
  • Breakthrough performance: First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above 200B scale
  • Hardware optimization: Training efficiency on the Ascend A3 Supernode exceeds that of the NVIDIA H800

🔥 Roadmap

XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with special focus on Ascend NPU optimization.

🚀 Training Engine

Our vision is to establish XTuner V1 as a versatile training backend that seamlessly integrates with the broader open-source ecosystem.

| Model       | GPU (FP8) | GPU (BF16) | NPU (BF16) |
|-------------|-----------|------------|------------|
| Intern S1   | ✅        | ✅         | ✅         |
| Intern VL   | ✅        | ✅         | ✅         |
| Qwen3 Dense | ✅        | ✅         | ✅         |
| Qwen3 MoE   | ✅        | ✅         | ✅         |
| GPT OSS     | ✅        | ✅         | 🚧         |
| DeepSeek V3 | ✅        | ✅         | 🚧         |
| KIMI K2     | ✅        | ✅         | 🚧         |

🧠 Algorithm

The algorithm component is actively evolving. We welcome community contributions: with XTuner V1, you can scale your algorithms to unprecedented sizes!

Implemented

  • ✅ Multimodal Pre-training - Full support for vision-language model training
  • ✅ Multimodal Supervised Fine-tuning - Optimized for instruction following
  • ✅ GRPO - Group Relative Policy Optimization
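GRPO's core idea is to replace a learned value baseline with a group-relative one: each sampled completion's reward is normalized by the mean and standard deviation of the other samples in its group. A minimal sketch of that advantage computation (illustrative only, not XTuner's code):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO: normalize each completion's
    reward by its own group's mean and standard deviation, so no separate
    value model is needed."""
    mu = statistics.fmean(group_rewards)
    sd = statistics.pstdev(group_rewards)
    return [(r - mu) / (sd + eps) for r in group_rewards]

# Four completions sampled for the same prompt, scored by a reward model.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
assert abs(sum(adv)) < 1e-6  # advantages are centered within the group
```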

Coming Soon

  • 🔄 MPO - Mixed Preference Optimization
  • 🔄 DAPO - Dynamic Sampling Policy Optimization
  • 🔄 Multi-turn Agentic RL - Advanced agent training capabilities

⚡ Inference Engine Integration

Seamless deployment with leading inference frameworks:

  • LMDeploy
  • vLLM
  • SGLang

Data Preparation

  • You can use GraphGen to create synthetic data for fine-tuning.

🤝 Contributing

We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contributing guidelines.

🙏 Acknowledgement

The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community. We extend our sincere gratitude to the following pioneering projects:

Training Engine:

  • Torchtitan - A PyTorch native platform for training generative AI models
  • DeepSpeed - Microsoft's deep learning optimization library
  • MindSpeed - Ascend's high-performance training acceleration library
  • Megatron - NVIDIA's large-scale transformer training framework

Reinforcement Learning:

XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:

  • veRL - Volcano Engine Reinforcement Learning for LLMs
  • SLIME - THU's scalable RLHF implementation
  • AReal - Ant Reasoning Reinforcement Learning for LLMs
  • OpenRLHF - An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray

We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.

🖊️ Citation

@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished={\url{https://github.com/InternLM/xtuner}},
    year={2023}
}

License

This project is released under the Apache License 2.0. Please also adhere to the Licenses of models and datasets being used.