README.md

August 1, 2025 · View on GitHub

StepFun: Cost-Effective Multimodal Intelligence

Chat Homepage
Hugging Face ModelScope Twitter Follow
Discord License
📰  Step3 Model Blog     |     📄  Step3 System Tech Report

Introduction

Step3 is our cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Step3 model card:

ConfigValue
Number of Layers (Dense layer included)61
Number of Dense Layers5
Hidden Dimension7168
Attention MechanismMFA
Low-rank Query Dimension2048
Number of Query Heads64
Head Dimension256
Number of Experts48
Selected Experts per Token3
Number of Shared Experts1
Max Context Length65536
TokenizerDeepseek V3
Total Parameters (LLM)316B
Activated Params per Token38B
Total Parameters (VLM)321B

Evaluation Results

Deployment

You can access Step3's API on https://platform.stepfun.com/ , we provide OpenAI/Anthropic-compatible API for you.

Our model checkpoints are stored in bf16 and block-fp8 format, you can find it on Huggingface.

Currently, it is recommended to run Step3 on the following inference engines:

  • vLLM
  • SGLang

Deployment and Request examples for vLLM and SGLang can be found in the Model Deployment Guide.

Contact Us

If you have any questions, please reach out at contact@stepfun.com .

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{step3system,
      title={Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding}, 
      author={StepFun Team},
      year={2025},
      eprint={2507.19427},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.19427}, 
}

@misc{step3blog,
      title={Step3: Cost-Effective Multimodal Intelligence}, 
      author={StepFun Team},
      url={https://stepfun.ai/research/step3}, 
}