gpt-oss-tvm
January 27, 2026
This project compiles the OpenAI gpt-oss model with Apache TVM and runs it on the target device.
Project Goals
Visit the Wiki Home or Design Philosophy page to read more about the project's goals and objectives!
Setup
To support gpt-oss correctly, TVM and MLC LLM need to be built with a few patches.
Please refer to our Wiki - Setup & Run page for setup instructions.
Download model
Note
While TVM supports multiple hardware backends, this project has been mainly tested with the metal target on macOS. As the model uses the original mxfp4 and bfloat16 weights without further quantization, an Apple Silicon Mac with 24 GB or more of unified memory is recommended.
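To see where the 24 GB recommendation comes from, here is a rough back-of-the-envelope estimate of the weight footprint. The parameter split and the ~4.25 bits/weight figure for mxfp4 (4-bit values plus block scales) are illustrative assumptions, not numbers from the model card:

```python
# Rough weight-memory estimate for a ~21B-parameter model kept in its
# original mxfp4 + bfloat16 formats (no further quantization).
# Parameter counts below are assumptions for illustration only.
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8

moe = weight_bytes(19e9, 4.25)     # assumed MoE expert params in mxfp4
other = weight_bytes(1.9e9, 16)    # assumed attention/embedding params in bf16
total_gb = (moe + other) / 1e9
print(f"~{total_gb:.1f} GB of weights")
```

Activations and the KV cache come on top of this, which is why a machine with only 16 GB of unified memory would be tight.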
Files for gpt-oss reference torch implementation
pip install huggingface_hub # to use `hf` command
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
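After the download finishes, a quick sanity check that the `original/` weights folder is present can save a confusing failure later. This helper is a hypothetical convenience, not part of the project; it only checks that the folder exists and is non-empty, without assuming specific filenames inside it:

```python
from pathlib import Path

def downloaded(local_dir: str) -> bool:
    """Return True if `<local_dir>/original/` exists and is non-empty."""
    original = Path(local_dir) / "original"
    return original.is_dir() and any(original.iterdir())

print(downloaded("gpt-oss-20b"))
```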
Compile & Run
Important
To ensure equivalence with gpt-oss, please confirm that you are using a TVM build with the patches applied.
You can install the desired TVM & MLC LLM by referring to the Wiki Setup page.
Basic single-turn test
python run_gpt_oss.py
Multi-turn chat
python chat.py
Use other target devices
The target device can be changed by modifying the following line in the scripts:
- engine = Engine(model_path, target="metal")
+ engine = Engine(model_path, target="<YOUR DEVICE TYPE>")
Supported device types are determined by TVM target support.
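Instead of editing the source every time, the target could be read from the command line. This wrapper is a hypothetical sketch (not part of the project scripts); `metal`, `cuda`, and `vulkan` are examples of target strings TVM recognizes:

```python
# Hypothetical helper: pick the TVM target from a CLI flag instead of
# hard-coding it in the script.
import argparse

def parse_target(argv=None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", default="metal",
                        help="TVM target string, e.g. metal, cuda, vulkan")
    return parser.parse_args(argv).target

# engine = Engine(model_path, target=parse_target())  # Engine from this project
print(parse_target(["--target", "cuda"]))  # prints: cuda
```

With this in place, `python run_gpt_oss.py --target cuda` would select the device without touching the code, assuming the script is adapted to call `parse_target()`.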
License
This project is licensed under the Apache License 2.0, in line with the licenses of gpt-oss and TVM.
Authors
- @Liberatedwinner
- @grf53
- @jhlee525
- @khj809