gpt-oss-tvm
January 27, 2026
This project compiles the OpenAI gpt-oss model with Apache TVM and runs it on the target device.
Project Goals
Visit the Wiki Home or Design Philosophy page to read more about the project's goals and objectives!
Setup
To support gpt-oss correctly, TVM and MLC LLM need to be built with a few patches.
Please refer to our Wiki - Setup & Run page for setup instructions.
Download model
Note
While TVM supports multiple hardware backends, this project has been mainly tested with the metal target on macOS. As the model uses the original mxfp4 and bfloat16 weights without further quantization, an Apple Silicon Mac with 24 GB or more of unified memory is recommended.
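To see where the 24 GB recommendation comes from, here is a rough back-of-the-envelope estimate of the weight footprint. The parameter split and the ~4.25 bits/weight figure for mxfp4 (4-bit values plus block scales) are illustrative assumptions, not numbers from the model card:

```python
# Rough weight-memory estimate for a ~21B-parameter model kept in its
# original mxfp4 + bfloat16 formats (no further quantization).
# Parameter counts below are assumptions for illustration only.
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8

moe = weight_bytes(19e9, 4.25)     # assumed MoE expert params in mxfp4
other = weight_bytes(1.9e9, 16)    # assumed attention/embedding params in bf16
total_gb = (moe + other) / 1e9
print(f"~{total_gb:.1f} GB of weights")
```

Activations and the KV cache come on top of this, which is why a machine with only 16 GB of unified memory would be tight.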
Files for gpt-oss reference torch implementation
pip install huggingface_hub # to use `hf` command
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
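After the download finishes, a quick sanity check that the `original/` weights folder is present can save a confusing failure later. This helper is a hypothetical convenience, not part of the project; it only checks that the folder exists and is non-empty, without assuming specific filenames inside it:

```python
from pathlib import Path

def downloaded(local_dir: str) -> bool:
    """Return True if `<local_dir>/original/` exists and is non-empty."""
    original = Path(local_dir) / "original"
    return original.is_dir() and any(original.iterdir())

print(downloaded("gpt-oss-20b"))
```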
Compile & Run
Important
To ensure equivalence with gpt-oss, please confirm that you are using a TVM build with the patches applied.
You can install the desired TVM & MLC LLM by referring to the Wiki Setup page.
Basic single-turn test
python run_gpt_oss.py
Multi-turn chat
python chat.py
Use other target devices
The target device can be changed by modifying the following line in the scripts:
- engine = Engine(model_path, target="metal")
+ engine = Engine(model_path, target="<YOUR DEVICE TYPE>")
Supported device types are determined by TVM target support.
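Instead of editing the source every time, the target could be read from the command line. This wrapper is a hypothetical sketch (not part of the project scripts); `metal`, `cuda`, and `vulkan` are examples of target strings TVM recognizes:

```python
# Hypothetical helper: pick the TVM target from a CLI flag instead of
# hard-coding it in the script.
import argparse

def parse_target(argv=None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", default="metal",
                        help="TVM target string, e.g. metal, cuda, vulkan")
    return parser.parse_args(argv).target

# engine = Engine(model_path, target=parse_target())  # Engine from this project
print(parse_target(["--target", "cuda"]))  # prints: cuda
```

With this in place, `python run_gpt_oss.py --target cuda` would select the device without touching the code, assuming the script is adapted to call `parse_target()`.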
License
This project is licensed under the Apache License 2.0, in line with the licenses of gpt-oss and TVM.
Authors
- @Liberatedwinner
- @grf53
- @jhlee525
- @khj809