aarch64
May 12, 2026 · View on GitHub
[中文版|English]
💫 Moore Threads(摩尔线程)mthreads (Triton 3.6)
- Based on Triton 3.6, x64
- Available for S4000/S5000
1. Build and run environment
1.1 Use the preinstalled image (Triton 3.6, MTT-S5000)
If your network connection is available, you do not need to perform the later step 1.x, because dependencies will be fetched automatically during the build.
# Plan A: docker pull (60.5GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
docker pull ${IMAGE}
# Plan B: docker load (17GB)
IMAGE=flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
docker load -i flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
mcc_version # 5.1.0
CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash
1.2 Manually download the FlagTree dependencies
mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm22-x64_v0.5.0.tar.gz
tar zxvf mthreads-llvm22-x64_v0.5.0.tar.gz
1.3 Manually download the Triton dependencies
The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.6 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.6.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.6.x-linux-x64.tar.gz
After executing the above script, the original ~/.triton directory will be renamed, and a new ~/.triton directory will be created to store the pre-downloaded packages. Note that the script will prompt for manual confirmation during execution.
2. Installation Commands
2.1 Source-free Installation
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat the cmd until fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.1+mthreads3.6 $RES
After installing flagtree, you can check it with:
python3 -m pip show flagtree
2.2 Build from Source
apt update; apt install zlib1g zlib1g-dev libxml2 libxml2-dev
cd ${YOUR_CODE_DIR}/FlagTree
git checkout -b triton_v3.6.x origin/triton_v3.6.x
python3 -m pip install -r requirements.txt
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v
3. Testing and validation
Refer to Tests of mthreads3.2 backend
💫 Moore Threads(摩尔线程)mthreads (Triton 3.2)
- Based on Triton 3.2, x64
- Available for S4000/S5000
1. Build and run environment
1.1 Use the preinstalled image (Triton 3.2, MTT-S5000)
If your network connection is available, you do not need to perform the later step 1.x, because dependencies will be fetched automatically during the build.
# Plan A: docker pull (59.4GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
docker pull ${IMAGE}
# Plan B: docker load (17GB)
IMAGE=flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
docker load -i flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
mcc_version # 5.1.0
CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash
1.2 Manually download the FlagTree dependencies
mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm20-x64_v0.5.0.tar.gz
tar zxvf mthreads-llvm20-x64_v0.5.0.tar.gz
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-triton3.2-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-x64_v0.5.0.tar.gz
tar zxvf mthreadsTritonPlugin-triton3.2-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-x64_v0.5.0.tar.gz
1.3 Manually download the Triton dependencies
The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.2 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.2.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.2.x-linux-x64.tar.gz
After executing the above script, the original ~/.triton directory will be renamed, and a new ~/.triton directory will be created to store the pre-downloaded packages. Note that the script will prompt for manual confirmation during execution.
2. Installation Commands
2.1 Source-free Installation
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat the cmd until fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.1+mthreads3.2 $RES
After installing flagtree, you can check it with:
python3 -m pip show flagtree
2.2 Build from Source
apt update; apt install zlib1g zlib1g-dev libxml2 libxml2-dev
cd ${YOUR_CODE_DIR}/FlagTree/python
git checkout -b triton_v3.2.x origin/triton_v3.2.x
python3 -m pip install -r requirements.txt
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v
3. Testing and validation
Refer to Tests of mthreads3.2 backend
💫 Moore Threads(摩尔线程)mthreads (Triton 3.1)
- Based on Triton 3.1, x64/aarch64
- Available for S4000/S5000
1. Build and run environment
1.1 Use the preinstalled image (Triton 3.1, MTT-S5000)
If you use this preinstalled image, you do not need to perform the later step 1.x. If your network connection is available, you also do not need to perform the later step 1.x, because dependencies will be fetched automatically during the build.
# Plan A: docker pull (55.3GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04:202603
docker pull ${IMAGE}
# Plan B: docker load (18GB)
IMAGE=flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04:202603
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gz
docker load -i flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gz
CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash
1.2 Manually download the FlagTree dependencies
mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
# x64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
# aarch64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gz
1.3 Manually download the Triton dependencies
The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.1 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.1.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.1.x-linux-x64.tar.gz
After executing the above script, the original ~/.triton directory will be renamed, and a new ~/.triton directory will be created to store the pre-downloaded packages. Note that the script will prompt for manual confirmation during execution.
2. Installation Commands
2.1 Source-free Installation
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat the cmd until fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.1+mthreads3.1 $RES
flagtree is already installed in the preinstalled image. You can check it with:
python3 -m pip show flagtree
2.2 Build from Source
apt update; apt install zlib1g zlib1g-dev libxml2 libxml2-dev
cd ${YOUR_CODE_DIR}/FlagTree/python
python3 -m pip install -r requirements.txt
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v
3. Testing and validation
Refer to Tests of mthreads3.1 backend
For triton 3.1 kernels that use tl.dot, setting the environment variable export MUSA_ENABLE_SQMMA=1 can improve performance.