README.md

February 23, 2026

Performance

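| OP | DEVICE | SHAPE | TIME | PERF | PEAK | RATIO |
| --- | --- | --- | --- | --- | --- | --- |
| hgemm-wmma | RTX4090 | m=8192, n=8192, k=8192 | 6.7 ms | 163.5 TFLOPS | 165.2 TFLOPS | 99% |
| hgemm-wmma | H20 | m=8192, n=8192, k=8192 | 12.2 ms | 89.5 TFLOPS | 95 TFLOPS | 94% |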
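
The PERF and RATIO columns can be sanity-checked from SHAPE and TIME. The sketch below assumes the conventional GEMM operation count of 2·m·n·k floating-point operations; the README does not state how PERF was measured, and the helper names `gemm_tflops` and `peak_ratio` are illustrative, not part of gpuk. Because TIME is rounded to one decimal place, the recomputed throughput differs slightly from the PERF column.

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Throughput in TFLOPS, assuming 2*m*n*k FLOPs per GEMM (one multiply + one add)."""
    flops = 2 * m * n * k
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12


def peak_ratio(perf_tflops: float, peak_tflops: float) -> float:
    """Fraction of the device peak that the kernel reaches."""
    return perf_tflops / peak_tflops


if __name__ == "__main__":
    # Rows taken from the table above: (device, time in ms, reported PERF, PEAK).
    rows = [
        ("RTX4090", 6.7, 163.5, 165.2),
        ("H20", 12.2, 89.5, 95.0),
    ]
    for device, time_ms, perf, peak in rows:
        recomputed = gemm_tflops(8192, 8192, 8192, time_ms)
        ratio = peak_ratio(perf, peak)
        print(f"{device}: recomputed {recomputed:.1f} TFLOPS, "
              f"reported {perf} TFLOPS, ratio {ratio:.0%}")
```

The recomputed values land within about 1 TFLOPS of the reported PERF, which is consistent with the rounding of TIME.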

Install

git clone https://github.com/xytpai/gpuk
cd gpuk
python -m pip install -e . --no-build-isolation

Test

python test/test_all_reduce_fusion.py