EffectiveKernels
May 22, 2026
Prerequisites
- CUDA Toolkit 12.0+, targeting SM90 (Hopper) GPUs
- PyTorch 2.0+
- Python 3.8+
- nvidia-cutlass-dsl 4.4.2+
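Before building, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch, not part of EffectiveKernels: the helper names `version_ge`, `report`, and `check_prereqs` are hypothetical, the comparison assumes GNU `sort -V` is available, and it assumes `nvidia-cutlass-dsl` was installed with pip under that name. It does not check the GPU architecture.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check. All helper names are hypothetical.

# Succeeds when version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints one status line for a component. Usage: report NAME FOUND_VERSION MINIMUM
report() {
    local name="$1" found="${2%%+*}" minimum="$3"   # drop local tags such as "+cu121"
    if [ -z "$found" ]; then
        echo "MISSING $name (need $minimum+)"
    elif version_ge "$found" "$minimum"; then
        echo "OK $name $found"
    else
        echo "TOO_OLD $name $found (need $minimum+)"
    fi
}

# Queries each tool for its version; a tool that is absent yields an empty string.
check_prereqs() {
    report "Python" "$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)" 3.8
    report "PyTorch" "$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null || true)" 2.0
    report "CUDA Toolkit" "$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p' || true)" 12.0
    report "nvidia-cutlass-dsl" "$(python3 -m pip show nvidia-cutlass-dsl 2>/dev/null | sed -n 's/^Version: //p' || true)" 4.4.2
}

check_prereqs
```

Each line of output starts with `OK`, `MISSING`, or `TOO_OLD`, so the result can be scanned quickly or filtered with `grep`.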
Build
git clone https://github.com/Kwai-Keye/EffectiveKernels.git
cd EffectiveKernels
# Standard build (JIT only)
pip install -e . --no-build-isolation
# Build with AOT pre-compiled kernels
EFFECTIVE_KERNELS_AOT=1 pip install -e . --no-build-isolation
Acknowledgement
This project is built upon FlashAttention and CUTLASS. We gratefully acknowledge their excellent work and contributions to the community.