EffectiveKernels

May 22, 2026

Prerequisites

  • CUDA Toolkit 12.0+ and an SM90 (Hopper) GPU
  • PyTorch 2.0+
  • Python 3.8+
  • nvidia-cutlass-dsl 4.4.2+
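
Before building, it can help to confirm that the environment matches the list above. The script below is a minimal sketch, not part of this repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, it assumes "SM90" means CUDA compute capability 9.x, and it reads `torch.version.cuda`, which reports the CUDA version PyTorch was built against rather than the locally installed toolkit.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn '12.4', '2.3.1+cu121' or '4.4.2.dev0' into a tuple of leading ints."""
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '12' compares equal to '12.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")

    try:
        dsl = metadata.version("nvidia-cutlass-dsl")
        if not meets_minimum(dsl, "4.4.2"):
            problems.append(f"nvidia-cutlass-dsl {dsl} found, 4.4.2+ required")
    except metadata.PackageNotFoundError:
        problems.append("nvidia-cutlass-dsl is not installed")

    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems

    if not meets_minimum(torch.__version__, "2.0"):
        problems.append(f"PyTorch {torch.__version__} found, 2.0+ required")

    # CUDA version PyTorch was built with; None on CPU-only builds.
    cuda = torch.version.cuda
    if cuda is None or not meets_minimum(cuda, "12.0"):
        problems.append(f"PyTorch CUDA version is {cuda}, 12.0+ required")

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        if major != 9:  # assumption: SM90 / Hopper reports capability 9.x
            problems.append(f"GPU capability is {major}.{minor}, SM90 expected")
    else:
        problems.append("No CUDA device is visible to PyTorch")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All prerequisite checks passed.")
```

Running the script prints one warning per unmet requirement. A mismatch reported here is only a hint, since the build itself may check the toolkit differently.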

Build

git clone https://github.com/Kwai-Keye/EffectiveKernels.git
cd EffectiveKernels

# Standard build (JIT only)
pip install -e . --no-build-isolation

# Build with AOT pre-compiled kernels
EFFECTIVE_KERNELS_AOT=1 pip install -e . --no-build-isolation

Acknowledgement

This project builds on FlashAttention and CUTLASS. We gratefully acknowledge their authors' excellent work and contributions to the community.