Tiny FlashAttention

November 3, 2024

WIP

A tiny FlashAttention implementation in Python, Rust, CUDA, and C, for learning purposes.

  • python version
    • naive pure python code
  • triton version
    • triton code
  • [c version]
    • naive pure c code
    • naive cuda code standalone
    • naive cuda code python binding
    • cutlass cuda code
  • [rust version]
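
For orientation, here is a minimal pure-Python sketch of the core idea behind FlashAttention: visit K/V in blocks and keep a running ("online") softmax, so the full score matrix is never materialized. It is not taken from this repository's code. The names `naive_attention`, `tiled_attention`, and `block_size` are made up for illustration, and the sketch tiles only K/V, whereas real kernels also tile Q and keep the tiles in on-chip memory.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, computed one query row at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, but K/V are visited block by block with an online softmax.

    block_size is a hypothetical tuning knob for this sketch.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf      # running max of the scores seen so far
        l = 0.0            # running softmax denominator
        acc = [0.0] * dv   # running unnormalized output row
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            m_new = max(m, max(scores))
            # Rescale the old state so it is expressed relative to the new max.
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [
                acc[c] * correction + sum(pj * vj[c] for pj, vj in zip(p, v_blk))
                for c in range(dv)
            ]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

Both functions take lists of rows (`q`: N x d, `k`: M x d, `v`: M x dv) and should agree up to floating-point rounding for any `block_size >= 1`.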

cutlass cute flash attention in action

my env: cutlass v3.4, torch 1.14, cuda 12.4

  • en tutorial
  • zh tutorial