Kernels in CUDA || Triton

August 24, 2025

Kernels for common deep learning functions.

activation

  • ELU (fp32, fp16, fp16x2, fp16x8_packed)
  • GeLU (fp32, fp16, fp16x4_packed)
  • Sigmoid (fp32, fp16, fp16x8_packed)
  • ReLU (fp32, fp16)
  • Swish (fp32, fp16)
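
As an illustration of what a packed variant such as fp16x8_packed usually means, here is a minimal sketch of a sigmoid kernel in which each thread moves 8 half values with a single 128-bit load and store. The kernel name, launch configuration, and test size are hypothetical and chosen only for this example; the kernels in this repository may be organized differently.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical name. Each thread handles 8 contiguous half values.
__global__ void sigmoid_f16x8_pack_kernel(const half* x, half* y, int n) {
    int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (idx + 7 < n) {
        // One 128-bit load fetches 8 halves. This assumes x and y are
        // 16-byte aligned, which holds for pointers from cudaMalloc.
        float4 in = *reinterpret_cast<const float4*>(x + idx);
        float4 out;
        const half2* in2 = reinterpret_cast<const half2*>(&in);
        half2* out2 = reinterpret_cast<half2*>(&out);
#pragma unroll
        for (int i = 0; i < 4; ++i) {
            // Compute in fp32, then round back to fp16.
            float2 v = __half22float2(in2[i]);
            v.x = 1.0f / (1.0f + expf(-v.x));
            v.y = 1.0f / (1.0f + expf(-v.y));
            out2[i] = __floats2half2_rn(v.x, v.y);
        }
        *reinterpret_cast<float4*>(y + idx) = out;
    } else {
        // Scalar tail for the last, partially filled group of 8.
        for (int i = idx; i < n; ++i) {
            float v = __half2float(x[i]);
            y[i] = __float2half(1.0f / (1.0f + expf(-v)));
        }
    }
}

int main() {
    const int n = 1003;  // not a multiple of 8, so the tail path runs too
    std::vector<half> h_x(n), h_y(n);
    for (int i = 0; i < n; ++i) h_x[i] = __float2half(0.01f * (i - n / 2));

    half *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(half));
    cudaMalloc(&d_y, n * sizeof(half));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(half), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + 8 * threads - 1) / (8 * threads);
    sigmoid_f16x8_pack_kernel<<<blocks, threads>>>(d_x, d_y, n);
    cudaMemcpy(h_y.data(), d_y, n * sizeof(half), cudaMemcpyDeviceToHost);

    // Compare against an fp32 host reference.
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        float ref = 1.0f / (1.0f + std::exp(-__half2float(h_x[i])));
        max_err = std::fmax(max_err, std::fabs(ref - __half2float(h_y[i])));
    }
    printf("max abs error vs fp32 reference: %g\n", max_err);

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

The wide load is the point of the packed layout: elementwise activations are memory bound, so issuing fewer, larger memory transactions per thread tends to matter more than the arithmetic itself.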

embedding

  • a kernel similar to torch.nn.functional.embedding, in fp32 and fp16
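
For orientation, an embedding lookup is a row gather: output row i is a copy of row idx[i] of the weight table. The sketch below shows one simple fp32 way to write that, with one block per token. The kernel name, the use of 32-bit indices, and the tiny test table are assumptions made for this example and are not taken from the repository.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical name. One block per token; threads stride over the row.
__global__ void embedding_f32_kernel(const int* idx, const float* weight,
                                     float* out, int emb_dim) {
    int row = idx[blockIdx.x];  // which table row this token selects
    const float* src = weight + (size_t)row * emb_dim;
    float* dst = out + (size_t)blockIdx.x * emb_dim;
    for (int c = threadIdx.x; c < emb_dim; c += blockDim.x) {
        dst[c] = src[c];
    }
}

int main() {
    const int vocab = 4, emb_dim = 6, num_tokens = 4;

    // Table entry (r, c) holds r * 10 + c so copied rows are easy to check.
    std::vector<float> h_w(vocab * emb_dim);
    for (int r = 0; r < vocab; ++r)
        for (int c = 0; c < emb_dim; ++c) h_w[r * emb_dim + c] = r * 10.0f + c;
    std::vector<int> h_idx = {2, 0, 3, 2};
    std::vector<float> h_out(num_tokens * emb_dim);

    int* d_idx;
    float *d_w, *d_out;
    cudaMalloc(&d_idx, num_tokens * sizeof(int));
    cudaMalloc(&d_w, h_w.size() * sizeof(float));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_idx, h_idx.data(), num_tokens * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_w, h_w.data(), h_w.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    embedding_f32_kernel<<<num_tokens, 32>>>(d_idx, d_w, d_out, emb_dim);
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    // Check every output row against the table row it should mirror.
    int mismatches = 0;
    for (int t = 0; t < num_tokens; ++t)
        for (int c = 0; c < emb_dim; ++c)
            if (h_out[t * emb_dim + c] != h_w[h_idx[t] * emb_dim + c])
                ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_idx);
    cudaFree(d_w);
    cudaFree(d_out);
    return 0;
}
```

An fp16 version would follow the same pattern with half pointers, and could copy several elements per thread with wider loads when emb_dim and the row addresses allow it.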