Kernels in CUDA || Triton
August 24, 2025
Kernels for common deep learning (DL) functions
activation
- ELU (fp32, fp16, fp16x2, fp16x8_packed)
- GeLU (fp32, fp16, fp16x4_packed)
- Sigmoid (fp32, fp16, fp16x8_packed)
- ReLU (fp32, fp16)
- Swish (fp32, fp16)
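To make the dtype suffixes above concrete, here is a minimal CPU reference sketch in Python (the host language of Triton), not code from this repo. It assumes that the fp16x8_packed variants have each thread move eight fp16 values as one 128-bit load/store, and that GeLU uses the common tanh approximation. All helper names (to_fp16, pack_fp16x8, apply_fp16x8_packed, gelu_tanh, etc.) are hypothetical. Python's struct "e" format stands in for IEEE 754 half precision.

```python
import math
import struct

# Hypothetical CPU reference (not the repo's code) for the element-wise math
# behind each activation kernel, plus fp16 / fp16x8_packed emulation.


def elu(x: float, alpha: float = 1.0) -> float:
    # ELU: x for x > 0, alpha * (exp(x) - 1) otherwise
    return x if x > 0.0 else alpha * math.expm1(x)


def gelu_tanh(x: float) -> float:
    # Assumption: tanh approximation of GELU; a kernel may use the erf form instead
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def sigmoid(x: float) -> float:
    # Numerically stable form: never calls exp on a large positive argument
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    return max(x, 0.0)


def swish(x: float) -> float:
    # Swish with beta = 1 (a.k.a. SiLU): x * sigmoid(x)
    return x * sigmoid(x)


def to_fp16(x: float) -> float:
    # Round to IEEE half precision and back. struct raises OverflowError
    # beyond the fp16 range (about 65504), where a GPU conversion gives inf.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def pack_fp16x8(values) -> bytes:
    # 8 fp16 values = 8 * 2 bytes = 16 bytes = 128 bits,
    # the width of one vectorized load/store (e.g. a float4 in CUDA)
    if len(values) != 8:
        raise ValueError("fp16x8 packing needs exactly 8 values")
    return struct.pack("<8e", *values)


def apply_fp16x8_packed(fn, packed: bytes) -> bytes:
    # Unpack one 128-bit chunk, apply fn to every lane in double precision,
    # then let struct round each result to fp16 on repack. A real kernel may
    # compute in fp16 or fp32, so results can differ in the last bit.
    lanes = struct.unpack("<8e", packed)
    return struct.pack("<8e", *(fn(v) for v in lanes))
```

For example, packing [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0] gives a 16-byte buffer, and applying relu lane-wise then unpacking yields (0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 3.0). Rounding is visible even for simple inputs: to_fp16(0.1) returns 0.0999755859375, which is why fp16 kernel outputs are usually compared against an fp32 reference with a tolerance rather than exactly.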
embedding
- a kernel similar to torch.nn.functional.embedding, in fp32 & fp16
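An embedding forward pass is a row gather: output row i is row indices[i] of the weight table. The Python sketch below is a CPU reference for that gather plus a simulation of one plausible GPU launch layout. The layout (one thread block per index, threads striding across the embedding dimension) is an assumption, not taken from this repo, and both function names are hypothetical. Since the gather only copies values, the fp32 and fp16 variants perform the same work and differ only in element width, so dtype is ignored here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _check_row(idx: int, num_embeddings: int) -> None:
    # Python lists silently wrap negative indices; torch raises instead
    if not 0 <= idx < num_embeddings:
        raise IndexError(f"index {idx} out of range for {num_embeddings} embeddings")


def embedding_reference(indices: Sequence[int], weight: Sequence[Sequence[float]]) -> Matrix:
    # out[i] = weight[indices[i]]: the core of torch.nn.functional.embedding
    # with padding_idx / max_norm left unset
    out = []
    for idx in indices:
        _check_row(idx, len(weight))
        out.append(list(weight[idx]))
    return out


def embedding_simulated_grid(indices: Sequence[int], weight: Sequence[Sequence[float]],
                             block_dim: int = 4) -> Matrix:
    # Hypothetical launch layout: blockIdx.x selects one index,
    # threadIdx.x walks the embedding dimension with a block-stride loop.
    num_rows, dim = len(indices), len(weight[0])
    out = [[0.0] * dim for _ in range(num_rows)]
    for block_idx in range(num_rows):                    # grid of num_rows blocks
        row = indices[block_idx]                         # token id read once per block
        _check_row(row, len(weight))
        for thread_idx in range(block_dim):              # threads within the block
            for col in range(thread_idx, dim, block_dim):
                out[block_idx][col] = weight[row][col]   # coalesced along col
    return out
```

With weight = [[0.0, 0.1, 0.2], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2]] and indices [2, 0, 2], both functions return [[2.0, 2.1, 2.2], [0.0, 0.1, 0.2], [2.0, 2.1, 2.2]] for any block_dim. The stride loop keeps the kernel correct when the embedding dimension is larger than the block; in this layout, adjacent threads touch adjacent columns of the same row.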