FlashKAN: Grid size-independent computation of Kolmogorov Arnold networks using BSpline bases

June 7, 2024

Check out the demo notebook

In short, using training on the MNIST dataset as an example, we demonstrate that FlashKAN's training and inference speed scales much better with the grid size G than other BSpline-based PyTorch implementations of the Kolmogorov-Arnold linear layer, with no degradation in loss or accuracy. Memory consumption and allocations have not yet been benchmarked and could perhaps be optimized further.
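For context, a straightforward way to evaluate a BSpline-based KAN layer computes every one of the G+k basis functions of each input with the Cox-de Boor recursion. The work per input therefore grows linearly with G, even though at most k+1 of those values are nonzero. The minimal pure-Python sketch below illustrates this. It assumes a uniform grid of G intervals on [a, b], extended by k knots on each side. The names `uniform_knots` and `dense_bspline_basis` are hypothetical and are not taken from FlashKAN or any other library.

```python
from typing import List


def uniform_knots(a: float, b: float, G: int, k: int) -> List[float]:
    """Hypothetical helper: G equal intervals on [a, b], extended by k knots on each side."""
    h = (b - a) / G
    return [a + (j - k) * h for j in range(G + 2 * k + 1)]


def dense_bspline_basis(x: float, t: List[float], k: int) -> List[float]:
    """Evaluate all len(t) - k - 1 (= G + k) degree-k basis functions at x
    with the Cox-de Boor recursion; the work grows linearly with G."""
    # Degree 0: indicator of the half-open knot interval [t[i], t[i+1]).
    B = [1.0 if t[i] <= x < t[i + 1] else 0.0 for i in range(len(t) - 1)]
    # Raise the degree one step at a time by combining neighboring lower-degree bases.
    for d in range(1, k + 1):
        B = [
            (x - t[i]) / (t[i + d] - t[i]) * B[i]
            + (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[i + 1]
            for i in range(len(t) - d - 1)
        ]
    return B


if __name__ == "__main__":
    k = 3  # cubic splines
    for G in (5, 20, 100):
        B = dense_bspline_basis(0.37, uniform_knots(-1.0, 1.0, G, k), k)
        nonzero = sum(1 for v in B if v != 0.0)
        print(f"G={G}: {len(B)} basis values computed, {nonzero} nonzero")
```

With k = 3, the returned list has G + k entries (8, 23 and 103 for the three grids). The number of nonzero entries never exceeds k + 1 = 4, so most of the dense work goes into values that are exactly zero.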

FlashKAN still has $\mathcal{O}(N^2 L (G+k))$ parameters (using the notation from the paper), but we have reduced the training and inference cost to $\mathcal{O}(N^2 L k)$. Because this cost no longer depends on G, FlashKAN is roughly a factor of $(G+k)/k = \mathcal{O}(G)$ faster than an MLP with a similar number of parameters, whose cost scales with its parameter count.
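The cost no longer depends on G because degree-k B-splines have local support: at any input only k+1 basis functions are nonzero. Those k+1 values can be computed directly from the knot interval containing the input, in O(k^2) operations regardless of G. The sketch below illustrates this with de Boor's basis-function algorithm (Algorithm A2.2 in Piegl and Tiller's *The NURBS Book*). It assumes a uniform extended grid and is not FlashKAN's actual implementation. The helper names (`uniform_knots`, `find_span`, `local_basis`, `spline_eval`) are hypothetical.

```python
import bisect
from typing import List


def uniform_knots(a: float, b: float, G: int, k: int) -> List[float]:
    """Hypothetical helper: G equal intervals on [a, b], extended by k knots on each side."""
    h = (b - a) / G
    return [a + (j - k) * h for j in range(G + 2 * k + 1)]


def find_span(x: float, t: List[float], k: int) -> int:
    """Index i with t[i] <= x < t[i+1], clamped to the base grid [t[k], t[G+k]].
    Binary search is O(log G); on a uniform grid it could be a single division."""
    n = len(t) - k - 1  # number of basis functions, G + k
    i = bisect.bisect_right(t, x) - 1
    return min(max(i, k), n - 1)


def local_basis(x: float, t: List[float], k: int, i: int) -> List[float]:
    """The only k+1 degree-k basis functions that can be nonzero on span i,
    [B_{i-k}(x), ..., B_i(x)], via de Boor's algorithm in O(k^2), independent of G."""
    N = [1.0] + [0.0] * k
    left = [0.0] * (k + 1)
    right = [0.0] * (k + 1)
    for j in range(1, k + 1):
        left[j] = x - t[i + 1 - j]
        right[j] = t[i + j] - x
        saved = 0.0
        for r in range(j):
            temp = N[r] / (right[r + 1] + left[j - r])
            N[r] = saved + right[r + 1] * temp
            saved = left[j - r] * temp
        N[j] = saved
    return N


def spline_eval(x: float, t: List[float], k: int, coeffs: List[float]) -> float:
    """phi(x) = sum_j coeffs[j] * B_j(x), touching only k+1 of the G+k coefficients."""
    i = find_span(x, t, k)
    return sum(c * b for c, b in zip(coeffs[i - k : i + 1], local_basis(x, t, k, i)))
```

Because `find_span` uses a binary search, this particular sketch costs O(log G) per input rather than a strict constant. On a uniform grid, the span index can instead be computed as k + floor((x - a) / h). In a layer with N inputs and N outputs, the k+1 basis values of each input can be reused by all N edges leaving it. The per-layer work is then O(N k^2) for the bases plus O(N^2 k) for the dot products, which is one way to arrive at the $\mathcal{O}(N^2 L k)$ figure when N is at least k.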

References