RKAN: Residual Kolmogorov-Arnold Network

October 23, 2025

License: MIT

Overview

Despite their immense success, deep convolutional neural networks (CNNs) can be difficult to optimize and costly to train because they often stack hundreds of layers. Conventional convolutions are fundamentally limited by their linear nature and fixed activation functions, so many layers are needed to learn meaningful patterns in data. Given the sheer size of these networks, this approach is computationally inefficient and carries risks of overfitting or gradient explosion, especially on small datasets. To address this, we introduce a "plug-in" module called the Residual Kolmogorov-Arnold Network (RKAN). The module is highly compact, so it can easily be added to any stage (level) of a traditional deep network, where it learns to integrate supportive polynomial feature transformations into the existing convolutional framework. RKAN offers consistent improvements over baseline models across different vision tasks and widely tested benchmarks, achieving cutting-edge performance on them.

RKAN Stage 4 Visualization

RKAN has been integrated into and successfully tested on standard ResNet, ResNeXt, Wide ResNet (WRN), ResNet-D, ResNeSt, Res2Net, ECA-Net, SENet, GCNet, CBAM, PyramidNet, RegNet, DenseNet, SA-Net, SimAM, and ELA.

On CIFAR-100, Tiny ImageNet, and Food-101, networks are trained from scratch for 200 epochs using stochastic gradient descent (SGD) with a weight decay of 0.0005. On ImageNet-1k, networks are trained for 100 epochs with a weight decay of 0.0001. RandAugment, CutMix with a 50% probability (p = 0.5), and MixUp (α = 0.2) with a 30% probability are used as data augmentation.

On MS COCO, we use Mask R-CNN for object detection and instance segmentation. Networks are trained for 12 epochs starting from weights pre-trained on ImageNet-1k. We use SGD with a weight decay of 0.0001, a batch size of 8, and a base learning rate of 0.01 that decays by a factor of 10 at epochs 9 and 12. For data augmentation, we apply horizontal flipping (p = 0.5), random scaling (≤ 10%), and color jittering.

RKAN blocks are added to the last (fourth) stage of the network. ResNet is the default backbone, and RKAN-ResNet-101 is abbreviated as RKANet-101. RKAN-augmented models are marked with (*).
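
The MS COCO learning-rate schedule described above (base rate 0.01, divided by 10 at epochs 9 and 12) can be sketched as a small step function. This is only an illustration: the function name `coco_step_lr` is hypothetical, and it assumes epochs are counted from 1 and that each drop takes effect at the start of the milestone epoch, which the text does not specify.

```python
def coco_step_lr(epoch, base_lr=0.01, milestones=(9, 12), gamma=0.1):
    """Return the learning rate for a given epoch.

    Assumptions (not confirmed by the repository): epochs are 1-indexed
    and the rate is multiplied by `gamma` from each milestone epoch onward.
    """
    if epoch < 1:
        raise ValueError("epoch is assumed to be 1-indexed")
    # Count how many milestones have been reached so far.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    for epoch in range(1, 13):
        print(f"epoch {epoch:2d}: lr = {coco_step_lr(epoch):.5f}")
```

Under these assumptions, epochs 1 to 8 run at 0.01, epochs 9 to 11 at 0.001, and epoch 12 at 0.0001.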

We also introduce a larger variant of RKAN, RKAN-L, which uses an inverse bottleneck design (with a default bottleneck expansion multiplier of 4). RKANet-101-4×L achieves very competitive performance on CIFAR-100, outperforming all enhancement methods, modern ConvNets, and Vision Transformers. It also outperforms other "plug-in" channel and spatial attention mechanisms on ImageNet and MS COCO, and it can be integrated alongside other attention mechanisms such as SENet, ECA-Net, and SA-Net.

When integrating RKAN into multiple stages, it performs better when RKAN blocks are added only to stages {3, 4}; we use the notation "E" (RKANet-E-101) to indicate this extended version. Integrating RKAN into the first two stages could disrupt the original model's carefully optimized learning process, since low-level features may not benefit from RKAN's highly complex polynomial feature transformations. Note that this multi-stage integration only works with the standard RKAN variant and does not work with RKAN-L variants. More details can be found in our original paper on arXiv.
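
The stage rules above can be expressed as a small configuration check. This is a sketch, not part of the repository: the helper names `rkan_stages` and `check_rkan_config` are hypothetical, the argument names mirror the `mechanisms` and `inv_bottleneck` settings listed under Usage, and using "addition" for stage 3 in the extended variant is an assumption.

```python
def rkan_stages(mechanisms):
    """Return the 1-indexed stages that have an RKAN block enabled."""
    return [i + 1 for i, m in enumerate(mechanisms) if m is not None]


def check_rkan_config(mechanisms, inv_bottleneck):
    """Validate a stage configuration against the rules stated in the text.

    Hypothetical helper: multi-stage integration is described as working
    only with the standard RKAN variant, so it is rejected for RKAN-L.
    """
    stages = rkan_stages(mechanisms)
    if inv_bottleneck and len(stages) > 1:
        raise ValueError("multi-stage RKAN is not supported with RKAN-L")
    return stages


# Default setup: RKAN only in stage 4.
default_cfg = [None, None, None, "addition"]
# Extended ("E") setup: stages 3 and 4 (aggregation for stage 3 is assumed).
extended_cfg = [None, None, "addition", "addition"]

if __name__ == "__main__":
    print(check_rkan_config(default_cfg, inv_bottleneck=True))
    print(check_rkan_config(extended_cfg, inv_bottleneck=False))
```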

RKAN Multi-stages

Usage

All necessary code is included in the repository to run RKAN with different backbone architectures on different datasets.

  1. Clone the repository or download the ZIP file
  2. Run the training.ipynb notebook
  3. Key configuration parameters:
    # Select dataset
    dataset = "cifar_100"  # Options: cifar_100, cifar_10, svhn, tiny_imagenet, food_101, imagenet_1k, coco_detection
    
    # Select model
    model_name = "resnet50"  # See model_configs for all supported models
    
    # RKAN configuration
    reduce_factor = [2, 2, 2, 2]  # Reduce factors for each stage
    mechanisms = [None, None, None, "addition"]  # Aggregation mechanism for each stage, input None to remove RKAN from the stage (added only to stage 4 by default)
    kan_type = "chebyshev"  # Type of KAN convolutions, including chebyshev, rbf, b_spline, jacobi, hermite, etc.
    inv_bottleneck, inv_factor = False, 4  # Turning on inv_bottleneck will use RKAN-L, inv_factor controls the inverse bottleneck expansion multiplier
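
To give a feel for what the default `kan_type = "chebyshev"` refers to, the sketch below evaluates a Chebyshev polynomial expansion for a single scalar input. It is a generic illustration of Chebyshev-based learnable functions, not the repository's convolutional implementation: the function names are hypothetical, and squashing the input with `tanh` so that it lies in [-1, 1] is an assumption borrowed from common Chebyshev KAN practice.

```python
import math


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using the three-term recurrence
    T_0 = 1, T_1 = x, T_{n+1} = 2 x T_n - T_{n-1}."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    basis = [1.0]
    if degree >= 1:
        basis.append(x)
    for _ in range(2, degree + 1):
        basis.append(2.0 * x * basis[-1] - basis[-2])
    return basis


def chebyshev_activation(x, coeffs):
    """Evaluate sum_k coeffs[k] * T_k(tanh(x)).

    `coeffs` stands in for learnable weights; tanh keeps the argument
    inside [-1, 1], where Chebyshev polynomials are bounded (assumption).
    """
    z = math.tanh(x)
    basis = chebyshev_basis(z, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))


if __name__ == "__main__":
    print(chebyshev_basis(0.5, 3))
    print(chebyshev_activation(0.0, [1.0, 2.0, 3.0]))
```

In a KAN-style layer, one such learnable function would replace each fixed activation, with the coefficients trained alongside the rest of the network.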
    

Results

CIFAR-100 (128×128) Results

| Model | Top-1 Accuracy (%) | Throughput (img/s) | Parameters |
|---|---|---|---|
| ResNet-101 | 84.00 | 2,222 | 42.71M |
| ResNet-152 | 84.63 | 1,683 | 58.35M |
| WRN-101-2 | 84.77 | 1,176 | 125.04M |
| ResNet-101-D | 85.09 | 2,126 | 42.72M |
| RKANet-101* | 85.12 | 1,852 | 44.28M |
| ResNeXt-101 | 85.28 | 1,256 | 86.95M |
| RegNetY-32GF | 85.44 | 789 | 141.71M |
| RKANet-E-101* | 85.44 | 1,689 | 44.68M |
| RKANet-101-2×L* | 85.48 | 1,648 | 49.00M |
| RKANet-101-4×L* | 85.66 | 1,412 | 55.30M |
| RKANet-101-6×L* | 85.95 | 1,210 | 61.60M |
| RKANeXt-101* | 86.15 | 1,120 | 88.53M |
| RKAN-RegNetY-32GF* | 87.03 | 701 | 145.27M |
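
The cost of adding RKAN can be read off the CIFAR-100 numbers above. The short calculation below compares RKANet-101 with its ResNet-101 baseline using only the values in the table; the helper name `relative_change` is illustrative.

```python
def relative_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


# Values copied from the CIFAR-100 table above.
resnet101 = {"top1": 84.00, "throughput": 2222, "params_m": 42.71}
rkanet101 = {"top1": 85.12, "throughput": 1852, "params_m": 44.28}

acc_gain = rkanet101["top1"] - resnet101["top1"]
param_overhead = relative_change(rkanet101["params_m"], resnet101["params_m"])
throughput_change = relative_change(rkanet101["throughput"], resnet101["throughput"])

if __name__ == "__main__":
    print(f"Top-1 gain:        {acc_gain:+.2f} points")
    print(f"Parameter change:  {param_overhead:+.2f}%")
    print(f"Throughput change: {throughput_change:+.2f}%")
```

For this pair, the 1.12-point accuracy gain comes with roughly 3.7% more parameters and about 17% lower throughput.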

Tiny ImageNet (160×160) Results

| RKAN Model | Top-1 Accuracy (%) | Baseline Model | Top-1 Accuracy (%) |
|---|---|---|---|
| RKAN-WRN-101-2 | 77.56 | WRN-101-2 | 75.46 |
| RKANeXt-101 | 77.48 | ResNeXt-101 | 75.57 |
| RKANeXt-50 | 75.41 | ResNeXt-50 | 73.56 |
| RKANet-152 | 76.82 | ResNet-152 | 74.88 |
| RKANet-101 | 76.29 | ResNet-101 | 74.51 |
| RKANet-50 | 74.43 | ResNet-50 | 72.85 |
| RKAN-RegNetY-32GF | 77.79 | RegNetY-32GF | 75.90 |
| RKAN-RegNetY-8GF | 77.13 | RegNetY-8GF | 75.58 |
| RKAN-RegNetY-3.2GF | 76.05 | RegNetY-3.2GF | 74.07 |
| RKAN-DenseNet-161 | 75.79 | DenseNet-161 | 74.14 |
| RKAN-DenseNet-201 | 75.12 | DenseNet-201 | 73.10 |

ImageNet (224×224) Results

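| Model | Top-1 Accuracy (%) | Throughput (img/s) | Parameters |
|---|---|---|---|
| RKANet-50-6×L* | 78.91 | 500 | 44.45M |
| RKANet-50-4×L* | 78.80 | 632 | 38.15M |
| RKANet-50-2×L* | 78.65 | 780 | 31.86M |
| RKANet-50* | 78.02 | 943 | 27.14M |
| ResNet-50 | 77.15 | 1,216 | 25.56M |
| RKAN-ELA-L-50* | 78.92 | 505 | 27.98M |
| ELA-L-50 | 78.23 | 578 | 26.40M |
| RKAN-SENet-50* | 78.66 | 779 | 29.65M |
| SENet-50 | 77.68 | 965 | 28.07M |
| RKAN-DenseNet-169* | 78.00 | 770 | 14.89M |
| DenseNet-169 | 77.25 | 843 | 14.15M |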

MS COCO 2017 (640 pixels on shorter side) Results

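| Model | AP (bbox) | AP50 (bbox) | AP (mask) | AP50 (mask) | FPS |
|---|---|---|---|---|---|
| RKANet-50-2×L* | 36.13 | 54.30 | 32.29 | 51.38 | 97.2 |
| RKANet-50* | 35.92 | 54.21 | 32.16 | 51.20 | 105.5 |
| ResNet-50 | 35.59 | 53.58 | 31.94 | 50.79 | 118.2 |
| RKAN-SENet-50* | 36.35 | 54.48 | 32.37 | 51.64 | 82.4 |
| SENet-50 | 35.94 | 54.10 | 32.14 | 51.13 | 90.0 |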

Citation

If you find our work useful, please consider citing our paper:

@article{yu2024rkan,
  title={Residual Kolmogorov-Arnold Network for Enhanced Deep Learning},
  author={Yu, Ray Congrui and Wu, Sherry and Gui, Jiang},
  journal={arXiv preprint arXiv:2410.05500},
  year={2024}
}