TinyNeXt

October 22, 2025


Official PyTorch implementation of "An Efficient Hybrid Vision Transformer for TinyML Applications" (ICCV 2025).

Abstract: To enable the deployment of Vision Transformers on resource-constrained mobile and edge devices, the development of efficient ViT models has attracted significant attention. Researchers achieved remarkable improvements in accuracy and speed by optimizing attention mechanisms and integrating lightweight CNN modules. However, existing designs often overlook runtime overhead from memory-bound operations and the shift in feature characteristics from spatial-dominant to semantic-dominant as networks deepen. This work introduces TinyNeXt, a family of efficient hybrid ViTs for TinyML, featuring Lean Single-Head Self-Attention to minimize memory-bound operations, and a macro design tailored to feature characteristics at different stages. TinyNeXt strikes a better accuracy-speed trade-off across diverse tasks and hardware platforms, outperforming state-of-the-art models of comparable scale. For instance, our TinyNeXt-T achieves a remarkable 71.5% top-1 accuracy with only 1.0M parameters on ImageNet-1K. Furthermore, compared to recent efficient models like MobileViT-XXS and MobileViT-XS, TinyNeXt-S and TinyNeXt-M achieve 3.7%/0.5% higher accuracy, respectively, while running 2.1×/2.6× faster on Nvidia Jetson Nano.



Comparison with SOTA models on ImageNet-1K


Overview of TinyNeXt


Repository Structure

Model Performance

Image Classification Performance (ImageNet-1K)

| Model | Top-1 Accuracy | Parameters | MACs | Latency |
|---|---|---|---|---|
| TinyNeXt-M | 75.3% | 2.3M | 475M | 19.4 ms |
| TinyNeXt-S | 72.7% | 1.3M | 304M | 14.3 ms |
| TinyNeXt-T | 71.5% | 1.0M | 259M | 12.7 ms |

Latency is measured on an NVIDIA Jetson Nano.
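For readers who want to time a model on their own device, the following is a minimal sketch of a latency harness. It is not the authors' benchmarking protocol: the README does not state the batch size, input resolution, or runtime used for the numbers above. The function name `measure_latency_ms` and its parameters (`warmup`, `iters`, `sync`) are hypothetical and chosen only for illustration.

```python
import statistics
import time
from typing import Callable, Optional


def measure_latency_ms(
    fn: Callable[[], object],
    warmup: int = 10,
    iters: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of one call to `fn`, in milliseconds.

    `fn` is any zero-argument callable (hypothetical stand-in for a forward pass).
    `sync`, if given, is called after each run to wait for asynchronous device work.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")

    # Warm-up calls let caches, lazy initialisation and clock scaling settle.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # make sure queued device work has finished before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional scheduler stalls than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a single forward pass of the model under test.
    latency = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"median latency: {latency:.3f} ms")
```

With a PyTorch model on a CUDA device, `fn` would typically wrap `model(x)` inside `torch.no_grad()`, and `sync` would be `torch.cuda.synchronize`, since CUDA kernels are launched asynchronously and the timer would otherwise stop too early.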

Object Detection Performance Based on SSDLite (MS-COCO 2017)

| Backbone | AP | AP50 | AP75 | Parameters |
|---|---|---|---|---|
| TinyNeXt-S | 22.4 | 37.9 | 22.7 | 2.3M |
| TinyNeXt-M | 25.0 | 41.1 | 25.4 | 3.3M |

Semantic Segmentation Performance Based on DeepLabv3 (Pascal VOC 2012)

| Backbone | Parameters | FLOPs | mIoU |
|---|---|---|---|
| TinyNeXt-S | 2.3M | 3.5G | 75.5 |
| TinyNeXt-M | 3.3M | 5.1G | 76.9 |

Acknowledgements

We thank the following repositories, among others, for their assistance with our research:

Citation

If you find this work helpful, please consider citing:

@inproceedings{tinynext_iccv2025,
    author    = {Zeng, Fanhong and Li, Huanan and Guan, Juntao and Fan, Rui and Wu, Tong and Wang, Xilong and Lai, Rui},
    title     = {An Efficient Hybrid Vision Transformer for TinyML Applications},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {19914--19924}
}