Awesome Edge AI

June 10, 2026

From TinyML to Cognitive Edge Computing
Curated resources on data, model, system optimization + Large Models (LLMs/VLMs), Agents & On-Device AI (2016–2026)

Maintainers / Survey Authors

Xubin Wang¹,²,³
Weijia Jia²,³,*

¹ Hong Kong Baptist University
² Beijing Normal University
³ Beijing Normal-Hong Kong Baptist University
* Corresponding author

License

Core Survey (2025, v2) + Dedicated Repository

Paper: Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment (arXiv:2501.03265, v2 Nov 2025)
Focused literature map, reproducibility artifacts & benchmarks: cognitive-edge-llm-agent-survey (the official companion repo for the new survey)

Legacy Survey (v1) — retained here for historical completeness
This repository continues to host and reference the original survey:

Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies

Legacy citation (for the data-model-system triad paper):

@article{wang2025optimizing,
  title={Optimizing edge AI: A comprehensive survey on data, model, and system strategies},
  author={Wang, Xubin and Jia, Weijia},
  journal={arXiv e-prints},
  pages={arXiv--2501},
  year={2025}
}

Abstract

This living repository curates the most important advances in Edge AI, spanning:

Classic data / model / system optimization for tiny deep learning (CNNs, RNNs, efficient architectures).
The new frontier: on-device / edge Large Language Models (LLMs), Vision-Language Models (VLMs), Small Language Models (SLMs), efficient inference, quantization, speculative decoding, KV-cache optimization, on-device training, and AI Agents that run with tool use on phones, microcontrollers, and NPUs.

It serves researchers, engineers, and students who want to deploy real intelligence at the extreme edge (KB–few GB memory, mW–few W power). The list is actively maintained and regularly enriched with 2023–2026 literature, frameworks, benchmarks, and hardware.

Cite the main survey (v2):

@article{wang2025cognitive,
  title={Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment},
  author={Wang, Xubin and Li, Qing and Jia, Weijia},
  journal={arXiv preprint arXiv:2501.03265},
  year={2025}
}

New: Unified Architecture & Cognitive Edge (2023–2026)

Master Overview Figure — the Cognitive Edge Computing unified stack (Fig 1):

Cognitive Edge AI Architecture

Five-layer architecture: Hardware (NPU/GPU/MCU) → Runtimes (llama.cpp, MLC-LLM, ExecuTorch) → Model Efficiency (Quantization, Pruning, KD) → Agentic/Cognitive (LLM Agents, RAG, Planning) → Applications (Healthcare, Smart Home, Autonomous) — with cross-cutting concerns (security, energy, benchmarks, data pipeline, feedback loops, networking, standardization) on the left. See the full Modern Era section below for detailed 2023–2026 literature and supporting figures.

New: Unified Architecture & Cognitive Edge (2023–2026)
New: Federated Learning on Edge
New: TinyML & Microcontroller AI
New: Edge AI Security & Privacy
New: On-Device Training & Personalization
New: Multimodal & Embodied Edge AI
New: Real-World Applications & Case Studies
Historical Foundations (Legacy v1 Content — fully retained)
Contributing

New: Federated Learning on Edge

Federated Learning (FL) enables collaborative model training across distributed edge devices without centralizing raw data, crucial for privacy-preserving edge AI. Recent advances combine FL with LLMs, PEFT, and heterogeneous edge hardware.
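As a minimal sketch of the aggregation step behind this idea, the snippet below averages client model parameters weighted by local sample count, in the style of FedAvg (the first entry in the table that follows). It assumes each device returns a flattened parameter vector; the function name `fedavg`, the three devices, and their values are illustrative rather than taken from any listed paper.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one model shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * n / total
    return averaged


# Three hypothetical devices, each holding a flattened 2-parameter model.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [10, 30, 60])
print(global_model)
```

Only the parameter vectors and sample counts leave each device, so the raw data stays local. Real systems add client sampling, multiple local epochs, and compression on top of this step.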

Federated Learning on Edge

Federated Learning Foundations & Systems

Title & Basic Information	Affiliation	Code
Communication-Efficient Learning of Deep Networks from Decentralized Data (FedAvg) (AISTATS 2017)	Google	--
Federated Learning: Strategies for Improving Communication Efficiency (arXiv 2016)	Google	--
Federated Optimization in Heterogeneous Networks (FedProx) (MLSys 2020)	Carnegie Mellon University	Code
SCAFFOLD: Stochastic Controlled Averaging for Federated Learning (ICML 2020)	EPFL	--
Adaptive Federated Optimization (ICLR 2021)	Google Research	--
Federated Learning with Matched Averaging (FedMA) (ICLR 2020)	IBM Research	--

Federated Learning for LLMs & Edge Models

Title & Basic Information	Affiliation	Code
Federated Learning of Large Language Models via Parameter-Efficient Tuning (arXiv 2023)	Zhejiang University	--
FedPETuning: Federated Parameter-Efficient Tuning for Large Language Models (arXiv 2023)	--	--
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (ICML 2024)	--	--
DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices (arXiv 2026)	--	--
FedGen-Edge: Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge (arXiv 2025)	--	--
Federated Black-box Prompt Tuning System for Large Language Models on the Edge (MobiCom 2024)	--	--
Towards Federated Learning on the Edge: A Survey of Systems, Challenges, and Opportunities (ACM CSUR 2024)	--	--
Split Learning for Distributed Deep Neural Networks (arXiv 2018)	--	--
Efficient Split Learning for Collaborative Edge AI (IEEE IoTJ 2023)	--	--

New: TinyML & Microcontroller AI

TinyML pushes AI inference to ultra-low-power microcontrollers (MCUs) with KB-level memory and mW-level power budgets. The field has rapidly evolved from basic CNNs to on-device Transformers and small language models.
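To make the KB-level memory constraint concrete, here is a rough feasibility check. It assumes the common simplification that weights are stored in flash and peak activations must fit in SRAM; the `McuBudget` type, the 1 MB / 256 KB budget, and the model sizes are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class McuBudget:
    flash_bytes: int  # read-only storage, assumed to hold the weights
    sram_bytes: int   # working memory, assumed to hold peak activations


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Storage needed for the weights, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


def fits(num_params: int, bits_per_weight: int,
         peak_activation_bytes: int, budget: McuBudget) -> bool:
    """True if weights fit in flash and peak activations fit in SRAM."""
    return (weight_bytes(num_params, bits_per_weight) <= budget.flash_bytes
            and peak_activation_bytes <= budget.sram_bytes)


# Hypothetical MCU: 1 MB flash, 256 KB SRAM.
budget = McuBudget(flash_bytes=1024 * 1024, sram_bytes=256 * 1024)

# Hypothetical 500k-parameter model with 200 KB peak activations.
fp32_ok = fits(500_000, 32, 200 * 1024, budget)
int8_ok = fits(500_000, 8, 200 * 1024, budget)
print(fp32_ok, int8_ok)
```

Under these assumptions the same model exceeds the flash budget at 32 bits per weight but fits at 8 bits, which is one reason quantization appears throughout the TinyML literature.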

TinyML Ecosystem

Foundational TinyML Papers

Title & Basic Information	Affiliation	Code
MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS 2020)	MIT HAN Lab	Code
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning (NeurIPS 2021)	MIT HAN Lab	Code
MCUNetV3: On-Device Training Under 256KB Memory (NeurIPS 2022)	MIT HAN Lab	Code
TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning (NeurIPS 2020)	MIT HAN Lab	Code
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers (MLSys 2021)	Arm Research	--
TinyML: Current Progress and Future Directions (arXiv 2020)	Harvard / MIT	--
TinyEngine: Efficient Training and Inference on Microcontrollers (2023)	MIT HAN Lab	Code

TinyML Frameworks & Tools

TensorFlow Lite for Microcontrollers (TFLM) — Google's framework for MCU inference.
CMSIS-NN — ARM's optimized NN kernels for Cortex-M.
Edge Impulse — End-to-end TinyML development platform.
Arduino Edge / TinyML Kit — Arduino's TinyML hardware + software.
microTVM — Apache TVM for microcontrollers.
Neuton TinyML — AutoML for ultra-tiny models.

New: Edge AI Security & Privacy

As edge devices handle increasingly sensitive data with on-device LLMs, security and privacy have become paramount. Key topics include adversarial robustness, model extraction defense, differential privacy, and TEE-based secure inference.
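One of the topics named above, differential privacy, is often applied to training by clipping each example's gradient and adding Gaussian noise before averaging. The sketch below shows only that clip-and-noise step, in the spirit of "Deep Learning with Differential Privacy" listed further down. The function names and the `max_norm` and `noise_multiplier` values are illustrative assumptions, and the sketch does not track the resulting privacy budget.

```python
import math
import random
from typing import List, Sequence


def clip_to_norm(vec: Sequence[float], max_norm: float) -> List[float]:
    """Scale a vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def private_mean_gradient(per_example_grads: Sequence[Sequence[float]],
                          max_norm: float,
                          noise_multiplier: float,
                          rng: random.Random) -> List[float]:
    """Clip each gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scales with the clip bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]


rng = random.Random(0)  # fixed seed so the example is repeatable
grads = [[3.0, 4.0], [0.0, 0.5]]  # hypothetical per-example gradients
update = private_mean_gradient(grads, max_norm=1.0,
                               noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any single example can influence the update, and the noise is calibrated to that bound. Choosing `noise_multiplier` for a target privacy guarantee requires a separate accounting step that is out of scope here.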

Security & Adversarial Robustness

Title & Basic Information	Affiliation	Code
Towards Deep Learning Models Resistant to Adversarial Attacks (ICLR 2018)	MIT	--
Adversarial Examples for Semantic Segmentation and Object Detection (ICCV 2017)	Johns Hopkins	--
Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey (IEEE Access 2018)	--	--
Differential Privacy: A Survey of Results (TAMC 2008)	Microsoft Research	--
Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators (CVE-2025-66425, arXiv 2026)	--	--
Competition for Attention Predicts Good-to-Bad Tipping in Edge AI (arXiv 2026)	--	--
Integer-Arithmetic-Only Certified Robustness for Quantized Neural Networks (ICCV 2021)	USC	--

Privacy-Preserving Edge AI

Title & Basic Information	Affiliation	Code
Deep Learning with Differential Privacy (CCS 2016)	Google	--
Federated Learning with Differential Privacy: Algorithms and Performance Analysis (IEEE TIFS 2020)	--	--
DPFinLLM: Privacy-Enhanced Lightweight LLM for On-Device Financial Applications (arXiv 2025)	--	--
SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical LLM Deployment (IEEE ICEdge 2025)	--	--
On-Device Generative AI for GDPR-Compliant Visual Monitoring (arXiv 2026)	--	--
PolyLink: A Blockchain-Based Decentralized Edge AI Platform for LLM Inference (arXiv 2025)	PolyU	Code
Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference (arXiv 2025)	UMD	--
Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust (arXiv 2025)	--	--

New: On-Device Training & Personalization

Beyond inference, enabling on-device training and fine-tuning allows models to adapt to user-specific data and changing environments without privacy leakage.

On-Device Training & Fine-tuning

Title & Basic Information	Affiliation	Code
On-Device Training Under 256KB Memory (NeurIPS 2022)	MIT HAN Lab	Code
TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning (NeurIPS 2020)	MIT HAN Lab	Code
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning (IEEE TCAD 2020)	University of Pittsburgh	--
Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning (USENIX ATC 2021)	PolyU	Code
LoRA: Low-Rank Adaptation of Large Language Models (ICLR 2022)	Microsoft	Code
QLoRA: Efficient Finetuning of Quantized LLMs (NeurIPS 2023)	University of Washington	Code
DoRA: Weight-Decomposed Low-Rank Adaptation (ICML 2024)	--	Code
VeRA: Vector-based Random Matrix Adaptation (ICLR 2024)	--	--
Unlocking the Edge Deployment and On-Device Acceleration of Multi-LoRA Enabled One-for-All Foundational LLM (ACL 2026)	Samsung	--
Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge (FedGen-Edge) (arXiv 2025)	--	--
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing (IEEE TCAS-I 2022)	Tsinghua	--
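
Several entries above (LoRA, QLoRA, DoRA, VeRA) are parameter-efficient fine-tuning methods. As a rough illustration of why low-rank adapters suit memory-constrained devices, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen `d_out x d_in` matrix is augmented by two trainable factors of rank `r`; the layer size and rank are illustrative values, not figures from any listed paper.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-r adapter: B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical projection layer and adapter rank.
d_out, d_in, rank = 4096, 4096, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
reduction = full / lora
print(full, lora, reduction)
```

Fewer trainable parameters also means less gradient and optimizer state to hold in memory, which is usually the binding constraint for on-device fine-tuning.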

New: Multimodal & Embodied Edge AI

Multimodal models (vision + language + audio) and embodied AI systems (robots, AR/VR, autonomous vehicles) running on edge devices represent the next frontier. Key challenges include fusing modalities under tight memory/power budgets.

Edge AI Agent Architecture

Multimodal Models on Edge

Title & Basic Information	Affiliation	Code
MiniCPM-V: A GPT-4V Level MLLM on Your Phone (arXiv 2024)	OpenBMB / THU	Code
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction (arXiv 2026)	OpenBMB	--
MobileVLM: Vision-Language Model for Mobile Devices (arXiv 2023)	Meituan	Code
LLaVA-1.5: Improved Baselines with Visual Instruction Tuning (CVPR 2024)	UW-Madison / Microsoft	Code
Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities (ECCV 2024)	--	--
VaVLM: Toward Efficient Edge-Cloud Video Analytics With Vision-Language Models (IEEE TBC 2025)	--	--
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution (arXiv 2026)	Intel Labs / CMU	--
FastReasonSeg: Fast Reasoning Segmentation for Images and Videos on Edge (arXiv 2025)	--	--

Embodied AI & Robotics on Edge

Title & Basic Information	Affiliation	Code
VLA-Perf: How Fast Can I Run My VLA? Demystifying VLA Inference Performance (arXiv 2026)	Stanford	--
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (arXiv 2023)	Google DeepMind	--
Octo: An Open-Source Generalist Robot Policy (RSS 2024)	UC Berkeley	Code
π0: A Vision-Language-Action Flow Model for General Robot Control (arXiv 2024)	Physical Intelligence	--
An Agentic AI Framework with LLMs and CoT for UAV-Assisted Logistics Scheduling with MEC (arXiv 2026)	NTU	--

New: Real-World Applications & Case Studies

Edge AI is transforming diverse industries. This section highlights real-world deployments and application-focused research.

Healthcare & Wellness

Title & Basic Information	Affiliation	Code
ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge (arXiv 2026)	KAUST	--
A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents (BioCAS 2025)	HKUST	--
Edge2Analysis: A Novel AIoT Platform for Atrial Fibrillation Recognition and Detection (IEEE JBHI 2022)	Sun Yat-Sen	--
Accessible Melanoma Detection Using Smartphones and Mobile Image Analysis (IEEE TMM 2018)	SUTD	--
Edge-Based Compression and Classification for Smart Healthcare Systems (ESWA 2019)	Qatar University	--

Smart Home & IoT

Title & Basic Information	Affiliation	Code
AIoT Smart Home via Autonomous LLM Agents (IEEE IoTJ 2024)	--	--
BitRL-Light: 1-bit LLM Agents with DRL for Energy-Efficient Smart Home Lighting (IPCCC 2025)	--	--
VoiceAlign: A Shimming Layer for Enhancing the Usability of Legacy VUI Systems (IUI 2026)	--	--
Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference (arXiv 2025)	UMD	--

Autonomous Driving & Transportation

Title & Basic Information	Affiliation	Code
Edge Computing for Autonomous Driving: Opportunities and Challenges (Proc. IEEE 2019)	Wayne State	--
Edge Intelligence for Autonomous Driving in 6G Wireless System (IEEE Wireless Comm. 2021)	--	--
LLM-Generated Fault Scenarios for Evaluating Perception-Driven Lane Following in Autonomous Edge Systems (arXiv 2026)	--	--
Efficient On-Device Training for Object Detection at the Edge (CVPRW 2020)	ASU	--

Industrial IoT & Manufacturing

Title & Basic Information	Affiliation	Code
Edge Computing in Industrial Internet of Things: Architecture, Advances and Challenges (IEEE COMST 2020)	--	--
Artificial Intelligence-Driven Mechanism for Edge Computing-Based Industrial Applications (IEEE TII 2019)	--	--
A Reconfigurable Method for Intelligent Manufacturing Based on Industrial Cloud and Edge Intelligence (IEEE IoTJ 2019)	--	--
Rethinking On-Device LLM Reasoning for IoT DDoS Detection (arXiv 2026)	--	--

Edge AI in 6G Networks

Title & Basic Information	Affiliation	Code
6G Needs Agents: Toward Agentic AI-Native Networks for Autonomous Intelligence (arXiv 2026)	--	--
CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of LLM Agents Over Hierarchical Edge (IEEE Comm. Mag. 2026)	--	--
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference (arXiv 2026)	BJTU	--
Fast Collaborative Inference via Distributed Speculative Decoding (TSLT) (arXiv 2025)	--	--
Communication-Efficient Collaborative LLM Inference via Distributed Speculative Decoding (TK-SLT) (WCSP 2025)	--	--
A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks (arXiv 2025)	--	--

Historical Foundations (Legacy v1 Content — fully retained)

All previous papers, tables, and classic Data-Model-System content from the original v1 survey are preserved below without any deletions.

1. Background Knowledge

1.1. Edge Computing

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data generation. This proximity is expected to improve response times, reduce bandwidth consumption, and enable real-time analytics.

Edge Computing Infrastructure

1.2. Edge AI

Edge AI refers to the deployment of artificial intelligence (AI) algorithms and models directly on edge devices, such as mobile phones, Internet of Things (IoT) devices, and smart sensors. By processing data locally, Edge AI enables real-time decision-making, reduces the need for data transmission to remote servers, and enhances data privacy and security. The proliferation of edge devices and the demand for intelligent, low-latency applications have made Edge AI a critical area of research and development.

1.2.1. Blogs About Edge AI

2. Our Survey (To be released)

2.1 The Taxonomy of the Discussed Topics

Framework

Edge AI — Unified Taxonomy (New):

Edge AI Unified Taxonomy

2.2 Edge AI Optimization Triad

We introduce a data-model-system optimization triad for edge deployment.

2.3 The Edge AI Deployment Pipeline

An overview of edge deployment. The figure shows a general pipeline from the three aspects of data, model and system. Note that not all steps are necessary in real applications.

Research Scope Overview — Edge AI:

Research Scope

3. The Data-Model-System Optimization Triad

3.1. Data Optimization

An overview of data optimization operations. Data cleaning improves data quality by removing errors and inconsistencies from the raw data. Feature compression eliminates irrelevant and redundant features. When data is scarce, data augmentation increases the size of the dataset.
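
The three operations can be sketched on a toy sensor table. This is a minimal illustration, assuming rows of numeric readings in which `None` or NaN marks a bad sample; the helper names, the variance threshold, and the noise level are hypothetical choices and do not come from the papers listed below.

```python
import math
import random
from statistics import pvariance


def clean(rows):
    """Data cleaning: drop rows that contain missing or non-finite values."""
    return [r for r in rows
            if all(v is not None and math.isfinite(v) for v in r)]


def select_features(rows, min_variance):
    """Feature compression: keep only columns whose variance exceeds a threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    return keep, [[r[i] for i in keep] for r in rows]


def augment(rows, copies, noise_std, rng):
    """Data augmentation: add jittered copies of every row."""
    out = [list(r) for r in rows]
    for r in rows:
        for _ in range(copies):
            out.append([v + rng.gauss(0.0, noise_std) for v in r])
    return out


raw = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, None],          # missing reading
    [3.0, 5.0, 0.4],
    [4.0, 5.0, float("nan")],  # corrupted reading
]
cleaned = clean(raw)
kept, reduced = select_features(cleaned, min_variance=1e-6)
augmented = augment(reduced, copies=2, noise_std=0.01, rng=random.Random(0))
print(len(cleaned), kept, len(augmented))
```

Here the constant middle column carries no information and is dropped, while jittering triples the number of usable samples. Production pipelines would use task-specific cleaning rules and augmentations.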

3.1.1. Data Cleaning

Title & Basic Information	Affiliation	Code
Active label cleaning for improved dataset quality under resource constraints[J]. Nature communications, 2022.	Microsoft Research Cambridge	Code
Locomotion mode recognition using sensory data with noisy labels: A deep learning approach. IEEE Trans. on Mobile Computing.	Indian Institute of Technology BHU Varanasi	Code
Big data cleaning based on mobile edge computing in industrial sensor-cloud[J]. IEEE Trans. on Industrial Informatics, 2019.	Huaqiao University	--
Federated data cleaning: Collaborative and privacy-preserving data cleaning for edge intelligence[J]. IoTJ, 2020.	Xidian University	--
A data stream cleaning system using edge intelligence for smart city industrial environments[J]. IEEE Trans. on Industrial Informatics, 2021.	Hangzhou Dianzi University	--
Protonn: Compressed and accurate knn for resource-scarce devices[C] ICML, 2017.	Microsoft Research, India	Code
Intelligent data collaboration in heterogeneous-device iot platforms[J]. ACM Trans. on Sensor Networks (TOSN), 2021.	Hangzhou Dianzi University	--

3.1.2. Feature Compression

3.1.2.1. Feature Selection

Title & Basic Information	Affiliation	Code
Accessible melanoma detection using smartphones and mobile image analysis[J]. IEEE Trans. on Multimedia, 2018.	Singapore University of Technology and Design	--
ActID: An efficient framework for activity sensor based user identification[J]. Computers & Security, 2021.	University of Houston-Clear Lake	--
Descriptor Scoring for Feature Selection in Real-Time Visual Slam[C] ICIP, 2020.	Processor Architecture Research Lab, Intel Labs	--
Edge2Analysis: a novel AIoT platform for atrial fibrillation recognition and detection[J]. IEEE Journal of Biomedical and Health Informatics, 2022.	Sun Yat-Sen University	--
Feature selection with limited bit depth mutual information for portable embedded systems[J]. Knowledge-Based Systems, 2020.	CITIC, Universidade da Coruña	--
Seremas: Self-resilient mobile autonomous systems through predictive edge computing[C] SECON, 2021.	University of California, Irvine	--
A covid-19 detection algorithm using deep features and discrete social learning particle swarm optimization for edge computing devices[J]. ACM Trans. on Internet Technology (TOIT), 2021.	Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System	--

3.1.2.2. Feature Extraction

Title & Basic Information	Affiliation	Code
Supervised compression for resource-constrained edge computing systems[C] WACV, 2022.	University of California, Irvine	Code
"Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification." CVPR, 2013.	University of Science and Technology of China	--
Toward intelligent sensing: Intermediate deep feature compression[J]. TIP, 2019.	Nanyang Technological University	--
Selective feature compression for efficient activity recognition inference[C] ICCV, 2021.	Amazon Web Services	--
Video coding for machines: A paradigm of collaborative compression and intelligent analytics[J]. TIP, 2020.	Peking University	--
Communication-computation trade-off in resource-constrained edge inference[J]. IEEE Communications Magazine, 2020.	The Hong Kong Polytechnic University	Code
Edge-based compression and classification for smart healthcare systems: Concept, implementation and evaluation[J]. ESWA, 2019.	Qatar University	--
EFCam: Configuration-adaptive fog-assisted wireless cameras with reinforcement learning[C] SECON, 2021.	Nanyang Technological University	--
Edge computing for smart health: Context-aware approaches, opportunities, and challenges[J]. IEEE Network, 2019.	Qatar University	--
DEEPEYE: A deeply tensor-compressed neural network for video comprehension on terminal devices[J]. TECS, 2020.	Shanghai Jiao Tong University	--
CROWD: crow search and deep learning based feature extractor for classification of Parkinson’s disease[J]. TOIT, 2021.	Taif University	--
"Deep-Learning Based Monitoring Of Fog Layer Dynamics In Wastewater Pumping Stations", Water research 202 (2021): 117482.	Deltares	--
Distributed and efficient object detection via interactions among devices, edge, and cloud[J]. IEEE Trans. on Multimedia, 2019.	Central South University	--

3.1.3. Data Augmentation

Title & Basic Information	Affiliation	Code
An effective litchi detection method based on edge devices in a complex scene[J]. Biosystems Engineering, 2022.	Beihang University	--
Segmentation of drivable road using deep fully convolutional residual network with pyramid pooling[J]. Cognitive Computation, 2018.	Tsinghua University	--
Multiuser physical layer authentication in internet of things with data augmentation[J]. IoTJ, 2019.	University of Electronic Science and Technology of China	--
Data-augmentation-based cellular traffic prediction in edge-computing-enabled smart city[J]. TII, 2020.	University of Electronic Science and Technology of China	--
Towards light-weight and real-time line segment detection[C] AAAI, 2022.	NAVER/LINE Corp.	Code
Intrusion Detection System After Data Augmentation Schemes Based on the VAE and CVAE[J]. IEEE Trans. on Reliability, 2022.	Guangdong Ocean University	--
Magicinput: Training-free multi-lingual finger input system using data augmentation based on mnists[C] ICIP, 2021.	Shanghai Jiao Tong University	--

3.2. Model Optimization

An overview of model optimization operations. Model design creates lightweight models through manual and automated techniques, including architecture selection, parameter tuning, and regularization. Model compression applies techniques such as pruning, quantization, and knowledge distillation to obtain a compact model that requires fewer resources while maintaining high accuracy.
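
Two of the compression techniques named above can be sketched in a few lines. The example assumes a flattened weight list, unstructured magnitude pruning, and symmetric per-tensor int8 quantization. These are simple baseline variants chosen for illustration, not the specific methods of the papers listed below, and the weight values are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.03]  # hypothetical layer weights
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
print(pruned, codes, scale)
```

Pruning removes the three smallest weights, and quantization stores the rest as one byte each plus a single scale factor. In practice both steps are usually followed by fine-tuning or calibration to recover accuracy.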

3.2.1. Model Design

3.2.1.1. Compact Architecture Design

Title & Basic Information	Affiliation	Code
Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv, 2017.	Google Inc.	Code
Mobilenetv2: Inverted residuals and linear bottlenecks[C] CVPR, 2018.	Google Inc.	Code
Searching for mobilenetv3[C] ICCV, 2019.	Google Inc.	Code
Rethinking bottleneck structure for efficient mobile network design[C] ECCV, 2020.	National University of Singapore	Code
Mnasnet: Platform-aware neural architecture search for mobile[C] CVPR, 2019.	Google Brain	Code
Shufflenet: An extremely efficient convolutional neural network for mobile devices[C] CVPR, 2018.	Megvii Inc (Face++)	Code
Shufflenet v2: Practical guidelines for efficient cnn architecture design[C] ECCV, 2018.	Megvii Inc (Face++)	Code
Single path one-shot neural architecture search with uniform sampling[C] ECCV, 2020.	Megvii Inc (Face++)	Code
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[J]. arXiv, 2016.	DeepScale & UC Berkeley	Code
Squeezenext: Hardware-aware neural network design[C] CVPR Workshops. 2018.	UC Berkeley	Code
Ghostnet: More features from cheap operations[C] CVPR, 2020.	Noah’s Ark Lab, Huawei Technologies	Code
Efficientnet: Rethinking model scaling for convolutional neural networks[C] ICML, 2019.	Google Brain	Code
Efficientnetv2: Smaller models and faster training[C] ICML, 2021.	Google Brain	Code
Efficientdet: Scalable and efficient object detection[C] CVPR, 2020.	Google Brain	Code
Condensenet: An efficient densenet using learned group convolutions[C] CVPR, 2018.	Cornell University	Code
Condensenet v2: Sparse feature reactivation for deep networks[C] CVPR, 2021.	Tsinghua University	Code
Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C] ECCV, 2018.	University of Washington	Code
Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network[C] CVPR, 2019.	University of Washington	Code
Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search[C] CVPR, 2019.	UC Berkeley	Code
Fbnetv2: Differentiable neural architecture search for spatial and channel dimensions[C] CVPR, 2020.	UC Berkeley	Code
Fbnetv3: Joint architecture-recipe search using predictor pretraining[C] CVPR, 2021.	Facebook Inc. & UC Berkeley	--
Pelee: A real-time object detection system on mobile devices[C] NeurIPS, 2018.	University of Western Ontario	Code
Going deeper with convolutions[C] CVPR, 2015.	Google Inc.	Code
Batch normalization: Accelerating deep network training by reducing internal covariate shift[C] ICML, 2015.	Google Inc.	Code
Rethinking the inception architecture for computer vision[C] CVPR, 2016.	Google Inc.	Code
Inception-v4, inception-resnet and the impact of residual connections on learning[C] AAAI, 2017.	Google Inc.	Code
Xception: Deep learning with depthwise separable convolutions[C] CVPR, 2017.	Google, Inc.	Code
Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[J]. arXiv, 2021.	Apple	Code
Lite transformer with long-short range attention[J]. arXiv, 2020.	Massachusetts Institute of Technology	Code
Coordinate attention for efficient mobile network design[C] CVPR, 2021.	National University of Singapore	Code
ECA-Net: Efficient channel attention for deep convolutional neural networks[C] CVPR, 2020.	Tianjin University	Code
Sa-net: Shuffle attention for deep convolutional neural networks[C] ICASSP, 2021.	Nanjing University	Code
Triplet Attention: Rethinking the Similarity in Transformers[C] KDD, 2021.	Beihang University	Code
Resnest: Split-attention networks[C] CVPR, 2020.	Meta	Code

3.2.1.2. Neural Architecture Search (NAS)

Title & Basic Information	Affiliation	Code
FTT-NAS: Discovering fault-tolerant convolutional neural architecture[J]. TODAES, 2021.	Tsinghua University	Code
An adaptive neural architecture search design for collaborative edge-cloud computing[J]. IEEE Network, 2021.	Nanjing University of Posts and Telecommunications	--
Binarized neural architecture search for efficient object recognition[J]. IJCV, 2021.	Beihang University	--
Multiobjective reinforcement learning-based neural architecture search for efficient portrait parsing[J]. IEEE Trans. on Cybernetics, 2021.	University of Electronic Science and Technology of China	--
Intermittent-aware neural architecture search[J]. ACM Transactions on Embedded Computing Systems (TECS), 2021.	Academia Sinica and National Taiwan University	Code
Hardcore-nas: Hard constrained differentiable neural architecture search[C] ICML, 2021.	Alibaba Group, Tel Aviv, Israel	Code
MemNAS: Memory-efficient neural architecture search with grow-trim learning[C] CVPR, 2020.	Beijing University of Posts and Telecommunications	--
Pvnas: 3D neural architecture search with point-voxel convolution[J]. TPAMI, 2021.	Massachusetts Institute of Technology	--
Toward tailored models on private aiot devices: Federated direct neural architecture search[J]. IoTJ, 2022.	Northeastern University, Qinhuangdao	--
Automatic design of convolutional neural network architectures under resource constraints[J]. TNNLS, 2021.	Sichuan University	--

3.2.2. Model Compression

3.2.2.1. Model Pruning

Title & Basic Information	Affiliation	Code
Supervised compression for resource-constrained edge computing systems[C] WACV, 2022.	University of California, Irvine	--
Train big, then compress: Rethinking model size for efficient training and inference of transformers[C] ICML, 2020.	UC Berkeley	--
Hrank: Filter pruning using high-rank feature map[C] CVPR, 2020.	Xiamen University	Code
Clip-q: Deep network compression learning by in-parallel pruning-quantization[C] CVPR, 2018.	Simon Fraser University	--
SpArSe: Sparse architecture search for CNNs on resource-constrained microcontrollers[J]. NeurIPS, 2019.	Arm ML Research	--
Deepadapter: A collaborative deep learning framework for the mobile web using context-aware network pruning[C] INFOCOM, 2020.	Beijing University of Posts and Telecommunications	--
SCANN: Synthesis of compact and accurate neural networks[J]. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2021.	Princeton University	--
Directx: Dynamic resource-aware cnn reconfiguration framework for real-time mobile applications[J]. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2020.	George Mason University	--
Pruning deep reinforcement learning for dual user experience and storage lifetime improvement on mobile devices[J]. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2020.	City University of Hong Kong	--
SuperSlash: A unified design space exploration and model compression methodology for design of deep learning accelerators with reduced off-chip memory access volume[J]. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2020.	Information Technology University	--
Penni: Pruned kernel sharing for efficient CNN inference[C] ICML, 2020.	Duke University	Code
Fast operation mode selection for highly efficient iot edge devices[J]. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2019.	Karlsruhe Institute of Technology	--
Efficient on-chip learning for optical neural networks through power-aware sparse zeroth-order optimization[C] AAAI, 2021.	University of Texas at Austin	--
A Fast Post-Training Pruning Framework for Transformers[C] NeurIPS, 2022.	UC Berkeley	Code
Radio frequency fingerprinting on the edge[J]. TMC, 2021.	Northeastern University, Boston	--
Exploring sparsity in image super-resolution for efficient inference[C] CVPR, 2021.	National University of Defense Technology	Code
O3BNN-R: An out-of-order architecture for high-performance and regularized BNN inference[J]. TPDS, 2020.	Boston University	--
Enabling on-device cnn training by self-supervised instance filtering and error map pruning[J]. TCAD, 2020.	University of Pittsburgh	--
Dropnet: Reducing neural network complexity via iterative pruning[C] ICML, 2020.	National University of Singapore	Code
Edgebert: Sentence-level energy optimizations for latency-aware multi-task nlp inference[C] MICRO-54, 2021.	Harvard University	Code
Fusion-catalyzed pruning for optimizing deep learning on intelligent edge devices[J]. TCAD, 2020.	Chinese Academy of Sciences	--
3D CNN acceleration on FPGA using hardware-aware pruning[C] DAC, 2020.	Northeastern University, MA	--
Width & depth pruning for vision transformers[C] AAAI, 2022.	Institute of Computing Technology, Chinese Academy of Sciences	--
Prive-hd: Privacy-preserved hyperdimensional computing[C] DAC, 2020.	UC San Diego	--
NestFL: efficient federated learning through progressive model pruning in heterogeneous edge computing[C] MobiCom, 2022.	Purple Mountain Laboratories, Nanjing	--

3.2.2.2. Parameter Sharing

Title & Basic Information	Affiliation	Code
Deep k-means: Re-training and parameter sharing with harder cluster assignments for compressing deep convolutions[C] ICML, 2018.	Texas A&M University	Code
T-basis: a compact representation for neural networks[C] ICML, 2020.	ETH Zurich	Code
"Soft Weight-Sharing for Neural Network Compression." International Conference on Learning Representations.	University of Amsterdam	Code
ShiftAddNAS: Hardware-inspired search for more accurate and efficient neural networks[C] ICML, 2022.	Rice University	Code
EfficientTDNN: Efficient architecture search for speaker recognition[J]. IEEE/ACM Trans. on Audio, Speech, and Language Processing, 2022.	Tongji University	Code
A generic network compression framework for sequential recommender systems[C] SIGIR, 2020.	University of Science and Technology	Code
Neural architecture search for LF-MMI trained time delay neural networks[J]. IEEE/ACM Trans. on Audio, Speech, and Language Processing, 2022.	The Chinese University of Hong Kong	--
Structured transforms for small-footprint deep learning[J]. NeurIPS, 2015.	Google, New York	--

3.2.2.3. Model Quantization

Title & Basic Information	Affiliation	Code
Fractrain: Fractionally squeezing bit savings both temporally and spatially for efficient dnn training[J]. NeurIPS, 2020.	Rice University	Code
Edgebert: Sentence-level energy optimizations for latency-aware multi-task nlp inference[C] MICRO-54, 2021.	Harvard University	--
Stochastic precision ensemble: self-knowledge distillation for quantized deep neural networks[C] AAAI, 2021.	Seoul National University	--
Q-capsnets: A specialized framework for quantizing capsule networks[C] DAC, 2020.	Technische Universität Wien	Code
Fspinn: An optimization framework for memory-efficient and energy-efficient spiking neural networks[J]. TCAD, 2020.	Technische Universität Wien	--
Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning[C] USENIX ATC, 2021.	Hong Kong Polytechnic University	Code
Hardware-centric automl for mixed-precision quantization[J]. IJCV, 2020.	Massachusetts Institute of Technology	--
An automated quantization framework for high-utilization rram-based pim[J]. TCAD, 2021.	Capital Normal University	--
Exact neural networks from inexact multipliers via fibonacci weight encoding[C] DAC, 2021.	Swiss Federal Institute of Technology Lausanne (EPFL)	--
Integer-arithmetic-only certified robustness for quantized neural networks[C] ICCV, 2021.	University of Southern California	--
Bits-Ensemble: Toward Light-Weight Robust Deep Ensemble by Bits-Sharing[J]. TCAD, 2022.	McGill University	--
Similarity-Aware CNN for Efficient Video Recognition at the Edge[J]. TCAD, 2021.	University of Southampton	--
Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization[C] CVPR, 2022.	Huawei Noah's Ark Lab	--

3.2.2.4. Knowledge Distillation

Title & Basic Information	Affiliation	Code
Be your own teacher: Improve the performance of convolutional neural networks via self distillation[C] ICCV, 2019.	Tsinghua University	Code
Dynabert: Dynamic bert with adaptive width and depth[J]. NeurIPS, 2020.	Huawei Noah’s Ark Lab	Code
Scan: A scalable neural networks framework towards compact and efficient models[J]. NeurIPS, 2019.	Tsinghua University	Code
Content-aware gan compression[C] CVPR, 2021.	Princeton University	--
Stochastic precision ensemble: self-knowledge distillation for quantized deep neural networks[C] AAAI, 2021.	Seoul National University	--
Cross-modal knowledge distillation for vision-to-sensor action recognition[C] ICASSP, 2022.	Texas State University	--
Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery[J]. IEEE Trans. on Geoscience and Remote Sensing, 2021.	Chinese Academy of Sciences	--
On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation[C] SIGIR, 2022.	The University of Queensland	Code
Personalized edge intelligence via federated self-knowledge distillation[J]. TPDS, 2022.	Huazhong University of Science and Technology	--
Mobilefaceswap: A lightweight framework for video face swapping[C] AAAI, 2022.	Baidu Inc.	--
Dynamically pruning segformer for efficient semantic segmentation[C] ICASSP, 2022.	Amazon Halo Health & Wellness	--
CDFKD-MFS: Collaborative Data-Free Knowledge Distillation via Multi-Level Feature Sharing[J]. IEEE Trans. on Multimedia, 2022.	Beijing Institute of Technology	Code
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation[J]. NeurIPS, 2022.	Beijing Institute of Technology	Code
Learning Accurate, Speedy, Lightweight CNNs via Instance-Specific Multi-Teacher Knowledge Distillation for Distracted Driver Posture Identification[J]. IEEE Trans. on Intelligent Transportation Systems, 2022.	Hefei Institutes of Physical Science (HFIPS), Chinese Academy of Sciences	--

3.2.2.5. Low-rank Factorization

Title & Basic Information	Affiliation	Code
Learning low-rank deep neural networks via singular vector orthogonality regularization and singular value sparsification[C] CVPR Workshops, 2020.	Duke University	--
MicroNet: Towards image recognition with extremely low FLOPs[J]. arXiv, 2020.	UC San Diego	--
Locality Sensitive Hash Aggregated Nonlinear Neighborhood Matrix Factorization for Online Sparse Big Data Analysis[J]. ACM/IMS Transactions on Data Science (TDS), 2022.	Hunan University	--

3.3. System Optimization

An overview of system optimization operations. Software optimization involves developing frameworks for lightweight model training and inference, while hardware optimization focuses on accelerating models using hardware-based approaches to improve computational efficiency on edge devices.

3.3.1. Software Optimization

Title & Basic Information	Affiliation	Code
Hidet: Task-mapping programming paradigm for deep learning tensor programs[C] ASPLOS Conference, 2023.	University of Toronto	Code
SparkNoC: An energy-efficiency FPGA-based accelerator using optimized lightweight CNN for edge computing[J]. Journal of Systems Architecture, 2021.	Shanghai Advanced Research Institute, Chinese Academy of Sciences	--
Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices[C] ICCAD, 2016.	Institute of Computing Technology, Chinese Academy of Sciences	--
A unified optimization approach for cnn model inference on integrated gpus[C] ICPP, 2019.	Amazon Web Services	Code
ACG-engine: An inference accelerator for content generative neural networks[C] ICCAD, 2019.	University of Chinese Academy of Sciences	--
Edgeeye: An edge service framework for real-time intelligent video analytics[C] EDGESYS Conference, 2018.	University of Wisconsin-Madison	--
Haq: Hardware-aware automated quantization with mixed precision[C] CVPR, 2019.	Massachusetts Institute of Technology	--
Source compression with bounded dnn perception loss for iot edge computer vision[C] MobiCom, 2019.	Hewlett Packard Labs	--
A lightweight collaborative deep neural network for the mobile web in edge cloud[J]. TMC, 2020.	Beijing University of Posts and Telecommunications	--
Enabling incremental knowledge transfer for object detection at the edge[C] CVPR Workshops, 2020.	Arizona State University	--
DA3: Dynamic Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning[C] CVPR, 2022.	Arizona State University	--
An efficient GPU-accelerated inference engine for binary neural network on mobile phones[J]. Journal of Systems Architecture, 2021.	Sun Yat-sen University	Code
RAPID-RL: A Reconfigurable Architecture with Preemptive-Exits for Efficient Deep-Reinforcement Learning[C] ICRA, 2022.	Purdue University	--
A variational information bottleneck based method to compress sequential networks for human action recognition[C] WACV, 2021.	Indian Institute of Technology Delhi	--
EdgeDRNN: Recurrent neural network accelerator for edge inference[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2020.	University of Zürich and ETH Zürich	--
Structured pruning of recurrent neural networks through neuron selection[J]. Neural Networks, 2020.	University of Electronic Science and Technology of China	--
Dynamically hierarchy revolution: dirnet for compressing recurrent neural network on mobile devices[J]. arXiv, 2018.	Arizona State University	--
High-throughput CNN inference on embedded ARM big.LITTLE multicore processors[J]. TCAD, 2019.	National University of Singapore	--
SCA: a secure CNN accelerator for both training and inference[C] DAC, 2020.	University of Pittsburgh	--
NeuLens: spatial-based dynamic acceleration of convolutional neural networks on edge[C] MobiCom, 2022.	New Jersey Institute of Technology	--
Weightless neural networks for efficient edge inference[C] PACT, 2022.	The University of Texas at Austin	Code
O3BNN-R: An out-of-order architecture for high-performance and regularized BNN inference[J]. TPDS, 2020.	--	--
Blockgnn: Towards efficient gnn acceleration using block-circulant weight matrices[C] DAC, 2021.	Peking University	--
Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs[C] FAST, 2022.	KAIST	--
Achieving full parallelism in LSTM via a unified accelerator design[C] ICCD, 2020.	University of Pittsburgh	--
Pasgcn: An reram-based pim design for gcn with adaptively sparsified graphs[J]. TCAD, 2022.	Shanghai Jiao Tong University	--

3.3.2. Hardware Optimization

Title & Basic Information	Affiliation	Code
Ncpu: An embedded neural cpu architecture on resource-constrained low power devices for real-time end-to-end performance[C] MICRO, 2020.	Northwestern University, Evanston, IL	--
Reduct: Keep it close, keep it cool!: Efficient scaling of dnn inference on multi-core cpus with near-cache compute[C] ISCA, 2021.	ETH Zurich	--
FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks[J]. TPDS, 2021.	Sungkyunkwan University	--
Apgan: Approximate gan for robust low energy learning from imprecise components[J]. IEEE Trans. on Computers, 2019.	University of Central Florida	--
An FPGA overlay for CNN inference with fine-grained flexible parallelism[J]. TACO, 2022.	International Institute of Information Technology	--
Pipelined data-parallel CPU/GPU scheduling for multi-DNN real-time inference[C] RTSS, 2019.	University of California, Riverside	--
Deadline-based scheduling for GPU with preemption support[C] RTSS, 2018.	University of Modena and Reggio Emilia, Modena	--
Energon: Toward Efficient Acceleration of Transformers Using Dynamic Sparse Attention[J]. TCAD, 2022.	Peking University	--
Light-OPU: An FPGA-based overlay processor for lightweight convolutional neural networks[C] FPGA, 2022.	University of California, Los Angeles	--
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs[J]. arXiv, 2022.	Samsung AI Center, Cambridge	--
BitSystolic: A 26.7 TOPS/W 2b~8b NPU with configurable data flows for edge devices[J]. IEEE Trans. on Circuits and Systems I: Regular Papers, 2020.	Duke University	--
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing[J]. IEEE Trans. on Circuits and Systems I: Regular Papers, 2022.	Tsinghua University	--