Awesome Edge AI
June 10, 2026
From TinyML to Cognitive Edge Computing
Curated resources on data, model, and system optimization, plus Large Models (LLMs/VLMs), Agents, and On-Device AI (2016–2026)
Maintainers / Survey Authors
- Xubin Wang (1, 2, 3)
- Weijia Jia (2, 3)*
1 Hong Kong Baptist University
2 Beijing Normal University
3 Beijing Normal-Hong Kong Baptist University
* Corresponding author
Core Survey (2025, v2) + Dedicated Repository
- Paper: Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment (arXiv:2501.03265, v2 Nov 2025)
- Focused literature map, reproducibility artifacts & benchmarks: cognitive-edge-llm-agent-survey (the official companion repo for the new survey)
Legacy Survey (v1) — retained here for historical completeness
This repository continues to host and reference the original survey, which introduced the data-model-system optimization triad. Legacy citation:
@article{wang2025optimizing,
title={Optimizing edge AI: A comprehensive survey on data, model, and system strategies},
author={Wang, Xubin and Jia, Weijia},
journal={arXiv preprint arXiv:2501.03265},
year={2025}
}
Abstract
This living repository curates the most important advances in Edge AI, spanning:
- Classic data / model / system optimization for tiny deep learning (CNNs, RNNs, efficient architectures).
- The new frontier: on-device / edge Large Language Models (LLMs), Vision-Language Models (VLMs), Small Language Models (SLMs), efficient inference, quantization, speculative decoding, KV-cache optimization, on-device training, and AI agents that use tools while running on phones, microcontrollers, and NPUs.
It serves researchers, engineers, and students who want to deploy real intelligence at the extreme edge (memory from kilobytes to a few gigabytes, power from milliwatts to a few watts). The list is actively maintained and regularly updated with 2023–2026 literature, frameworks, benchmarks, and hardware.
Cite the main survey (v2):
@article{wang2025cognitive,
title={Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment},
author={Wang, Xubin and Li, Qing and Jia, Weijia},
journal={arXiv preprint arXiv:2501.03265},
year={2025}
}
New: Unified Architecture & Cognitive Edge (2023–2026)
Master Overview Figure (Fig. 1): the Cognitive Edge Computing unified stack

Five-layer architecture: Hardware (NPU/GPU/MCU) → Runtimes (llama.cpp, MLC-LLM, ExecuTorch) → Model Efficiency (Quantization, Pruning, KD) → Agentic/Cognitive (LLM Agents, RAG, Planning) → Applications (Healthcare, Smart Home, Autonomous) — with cross-cutting concerns (security, energy, benchmarks, data pipeline, feedback loops, networking, standardization) on the left. See the full Modern Era section below for detailed 2023–2026 literature and supporting figures.
Table of Contents
- New: Unified Architecture & Cognitive Edge (2023–2026)
- New: Federated Learning on Edge
- New: TinyML & Microcontroller AI
- New: Edge AI Security & Privacy
- New: On-Device Training & Personalization
- New: Multimodal & Embodied Edge AI
- New: Real-World Applications & Case Studies
- Historical Foundations (Legacy v1 Content — fully retained)
- Contributing
New: Federated Learning on Edge
Federated Learning (FL) trains a shared model across distributed edge devices without centralizing their raw data, which makes it a cornerstone of privacy-preserving edge AI. Recent work combines FL with LLMs, parameter-efficient fine-tuning (PEFT), and heterogeneous edge hardware.
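The core FedAvg loop (first entry in the table below) can be sketched without any framework. This is a minimal illustration, assuming a toy 1-D linear model and full-batch gradient descent on each client; the names `local_update` and `fedavg_round` are hypothetical. Setting `mu > 0` adds the FedProx proximal term (μ/2)·‖w − w_global‖², which limits client drift on heterogeneous data, while `mu = 0` recovers plain FedAvg.

```python
# Minimal FedAvg round with an optional FedProx proximal term.
# Toy setup (hypothetical): each client fits y = w[0] * x + w[1] by squared loss.

def local_update(global_w, data, lr=0.1, epochs=5, mu=0.0):
    """Client-side training: full-batch gradient descent starting from global_w.

    With mu > 0 the local loss gains (mu / 2) * ||w - global_w||^2 (FedProx),
    which keeps local models from drifting far from the global model.
    """
    w = list(global_w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, y in data:
            err = (w[0] * x + w[1]) - y
            grad[0] += 2 * err * x / n
            grad[1] += 2 * err / n
        for i in range(len(w)):
            grad[i] += mu * (w[i] - global_w[i])  # zero when mu == 0 (plain FedAvg)
            w[i] -= lr * grad[i]
    return w

def fedavg_round(global_w, client_datasets, lr=0.1, mu=0.0):
    """Server-side aggregation: average client models weighted by sample count."""
    total = sum(len(d) for d in client_datasets)
    new_w = [0.0] * len(global_w)
    for data in client_datasets:
        w_k = local_update(global_w, data, lr=lr, mu=mu)  # only weights leave the client
        for i, v in enumerate(w_k):
            new_w[i] += len(data) / total * v
    return new_w

# Two clients with different input ranges but the same relation y = 3x + 1.
client_a = [(x, 3 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
client_b = [(x, 3 * x + 1) for x in (0.0, 0.5, 1.0)]
w = [0.0, 0.0]
for _ in range(100):
    w = fedavg_round(w, [client_a, client_b])
print([round(v, 3) for v in w])  # [3.0, 1.0]
```

In each round only model weights cross the network; the `(x, y)` pairs never leave their client. Weighting by sample count mirrors the n_k/n weighting used in FedAvg.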

Federated Learning Foundations & Systems
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Communication-Efficient Learning of Deep Networks from Decentralized Data (FedAvg) (AISTATS 2017) | -- | |
| Federated Learning: Strategies for Improving Communication Efficiency (arXiv 2016) | -- | |
| Federated Optimization in Heterogeneous Networks (FedProx) (MLSys 2020) | CMU | Code |
| SCAFFOLD: Stochastic Controlled Averaging for Federated Learning (ICML 2020) | EPFL | -- |
| Adaptive Federated Optimization (ICLR 2021) | Google Research | -- |
| Federated Learning with Matched Averaging (FedMA) (ICLR 2020) | IBM Research | -- |
Federated Learning for LLMs & Edge Models
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Federated Learning of Large Language Models via Parameter-Efficient Tuning (arXiv 2023) | Zhejiang University | -- |
| FedPETuning: Federated Parameter-Efficient Tuning for Large Language Models (arXiv 2023) | -- | -- |
| Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (ICML 2024) | -- | -- |
| DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices (arXiv 2026) | -- | -- |
| FedGen-Edge: Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge (arXiv 2025) | -- | -- |
| Federated Black-box Prompt Tuning System for Large Language Models on the Edge (MobiCom 2024) | -- | -- |
| Towards Federated Learning on the Edge: A Survey of Systems, Challenges, and Opportunities (ACM CSUR 2024) | -- | -- |
| Split Learning for Distributed Deep Neural Networks (arXiv 2018) | -- | -- |
| Efficient Split Learning for Collaborative Edge AI (IEEE IoTJ 2023) | -- | -- |
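The PEFT-based entries above (e.g., FedPETuning) rest on a simple communication argument: with parameter-efficient fine-tuning, clients exchange small adapter matrices instead of full weight updates. The arithmetic below sketches this for LoRA-style adapters, assuming a hypothetical decoder with 32 layers, hidden size 4096, rank-8 adapters on two square projection matrices per layer, and fp16 payloads. It counts only the adapted matrices, not the whole model, and ignores any compression or encoding overhead.

```python
# Per-round upload size: full update vs. LoRA adapters for the same matrices.
# All model dimensions here are hypothetical illustration values.

def full_update_params(d_in, d_out, n_matrices):
    # A full update sends every entry of each d_out x d_in matrix.
    return d_in * d_out * n_matrices

def lora_update_params(d_in, d_out, rank, n_matrices):
    # LoRA expresses the update as B @ A with B: d_out x rank, A: rank x d_in,
    # so only rank * (d_in + d_out) values per matrix are trained and sent.
    return rank * (d_in + d_out) * n_matrices

def payload_mib(n_params, bytes_per_param=2):  # 2 bytes per value for fp16
    return n_params * bytes_per_param / 2**20

layers, hidden, rank = 32, 4096, 8
n_mats = layers * 2  # e.g. query and value projections in every layer
full = full_update_params(hidden, hidden, n_mats)
lora = lora_update_params(hidden, hidden, rank, n_mats)
print(f"full update: {payload_mib(full):.0f} MiB per client per round")  # 2048 MiB
print(f"LoRA update: {payload_mib(lora):.0f} MiB per client per round")  # 8 MiB
print(f"fraction sent: {lora / full:.2%}")  # 0.39%
```

At rank 8 the adapters are 1/256 of the adapted weights, which is what makes per-round uploads feasible over constrained edge links; the trade-off is that updates are restricted to a low-rank subspace.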
New: TinyML & Microcontroller AI
TinyML pushes AI inference to ultra-low-power microcontrollers (MCUs) with KB-level memory and mW-level power budgets. The field has rapidly evolved from basic CNNs to on-device Transformers and small language models.
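To make the KB-level memory constraint concrete, the sketch below estimates flash (weights) and SRAM (activations) usage for a small int8 CNN. It assumes a plain sequential network executed layer by layer, so a layer's input and output buffers are live at the same time, and it ignores biases, scratch buffers, and runtime overhead. The layer shapes and the 256 KB SRAM / 1 MB flash budget are hypothetical.

```python
# Rough memory estimate for an int8 CNN on a microcontroller.
# Assumptions (hypothetical): weights in flash, activations in SRAM,
# sequential layer-by-layer execution, biases and scratch buffers ignored.

def tensor_bytes(h, w, c, bytes_per_elem=1):  # int8 -> 1 byte per element
    return h * w * c * bytes_per_elem

def conv_weight_bytes(k, c_in, c_out, bytes_per_elem=1):
    return k * k * c_in * c_out * bytes_per_elem

def peak_sram_bytes(feature_maps):
    # While a layer runs, its input and output buffers must both be resident,
    # so peak SRAM is the largest sum over consecutive feature-map pairs.
    return max(tensor_bytes(*a) + tensor_bytes(*b)
               for a, b in zip(feature_maps, feature_maps[1:]))

# Input image followed by the outputs of four stride-2 3x3 conv layers.
feature_maps = [(96, 96, 3), (48, 48, 16), (24, 24, 32), (12, 12, 64), (6, 6, 128)]
channels = [fm[2] for fm in feature_maps]
flash = sum(conv_weight_bytes(3, cin, cout) for cin, cout in zip(channels, channels[1:]))
sram = peak_sram_bytes(feature_maps)

SRAM_BUDGET, FLASH_BUDGET = 256 * 1024, 1024 * 1024
print(f"weights: {flash} B, peak activations: {sram} B")  # weights: 97200 B, peak activations: 64512 B
print("fits:", flash <= FLASH_BUDGET and sram <= SRAM_BUDGET)  # fits: True
```

In this toy network the peak comes from the first two feature maps, while later layers need far less SRAM. This early-layer imbalance is what the patch-based inference of MCUNetV2 (listed below) targets.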

Foundational TinyML Papers
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS 2020) | MIT HAN Lab | Code |
| MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning (NeurIPS 2021) | MIT HAN Lab | Code |
| MCUNetV3: On-Device Training Under 256KB Memory (NeurIPS 2022) | MIT HAN Lab | Code |
| TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning (NeurIPS 2020) | MIT HAN Lab | Code |
| MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers (MLSys 2021) | Arm Research | -- |
| TinyML: Current Progress and Future Directions (arXiv 2020) | Harvard / MIT | -- |
| TinyEngine: Efficient Training and Inference on Microcontrollers (2023) | MIT HAN Lab | Code |
TinyML Frameworks & Tools
- TensorFlow Lite for Microcontrollers (TFLM) — Google's framework for MCU inference.
- CMSIS-NN — ARM's optimized NN kernels for Cortex-M.
- Edge Impulse — End-to-end TinyML development platform.
- Arduino Edge / TinyML Kit — Arduino's TinyML hardware + software.
- microTVM — Apache TVM for microcontrollers.
- Neuton TinyML — AutoML for ultra-tiny models.
New: Edge AI Security & Privacy
As edge devices process increasingly sensitive data with on-device LLMs, security and privacy have become central concerns. Key topics include adversarial robustness, defenses against model extraction, differential privacy, and secure inference in trusted execution environments (TEEs).
Security & Adversarial Robustness
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Towards Deep Learning Models Resistant to Adversarial Attacks (ICLR 2018) | MIT / UC Berkeley | -- |
| Adversarial Examples for Semantic Segmentation and Object Detection (ICCV 2017) | Johns Hopkins | -- |
| Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey (IEEE Access 2018) | -- | -- |
| Differential Privacy: A Survey of Results (TAMC 2008) | Microsoft Research | -- |
| Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators (CVE-2025-66425, arXiv 2026) | -- | -- |
| Competition for Attention Predicts Good-to-Bad Tipping in Edge AI (arXiv 2026) | -- | -- |
| Integer-Arithmetic-Only Certified Robustness for Quantized Neural Networks (ICCV 2021) | USC | -- |
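The first entry in the table above (Madry et al.) frames robustness in terms of projected gradient descent (PGD) attacks. The sketch below runs an L∞ PGD attack against a toy logistic-regression classifier, where the input gradient has a closed form. It is a minimal illustration under those assumptions: the weights are hypothetical, and it omits the random start used in the paper and any clipping to a valid input range.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def input_gradient(w, b, x, y):
    # For p = sigmoid(w.x + b) and binary cross-entropy with label y,
    # the gradient of the loss with respect to the INPUT x is (p - y) * w.
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return [(p - y) * wi for wi in w]

def pgd_linf(w, b, x, y, eps=0.3, alpha=0.1, steps=10):
    """Maximize the loss within the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        # Signed-gradient ascent step ...
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # ... then project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

w, b = [2.0, -1.0], 0.0   # hypothetical trained weights
x, y = [0.2, 0.1], 1      # clean input, correctly classified as class 1
x_adv = pgd_linf(w, b, x, y)
print(logit(w, b, x) > 0, logit(w, b, x_adv) > 0)  # True False
```

Adversarial training, as proposed in that paper, would use such `x_adv` points as the inner maximization step and train the model on them.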
Privacy-Preserving Edge AI
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Deep Learning with Differential Privacy (CCS 2016) | -- | |
| Federated Learning with Differential Privacy: Algorithms and Performance Analysis (IEEE TIFS 2020) | -- | -- |
| DPFinLLM: Privacy-Enhanced Lightweight LLM for On-Device Financial Applications (arXiv 2025) | -- | -- |
| SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical LLM Deployment (IEEE ICEdge 2025) | -- | -- |
| On-Device Generative AI for GDPR-Compliant Visual Monitoring (arXiv 2026) | -- | -- |
| PolyLink: A Blockchain-Based Decentralized Edge AI Platform for LLM Inference (arXiv 2025) | PolyU | Code |
| Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference (arXiv 2025) | UMD | -- |
| Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust (arXiv 2025) | -- | -- |
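The DP-SGD recipe from "Deep Learning with Differential Privacy" (Abadi et al., listed above) changes an SGD step in two ways: each per-example gradient is clipped to a fixed L2 norm C, and Gaussian noise scaled to C is added to the sum before averaging. A minimal pure-Python sketch of one step follows. The function names are hypothetical, and it omits the privacy accounting (the paper's moments accountant) that converts the noise multiplier and step count into an (ε, δ) guarantee.

```python
import math
import random

def clip_by_l2_norm(grad, max_norm):
    # Scale the vector down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a batch of per-example gradients."""
    n = len(per_example_grads)
    summed = [0.0] * len(params)
    for g in per_example_grads:
        for i, gi in enumerate(clip_by_l2_norm(g, clip_norm)):
            summed[i] += gi
    # Gaussian noise with std = noise_multiplier * clip_norm, added to the sum.
    noisy_mean = [(s + rng.gauss(0.0, noise_multiplier * clip_norm)) / n for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
# With noise_multiplier=0 only clipping acts: mean of [0.6, 0.8] and [0.3, 0.4].
print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, clip_norm=1.0, noise_multiplier=0.0, rng=rng))  # approximately [-0.45, -0.6]
noisy = dp_sgd_step([0.0, 0.0], grads, lr=1.0, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds any single example's influence on the update, which is what lets the Gaussian noise mask it; a larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.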
New: On-Device Training & Personalization
Beyond inference, on-device training and fine-tuning let models adapt to user-specific data and changing environments without sending raw data off the device.
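One way to cut the memory cost of on-device learning, used by TinyTL (listed in the TinyML section above), is to freeze the weights and update only the biases. For a layer y = Wx + b, the weight gradient ∂L/∂W = δxᵀ needs the stored input activation x, whereas the bias gradient ∂L/∂b = δ does not. The sketch below applies bias-only updates to a single frozen linear layer with squared-error loss; the layer, data, and function names are hypothetical, and TinyTL adds further components (such as lite residual modules) that are not shown.

```python
def forward(W, b, x):
    # Frozen linear layer: y = W x + b
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def bias_only_step(W, b, batch, lr=0.25):
    """One gradient step that updates b and leaves W frozen.

    Loss: squared error summed over outputs, averaged over the batch.
    dL/db_i = 2 * (y_hat_i - y_i) needs no input activation, unlike
    dL/dW = 2 * (y_hat - y) x^T, which would require caching x.
    """
    grad_b = [0.0] * len(b)
    for x, y in batch:
        y_hat = forward(W, b, x)
        for i in range(len(b)):
            grad_b[i] += 2.0 * (y_hat[i] - y[i]) / len(batch)
    return [bi - lr * gi for bi, gi in zip(b, grad_b)]

# Pretrained weights stay fixed; the user's data is offset by (+0.5, -1.0)
# relative to what the pretrained layer predicts.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
user_batch = [([1.0, 2.0], [1.5, 1.0]), ([0.0, 1.0], [0.5, 0.0])]
for _ in range(50):
    b = bias_only_step(W, b, user_batch)
print([round(v, 3) for v in b])  # [0.5, -1.0]
```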
On-Device Training & Fine-tuning
New: Multimodal & Embodied Edge AI
Multimodal models (vision + language + audio) and embodied AI systems (robots, AR/VR, autonomous vehicles) running on edge devices represent the next frontier. Key challenges include fusing modalities under tight memory/power budgets.

Multimodal Models on Edge
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| MiniCPM-V: A GPT-4V Level MLLM on Your Phone (arXiv 2024) | OpenBMB / THU | Code |
| MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction (arXiv 2026) | OpenBMB | -- |
| MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices (arXiv 2023) | Meituan | Code |
| LLaVA-1.5: Improved Baselines with Visual Instruction Tuning (CVPR 2024) | UW-Madison / Microsoft | Code |
| Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities (ECCV 2024) | -- | -- |
| VaVLM: Toward Efficient Edge-Cloud Video Analytics With Vision-Language Models (IEEE TBC 2025) | -- | -- |
| AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution (arXiv 2026) | Intel Labs / CMU | -- |
| FastReasonSeg: Fast Reasoning Segmentation for Images and Videos on Edge (arXiv 2025) | -- | -- |
Embodied AI & Robotics on Edge
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| VLA-Perf: How Fast Can I Run My VLA? Demystifying VLA Inference Performance (arXiv 2026) | Stanford | -- |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (arXiv 2023) | Google DeepMind | -- |
| Octo: An Open-Source Generalist Robot Policy (RSS 2024) | UC Berkeley | Code |
| π0: A Vision-Language-Action Flow Model for General Robot Control (arXiv 2024) | Physical Intelligence | -- |
| An Agentic AI Framework with LLMs and CoT for UAV-Assisted Logistics Scheduling with MEC (arXiv 2026) | NTU | -- |
New: Real-World Applications & Case Studies
Edge AI is transforming diverse industries. This section highlights real-world deployments and application-focused research.
Healthcare & Wellness
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge (arXiv 2026) | KAUST | -- |
| A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents (BioCAS 2025) | HKUST | -- |
| Edge2Analysis: A Novel AIoT Platform for Atrial Fibrillation Recognition and Detection (IEEE JBHI 2022) | Sun Yat-Sen | -- |
| Accessible Melanoma Detection Using Smartphones and Mobile Image Analysis (IEEE TMM 2018) | SUTD | -- |
| Edge-Based Compression and Classification for Smart Healthcare Systems (ESWA 2019) | Qatar University | -- |
Smart Home & IoT
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| AIoT Smart Home via Autonomous LLM Agents (IEEE IoTJ 2024) | -- | -- |
| BitRL-Light: 1-bit LLM Agents with DRL for Energy-Efficient Smart Home Lighting (IPCCC 2025) | -- | -- |
| VoiceAlign: A Shimming Layer for Enhancing the Usability of Legacy VUI Systems (IUI 2026) | -- | -- |
| Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference (arXiv 2025) | UMD | -- |
Autonomous Driving & Transportation
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Edge Computing for Autonomous Driving: Opportunities and Challenges (Proc. IEEE 2019) | Wayne State | -- |
| Edge Intelligence for Autonomous Driving in 6G Wireless System (IEEE Wireless Comm. 2021) | -- | -- |
| LLM-Generated Fault Scenarios for Evaluating Perception-Driven Lane Following in Autonomous Edge Systems (arXiv 2026) | -- | -- |
| Efficient On-Device Training for Object Detection at the Edge (CVPRW 2020) | ASU | -- |
Industrial IoT & Manufacturing
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| Edge Computing in Industrial Internet of Things: Architecture, Advances and Challenges (IEEE COMST 2020) | -- | -- |
| Artificial Intelligence-Driven Mechanism for Edge Computing-Based Industrial Applications (IEEE TII 2019) | -- | -- |
| A Reconfigurable Method for Intelligent Manufacturing Based on Industrial Cloud and Edge Intelligence (IEEE IoTJ 2019) | -- | -- |
| Rethinking On-Device LLM Reasoning for IoT DDoS Detection (arXiv 2026) | -- | -- |
Edge AI in 6G Networks
| Title & Basic Information | Affiliation | Code |
|---|---|---|
| 6G Needs Agents: Toward Agentic AI-Native Networks for Autonomous Intelligence (arXiv 2026) | -- | -- |
| CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of LLM Agents Over Hierarchical Edge (IEEE Comm. Mag. 2026) | -- | -- |
| GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference (arXiv 2026) | BJTU | -- |
| Fast Collaborative Inference via Distributed Speculative Decoding (TSLT) (arXiv 2025) | -- | -- |
| Communication-Efficient Collaborative LLM Inference via Distributed Speculative Decoding (TK-SLT) (WCSP 2025) | -- | -- |
| A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks (arXiv 2025) | -- | -- |
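Several entries above (GELATO and the distributed speculative decoding papers) build on speculative decoding: a small draft model, for example on the device, proposes several tokens, and a larger target model, for example on the edge server, verifies them. The sketch below shows the greedy variant with toy next-token functions standing in for both models; all names and both toy models are hypothetical, and the sampling-based variant with probabilistic acceptance is not shown. With greedy verification the output is identical to decoding with the target model alone, while the target is consulted once per round rather than once per token.

```python
def greedy_decode(next_token, prompt, n_new):
    """Reference: plain autoregressive greedy decoding with one model."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out

def speculative_decode(target_next, draft_next, prompt, n_new, k=4):
    """Greedy speculative decoding. Returns (tokens, number of verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2) The target checks every proposed position. A real system scores all
        #    k positions in one batched forward pass; here it is a loop.
        rounds += 1
        accepted = []
        for tok in proposal:
            expected = target_next(out + accepted)
            accepted.append(expected)  # the target's token is always the one kept
            if expected != tok:
                break                  # first mismatch: discard the rest of the draft
        else:
            # All k drafts accepted: the same target pass yields one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], rounds

# Hypothetical toy models over a 10-token vocabulary: the draft agrees with
# the target except at positions where the sequence length is a multiple of 3.
def target_next(seq):
    return (seq[-1] * 3 + len(seq)) % 10

def draft_next(seq):
    return 0 if len(seq) % 3 == 0 else target_next(seq)

prompt = [1]
spec, rounds = speculative_decode(target_next, draft_next, prompt, n_new=20)
assert spec == greedy_decode(target_next, prompt, 20)
print(rounds)  # 7
```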
Historical Foundations (Legacy v1 Content — fully retained)
All previous papers, tables, and classic Data-Model-System content from the original v1 survey are preserved below without any deletions.
1. Background Knowledge
1.1. Edge Computing
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data generation. This proximity is expected to improve response times, reduce bandwidth consumption, and enable real-time analytics.

1.2. Edge AI
Edge AI refers to the deployment of artificial intelligence (AI) algorithms and models directly on edge devices, such as mobile phones, Internet of Things (IoT) devices, and smart sensors. By processing data locally, Edge AI enables real-time decision-making, reduces the need for data transmission to remote servers, and enhances data privacy and security. The proliferation of edge devices and the demand for intelligent, low-latency applications have made Edge AI a critical area of research and development.
1.2.1. Blogs About Edge AI
- Edge AI – What is it and how does it Work?
- What is Edge AI?
- Edge AI – Driving Next-Gen AI Applications in 2022
- Edge Intelligence: Edge Computing and Machine Learning (2023 Guide)
- What is Edge AI, and how does it work?
- Edge AI 101- What is it, Why is it important, and How to implement Edge AI?
- Edge AI: The Future of Artificial Intelligence
- What is Edge AI? Machine Learning + IoT
- What is edge AI computing?
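- What Does It Take to Run Machine Learning at the Edge? (in Chinese)
- Edge Computing | Approaches and Caveats for Deploying Deep Learning Models on Mobile Devices (in Chinese)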
2. Our Survey (To be released)
2.1 The Taxonomy of the Discussed Topics

Edge AI — Unified Taxonomy (New):

2.2 Edge AI Optimization Triad
We introduce a data-model-system optimization triad for edge deployment.

2.3 The Edge AI Deployment Pipeline
An overview of edge deployment. The figure shows a general pipeline covering three aspects: data, model, and system. Not every step is necessary in real applications.

Research Scope Overview — Edge AI:

3. The Data-Model-System Optimization Triad
3.1. Data Optimization
An overview of data optimization operations. Data cleaning improves data quality by removing errors and inconsistencies in the raw data. Feature compression is used to eliminate irrelevant and redundant features. For scarce data, data augmentation is employed to increase the data size.
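The three data operations above can be sketched on a small numeric table. The helpers below are hypothetical and use one simple method per step: dropping incomplete and duplicate rows for cleaning, a variance filter for feature selection, and Gaussian jitter for augmentation. Practical pipelines choose methods per data type (for example, geometric transforms for images).

```python
import math
import random
import statistics

def clean(rows):
    """Data cleaning: drop rows with missing values (None or NaN) and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in row):
            continue
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(list(row))
    return out

def select_features(rows, min_variance=1e-8):
    """Feature selection with a variance filter: near-constant columns carry
    almost no information, so they are dropped. Returns (rows, kept column indices)."""
    columns = list(zip(*rows))
    kept = [j for j, col in enumerate(columns) if statistics.pvariance(col) > min_variance]
    return [[row[j] for j in kept] for row in rows], kept

def augment(rows, copies=2, sigma=0.01, seed=0):
    """Data augmentation for scarce numeric features: append `copies` jittered
    versions of each row (additive Gaussian noise)."""
    rng = random.Random(seed)
    out = [list(row) for row in rows]
    for row in rows:
        for _ in range(copies):
            out.append([v + rng.gauss(0.0, sigma) for v in row])
    return out

raw = [
    [1.0, 5.0, 0.2],
    [1.0, 5.0, 0.2],           # duplicate
    [2.0, 5.0, None],          # missing value
    [3.0, 5.0, 0.4],
    [float("nan"), 5.0, 0.1],  # missing value
]
cleaned = clean(raw)                       # [[1.0, 5.0, 0.2], [3.0, 5.0, 0.4]]
selected, kept = select_features(cleaned)  # column 1 is constant, so kept == [0, 2]
augmented = augment(selected)              # 2 original + 4 jittered rows
print(len(augmented))  # 6
```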

3.1.1. Data Cleaning
3.1.2. Feature Compression
3.1.2.1. Feature Selection
3.1.2.2. Feature Extraction
3.1.3. Data Augmentation
3.2. Model Optimization
An overview of model optimization operations. Model design creates lightweight models through manual and automated techniques, including architecture selection, parameter tuning, and regularization. Model compression applies techniques such as pruning, quantization, and knowledge distillation to shrink a model into a compact one that needs fewer resources while maintaining high accuracy.
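Two of the compression techniques named above fit in a few lines on a flat weight list: unstructured magnitude pruning, which zeroes the smallest-magnitude weights, and symmetric per-tensor int8 quantization, which stores each weight as an 8-bit integer plus one shared scale. The function names and example weights are hypothetical; production toolchains add per-channel scales, calibration data, and fine-tuning to recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of weights
    with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: q = round(w / scale),
    with scale = max|w| / 127 so that q stays in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -0.05, 0.01, -1.0, 0.3, 0.002]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.5, 0.0, 0.0, -1.0, 0.3, 0.0]
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step.
print(max(abs(a - b) for a, b in zip(restored, pruned)) <= scale / 2 + 1e-12)  # True
```

Quantizing fp32 weights to int8 cuts weight storage by 4x. Pruning saves memory only when the zeros are stored in a sparse format or the pattern is structured (for example, whole channels removed).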

3.2.1. Model Design
3.2.1.1. Compact Architecture Design
3.2.1.2. Neural Architecture Search (NAS)
3.2.2. Model Compression
3.2.2.1. Model Pruning
3.2.2.2. Parameter Sharing
3.2.2.3. Model Quantization
3.2.2.4. Knowledge Distillation
3.2.2.5. Low-rank Factorization
3.3. System Optimization
An overview of system optimization operations. Software optimization involves developing frameworks for lightweight model training and inference, while hardware optimization focuses on accelerating models using hardware-based approaches to improve computational efficiency on edge devices.
