README_EN.md
March 28, 2026
nndeploy: An Easy-to-Use and High-Performance AI deployment framework
Documentation | Ask DeepWiki | WeChat | Discord
Latest Updates
- [2025/05/29] 🔥 Jointly launched a free inference framework course with the official Huawei Ascend team (Ascend Official | Bilibili Video). Built on nndeploy's internal inference sub-module, the course helps you quickly master core AI inference deployment techniques.
Introduction
nndeploy is an easy-to-use and high-performance AI deployment framework. Built on visual workflows and multi-end inference, it lets developers quickly turn an algorithm repository into an SDK for a specific platform and hardware target, significantly reducing development time. The framework also ships with many deployed AI models, including LLMs, AIGC, face swapping, object detection, and image segmentation, all ready to use out of the box.
Easy to Use
- Visual Workflow: Deploy AI algorithms by dragging nodes, adjust parameters in real time, and see the effect immediately.
- Custom Nodes: Supports custom nodes in Python and C++. Whether you implement preprocessing in Python or write high-performance nodes in C++/CUDA, they integrate seamlessly into the visual workflow.
- One-Click Deployment: Workflows can be exported as JSON and called through the C++/Python APIs on Linux, Windows, macOS, Android, and iOS.
Demos: Building AI Workflow on Desktop | Deployment on Mobile

High Performance
- Parallel Optimization: Supports serial, pipeline-parallel, and task-parallel execution modes.
- Memory Optimization: Zero-copy, memory pooling, memory reuse, and other optimization strategies.
- High-Performance Optimization: Built-in nodes are implemented and optimized with C++, CUDA, Ascend C, and SIMD.
- Multi-End Inference: One workflow runs on multiple inference backends. nndeploy integrates 13 mainstream inference frameworks, covering cloud, desktop, mobile, and edge deployment scenarios.
| ONNXRuntime | TensorRT | OpenVINO | MNN | TNN | ncnn | CoreML | AscendCL | RKNN | SNPE | TVM | PyTorch | nndeploy_inner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

A custom inference framework can also be used completely independently, without relying on any third-party framework.
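The three execution modes named under Parallel Optimization can be illustrated with a small, framework-independent sketch. It does not use nndeploy's API or reflect its internal scheduler; the function names (`run_serial`, `run_pipeline`, `run_task_parallel`) and the toy stages are hypothetical and exist only to show the control flow of each mode.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_SENTINEL = object()  # marks the end of the input stream


def run_serial(items, stages):
    """Serial: finish all stages for one item before starting the next."""
    results = []
    for item in items:
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


def run_pipeline(items, stages):
    """Pipeline parallel: one thread per stage, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _SENTINEL:
                q_out.put(_SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is _SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def run_task_parallel(item, branches):
    """Task parallel: run independent branches on the same input concurrently."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, item) for branch in branches]
        return [f.result() for f in futures]


# Toy stand-ins for preprocessing, inference, and postprocessing.
def preprocess(x):
    return x + 1


def infer(x):
    return x * 2


def postprocess(x):
    return x - 3


stages = [preprocess, infer, postprocess]
```

In the pipeline version, stage `k` can work on item `n` while stage `k+1` is still handling item `n-1`, which is where the throughput gain comes from when stages are of comparable cost. In pure Python, threads only overlap when the stages release the GIL (for example inside native inference calls), so treat this as a structural illustration rather than a benchmark.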
Out-of-the-Box Algorithms
The models already deployed are listed below, backed by more than 100 visual nodes.
| Application Scenario | Available Models | Remarks |
|---|---|---|
| Large Language Models | Qwen2.5, Qwen3 | Supports models with small parameter counts (small "B" sizes) |
| Image Generation | Stable Diffusion 1.5, Stable Diffusion XL, Stable Diffusion 3, HunyuanDiT, etc. | Supports text-to-image, image-to-image, and image inpainting; based on diffusers |
| Face Swapping | deep-live-cam | |
| OCR | PaddleOCR | |
| Object Detection | YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv11, YOLOX | |
| Object Tracking | FairMOT | |
| Image Segmentation | RMBGv1.4, PPMatting, Segment Anything | |
| Classification | ResNet, MobileNet, EfficientNet, PP-LCNet, GhostNet, ShuffleNet, SqueezeNet | |
| API Services | OpenAI, DeepSeek, Moonshot | Supports LLM and AIGC services |
For more, see Detailed List of Deployed Models
Quick Start
- Step 1: Installation

  ```bash
  pip install --upgrade nndeploy
  ```

- Step 2: Launch the Visual Interface

  ```bash
  # Method 1: Command line
  nndeploy-app --port 8000

  # Method 2: Code startup
  cd path/to/nndeploy
  python app.py --port 8000
  ```

  After a successful launch, open http://localhost:8000 to access the workflow editor. There you can drag nodes, adjust parameters, and preview results in real time (what you see is what you get).

- Step 3: Save and Load for Execution

  After building and debugging in the visual interface, click save to export the workflow as a JSON file that encapsulates the whole processing pipeline. You can run it in a production environment in either of two ways:

  - Method 1: Command-line execution (for debugging)

    ```bash
    # Python CLI
    nndeploy-run-json --json_file path/to/workflow.json

    # C++ CLI
    nndeploy_demo_run_json --json_file path/to/workflow.json
    ```

  - Method 2: Load and run in Python/C++ code

    You can integrate the JSON file into an existing Python or C++ project. The following example loads and runs an LLM workflow:

    - Python API to load and run LLM workflow

      ```python
      import nndeploy

      # Create an empty graph and load the workflow exported from the visual editor.
      graph = nndeploy.dag.Graph("")
      graph.remove_in_out_node()
      graph.load_file("path/to/llm_workflow.json")
      graph.init()

      # Feed the prompt through the graph's first input edge.
      input_edge = graph.get_input(0)
      text = nndeploy.tokenizer.TokenizerText()
      text.texts_ = [
          "<|im_start|>user\nPlease introduce NBA superstar Michael Jordan<|im_end|>\n<|im_start|>assistant\n"
      ]
      input_edge.set(text)

      # Run the workflow and read the result from the first output edge.
      status = graph.run()
      output_edge = graph.get_output(0)
      result = output_edge.get_graph_output()

      graph.deinit()
      ```

    - C++ API to load and run LLM workflow

      ```cpp
      // Create an empty graph and load the workflow exported from the visual editor.
      std::shared_ptr<dag::Graph> graph = std::make_shared<dag::Graph>("");
      base::Status status = graph->loadFile("path/to/llm_workflow.json");
      graph->removeInOutNode();
      status = graph->init();

      // Feed the prompt through the graph's first input edge.
      dag::Edge* input = graph->getInput(0);
      tokenizer::TokenizerText* text = new tokenizer::TokenizerText();
      text->texts_ = {
          "<|im_start|>user\nPlease introduce NBA superstar Michael Jordan<|im_end|>\n<|im_start|>assistant\n"};
      input->set(text, false);

      // Run the workflow and read the result from the first output edge.
      status = graph->run();
      dag::Edge* output = graph->getOutput(0);
      tokenizer::TokenizerText* result = output->getGraphOutput<tokenizer::TokenizerText>();

      status = graph->deinit();
      ```

- Requires Python 3.10+. The default package includes the ONNXRuntime and MNN backends. For more inference backends, please use developer mode.
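If you want to trigger the command-line runner from a script (for example in a smoke test), a thin wrapper around `subprocess` is one option. The sketch below is an assumption-laden convenience, not part of nndeploy: `build_run_command` and `run_workflow` are hypothetical helper names, and only the `nndeploy-run-json` executable and its `--json_file` flag come from the commands shown above.

```python
import os
import subprocess


def build_run_command(json_file, executable="nndeploy-run-json"):
    """Return the argument list for running an exported workflow JSON."""
    return [executable, "--json_file", os.fspath(json_file)]


def run_workflow(json_file, executable="nndeploy-run-json"):
    """Run the workflow in a child process and return its exit code.

    Hypothetical helper: how the CLI reports failures is not documented
    here, so the caller decides what a non-zero exit code means.
    """
    completed = subprocess.run(build_run_command(json_file, executable), check=False)
    return completed.returncode
```

Passing `executable="nndeploy_demo_run_json"` would build the equivalent command for the C++ CLI. Using an argument list rather than a shell string avoids quoting problems when the JSON path contains spaces.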
Documentation
- How to Build
- How to Obtain Models
- Visual Workflow
- Production Environment Deployment
- Python API
- Python Custom Node Development Guide
- C++ API
- C++ Custom Node Development Guide
- Deploy New Algorithms
- Integrate New Inference Frameworks
Performance Testing
Test environment: Ubuntu 22.04, i7-12700, RTX3060
- Pipeline parallel acceleration. End-to-end workflow total time for YOLOv11s, serial vs. pipeline parallel:

  | Execution Mode \ Inference Engine | ONNXRuntime | OpenVINO | TensorRT |
  |---|---|---|---|
  | Serial | 54.803 ms | 34.139 ms | 13.213 ms |
  | Pipeline Parallel | 47.283 ms | 29.666 ms | 5.681 ms |
  | Performance Improvement | 13.7% | 13.1% | 57% |

- Task parallel acceleration. End-to-end total time for combined tasks (segmentation RMBGv1.4 + detection YOLOv11s + classification ResNet50), serial vs. task parallel:

  | Execution Mode \ Inference Engine | ONNXRuntime | OpenVINO | TensorRT |
  |---|---|---|---|
  | Serial | 654.315 ms | 489.934 ms | 59.140 ms |
  | Task Parallel | 602.104 ms | 435.181 ms | 51.883 ms |
  | Performance Improvement | 7.98% | 11.2% | 12.2% |
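The Performance Improvement rows appear to be the relative reduction in end-to-end latency, `(serial - parallel) / serial`. That formula is an inference from the numbers, not something the benchmark states; it matches the reported percentages to within rounding. A short sketch for recomputing them from the pipeline-parallel measurements (`improvement_percent` is a hypothetical helper name):

```python
def improvement_percent(serial_ms, parallel_ms):
    """Relative latency reduction of the parallel run, in percent."""
    if serial_ms <= 0:
        raise ValueError("serial_ms must be positive")
    return (serial_ms - parallel_ms) / serial_ms * 100.0


# Latencies (ms) copied from the YOLOv11s pipeline-parallel benchmark.
pipeline_results = {
    "ONNXRuntime": (54.803, 47.283),
    "OpenVINO": (34.139, 29.666),
    "TensorRT": (13.213, 5.681),
}

for engine, (serial_ms, parallel_ms) in pipeline_results.items():
    print(f"{engine}: {improvement_percent(serial_ms, parallel_ms):.1f}%")
```

The same function applies to the task-parallel measurements, with the Task Parallel latency passed as `parallel_ms`.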
Roadmap
Contact Us
- If you love open source and enjoy tinkering, you are welcome to join us, whether to learn or to share better ideas.
- WeChat: Always031856 (add as a friend to join the group discussion; please include the note: nndeploy_name)
Acknowledgements
- Thanks to the following projects: TNN, FastDeploy, opencv, CGraph, tvm, mmdeploy, FlyCV, oneflow, flowgram.ai, deep-live-cam.
- Thanks to HelloGitHub for the recommendation.