ORP: a Lightweight Rust Framework for Building ONNX Runtime Pipelines with ORT
January 31, 2025
Introduction
orp is a lightweight framework designed to simplify the creation and execution of ONNX Runtime pipelines in Rust. Built on top of the ort runtime and the composable crate, it provides a simple way to handle data pre- and post-processing and to chain multiple ONNX models together, while encouraging code reuse and clarity.
Sample Use-Cases
- gline-rs: inference engine for GLiNER models
- gte-rs: text embedding and re-ranking
GPU/NPU Inferences
The execution providers available in ort can be leveraged to perform considerably faster inferences on GPU/NPU hardware.
The first step is to pass the appropriate execution providers in RuntimeParameters. For example:
let rtp = RuntimeParameters::default().with_execution_providers([
    CUDAExecutionProvider::default().build()
]);
The second step is to activate the appropriate features (see the Crate Features section below); otherwise it may silently fall back to CPU. For example:
$ cargo run --features=cuda ...
Please refer to doc/ORT.md for details about execution providers.
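Because the fallback to CPU is silent, it can help to report at startup which accelerator features the binary was actually compiled with. The following is a minimal sketch of that idea using only the standard library. `providers_to_try` is a hypothetical helper, not part of orp or ort, and the sketch assumes the crate being built declares `cuda` and `tensorrt` features of its own.

```rust
// Illustrative sketch: makes a CPU-only build visible at startup.
// `providers_to_try` is a hypothetical helper, not part of orp or ort.

/// Returns the names of the providers whose feature flag is enabled,
/// followed by "cpu", which is always available as a last resort.
fn providers_to_try(flags: &[(&'static str, bool)]) -> Vec<&'static str> {
    let mut names: Vec<&'static str> = flags
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect();
    names.push("cpu");
    names
}

fn main() {
    // `cfg!` is evaluated at compile time against the features of the crate
    // being built (assumed here to declare `cuda` and `tensorrt` features).
    let names = providers_to_try(&[
        ("cuda", cfg!(feature = "cuda")),
        ("tensorrt", cfg!(feature = "tensorrt")),
    ]);
    if names.len() == 1 {
        eprintln!("warning: no accelerator feature enabled, inference will run on CPU");
    }
    println!("providers to try: {:?}", names);
}
```

This only reports which features were compiled in. It does not prove that a provider was successfully registered at runtime, which still depends on the drivers and libraries installed on the machine.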
Crate Features
This crate mirrors the following ort features:
- To allow for dynamic loading of ONNX Runtime libraries: load-dynamic
- To allow for activation of execution providers: cuda, tensorrt, directml, coreml, rocm, openvino, onednn, xnnpack, qnn, cann, nnapi, tvm, acl, armnn, migraphx, vitis, and rknpu.
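As a sketch of how a downstream application might enable these features, the fragment below uses standard Cargo syntax. The wildcard version is a placeholder to be replaced with the release you actually depend on, and the application-level `cuda` feature is an assumption made for illustration.

```toml
[dependencies]
# Placeholder version: pin this to a real release in practice.
orp = { version = "*", features = ["load-dynamic"] }

[features]
# Hypothetical application feature that forwards to orp's `cuda` feature,
# so that `cargo run --features=cuda` enables it end to end.
cuda = ["orp/cuda"]
```

Forwarding the feature this way keeps GPU support opt-in: a default build stays CPU-only and does not require the CUDA toolchain.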
Dependencies
- ort: the ONNX runtime wrapper
- composable: this crate is used to define the pre- and post-processing pipelines by composition of elementary steps, and can in turn be used to combine multiple pipelines.
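To give a feel for what "composition of elementary steps" means, here is a minimal sketch that chains plain closures using only the standard library. It does not use the composable crate or reproduce its API: `chain`, `tokenize`, `fake_model` and `keep_above` are hypothetical names, and `fake_model` merely stands in for an ONNX inference call.

```rust
// Illustrative only: plain functions and closures stand in for the
// pre-processing, inference and post-processing steps.
// None of these names come from orp or composable.

/// Chains two fallible steps into one: the output of `first` feeds `second`,
/// and the first error short-circuits the whole chain.
fn chain<A, B, C>(
    first: impl Fn(A) -> Result<B, String>,
    second: impl Fn(B) -> Result<C, String>,
) -> impl Fn(A) -> Result<C, String> {
    move |input| second(first(input)?)
}

/// Pre-processing: split the text into lowercase, whitespace-separated tokens.
fn tokenize(text: String) -> Result<Vec<String>, String> {
    let tokens: Vec<String> = text.split_whitespace().map(str::to_lowercase).collect();
    if tokens.is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(tokens)
    }
}

/// Stand-in for a model call: the fake score of a token is its length.
fn fake_model(tokens: Vec<String>) -> Result<Vec<(String, f32)>, String> {
    Ok(tokens
        .into_iter()
        .map(|token| {
            let score = token.len() as f32;
            (token, score)
        })
        .collect())
}

/// Post-processing: keep the tokens whose score reaches the threshold.
fn keep_above(threshold: f32) -> impl Fn(Vec<(String, f32)>) -> Result<Vec<String>, String> {
    move |scored| {
        Ok(scored
            .into_iter()
            .filter(|(_, score)| *score >= threshold)
            .map(|(token, _)| token)
            .collect())
    }
}

fn main() {
    // A chained pair is itself a step, so pipelines compose like single steps.
    let pipeline = chain(chain(tokenize, fake_model), keep_above(5.0));
    let kept = pipeline("ONNX Runtime pipelines in Rust".to_string());
    println!("{:?}", kept);
}
```

The point of the design is that the result of `chain` has the same shape as its inputs, so a whole pipeline can be reused as one step of a larger one.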