README.md

May 23, 2025


A minimal, extensible OpenCL, Vulkan (with WGSL), CUDA, NNAPI (Android) and host CPU array manipulation engine / framework written in Rust. This crate provides tools for executing custom array and automatic differentiation operations.

Installation

The latest published version is 0.7.x (April 14th, 2023). A lot has changed since then. The 0.7.x code can be found in the custos-0.7 branch.

Add "custos" as a dependency:

[dependencies]
custos = "0.7.0"

# to disable the default features (cpu, cuda, opencl, static-api, blas, macro) and select your own set of features:
#custos = {version = "0.7.0", default-features=false, features=["opencl", "blas"]}

Available features:

To make specific devices usable, activate the corresponding features:

| Feature | Device | Notes |
| --- | --- | --- |
| cpu | CPU | Uses heap allocations. |
| stack | Stack | Usable in no-std environments as it uses stack-allocated Buffers without requiring alloc or std. Practically only supports the Base module. |
| opencl | OpenCL | Automatically maps unified memory. |
| cuda | CUDA | |
| vulkan | Vulkan | Shaders are written in WGSL. Also supports unified memory. |
| nnapi | NnapiDevice | Lazy module is mandatory. |
| untyped | Untyped | Removes the need for Buffer's generic parameters. (CPU and CUDA only for now) |

custos ships combinable modules. The selected modules determine how operations behave when they are executed. New modules can be added in user code.

use custos::prelude::*; 
// Autograd, Base = Modules
let device = CPU::<Autograd<Base>>::new();
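How a stack of modules can change behaviour may be easier to picture with a small sketch in plain Rust. It does not use custos: `Module`, `Buf`, `allocations_after` and the `Base` and `Cached` structs below are hypothetical stand-ins named after the custos modules they mimic, and the real traits and signatures differ. The sketch only shows the pattern of an outer module wrapping an inner one and either handling a request itself or delegating it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-ins for illustration; not custos' actual types.
type Buf = Rc<RefCell<Vec<f32>>>;

/// A module decides how an output buffer is obtained.
trait Module: Default {
    fn retrieve(&mut self, id: usize, len: usize) -> Buf;
    /// Number of fresh allocations performed so far.
    fn allocations(&self) -> usize;
}

/// Innermost module: allocates on every request.
#[derive(Default)]
struct Base {
    allocations: usize,
}

impl Module for Base {
    fn retrieve(&mut self, _id: usize, len: usize) -> Buf {
        self.allocations += 1;
        Rc::new(RefCell::new(vec![0.0; len]))
    }

    fn allocations(&self) -> usize {
        self.allocations
    }
}

/// Wrapper module: hands out the same buffer again for a known (id, len)
/// pair and delegates to the wrapped module otherwise.
#[derive(Default)]
struct Cached<M> {
    inner: M,
    cache: HashMap<(usize, usize), Buf>,
}

impl<M: Module> Module for Cached<M> {
    fn retrieve(&mut self, id: usize, len: usize) -> Buf {
        let inner = &mut self.inner;
        self.cache
            .entry((id, len))
            .or_insert_with(|| inner.retrieve(id, len))
            .clone()
    }

    fn allocations(&self) -> usize {
        self.inner.allocations()
    }
}

/// Requests the output buffer of one operation (id 0) `runs` times and
/// returns how many allocations the module stack `M` performed.
fn allocations_after<M: Module>(runs: usize) -> usize {
    let mut modules = M::default();
    for _ in 0..runs {
        let out = modules.retrieve(0, 4);
        out.borrow_mut().iter_mut().for_each(|x| *x = 1.0);
    }
    modules.allocations()
}
```

In this sketch, `allocations_after::<Base>(3)` allocates three times, while `allocations_after::<Cached<Base>>(3)` allocates once and reuses that buffer afterwards.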

To make specific modules usable for building a device, activate the corresponding features:

| Feature | Module | Description |
| --- | --- | --- |
| on by default | Base | Default behaviour. |
| autograd | Autograd | Enables running automatic differentiation. |
| cached | Cached | Reuses allocations on demand. |
| fork | Fork | Decides whether the CPU or GPU is faster for an operation, then uses the faster device for subsequent computations. (unified memory devices) |
| lazy | Lazy | Lazy execution of operations and lazy intermediate allocations. Enables support for CUDA graphs. |
| graph | Graph | Adds a graph whose memory usage can be optimized, and fusing of unary operations in combination with Lazy. |
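To illustrate what fusing unary operations means in general, here is a minimal sketch in plain Rust. It is not how custos implements the Graph module; `UnaryOp`, `apply_unfused`, `apply_fused`, `add_one` and `double` are hypothetical names introduced only for this example. The unfused version makes one pass and one intermediate vector per operation, whereas the fused version applies the whole chain to each element in a single pass.

```rust
// Hypothetical representation of a unary element-wise operation.
type UnaryOp = fn(f32) -> f32;

fn add_one(x: f32) -> f32 {
    x + 1.0
}

fn double(x: f32) -> f32 {
    x * 2.0
}

/// Unfused: each operation makes its own pass and produces a new vector.
/// Returns the result and the number of vectors that were allocated.
fn apply_unfused(input: &[f32], ops: &[UnaryOp]) -> (Vec<f32>, usize) {
    let mut allocations = 0;
    let mut current: Option<Vec<f32>> = None;
    for op in ops {
        // Read from the previous intermediate result, or from the input.
        let src: &[f32] = current.as_deref().unwrap_or(input);
        let next: Vec<f32> = src.iter().map(|&x| op(x)).collect();
        allocations += 1;
        current = Some(next);
    }
    (current.unwrap_or_else(|| input.to_vec()), allocations)
}

/// Fused: the chain of operations is applied per element in a single pass,
/// so only the output vector is allocated.
fn apply_fused(input: &[f32], ops: &[UnaryOp]) -> (Vec<f32>, usize) {
    let out = input
        .iter()
        .map(|&x| ops.iter().fold(x, |acc, op| op(acc)))
        .collect();
    (out, 1)
}
```

Both functions compute the same values; they differ only in the number of passes and intermediate allocations.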

Usage of these modules when writing custom operations: modules.md and modules_usage.rs.

If an operation should be affected by a module, that operation must call the corresponding custos code.
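The opt-in nature of modules can be sketched in plain Rust as follows. This is an illustration under assumptions, not custos code: `OpHook`, `Eager`, `double_hooked` and `double_unhooked` are hypothetical, and the `Lazy` struct here only mimics the idea of the custos module of the same name. An operation that goes through the hook can be deferred by a lazy module; an operation that bypasses the hook always runs immediately.

```rust
// Hypothetical stand-ins for illustration; not custos' actual types.
type Op = Box<dyn Fn(&mut [i32])>;

/// Hook that an operation has to call so that a module can intercept it.
trait OpHook {
    fn add_op(&mut self, data: &mut [i32], op: Op);
}

/// Runs every operation immediately.
struct Eager;

impl OpHook for Eager {
    fn add_op(&mut self, data: &mut [i32], op: Op) {
        op(data);
    }
}

/// Records operations and runs them only when `run` is called.
#[derive(Default)]
struct Lazy {
    queue: Vec<Op>,
}

impl OpHook for Lazy {
    fn add_op(&mut self, _data: &mut [i32], op: Op) {
        self.queue.push(op);
    }
}

impl Lazy {
    /// Executes all recorded operations in order and empties the queue.
    fn run(&mut self, data: &mut [i32]) {
        for op in self.queue.drain(..) {
            op(data);
        }
    }
}

/// Goes through the hook, so a lazy module can defer it.
fn double_hooked<M: OpHook>(module: &mut M, data: &mut [i32]) {
    module.add_op(
        data,
        Box::new(|d: &mut [i32]| d.iter_mut().for_each(|x| *x *= 2)),
    );
}

/// Never calls the hook, so it runs immediately regardless of the module.
fn double_unhooked(data: &mut [i32]) {
    data.iter_mut().for_each(|x| *x *= 2);
}
```

With `Lazy`, calling `double_hooked` leaves the data unchanged until `run` is called, while `double_unhooked` modifies it right away.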

Remaining features:

| Feature | Description |
| --- | --- |
| static-api | Enables the creation of Buffers without providing a device. |
| std | Adds standard library support. |
| no-std | For no-std and no-alloc environments; activates the stack feature. |
| macro | Re-exports custos-macro. |
| blas | Adds gemm functions of the system's (selected) BLAS library. |
| half | Adds support for half precision floats. |
| serde | Adds serialization and deserialization support. |
| json | Adds convenience functions for serialization and deserialization to and from JSON. |

Examples

Implement an operation for CPU:

This operation is only affected by the Cached module (and partially Autograd).

use custos::prelude::*;
use std::ops::{Deref, Mul};

pub trait MulBuf<T: Unit, S: Shape = (), D: Device = Self>: Sized + Device {
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S>;
}

impl<Mods, T, S, D> MulBuf<T, S, D> for CPU<Mods>
where
    Mods: Retrieve<Self, T, S> + AddOperation + 'static,
    T: Unit + Mul<Output = T> + Copy,
    S: Shape,
    D: Device + 'static,
    D::Base<T, S>: Deref<Target = [T]>,
{
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S> {
        // add optional caching or graph functionality (add "Cached" or "Graph" module to device)
        let mut out = self.retrieve(lhs.len(), (lhs, rhs)).unwrap(); // unwrap or return error (update trait)

        // add optional lazy operation (add "Lazy" module to device)
        self.add_op((lhs, rhs, &mut out), |(lhs, rhs, out)| {
            for ((lhs, rhs), out) in lhs.iter().zip(rhs.iter()).zip(out) {
                *out = *lhs * *rhs;
            }
            Ok(())
        }).unwrap();

        out
    }
}
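The loop at the core of this operation does not depend on custos. One possible way to test it in isolation is to factor it into a free function over slices, as in the sketch below. `mul_slices` is a hypothetical helper introduced for this example, not part of custos.

```rust
use std::ops::Mul;

/// Element-wise multiplication of two slices into `out`.
/// Like the zipped loop above, it stops at the shortest of the three slices.
fn mul_slices<T: Mul<Output = T> + Copy>(lhs: &[T], rhs: &[T], out: &mut [T]) {
    for ((lhs, rhs), out) in lhs.iter().zip(rhs).zip(out) {
        *out = *lhs * *rhs;
    }
}
```

Because `zip` ends with the shortest iterator, mismatched lengths are silently truncated rather than reported, so a length check before the loop may be worth adding in real code.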

Many more usage examples can be found in the tests and examples folders, as well as in the unary operation file, custos-math and sliced.