Rustane builds on the work of many researchers and open-source projects. This document lists every significant source that informed its architecture, research, and implementation.

| Project | Author | What We Learned |
|---|---|---|
| maderix/ANE | maderix | ANE private API reverse engineering (C++). Dynamic weight pipeline, 1x1 conv discovery, mega-kernel fusion, IOSurface weight staging, INT8 W8A8 training. The foundational reverse-engineering work that everything else builds on. |
| Anemll | Anemll team | ANE inference tricks: Conv2d workaround for matmul, doubled RMSNorm for stability, in-model argmax, ring buffer KV cache, LUT quantization limits, SRAM bandwidth analysis. |
| ANEgpt | Vipul Divyanshu | ANE classifier as 1x1 conv (10.2x speedup over CPU sgemm). Bridge API patterns for _ANEClient. |
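
Three of the sources above (maderix/ANE, Anemll, ANEgpt) rely on the same trick: expressing a linear layer's matmul as a 1x1 convolution, the operation those projects found the ANE runs efficiently. The following is plain CPU Rust that only illustrates why the two are equivalent; it calls no ANE API. The layout is an assumption for illustration: activations are placed in NCHW order with N = 1 and H = 1, so the sequence runs along the spatial width, and the linear weight `[c_out][c_in]` is reused unchanged as a `[c_out][c_in][1][1]` kernel. The names `linear`, `conv1x1_nchw`, and `transpose` are hypothetical.

```rust
/// Linear layer in token-major form: y[s][o] = sum_i x[s][i] * w[o][i].
/// x: [seq][c_in], w: [c_out][c_in], result: [seq][c_out], all row-major and flat.
fn linear(x: &[f32], w: &[f32], seq: usize, c_in: usize, c_out: usize) -> Vec<f32> {
    let mut y = vec![0.0f32; seq * c_out];
    for s in 0..seq {
        for o in 0..c_out {
            let mut acc = 0.0f32;
            for i in 0..c_in {
                acc += x[s * c_in + i] * w[o * c_in + i];
            }
            y[s * c_out + o] = acc;
        }
    }
    y
}

/// 1x1 convolution on an NCHW tensor with N = 1 and H = 1 (assumed layout),
/// so `width` holds the sequence. Kernel: [c_out][c_in][1][1], stored flat.
/// Input: [c_in][1][width]. Output: [c_out][1][width].
fn conv1x1_nchw(input: &[f32], kernel: &[f32], c_in: usize, c_out: usize, width: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; c_out * width];
    for o in 0..c_out {
        for p in 0..width {
            let mut acc = 0.0f32;
            for i in 0..c_in {
                // A 1x1 kernel reads exactly one spatial position: input[i][0][p].
                acc += kernel[o * c_in + i] * input[i * width + p];
            }
            out[o * width + p] = acc;
        }
    }
    out
}

/// Transpose a row-major [rows][cols] buffer into [cols][rows].
fn transpose(a: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut t = vec![0.0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            t[c * rows + r] = a[r * cols + c];
        }
    }
    t
}

fn main() {
    let (seq, c_in, c_out) = (3, 4, 2);
    // Hypothetical token-major activations [seq][c_in] and linear weight [c_out][c_in].
    let x: Vec<f32> = (0..seq * c_in).map(|v| v as f32 * 0.5 - 1.0).collect();
    let w: Vec<f32> = (0..c_out * c_in).map(|v| v as f32 - 3.0).collect();

    let y_linear = linear(&x, &w, seq, c_in, c_out); // [seq][c_out]

    // Move activations to channel-major [c_in][1][seq]; the weight is reused as-is.
    let x_nchw = transpose(&x, seq, c_in);
    let y_conv = conv1x1_nchw(&x_nchw, &w, c_in, c_out, seq); // [c_out][1][seq]

    // Bring the conv output back to token-major and compare with the matmul.
    let y_conv_tok = transpose(&y_conv, c_out, seq);
    assert_eq!(y_linear, y_conv_tok);
    println!("1x1 conv matches matmul: {:?}", y_linear);
}
```

Because a 1x1 kernel touches a single spatial position, each output element is a dot product over channels, i.e. one entry of the matmul. The only real work is the layout change between token-major and channel-major buffers.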

| Project | Author | What We Learned |
|---|---|---|
| ane-infer | thebasedcapital | First Rust + ANE + Metal hybrid prototype. 13 custom Metal shaders, single-command-buffer decode (zero allocations), doEvaluateDirectWithModel chaining, fused FFN mega-kernels (3.6 TFLOPS), IOKit H11ANE kernel access. Benchmarks: 32 tok/s Q8 on M5. Demonstrates that a memory-safe Rust + ANE + Metal stack is viable. |
| ane crate | computer-graphics-tools | Clean Rust FFI to the private AppleNeuralEngine.framework via objc2. GPT-2 inference example. 2,567 LOC across 5 key files. Our base for ane-bridge (we plan to vendor it and extend it for training). |

| Project | Author | What We Learned |
|---|---|---|
| uzu | trymirai | Pure Rust Metal LLM engine. 40+ Metal shaders, fused MLP epilogue, quantized dispatch hierarchy, speculative decoding agent module. Closest to production Rust edge AI on M-series (no ANE). Informs our metal-decode crate design. |
| candle | Hugging Face | Rust ML framework with Metal + CUDA backends. Its hardcoded 3-variant Storage/Device enums make direct ANE integration impractical, but its CUDA backend (via cudarc) is our Jetson deployment path. SafeTensors + GGML quantization support. |

| Project | Author | What We Learned |
|---|---|---|
| RCLI | RunanywhereAI | Swift + MetalRT inference CLI. 658 tok/s MetalRT benchmarks on M-series. Voice/agent pipeline. Useful for comparing our ANE inference tok/s against GPU-only approaches on the same hardware. |
| runanywhere-sdks | RunanywhereAI | Production agent SDKs (iOS/Android). Screen reindexing, /no_think prompting, inference guards. Language-agnostic patterns we'll port to Rust for the agent loop — especially relevant for Jetson drone/sat deployment. |

| Project | Author | What We Learned |
|---|---|---|
| autoresearch | Andrej Karpathy | Autonomous research framework, climbmix-400B dataset, rustbpe tokenizer. Our gpt_karpathy Phase 1 config derives from this. |
| autoresearch-mlx | trevin-creator | MLX port of autoresearch for Apple Silicon. Muon+AdamW optimizer, 241 autonomous experiments, val_bpb 1.664→1.266. Provides our validation baseline and architecture exploration data. |
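
The autoresearch-mlx results above are reported as val_bpb (validation bits per byte). This sketch assumes the common definition: the summed per-token cross-entropy in nats, converted to bits by dividing by ln 2, then divided by the number of UTF-8 bytes the target tokens decode to. Normalizing by bytes rather than tokens keeps results comparable across tokenizers. Skipping tokens that decode to zero bytes (such as special tokens) is an assumption of this sketch, and `bits_per_byte` and `bpb_from_tokens` are hypothetical names, not functions from either repository.

```rust
use std::f64::consts::LN_2;

/// Bits per byte from total cross-entropy in nats and total decoded UTF-8 bytes.
fn bits_per_byte(total_nats: f64, total_bytes: u64) -> f64 {
    assert!(total_bytes > 0, "need at least one byte of validation text");
    total_nats / (LN_2 * total_bytes as f64)
}

/// Aggregate per-token losses (nats) and per-token decoded byte lengths.
/// Assumption: tokens that decode to 0 bytes are excluded from both sums.
fn bpb_from_tokens(token_nats: &[f64], token_bytes: &[u32]) -> f64 {
    assert_eq!(token_nats.len(), token_bytes.len());
    let mut nats = 0.0;
    let mut bytes: u64 = 0;
    for (&loss, &len) in token_nats.iter().zip(token_bytes) {
        if len == 0 {
            continue; // e.g. a special token with no byte representation
        }
        nats += loss;
        bytes += len as u64;
    }
    bits_per_byte(nats, bytes)
}

fn main() {
    // Hypothetical losses: two 2-byte tokens at 2 bits each, plus one 0-byte special token.
    let token_nats = [2.0 * LN_2, 3.1, 2.0 * LN_2];
    let token_bytes = [2u32, 0, 2];
    let bpb = bpb_from_tokens(&token_nats, &token_bytes);
    println!("val_bpb = {bpb:.3}"); // 4 bits over 4 bytes -> 1.000
}
```

Lower is better: the reported drop from 1.664 to 1.266 means the model needs roughly 0.4 fewer bits to encode each byte of validation text.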

| Paper | Authors | Relevance |
|---|---|---|
| Orion (arXiv 2603.06728) | — | ANE training and inference on M4 Max. Trains a 110M-parameter model for 1,000 steps in 22 min. 8.5x speedup from weight-reload optimization. Validates the dynamic weight pipeline approach. |

| Crate / Framework | Use |
|---|---|
| objc2 | Safe Rust Obj-C FFI. All ANE private API calls go through this. |
| objc2-foundation | NSString, NSDictionary, NSData — needed for ANE model loading. |
| objc2-io-surface | IOSurface creation and locking for ANE weight staging. |
| half | f16 type for CPU-side weight manipulation. |
| safetensors | Weight interchange format (MLX ↔ Rustane ↔ candle). |
| MLX | Apple's GPU ML framework. Architecture exploration baseline. |
| Accelerate.framework | vDSP (vectorized f32 ops) + cblas_sgemm (CPU matmul) for non-ANE training ops. |