VoxRT ASR streaming model weights

May 28, 2026 · View on GitHub

Pre-compiled streaming-medium-pc ASR model weights in .vxrt format, packaged for the VoxRT on-device inference runtime.

Pair with one of the consumer libraries:

This repo is a thin distribution layer — the actual files live as GitHub Release attachments per version, not in the git tree. The README below is the index.


What is a .vxrt file?

A self-contained, binary, framework-agnostic model file:

  • Topology + weights encoded in a compact format (no ONNX / PyTorch dependency at consume time).
  • AES-256-GCM encrypted at rest, decrypted on load by the runtime (ADR-0023).
  • Versioned — newer .vxrt files may add encoded features that older runtimes refuse to load; check the version compatibility table below.

You don't need to know the format to use it — feed the bytes to VoxrtAsrStreamingSession (iOS) / VoxrtAsrStreamingEngine (Android) and it Just Works.

Downloads

v0.1.2

FileSizeSHA-256
streaming_medium_pc.vxrt61M0d723e429157a8a8cb58739a1f090574f2f23db311ca7916b43411f5f727c79c

Compatible with: VoxRT/voxrt-asr-ios@v0.1.2, VoxRT/voxrt-asr-android@v0.1.2.

How to use

curl -L -o streaming_medium_pc.vxrt \
  https://github.com/VoxRT/voxrt-asr-models/releases/download/v0.1.2/streaming_medium_pc.vxrt
# verify checksum
echo "0d723e429157a8a8cb58739a1f090574f2f23db311ca7916b43411f5f727c79c  streaming_medium_pc.vxrt" | shasum -a 256 -c

Then bundle (in assets/ for Android, in Xcode resources for iOS) or download at runtime — see the platform repos for code examples.

Provenance + license

The encoded weights are derived from nvidia/stt_en_fastconformer_hybrid_medium_streaming_80ms_pc, released by NVIDIA under the CC-BY-4.0 license. Ported into the .vxrt format by VoxRT's exporter tool — no fine-tuning, no architectural changes — purely a re-packaging into a runtime-native format with fp16 weights and AES-GCM at-rest encryption.

Per CC-BY-4.0, attribution to NVIDIA is required when this model file is redistributed or used in derivative products. The LICENSE file in this repo carries the full upstream notice.

The VoxRT runtime that consumes the file is proprietary, distributed under a separate license — but this model file itself, the upstream weights, is free under CC-BY-4.0 with attribution.

Architecture

  • FastConformer encoder, 16 layers, d_model = 256, 80 ms cache-aware streaming look-ahead
  • Hybrid decoder: RNN-T (Transducer) + CTC heads; consumer picks one at session-create time
  • 1024-piece SentencePiece tokenizer
  • 16 kHz mono PCM input
  • P&C output (capitalisation + punctuation directly from the model — no post-processing layer required)

Versioning

A model release at vX.Y.Z is built to work with VoxRT runtime vX.Y.Z and (typically) all earlier minor versions sharing the same major. Breaking format changes bump the major. See the iOS and Android release notes for the exact compatibility per release.