candle-video
January 9, 2026
Rust library for AI video generation built on the Candle ML framework. High-performance, standalone video generation inference without Python runtime dependencies.
📚 Table of Contents
- What is this?
- Key Features
- Demonstration
- System Requirements
- Installation & Setup
- How to Start Using
- CLI Options
- Supported Model Versions
- Memory Optimization
- Project Structure
- Acknowledgments
- License
✨ What is this?
candle-video is a Rust-native implementation of video generation models, targeting deployment scenarios where startup time, binary size, and memory efficiency matter. It provides inference for state-of-the-art text-to-video models without requiring a Python runtime.
Supported Models
- LTX-Video — Text-to-video generation using DiT (Diffusion Transformer) architecture
  - 2B and 13B parameter variants
  - Standard and distilled versions (0.9.5 – 0.9.8)
  - T5-XXL text encoder with GGUF quantization support
  - 3D VAE for video encoding/decoding
  - Flow Matching scheduler
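
A flow-matching sampler can be pictured as Euler integration of a predicted velocity field from noise toward data. The sketch below is a generic illustration and is not taken from this crate's scheduler.rs: the linear sigma schedule, the names `linear_sigmas`, `euler_step`, and `sample_loop`, and the use of `Vec<f32>` in place of Candle tensors are all assumptions made for the example.

```rust
/// Evenly spaced noise levels from 1.0 (pure noise) down to 0.0 (clean sample).
/// Assumption: real schedulers often shift or warp this schedule.
fn linear_sigmas(steps: usize) -> Vec<f32> {
    assert!(steps > 0, "need at least one step");
    (0..=steps).map(|i| 1.0 - i as f32 / steps as f32).collect()
}

/// One Euler update: x_next = x + (sigma_next - sigma) * v,
/// where `velocity` is the model's prediction at the current noise level.
fn euler_step(sample: &[f32], velocity: &[f32], sigma: f32, sigma_next: f32) -> Vec<f32> {
    let dt = sigma_next - sigma; // negative: we move from noise toward data
    sample.iter().zip(velocity).map(|(x, v)| x + dt * v).collect()
}

/// Runs the whole loop. `model` is a stand-in for the transformer: it receives
/// the current sample and noise level and returns a velocity of the same length.
fn sample_loop<F>(mut x: Vec<f32>, steps: usize, mut model: F) -> Vec<f32>
where
    F: FnMut(&[f32], f32) -> Vec<f32>,
{
    let sigmas = linear_sigmas(steps);
    for pair in sigmas.windows(2) {
        let v = model(&x, pair[0]);
        x = euler_step(&x, &v, pair[0], pair[1]);
    }
    x
}
```

If the velocity is constant and equal to `noise - data`, the loop started at `noise` lands on `data` regardless of the step count, which is a handy sanity check for any scheduler implementation.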
🚀 Key Features
- High Performance — Native Rust with GPU acceleration via CUDA/cuDNN
- Memory Efficient — BF16 inference, VAE tiling/slicing, GGUF quantized text encoders
- Flexible — Run on CPU or GPU, with optional Flash Attention v2
- Standalone — No Python runtime required in production
- Fast Startup — ~2 seconds vs ~15-30 seconds for Python/PyTorch
Hardware Acceleration
| Feature | Description |
|---|---|
| flash-attn | Flash Attention v2 for efficient attention (default) |
| cudnn | cuDNN for faster convolutions (default) |
| mkl | Intel MKL for optimized CPU operations (x86_64) |
| accelerate | Apple Accelerate for Metal (macOS) |
| nccl | Multi-GPU support via NCCL |
🎬 Demonstration
| Model | Video | Prompt |
|---|---|---|
| LTX-Video-0.9.5 | ![]() | The waves crash against the jagged rocks of the shoreline, sending spray high into the air... |
| LTX-Video-0.9.8-2b-distilled | ![]() | A woman with blood on her face and a white tank top looks down and to her right... |
More examples are available in examples.
🖥️ System Requirements
Prerequisites
- Rust 1.82+ (Edition 2024)
- CUDA Toolkit 12.x (for GPU acceleration)
- cuDNN 8.x/9.x (optional, for faster convolutions)
- hf (Hugging Face CLI, used to download model weights)
Approximate VRAM Requirements (512×768, 97 frames)
- Full model: ~8-12GB
- With VAE tiling: ~8GB
- With GGUF T5: saves ~8GB additional
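
For a rough sense of where these numbers come from, weight memory can be estimated as parameter count times bytes per parameter. The sketch below covers weights only, assumes BF16 storage (2 bytes per parameter) and the round parameter counts named above, and uses hypothetical helper names; activations, the VAE, and the text encoder add to the total.

```rust
/// Bytes needed to store `params` weights at `bytes_per_param` bytes each.
fn weight_bytes(params: u64, bytes_per_param: u64) -> u64 {
    params * bytes_per_param
}

/// Converts a byte count to GiB (2^30 bytes) for display.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

Under these assumptions, `to_gib(weight_bytes(2_000_000_000, 2))` gives about 3.7 GiB for the 2B transformer and `to_gib(weight_bytes(13_000_000_000, 2))` about 24 GiB for the 13B variant.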
🛠️ Installation & Setup
Add to your project
[dependencies]
candle-video = { git = "https://github.com/FerrisMind/candle-video" }
Build from source
# Clone the repository
git clone https://github.com/FerrisMind/candle-video.git
cd candle-video
# Default build (CUDA + cuDNN + Flash Attention)
cargo build --release
# CPU-only build
cargo build --release --no-default-features
# With specific features
cargo build --release --features "cudnn,flash-attn"
Model Weights
Download from oxide-lab/LTX-Video-0.9.8-2B-distilled:
huggingface-cli download oxide-lab/LTX-Video-0.9.8-2B-distilled --local-dir ./models/ltx-video
Note: These are the same weights as the official Lightricks/LTX-Video model, but this repository bundles all the necessary files in one place, so you don't need to collect them individually.
Required files for diffusers model versions:
- transformer/diffusion_pytorch_model.safetensors — DiT model
- vae/diffusion_pytorch_model.safetensors — 3D VAE
- text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
- text_encoder_gguf/tokenizer.json — T5 tokenizer
Required files for official model versions:
- ltxv-2b-0.9.8-distilled.safetensors — DiT + 3D VAE in single file
- text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
- text_encoder_gguf/tokenizer.json — T5 tokenizer
📖 How to Start Using
Using Local Weights Examples (Recommended)
For diffusers model versions:
cargo run --example ltx-video --release --features flash-attn,cudnn -- \
--local-weights ./models/ltx-video \
--ltxv-version 0.9.5 \
--prompt "A cat playing with a ball of yarn"
For official model versions:
cargo run --example ltx-video --release --features flash-attn,cudnn -- \
--local-weights ./models/ltx-video-model \
--unified-weights ./models/ltx-video-model.safetensors \
--ltxv-version 0.9.8-2b-distilled \
--prompt "A cat playing with a ball of yarn"
Fast Preview (Lower Resolution)
cargo run --example ltx-video --release --features flash-attn,cudnn -- \
--local-weights ./models/ltx-video-model \
--unified-weights ./models/ltx-video-model.safetensors \
--ltxv-version 0.9.8-2b-distilled \
--prompt "A cat playing with a ball of yarn" \
--height 256 --width 384 --num-frames 25
Low VRAM Mode
cargo run --example ltx-video --release --features flash-attn,cudnn -- \
--local-weights ./models/ltx-video \
--prompt "A majestic eagle soaring over mountains" \
--vae-tiling --vae-slicing
CLI Options
| Argument | Default | Description |
|---|---|---|
| --prompt | "A video of a cute cat..." | Text prompt for generation |
| --negative-prompt | "" | Negative prompt |
| --height | 512 | Video height (divisible by 32) |
| --width | 768 | Video width (divisible by 32) |
| --num-frames | 97 | Number of frames (should be 8n + 1) |
| --steps | (from version config) | Diffusion steps |
| --guidance-scale | (from version config) | Classifier-free guidance scale |
| --ltxv-version | "0.9.5" | Model version |
| --local-weights | (None) | Path to local weights |
| --output-dir | "output" | Directory to save results |
| --seed | random | Random seed for reproducibility |
| --vae-tiling | false | Enable VAE tiling for memory efficiency |
| --vae-slicing | false | Enable VAE batch slicing |
| --frames | false | Save individual PNG frames |
| --gif | false | Save as GIF animation |
| --cpu | false | Run on CPU instead of GPU |
| --use-bf16-t5 | false | Use BF16 T5 instead of GGUF quantized |
| --unified-weights | (None) | Path to unified safetensors file |
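
The size constraints in the table (height and width divisible by 32, frame count of the form 8n + 1) can be checked before launching a long run. The following is a minimal sketch with hypothetical helper names, not the CLI's own validation code. The `latent_shape` helper additionally assumes a VAE with 32x spatial and 8x temporal compression, which is what these constraints suggest.

```rust
/// Checks the size constraints listed in the CLI options table.
fn validate_dims(height: usize, width: usize, num_frames: usize) -> Result<(), String> {
    if height == 0 || height % 32 != 0 {
        return Err(format!("height {height} must be a positive multiple of 32"));
    }
    if width == 0 || width % 32 != 0 {
        return Err(format!("width {width} must be a positive multiple of 32"));
    }
    if num_frames == 0 || (num_frames - 1) % 8 != 0 {
        return Err(format!("num_frames {num_frames} must have the form 8n + 1"));
    }
    Ok(())
}

/// Rounds a requested frame count down to the nearest 8n + 1 value (minimum 1).
fn snap_frames(requested: usize) -> usize {
    requested.saturating_sub(1) / 8 * 8 + 1
}

/// Latent grid as (frames, height, width).
/// ASSUMPTION: 8x temporal and 32x spatial compression in the VAE.
fn latent_shape(height: usize, width: usize, num_frames: usize) -> (usize, usize, usize) {
    (num_frames.saturating_sub(1) / 8 + 1, height / 32, width / 32)
}
```

With the default 512×768, 97-frame settings, `validate_dims` succeeds and, under the stated compression assumption, `latent_shape(512, 768, 97)` is `(13, 16, 24)`.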
Supported Model Versions
| Version | Parameters | Steps | Guidance | Notes |
|---|---|---|---|---|
| 0.9.5 | 2B | 40 | 3.0 | Standard model |
| 0.9.6-dev | 2B | 40 | 3.0 | Development version |
| 0.9.6-distilled | 2B | 8 | 1.0 | Fast inference |
| 0.9.8-2b-distilled | 2B | 7 | 1.0 | Latest distilled |
| 0.9.8-13b-dev | 13B | 30 | 8.0 | Large model |
| 0.9.8-13b-distilled | 13B | 7 | 1.0 | Large distilled |
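
The per-version defaults above can be thought of as a lookup from version string to sampling settings. The sketch below copies the step counts and guidance scales from the table. The `VersionDefaults` struct and `defaults_for` function are hypothetical and may not match how configs.rs is organized. `apply_cfg` shows the standard classifier-free guidance mix on plain slices rather than Candle tensors.

```rust
/// Default sampling settings for one model version (values from the table).
#[derive(Debug, Clone, Copy, PartialEq)]
struct VersionDefaults {
    steps: u32,
    guidance_scale: f32,
}

/// Hypothetical lookup keyed by the `--ltxv-version` string.
fn defaults_for(version: &str) -> Option<VersionDefaults> {
    let (steps, guidance_scale) = match version {
        "0.9.5" | "0.9.6-dev" => (40, 3.0),
        "0.9.6-distilled" => (8, 1.0),
        "0.9.8-2b-distilled" | "0.9.8-13b-distilled" => (7, 1.0),
        "0.9.8-13b-dev" => (30, 8.0),
        _ => return None, // unknown version string
    };
    Some(VersionDefaults { steps, guidance_scale })
}

/// Classifier-free guidance: uncond + scale * (cond - uncond), element-wise.
fn apply_cfg(cond: &[f32], uncond: &[f32], scale: f32) -> Vec<f32> {
    cond.iter().zip(uncond).map(|(c, u)| u + scale * (c - u)).collect()
}
```

At a guidance scale of 1.0 the mix reduces to the conditional prediction alone, which is consistent with the distilled versions needing no extra guidance strength.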
Memory Optimization
For limited VRAM:
# VAE tiling - processes image in tiles
--vae-tiling
# VAE slicing - processes batches sequentially
--vae-slicing
# Lower resolution
--height 256 --width 384
# Fewer frames
--num-frames 25
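
Tiling lowers peak memory by processing one window of the frame at a time instead of the whole frame. The sketch below shows one way a single spatial dimension might be split into overlapping windows. The function name, tile size, and overlap are hypothetical, and the blending of overlapping regions that a real VAE tiler needs is omitted.

```rust
/// Splits `0..len` into windows of at most `tile` elements. Window starts
/// advance by `tile - overlap`, so neighbouring windows share up to `overlap`
/// elements. Returns half-open (start, end) ranges that together cover 0..len.
fn tile_ranges(len: usize, tile: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(tile > 0 && overlap < tile, "need tile > 0 and overlap < tile");
    let stride = tile - overlap;
    let mut ranges = Vec::new();
    if len == 0 {
        return ranges;
    }
    let mut start = 0;
    loop {
        let end = (start + tile).min(len);
        ranges.push((start, end));
        if end == len {
            break; // the last window reaches the edge
        }
        start += stride;
    }
    ranges
}
```

For example, `tile_ranges(40, 16, 4)` yields `[(0, 16), (12, 28), (24, 40)]`: three windows of 16 elements, each sharing 4 elements with its neighbour.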
Project Structure
candle-video/
├── src/
│   ├── lib.rs                            # Library entry point
│   └── models/
│       └── ltx_video/                    # LTX-Video implementation
│           ├── ltx_transformer.rs        # DiT transformer
│           ├── vae.rs                    # 3D VAE
│           ├── text_encoder.rs           # T5 text encoder
│           ├── quantized_t5_encoder.rs   # GGUF T5 encoder
│           ├── scheduler.rs              # Flow matching scheduler
│           ├── t2v_pipeline.rs           # Text-to-video pipeline
│           ├── loader.rs                 # Weight loading
│           └── configs.rs                # Model version configs
├── examples/
│   └── ltx-video/                        # Main CLI example
├── tests/                                # Parity and unit tests
├── scripts/                              # Python reference generators
└── benches/                              # Performance benchmarks
🙏 Acknowledgments
- Candle — Minimalist ML framework for Rust
- Lightricks LTX-Video — Original LTX-Video model
- diffusers — Reference implementation
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Copyright 2025 FerrisMind

