candle-video

January 9, 2026

A Rust library for AI video generation built on the Candle ML framework. It provides high-performance, standalone video generation inference with no Python runtime dependency.


✨ What is this?

candle-video is a Rust-native implementation of video generation models, targeting deployment scenarios where startup time, binary size, and memory efficiency matter. It provides inference for state-of-the-art text-to-video models without requiring a Python runtime.

Supported Models

  • LTX-Video — Text-to-video generation using DiT (Diffusion Transformer) architecture
    • 2B and 13B parameter variants
    • Standard and distilled versions (0.9.5 – 0.9.8)
    • T5-XXL text encoder with GGUF quantization support
    • 3D VAE for video encoding/decoding
    • Flow Matching scheduler
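
As a rough illustration of what a flow matching scheduler does at each denoising step, the sketch below applies a plain Euler update over a linear sigma schedule. It is a generic, minimal example and not candle-video's scheduler: the names `linear_sigmas` and `euler_step` are hypothetical, latents are flat `f32` slices rather than Candle tensors, and any timestep shifting a real LTX-Video schedule may apply is left out.

```rust
/// Sigmas running linearly from 1.0 (pure noise) down to 0.0 (clean sample).
/// Hypothetical helper for illustration; assumes no schedule shifting.
fn linear_sigmas(steps: usize) -> Vec<f32> {
    assert!(steps > 0, "at least one step is required");
    (0..=steps)
        .map(|i| 1.0 - i as f32 / steps as f32)
        .collect()
}

/// One Euler step: x_next = x + (sigma_next - sigma) * v,
/// where `velocity` is the model's predicted velocity at the current sigma.
fn euler_step(sample: &[f32], velocity: &[f32], sigma: f32, sigma_next: f32) -> Vec<f32> {
    assert_eq!(sample.len(), velocity.len(), "shape mismatch");
    let dt = sigma_next - sigma; // negative, since sigmas decrease
    sample
        .iter()
        .zip(velocity)
        .map(|(x, v)| x + dt * v)
        .collect()
}
```

In a sampling loop, the model would be called once per adjacent pair of sigmas and its output passed to `euler_step`. With a constant velocity equal to `noise - target`, stepping from sigma 1.0 to 0.0 moves the sample from the noise to the target.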

🚀 Key Features

  • High Performance — Native Rust with GPU acceleration via CUDA/cuDNN
  • Memory Efficient — BF16 inference, VAE tiling/slicing, GGUF quantized text encoders
  • Flexible — Run on CPU or GPU, with optional Flash Attention v2
  • Standalone — No Python runtime required in production
  • Fast Startup — ~2 seconds vs ~15-30 seconds for Python/PyTorch

Hardware Acceleration

| Feature | Description |
|---|---|
| flash-attn | Flash Attention v2 for efficient attention (default) |
| cudnn | cuDNN for faster convolutions (default) |
| mkl | Intel MKL for optimized CPU operations (x86_64) |
| accelerate | Apple Accelerate for Metal (macOS) |
| nccl | Multi-GPU support via NCCL |

🎬 Demonstration

| Model | Video | Prompt |
|---|---|---|
| LTX-Video-0.9.5 | Waves and Rocks | The waves crash against the jagged rocks of the shoreline, sending spray high into the air... |
| LTX-Video-0.9.8-2b-distilled | woman_with_blood | A woman with blood on her face and a white tank top looks down and to her right... |

More examples are available in the examples directory.


🖥️ System Requirements

Prerequisites

  • Rust 1.82+ (Edition 2024)
  • CUDA Toolkit 12.x (for GPU acceleration)
  • cuDNN 8.x/9.x (optional, for faster convolutions)
  • hf (Hugging Face CLI, for downloading model weights)

Approximate VRAM Requirements (512×768, 97 frames)

  • Full model: ~8-12GB
  • With VAE tiling: ~8GB
  • With GGUF T5: saves an additional ~8GB
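
To see why resolution and frame count drive memory use, it can help to count how many latent positions the transformer has to process. The sketch below assumes the VAE compresses by 32× in each spatial dimension and 8× in time (with the first frame kept separately). Those factors are an assumption based on the published LTX-Video model description, not something this README states, and `latent_grid` and `latent_positions` are hypothetical helpers.

```rust
/// Assumed compression factors; not taken from candle-video itself.
const SPATIAL_FACTOR: usize = 32;
const TEMPORAL_FACTOR: usize = 8;

/// Returns (latent_frames, latent_height, latent_width), or `None` when the
/// inputs do not fit the assumed factors (multiples of 32, frames = 8n + 1).
fn latent_grid(height: usize, width: usize, num_frames: usize) -> Option<(usize, usize, usize)> {
    if height % SPATIAL_FACTOR != 0 || width % SPATIAL_FACTOR != 0 {
        return None;
    }
    if num_frames == 0 || (num_frames - 1) % TEMPORAL_FACTOR != 0 {
        return None;
    }
    Some((
        (num_frames - 1) / TEMPORAL_FACTOR + 1,
        height / SPATIAL_FACTOR,
        width / SPATIAL_FACTOR,
    ))
}

/// Total number of latent positions for a given video size.
fn latent_positions(height: usize, width: usize, num_frames: usize) -> Option<usize> {
    latent_grid(height, width, num_frames).map(|(f, h, w)| f * h * w)
}
```

Under these assumptions, 512×768 with 97 frames gives 13 × 16 × 24 = 4992 positions, while 256×384 with 25 frames gives 4 × 8 × 12 = 384.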

🛠️ Installation & Setup

Add to your project

[dependencies]
candle-video = { git = "https://github.com/FerrisMind/candle-video" }

Build from source

# Clone the repository
git clone https://github.com/FerrisMind/candle-video.git
cd candle-video

# Default build (CUDA + cuDNN + Flash Attention)
cargo build --release

# CPU-only build
cargo build --release --no-default-features

# With specific features
cargo build --release --features "cudnn,flash-attn"

Model Weights

Download from oxide-lab/LTX-Video-0.9.8-2B-distilled:

huggingface-cli download oxide-lab/LTX-Video-0.9.8-2B-distilled --local-dir ./models/ltx-video

Note: This repository contains the same official Lightricks/LTX-Video model, but bundles all the necessary files in one place, so you don't need to track them down individually.

Required files for diffusers model versions:

  • transformer/diffusion_pytorch_model.safetensors — DiT model
  • vae/diffusion_pytorch_model.safetensors — 3D VAE
  • text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
  • text_encoder_gguf/tokenizer.json — T5 tokenizer

Required files for official model versions:

  • ltxv-2b-0.9.8-distilled.safetensors — DiT + 3D VAE in single file
  • text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
  • text_encoder_gguf/tokenizer.json — T5 tokenizer

📖 How to Start Using

For diffusers model versions:

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video \
    --ltxv-version 0.9.5 \
    --prompt "A cat playing with a ball of yarn" 

For official model versions:

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video-model \
    --unified-weights ./models/ltx-video-model.safetensors \
    --ltxv-version 0.9.8-2b-distilled \
    --prompt "A cat playing with a ball of yarn" 

Fast Preview (Lower Resolution)

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video-model \
    --unified-weights ./models/ltx-video-model.safetensors \
    --ltxv-version 0.9.8-2b-distilled \
    --prompt "A cat playing with a ball of yarn" \
    --height 256 --width 384 --num-frames 25 

Low VRAM Mode

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video \
    --prompt "A majestic eagle soaring over mountains" \
    --vae-tiling --vae-slicing

CLI Options

| Argument | Default | Description |
|---|---|---|
| --prompt | "A video of a cute cat..." | Text prompt for generation |
| --negative-prompt | "" | Negative prompt |
| --height | 512 | Video height (divisible by 32) |
| --width | 768 | Video width (divisible by 32) |
| --num-frames | 97 | Number of frames (should be 8n + 1) |
| --steps | (from version config) | Diffusion steps |
| --guidance-scale | (from version config) | Classifier-free guidance scale |
| --ltxv-version | "0.9.5" | Model version |
| --local-weights | (None) | Path to local weights |
| --output-dir | "output" | Directory to save results |
| --seed | random | Random seed for reproducibility |
| --vae-tiling | false | Enable VAE tiling for memory efficiency |
| --vae-slicing | false | Enable VAE batch slicing |
| --frames | false | Save individual PNG frames |
| --gif | false | Save as GIF animation |
| --cpu | false | Run on CPU instead of GPU |
| --use-bf16-t5 | false | Use BF16 T5 instead of GGUF quantized |
| --unified-weights | (None) | Path to unified safetensors file |
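
The size constraints in the table (height and width divisible by 32, frame count of the form 8n + 1) are easy to check before starting a long run. The following is a minimal sketch of such a check. `validate_video_dims` and `snap_num_frames` are hypothetical helpers and not part of candle-video, and the sketch treats the "should be 8n + 1" guidance as a hard requirement purely for illustration.

```rust
/// Checks the documented constraints: height and width divisible by 32,
/// and a frame count of the form 8n + 1.
/// Hypothetical helper; the real CLI may validate differently.
fn validate_video_dims(height: usize, width: usize, num_frames: usize) -> Result<(), String> {
    if height == 0 || height % 32 != 0 {
        return Err(format!("height {height} must be a positive multiple of 32"));
    }
    if width == 0 || width % 32 != 0 {
        return Err(format!("width {width} must be a positive multiple of 32"));
    }
    if num_frames == 0 || (num_frames - 1) % 8 != 0 {
        return Err(format!("num_frames {num_frames} must have the form 8n + 1"));
    }
    Ok(())
}

/// Rounds a requested frame count down to the nearest value of the form
/// 8n + 1, with a minimum of 1.
fn snap_num_frames(requested: usize) -> usize {
    if requested == 0 {
        return 1;
    }
    (requested - 1) / 8 * 8 + 1
}
```

For example, `snap_num_frames(100)` yields 97, the default frame count, and `validate_video_dims(512, 768, 97)` returns `Ok(())`.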

Supported Model Versions

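| Version | Parameters | Steps | Guidance | Notes |
|---|---|---|---|---|
| 0.9.5 | 2B | 40 | 3.0 | Standard model |
| 0.9.6-dev | 2B | 40 | 3.0 | Development version |
| 0.9.6-distilled | 2B | 8 | 1.0 | Fast inference |
| 0.9.8-2b-distilled | 2B | 7 | 1.0 | Latest distilled |
| 0.9.8-13b-dev | 13B | 30 | 8.0 | Large model |
| 0.9.8-13b-distilled | 13B | 7 | 1.0 | Large distilled |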
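
When `--steps` and `--guidance-scale` are omitted, their values come from the version config. One way to picture that lookup is a plain match over the version string, using the values from the table above. This is only a sketch: `VersionDefaults` and `version_defaults` are hypothetical names, and the actual layout of `configs.rs` may differ.

```rust
/// Per-version defaults, mirroring the table of supported versions.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct VersionDefaults {
    params: &'static str,
    steps: u32,
    guidance_scale: f32,
}

fn defaults(params: &'static str, steps: u32, guidance_scale: f32) -> VersionDefaults {
    VersionDefaults { params, steps, guidance_scale }
}

/// Looks up the defaults for a `--ltxv-version` value.
/// Returns `None` for versions not listed in the table.
fn version_defaults(version: &str) -> Option<VersionDefaults> {
    match version {
        "0.9.5" => Some(defaults("2B", 40, 3.0)),
        "0.9.6-dev" => Some(defaults("2B", 40, 3.0)),
        "0.9.6-distilled" => Some(defaults("2B", 8, 1.0)),
        "0.9.8-2b-distilled" => Some(defaults("2B", 7, 1.0)),
        "0.9.8-13b-dev" => Some(defaults("13B", 30, 8.0)),
        "0.9.8-13b-distilled" => Some(defaults("13B", 7, 1.0)),
        _ => None,
    }
}
```

An explicit `--steps` or `--guidance-scale` on the command line would then override the value returned here.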

Memory Optimization

For limited VRAM:

# VAE tiling - processes image in tiles
--vae-tiling

# VAE slicing - processes batches sequentially
--vae-slicing

# Lower resolution
--height 256 --width 384

# Fewer frames
--num-frames 25
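
To make the idea behind tiling concrete, the sketch below splits one dimension into overlapping ranges so that each tile can be processed separately and peak memory depends on the tile size rather than the full frame. It is a generic illustration, not candle-video's implementation: `tile_ranges`, the tile size, and the overlap are all hypothetical, and a real tiled VAE would also blend the overlapping regions when stitching tiles back together.

```rust
/// Splits `[0, len)` into half-open ranges of at most `tile` elements,
/// with consecutive ranges sharing `overlap` elements.
/// Hypothetical helper for illustration only.
fn tile_ranges(len: usize, tile: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(tile > 0 && overlap < tile, "tile must be positive and larger than overlap");
    let stride = tile - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + tile).min(len);
        ranges.push((start, end));
        if end == len {
            break; // the last tile reaches the edge
        }
        start += stride;
    }
    ranges
}
```

For a width of 768 with 256-pixel tiles and a 64-pixel overlap, this yields four ranges: (0, 256), (192, 448), (384, 640), and (576, 768).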

Project Structure

candle-video/
├── src/
│   ├── lib.rs                    # Library entry point
│   └── models/
│       └── ltx_video/            # LTX-Video implementation
│           ├── ltx_transformer.rs    # DiT transformer
│           ├── vae.rs                # 3D VAE
│           ├── text_encoder.rs       # T5 text encoder
│           ├── quantized_t5_encoder.rs # GGUF T5 encoder
│           ├── scheduler.rs          # Flow matching scheduler
│           ├── t2v_pipeline.rs       # Text-to-video pipeline
│           ├── loader.rs             # Weight loading
│           └── configs.rs            # Model version configs
├── examples/
│   └── ltx-video/                # Main CLI example
├── tests/                        # Parity and unit tests
├── scripts/                      # Python reference generators
└── benches/                      # Performance benchmarks

🙏 Acknowledgments


License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Copyright 2025 FerrisMind