ComfyUI-FlashVSR_Stable
February 13, 2026
High-performance Video Super Resolution for ComfyUI with VRAM optimization.
Run FlashVSR on 8GB-24GB+ GPUs without artifacts. Features intelligent resource management, 5 VAE options, and auto-downloading models.
Registry Link: https://registry.comfy.org/publishers/naxci1/nodes/ComfyUI-FlashVSR_Stable
Key Features
- Video Super Resolution: 2x or 4x upscaling using FlashVSR diffusion models
- 5 VAE Options: Choose from Wan2.1, Wan2.2, LightVAE, and TAE variants for the best VRAM/quality trade-off
- Pre-Flight Resource Check: Intelligent VRAM estimation with settings recommendations
- Auto-Download: Models download automatically from HuggingFace if missing
- OOM Protection: Automatic recovery with progressive fallback (tiled VAE → tiled DiT → chunking)
- Unified Pipeline: All modes share optimized processing logic
Quick Links
- Changelog - Full version history
- Sample Workflow
- HuggingFace Models
Performance & VRAM Optimization
This node is optimized for various hardware configurations. Here are some guidelines:
VRAM Tiers & Settings
| VRAM | Mode | Tiling | Chunk Size | Precision | Notes |
|---|---|---|---|---|---|
| 24GB+ | full or tiny | Disabled | 0 (All) | bf16/auto | Max quality/speed. |
| 16GB | tiny | tiled_vae=True | 0 or ~100 | bf16/auto | Enable keep_models_on_cpu. |
| 12GB | tiny | tiled_vae=True, tiled_dit=True | ~50 | fp16 | Use sparse_sage attention. |
| 8GB | tiny-long | Required | ~20 | fp16 | Must use tiling and chunking. |
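As a rough illustration, the tiers above can be encoded as a lookup keyed by available VRAM. The names `TIERS` and `suggest_settings` are hypothetical and not part of the node; the values are copied from the table. The table does not say what to do between tiers, so this sketch assumes the next lower tier applies, and it picks `100` where the table offers "0 or ~100".

```python
# Hypothetical helper, not part of ComfyUI-FlashVSR_Stable.
# Each entry is (minimum VRAM in GB, settings copied from the tier table).
TIERS = [
    (24, {"mode": "full or tiny", "tiled_vae": False, "tiled_dit": False,
          "frame_chunk_size": 0, "precision": "bf16/auto"}),
    (16, {"mode": "tiny", "tiled_vae": True, "tiled_dit": False,
          "frame_chunk_size": 100, "precision": "bf16/auto"}),
    (12, {"mode": "tiny", "tiled_vae": True, "tiled_dit": True,
          "frame_chunk_size": 50, "precision": "fp16"}),
    (8, {"mode": "tiny-long", "tiled_vae": True, "tiled_dit": True,
         "frame_chunk_size": 20, "precision": "fp16"}),
]


def suggest_settings(vram_gb):
    """Return the settings of the highest tier that fits into vram_gb."""
    # Assumption: a GPU between two tiers falls back to the next lower tier.
    for min_gb, settings in TIERS:
        if vram_gb >= min_gb:
            return dict(settings)
    raise ValueError("the table has no tier below 8 GB")


print(suggest_settings(12)["mode"])  # tiny
```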
Performance Enhancements
- Attention Mode: Use `sparse_sage_attention` for the best balance of speed and memory. `flash_attention_2` is faster but requires specific hardware/installation.
- Precision: `bf16` (BFloat16) is recommended for RTX 3000/4000/5000 series. It is faster and preserves dynamic range better than `fp16`.
- Chunking: Use `frame_chunk_size` to process videos in segments. This moves processed frames to CPU RAM, preventing VRAM saturation on long clips.
- Resize Input: If the input video is large (e.g., 1080p), use the `resize_factor` parameter to reduce the input to `0.5x` before processing. This drastically reduces VRAM usage, and 4x upscaling of the resized input then yields a net 2x output. For small videos, leave it at `1.0`.
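The net-scale arithmetic behind the Resize Input tip can be checked with a few lines of Python. This is only a sketch: `output_resolution` is a hypothetical helper, and it assumes plain rounding of the resized dimensions, whereas the node may round to a model-specific multiple.

```python
def output_resolution(width, height, resize_factor=1.0, scale=4):
    """Return ((resized_w, resized_h), (final_w, final_h))."""
    # Assumption: dimensions are rounded to the nearest integer.
    resized_w = round(width * resize_factor)
    resized_h = round(height * resize_factor)
    return (resized_w, resized_h), (resized_w * scale, resized_h * scale)


resized, final = output_resolution(1920, 1080, resize_factor=0.5, scale=4)
print(resized, final)  # (960, 540) (3840, 2160)
```

A 1080p input resized by 0.5 and upscaled 4x ends at 3840x2160, which is 2x the original size in each dimension.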
Pre-Flight Resource Check (NEW)
Before processing, FlashVSR now performs an intelligent pre-flight check that:
- Estimates VRAM Requirements: Calculates approximate VRAM needed based on resolution, frames, scale, and settings.
- Checks Available Resources: Uses `torch.cuda.mem_get_info()` for accurate real-time VRAM availability.
- Provides Recommendations: If OOM is predicted, suggests optimal settings.
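The comparison step of such a check might look like the sketch below. `torch.cuda.mem_get_info()` is a real PyTorch call that returns `(free_bytes, total_bytes)`; everything else (`preflight`, `query_free_vram`, the use of GiB as the unit) is an assumption for illustration. How the node actually estimates the required VRAM is not shown here, so the estimate is passed in as a plain number.

```python
def preflight(estimated_gb, free_bytes):
    """Return (is_safe, free_gb) for a VRAM estimate given in GB."""
    # Assumption: 1 GB is treated as 2**30 bytes; the node may use 10**9.
    free_gb = free_bytes / 1024**3
    return estimated_gb <= free_gb, free_gb


def query_free_vram():
    """Ask PyTorch for the free VRAM on the current CUDA device."""
    import torch  # imported lazily so the sketch loads without a GPU stack

    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes


# Example with a fixed value instead of a live query:
ok, free_gb = preflight(12.8, int(14.2 * 1024**3))
print(ok)  # True
```

On a machine with CUDA available, `preflight(estimate, query_free_vram())` would use the live value instead.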
Example console output:
============================================================
PRE-FLIGHT RESOURCE CHECK
RAM: 15.4GB / 95.8GB
VRAM Available: 14.2GB
Estimated VRAM Required: 12.8GB
✅ Safe to proceed. Estimated ~12.8GB needed, 14.2GB available.
============================================================
If VRAM is insufficient:
⚠️ Current settings require ~18.5GB but only 8.0GB available.
Recommended Optimal Settings:
• chunk_size = 32
• tiled_vae = True
• tiled_dit = True
• resize_factor = 0.6
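The OOM protection listed under Key Features follows the order tiled VAE, then tiled DiT, then chunking. One way to picture that ladder is the sketch below. It is not the node's implementation: `next_fallback` and `run_with_fallback` are hypothetical names, `MemoryError` stands in for a CUDA out-of-memory error, and the chunk size of 32 is simply borrowed from the example output above.

```python
def next_fallback(settings):
    """Return a copy of settings with the next fallback enabled, or None."""
    s = dict(settings)
    if not s.get("tiled_vae"):
        s["tiled_vae"] = True
    elif not s.get("tiled_dit"):
        s["tiled_dit"] = True
    elif s.get("frame_chunk_size", 0) == 0:
        s["frame_chunk_size"] = 32  # assumed value, taken from the sample output
    else:
        return None  # nothing left to try
    return s


def run_with_fallback(run, settings):
    """Call run(settings), stepping down the ladder on each MemoryError."""
    while settings is not None:
        try:
            return run(settings), settings
        except MemoryError:  # stand-in for a CUDA OOM exception
            settings = next_fallback(settings)
    raise MemoryError("all fallbacks exhausted")
```

Each retry changes exactly one setting, so the run that finally succeeds uses the least restrictive configuration on the ladder.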
VAE Model Selection
VAE Type Comparison
| VAE Type | VRAM Usage | Speed | Quality | Best For |
|---|---|---|---|---|
| Wan2.1 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Maximum quality, 24GB+ VRAM |
| Wan2.2 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Improved normalization for Wan2.2 models |
| LightVAE_W2.1 | 4-5 GB | 2-3x faster | ⭐⭐⭐⭐ | 8-16GB VRAM, speed priority |
| TAE_W2.2 | 6-8 GB | 1.5x faster | ⭐⭐⭐⭐ | Temporal consistency priority |
| LightTAE_HY1.5 | 3-4 GB | 3x faster | ⭐⭐⭐⭐ | HunyuanVideo compatible, minimum VRAM |
VAE Selection Guide
| Your VRAM | Recommended VAE | Additional Settings |
|---|---|---|
| 8GB | LightTAE_HY1.5 or LightVAE_W2.1 | tiled_vae=True, tiled_dit=True, chunk_size=16 |
| 12GB | LightVAE_W2.1 or Wan2.1 | tiled_vae=True |
| 16GB | Any VAE | Optional tiling for long videos |
| 24GB+ | Wan2.1 or Wan2.2 | Maximum quality, no restrictions |
Auto-Download
All VAE models auto-download from HuggingFace if not found locally:
| VAE Selection | File | Direct Download Link |
|---|---|---|
| Wan2.1 | Wan2.1_VAE.pth | Download |
| Wan2.2 | Wan2.2_VAE.pth | Download |
| LightVAE_W2.1 | lightvaew2_1.pth | Download |
| TAE_W2.2 | taew2_2.safetensors | Download |
| LightTAE_HY1.5 | lighttaehy1_5.pth | Download |
Best Practices / Settings Guide
Low VRAM (8-12GB) Configuration
Mode: tiny-long
VAE: LightVAE_W2.1 or LightTAE_HY1.5
Tiled VAE: ✅ Enabled
Tiled DiT: ✅ Enabled
Chunk Size: 16-32
Resize Factor: 0.5-0.8
Keep Models on CPU: ✅ Enabled
Medium VRAM (16GB) Configuration
Mode: tiny
VAE: Wan2.1 or LightVAE_W2.1
Tiled VAE: ✅ Enabled
Tiled DiT: Optional
Chunk Size: 50-100
Resize Factor: 1.0
Keep Models on CPU: Optional
High VRAM (24GB+) Configuration
Mode: full or tiny
VAE: Wan2.1 or Wan2.2
Tiled VAE: ❌ Disabled
Tiled DiT: ❌ Disabled
Chunk Size: 0 (all frames)
Resize Factor: 1.0
Keep Models on CPU: ❌ Disabled
Processing Summary
At the end of each run, you'll see a summary:
============================================================
PROCESSING SUMMARY
⏱️ Total Processing Time: 130.08s (1.54 FPS)
Input Resolution: 276x206 (200 frames)
Output Resolution: 552x412 (200 frames)
Peak VRAM Used: 12.4 GB
============================================================
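The figures in this sample summary are consistent with each other, which the short check below reproduces. The scale of 2 is inferred from the input and output resolutions in the sample rather than stated in it.

```python
frames = 200
seconds = 130.08
in_w, in_h = 276, 206
scale = 2  # inferred: 552 / 276 == 412 / 206 == 2

fps = frames / seconds
print(f"{fps:.2f} FPS")                  # 1.54 FPS
print(f"{in_w * scale}x{in_h * scale}")  # 552x412
```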
Node Parameters
Hover over any input in ComfyUI to see tooltips. Full parameter list:
| Parameter | Description |
|---|---|
| model | FlashVSR model version |
| mode | tiny (fast), tiny-long (lowest VRAM), full (highest quality) |
| vae_model | VAE architecture (5 options, auto-download) |
| scale | Upscaling factor: 2x or 4x |
| color_fix | Wavelet color transfer. Highly recommended. |
| tiled_vae | Spatial tiling for VAE. Reduces VRAM, slower. |
| tiled_dit | Spatial tiling for DiT. Required for 4K output. |
| tile_size | Tile dimensions. Smaller = less VRAM. |
| overlap | Tile overlap for seamless blending. |
| unload_dit | Unload DiT before VAE decode. |
| frame_chunk_size | Process N frames at a time. 0 = all. |
| enable_debug | Verbose console logging. |
| keep_models_on_cpu | Offload to system RAM when idle. |
| resize_factor | Downscale large input videos before upscaling. Range: 0.3-1.0. |
| attention_mode | Attention kernel: sparse_sage, flash_attention_2, sdpa, block_sparse |
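To make the `frame_chunk_size` semantics concrete (N frames at a time, `0` meaning all frames at once), here is a minimal sketch of how a frame range could be split. `chunk_ranges` is a hypothetical helper, and it assumes chunks do not overlap; the node may handle chunk boundaries differently, for example to preserve temporal consistency.

```python
def chunk_ranges(total_frames, frame_chunk_size=0):
    """Yield half-open (start, end) frame ranges; 0 means one chunk with all frames."""
    if frame_chunk_size <= 0:
        yield (0, total_frames)
        return
    # Assumption: consecutive, non-overlapping chunks.
    for start in range(0, total_frames, frame_chunk_size):
        yield (start, min(start + frame_chunk_size, total_frames))


print(list(chunk_ranges(120, 50)))  # [(0, 50), (50, 100), (100, 120)]
print(list(chunk_ranges(120, 0)))   # [(0, 120)]
```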
Command-Line Interface (CLI)
FlashVSR includes a full-featured CLI that mirrors all ComfyUI node parameters for standalone video upscaling.
Quick Start
# Basic 2x upscale
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2
# 4x upscale with tiling for lower VRAM
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 4 \
--tiled_vae --tiled_dit --tile_size 256 --tile_overlap 24
# Long video with chunking to prevent OOM
python cli_main.py --input long_video.mp4 --output upscaled.mp4 \
--frame_chunk_size 50 --mode tiny-long
# Low VRAM mode (8GB GPUs)
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2 \
--vae_model LightVAE_W2.1 --tiled_vae --tiled_dit \
--frame_chunk_size 20 --resize_factor 0.5
# Custom models directory
python cli_main.py --input video.mp4 --output upscaled.mp4 \
--models_dir /path/to/your/models
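For batch jobs it can be convenient to drive the CLI from Python. The sketch below only assembles flags that appear in the examples above; `build_command` and `run_upscale` are hypothetical helper names, and the code assumes `cli_main.py` sits in the current working directory.

```python
import subprocess
import sys


def build_command(input_path, output_path, scale=2, low_vram=False):
    """Assemble an argument list for cli_main.py from documented flags."""
    cmd = [sys.executable, "cli_main.py",
           "--input", input_path,
           "--output", output_path,
           "--scale", str(scale)]
    if low_vram:
        # Mirrors the "Low VRAM mode (8GB GPUs)" example.
        cmd += ["--vae_model", "LightVAE_W2.1",
                "--tiled_vae", "--tiled_dit",
                "--frame_chunk_size", "20",
                "--resize_factor", "0.5"]
    return cmd


def run_upscale(input_path, output_path, **kwargs):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_path, output_path, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.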
CLI Arguments Reference
All arguments map 1:1 with ComfyUI node inputs. Run python cli_main.py --help for full details.
Required Arguments
| Argument | Description |
|---|---|
| --input, -i | Input video file path (e.g., video.mp4) |
| --output, -o | Output video file path (e.g., upscaled.mp4) |
Pipeline Initialization (from FlashVSRNodeInitPipe)
| Argument | Type | Default | Description |
|---|---|---|---|
| --model | choice | FlashVSR-v1.1 | Model version: FlashVSR, FlashVSR-v1.1 |
| --mode | choice | tiny | Operation mode: tiny, tiny-long, full |
| --vae_model | choice | Wan2.1 | VAE model: Wan2.1, Wan2.2, LightVAE_W2.1, TAE_W2.2, LightTAE_HY1.5 |
| --force_offload | flag | True | Force offload models to CPU after execution |
| --no_force_offload | flag | - | Disable force offloading |
| --precision | choice | auto | Precision: fp16, bf16, auto |
| --device | string | auto | Device: cuda:0, cuda:1, cpu, auto |
| --attention_mode | choice | sparse_sage_attention | Attention: sparse_sage_attention, block_sparse_attention, flash_attention_2, sdpa |
Processing Parameters (from FlashVSRNodeAdv)
| Argument | Type | Default | Description |
|---|---|---|---|
| --scale | int | 2 | Upscaling factor: 2 or 4 |
| --color_fix | flag | True | Apply wavelet-based color correction |
| --no_color_fix | flag | - | Disable color correction |
| --tiled_vae | flag | False | Enable spatial tiling for VAE decoder |
| --tiled_dit | flag | False | Enable spatial tiling for DiT |
| --tile_size | int | 256 | Tile size for DiT processing (32-1024) |
| --tile_overlap | int | 24 | Overlap pixels between tiles (8-512) |
| --unload_dit | flag | False | Unload DiT before VAE decoding |
| --sparse_ratio | float | 2.0 | Sparse attention control (1.5-2.0) |
| --kv_ratio | float | 3.0 | Key/Value cache ratio (1.0-3.0) |
| --local_range | int | 11 | Local attention window: 9 or 11 |
| --seed | int | 0 | Random seed for reproducibility |
| --frame_chunk_size | int | 0 | Process N frames at a time (0 = all) |
| --enable_debug | flag | False | Enable verbose logging |
| --keep_models_on_cpu | flag | True | Keep models in CPU RAM when idle |
| --no_keep_models_on_cpu | flag | - | Keep models in VRAM |
| --resize_factor | float | 1.0 | Resize input before processing (0.1-1.0) |
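To give a feel for how `--tile_size` and `--tile_overlap` interact, the sketch below computes tile start offsets along one dimension using the common stride = tile size minus overlap scheme. This is a generic illustration and an assumption, not the node's actual tiling code: `tile_starts` is a hypothetical helper, and whether the node tiles in pixel or latent space is not stated here.

```python
def tile_starts(length, tile_size=256, overlap=24):
    """Start offsets of tiles that cover [0, length) with at least `overlap` shared pixels."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]  # a single tile covers the whole dimension
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # last tile is aligned to the far edge
    return starts


print(tile_starts(1024))  # [0, 232, 464, 696, 768]
```

With the defaults, a 1024-pixel dimension needs five tiles; a smaller tile size lowers peak memory per tile but increases the tile count.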
Video I/O Parameters
| Argument | Type | Default | Description |
|---|---|---|---|
| --fps | float | input FPS | Output video FPS |
| --codec | string | libx264 | Video codec: libx264, libx265, h264_nvenc |
| --crf | int | 18 | Quality (0-51, lower = better) |
| --start_frame | int | 0 | Start frame index (0-indexed) |
| --end_frame | int | -1 | End frame index (-1 = all frames) |
| --models_dir | string | ./models | Custom models directory path |
Installation
Step 1: Install the Node
cd ComfyUI/custom_nodes
git clone https://github.com/naxci1/ComfyUI-FlashVSR_Stable.git
python -m pip install -r ComfyUI-FlashVSR_Stable/requirements.txt
Turing architecture or older GPUs (GTX 16 series, RTX 20 series, and earlier): install triton<3.3.0. Quote the version specifier so the shell does not interpret < as input redirection:
# Windows
python -m pip install -U "triton-windows<3.3.0"
# Linux
python -m pip install -U "triton<3.3.0"
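If you are unsure whether your GPU counts as "Turing or older", the CUDA compute capability is a quick test: Turing is 7.5, and earlier architectures have lower values. The helper below is a hypothetical sketch built on that fact; on a CUDA machine the tuple can be obtained from `torch.cuda.get_device_capability()`.

```python
def needs_triton_pin(capability):
    """capability is a (major, minor) tuple, e.g. from torch.cuda.get_device_capability()."""
    # Turing GPUs report compute capability 7.5; older architectures report less.
    return tuple(capability) <= (7, 5)


print(needs_triton_pin((7, 5)))  # True  (e.g. RTX 20 series)
print(needs_triton_pin((8, 6)))  # False (e.g. RTX 30 series)
```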
Step 2: Download Models
Download the FlashVSR folder from HuggingFace:
ComfyUI/models/FlashVSR/
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── Wan2.1_VAE.pth (or auto-downloads)
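A quick way to confirm the manual download is complete is to check the folder against the listing above. This is a sketch only: `REQUIRED` and `missing_model_files` are hypothetical names, and the file list simply mirrors the tree shown here (the VAE is left out because it can auto-download).

```python
from pathlib import Path

# File names copied from the directory listing above; the VAE is optional.
REQUIRED = [
    "LQ_proj_in.ckpt",
    "TCDecoder.ckpt",
    "diffusion_pytorch_model_streaming_dmd.safetensors",
]


def missing_model_files(model_dir):
    """Return the required file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED if not (model_dir / name).is_file()]
```

For example, `missing_model_files("ComfyUI/models/FlashVSR")` returns an empty list when all three files are in place.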
VAE files auto-download from HuggingFace if not present. Only the DiT model and other components need manual download.
Step 3: Custom Model Paths (Optional)
By default, FlashVSR looks for models in ComfyUI/models/FlashVSR/. To use a different location (e.g., models on another drive):
- Edit `model_paths.yaml` in the `ComfyUI-FlashVSR_Stable` directory
- Set `flashvsr_model_path` to your custom path
- Restart ComfyUI
Example configurations:
# Windows (D: drive)
flashvsr_model_path: "D:/AI/Models/FlashVSR"
# Windows (alternative syntax)
flashvsr_model_path: "E:\\ComfyUI\\models\\FlashVSR"
# Linux/Mac
flashvsr_model_path: "/home/user/models/FlashVSR"
flashvsr_model_path: "/mnt/storage/AI/FlashVSR"
# Use default (leave empty)
flashvsr_model_path: ""
Auto-Download Support: If model files don't exist, they will automatically download to the directory specified in `model_paths.yaml`. The custom path will be created if needed.
Example: If you set `flashvsr_model_path: "D:/AI/Models"`, models will automatically download to `D:/AI/Models/FlashVSR/` on first use.
Preview

Sample Workflow
Recent Changes
See CHANGELOG.md for full version history.
Acknowledgments
- FlashVSR @OpenImagingLab
- Sparse_SageAttention @jt-zhang
- ComfyUI @comfyanonymous
- Wan2.2 @Wan-Video
- LightX2V @ModelTC
- LightX2V Autoencoders @lightx2v
License
MIT License - see LICENSE for details.