ComfyUI-FlashVSR_Stable

February 13, 2026

High-performance Video Super Resolution for ComfyUI with VRAM optimization.

Run FlashVSR on 8GB-24GB+ GPUs without artifacts. Features intelligent resource management, 5 VAE options, and auto-downloading models.



Registry Link: https://registry.comfy.org/publishers/naxci1/nodes/ComfyUI-FlashVSR_Stable


✨ Key Features

  • 🎬 Video Super Resolution: 2x or 4x upscaling using FlashVSR diffusion models
  • 🧠 5 VAE Options: Choose from Wan2.1, Wan2.2, LightVAE, TAE variants for optimal VRAM/quality trade-off
  • 📊 Pre-Flight Resource Check: Intelligent VRAM estimation with settings recommendations
  • ⚡ Auto-Download: Models download automatically from HuggingFace if missing
  • 🛡️ OOM Protection: Automatic recovery with progressive fallback (tiled VAE → tiled DiT → chunking)
  • 🔧 Unified Pipeline: All modes share optimized processing logic


Performance & VRAM Optimization

This node is optimized for various hardware configurations. Here are some guidelines:

VRAM Tiers & Settings

| VRAM | Mode | Tiling | Chunk Size | Precision | Notes |
|---|---|---|---|---|---|
| 24GB+ | full or tiny | Disabled | 0 (All) | bf16/auto | Max quality/speed. |
| 16GB | tiny | tiled_vae=True | 0 or ~100 | bf16/auto | Enable keep_models_on_cpu. |
| 12GB | tiny | tiled_vae=True, tiled_dit=True | ~50 | fp16 | Use sparse_sage attention. |
| 8GB | tiny-long | Required | ~20 | fp16 | Must use tiling and chunking. |

Performance Enhancements

  • Attention Mode: Use sparse_sage_attention for the best balance of speed and memory. flash_attention_2 is faster but requires specific hardware/installation.
  • Precision: bf16 (BFloat16) is recommended for RTX 3000/4000/5000 series. It is faster and preserves dynamic range better than fp16.
  • Chunking: Use frame_chunk_size to process videos in segments. This moves processed frames to CPU RAM, preventing VRAM saturation on long clips.
  • Resize Input: If the input video is large (e.g., 1080p), use the resize_factor parameter to reduce input size to 0.5x before processing. This drastically reduces VRAM usage and allows for 4x upscaling of the resized result (net 2x output). For small videos, leave at 1.0.
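The net effect of resize_factor can be checked with a little arithmetic. The sketch below is illustrative only: output_resolution is a hypothetical helper, not part of the node, and it assumes plain rounding. FlashVSR may snap dimensions differently, so real output sizes can differ slightly.

```python
def output_resolution(width, height, scale, resize_factor=1.0):
    """Return (net_scale, (out_width, out_height)) for a given input size.

    Hypothetical helper for illustration; plain rounding is an assumption.
    """
    if scale not in (2, 4):
        raise ValueError("scale must be 2 or 4")
    if not 0.0 < resize_factor <= 1.0:
        raise ValueError("resize_factor must be in (0, 1]")

    # Step 1: the input is shrunk by resize_factor before processing.
    resized_w = round(width * resize_factor)
    resized_h = round(height * resize_factor)

    # Step 2: the resized frames are upscaled by the chosen factor.
    net_scale = resize_factor * scale
    return net_scale, (resized_w * scale, resized_h * scale)


# 1080p input, halved first, then upscaled 4x: a net 2x result.
print(output_resolution(1920, 1080, scale=4, resize_factor=0.5))
# (2.0, (3840, 2160))
```

With resize_factor left at 1.0 the net scale is simply the scale value, which is the right choice for small inputs.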

Pre-Flight Resource Check (NEW)

Before processing, FlashVSR now performs an intelligent pre-flight check that:

  1. Estimates VRAM Requirements: Calculates approximate VRAM needed based on resolution, frames, scale, and settings.
  2. Checks Available Resources: Uses torch.cuda.mem_get_info() for accurate real-time VRAM availability.
  3. Provides Recommendations: If OOM is predicted, suggests optimal settings.
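Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: preflight_check and free_vram_gb are hypothetical names, the comparison is a bare "estimate versus free memory" test, and the VRAM estimate itself is taken as an input because the node's estimation formula is not described here. The only real API used is torch.cuda.mem_get_info(), which returns free and total bytes.

```python
def preflight_check(estimated_gb, available_gb):
    """Compare an estimated VRAM need against currently free VRAM.

    Hypothetical helper; the real check may apply extra safety margins.
    """
    if estimated_gb <= available_gb:
        message = (
            f"Safe to proceed. Estimated ~{estimated_gb:.1f}GB needed, "
            f"{available_gb:.1f}GB available."
        )
        return True, message
    message = (
        f"Current settings require ~{estimated_gb:.1f}GB "
        f"but only {available_gb:.1f}GB available."
    )
    return False, message


def free_vram_gb(device=None):
    """Return free VRAM in GB, or None when PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info returns (free_bytes, total_bytes) for the device.
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**3


print(preflight_check(12.8, 14.2)[1])
print(preflight_check(18.5, 8.0)[1])
```

On a CUDA machine, the second argument would come from free_vram_gb() instead of a fixed number.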

Example console output:

============================================================
🔍 PRE-FLIGHT RESOURCE CHECK
💻 RAM: 15.4GB / 95.8GB
💾 VRAM Available: 14.2GB
📊 Estimated VRAM Required: 12.8GB
✅ Safe to proceed. Estimated ~12.8GB needed, 14.2GB available.
============================================================

If VRAM is insufficient:

⚠️ Current settings require ~18.5GB but only 8.0GB available.
💡 Recommended Optimal Settings:
  • chunk_size = 32
  • tiled_vae = True
  • tiled_dit = True
  • resize_factor = 0.6

🎨 VAE Model Selection

VAE Type Comparison

| VAE Type | VRAM Usage | Speed | Quality | Best For |
|---|---|---|---|---|
| Wan2.1 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Maximum quality, 24GB+ VRAM |
| Wan2.2 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Improved normalization for Wan2.2 models |
| LightVAE_W2.1 | 4-5 GB | 2-3x faster | ⭐⭐⭐⭐ | 8-16GB VRAM, speed priority |
| TAE_W2.2 | 6-8 GB | 1.5x faster | ⭐⭐⭐⭐ | Temporal consistency priority |
| LightTAE_HY1.5 | 3-4 GB | 3x faster | ⭐⭐⭐⭐ | HunyuanVideo compatible, minimum VRAM |

VAE Selection Guide

| Your VRAM | Recommended VAE | Additional Settings |
|---|---|---|
| 8GB | LightTAE_HY1.5 or LightVAE_W2.1 | tiled_vae=True, tiled_dit=True, chunk_size=16 |
| 12GB | LightVAE_W2.1 or Wan2.1 | tiled_vae=True |
| 16GB | Any VAE | Optional tiling for long videos |
| 24GB+ | Wan2.1 or Wan2.2 | Maximum quality, no restrictions |

Auto-Download

All VAE models auto-download from HuggingFace if not found locally:

| VAE Selection | File |
|---|---|
| Wan2.1 | Wan2.1_VAE.pth |
| Wan2.2 | Wan2.2_VAE.pth |
| LightVAE_W2.1 | lightvaew2_1.pth |
| TAE_W2.2 | taew2_2.safetensors |
| LightTAE_HY1.5 | lighttaehy1_5.pth |

📖 Best Practices / Settings Guide

Low VRAM (8-12GB) Configuration

Mode: tiny-long
VAE: LightVAE_W2.1 or LightTAE_HY1.5
Tiled VAE: ✅ Enabled
Tiled DiT: ✅ Enabled
Chunk Size: 16-32
Resize Factor: 0.5-0.8
Keep Models on CPU: ✅ Enabled

Medium VRAM (16GB) Configuration

Mode: tiny
VAE: Wan2.1 or LightVAE_W2.1
Tiled VAE: ✅ Enabled
Tiled DiT: Optional
Chunk Size: 50-100
Resize Factor: 1.0
Keep Models on CPU: Optional

High VRAM (24GB+) Configuration

Mode: full or tiny
VAE: Wan2.1 or Wan2.2
Tiled VAE: ❌ Disabled
Tiled DiT: ❌ Disabled
Chunk Size: 0 (all frames)
Resize Factor: 1.0
Keep Models on CPU: ❌ Disabled
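The three configurations above can be captured as a small lookup for scripted use. This is a sketch under assumptions: PRESETS and preset_for_vram are hypothetical names, the tier thresholds for in-between cards (a 20GB card is treated as the 16GB tier) are a guess, and where the guide gives a range or "Optional" the sketch picks the most conservative documented value.

```python
# Values copied from the settings guide; keys use the node's parameter names.
PRESETS = {
    "low": {  # 8-12GB
        "mode": "tiny-long",
        "vae_model": "LightVAE_W2.1",
        "tiled_vae": True,
        "tiled_dit": True,
        "frame_chunk_size": 16,   # guide suggests 16-32
        "resize_factor": 0.5,     # guide suggests 0.5-0.8
        "keep_models_on_cpu": True,
    },
    "medium": {  # 16GB
        "mode": "tiny",
        "vae_model": "Wan2.1",
        "tiled_vae": True,
        "tiled_dit": False,       # optional in the guide
        "frame_chunk_size": 50,   # guide suggests 50-100
        "resize_factor": 1.0,
        "keep_models_on_cpu": False,  # optional in the guide
    },
    "high": {  # 24GB+
        "mode": "full",
        "vae_model": "Wan2.1",
        "tiled_vae": False,
        "tiled_dit": False,
        "frame_chunk_size": 0,    # 0 = all frames
        "resize_factor": 1.0,
        "keep_models_on_cpu": False,
    },
}


def preset_for_vram(vram_gb):
    """Pick a starting preset for a GPU with the given VRAM (in GB)."""
    if vram_gb >= 24:
        return PRESETS["high"]
    if vram_gb >= 16:
        return PRESETS["medium"]
    return PRESETS["low"]


print(preset_for_vram(12)["mode"])  # tiny-long
```

These are starting points only; the pre-flight check may still recommend different values for a particular clip.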

Processing Summary

At the end of each run, you'll see a summary:

============================================================
📊 PROCESSING SUMMARY
⏱️ Total Processing Time: 130.08s (1.54 FPS)
📥 Input Resolution: 276x206 (200 frames)
📤 Output Resolution: 552x412 (200 frames)
📈 Peak VRAM Used: 12.4 GB
============================================================
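The FPS figure in the summary is frames divided by wall-clock time. A quick check of the sample numbers, using a hypothetical helper name:

```python
def throughput_fps(frames, seconds):
    """Frames processed per second of total processing time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return frames / seconds


# Sample run from the summary: 200 frames in 130.08 seconds.
print(f"{throughput_fps(200, 130.08):.2f} FPS")  # 1.54 FPS
```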

🔧 Node Parameters

Hover over any input in ComfyUI to see tooltips. Full parameter list:

| Parameter | Description |
|---|---|
| model | FlashVSR model version |
| mode | tiny (fast), tiny-long (lowest VRAM), full (highest quality) |
| vae_model | VAE architecture (5 options, auto-download) |
| scale | Upscaling factor: 2x or 4x |
| color_fix | Wavelet color transfer. Highly recommended. |
| tiled_vae | Spatial tiling for VAE. Reduces VRAM, slower. |
| tiled_dit | Spatial tiling for DiT. Required for 4K output. |
| tile_size | Tile dimensions. Smaller = less VRAM. |
| overlap | Tile overlap for seamless blending. |
| unload_dit | Unload DiT before VAE decode. |
| frame_chunk_size | Process N frames at a time. 0 = all. |
| enable_debug | Verbose console logging. |
| keep_models_on_cpu | Offload to system RAM when idle. |
| resize_factor | Downscale large inputs before upscaling. Range: 0.3-1.0. |
| attention_mode | Attention kernel: sparse_sage, flash_attention_2, sdpa, block_sparse |
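To make the frame_chunk_size semantics concrete, here is a minimal sketch of how a clip could be split into chunks, with 0 meaning "all frames at once". chunk_ranges is a hypothetical helper; the node's real chunking may overlap or pad chunks for temporal consistency, which this sketch does not model.

```python
def chunk_ranges(total_frames, frame_chunk_size):
    """Return half-open (start, end) frame ranges for chunked processing.

    frame_chunk_size == 0 means the whole clip is one chunk.
    """
    if total_frames < 0 or frame_chunk_size < 0:
        raise ValueError("arguments must be non-negative")
    if total_frames == 0:
        return []
    if frame_chunk_size == 0 or frame_chunk_size >= total_frames:
        return [(0, total_frames)]
    return [
        (start, min(start + frame_chunk_size, total_frames))
        for start in range(0, total_frames, frame_chunk_size)
    ]


print(chunk_ranges(200, 50))  # [(0, 50), (50, 100), (100, 150), (150, 200)]
print(chunk_ranges(70, 32))   # [(0, 32), (32, 64), (64, 70)]
print(chunk_ranges(200, 0))   # [(0, 200)]
```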

💻 Command-Line Interface (CLI)

FlashVSR includes a full-featured CLI that mirrors all ComfyUI node parameters for standalone video upscaling.

Quick Start

# Basic 2x upscale
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2

# 4x upscale with tiling for lower VRAM
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 4 \
    --tiled_vae --tiled_dit --tile_size 256 --tile_overlap 24

# Long video with chunking to prevent OOM
python cli_main.py --input long_video.mp4 --output upscaled.mp4 \
    --frame_chunk_size 50 --mode tiny-long

# Low VRAM mode (8GB GPUs)
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2 \
    --vae_model LightVAE_W2.1 --tiled_vae --tiled_dit \
    --frame_chunk_size 20 --resize_factor 0.5

# Custom models directory
python cli_main.py --input video.mp4 --output upscaled.mp4 \
    --models_dir /path/to/your/models
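For batch jobs, the same commands can be assembled from Python. The sketch below only builds argument lists from flags shown in the examples above; build_command and batch_commands are hypothetical helpers, the output naming scheme is an assumption, and cli_main.py is assumed to be in the working directory.

```python
import sys
from pathlib import Path


def build_command(input_path, output_path, scale=2, low_vram=False,
                  script="cli_main.py"):
    """Build an argument list for one cli_main.py run."""
    cmd = [
        sys.executable, script,
        "--input", str(input_path),
        "--output", str(output_path),
        "--scale", str(scale),
    ]
    if low_vram:
        # Same flags as the "Low VRAM mode (8GB GPUs)" example.
        cmd += [
            "--vae_model", "LightVAE_W2.1",
            "--tiled_vae", "--tiled_dit",
            "--frame_chunk_size", "20",
            "--resize_factor", "0.5",
        ]
    return cmd


def batch_commands(input_dir, output_dir, **kwargs):
    """Build one command per .mp4 file found in input_dir."""
    commands = []
    for src in sorted(Path(input_dir).glob("*.mp4")):
        dst = Path(output_dir) / f"{src.stem}_upscaled.mp4"
        commands.append(build_command(src, dst, **kwargs))
    return commands


print(" ".join(build_command("video.mp4", "upscaled.mp4", low_vram=True)[1:]))
```

Each list can then be passed to subprocess.run(cmd, check=True) so a failed run stops the batch.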

CLI Arguments Reference

All arguments map 1:1 with ComfyUI node inputs. Run python cli_main.py --help for full details.

Required Arguments

| Argument | Description |
|---|---|
| --input, -i | Input video file path (e.g., video.mp4) |
| --output, -o | Output video file path (e.g., upscaled.mp4) |

Pipeline Initialization (from FlashVSRNodeInitPipe)

| Argument | Type | Default | Description |
|---|---|---|---|
| --model | choice | FlashVSR-v1.1 | Model version: FlashVSR, FlashVSR-v1.1 |
| --mode | choice | tiny | Operation mode: tiny, tiny-long, full |
| --vae_model | choice | Wan2.1 | VAE model: Wan2.1, Wan2.2, LightVAE_W2.1, TAE_W2.2, LightTAE_HY1.5 |
| --force_offload | flag | True | Force offload models to CPU after execution |
| --no_force_offload | flag | - | Disable force offloading |
| --precision | choice | auto | Precision: fp16, bf16, auto |
| --device | string | auto | Device: cuda:0, cuda:1, cpu, auto |
| --attention_mode | choice | sparse_sage_attention | Attention: sparse_sage_attention, block_sparse_attention, flash_attention_2, sdpa |

Processing Parameters (from FlashVSRNodeAdv)

| Argument | Type | Default | Description |
|---|---|---|---|
| --scale | int | 2 | Upscaling factor: 2 or 4 |
| --color_fix | flag | True | Apply wavelet-based color correction |
| --no_color_fix | flag | - | Disable color correction |
| --tiled_vae | flag | False | Enable spatial tiling for VAE decoder |
| --tiled_dit | flag | False | Enable spatial tiling for DiT |
| --tile_size | int | 256 | Tile size for DiT processing (32-1024) |
| --tile_overlap | int | 24 | Overlap pixels between tiles (8-512) |
| --unload_dit | flag | False | Unload DiT before VAE decoding |
| --sparse_ratio | float | 2.0 | Sparse attention control (1.5-2.0) |
| --kv_ratio | float | 3.0 | Key/Value cache ratio (1.0-3.0) |
| --local_range | int | 11 | Local attention window: 9 or 11 |
| --seed | int | 0 | Random seed for reproducibility |
| --frame_chunk_size | int | 0 | Process N frames at a time (0 = all) |
| --enable_debug | flag | False | Enable verbose logging |
| --keep_models_on_cpu | flag | True | Keep models in CPU RAM when idle |
| --no_keep_models_on_cpu | flag | - | Keep models in VRAM |
| --resize_factor | float | 1.0 | Resize input before processing (0.1-1.0) |

Video I/O Parameters

| Argument | Type | Default | Description |
|---|---|---|---|
| --fps | float | input FPS | Output video FPS |
| --codec | string | libx264 | Video codec: libx264, libx265, h264_nvenc |
| --crf | int | 18 | Quality (0-51, lower = better) |
| --start_frame | int | 0 | Start frame index (0-indexed) |
| --end_frame | int | -1 | End frame index (-1 = all frames) |
| --models_dir | string | ./models | Custom models directory path |

🚀 Installation

Step 1: Install the Node

cd ComfyUI/custom_nodes
git clone https://github.com/naxci1/ComfyUI-FlashVSR_Stable.git
python -m pip install -r ComfyUI-FlashVSR_Stable/requirements.txt

📢 Turing architecture or older GPUs (GTX 16 series, RTX 20 series, and earlier): Install triton<3.3.0. Quote the requirement so the shell does not treat < as input redirection:

# Windows
python -m pip install -U "triton-windows<3.3.0"
# Linux
python -m pip install -U "triton<3.3.0"

Step 2: Download Models

Download the FlashVSR folder from HuggingFace and place it in ComfyUI/models/ so the layout matches:

ComfyUI/models/FlashVSR/
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── Wan2.1_VAE.pth  (optional; auto-downloads if missing)

💡 VAE files auto-download from HuggingFace if not present. Only the DiT model and other components need manual download.

Step 3: Custom Model Paths (Optional)

By default, FlashVSR looks for models in ComfyUI/models/FlashVSR/. To use a different location (e.g., models on another drive):

  1. Edit model_paths.yaml in the ComfyUI-FlashVSR_Stable directory
  2. Set flashvsr_model_path to your custom path
  3. Restart ComfyUI

Example configurations:

# Windows (D: drive)
flashvsr_model_path: "D:/AI/Models/FlashVSR"

# Windows (alternative syntax, escaped backslashes)
flashvsr_model_path: "E:\\ComfyUI\\models\\FlashVSR"

# Linux/Mac (use one of these, not both)
flashvsr_model_path: "/home/user/models/FlashVSR"
flashvsr_model_path: "/mnt/storage/AI/FlashVSR"

# Use default (leave empty)
flashvsr_model_path: ""

📂 Auto-Download Support: If model files don't exist, they will automatically download to the directory specified in model_paths.yaml. The custom path will be created if needed.

Example: If you set flashvsr_model_path: "D:/AI/Models", models will automatically download to D:/AI/Models/FlashVSR/ on first use.


🖼️ Preview

Workflow Preview

Sample Workflow

Download Workflow JSON


🏷️ Recent Changes

See CHANGELOG.md for full version history.


🙏 Acknowledgments


📄 License

MIT License - see LICENSE for details.