(support-matrix)=

# Supported Models

May 4, 2026
The following table lists the models supported by the PyTorch backend:

| Architecture | Model | HuggingFace Example |
| :--- | :--- | :--- |
| `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` |
| `Cohere2ForCausalLM` | Command A | `CohereLabs/c4ai-command-a-03-2025` |
| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` |
| `DeepseekV3ForCausalLM` | DeepSeek-V3, Kimi-K2 | `deepseek-ai/DeepSeek-V3` |
| `DeepseekV32ForCausalLM` | DeepSeek-V3.2 | `deepseek-ai/DeepSeek-V3.2` |
| `Exaone4ForCausalLM` | EXAONE 4.0 | `LGAI-EXAONE/EXAONE-4.0-32B` |
| `ExaoneMoEForCausalLM` | K-EXAONE | `LGAI-EXAONE/K-EXAONE-236B-A23B` |
| `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it` |
| `Gemma3nForConditionalGeneration`[^1] | Gemma 3n | `google/gemma-3n-E2B-it`, `google/gemma-3n-E4B-it` |
| `Gemma4ForConditionalGeneration`[^2] | Gemma 4 | `google/gemma-4-26B-A4B-it`, `google/gemma-4-31B-it` |
| `Glm4MoeForCausalLM` | GLM-4.5, GLM-4.6, GLM-4.7 | `THUDM/GLM-4-100B-A10B` |
| `Glm4MoeLiteForCausalLM`[^3] | GLM-4.7-Flash | `zai-org/GLM-4.7-Flash` |
| `GlmMoeDsaForCausalLM` | GLM-5 | `zai-org/GLM-5` |
| `GptOssForCausalLM` | GPT-OSS | `openai/gpt-oss-120b` |
| `KimiK25ForConditionalGeneration` | Kimi-K2.5 | `moonshotai/Kimi-K2.5` |
| `LlamaForCausalLM` | Llama 3.1, Llama 3, Llama 2, LLaMA | `meta-llama/Meta-Llama-3.1-70B` |
| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| `MiniMaxM2ForCausalLM`[^4] | MiniMax M2/M2.1/M2.7 | `MiniMaxAI/MiniMax-M2.7` |
| `MistralForCausalLM` | Mistral | `mistralai/Mistral-7B-v0.1` |
| `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` |
| `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` |
| `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` |
| `NemotronHForCausalLM` | Nemotron-3-Nano, Nemotron-3-Super | `nvidia/nvidia-nemotron-v3` |
| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` |
| `Phi3ForCausalLM` | Phi-4 | `microsoft/Phi-4` |
| `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` |
| `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` |
| `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` |
| `Qwen3ForCausalLM` | Qwen3 | `Qwen/Qwen3-8B` |
| `Qwen3MoeForCausalLM` | Qwen3MoE | `Qwen/Qwen3-30B-A3B` |
| `Qwen3NextForCausalLM` | Qwen3Next | `Qwen/Qwen3-Next-80B-A3B-Thinking` |
| `Qwen3_5MoeForCausalLM`[^5] | Qwen3.5-MoE | `Qwen/Qwen3.5-397B-A17B` |
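The Architecture column corresponds to the `architectures` field in a checkpoint's Hugging Face `config.json`, so a script can check a config against this table before attempting to load it. The sketch below is purely illustrative (it is not a TensorRT-LLM API): the `SUPPORTED_ARCHITECTURES` set hand-copies a few entries from the table, and the sample config dict is hand-written.

```python
import json

# A small subset of the Architecture column above.
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
    "DeepseekV3ForCausalLM",
    "MixtralForCausalLM",
}

def is_supported(config: dict) -> bool:
    """Return True if any architecture listed in a HF config.json is in the table."""
    return any(a in SUPPORTED_ARCHITECTURES for a in config.get("architectures", []))

# Hand-written stand-in for a checkpoint's config.json (illustrative only).
sample = json.loads('{"architectures": ["Qwen3ForCausalLM"], "hidden_size": 4096}')
print(is_supported(sample))  # True
```

In practice the real `config.json` would be read from the downloaded checkpoint directory rather than constructed inline.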

## Model-Feature Support Matrix (Key Models)

Note: Support for other models may vary. Features marked "N/A" are not applicable to the model architecture.

| Model Architecture / Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine, Linear) | EAGLE-3 (One Model Engine, Dynamic) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| `DeepseekV3ForCausalLM` | Yes | Yes | Yes | Yes | Yes[^6] | Yes | No | No | No | Yes | Yes | Yes[^7] | N/A | Yes | Yes |
| `DeepseekV32ForCausalLM` | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes | Yes | N/A | Yes | Yes |
| `Glm4MoeForCausalLM` | Yes | Yes | Yes | Untested | Yes | Yes | No | No | No | Yes | Yes | Untested | N/A | Yes | Yes |
| `Qwen3MoeForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| `Qwen3NextForCausalLM`[^8] | Yes | Yes | Yes | Untested | Yes | No | No | No | No | Yes | Yes | No | No | Untested | Untested |
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes[^9] | Yes | Yes | Yes | N/A | Yes | Yes |
| `Qwen3_5MoeForCausalLM`[^5] | Yes | Yes | Untested | Untested | Yes | No | No | No | No | Yes | Untested | Yes | N/A | Untested | Untested |
| `Glm4MoeLiteForCausalLM`[^3] | Yes | Yes | Untested | Untested | Yes | No | No | No | No | Yes | Untested | Untested | N/A | Untested | Untested |
| `NemotronHForCausalLM` (Super) | Yes | Yes | Untested | Untested | Yes | Yes | No | No | No | Yes | Yes | Untested | N/A | Untested | Untested |
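For scripting (for example, gating a CI job on whether a feature is marked supported), the matrix above can be kept as plain data. The sketch below hand-copies a few cells from two rows of the table; it is illustrative only and not part of TensorRT-LLM.

```python
# Feature flags hand-copied from two rows of the matrix above;
# cell values ("Yes" / "No" / "N/A") are kept verbatim.
FEATURE_MATRIX = {
    "DeepseekV3ForCausalLM": {
        "CUDA Graph": "Yes", "MTP": "Yes",
        "KV Cache Reuse": "Yes", "Sliding Window Attention": "N/A",
    },
    "Qwen3MoeForCausalLM": {
        "CUDA Graph": "Yes", "MTP": "No",
        "KV Cache Reuse": "Yes", "Sliding Window Attention": "N/A",
    },
}

def models_supporting(feature: str) -> list[str]:
    """Return architectures whose entry for `feature` is exactly "Yes"."""
    return sorted(m for m, feats in FEATURE_MATRIX.items()
                  if feats.get(feature) == "Yes")

print(models_supporting("MTP"))         # ['DeepseekV3ForCausalLM']
print(models_supporting("CUDA Graph"))  # ['DeepseekV3ForCausalLM', 'Qwen3MoeForCausalLM']
```

Treating "Untested" and "N/A" as distinct from "No" matters here: only an exact "Yes" counts as supported.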

## Multimodal Feature Support Matrix (PyTorch Backend)

| Model Architecture / Feature | Overlap Scheduler | CUDA Graph | Chunked Prefill | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Logits Post Processor | EPD Disaggregated Serving | Modality |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| `Gemma3ForConditionalGeneration` | Yes | Yes | N/A | Yes | Yes | N/A | Yes | No | L + I |
| `HCXVisionForCausalLM` | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I |
| `LlavaLlamaModel` (VILA) | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + V |
| `LlavaNextForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I |
| `Llama4ForConditionalGeneration` | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| `Mistral3ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I |
| `NemotronH_Nano_VL_V2` | Yes | Yes | Yes | Yes | Yes | N/A | Yes | No | L + I + V |
| `Phi4MMForCausalLM` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + A |
| `Qwen2VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + V |
| `Qwen2_5_VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I + V |
| `Qwen3VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I + V |
| `Qwen3VLMoeForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I + V |

Note:

- L: Language
- I: Image
- V: Video
- A: Audio

## Visual Generation Models

TensorRT-LLM provides beta support for diffusion-based image and video generation. For full documentation, see the Visual Generation page.

### Supported Models

| HuggingFace Model ID | Tasks |
| :--- | :--- |
| `black-forest-labs/FLUX.1-dev` | Text-to-Image |
| `black-forest-labs/FLUX.2-dev` | Text-to-Image |
| `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.2-I2V-A14B-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.2-TI2V-5B-Diffusers` | Text-to-Video, Image-to-Video |
| `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |

### Feature Matrix

| Model | TeaCache | CFG Parallelism | Ulysses Parallelism | Parallel VAE | CUDA Graph | `torch.compile` | `trtllm-serve` |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| FLUX.1 | Yes | No[^10] | Yes | No | Yes | Yes | Yes |
| FLUX.2 | Yes | No[^10] | Yes | No | Yes | Yes | Yes |
| Wan 2.1 | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Wan 2.2 | No | Yes | Yes | Yes | Yes | Yes | Yes |
| LTX-2 | No | Yes | Yes | No | No | Yes | Yes |

## Footnotes

[^1]: Text-only support via the AutoDeploy backend. See AD config.

[^2]: Text-only support via the AutoDeploy backend. See AD configs for MoE and dense.

[^3]: Supported via the AutoDeploy backend. See AD config.

[^4]: Supported via the AutoDeploy backend. See AD config.

[^5]: Supported via the AutoDeploy backend. See AD config.

[^6]: Chunked prefill for MLA can only be enabled on SM100/SM103.

[^7]: KV cache reuse for MLA can only be enabled on SM90/SM100/SM103 and with a BF16/FP8 KV cache dtype.

[^8]: Qwen3-Next-80B-A3B exhibits relatively low accuracy on the SciCode-AA-v2 benchmark.

[^9]: The overlap scheduler is not supported when using EAGLE-3 (Two Model Engine) for GPT-OSS.

[^10]: FLUX models use embedded guidance and do not have a separate negative-prompt path, so CFG parallelism is not applicable.