TensorRT Supported Model List

June 2, 2026

This verified-model matrix accompanies import_workflows.md. For each listed model, it gives the dtype(s) used during validation.

Scope & Reading Guide

TensorRT is a general-purpose neural-network graph execution engine, not a model zoo. In principle, any neural-network architecture can run on TensorRT as long as it can be expressed through the workflows described in the Import Workflows Guide. For ops that TensorRT does not yet implement natively, the Custom Plugin section describes the escape hatch.

The table below is not an exhaustive support list. It is the subset of models NVIDIA has verified and benchmarked; we publish it so you know which configurations have a known-good baseline and where the current rough edges are. If your model is not listed, the expectation is still that it works — please file an issue if it does not.

Reading the Tables

  • Dtype lists the precision used for the verified baseline. Other precisions may also work.
  • Component-split models (diffusion pipelines, speech models with encoder/decoder) list one row per validated component.
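The two conventions above can be expressed as a small lookup. The sketch below is illustrative only and is not part of TensorRT: `VerifiedEntry`, `parse_label`, and `verified_dtype` are hypothetical names, and `ROWS` transcribes just a few rows from the tables in this document. It splits a label such as `openai/whisper-large-v3 (Encoder)` into a model ID and an optional component, and treats a missing entry as "no verified baseline" rather than "unsupported".

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerifiedEntry:
    model: str                # Hugging Face ID, e.g. "openai/whisper-large-v3"
    component: Optional[str]  # e.g. "Encoder"; None for whole-model rows
    dtype: str                # precision of the verified baseline


# Matches "model-id" or "model-id (Component)"; the component part is optional.
_LABEL = re.compile(r"^(?P<model>\S+)(?:\s+\((?P<component>[^)]+)\))?$")


def parse_label(label: str, dtype: str) -> VerifiedEntry:
    """Split a table label into a model ID and an optional component."""
    m = _LABEL.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognized row label: {label!r}")
    return VerifiedEntry(m.group("model"), m.group("component"), dtype)


# Hypothetical subset of rows transcribed from the tables in this document.
ROWS = [
    ("meta-llama/Llama-3.1-8B", "bfloat16"),
    ("openai/whisper-large-v3 (Encoder)", "float32"),
    ("openai/whisper-large-v3 (Decoder)", "float32"),
    ("black-forest-labs/FLUX.2-dev (VAE)", "float16"),
]
MATRIX = [parse_label(label, dtype) for label, dtype in ROWS]


def verified_dtype(model: str, component: Optional[str] = None) -> Optional[str]:
    """Return the verified baseline dtype, or None if no baseline is listed.

    None means "not verified", not "unsupported": unlisted models and
    other precisions may still work.
    """
    for entry in MATRIX:
        if entry.model == model and entry.component == component:
            return entry.dtype
    return None
```

Component-split models need the component argument: asking for `black-forest-labs/FLUX.2-dev` without one returns None, because only its individual components carry a baseline.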


LLMs / Text Generation

For LLM text generation, the preferred path is TensorRT-LLM, which adds KV caching, paged attention, FP8/INT4 quantization, speculative decoding, and tensor/pipeline parallelism. Use TensorRT-LLM for production LLM serving.

Model | Dtype
--- | ---
meta-llama/Llama-3.1-8B | bfloat16
meta-llama/Llama-3.2-1B | bfloat16
Qwen/Qwen3-0.6B | bfloat16
deepseek-ai/Janus-Pro-7B | bfloat16

For TensorRT-LLM's own coverage, see the TensorRT-LLM model support matrix.


Encoder-only NLP (BERT family, embeddings)

Model | Dtype
--- | ---
google-bert/bert-base-uncased | float32
google-bert/bert-base-multilingual-cased | float16
FacebookAI/roberta-base | float32
FacebookAI/roberta-large | float32
FacebookAI/xlm-roberta-base | float32
distilbert/distilbert-base-uncased | float32
sentence-transformers/all-MiniLM-L6-v2 | float32
sentence-transformers/all-mpnet-base-v2 | float32
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | float32
BAAI/bge-base-en-v1.5 | float32
nlpaueb/legal-bert-base-uncased | float32

Vision Classification & Embeddings

Model | Dtype
--- | ---
torchvision/resnet50 | float32
timm/mobilenetv3_small_100.lamb_in1k | float32
trpakov/vit-face-expression | float32
openai/clip-vit-base-patch32 | float32
openai/clip-vit-large-patch14 | float32
facebook/dinov2-base | float32
Falconsai/nsfw_image_detection | float32
dima806/fairface_age_image_detection | float32

Speech / Audio

Model (Component) | Dtype
--- | ---
openai/whisper-large-v3-turbo (Encoder) | float32
openai/whisper-large-v3-turbo (Decoder) | float32
openai/whisper-large-v3 (Encoder) | float32
openai/whisper-large-v3 (Decoder) | float32
laion/clap-htsat-fused | float32
sesame/csm-1b (Backbone) | float32
neuphonic/neutts-air | float32
LiquidAI/LFM2-Audio-1.5B | float32

Diffusion Models

Diffusion pipelines are validated per component (text encoder, UNet or DiT, VAE) because TensorRT does not ingest the pipeline object directly.

Pipeline (Component) | Dtype
--- | ---
stabilityai/sd-turbo | float16
stabilityai/sdxl-turbo (UNet) | float16
stabilityai/sdxl-turbo (VAE / Text Encoders) | mixed
stabilityai/stable-diffusion-xl-base-1.0 | float16
CompVis/stable-diffusion-v1-4 | float16
stable-diffusion-v1-5/stable-diffusion-v1-5 | float16
stabilityai/stable-diffusion-2-1 | float16
playgroundai/playground-v2.5-1024px-aesthetic | float16
dataautogpt3/ProteusV0.3 | float16
black-forest-labs/FLUX.2-dev (Text Encoder) | bfloat16
black-forest-labs/FLUX.2-dev (DiT) | bfloat16
black-forest-labs/FLUX.2-dev (VAE) | float16
black-forest-labs/FLUX.1-schnell (DiT / TextEnc / VAE) | mixed
Wan-AI/Wan2.2-T2V-A14B-Diffusers (Text Encoder) | float16
Wan-AI/Wan2.2-T2V-A14B-Diffusers (VAE) | float16
Qwen/Qwen-Image (Text Encoder) | bfloat16
Qwen/Qwen-Image (DiT / VAE) | bfloat16
stabilityai/stable-diffusion-3-medium-diffusers | bfloat16
stabilityai/stable-diffusion-3.5-medium / 3.5-large | mixed
HiDream-ai/HiDream-I1-Full | bfloat16
stabilityai/stable-video-diffusion-img2vid-xt | float16
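Because each diffusion component is validated on its own, a pipeline becomes several independent engine builds. The sketch below assumes each component has already been exported to its own ONNX file (the file names are hypothetical) and that your TensorRT release's `trtexec` accepts `--bf16` alongside `--fp16`. It only assembles command lines from the validated dtypes in the FLUX.2-dev rows and does not run them; `Component` and `trtexec_commands` are illustrative names, not a TensorRT API.

```python
from dataclasses import dataclass

# Table dtype -> trtexec precision flags. float32 is the default and needs
# no flag. Assumes a trtexec build that accepts --bf16.
_PRECISION_FLAGS = {"float32": [], "float16": ["--fp16"], "bfloat16": ["--bf16"]}


@dataclass(frozen=True)
class Component:
    name: str       # e.g. "DiT"
    onnx_path: str  # hypothetical path of the separately exported ONNX graph
    dtype: str      # validated dtype taken from the table


def trtexec_commands(components: list[Component]) -> list[list[str]]:
    """Build one trtexec invocation per component; each yields its own engine."""
    commands = []
    for comp in components:
        if comp.dtype not in _PRECISION_FLAGS:
            # "mixed" rows do not map to a single precision flag.
            raise ValueError(f"no single precision flag for dtype {comp.dtype!r}")
        engine_path = comp.onnx_path.removesuffix(".onnx") + ".engine"
        commands.append(
            ["trtexec", f"--onnx={comp.onnx_path}", f"--saveEngine={engine_path}"]
            + _PRECISION_FLAGS[comp.dtype]
        )
    return commands


# FLUX.2-dev rows from the table above; the ONNX file names are hypothetical.
FLUX2_DEV = [
    Component("Text Encoder", "flux2_text_encoder.onnx", "bfloat16"),
    Component("DiT", "flux2_dit.onnx", "bfloat16"),
    Component("VAE", "flux2_vae.onnx", "float16"),
]
```

For example, the DiT entry becomes `trtexec --onnx=flux2_dit.onnx --saveEngine=flux2_dit.engine --bf16`. For a weakly typed network, these flags permit reduced precision rather than force it, so TensorRT may still keep some layers in float32.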

Multimodal

Model | Dtype
--- | ---
openai/clip-vit-base-patch32 | float32
deepseek-ai/Janus-Pro-7B | bfloat16
Datadog/Toto-Open-Base-1.0 | float32

Legacy / TRT Sample Models

TensorRT ships hand-validated C++ and Python samples for these classic architectures and workflows:

  • MNIST digit classifiers, plus samples for model parsing, dynamic shapes, plugins, and the safe runtime. See samples/ in this repo.

Requesting New Model Coverage

File a GitHub issue with:

  1. The Hugging Face ID or model source URL.
  2. The target dtype (fp32 / fp16 / bf16 / fp8 / int8 / int4).
  3. Any framework-level working example (helps us reproduce quickly).

The maintainers will benchmark the model and extend this table; contributors do not need to run the benchmark themselves.
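The three items requested above can be collected into a single issue body before filing. The sketch below is a hypothetical helper (`build_coverage_request` is not a project tool). It also normalizes the long dtype names used in the tables (`bfloat16`, `float16`, `float32`) to the short forms listed in the request checklist, and rejects anything outside that list.

```python
from typing import Optional

# Target dtypes accepted by the request checklist.
TARGET_DTYPES = ("fp32", "fp16", "bf16", "fp8", "int8", "int4")

# The tables spell dtypes out in full; map those spellings to the short forms.
_LONG_TO_SHORT = {"float32": "fp32", "float16": "fp16", "bfloat16": "bf16"}


def build_coverage_request(model_source: str, dtype: str,
                           repro_snippet: Optional[str] = None) -> str:
    """Assemble the three-part coverage request as plain issue text."""
    short = _LONG_TO_SHORT.get(dtype.lower(), dtype.lower())
    if short not in TARGET_DTYPES:
        raise ValueError(f"target dtype must be one of {TARGET_DTYPES}, got {dtype!r}")
    lines = [
        f"1. Model: {model_source}",
        f"2. Target dtype: {short}",
        "3. Framework-level example:",
        repro_snippet if repro_snippet else "(none provided)",
    ]
    return "\n".join(lines)
```

A framework-level snippet is optional in this sketch, but including one matches the checklist's advice that it helps maintainers reproduce the model quickly.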