TensorRT Supported Model List
June 2, 2026
This verified model matrix pairs with import_workflows.md. For each model family, it lists the dtype(s) used during validation.
Scope & Reading Guide
TensorRT is a general-purpose neural-network graph execution engine, not a model zoo. In principle any NN architecture can run on TensorRT as long as it is expressible through the workflows described in the Import Workflows Guide. The Custom Plugin section covers the escape hatch for ops TensorRT does not yet implement natively.
The table below is not an exhaustive support list. It is the subset of models NVIDIA has verified and benchmarked; we publish it so you know which configurations have a known-good baseline and where the current rough edges are. If your model is not listed, the expectation is still that it works — please file an issue if it does not.
Reading the Tables
- Dtype lists the precision used for the verified baseline. Other precisions may also work.
- Component-split models (diffusion pipelines, speech models with encoder/decoder) list one row per validated component.
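If you script against these tables, the row convention above can be parsed mechanically. The following is a minimal sketch, not part of any TensorRT tooling: the `Row` type and `parse_row` helper are hypothetical names, and it assumes each row has exactly two cells with an optional parenthesized component after the model ID.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One table row. All names here are illustrative, not a TensorRT API."""
    model: str                 # e.g. "openai/whisper-large-v3"
    component: Optional[str]   # e.g. "Encoder"; None for whole-model rows
    dtype: str                 # e.g. "float32", "bfloat16", "mixed"


# "<model> (<component>)", where the parenthesized part is optional.
_NAME = re.compile(r"^(?P<model>.+?)(?:\s*\((?P<component>[^()]*)\))?$")


def parse_row(line: str) -> Row:
    """Split a row such as '| org/model (Encoder) | float32 |' into fields."""
    # Drop the outer pipes, then split on the remaining cell separator.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 2:
        raise ValueError(f"expected 2 cells, got {len(cells)}: {line!r}")
    match = _NAME.match(cells[0])
    if match is None:
        raise ValueError(f"unrecognized model cell: {cells[0]!r}")
    return Row(match["model"], match["component"], cells[1])


if __name__ == "__main__":
    print(parse_row("| openai/whisper-large-v3 (Encoder) | float32 |"))
    print(parse_row("| stabilityai/sd-turbo | float16 |"))
```

Rows that group several components, such as `(DiT / TextEnc / VAE)`, come back with the whole parenthesized string as the component; splitting it further is left to the caller.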
Table of Contents
- LLMs / Text Generation
- Encoder-only NLP (BERT family, embeddings)
- Vision Classification & Embeddings
- Speech / Audio
- Diffusion Models
- Multimodal
- Legacy / TRT Sample Models
- Requesting New Model Coverage
LLMs / Text Generation
Preferred path for LLM generation and production LLM serving: TensorRT-LLM (KV cache, paged attention, FP8/INT4, speculative decoding, tensor/pipeline parallelism).
| Model | Dtype |
|---|---|
| meta-llama/Llama-3.1-8B | bfloat16 |
| meta-llama/Llama-3.2-1B | bfloat16 |
| Qwen/Qwen3-0.6B | bfloat16 |
| deepseek-ai/Janus-Pro-7B | bfloat16 |
For TensorRT-LLM's own coverage, see the TensorRT-LLM model support matrix.
Encoder-only NLP (BERT family, embeddings)
| Model | Dtype |
|---|---|
| google-bert/bert-base-uncased | float32 |
| google-bert/bert-base-multilingual-cased | float16 |
| FacebookAI/roberta-base | float32 |
| FacebookAI/roberta-large | float32 |
| FacebookAI/xlm-roberta-base | float32 |
| distilbert/distilbert-base-uncased | float32 |
| sentence-transformers/all-MiniLM-L6-v2 | float32 |
| sentence-transformers/all-mpnet-base-v2 | float32 |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | float32 |
| BAAI/bge-base-en-v1.5 | float32 |
| nlpaueb/legal-bert-base-uncased | float32 |
Vision Classification & Embeddings
| Model | Dtype |
|---|---|
| torchvision/resnet50 | float32 |
| timm/mobilenetv3_small_100.lamb_in1k | float32 |
| trpakov/vit-face-expression | float32 |
| openai/clip-vit-base-patch32 | float32 |
| openai/clip-vit-large-patch14 | float32 |
| facebook/dinov2-base | float32 |
| Falconsai/nsfw_image_detection | float32 |
| dima806/fairface_age_image_detection | float32 |
Speech / Audio
| Model (Component) | Dtype |
|---|---|
| openai/whisper-large-v3-turbo (Encoder) | float32 |
| openai/whisper-large-v3-turbo (Decoder) | float32 |
| openai/whisper-large-v3 (Encoder) | float32 |
| openai/whisper-large-v3 (Decoder) | float32 |
| laion/clap-htsat-fused | float32 |
| sesame/csm-1b (Backbone) | float32 |
| neuphonic/neutts-air | float32 |
| LiquidAI/LFM2-Audio-1.5B | float32 |
Diffusion Models
Diffusion pipelines are evaluated per component (Text Encoder / UNet or DiT / VAE) because TRT does not ingest the pipeline object directly.
| Pipeline (Component) | Dtype |
|---|---|
| stabilityai/sd-turbo | float16 |
| stabilityai/sdxl-turbo (UNet) | float16 |
| stabilityai/sdxl-turbo (VAE / Text Encoders) | mixed |
| stabilityai/stable-diffusion-xl-base-1.0 | float16 |
| CompVis/stable-diffusion-v1-4 | float16 |
| stable-diffusion-v1-5/stable-diffusion-v1-5 | float16 |
| stabilityai/stable-diffusion-2-1 | float16 |
| playgroundai/playground-v2.5-1024px-aesthetic | float16 |
| dataautogpt3/ProteusV0.3 | float16 |
| black-forest-labs/FLUX.2-dev (Text Encoder) | bfloat16 |
| black-forest-labs/FLUX.2-dev (DiT) | bfloat16 |
| black-forest-labs/FLUX.2-dev (VAE) | float16 |
| black-forest-labs/FLUX.1-schnell (DiT / TextEnc / VAE) | mixed |
| Wan-AI/Wan2.2-T2V-A14B-Diffusers (Text Encoder) | float16 |
| Wan-AI/Wan2.2-T2V-A14B-Diffusers (VAE) | float16 |
| Qwen/Qwen-Image (Text Encoder) | bfloat16 |
| Qwen/Qwen-Image (DiT / VAE) | bfloat16 |
| stabilityai/stable-diffusion-3-medium-diffusers | bfloat16 |
| stabilityai/stable-diffusion-3.5-medium / 3.5-large | mixed |
| HiDream-ai/HiDream-I1-Full | bfloat16 |
| stabilityai/stable-video-diffusion-img2vid-xt | float16 |
Multimodal
| Model | Dtype |
|---|---|
| openai/clip-vit-base-patch32 | float32 |
| deepseek-ai/Janus-Pro-7B | bfloat16 |
| Datadog/Toto-Open-Base-1.0 | float32 |
Legacy / TRT Sample Models
TensorRT ships hand-validated C++/Python samples for these classic architectures and workflows:
- MNIST digit classifiers, model parsing, dynamic-shape, plugin, and safe-runtime samples (see samples/ in this repo).
Requesting New Model Coverage
File a GitHub issue with:
- The Hugging Face ID or model source URL.
- The target dtype (fp32 / fp16 / bf16 / fp8 / int8 / int4).
- Any framework-level working example (helps us reproduce quickly).
The maintainers will benchmark the model and extend this table — no external contributor action needed for the benchmark step.
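The checklist above can be turned into a consistent issue body with a small helper. This is only a sketch under stated assumptions: `build_issue_body` and the field labels it emits are hypothetical and do not correspond to an official issue template; the allowed dtype names are the short forms listed in the checklist.

```python
from typing import Optional

# Short dtype names exactly as listed in the request checklist.
ALLOWED_DTYPES = ("fp32", "fp16", "bf16", "fp8", "int8", "int4")


def build_issue_body(
    model_source: str,
    dtype: str,
    example_url: Optional[str] = None,
) -> str:
    """Format a model-coverage request. Field labels are illustrative only."""
    if not model_source.strip():
        raise ValueError("model_source must be a Hugging Face ID or a URL")
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"dtype must be one of {ALLOWED_DTYPES}, got {dtype!r}")

    lines = [
        f"Model: {model_source.strip()}",
        f"Target dtype: {dtype}",
    ]
    # The working example is optional but helps with reproduction.
    if example_url:
        lines.append(f"Framework example: {example_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_issue_body("Qwen/Qwen3-0.6B", "bf16"))
```

Note that the tables use long dtype names (`bfloat16`, `float16`) while the checklist uses short ones (`bf16`, `fp16`); the helper deliberately accepts only the short forms so requests stay uniform.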