⚠️ Notice: Limited Maintenance
February 28, 2025
This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.
Examples showcasing TorchServe Features and Integrations
Security Changes
TorchServe now enforces token authorization and model API control by default. This change affects the current examples, so check the following documentation for more information: Token Authorization, Model API control.
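Because token authorization is on by default, requests made while following the examples need to carry a key. The sketch below shows one way to attach it, assuming the key is sent as a Bearer token in the Authorization header and that the inference API listens on localhost port 8080. The model name, payload, and key are hypothetical placeholders, and the request is only built, not sent.

```python
import urllib.request


def build_inference_request(model_name, payload, token,
                            host="http://localhost:8080"):
    """Build (but do not send) an inference request that carries a token."""
    request = urllib.request.Request(
        url=f"{host}/predictions/{model_name}",
        data=payload,
        method="POST",
    )
    # Assumption: the key is passed as a Bearer token in the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical values: substitute a registered model name and a real key.
request = build_inference_request("my_model", b"example input", "my-inference-key")
print(request.full_url)
print(request.get_header("Authorization"))
```

To send the request against a running server, pass it to `urllib.request.urlopen(request)`.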
TorchServe Internals
- Creating mar file for an eager mode model
- Creating mar file for torchscript mode model
- Serving custom model with custom service handler
- Serving model using Docker Container
- Creating a Workflow
- Custom Metrics
- Dynamic Batch Processing
- Dynamic Batched Async Requests
TorchServe Integrations
Kubernetes 
KServe 
Hugging Face 
PiPPy Serving Large Models with PyTorch Native Solution PiPPy
MLFlow 
Captum 
ONNX 
TensorRT
Microsoft DeepSpeed-MII 
Prometheus and mtail 
Intel® Extension for PyTorch
TorchRec DLRM
TorchData
PyTorch 2.0
Stable Diffusion 
HuggingFace Large Models with Accelerate 
Use Cases
Vision
Image Classification
- Serving torchvision image classification models
- Serving Image Classifier model for on-premise near real-time video
Object Detection
GAN
Text
Neural Machine Translation
Text Classification
Text to Speech
MultiModal
TorchServe Examples
The following examples show how to create and serve model archives with TorchServe.
Creating mar file for eager mode model
Follow these steps to create a torch-model-archive (.mar) that runs an eager mode torch model in TorchServe:
- Pre-requisites to create a torch model archive (.mar):
  - serialized-file (.pt): This file represents the state_dict in case of an eager mode model.
  - model-file (.py): This file contains the model class, extended from torch.nn.Module, that represents the model architecture. This parameter is mandatory for eager mode models. The file must contain only one class definition extended from torch.nn.Module.
  - index_to_name.json: This file contains the mapping of predicted index to class. The default TorchServe handlers return the predicted index and probability. This file can be passed to the model archiver using the --extra-files parameter.
  - version: Model's version.
  - handler: TorchServe default handler's name or path to a custom inference handler (.py).
- Syntax
torch-model-archiver --model-name <model_name> --version <model_version_number> --model-file <path_to_model_architecture_file> --serialized-file <path_to_state_dict_file> --handler <path_to_custom_handler_or_default_handler_name> --extra-files <path_to_index_to_name_json_file>
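The syntax above can also be assembled from a script, which helps when archives are built in a pipeline. The following is a minimal sketch that only constructs the argument list using the flags shown above. All file names and the model name are hypothetical placeholders, and nothing is executed unless the list is passed to `subprocess.run`.

```python
import shlex


def eager_archiver_command(model_name, version, model_file,
                           serialized_file, handler, extra_files=None):
    """Return the torch-model-archiver argument list for an eager mode model."""
    command = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--model-file", model_file,            # mandatory for eager mode models
        "--serialized-file", serialized_file,  # the state_dict (.pt)
        "--handler", handler,
    ]
    if extra_files:
        # Optional, for example the index_to_name.json mapping.
        command += ["--extra-files", extra_files]
    return command


# Hypothetical inputs for illustration only.
command = eager_archiver_command(
    model_name="my_classifier",
    version="1.0",
    model_file="model.py",
    serialized_file="model_state_dict.pt",
    handler="image_classifier",
    extra_files="index_to_name.json",
)
print(shlex.join(command))
```

With the archiver installed and the files in place, `subprocess.run(command, check=True)` would build the archive.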
Creating mar file for torchscript mode model
Follow these steps to create a torch-model-archive (.mar) that runs a TorchScript mode torch model in TorchServe:
- Pre-requisites to create a torch model archive (.mar):
  - serialized-file (.pt): This file represents the state_dict in case of an eager mode model, or an executable ScriptModule in case of TorchScript.
  - index_to_name.json: This file contains the mapping of predicted index to class. The default TorchServe handlers return the predicted index and probability. This file can be passed to the model archiver using the --extra-files parameter.
  - version: Model's version.
  - handler: TorchServe default handler's name or path to a custom inference handler (.py).
- Syntax
torch-model-archiver --model-name <model_name> --version <model_version_number> --serialized-file <path_to_executable_script_module> --extra-files <path_to_index_to_name_json_file> --handler <path_to_custom_handler_or_default_handler_name>
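The index_to_name.json file listed in the pre-requisites can be generated from a list of class names. The sketch below assumes the simplest layout, a JSON object whose keys are the predicted indices as strings and whose values are class names. The labels are hypothetical, and the layout a particular handler expects should be checked against that handler.

```python
import json
import tempfile
from pathlib import Path


def write_index_to_name(class_names, path):
    """Write a JSON mapping from predicted index (as a string) to class name."""
    mapping = {str(index): name for index, name in enumerate(class_names)}
    Path(path).write_text(json.dumps(mapping, indent=2))
    return mapping


# Hypothetical labels; a real model would use the classes it was trained on.
output_path = Path(tempfile.mkdtemp()) / "index_to_name.json"
mapping = write_index_to_name(["cat", "dog", "bird"], output_path)
print(output_path.read_text())
```

The resulting file is what would be passed to the model archiver through the --extra-files parameter.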
Serving image classification models
The following example demonstrates how to create an image classifier model archive, serve it on TorchServe, and run image prediction using TorchServe's default image_classifier handler:
Serving custom model with custom service handler
The following example demonstrates how to create and serve a custom NN model with custom handler archives in TorchServe:
Serving text classification model
The following example demonstrates how to create and serve a custom text_classification NN model with the default text_classifier handler provided by TorchServe:
Serving text classification model with scriptable tokenizer
This example shows how to combine a text classification model with a scriptable tokenizer into a single, scripted artifact to serve with TorchServe. A scriptable tokenizer is a tokenizer compatible with TorchScript.
Serving object detection model
The following example demonstrates how to create and serve a pretrained fast-rcnn NN model with the default object_detector handler provided by TorchServe:
Serving image segmentation model
The following example demonstrates how to create and serve a pretrained fcn NN model with the default image_segmenter handler provided by TorchServe:
Serving Huggingface Transformers
The following example demonstrates how to create and serve pretrained transformer models from Hugging Face, such as BERT, RoBERTa, and XLM
Captum Integration
The following example demonstrates TorchServe's integration with Captum, an open source, extensible library for model interpretability built on PyTorch
Example to serve GAN model
The following example demonstrates how to create and serve a pretrained DCGAN model from facebookresearch/pytorch_GAN_zoo
Serving Neural Machine Translation
The following example demonstrates how to create and serve a neural translation model using fairseq
Serving Waveglow text to speech synthesizer
The following example demonstrates how to create and serve the WaveGlow text-to-speech synthesizer
Serving Multi modal model
The following example demonstrates how to create and serve a multi modal model including audio, text and video
Serving Image Classification Workflow
The following example demonstrates how to create and serve a complex image classification workflow for dog breed classification
Serving Neural Machine Translation Workflow
The following example demonstrates how to create and serve a complex neural machine translation workflow
Serving TorchRec DLRM (Recommender Model)
This example shows how to deploy a Deep Learning Recommendation Model (DLRM) with TorchRec
Serving Image Classifier Model for on-premise near real-time video
The following example demonstrates how to serve an image classification model with batching for near real-time video
Serving Image Classifier Model with TorchData datapipes
The following example demonstrates how to integrate TorchData with TorchServe