# Model export and LightSeq inference


This repo contains examples of exporting models (LightSeq, Fairseq-based, Hugging Face, etc.) to protobuf/hdf5 format and then running them with LightSeq for fast inference. For each model we provide three kinds of export: normal float export, quantization-aware-training (QAT) export, and post-training-quantization (PTQ) export.

Before running anything, switch to the directory containing these examples:

```bash
cd examples/inference/python
```

## Model export

We provide the following export examples. All Fairseq-based models are trained with the scripts in `examples/training/fairseq`, and the first two LightSeq Transformer models are trained with the scripts in `examples/training/custom`.

| Model | Type | Command | Resource | Description |
|-------|------|---------|----------|-------------|
| LightSeq Transformer | Float | `python3 export/ls_transformer_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to protobuf format. |
| LightSeq Transformer + PTQ | Int8 | `python3 export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to int8 protobuf format using post-training quantization. |
| Hugging Face BART | Float | `python3 export/huggingface/hf_bart_export.py` | / | Export Hugging Face BART models to protobuf/hdf5 format. |
| Hugging Face BERT | Float | `python3 export/huggingface/hf_bert_export.py` | / | Export Hugging Face BERT models to hdf5 format. |
| Hugging Face + custom Torch layer BERT + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_ls_torch_hf_quant_bert_ner.bin` | / | Export Hugging Face BERT models trained with custom Torch layers to hdf5 format. |
| Hugging Face GPT2 | Float | `python3 export/huggingface/hf_gpt2_export.py` | / | Export Hugging Face GPT2 models to hdf5 format. |
| Hugging Face + custom Torch layer GPT2 + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_gpt2_export.py -m ckpt_ls_torch_hf_quant_gpt2_ner.bin` | / | Export Hugging Face GPT2 models trained with custom Torch layers to hdf5 format. |
| Hugging Face ViT | Float | `python3 export/huggingface/hf_vit_export.py` | / | Export Hugging Face ViT models to hdf5 format. |
| Native Fairseq Transformer | Float | `python3 export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to protobuf/hdf5 format. |
| Native Fairseq Transformer + PTQ | Int8 | `python3 export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to int8 protobuf format using post-training quantization. |
| Fairseq + LightSeq Transformer | Float | `python3 export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to protobuf/hdf5 format. |
| Fairseq + LightSeq Transformer + PTQ | Int8 | `python3 export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to int8 protobuf format using post-training quantization. |
| Fairseq + custom Torch layer | Float | `python3 export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to protobuf format. |
| Fairseq + custom Torch layer + PTQ | Int8 | `python3 export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format using post-training quantization. |
| Fairseq + custom Torch layer + QAT | Int8 | `python3 export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt` | link | Export quantized Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format. |
| Native Fairseq MoE Transformer | Float | `python3 export/fairseq/native_fs_moe_transformer_export.py` | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. |
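Once exported, a checkpoint can be loaded and run through the `lightseq.inference` module. Below is a minimal sketch; the checkpoint filename, batch size, and input ids are placeholders, not values from this repo:

```python
import lightseq.inference as lsi

# Load an exported checkpoint. The first argument is the protobuf (or hdf5)
# file produced by an export script above; the second is the max batch size.
model = lsi.Transformer("lightseq_transformer.pb", 8)

# A batch of source token ids (placeholder values) from your tokenizer.
input_ids = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6]]

# infer() runs generation on GPU and returns the generated target ids
# (together with beam scores, depending on the LightSeq version).
output = model.infer(input_ids)
print(output)
```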

## LightSeq inference

### Hugging Face models

1. BART

   ```bash
   python3 test/ls_bart.py
   ```

2. BERT

   ```bash
   python3 test/ls_bert.py
   ```

3. GPT2

   ```bash
   python3 test/ls_gpt2.py
   ```

4. ViT

   ```bash
   python3 test/ls_vit.py
   ```

5. Quantized BERT

   ```bash
   python3 test/ls_quant_bert.py
   ```

6. Quantized GPT2

   ```bash
   python3 test/ls_quant_gpt.py
   ```
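Each test script follows the same pattern: encode inputs with the matching Hugging Face tokenizer, run both the original Hugging Face model and the exported LightSeq model, and compare outputs and latency. A condensed sketch of the BART case follows; `lightseq_bart_base.pb` is a placeholder for whatever file `hf_bart_export.py` wrote, and the exact return shape of `infer()` is an assumption based on the test scripts:

```python
import lightseq.inference as lsi
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
# Placeholder name for the protobuf written by export/huggingface/hf_bart_export.py.
model = lsi.Transformer("lightseq_bart_base.pb", 128)

inputs = tokenizer(
    ["I love that girl, but <mask> does not <mask> me."],
    return_tensors="np", padding=True,
)

# Generate with LightSeq. Assumption: the first element of the result holds
# the generated ids per batch element and beam; keep the best beam of each.
outputs = model.infer(inputs["input_ids"])
best_ids = [beams[0] for beams in outputs[0]]
print(tokenizer.batch_decode(best_ids, skip_special_tokens=True))
```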

### Fairseq-based models

After exporting a Fairseq-based model to protobuf/hdf5 format with the scripts above, you can run the following script for fast LightSeq inference on the WMT14 En2De dataset. It works with both fp16 and int8 models:

```bash
bash test/ls_fairseq.sh --model ${model_path}
```
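For the int8 checkpoints, the quantized engine is selected by the model class rather than a flag. A minimal sketch, assuming the `QuantTransformer` class used by LightSeq's quantized examples (filename and inputs are placeholders):

```python
import lightseq.inference as lsi

# Load an int8 protobuf produced by one of the *_ptq_export.py or
# *_quant_*_export.py scripts above (placeholder path and batch size).
model = lsi.QuantTransformer("quant_transformer.pb", 8)

# The inference call mirrors the float Transformer class.
output = model.infer([[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9]])
print(output)
```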