Model Compression

March 7, 2023

PaddleDetection provides a complete tutorial and benchmarks for model compression based on PaddleSlim. The currently supported methods are pruning, quantization, and distillation.

It is recommended to combine pruning with distillation training, or pruning with quantization, for model compression. The following takes YOLOv3 as an example to run pruning, distillation, and quantization experiments.

Experimental Environment

  • Python 3.7+
  • PaddlePaddle >= 2.1.0
  • PaddleSlim >= 2.1.0
  • CUDA 10.1+
  • cuDNN >= 7.6.5

Version Dependencies between PaddleDetection, Paddle, and PaddleSlim

| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
| :--- | :--- | :--- | :--- |
| release/2.1 | >= 2.1.0 | 2.1 | Exporting quantized models relies on the latest Paddle develop branch, available as the PaddlePaddle daily build |
| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |

Install PaddleSlim

  • Method 1: Install it directly:
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Method 2: Compile and install:
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install

Quick Start

Train

python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • If you want to use distillation, please refer to the Distillation Doc for specific distillation methods and more distilled detection models.
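For instance, the template above expands to a command like the following. Both config file names here are hypothetical examples; check `configs/` and `configs/slim/` in the repository for the actual ones:

```shell
# Expand the template with hypothetical config paths and print the resulting command
MODEL=configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml
SLIM=configs/slim/prune/yolov3_prune_fpgm.yml
echo "python tools/train.py -c ${MODEL} --slim_config ${SLIM}"
```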

Evaluation

python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
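Note that the `output/{SLIM_CONFIG}` directory takes its name from the slim config file's base name, as the template above suggests. A small sketch of that derivation (the slim config name is a hypothetical example):

```shell
# Derive the trained-weights path from a (hypothetical) slim config file name
SLIM_CONFIG=configs/slim/prune/yolov3_prune_fpgm.yml
SLIM_NAME=$(basename ${SLIM_CONFIG} .yml)
echo output/${SLIM_NAME}/model_final
```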

Test

python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
    -o weights=output/{SLIM_CONFIG}/model_final \
    --infer_img={IMAGE_PATH}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
  • --infer_img: Specifies the test image path.

Full Chain Deployment

Export the model (dynamic-to-static graph conversion)

python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
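After export, the inference model is written to a directory named after the config under `output_inference/` (the default output directory, to the best of our knowledge; the slim config name below is hypothetical, and the file names are the usual Paddle inference artifacts, so verify them against your version):

```shell
# Sketch of the expected export layout (assumed defaults, hypothetical config name)
SLIM_NAME=yolov3_prune_fpgm
for f in infer_cfg.yml model.pdmodel model.pdiparams; do
    echo output_inference/${SLIM_NAME}/${f}
done
```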

Prediction and Deployment

Benchmark

Pruning

Pascal VOC Benchmark

| Model | Compression Strategy | GFLOPs | Model Volume (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Pruning - l1_norm (sensitivity) | 15.78 (-34.49%) | 66 (-29%) | 608 | - | 78.4 (+3.3) | link | configuration file | slim configuration file |

COCO Benchmark

| Model | Compression Strategy | GFLOPs | Model Volume (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Pruning - FPGM | -37% | 12.6 | 608 | - | 22.3 | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | link | configuration file | - |
| YOLOv3-DarkNet53 | Pruning - FPGM | -24% | - | 608 | - | 37.6 | link | configuration file | slim configuration file |
| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | link | configuration file | - |
| PP-YOLO_R50vd | Pruning - FPGM | -35% | - | 608 | - | 42.1 | link | configuration file | slim configuration file |

Description:

  • Currently, all models except the RCNN series are supported.
  • SD855 prediction latency was measured on a Paddle Lite deployment, using the ARMv8 architecture with 4 threads.

Quantization

COCO Benchmark

| Model | Compression Strategy | Input Size | Model Volume (MB) | Prediction Latency (V100) | Prediction Latency (SD855) | Box AP | Download | Download of Inference Model | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| PP-YOLOE-l | baseline | 640 | - | 11.2ms (trt_fp32) / 7.7ms (trt_fp16) | -- | 50.9 | link | - | configuration file | - |
| PP-YOLOE-l | Online quantization (common) | 640 | - | 6.7ms (trt_int8) | -- | 48.8 | link | - | configuration file | configuration file |
| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | link | link | configuration file | - |
| PP-YOLOv2_R50vd | Online quantization (PACT) | 640 | -- | 17.3ms | -- | 48.1 | link | link | configuration file | configuration file |
| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | link | link | configuration file | - |
| PP-YOLO_R50vd | Online quantization (PACT) | 608 | 67.3 | 13.8ms | -- | 44.3 | link | link | configuration file | configuration file |
| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | link | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Online quantization (common) | 320 | 5.6 | -- | 25.1ms | 24.3 | link | link | configuration file | configuration file |
| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV1 | Online quantization (common) | 608 | 25.4 | 6.6ms | 248ms | 30.5 | link | link | configuration file | slim configuration file |
| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV3 | Online quantization (PACT) | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | link | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | link | link | configuration file | - |
| YOLOv3-DarkNet53 | Online quantization (common) | 608 | 78.8 | 12.4ms | -- | 38.8 | link | link | configuration file | slim configuration file |
| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | link | link | configuration file | - |
| SSD-MobileNet_v1 | Online quantization (common) | 300 | 7.1 | -- | 21.5ms | 72.9 | link | link | configuration file | slim configuration file |
| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | link | link | configuration file | - |
| Mask-ResNet50-FPN | Online quantization (common) | (800, 1333) | -- | -- | -- | 39.7 (+0.5)/35.9 (+0.3) | link | link | configuration file | slim configuration file |

Description:

  • The V100 prediction latency above was measured with TensorRT FP32 for non-quantized models and TensorRT INT8 for quantized models; both include NMS time.
  • SD855 prediction latency was measured on a Paddle Lite deployment, using the ARMv8 architecture with 4 threads.
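The model-volume column reflects the roughly 4x size reduction expected from storing INT8 weights instead of FP32. A quick sanity check on the YOLOv3-MobileNetV1 row above (94.2 MB baseline vs. 25.4 MB quantized):

```shell
# Ratio of baseline to quantized model volume, from the table above
awk 'BEGIN { printf "%.1fx smaller\n", 94.2 / 25.4 }'
```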

Distillation

COCO Benchmark

| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0 (+1.6) | link | configuration file | slim configuration file |
  • For specific distillation methods and more distilled detection models, please refer to the Distillation Doc.

Distillation Pruning Combined Strategy

COCO Benchmark

| Model | Compression Strategy | Input Size | GFLOPs | Model Volume (MB) | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Distillation + Pruning | 608 | 7.54 (-69.4%) | 30.9 (-67.2%) | 166.1ms | 28.4 (-1.0) | link | configuration file | slim configuration file |
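The percentage reductions in the combined-strategy row can be reproduced directly from the absolute numbers (24.65 → 7.54 GFLOPs and 94.2 → 30.9 MB):

```shell
# Verify the reported reductions from the absolute values in the table above
awk 'BEGIN {
    printf "GFLOPs: -%.1f%%\n", (1 - 7.54 / 24.65) * 100
    printf "Volume: -%.1f%%\n", (1 - 30.9 / 94.2) * 100
}'
```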