Model Compression

March 7, 2023

PaddleDetection provides a complete tutorial and benchmarks for model compression based on PaddleSlim. The currently supported methods are pruning, quantization, and distillation.

It is recommended to combine pruning with distillation training, or pruning with quantization, for model compression. The following takes YOLOv3 as an example to run pruning, distillation, and quantization experiments.

Experimental Environment

  • Python 3.7+
  • PaddlePaddle >= 2.1.0
  • PaddleSlim >= 2.1.0
  • CUDA 10.1+
  • cuDNN >= 7.6.5

Version Dependencies between PaddleDetection, Paddle, and PaddleSlim

| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
| :--- | :--- | :--- | :--- |
| release/2.1 | >= 2.1.0 | 2.1 | Exporting quantized models relies on the latest Paddle develop branch, available as the PaddlePaddle daily build |
| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |

Install PaddleSlim

  • Method 1: Install it directly:
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Method 2: Compile and install:
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install

Quick Start

Train

python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • If you want to use distillation, please refer to the Distillation Doc for specific distillation methods and more distilled detection models.
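For instance, the template above expands to a command like the following. Both config file names here are hypothetical examples; check `configs/` and `configs/slim/` in the repository for the actual ones:

```shell
# Expand the template with hypothetical config paths and print the resulting command
MODEL=configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml
SLIM=configs/slim/prune/yolov3_prune_fpgm.yml
echo "python tools/train.py -c ${MODEL} --slim_config ${SLIM}"
```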

Evaluation

python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
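Note that the `output/{SLIM_CONFIG}` directory takes its name from the slim config file's base name, as the template above suggests. A small sketch of that derivation (the slim config name is a hypothetical example):

```shell
# Derive the trained-weights path from a (hypothetical) slim config file name
SLIM_CONFIG=configs/slim/prune/yolov3_prune_fpgm.yml
SLIM_NAME=$(basename ${SLIM_CONFIG} .yml)
echo output/${SLIM_NAME}/model_final
```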

Test

python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
    -o weights=output/{SLIM_CONFIG}/model_final \
    --infer_img={IMAGE_PATH}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
  • --infer_img: Specifies the test image path.

Full Chain Deployment

Export the model (dynamic-to-static graph conversion)

python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
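After export, the inference model is written to a directory named after the config under `output_inference/` (the default output directory, to the best of our knowledge; the slim config name below is hypothetical, and the file names are the usual Paddle inference artifacts, so verify them against your version):

```shell
# Sketch of the expected export layout (assumed defaults, hypothetical config name)
SLIM_NAME=yolov3_prune_fpgm
for f in infer_cfg.yml model.pdmodel model.pdiparams; do
    echo output_inference/${SLIM_NAME}/${f}
done
```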

Prediction and Deployment

Benchmark

Pruning

Pascal VOC Benchmark

| Model | Compression Strategy | GFLOPs | Model Volume (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Pruning - l1_norm (sensitivity) | 15.78 (-34.49%) | 66 (-29%) | 608 | - | 78.4 (+3.3) | link | configuration file | slim configuration file |

COCO Benchmark

| Model | Compression Strategy | GFLOPs | Model Volume (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Pruning - FPGM | -37% | 12.6 | 608 | - | 22.3 | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | link | configuration file | - |
| YOLOv3-DarkNet53 | Pruning - FPGM | -24% | - | 608 | - | 37.6 | link | configuration file | slim configuration file |
| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | link | configuration file | - |
| PP-YOLO_R50vd | Pruning - FPGM | -35% | - | 608 | - | 42.1 | link | configuration file | slim configuration file |

Description:

  • Currently, all models except the RCNN series are supported.
  • SD855 prediction latency was measured on a Paddle Lite deployment, using the ARMv8 architecture with 4 threads.

Quantization

COCO Benchmark

| Model | Compression Strategy | Input Size | Model Volume (MB) | Prediction Latency (V100) | Prediction Latency (SD855) | Box AP | Download | Download of Inference Model | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| PP-YOLOE-l | baseline | 640 | - | 11.2ms (trt_fp32) / 7.7ms (trt_fp16) | -- | 50.9 | link | - | configuration file | - |
| PP-YOLOE-l | Online quantization (common) | 640 | - | 6.7ms (trt_int8) | -- | 48.8 | link | - | configuration file | configuration file |
| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | link | link | configuration file | - |
| PP-YOLOv2_R50vd | Online quantization (PACT) | 640 | -- | 17.3ms | -- | 48.1 | link | link | configuration file | configuration file |
| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | link | link | configuration file | - |
| PP-YOLO_R50vd | Online quantization (PACT) | 608 | 67.3 | 13.8ms | -- | 44.3 | link | link | configuration file | configuration file |
| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | link | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Online quantization (common) | 320 | 5.6 | -- | 25.1ms | 24.3 | link | link | configuration file | configuration file |
| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV1 | Online quantization (common) | 608 | 25.4 | 6.6ms | 248ms | 30.5 | link | link | configuration file | slim configuration file |
| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV3 | Online quantization (PACT) | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | link | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | link | link | configuration file | - |
| YOLOv3-DarkNet53 | Online quantization (common) | 608 | 78.8 | 12.4ms | -- | 38.8 | link | link | configuration file | slim configuration file |
| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | link | link | configuration file | - |
| SSD-MobileNet_v1 | Online quantization (common) | 300 | 7.1 | -- | 21.5ms | 72.9 | link | link | configuration file | slim configuration file |
| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | link | link | configuration file | - |
| Mask-ResNet50-FPN | Online quantization (common) | (800, 1333) | -- | -- | -- | 39.7 (+0.5)/35.9 (+0.3) | link | link | configuration file | slim configuration file |

Description:

  • The V100 prediction latency above was measured with TensorRT FP32 for non-quantized models and TensorRT INT8 for quantized models; both include NMS time.
  • SD855 prediction latency was measured on a Paddle Lite deployment, using the ARMv8 architecture with 4 threads.
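The model-volume column reflects the roughly 4x size reduction expected from storing INT8 weights instead of FP32. A quick sanity check on the YOLOv3-MobileNetV1 row above (94.2 MB baseline vs. 25.4 MB quantized):

```shell
# Ratio of baseline to quantized model volume, from the table above
awk 'BEGIN { printf "%.1fx smaller\n", 94.2 / 25.4 }'
```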

Distillation

COCO Benchmark

| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0 (+1.6) | link | configuration file | slim configuration file |
  • For specific distillation methods and more distilled detection models, please refer to the Distillation Doc.

Distillation Pruning Combined Strategy

COCO Benchmark

| Model | Compression Strategy | Input Size | GFLOPs | Model Volume (MB) | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Distillation + Pruning | 608 | 7.54 (-69.4%) | 30.9 (-67.2%) | 166.1ms | 28.4 (-1.0) | link | configuration file | slim configuration file |
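The percentage reductions in the combined-strategy row can be reproduced directly from the absolute numbers (24.65 → 7.54 GFLOPs and 94.2 → 30.9 MB):

```shell
# Verify the reported reductions from the absolute values in the table above
awk 'BEGIN {
    printf "GFLOPs: -%.1f%%\n", (1 - 7.54 / 24.65) * 100
    printf "Volume: -%.1f%%\n", (1 - 30.9 / 94.2) * 100
}'
```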