PPDetection-YOLOv3 Model Compression Scheme

February 22, 2021

This scheme compresses YOLOv3 by combining two methods: distillation and pruning.

Tutorial reference: https://github.com/PaddlePaddle/PaddleDetection/tree/release/0.2/slim/extensions/distill_pruned_model

Example Results

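All latencies are in ms/image.
Tesla P4 latency is the inference latency on a single card with TensorRT enabled.
Qualcomm 835, Qualcomm 855, and Kirin 970 latencies are inference latencies for a PaddleLite deployment on the ARMv8 (arm8) architecture with 4 threads.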

Backbone	Dataset	Pruning strategy	GFLOPs	Model size (MB)	Input size	Tesla P4	Kirin 970	Qualcomm 835	Qualcomm 855
MobileNetV1	VOC	baseline	20.20	93.37	608	16.556	748.404	734.970	289.878
MobileNetV1	VOC	baseline	9.46	93.37	416	9.031	371.214	349.065	140.877
MobileNetV1	VOC	baseline	5.60	93.37	320	6.235	221.705	200.498	80.515
MobileNetV1	VOC	r578	6.15(-69.57%)	30.81(-67.00%)	608	10.064(-39.21%)	314.531(-57.97%)	323.537(-55.98%)	123.414(-57.43%)
MobileNetV1	VOC	r578	2.88(-69.57%)	30.81(-67.00%)	416	5.478(-39.34%)	151.562(-59.17%)	146.014(-58.17%)	56.420(-59.95%)
MobileNetV1	VOC	r578	1.70(-69.57%)	30.81(-67.00%)	320	3.880(-37.77%)	91.132(-58.90%)	87.440(-56.39%)	31.470(-60.91%)

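With the r578 pruning strategy, the YOLOv3-MobileNetV1 model's FLOPs drop by 69.57%. At an input size of 608, inference time on a single Tesla P4 (TensorRT) drops by 39.21%, and inference latency on Kirin 970, Qualcomm 835, and Qualcomm 855 drops by 57.97%, 55.98%, and 57.43%, respectively.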
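The percentages in parentheses can be checked against the table with a few lines of Python. This is a minimal sketch, not part of the original tutorial: the helper name `reduction_pct` and the dictionary layout are illustrative assumptions, and the numbers are copied from the baseline and r578 rows for input size 608.

```python
def reduction_pct(baseline: float, pruned: float) -> float:
    """Relative change from baseline to pruned, in percent (negative means a reduction)."""
    return (pruned - baseline) / baseline * 100.0


# (baseline, r578) pairs copied from the table rows with input size 608.
rows_608 = {
    "GFLOPs": (20.20, 6.15),
    "Model size (MB)": (93.37, 30.81),
    "Tesla P4 (ms)": (16.556, 10.064),
    "Kirin 970 (ms)": (748.404, 314.531),
    "Qualcomm 835 (ms)": (734.970, 323.537),
    "Qualcomm 855 (ms)": (289.878, 123.414),
}

# Recompute the relative change for every column.
changes = {name: reduction_pct(base, pruned) for name, (base, pruned) in rows_608.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

Because the table entries are already rounded, a recomputed value can differ from the published one in the last digit (this is most visible for GFLOPs, which are given to only two decimals).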