PPDetection-YOLOv3 Model Compression Scheme

February 22, 2021

This scheme compresses YOLOv3 by combining two methods: distillation and pruning.
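To make the two ingredients concrete, here is a minimal, framework-free sketch. It is a generic illustration only and does not use the PaddleSlim or PaddleDetection API. The L1-norm filter ranking, the mean-squared distillation loss, and the helper names `prune_filters` and `l2_distill_loss` are all assumptions chosen for illustration; the criteria actually used by this scheme are defined in the referenced tutorial.

```python
# Generic illustration; NOT the PaddleSlim / PaddleDetection implementation.
# All function names here are hypothetical.

def l1_norms(filters):
    """L1 norm of each filter (each filter is a flat list of weights)."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters, ratio):
    """Drop the `ratio` fraction of filters with the smallest L1 norm.

    Assumption: magnitude-based ranking; other criteria are possible.
    """
    n_drop = int(len(filters) * ratio)
    norms = l1_norms(filters)
    # Indices sorted by ascending norm; the first n_drop are removed.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    dropped = set(order[:n_drop])
    return [f for i, f in enumerate(filters) if i not in dropped]


def l2_distill_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher outputs."""
    assert len(student_out) == len(teacher_out)
    total = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out))
    return total / len(student_out)


# Toy usage: prune half of four filters, then measure how far the
# (pruned) student output is from the teacher output.
filters = [[1.0, -1.0], [0.1, 0.1], [2.0, 2.0], [0.0, 0.05]]
kept = prune_filters(filters, ratio=0.5)
loss = l2_distill_loss([1.0, 2.0], [1.0, 4.0])
```

In this sketch, pruning shrinks the student network, and the distillation loss is the extra training signal that pulls the smaller student back toward the outputs of the unpruned teacher.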

Tutorial reference: https://github.com/PaddlePaddle/PaddleDetection/tree/release/0.2/slim/extensions/distill_pruned_model

Example results

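  • All latencies are reported in ms/image.
  • Tesla P4 latency is single-GPU inference latency with TensorRT enabled.
  • Qualcomm 835 / Qualcomm 855 / Kirin 970 latency is inference latency when deployed with PaddleLite on the ARMv8 architecture using 4 threads.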
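| Backbone | Dataset | Pruning strategy | GFLOPs | Model size (MB) | Input size | Tesla P4 | Kirin 970 | Qualcomm 835 | Qualcomm 855 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNetV1 | VOC | baseline | 20.20 | 93.37 | 608 | 16.556 | 748.404 | 734.970 | 289.878 |
| MobileNetV1 | VOC | baseline | 9.46 | 93.37 | 416 | 9.031 | 371.214 | 349.065 | 140.877 |
| MobileNetV1 | VOC | baseline | 5.60 | 93.37 | 320 | 6.235 | 221.705 | 200.498 | 80.515 |
| MobileNetV1 | VOC | r578 | 6.15 (-69.57%) | 30.81 (-67.00%) | 608 | 10.064 (-39.21%) | 314.531 (-57.97%) | 323.537 (-55.98%) | 123.414 (-57.43%) |
| MobileNetV1 | VOC | r578 | 2.88 (-69.57%) | 30.81 (-67.00%) | 416 | 5.478 (-39.34%) | 151.562 (-59.17%) | 146.014 (-58.17%) | 56.420 (-59.95%) |
| MobileNetV1 | VOC | r578 | 1.70 (-69.57%) | 30.81 (-67.00%) | 320 | 3.880 (-37.77%) | 91.132 (-58.90%) | 87.440 (-56.39%) | 31.470 (-60.91%) |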
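  • With the r578 pruning strategy, the YOLOv3-MobileNetV1 model's FLOPs are reduced by 69.57%. At an input size of 608, inference time on a single Tesla P4 (TensorRT) is reduced by 39.21%, and inference latency on Kirin 970 / Qualcomm 835 / Qualcomm 855 is reduced by 57.97%, 55.98%, and 57.43%, respectively.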
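The percentages in parentheses can be recomputed from the raw numbers as the relative change against the baseline row with the same input size. The short sketch below does this for the 608 input size. The latency values are copied from the table above, while the helper name `relative_change` and the dictionary layout are illustrative choices, not part of the original tutorial.

```python
def relative_change(baseline, pruned):
    """Percentage change of `pruned` relative to `baseline` (negative = reduction)."""
    return (pruned - baseline) / baseline * 100.0


# Latency in ms/image at input size 608, copied from the results table.
baseline_608 = {
    "Tesla P4": 16.556,
    "Kirin 970": 748.404,
    "Qualcomm 835": 734.970,
    "Qualcomm 855": 289.878,
}
pruned_608 = {
    "Tesla P4": 10.064,
    "Kirin 970": 314.531,
    "Qualcomm 835": 323.537,
    "Qualcomm 855": 123.414,
}

# Relative latency change per device, rounded to two decimals.
changes = {
    device: round(relative_change(baseline_608[device], pruned_608[device]), 2)
    for device in baseline_608
}

for device, pct in changes.items():
    print(f"{device}: {pct:+.2f}%")
```

Running the same calculation on the 416 and 320 rows is a quick way to sanity-check the remaining entries of the table.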