Keypoint Inference Benchmark

December 23, 2021

Benchmark on Server

We benchmarked the models in different runtime environments. See the tables below for details.

| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| --- | --- | --- | --- | --- | --- |
| LiteHRNet-18-256x192 | 88.8 ms | 40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |

Notes:

  • The tests above are based on Python deployment.
  • The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
  • The tests use deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
  • The reported times cover inference only.
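The notes above state that only inference is timed. A minimal sketch of how such a measurement is commonly taken follows; it is not the code from det_keypoint_unite_infer.py. The names `time_inference` and `predict_fn`, and the warm-up and repeat counts, are hypothetical and chosen only for illustration.

```python
import time
from statistics import mean


def time_inference(predict_fn, batch, warmup=10, repeats=100):
    """Return the mean latency of predict_fn(batch) in milliseconds.

    predict_fn is a hypothetical callable standing in for a model's
    forward pass. Preprocessing and postprocessing are assumed to happen
    outside of it, so only inference is timed.
    """
    # Warm-up runs are discarded so one-off costs (lazy initialization,
    # memory allocation) do not skew the average.
    for _ in range(warmup):
        predict_fn(batch)

    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(batch)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(latencies_ms)


if __name__ == "__main__":
    # Dummy workload in place of a real predictor; a batch of 8 mirrors
    # the keypoint batch size stated in the notes.
    dummy_batch = [[0.0] * 1000 for _ in range(8)]
    latency = time_inference(lambda b: [sum(x) for x in b], dummy_batch)
    print(f"mean latency: {latency:.3f} ms")
```

On a GPU, the timed call has to block until the device has finished its work (or be followed by an explicit synchronization), otherwise the wall-clock time understates the real latency.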

| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| --- | --- | --- | --- | --- | --- |
| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |

Notes:

  • The tests above are based on C++ deployment.
  • The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
  • The tests use deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
  • The reported times cover inference only.
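To read the C++ results in relative terms, the sketch below computes how much faster TensorRT FP16 is than TensorRT FP32 for a few rows. The latencies are copied from the C++ table above; the dictionary layout and the `speedup` helper are illustrative assumptions, not part of the benchmark scripts.

```python
# TensorRT latencies in ms as (FP32, FP16), copied from the C++ table above.
TRT_LATENCY_MS = {
    "DARK_HRNet_w32-256x192": (3.74, 1.75),
    "HRNet_w32-384x288": (8.81, 2.97),
    "PP-TinyPose-128x96": (0.75, 0.72),
    "PP-TinyPose-256x192": (1.38, 1.15),
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Hypothetical helper structure: model name -> FP16 speedup over FP32.
fp16_speedups = {
    model: speedup(fp32, fp16) for model, (fp32, fp16) in TRT_LATENCY_MS.items()
}

for model, ratio in fp16_speedups.items():
    print(f"{model}: FP16 is {ratio:.2f}x faster than FP32")
```

For these rows the ratio is about 2.14x and 2.97x for the two HRNet-family models, and about 1.04x and 1.20x for the two PP-TinyPose models.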

Benchmark on Mobile

We benchmarked the models on Kirin and Qualcomm Snapdragon devices. See the table below for details.

| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads) | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
| --- | --- | --- | --- | --- | --- | --- |
| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |

Notes:

  • The tests above are based on Paddle Lite deployment, version v2.10-rc.
  • The reported times cover inference only.
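The mobile table reports both 1-thread and 4-thread latencies, so thread scaling can be derived from it. The sketch below computes the speedup and the parallel efficiency (speedup divided by thread count) for a few entries. The latencies are copied from the table above; the `scaling` helper and the data layout are illustrative assumptions.

```python
# Latencies in ms as (1 thread, 4 threads), copied from the mobile table above.
MOBILE_LATENCY_MS = {
    ("PicoDet-s-192x192 (det)", "Kirin 980"): (14.85, 5.45),
    ("PicoDet-s-192x192 (det)", "Snapdragon 660"): (80.08, 27.36),
    ("PP-TinyPose-128x96 (pose)", "Kirin 980"): (12.03, 5.09),
    ("PP-TinyPose-128x96 (pose)", "Snapdragon 660"): (71.87, 20.04),
}

THREADS = 4


def scaling(one_thread_ms, multi_thread_ms, threads=THREADS):
    """Return (speedup, parallel efficiency) for a multi-threaded run."""
    speedup = one_thread_ms / multi_thread_ms
    return speedup, speedup / threads


results = {key: scaling(t1, t4) for key, (t1, t4) in MOBILE_LATENCY_MS.items()}

for (model, device), (speedup, efficiency) in results.items():
    print(f"{model} on {device}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency of 100% would mean that four threads run exactly four times faster than one; every entry above comes out below that.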