README_en.md

January 28, 2023


A High-Efficiency Development Toolkit for Object Detection Based on PaddlePaddle

Product Update

  • 🔥 2022.11.15: SOTA rotated object detector and small object detector based on PP-YOLOE

    • Rotated object detector PP-YOLOE-R
      • SOTA anchor-free rotated object detection model with high accuracy and efficiency
      • A series of models, named s/m/l/x, for cloud and edge devices
      • Avoids special operators, so it deploys easily with TensorRT
    • Small object detector PP-YOLOE-SOD
      • End-to-end detection pipeline based on sliced images
      • SOTA model on VisDrone based on original images.
  • 2022.8.26: PaddleDetection releases version release/2.5

    • 🗳 Model features:

      • Release PP-YOLOE+: accuracy improved by up to 2.4% mAP, reaching 54.9% mAP; training converges 3.75 times faster; end-to-end inference is up to 2.3 times faster; generalization to multiple downstream tasks is improved
      • Release the PicoDet-NPU model, which supports fully quantized deployment; add the PicoDet layout analysis model
      • Release PP-TinyPose Plus, with a 9.1% AP improvement in physical exercise, dance, and other scenarios; it supports unconventional movements such as turning to one side, lying down, jumping, and high lifts
    • 🔮 Functions in different scenarios

      • Release the pedestrian analysis tool PP-Human v2. It introduces four new behavior recognition functions: fighting, telephoning, smoking, and trespassing. The underlying algorithms are optimized and cover three core capabilities: pedestrian detection, tracking, and attribute recognition. It provides end-to-end development and model optimization strategies for beginners and supports online video streaming input.
      • First release of PP-Vehicle, which has four major functions: license plate recognition, vehicle attribute analysis (color, model), traffic flow statistics, and violation detection. It supports multiple input formats, including pictures, online video streams, and video files. A comprehensive set of customization tutorials is also provided.
    • 💡 Cutting-edge algorithms:

      • Release PaddleYOLO, which covers classic and recent models of the YOLO family: YOLOv3, PP-YOLOE (a real-time high-precision object detection model developed by Baidu PaddlePaddle), and cutting-edge detection algorithms such as YOLOv4, YOLOv5, YOLOX, YOLOv6, YOLOv7 and YOLOv8
      • Add a high-precision detection model based on the ViT backbone network, with 55.7% mAP on the COCO dataset; add the multi-object tracking model OC-SORT; add the ConvNeXt backbone network.
    • 📋 Industrial applications: Add Smart Fitness, Fighting Recognition, and Visitor Analysis.

  • 2022.3.24: PaddleDetection released version release/2.4

    • Release the high-performance SOTA object detection model PP-YOLOE. It targets both cloud and edge devices and comes in S/M/L/X versions. In particular, version L reaches 51.4% mAP on the COCO test 2017 dataset with an inference speed of 78.1 FPS on a single Tesla V100. It supports mixed precision training and is 33% faster than PP-YOLOv2. Its full range of model sizes meets different hardware compute requirements, and it adapts to servers, edge-device GPUs, and other AI accelerator cards on servers.
    • Release the ultra-lightweight SOTA object detection model PP-PicoDet Plus, with a 2% improvement in accuracy and a 63% improvement in CPU inference speed. Add the PicoDet-XS model with 0.7M parameters, and provide model sparsification and quantization for model acceleration. No hardware-specific post-processing module is required, which simplifies deployment.
    • Release the real-time pedestrian analysis tool PP-Human. It has four major functions: pedestrian tracking, visitor flow statistics, human attribute recognition, and falling detection. Falling detection is optimized on real-life data, recognizes various falling postures accurately, and adapts to different backgrounds, lighting conditions, and camera angles.
    • Add the YOLOX object detection model in nano/tiny/S/M/L/X versions. The X version reaches 51.8% mAP on the COCO val2017 dataset.
  • More releases
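
The "detection pipeline based on sliced images" mentioned for PP-YOLOE-SOD above can be pictured with a small sketch. This is not PaddleDetection's implementation: the helper names `slice_windows` and `to_image_coords`, the 640-pixel tile size, and the 25% overlap are all assumptions chosen for illustration. The idea is to cut a large image into overlapping windows, run the detector on each window, and shift the predicted boxes back into full-image coordinates.

```python
def slice_windows(width, height, tile=640, overlap=0.25):
    """Return (x0, y0, x1, y1) windows that together cover the image.

    Hypothetical helper: tile size and overlap ratio are assumed values,
    not PP-YOLOE-SOD defaults.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1 - overlap)))

    def starts(length):
        # Step by the stride, then add a final window flush with the edge.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def to_image_coords(box, window):
    """Shift a box predicted inside a window back to full-image coordinates."""
    x0, y0 = window[0], window[1]
    bx0, by0, bx1, by1 = box
    return (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)


windows = slice_windows(1000, 800)
for window in windows:
    print(window)
```

Because neighboring windows overlap, the same object can be detected more than once, so a real pipeline would typically merge the shifted boxes with a non-maximum suppression step afterwards.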

Brief Introduction

PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle. It provides over 30 model algorithms and over 300 pre-trained models, covering object detection, instance segmentation, keypoint detection, and multi-object tracking. In particular, PaddleDetection offers high-performance, lightweight industrial SOTA models for servers and mobile devices, along with champion solutions and cutting-edge algorithms. It also provides various data augmentation methods, configurable network components, loss functions, and other advanced optimization and deployment schemes. In addition to covering the whole process of data processing, model development, training, compression, and deployment, PaddlePaddle also provides rich cases and tutorials to accelerate the industrial application of algorithms.

Features

  • Rich model library: PaddleDetection provides over 250 pre-trained models, including object detection, instance segmentation, face recognition, and multi-object tracking. It covers a variety of global competition champion schemes.
  • Simple to use: The modular design decouples each network component, so developers can easily build and try various detection models and optimization strategies and quickly obtain high-performance, customized algorithms.
  • End to end: PaddlePaddle covers the whole pipeline, from data augmentation and model construction to training, compression, and deployment. It also supports multi-architecture, multi-device deployment for cloud and edge devices.
  • High performance: Thanks to its high-performance core, PaddlePaddle has clear advantages in training speed and memory usage. It also supports FP16 training and multi-machine training.

Exchanges

  • If you have any questions or suggestions, please share them via GitHub Issues

    You are welcome to join the PaddleDetection user groups on WeChat (scan the QR code, add the assistant, and reply "D")

Kit Structure

The kit is organized into four groups: Architectures, Backbones, Components, and Data Augmentation.

Architectures
    Object Detection
    • Faster RCNN
    • FPN
    • Cascade-RCNN
    • PSS-Det
    • RetinaNet
    • YOLOv3
    • YOLOF
    • YOLOX
    • YOLOv5
    • YOLOv6
    • YOLOv7
    • YOLOv8
    • RTMDet
    • PP-YOLO
    • PP-YOLO-Tiny
    • PP-PicoDet
    • PP-YOLOv2
    • PP-YOLOE
    • PP-YOLOE+
    • PP-YOLOE-SOD
    • PP-YOLOE-R
    • SSD
    • CenterNet
    • FCOS
    • FCOSR
    • TTFNet
    • TOOD
    • GFL
    • GFLv2
    • DETR
    • Deformable DETR
    • Swin Transformer
    • Sparse RCNN
    Instance Segmentation
    • Mask RCNN
    • Cascade Mask RCNN
    • SOLOv2
    Face Detection
    • BlazeFace
    Multi-Object-Tracking
    • JDE
    • FairMOT
    • DeepSORT
    • ByteTrack
    • OC-SORT
    • BoT-SORT
    • CenterTrack
    KeyPoint-Detection
    • HRNet
    • HigherHRNet
    • Lite-HRNet
    • PP-TinyPose
Backbones
  • ResNet(&vd)
  • Res2Net(&vd)
  • CSPResNet
  • SENet
  • Res2Net
  • HRNet
  • Lite-HRNet
  • DarkNet
  • CSPDarkNet
  • MobileNetv1/v3
  • ShuffleNet
  • GhostNet
  • BlazeNet
  • DLA
  • HardNet
  • LCNet
  • ESNet
  • Swin-Transformer
  • ConvNeXt
  • Vision Transformer
Components
    Common
    • Sync-BN
    • Group Norm
    • DCNv2
    • EMA
    KeyPoint
    • DarkPose
    FPN
    • BiFPN
    • CSP-PAN
    • Custom-PAN
    • ES-PAN
    • HRFPN
    Loss
    • Smooth-L1
    • GIoU/DIoU/CIoU
    • IoUAware
    • Focal Loss
    • CT Focal Loss
    • VariFocal Loss
    Post-processing
    • SoftNMS
    • MatrixNMS
    Speed
    • FP16 training
    • Multi-machine training
Data Augmentation
  • Resize
  • Lighting
  • Flipping
  • Expand
  • Crop
  • Color Distort
  • Random Erasing
  • Mixup
  • AugmentHSV
  • Mosaic
  • Cutmix
  • Grid Mask
  • Auto Augment
  • Random Perspective
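
Several of the loss components listed above (GIoU/DIoU/CIoU) build on the overlap between a predicted box and a ground-truth box. As a rough illustration, here is a minimal pure-Python sketch of IoU and GIoU for axis-aligned boxes in (x0, y0, x1, y1) form. It is not PaddleDetection's implementation, and the function names are hypothetical; real training code would work on batched tensors.

```python
def box_area(box):
    """Area of an (x0, y0, x1, y1) box; empty or inverted boxes count as 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Intersection rectangle (zero area when the boxes do not overlap).
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box that encloses both inputs.
    enclose = box_area((min(a[0], b[0]), min(a[1], b[1]),
                        max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0.0 or enclose <= 0.0:
        return 0.0, 0.0  # degenerate input: both boxes are empty
    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return iou, giou


def giou_loss(a, b):
    """GIoU loss: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - iou_and_giou(a, b)[1]


print(iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, GIoU stays informative when the boxes do not overlap at all: IoU is 0 for every disjoint pair, while GIoU becomes more negative as the boxes move apart.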

Model Performance

Performance comparison of Cloud models

Comparison of COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.

Clarification:

  • ViT stands for ViT-Cascade-Faster-RCNN, which has the highest mAP on COCO at 55.7%
  • Cascade-Faster-RCNN stands for Cascade-Faster-RCNN-ResNet50vd-DCN, which has been optimized in PaddleDetection to reach 20 FPS inference speed at 47.8% COCO mAP
  • PP-YOLOE is an optimized PP-YOLOv2. It reaches 51.4% mAP on the COCO dataset with an inference speed of 78.1 FPS on Tesla V100
  • PP-YOLOE+ is an optimized PP-YOLOE. It reaches 53.3% mAP on the COCO dataset with an inference speed of 78.1 FPS on Tesla V100
  • The models in the figure are available in the model library

Performance comparison on mobile devices

Comparison of COCO mAP and FPS on the Qualcomm Snapdragon 865 processor for mobile models.

Clarification:

  • Tests were conducted on a Qualcomm Snapdragon 865 (4*A77 + 4*A55) with batch_size=1, 4 threads, and the NCNN inference library; for the test script, see MobileDetBenchmark
  • PP-PicoDet and PP-YOLO-Tiny are models developed in-house by PaddleDetection; other models have not been tested yet.

Model libraries

1. General detection

| Model | COCO Accuracy (mAP) | V100 TensorRT FP16 Speed (FPS) | Configuration | Download |
|---|---|---|---|---|
| PP-YOLOE+_s | 43.9 | 333.3 | link | download |
| PP-YOLOE+_m | 50.0 | 208.3 | link | download |
| PP-YOLOE+_l | 53.3 | 149.2 | link | download |
| PP-YOLOE+_x | 54.9 | 95.2 | link | download |

| Model | COCO Accuracy (mAP) | Snapdragon 865 four-thread speed (ms) | Configuration | Download |
|---|---|---|---|---|
| PicoDet-XS | 23.5 | 7.81 | Link | Download |
| PicoDet-S | 29.1 | 9.56 | Link | Download |
| PicoDet-M | 34.4 | 17.68 | Link | Download |
| PicoDet-L | 36.1 | 25.21 | Link | Download |
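
The two tables above report speed in different units: frames per second for the PP-YOLOE+ models and milliseconds per image for the PicoDet models. Assuming batch size 1 and one image processed at a time, the two are reciprocals, and the sketch below converts between them. The helper names are hypothetical and the figures are copied from the tables. The conversion only aligns units; it says nothing about how the V100 and the Snapdragon 865 compare as hardware.

```python
def latency_ms_to_fps(latency_ms):
    """Frames per second for a per-image latency in ms (batch size 1 assumed)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def fps_to_latency_ms(fps):
    """Per-image latency in ms for a throughput in FPS (batch size 1 assumed)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures copied from the tables above.
picodet_latency_ms = {"PicoDet-XS": 7.81, "PicoDet-S": 9.56,
                      "PicoDet-M": 17.68, "PicoDet-L": 25.21}
ppyoloe_plus_fps = {"PP-YOLOE+_s": 333.3, "PP-YOLOE+_m": 208.3,
                    "PP-YOLOE+_l": 149.2, "PP-YOLOE+_x": 95.2}

for name, ms in picodet_latency_ms.items():
    print(f"{name}: {ms} ms per image is about {latency_ms_to_fps(ms):.1f} FPS")
for name, fps in ppyoloe_plus_fps.items():
    print(f"{name}: {fps} FPS is about {fps_to_latency_ms(fps):.2f} ms per image")
```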

Frontier detection algorithm

| Model | COCO Accuracy (mAP) | V100 TensorRT FP16 speed (FPS) | Configuration | Download |
|---|---|---|---|---|
| YOLOX-l | 50.1 | 107.5 | Link | Download |
| YOLOv5-l | 48.6 | 136.0 | Link | Download |
| YOLOv7-l | 51.0 | 135.0 | Link | Download |

Other general purpose models doc

2. Instance segmentation

| Model | Introduction | Recommended Scenarios | COCO Accuracy (mAP) | Configuration | Download |
|---|---|---|---|---|---|
| Mask RCNN | Two-stage instance segmentation algorithm | Edge-Cloud end | box AP: 41.4; mask AP: 37.5 | Link | Download |
| Cascade Mask RCNN | Two-stage instance segmentation algorithm | Edge-Cloud end | box AP: 45.7; mask AP: 39.7 | Link | Download |
| SOLOv2 | Lightweight single-stage instance segmentation algorithm | Edge-Cloud end | mask AP: 38.0 | Link | Download |

3. Keypoint detection

| Model | Introduction | Recommended scenarios | COCO Accuracy (AP) | Speed | Configuration | Download |
|---|---|---|---|---|---|---|
| HRNet-w32 + DarkPose | Top-down keypoint detection algorithm; input size: 384x288 | Edge-Cloud end | 78.3 | T4 TensorRT FP16 2.96ms | Link | Download |
| HRNet-w32 + DarkPose | Top-down keypoint detection algorithm; input size: 256x192 | Edge-Cloud end | 78.0 | T4 TensorRT FP16 1.75ms | Link | Download |
| PP-TinyPose | Lightweight keypoint algorithm; input size: 256x192 | Mobile | 68.8 | Snapdragon 865 four-thread 6.30ms | Link | Download |
| PP-TinyPose | Lightweight keypoint algorithm; input size: 128x96 | Mobile | 58.1 | Snapdragon 865 four-thread 2.37ms | Link | Download |

Other keypoint detection models doc

4. Multi-object tracking PP-Tracking

| Model | Introduction | Recommended scenarios | Accuracy | Configuration | Download |
|---|---|---|---|---|---|
| ByteTrack | SDE multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-17 half val: 77.3 | Link | Download |
| FairMOT | JDE multi-object tracking algorithm with multi-task learning | Edge-Cloud end | MOT-16 test: 75.0 | Link | Download |
| OC-SORT | SDE multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-16 half val: 75.5 | Link | - |

Other multi-object tracking models docs

5. Industrial real-time pedestrian analysis tool: PP-Human

| Task | End-to-End Speed (ms) | Model | Size |
|---|---|---|---|
| Pedestrian detection (high precision) | 25.1ms | Multi-object tracking | 182M |
| Pedestrian detection (lightweight) | 16.2ms | Multi-object tracking | 27M |
| Pedestrian tracking (high precision) | 31.8ms | Multi-object tracking | 182M |
| Pedestrian tracking (lightweight) | 21.0ms | Multi-object tracking | 27M |
| Attribute recognition (high precision) | Single person 8.5ms | Object detection; Attribute recognition | Object detection: 182M; Attribute recognition: 86M |
| Attribute recognition (lightweight) | Single person 7.1ms | Object detection; Attribute recognition | Object detection: 182M; Attribute recognition: 86M |
| Falling detection | Single person 10ms | Multi-object tracking; Keypoint detection; Behavior detection based on key points | Multi-object tracking: 182M; Keypoint detection: 101M; Behavior detection based on key points: 21.8M |
| Intrusion detection | 31.8ms | Multi-object tracking | 182M |
| Fighting detection | 19.7ms | Video classification | 90M |
| Smoking detection | Single person 15.1ms | Object detection; Object detection based on Human ID | Object detection: 182M; Object detection based on Human ID: 27M |
| Phoning detection | Single person ms | Object detection; Image classification based on Human ID | Object detection: 182M; Image classification based on Human ID: 45M |

Please refer to docs for details.

6. Industrial real-time vehicle analysis tool: PP-Vehicle

| Task | End-to-End Speed (ms) | Model | Size |
|---|---|---|---|
| Vehicle detection (high precision) | 25.7ms | Object detection | 182M |
| Vehicle detection (lightweight) | 13.2ms | Object detection | 27M |
| Vehicle tracking (high precision) | 40ms | Multi-object tracking | 182M |
| Vehicle tracking (lightweight) | 25ms | Multi-object tracking | 27M |
| Plate recognition | 4.68ms | Plate detection; Plate recognition | Plate detection: 3.9M; Plate recognition: 12M |
| Vehicle attribute | 7.31ms | Attribute recognition | 7.2M |

Please refer to docs for details.

Document tutorials

Introductory tutorials

Advanced tutorials

Courses

  • [Theoretical foundation] Object detection 7-day camp: Overview of object detection tasks, details of the RCNN and YOLO series of object detection algorithms, PP-YOLO optimization strategies and case sharing, and an introduction to and practice with the anchor-free series of algorithms

  • [Industrial application] AI Fast Track industrial object detection technology and application: Super object detection algorithms, the real-time pedestrian analysis system PP-Human, and a breakdown and practice of industrial object detection applications

  • [Industrial features] 2022.3.26 Smart City Industry Seven-Day Class: Urban planning, urban governance, smart governance services, traffic management, and community governance.

  • [Academic exchange] 2022.9.27 YOLO Vision Event: PaddleDetection was invited to the first YOLO-themed event to exchange ideas with computer vision experts from around the world.

Industrial tutorial examples

Applications

Version updates

Please refer to the Release note for more details about the updates

License

PaddlePaddle is provided under the Apache 2.0 license.

Contribute your code

We appreciate your contributions and your feedback!

  • Thanks to Mandroide for code cleanup and
  • Thanks to FL77N for the Sparse-RCNN model
  • Thanks to Chen-Song for the Swin Faster-RCNN model
  • Thanks to yangyudong and hchhtc123 for developing the PP-Tracking GUI
  • Thanks to Shigure19 for developing the PP-TinyPose fitness app
  • Thanks to manangoel99 for the Wandb visualization methods

Citation

@misc{ppdet2019,
title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
year={2019}
}