Pytorch-HarDNet

September 29, 2020

Harmonic DenseNet: A low memory traffic network (ICCV 2019 paper)

See also CenterNet-HarDNet for object detection (44.3 mAP at 45 fps on the COCO dataset) and FC-HarDNet for semantic segmentation.

  • Fully utilize your CUDA cores!
  • Unlike CNN models that use many Conv1x1 layers to reduce model size and the number of MACs, HarDNet mainly uses Conv3x3 (with only one Conv1x1 layer per HarDNet block) to increase computational density.
  • Increased computational density shifts a model from memory-bound to compute-bound.
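
As a rough illustration of the computational-density argument, the sketch below compares MACs per feature-map element moved for a Conv1x1 and a Conv3x3 layer of the same shape. The function name, the example shape (128 channels, 56x56 feature map), and the choice to ignore weight traffic are assumptions made for this example, not measurements from the repository.

```python
def conv_density(c_in, c_out, kernel, height, width):
    """MACs per feature-map element moved for one conv layer.

    Assumes stride 1 with 'same' padding and ignores weight traffic,
    so this is only a back-of-the-envelope estimate.
    """
    macs = height * width * c_in * c_out * kernel * kernel
    traffic = height * width * (c_in + c_out)  # elements read + elements written
    return macs / traffic


# Hypothetical layer shape, chosen only for illustration.
d1 = conv_density(128, 128, 1, 56, 56)
d3 = conv_density(128, 128, 3, 56, 56)
print(d1, d3, d3 / d1)
```

With identical input and output shapes, the 3x3 layer performs nine times as many MACs for the same feature-map traffic, which is the sense in which it is denser.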

Architecture

HarDNet Block:

  • k = growth rate (as in DenseNet)
  • m = channel weighting factor (1.6~1.7)
  • Conv3x3 for all layers (no bottleneck layer)
  • Conv-BN-ReLU for all layers instead of BN-ReLU-Conv used in DenseNet
  • See MIPT-Oulu/pytorch_bn_fusion to get rid of BatchNorm for inference.
  • No global dense connection (input of a HarDBlk is NOT reused as a part of output)
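
The connection pattern and per-layer widths of a block can be sketched in a few lines. This assumes the linking rule described in the HarDNet paper (layer l takes input from layer l - 2^n whenever 2^n divides l, and its width is k multiplied by m once for each additional link); the README itself does not spell the rule out. The function names are hypothetical, and the rounding of channel counts to integers that a real implementation needs is omitted.

```python
def harmonic_links(layer):
    """Indices of earlier layers feeding `layer` (index 0 is the block input)."""
    links = []
    step = 1
    # If 2**n divides `layer`, every smaller power of two does as well,
    # so it is safe to stop at the first power that fails to divide.
    while step <= layer and layer % step == 0:
        links.append(layer - step)
        step *= 2
    return links


def layer_width(layer, k, m):
    """Unrounded output channels k * m**n, where 2**n is the largest power of two dividing `layer`."""
    if layer < 1:
        raise ValueError("layer 0 is the block input; widths are defined for layer >= 1")
    n = len(harmonic_links(layer)) - 1
    return k * m ** n


# Example values for k and m, chosen only for illustration.
for layer in range(1, 9):
    print(layer, harmonic_links(layer), layer_width(layer, k=14, m=1.7))
```

Under this rule, odd layers read only from their immediate predecessor and keep width k, while layer 4 reads from layers 3, 2, and 0 and is widened by m twice.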

HarDNet68/85:

  • Enhanced local feature extraction to benefit the detection of small objects
  • A transitional Conv1x1 layer is employed after each HarDNet block (HarDBlk)

Results

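| Method     | MParam | GMACs | Inference Time* | ImageNet Top-1 | COCO mAP with SSD512 |
|------------|--------|-------|-----------------|----------------|----------------------|
| HarDNet68  | 17.6   | 4.3   | 22.5 ms         | 76.5           | 31.7                 |
| ResNet-50  | 25.6   | 4.1   | 31.0 ms         | 76.2           | -                    |
| HarDNet85  | 36.7   | 9.1   | 38.0 ms         | 78.0           | 35.1                 |
| ResNet-101 | 44.6   | 7.8   | 51.2 ms         | 78.0           | 31.2                 |
| VGG-16     | 138    | 15.5  | 49 ms           | 73.4           | 28.8                 |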

* Inference time measured on an NVIDIA 1080 Ti with PyTorch 1.1.0,
averaged over 300 iterations of random 1024x1024 input images.
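
A minimal sketch of this kind of averaged timing loop is shown below. The helper name, the warm-up count, and the optional `sync` hook are assumptions added for illustration; the README does not describe the exact benchmarking script. When timing a PyTorch model on a GPU, `torch.cuda.synchronize` could be passed as `sync` so that asynchronously queued kernels finish before the clock stops.

```python
import time


def average_latency_ms(fn, iterations=300, warmup=10, sync=None):
    """Average wall-clock time of fn() in milliseconds over `iterations` calls."""
    for _ in range(warmup):  # warm-up calls are excluded from the average
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    if sync is not None:
        sync()  # wait for any queued asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


# Stand-in workload; a real measurement would call the model's forward pass here.
latency = average_latency_ms(lambda: sum(range(1000)), iterations=300)
print(f"{latency:.4f} ms per call")
```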

Results of Depthwise Separable (DS) version of HarDNet

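| Method           | MParam | GMACs | Inference Time** | ImageNet Top-1 |
|------------------|--------|-------|------------------|----------------|
| HarDNet39DS      | 3.5    | 0.44  | 32.5 ms          | 72.1           |
| MobileNetV2      | 3.5    | 0.3   | 37.9 ms          | 72.0           |
| HarDNet68DS      | 4.2    | 0.8   | 52.6 ms          | 74.3           |
| MobileNetV2 1.4x | 6.1    | 0.6   | 57.8 ms          | 74.7           |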

** Inference time measured on an NVIDIA Jetson Nano with TensorRT,
averaged over 500 iterations of random 320x320 input images.

Train HarDNet models for ImageNet

The training procedure is branched from https://github.com/pytorch/examples/tree/master/imagenet

Training:

python main.py -a hardnet68 [imagenet-folder with train and val folders]

arch = hardnet39ds | hardnet68ds | hardnet68 | hardnet85

Evaluating:

python main.py -a hardnet68 --pretrained -e [imagenet-folder with train and val folders]

For HarDNet85, please download the pretrained weights from here.

Hyperparameters

  • epochs = 150 ~ 250
  • initial lr = 0.05
  • batch size = 256
  • weight decay = 6e-5
  • cosine learning rate decay
  • nesterov = True
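
For reference, a cosine learning rate decay starting from the initial rate above can be sketched as follows. The exact form used by the training script (per-epoch versus per-iteration updates, and decay to exactly zero) is not stated here, so those details and the function name are assumptions.

```python
import math


def cosine_lr(step, total_steps, initial_lr=0.05, min_lr=0.0):
    """Learning rate at `step`, decaying from initial_lr to min_lr along a half cosine."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed per-epoch schedule over 150 epochs (the lower end of the listed range).
for epoch in (0, 75, 150):
    print(epoch, cosine_lr(epoch, total_steps=150))
```

In PyTorch, the remaining settings would map onto the `lr`, `weight_decay`, and `nesterov` arguments of `torch.optim.SGD`. Nesterov momentum requires a non-zero `momentum` value, which this list does not specify.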