1xN Pattern for Pruning Convolutional Neural Networks (paper)

July 30, 2022

PyTorch implementation of our paper "1xN Pattern for Pruning Convolutional Neural Networks", accepted by TPAMI 2022.

1) 1×N Block Pruning

Requirements

  • Python 3.7
  • PyTorch >= 1.0.1
  • CUDA = 10.0.0

Code Running

To reproduce our experiments, run the following command. Set --arch to mobilenet_v1, mobilenet_v2, mobilenet_v3_large, or mobilenet_v3_small, and set --N to 2, 4, 8, 16, or 32:

python imagenet.py \
--gpus 0 \
--arch mobilenet_v1 \
--job_dir ./experiment/ \
--data_path [DATA_PATH] \
--pretrained_model [PRETRAIN_MODEL_PATH] \
--pr_target 0.5 \
--N 4 \
--conv_type BlockL1Conv \
--train_batch_size 256 \
--eval_batch_size 256 \
--rearrange

The pre-trained models can be downloaded at MobileNet-V1, MobileNet-V2, MobileNet-V3-Large, MobileNet-V3-Small and ResNet-50.

Accuracy Performance

Table 1: Performance comparison of our 1×N block sparsity against weight pruning and filter pruning (p = 50%).

| MobileNet-V1 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| Weight Pruning | 70.764 | 89.592 | Pruned Model |
| Filter Pruning | 65.348 | 86.264 | Pruned Model |
| 1×2 Block | 70.281 | 89.370 | Pruned Model |
| 1×4 Block | 70.052 | 89.056 | Pruned Model |
| 1×8 Block | 69.908 | 89.027 | Pruned Model |
| 1×16 Block | 69.559 | 88.933 | Pruned Model |
| 1×32 Block | 69.541 | 88.801 | Pruned Model |

| MobileNet-V2 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| Weight Pruning | 71.146 | 89.872 | Pruned Model |
| Filter Pruning | 66.730 | 87.190 | Pruned Model |
| 1×2 Block | 70.233 | 89.417 | Pruned Model |
| 1×4 Block | 60.706 | 89.165 | Pruned Model |
| 1×8 Block | 69.372 | 88.862 | Pruned Model |
| 1×16 Block | 69.352 | 88.708 | Pruned Model |
| 1×32 Block | 68.762 | 88.425 | Pruned Model |

| MobileNet-V3-small | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| Weight Pruning | 66.376 | 86.868 | Pruned Model |
| Filter Pruning | 59.054 | 81.713 | Pruned Model |
| 1×2 Block | 65.380 | 86.060 | Pruned Model |
| 1×4 Block | 64.465 | 85.495 | Pruned Model |
| 1×8 Block | 64.101 | 85.274 | Pruned Model |
| 1×16 Block | 63.126 | 84.203 | Pruned Model |
| 1×32 Block | 62.881 | 83.982 | Pruned Model |

| MobileNet-V3-large | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| Weight Pruning | 72.897 | 91.093 | Pruned Model |
| Filter Pruning | 69.137 | 89.097 | Pruned Model |
| 1×2 Block | 72.120 | 90.677 | Pruned Model |
| 1×4 Block | 71.935 | 90.458 | Pruned Model |
| 1×8 Block | 71.478 | 90.163 | Pruned Model |
| 1×16 Block | 71.112 | 90.129 | Pruned Model |
| 1×32 Block | 70.769 | 89.696 | Pruned Model |

In addition, we provide the raw data used for plotting the figures in ./raw_data_fig4. For example, run python ./raw_data_fig4/resnet50_top1.py to plot the top-1 accuracy of ResNet-50 pruned by different methods.

More links for pruned models under different pruning rates and their training logs can be found in MobileNet-V2 and ResNet-50.

Table 2: Performance comparison of our 1×N pruning against kernel-wise pruning.

| ResNet-50 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| 1×4 Block | 76.506 | 93.239 | Pruned Model |
| kernel (random) | 74.834 | 92.178 | Pruned Model |
| kernel ($\ell_1$) | 75.370 | 92.582 | Pruned Model |

Evaluate our models

To verify the performance of our pruned models, download them from the links provided above and run the following command, setting --arch to mobilenet_v1, mobilenet_v2, mobilenet_v3_large, or mobilenet_v3_small:

python imagenet.py \
--gpus 0 \
--arch mobilenet_v1 \
--data_path [DATA_PATH] \
--conv_type DenseConv \
--evaluate [PRUNED_MODEL_PATH] \
--eval_batch_size 256

Arguments

optional arguments:
  -h, --help            show this help message and exit
  --gpus                Select gpu_id to use. default:[0]
  --data_path           The directory where the data is stored.
  --job_dir             The directory where the summaries will be stored.
  --resume              Load the model from the specified checkpoint.
  --pretrain_model      Path of the pre-trained model.
  --pruned_model        Path of the pruned model to evaluate.
  --arch                Architecture of model. For ImageNet: mobilenet_v1, mobilenet_v2, mobilenet_v3_small, mobilenet_v3_large
  --num_epochs          The number of epochs to train. default:180
  --train_batch_size    Batch size for training. default:256
  --eval_batch_size     Batch size for validation. default:100
  --momentum            Momentum for Momentum Optimizer. default:0.9
  --lr LR               Learning rate. default:1e-2
  --lr_decay_step       The interval of learning rate decay for CIFAR. default:100 150
  --lr_decay_freq       The frequency of learning rate decay for ImageNet. default:30
  --weight_decay        The weight decay of loss. default:4e-5
  --lr_type             lr scheduler. default: cos. optional:exp/cos/step/fixed
  --use_dali            If this parameter exists, use dali module to load ImageNet data (benefit in training acceleration).
  --conv_type           Importance criterion of filters. Default: BlockL1Conv. optional: BlockRandomConv, DenseConv
  --pr_target           Pruning rate. default:0.5
  --full                If this parameter exists, prune fully-connected layer.
  --N                   Consecutive N kernels for removal (see paper for details).
  --rearrange           If this parameter exists, filters will be rearranged (see paper for details).
  --export_onnx         If this parameter exists, export onnx model.
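
As a rough illustration of how --N, --pr_target, and the BlockL1Conv criterion could interact, the following sketch builds a 1×N pruning mask with NumPy. It assumes that a block is N consecutive output kernels sharing one input-channel index, that each block is scored by its ℓ1 norm, and that the lowest-scoring fraction pr_target of blocks is removed within a layer. The function name block_l1_mask and the layer-wise ranking are assumptions made for illustration; this is not the code in imagenet.py.

```python
import numpy as np


def block_l1_mask(weight, n=4, pr_target=0.5):
    """Return a 0/1 mask that zeroes the 1xN blocks with the smallest L1 norm.

    weight: array of shape (c_out, c_in, kh, kw); c_out must be divisible by n.
    (Hypothetical helper, not part of the repository.)
    """
    c_out, c_in, kh, kw = weight.shape
    if c_out % n != 0:
        raise ValueError("c_out must be divisible by n")
    # Group n consecutive output kernels that share an input-channel index.
    blocks = weight.reshape(c_out // n, n, c_in, kh, kw)
    # One L1 score per block: shape (c_out // n, c_in).
    scores = np.abs(blocks).sum(axis=(1, 3, 4))
    num_pruned = int(scores.size * pr_target)
    # Rank all blocks of the layer together (assumption) and drop the weakest.
    pruned = np.argsort(scores, axis=None, kind="stable")[:num_pruned]
    block_mask = np.ones(scores.size, dtype=weight.dtype)
    block_mask[pruned] = 0
    block_mask = block_mask.reshape(scores.shape)
    # Broadcast each block decision back to the n kernels it covers.
    mask = np.repeat(block_mask, n, axis=0)[:, :, None, None]
    return np.broadcast_to(mask, weight.shape).copy()


# Toy layer: 8 output channels, 3 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
mask = block_l1_mask(w, n=4, pr_target=0.5)
pruned_w = w * mask
```

Because every decision covers N kernels at once, the zeros in pruned_w always come in runs of N along the output-channel axis, which is what allows a block-sparse storage format later on.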

2) Filter Rearrangement

Table 3: Performance studies of our 1×N block sparsity with and without filter rearrangement (p = 50%).

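| N = 2 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| w/o Rearrange | 69.900 | 89.296 | Pruned Model |
| Rearrange | 70.233 | 89.417 | Pruned Model |

| N = 4 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| w/o Rearrange | 69.521 | 88.920 | Pruned Model |
| Rearrange | 69.579 | 88.944 | Pruned Model |

| N = 8 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| w/o Rearrange | 69.206 | 88.608 | Pruned Model |
| Rearrange | 69.372 | 88.862 | Pruned Model |

| N = 16 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| w/o Rearrange | 68.971 | 88.399 | Pruned Model |
| Rearrange | 69.352 | 88.708 | Pruned Model |

| N = 32 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
|---|---|---|---|
| w/o Rearrange | 68.431 | 88.315 | Pruned Model |
| Rearrange | 68.762 | 88.425 | Pruned Model |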
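
This README does not spell out the rearrangement procedure, so the following is only a sketch under stated assumptions: the filters of one layer are sorted by descending ℓ1 norm so that filters of similar magnitude become neighbours, and the input channels of the next layer are permuted in the same order so that the network output does not change. The helper rearrange_filters and the toy forward function are hypothetical; see the paper for the actual method.

```python
import numpy as np


def rearrange_filters(w_cur, w_next):
    """Sort the filters of w_cur by descending L1 norm and permute the
    input channels of w_next to match. (Hypothetical helper.)

    w_cur:  (c_out, c_in, kh, kw) weights of the current layer.
    w_next: (c_next, c_out, kh, kw) weights of the following layer.
    """
    norms = np.abs(w_cur).sum(axis=(1, 2, 3))
    order = np.argsort(-norms, kind="stable")
    return w_cur[order], w_next[:, order], order


def forward(a, b, x):
    # Two 1x1 convolutions on a single pixel reduce to matrix products.
    hidden = np.maximum(a[:, :, 0, 0] @ x, 0.0)  # ReLU is element-wise
    return b[:, :, 0, 0] @ hidden


rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 3, 1, 1))
w2 = rng.standard_normal((5, 8, 1, 1))
x = rng.standard_normal(3)

r1, r2, order = rearrange_filters(w1, w2)
# The permutation only reorders hidden channels, so the output is preserved.
same_output = np.allclose(forward(w1, w2, x), forward(r1, r2, x))
```

In a real network, any per-channel parameters tied to the permuted filters (biases, batch-norm statistics) would need the same permutation.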

3) Encoding and Decoding Efficiency

Performance and latency comparison

Our sparse convolution implementation has been released to the TVM community.

To verify the performance of our pruned models, export the model to ONNX (see --export_onnx) and run the following command:

python model_tune.py \
--onnx_path [ONNX_MODEL_PATH] \
--bsr 4 \
--bsc 1 \
--sparsity 0.5

For the detailed tuning settings, please refer to TVM.
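
To make the --bsr 4 and --bsc 1 options more concrete, the sketch below encodes a weight matrix in block sparse row (BSR) form with 4×1 blocks and decodes it again. It assumes that --bsr and --bsc are the block height and width, and that a 4×1 block corresponds to a 1×N block with N = 4 laid out along the output-channel axis. The functions to_bsr and from_bsr are hypothetical NumPy helpers following the common (data, indices, indptr) layout; they are not the TVM implementation.

```python
import numpy as np


def to_bsr(dense, bsr=4, bsc=1):
    """Encode a 2-D matrix as (data, indices, indptr) in block sparse row form.

    data:    stored blocks, shape (num_blocks, bsr, bsc)
    indices: block-column index of each stored block
    indptr:  indptr[i]:indptr[i + 1] slices the blocks of block-row i
    """
    rows, cols = dense.shape
    if rows % bsr or cols % bsc:
        raise ValueError("matrix shape must be divisible by the block shape")
    data, indices, indptr = [], [], [0]
    for bi in range(rows // bsr):
        for bj in range(cols // bsc):
            block = dense[bi * bsr:(bi + 1) * bsr, bj * bsc:(bj + 1) * bsc]
            if np.any(block):  # keep only blocks with at least one non-zero
                data.append(block)
                indices.append(bj)
        indptr.append(len(indices))
    data = np.array(data).reshape(-1, bsr, bsc)
    return data, np.array(indices, dtype=np.int64), np.array(indptr, dtype=np.int64)


def from_bsr(data, indices, indptr, shape):
    """Decode the BSR triple back into a dense matrix."""
    bsr, bsc = data.shape[1:]
    dense = np.zeros(shape, dtype=data.dtype)
    for bi in range(len(indptr) - 1):
        for k in range(indptr[bi], indptr[bi + 1]):
            bj = indices[k]
            dense[bi * bsr:(bi + 1) * bsr, bj * bsc:(bj + 1) * bsc] = data[k]
    return dense


# 8 output channels x 6 input channels, with three surviving 4x1 blocks.
w = np.zeros((8, 6))
w[0:4, 1] = [1.0, 2.0, 3.0, 4.0]  # block-row 0, column 1
w[4:8, 0] = [5.0, 6.0, 7.0, 8.0]  # block-row 1, column 0
w[4:8, 5] = [9.0, 1.0, 2.0, 3.0]  # block-row 1, column 5
data, indices, indptr = to_bsr(w, bsr=4, bsc=1)
restored = from_bsr(data, indices, indptr, w.shape)
# indices -> [1, 0, 5], indptr -> [0, 1, 3]
```

Only one column index is stored per block of four weights, which is where the 1×N pattern saves index overhead compared with unstructured sparsity.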

4) Contact

For any problems regarding this code re-implementation, please contact the first author (lmbxmu@stu.xmu.edu.cn) or the second author (yuxinzhang@stu.xmu.edu.cn).

For any problems regarding the sparse convolution implementation, please contact the third author (xiamenlyc@gmail.com).