APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

June 16, 2020

@inproceedings{Wang2020APQ,
  title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy},
  author={Tianzhe Wang and Kuan Wang and Han Cai and Ji Lin and Zhijian Liu and Song Han},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Overview

We release the PyTorch code for APQ [Paper|Video|Competition]:

Jointly Search for Optimal Model

Save Orders of Magnitude Searching Cost

Better Performance than Sequential Design

How to Use

Prerequisites

- PyTorch version >= 1.0
- Python version >= 3.6
- Progress >= 1.5
- An NVIDIA GPU is required to obtain new models
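The version requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: it assumes the PyPI distribution names are `torch` and `progress`, and that the check itself runs on Python 3.8 or newer (where `importlib.metadata` is available). It does not test for a GPU.

```python
import re
import sys
from importlib import metadata  # available from Python 3.8

# Minimum versions taken from the prerequisites list.
# The distribution names "torch" and "progress" are assumptions.
REQUIRED = {"torch": (1, 0), "progress": (1, 5)}


def parse_version(text):
    """Parse the leading numeric components, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    for package, minimum in REQUIRED.items():
        try:
            installed = parse_version(metadata.version(package))
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(str(part) for part in minimum)
            problems.append(f"{package} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```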

Dataset and Model Preparation

Codebase Structure

apq
- dataset (ImageNet data path)
- elastic_nn (super network builder, w/ or w/o quantization)
    - modules (define the layers, w/ or w/o quantization)
    - networks (define the networks, w/ or w/o quantization)
    - utils.py (utility functions for the elastic_nn folder)
- models (quantization-aware predictor and once-for-all network checkpoint path)
- imagenet_codebase (training codebase for ImageNet)
- lut (latency lookup table path)
- methods (methods to find the mixed-precision network)
    - evolution (evolution search code)
- utils (utility functions, including the converter)
    - accuracy_predictor.py (construction of the accuracy predictor)
    - latency_predictor.py (construction of the latency predictor)
    - converter.py (encodes a subnetwork into a one-hot vector)
    - quant-aware.py (code for quantization-aware training)
- main.py
- Readme.md
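To illustrate what "encoding a subnetwork into a one-hot vector" can look like, here is a small sketch. It is not the code in converter.py: the per-layer choices (`KERNEL_SIZES`, `EXPAND_RATIOS`, `BITWIDTHS`) and the dictionary keys are hypothetical and only serve to show the idea of concatenating one one-hot code per decision.

```python
# Hypothetical per-layer choices; the real search space is defined by the repository.
KERNEL_SIZES = (3, 5, 7)
EXPAND_RATIOS = (4, 6)
BITWIDTHS = (4, 6, 8)


def one_hot(value, choices):
    """Return a list with a single 1 at the position of `value` in `choices`."""
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {choices}")
    return [1 if value == choice else 0 for choice in choices]


def encode_layer(layer):
    """Concatenate the one-hot codes of one layer's architecture and bitwidth choices."""
    return (
        one_hot(layer["kernel"], KERNEL_SIZES)
        + one_hot(layer["expand"], EXPAND_RATIOS)
        + one_hot(layer["w_bits"], BITWIDTHS)
        + one_hot(layer["a_bits"], BITWIDTHS)
    )


def encode_subnet(layers):
    """Encode a list of layer configurations as one flat 0/1 vector."""
    vector = []
    for layer in layers:
        vector.extend(encode_layer(layer))
    return vector


subnet = [
    {"kernel": 3, "expand": 6, "w_bits": 8, "a_bits": 4},
    {"kernel": 7, "expand": 4, "w_bits": 4, "a_bits": 6},
]
code = encode_subnet(subnet)
print(code)
```

Each layer contributes 11 entries here (3 + 2 + 3 + 3), so a predictor that consumes this vector sees a fixed-length input for a fixed number of layers.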

Testing

For instance, to test the model in the exps/test folder, run the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py \
    --exp_dir=exps/test

This reports the latency and energy of the model on the BitFusion platform, together with its ImageNet top-1 accuracy.
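For readers unfamiliar with the metric, top-1 accuracy is the percentage of samples whose highest-scoring class matches the ground-truth label. A minimal sketch with made-up scores (the function name `top1_accuracy` is illustrative, not taken from test.py):

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += predicted == label
    return 100.0 * correct / len(labels)


# Four samples, three classes; the last sample is misclassified.
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.4, 0.5, 0.1]]
labels = [1, 0, 2, 0]
print(top1_accuracy(scores, labels))  # 75.0
```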

Example

For instance, to search for a model under a 12.80ms latency constraint, run the following command:

CUDA_VISIBLE_DEVICES=0 python search.py \
    --mode=evolution \
    --acc_predictor_dir=models \
    --exp_name=test \
    --constraint=12.80 \
    --type=latency

The search returns a candidate that satisfies the resource constraint (latency or energy) and stores it in the exps/test folder.
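The general shape of a constrained evolution search can be sketched as follows. This is a toy illustration under stated assumptions, not the code behind search.py: `predict_latency` and `predict_accuracy` are stand-ins for the latency lookup table and the accuracy predictor, a candidate is reduced to one hypothetical bitwidth per layer, and the latency units are arbitrary.

```python
import random

CHOICES = (4, 6, 8)  # hypothetical per-layer bitwidth choices
NUM_LAYERS = 6       # hypothetical network depth


def predict_latency(candidate):
    """Toy stand-in for a latency lookup table: cost grows with bitwidth."""
    return sum(0.25 * bits for bits in candidate)


def predict_accuracy(candidate):
    """Toy stand-in for an accuracy predictor: more bits help, with diminishing returns."""
    return sum(bits ** 0.5 for bits in candidate)


def random_candidate(rng):
    return tuple(rng.choice(CHOICES) for _ in range(NUM_LAYERS))


def mutate(candidate, rng, prob=0.2):
    """Resample each gene independently with probability `prob`."""
    return tuple(rng.choice(CHOICES) if rng.random() < prob else gene for gene in candidate)


def crossover(first, second, rng):
    """Pick each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(first, second))


def sample_feasible(make, constraint, max_tries=1000):
    """Call `make` until it yields a candidate within the latency constraint."""
    for _ in range(max_tries):
        candidate = make()
        if predict_latency(candidate) <= constraint:
            return candidate
    raise RuntimeError("no feasible candidate found")


def evolution_search(constraint, population=50, parents=10, generations=20, seed=0):
    rng = random.Random(seed)
    pool = [sample_feasible(lambda: random_candidate(rng), constraint)
            for _ in range(population)]
    for _ in range(generations):
        # Keep the most accurate feasible candidates as parents.
        top = sorted(pool, key=predict_accuracy, reverse=True)[:parents]
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                make = lambda: mutate(rng.choice(top), rng)
            else:
                make = lambda: crossover(rng.choice(top), rng.choice(top), rng)
            children.append(sample_feasible(make, constraint))
        pool = top + children
    return max(pool, key=predict_accuracy)


best = evolution_search(constraint=9.0)
print(best, predict_latency(best), predict_accuracy(best))
```

Because every candidate is filtered through the latency check before it enters the pool, the returned candidate always satisfies the constraint; the predictors only decide which feasible candidate is preferred.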

Quantization-aware Fine-tuning on ImageNet

For instance, to run quantization-aware fine-tuning for the model in the exps/test folder, run the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 python quant_aware.py \
    --exp_name=test

This produces a fine-tuned mixed-precision model that satisfies the resource constraint (latency or energy).
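As background, quantization-aware training commonly runs the forward pass with quantized weights while applying gradient updates to a full-precision copy (the straight-through estimator). The sketch below shows that idea on a tiny linear model in plain Python. It is a generic illustration, not the scheme implemented in quant_aware.py: the symmetric uniform quantizer, the function names, and the toy data are all assumptions.

```python
def fake_quantize(values, bits):
    """Round values onto a symmetric signed `bits`-bit grid and map them back to floats."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def qat_step(weights, inputs, target, bits, lr):
    """One SGD step on y = sum(w_i * x_i), evaluated with quantized weights."""
    quantized = fake_quantize(weights, bits)  # forward pass uses quantized weights
    prediction = sum(q * x for q, x in zip(quantized, inputs))
    error = prediction - target
    # Straight-through estimator: treat d(quantized)/d(weights) as 1, so the
    # gradient of 0.5 * error**2 w.r.t. each full-precision weight is error * x.
    return [w - lr * error * x for w, x in zip(weights, inputs)]


def quantized_loss(weights, data, bits):
    """Sum of squared errors when predicting with the quantized weights."""
    quantized = fake_quantize(weights, bits)
    return sum((sum(q * x for q, x in zip(quantized, xs)) - y) ** 2 for xs, y in data)


# Toy data generated by the weights (0.5, -0.25).
data = [((1.0, 0.0), 0.5), ((0.0, 1.0), -0.25), ((1.0, 1.0), 0.25)]
initial = [0.0, 0.0]
weights = list(initial)
for _ in range(200):
    for inputs, target in data:
        weights = qat_step(weights, inputs, target, bits=4, lr=0.1)

print(quantized_loss(initial, data, bits=4), quantized_loss(weights, data, bits=4))
```

Lower bitwidths give a coarser grid, so the loss measured with quantized weights generally settles at a small non-zero value instead of reaching zero.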

Models

We provide the checkpoints for our APQ reported in the paper:

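| Latency | Energy | BitOps | Accuracy | Model |
| --- | --- | --- | --- | --- |
| 6.11ms | 9.14mJ | 12.7G | 72.8% | download |
| 8.45ms | 11.81mJ | 14.6G | 73.8% | download |
| 8.40ms | 12.18mJ | 16.5G | 74.1% | download |
| 12.17ms | 14.14mJ | 23.6G | 75.1% | download |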

You can download the models and put them into the exps folder to test their performance. Note that a bold item marks the constraint under which that model was searched.

Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20, code)

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)

Defensive Quantization: When Efficiency Meets Robustness (ICLR'19)