4-bit Shampoo
October 29, 2024
Preparation
- Requirements:
  For running Python code: PyTorch, torchvision, timm, and NumPy.
  For compiling C code: gcc, nvcc, and CMake (Linux); MSVC, nvcc, and CMake (Windows).
- Compilation of the dynamic library used for quantization:
  See "README.md" in path "./cudaC_python" for the compilation process.
  After compilation, move the dynamic library to path "./qtensor".
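Before compiling or running the demo, it can help to confirm that the requirements listed above are present. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the import names (for example "torch" for PyTorch) and executable names are assumptions about a typical installation.

```python
import importlib.util
import shutil


def missing_python_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_tools(names):
    """Return the executables that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    # Assumed import names for the listed Python requirements.
    print("missing packages:",
          missing_python_packages(["torch", "torchvision", "timm", "numpy"]))
    # Linux toolchain; on Windows, check for "cl" (MSVC) instead of "gcc".
    print("missing tools:", missing_tools(["gcc", "nvcc", "cmake"]))
```

An empty list in both lines of output suggests the environment is ready for the compilation steps described in "./cudaC_python".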
Usage
File "main_demo.py" shows the basic usage of our optimizer code and the vision models used in our paper.
File "shampoo1.py" in path "./optimizers" implements naive 4-bit Shampoo.
File "shampoo2.py" in path "./optimizers" implements our 4-bit Shampoo.
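For readers unfamiliar with what storing optimizer state in 4 bits involves, here is a minimal sketch of generic block-wise linear 4-bit quantization. It is an illustration only and is not the code in "shampoo1.py" or "shampoo2.py": the function names, the block size, and the linear mapping are assumptions, and the repository performs quantization through its compiled dynamic library. The sketch is kept dependency-free; a practical version would operate on tensors.

```python
def quantize_4bit(values, block_size=64):
    """Quantize a list of floats to 4-bit codes, block by block.

    Each block is scaled by its maximum absolute value and mapped linearly
    onto the 16 integer levels 0..15. Two codes are packed into each byte.
    Returns (packed_bytes, per_block_scales, number_of_values).
    """
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(scale)
        for v in block:
            # Map [-scale, scale] linearly onto the levels 0..15.
            codes.append(round((v / scale + 1.0) / 2.0 * 15))
    if len(codes) % 2:
        codes.append(0)  # pad so that codes pair up into whole bytes
    packed = bytes((codes[i] << 4) | codes[i + 1]
                   for i in range(0, len(codes), 2))
    return packed, scales, len(values)


def dequantize_4bit(packed, scales, n, block_size=64):
    """Invert quantize_4bit, returning n approximate floats."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))  # unpack high and low nibbles
    return [(codes[i] / 15 * 2.0 - 1.0) * scales[i // block_size]
            for i in range(n)]
```

In this scheme each element costs half a byte plus one stored scale per block, and the reconstruction error of any element is at most one fifteenth of its block's scale.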
Results
(a) Swin-Tiny on CIFAR-100:
(b) ViT-Base/32 on ImageNet-1k:
Citation
@article{Wang_NeurIPS_2024,
author = {Sike Wang and Pan Zhou and Jia Li and Hua Huang},
title = {4-bit {Shampoo} for Memory-Efficient Network Training},
journal = {Advances in Neural Information Processing Systems},
year = {2024},
url = {https://arxiv.org/abs/2405.18144},
}