QuantFour_AdamW

February 4, 2024

Triton does not expose per-thread indexing, so the quantization kernels were moved to CUDA, where each thread can run its own binary search in parallel.
A HIP port for AMD GPU support is planned.
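
As a rough illustration of the binary-search quantization step, the sketch below maps each value to the index of its nearest entry in a sorted 16-level codebook. It is plain Python, not the CUDA kernel from this repository. The names `make_linear_codebook`, `quantize`, and `dequantize` are hypothetical, and the evenly spaced codebook and per-tensor absmax scaling are simplifying assumptions; the paper uses its own quantization maps and normalization schemes.

```python
from bisect import bisect_left


def make_linear_codebook(bits=4):
    """Hypothetical codebook: 2**bits evenly spaced levels in [-1, 1]."""
    n = 2 ** bits
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def quantize(values, codebook):
    """Return (codes, scale): nearest-codebook index per value, plus the scale used.

    Assumption: a single absmax scale for the whole list (per-tensor scaling).
    Each element is handled independently, which is what would let a GPU
    kernel assign one thread per element.
    """
    scale = max((abs(v) for v in values), default=0.0) or 1.0
    codes = []
    for v in values:
        x = v / scale
        hi = bisect_left(codebook, x)  # first index with codebook[hi] >= x
        if hi == 0:
            idx = 0
        elif hi == len(codebook):
            idx = len(codebook) - 1
        else:
            lo = hi - 1
            # Pick whichever neighbouring level is closer to x.
            idx = lo if (x - codebook[lo]) <= (codebook[hi] - x) else hi
        codes.append(idx)
    return codes, scale


def dequantize(codes, scale, codebook):
    """Look each code up in the codebook and undo the scaling."""
    return [codebook[c] * scale for c in codes]


if __name__ == "__main__":
    cb = make_linear_codebook(4)
    codes, scale = quantize([0.5, -1.0, 1.0, 0.25], cb)
    print(codes, dequantize(codes, scale, cb))
```

With 16 levels the search needs at most four comparisons per element, and no element depends on any other, so the loop body corresponds to the work a single GPU thread would do.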

This is a productionized implementation of the paper:
"Memory Efficient Optimizers with 4-bit States"
Bingrui Li, Jianfei Chen, Jun Zhu
https://arxiv.org/abs/2309.01507