Small Integer Matrix Multiply

March 23, 2023

.. code::

                                          _
                                         | |
      __ _  ___ _ __ ___  _ __ ___   ___ | | ___   __ _ _   _
     / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _ \| |/ _ \ / _` | | | |
    | (_| |  __/ | | | | | | | | | | (_) | | (_) | (_| | |_| |
     \__, |\___|_| |_| |_|_| |_| |_|\___/|_|\___/ \__, |\__, |
      __/ |                                        __/ | __/ |
     |___/                                        |___/ |___/

                                                    version 0.1

Gemmology is a rewrite of `intgemm <https://github.com/kpu/intgemm>`_ with a focus on 8-bit integer matrix multiplication, using `xsimd <https://github.com/QuantStack/xsimd>`_ as an abstract vector instruction set when possible.

The original algorithm and API are left mostly untouched, apart from a few namespace changes.

Usage

Gemmology consists of a single header file. Drop it into your project, then mostly follow the intgemm_ API:

.. code:: c++

    #include "gemmology.h"

    // quant_mult scales the inputs so that a value of alpha maps to 127.
    float alpha = 25;
    float quant_mult = 127.0f / alpha;

    // Quantize A and B into their prepared forms.
    gemmology::Shift::PrepareA(A, A_prepared, quant_mult, A_rows, width);
    gemmology::PrepareB(B, B_prepared, quant_mult, width, B_cols);

    /* Prepare the bias (in place) */
    float unquant_mult_forprep = (-1) * alpha * alpha / 127.0f;
    gemmology::Shift::PrepareBias(B_prepared, width, B_cols,
                                  callbacks::UnquantizeAndAddBiasAndWrite(unquant_mult_forprep, inputBias, inputBias));
    /* Multiply */
    gemmology::Shift::Multiply(A_prepared, B_prepared, A_rows, width, B_cols,
                               callbacks::UnquantizeAndAddBiasAndWrite(unquant_mult_forprep, bias, C));
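
The role of the bias preparation step can be illustrated with plain scalar arithmetic. The sketch below is not Gemmology code. It assumes, following the design of intgemm_, that the ``Shift`` variant adds 127 to every quantized value of ``A`` so that it can be treated as unsigned, and that ``PrepareBias`` folds the resulting correction into the bias. The helper names ``dot``, ``shifted_dot``, ``column_sum`` and ``bias_prep_factor`` are hypothetical.

.. code:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Reference signed dot product of two quantized vectors.
    std::int32_t dot(const std::vector<std::int8_t>& a,
                     const std::vector<std::int8_t>& b) {
        assert(a.size() == b.size());
        std::int32_t acc = 0;
        for (std::size_t k = 0; k < a.size(); ++k)
            acc += std::int32_t(a[k]) * std::int32_t(b[k]);
        return acc;
    }

    // Same product with every element of `a` shifted by +127
    // (assumed behaviour of the Shift variant, not taken from the library).
    std::int32_t shifted_dot(const std::vector<std::int8_t>& a,
                             const std::vector<std::int8_t>& b) {
        assert(a.size() == b.size());
        std::int32_t acc = 0;
        for (std::size_t k = 0; k < a.size(); ++k)
            acc += (std::int32_t(a[k]) + 127) * std::int32_t(b[k]);
        return acc;
    }

    // Sum of the quantized values of one column of B.
    std::int32_t column_sum(const std::vector<std::int8_t>& b) {
        std::int32_t acc = 0;
        for (std::int8_t v : b)
            acc += std::int32_t(v);
        return acc;
    }

    // Factor used for the bias preparation in the usage example: -alpha^2 / 127.
    float bias_prep_factor(float alpha) {
        return -alpha * alpha / 127.0f;
    }

For any inputs, ``shifted_dot(a, b) - 127 * column_sum(b)`` equals ``dot(a, b)``. With ``quant_mult = 127 / alpha``, the factor ``-alpha * alpha / 127`` equals ``-127 / (quant_mult * quant_mult)``, which is that same correction expressed in unquantized units.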

Differences from intgemm_

Gemmology only handles matrices quantized to 8-bit integers.
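
As an illustration of what quantizing to 8-bit integers means here, the sketch below scales a float by a multiplier, rounds it, and saturates it to the symmetric range [-127, 127]. It is a scalar stand-in, not Gemmology's vectorized code: ``quantize`` is a hypothetical helper, and the exact rounding and clamping rules of the library are assumptions.

.. code:: c++

    #include <cassert>
    #include <cmath>
    #include <cstdint>

    // Quantize one float to int8 (hypothetical helper, assumed semantics).
    std::int8_t quantize(float value, float quant_mult) {
        assert(quant_mult > 0.0f);
        // Scale, then round using the current rounding mode
        // (round-to-nearest by default).
        float scaled = std::nearbyint(value * quant_mult);
        // Saturate to the symmetric int8 range.
        if (scaled > 127.0f) scaled = 127.0f;
        if (scaled < -127.0f) scaled = -127.0f;
        return static_cast<std::int8_t>(scaled);
    }

With ``quant_mult = 127 / alpha`` as in the usage example, an input equal to ``alpha`` maps to 127 and anything larger in magnitude saturates.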

Gemmology provides an SSE2 implementation of the original algorithm, while intgemm_ stops at SSSE3. The SSE2 version is roughly 2.5 times slower than the SSSE3 version.
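
For context, SSSE3 introduced the ``pmaddubsw`` instruction (intrinsic ``_mm_maddubs_epi16``), which multiplies unsigned 8-bit values by signed 8-bit values and adds adjacent products with signed 16-bit saturation. SSE2 has no single instruction for this, so an SSE2 code path presumably has to build it from several operations. Attributing the slowdown to this is an assumption, not something measured here. A scalar reference of one output lane, under the hypothetical name ``maddubs_lane``:

.. code:: c++

    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <limits>

    // One 16-bit output lane of pmaddubsw: a0*b0 + a1*b1, where the a values
    // are unsigned, the b values are signed, and the sum saturates to int16.
    std::int16_t maddubs_lane(std::uint8_t a0, std::int8_t b0,
                              std::uint8_t a1, std::int8_t b1) {
        const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
        const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
        std::int32_t sum = std::int32_t(a0) * std::int32_t(b0)
                         + std::int32_t(a1) * std::int32_t(b1);
        sum = std::min(std::max(sum, lo), hi);
        assert(sum >= lo && sum <= hi);  // post-condition of the clamp
        return static_cast<std::int16_t>(sum);
    }

The saturation matters in practice: two products of 255 by 127 already exceed the int16 range.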

Gemmology provides a suboptimal implementation using NEON instructions for arm32 and aarch64.

All Gemmology functions are parametrized by a target architecture (e.g. ``xsimd::sse4_2``), which defaults to the best one available at compile time. It is up to the user to handle dynamic dispatch, possibly using the `xsimd generic mechanism <https://xsimd.readthedocs.io/en/latest/api/dispatching.html>`_ to do so.
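
The split between compile-time parametrization and run-time dispatch can be sketched without any SIMD library. In the sketch below, ``arch_generic`` and ``arch_wide`` are hypothetical stand-ins for xsimd architecture tags, ``sum_bytes`` stands in for an architecture-parametrized kernel, and ``has_wide_support`` stands in for a CPU feature query. None of these names come from Gemmology or xsimd.

.. code:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical architecture tags (stand-ins for types such as xsimd::sse4_2).
    struct arch_generic {};
    struct arch_wide {};

    // A kernel parametrized by the target architecture.
    template <class Arch>
    std::int32_t sum_bytes(const std::int8_t* data, std::size_t size);

    // Baseline version: one element per iteration.
    template <>
    std::int32_t sum_bytes<arch_generic>(const std::int8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::int32_t acc = 0;
        for (std::size_t i = 0; i < size; ++i)
            acc += data[i];
        return acc;
    }

    // "Wide" version: four elements per iteration, then a scalar tail.
    template <>
    std::int32_t sum_bytes<arch_wide>(const std::int8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::int32_t acc = 0;
        std::size_t i = 0;
        for (; i + 4 <= size; i += 4)
            acc += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        for (; i < size; ++i)
            acc += data[i];
        return acc;
    }

    using kernel_fn = std::int32_t (*)(const std::int8_t*, std::size_t);

    // Run-time dispatch: pick one instantiation based on a feature query.
    kernel_fn select_kernel(bool has_wide_support) {
        return has_wide_support ? &sum_bytes<arch_wide> : &sum_bytes<arch_generic>;
    }

In a real program the boolean would come from a run-time CPU query, and each specialization would be compiled with the matching instruction set enabled.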

Testing

All tests live in the ``test`` directory. A sample test invocation, provided xsimd_ and `sde64 <https://www.intel.fr/content/www/fr/fr/download/684897/intel-software-development-emulator.html>`_ are available on your system:

.. code::

    make -C test XSIMD_INCLUDE_DIR=/source/xsimd/include SDE64=/Downloads/sde-external-9.14.0-2022-10-25-lin/sde64

Acknowledgments

This is mostly a port of intgemm_ to xsimd, so big thanks to the intgemm authors for the original work.