Matrix Multiply

February 8, 2023

Several common matrix multiplication methods are implemented for CPU and Nvidia GPU using C++11 and CUDA. The cost of each method was measured with a simple benchmark to show the benefit of each optimization.

CPU

  • naive
  • reordering
  • tiling
  • strassen
  • coppersmith-winograd
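
The first three CPU variants differ only in loop structure. The following is a minimal sketch of that idea, not the repository's actual code: the `Matrix` alias, the function names, and the default tile size of 32 are assumptions made for illustration, and matrices are taken to be square and stored row-major.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // hypothetical alias: row-major, n * n elements

// Naive i-j-k order: the inner loop reads B down a column (stride n).
Matrix matmul_naive(const Matrix& A, const Matrix& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Matrix C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
    return C;
}

// Reordered i-k-j: the inner loop reads B and writes C along rows (unit stride).
Matrix matmul_reordered(const Matrix& A, const Matrix& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Matrix C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
    return C;
}

// Tiled: the same i-k-j update, restricted to tile x tile blocks so that the
// working set of each block is more likely to stay in cache.
Matrix matmul_tiled(const Matrix& A, const Matrix& B, std::size_t n,
                    std::size_t tile = 32) {  // tile size is an arbitrary assumption
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    Matrix C(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t kk = 0; kk < n; kk += tile) {
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

All three functions perform the same multiply-add operations per output element; only the memory access pattern changes, which is why any difference in cost comes from cache behavior rather than from the amount of arithmetic.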

Nvidia GPU

  • cublas
  • naive
  • kahan
  • shared_memory

Compile

Environment

  • OS: Linux
  • CMake Version: >= 3.8
  • GCC Version: >= 4.8
  • CUDA Version: 11.4 (recommended)
  • CUDA Driver Version: 470.129.06 (recommended)

Clone

git clone https://github.com/Bruce-Lee-LY/matrix_multiply.git

Build

cd matrix_multiply
./build.sh -t Release -b OFF
./build.sh -t Debug -b ON

Run Sample

./run_sample.sh

Performance

  • OS: Ubuntu 20.04.4
  • CPU: i5-9400F
  • GPU: NVIDIA GeForce GTX 1080 Ti
  • CUDA Version: 11.4
  • CUDA Driver Version: 470.129.06
  • Matrix (float): A (512 * 512) * B (512 * 512) = C (512 * 512)

CPU

Method                  Cost / ms
naive                   1238.647
reordering              984.445
tiling                  1000.095
strassen                57429.407
coppersmith-winograd    77668.238

Nvidia GPU

Method           Cost / ms
cublas           0.100
naive            0.613
kahan            0.616
shared_memory    0.153
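
To compare these costs with hardware limits, it can help to convert them to throughput. The sketch below assumes the conventional count of 2 * n^3 floating-point operations for an n x n multiplication (one multiply and one add per inner-loop step). The function names are hypothetical, and the count is only nominal for strassen and coppersmith-winograd, which perform fewer multiplications.

```cpp
#include <cassert>
#include <cstdint>

// Nominal floating-point operation count for an n x n by n x n multiplication.
std::uint64_t matmul_flops(std::uint64_t n) {
    return 2 * n * n * n;
}

// Throughput in GFLOP/s for a measured cost given in milliseconds.
double gflops(std::uint64_t n, double cost_ms) {
    assert(cost_ms > 0.0);
    // FLOP / (ms * 1e-3 s/ms) / 1e9 = FLOP / (ms * 1e6)
    return static_cast<double>(matmul_flops(n)) / (cost_ms * 1e6);
}
```

With n = 512 the nominal count is 268,435,456 operations, so the cublas cost of 0.100 ms corresponds to roughly 2684 GFLOP/s, while the naive CPU cost of 1238.647 ms corresponds to roughly 0.22 GFLOP/s.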