Profiling Results

July 16, 2024

| Model | Device | NUM_THREADS | llama.cpp (CPU, tokens/sec) | T-MAC (CPU, tokens/sec) |
|---|---|---|---|---|
| BitNet-3B | M2-Ultra | 1 | 6.49 | 22.08 |
| BitNet-3B | M2-Ultra | 4 | 22.09 | 54.46 |
| Llama-2-7B (W2) | M2-Ultra | 1 | 3.82 | 16.68 |
| Llama-2-7B (W2) | M2-Ultra | 8 | 22.06 | 51.01 |
| Llama-2-7B (W4) | M2-Ultra | 1 | 5.65 | 8.97 |
| Llama-2-7B (W4) | M2-Ultra | 8 | 31.57 | 35.65 |
| BitNet-3B | AGX Orin | 1 | 1.62 | 8.18 |
| BitNet-3B | AGX Orin | 12 | 12.34 | 26.02 |
| Llama-2-7B (W2) | AGX Orin | 1 | 0.79 | 4.36 |
| Llama-2-7B (W2) | AGX Orin | 12 | 7.08 | 15.62 |
| Llama-2-7B (W4) | AGX Orin | 1 | 1.04 | 2.46 |
| Llama-2-7B (W4) | AGX Orin | 12 | 7.42 | 8.09 |
| BitNet-3B | Raspberry Pi 5 | 1 | 1.37 | 8.03 |
| BitNet-3B | Raspberry Pi 5 | 2 | 2.71 | 11.09 |
| Llama-2-7B (W2) | Raspberry Pi 5 | 1 | 0.66 | 4.40 |
| Llama-2-7B (W2) | Raspberry Pi 5 | 2 | 1.31 | 5.92 |
| Llama-2-7B (W4) | Raspberry Pi 5 | 1 | 0.85 | 2.42 |
| Llama-2-7B (W4) | Raspberry Pi 5 | 2 | 1.63 | 3.35 |
| BitNet-3B | Surface Book 3 | 1 | 5.65 | 12.65 |
| BitNet-3B | Surface Book 3 | 4 | 14.85 | 28.60 |
| Llama-2-7B (W2) | Surface Book 3 | 1 | 2.70 | 6.77 |
| Llama-2-7B (W2) | Surface Book 3 | 4 | 7.50 | 16.82 |
| Llama-2-7B (W4) | Surface Book 3 | 1 | 2.50 | 3.74 |
| Llama-2-7B (W4) | Surface Book 3 | 4 | 6.52 | 9.34 |
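
One way to read these numbers is as a per-row speedup: T-MAC throughput divided by llama.cpp throughput. The sketch below computes that ratio for a few rows copied from the results above. The `ROWS` list and the `speedup` helper are illustrative names chosen here, not part of T-MAC or llama.cpp.

```python
# Throughput in tokens/sec, copied from a few rows of the results above.
# Tuple layout: (model, device, num_threads, llama_cpp_tps, tmac_tps)
ROWS = [
    ("BitNet-3B", "M2-Ultra", 1, 6.49, 22.08),
    ("BitNet-3B", "M2-Ultra", 4, 22.09, 54.46),
    ("Llama-2-7B (W4)", "M2-Ultra", 8, 31.57, 35.65),
    ("BitNet-3B", "Raspberry Pi 5", 1, 1.37, 8.03),
]


def speedup(baseline_tps: float, tmac_tps: float) -> float:
    """Hypothetical helper: ratio of T-MAC to llama.cpp throughput."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return tmac_tps / baseline_tps


if __name__ == "__main__":
    for model, device, threads, base, tmac in ROWS:
        ratio = speedup(base, tmac)
        print(f"{model:16s} {device:15s} threads={threads:<2d} speedup={ratio:.2f}x")
```

A ratio above 1 means T-MAC produced more tokens per second than llama.cpp in that configuration. The same helper can be applied to any other row by adding it to `ROWS`.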