Benchmark Performance

August 16, 2023

Performance on Nvidia GPU

| Model | Precision | Device | GPU VRAM | Speed (tokens/sec) | Load time (s) |
| --- | --- | --- | --- | --- | --- |
| Llama-2-7b-chat-hf | 16 bit | | | | |
| Llama-2-7b-chat-hf | 8 bit | NVIDIA RTX 2080 Ti | 7.7 GB VRAM | 3.76 | 641.36 |
| Llama-2-7b-Chat-GPTQ | 4 bit | NVIDIA RTX 2080 Ti | 5.8 GB VRAM | 18.85 | 192.91 |
| Llama-2-7b-Chat-GPTQ | 4 bit | NVIDIA GTX 1660 Super | 4.8 GB VRAM | 8.5 | 262.74 |
| Llama-2-7b-Chat-GPTQ | 4 bit | Google Colab T4 | 5.8 GB VRAM | 18.19 | 37.44 |
| Llama-2-13b-chat-hf | 16 bit | | | | |

Performance on CPU / OpenBLAS / cuBLAS / CLBlast / Metal

| Model | Precision | Device | RAM / GPU VRAM | Speed (tokens/sec) | Load time (s) |
| --- | --- | --- | --- | --- | --- |
| llama-2-7b-chat.ggmlv3.q2_K | 2 bit | Intel i7-8700 | 4.5 GB RAM | 7.88 | 31.90 |
| llama-2-7b-chat.ggmlv3.q2_K | 2 bit | Apple M2 CPU | 4.5 GB RAM | 11.10 | 0.10 |
| llama-2-7b-chat.ggmlv3.q2_K | 2 bit | Apple M2 Metal | 4.5 GB RAM | 12.10 | 0.12 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Intel i7-8700 | 5.4 GB RAM | 6.27 | 173.15 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Intel i7-9700 | 4.8 GB RAM | 4.2 | 87.9 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Apple M1 Pro CPU | 5.4 GB RAM | 17.90 | 0.18 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Apple M2 CPU | 5.4 GB RAM | 13.70 | 0.13 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Apple M2 Metal | 5.4 GB RAM | 12.60 | 0.10 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | AMD Ryzen 9 5900HS | 4.1 GB RAM | 6.01 | 0.15 |
| llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Intel vServer 4 threads, eth services | 8 GB RAM | 1.31 | 0.5 |
| llama-2-7b-chat.ggmlv3.q8_0 | 8 bit | Intel i7-8700 | 8.6 GB RAM | 2.63 | 336.57 |
| llama-2-7b-chat.ggmlv3.q8_0 | 8 bit | Intel i7-9700 | 7.6 GB RAM | 2.05 | 302.9 |
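
For readers who want to add a row of their own, the two timing columns can be approximated with a small harness like the one below. This is a minimal sketch, not the script that produced the numbers above: `measure_run`, `load_model`, and `generate` are hypothetical names, and the sketch assumes that `generate` returns a sequence containing only the newly generated tokens.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_run(
    load_model: Callable[[], object],
    generate: Callable[[object, str], Sequence],
    prompt: str,
) -> Tuple[float, float]:
    """Return (load_time_s, tokens_per_sec) for one load and one generation.

    Both callables are placeholders (assumptions) that you supply for
    whatever backend you are benchmarking.
    """
    # Load time: wall-clock seconds spent constructing the model.
    start = time.perf_counter()
    model = load_model()
    load_time_s = time.perf_counter() - start

    # Speed: number of generated tokens divided by wall-clock generation time.
    start = time.perf_counter()
    tokens = generate(model, prompt)
    elapsed = time.perf_counter() - start
    tokens_per_sec = len(tokens) / elapsed if elapsed > 0 else float("inf")

    return load_time_s, tokens_per_sec
```

Load time depends heavily on disk caching and on whether the weights are memory-mapped, so a first (cold) run and a repeated (warm) run can differ by orders of magnitude. Recording which one was measured makes rows easier to compare.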