GGML Low-Level API Documentation
November 2, 2025
This document describes the GGML (Georgi Gerganov Machine Learning) low-level tensor library bindings available in gollama.cpp.
Overview
GGML is the tensor library that powers llama.cpp. While most users will interact with the high-level llama.cpp API (gollama.go), the GGML bindings (goggml.go) provide direct access to low-level tensor operations and backend management for advanced use cases.
Important Note: GGML functions may not be exported in all llama.cpp builds. The library handles missing functions gracefully: the affected bindings return an error instead of panicking, so code still compiles and runs when GGML symbols are not available.
When to Use GGML Bindings
Use GGML bindings when you need:
- Type Information: Query tensor type sizes, block sizes, or quantization status
- Backend Management: Enumerate available compute backends (CPU, GPU, etc.)
- Memory Management: Direct buffer allocation and management
- Quantization: Access to low-level quantization utilities
- Advanced Integration: Building custom tensor operations or tools
For most LLM inference tasks, use the high-level llama.cpp API in gollama.go.
Available Features
Tensor Types (31 types)
GGML supports various data types for tensors:
Floating Point Types
- GGML_TYPE_F32 - 32-bit float (4 bytes)
- GGML_TYPE_F16 - 16-bit float (2 bytes)
- GGML_TYPE_F64 - 64-bit float (8 bytes)
- GGML_TYPE_BF16 - BFloat16 (2 bytes)
Integer Types
- GGML_TYPE_I8 - 8-bit integer (1 byte)
- GGML_TYPE_I16 - 16-bit integer (2 bytes)
- GGML_TYPE_I32 - 32-bit integer (4 bytes)
- GGML_TYPE_I64 - 64-bit integer (8 bytes)
Quantized Types (legacy and K-quants)
- GGML_TYPE_Q4_0, GGML_TYPE_Q4_1 - 4-bit quantization
- GGML_TYPE_Q5_0, GGML_TYPE_Q5_1 - 5-bit quantization
- GGML_TYPE_Q8_0, GGML_TYPE_Q8_1 - 8-bit quantization
- GGML_TYPE_Q2_K - 2-bit K-quant
- GGML_TYPE_Q3_K - 3-bit K-quant
- GGML_TYPE_Q4_K - 4-bit K-quant
- GGML_TYPE_Q5_K - 5-bit K-quant
- GGML_TYPE_Q6_K - 6-bit K-quant
- GGML_TYPE_Q8_K - 8-bit K-quant
Importance Quantization (IQ)
- GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M - 1-bit importance quantization
- GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S - 2-bit IQ variants
- GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S - 3-bit IQ variants
- GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS - 4-bit IQ variants
API Functions
Type Utilities
Ggml_type_size
func Ggml_type_size(typ GgmlType) (uint64, error)
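Returns the size in bytes of one block of a GGML type. For non-quantized types a block is a single element, so this is the element size; for quantized types it is the packed size of a whole block (for example, 18 bytes for a 32-element Q4_0 block).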
Example:
size, err := gollama.Ggml_type_size(gollama.GGML_TYPE_F32)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("F32 size: %d bytes\n", size) // Output: F32 size: 4 bytes
Ggml_blck_size
func Ggml_blck_size(typ GgmlType) (int32, error)
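Returns the block size of a GGML type, that is, the number of elements stored in one block. Non-quantized types have a block size of 1; quantized types pack several elements per block (for example, 32 for Q4_0).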
Ggml_type_is_quantized
func Ggml_type_is_quantized(typ GgmlType) (bool, error)
Returns whether a GGML type is quantized.
Example:
isQuantized, err := gollama.Ggml_type_is_quantized(gollama.GGML_TYPE_Q4_0)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Q4_0 is quantized: %v\n", isQuantized) // Output: Q4_0 is quantized: true
Ggml_type_name
func Ggml_type_name(typ GgmlType) (string, error)
Returns the string name of a GGML type.
Example:
name, err := gollama.Ggml_type_name(gollama.GGML_TYPE_F32)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Type name: %s\n", name) // Output: Type name: f32
Backend Device Management
Ggml_backend_dev_count
func Ggml_backend_dev_count() (uint64, error)
Returns the number of available backend devices.
Example:
count, err := gollama.Ggml_backend_dev_count()
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Found %d backend device(s)\n", count)
Ggml_backend_dev_get
func Ggml_backend_dev_get(index uint64) (GgmlBackendDevice, error)
Returns a backend device by index.
Ggml_backend_dev_name
func Ggml_backend_dev_name(device GgmlBackendDevice) (string, error)
Returns the name of a backend device.
Example:
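count, err := gollama.Ggml_backend_dev_count()
if err != nil {
	log.Fatal(err) // enumeration is unavailable in this build
}
for i := uint64(0); i < count; i++ {
	// Skip devices that cannot be queried instead of aborting the loop.
	dev, err := gollama.Ggml_backend_dev_get(i)
	if err != nil {
		continue
	}
	name, err := gollama.Ggml_backend_dev_name(dev)
	if err != nil {
		continue
	}
	fmt.Printf("Device %d: %s\n", i, name)
}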
Ggml_backend_dev_description
func Ggml_backend_dev_description(device GgmlBackendDevice) (string, error)
Returns the description of a backend device.
Ggml_backend_dev_memory
func Ggml_backend_dev_memory(device GgmlBackendDevice) (free uint64, total uint64, err error)
Returns the memory statistics of a backend device (free and total memory in bytes).
Buffer Management
Ggml_backend_cpu_buffer_type
func Ggml_backend_cpu_buffer_type() (GgmlBackendBufferType, error)
Returns the CPU buffer type.
Ggml_backend_buffer_free
func Ggml_backend_buffer_free(buffer GgmlBackendBuffer) error
Frees a backend buffer.
Ggml_backend_buffer_get_size
func Ggml_backend_buffer_get_size(buffer GgmlBackendBuffer) (uint64, error)
Returns the size of a backend buffer in bytes.
Ggml_backend_buffer_is_host
func Ggml_backend_buffer_is_host(buffer GgmlBackendBuffer) (bool, error)
Checks if a buffer is in host memory (RAM).
Ggml_backend_buffer_name
func Ggml_backend_buffer_name(buffer GgmlBackendBuffer) (string, error)
Returns the name of a backend buffer.
Complete Example
Here's a comprehensive example using GGML bindings:
package main

import (
	"fmt"
	"log"

	"github.com/dianlight/gollama.cpp"
)

func main() {
	// Initialize the library
	if err := gollama.Backend_init(); err != nil {
		log.Fatal(err)
	}
	defer gollama.Backend_free()

	// Query type information
	fmt.Println("=== Type Information ===")
	types := []gollama.GgmlType{
		gollama.GGML_TYPE_F32,
		gollama.GGML_TYPE_F16,
		gollama.GGML_TYPE_Q4_0,
		gollama.GGML_TYPE_Q8_0,
	}
	for _, typ := range types {
		// Get type size
		size, err := gollama.Ggml_type_size(typ)
		if err != nil {
			fmt.Printf("Type %s: size unavailable\n", typ.String())
			continue
		}
		// Check if quantized
		isQuant, _ := gollama.Ggml_type_is_quantized(typ)
		// Get type name
		name, _ := gollama.Ggml_type_name(typ)
		fmt.Printf("Type: %-10s | Size: %2d bytes | Quantized: %v | Name: %s\n",
			typ.String(), size, isQuant, name)
	}

	// Enumerate backend devices
	fmt.Println("\n=== Backend Devices ===")
	count, err := gollama.Ggml_backend_dev_count()
	if err != nil {
		fmt.Println("Backend device enumeration not available")
		return
	}
	if count == 0 {
		fmt.Println("No backend devices available")
		return
	}
	for i := uint64(0); i < count; i++ {
		dev, err := gollama.Ggml_backend_dev_get(i)
		if err != nil {
			continue
		}
		name, err := gollama.Ggml_backend_dev_name(dev)
		if err != nil {
			continue
		}
		desc, _ := gollama.Ggml_backend_dev_description(dev)
		fmt.Printf("Device %d: %s\n", i, name)
		if desc != "" {
			fmt.Printf(" Description: %s\n", desc)
		}
		// Try to get memory info (may not be supported)
		free, total, err := gollama.Ggml_backend_dev_memory(dev)
		if err == nil {
			fmt.Printf(" Memory: %.2f MB free / %.2f MB total\n",
				float64(free)/(1024*1024),
				float64(total)/(1024*1024))
		}
	}
}
Expected Output (device names, descriptions, and memory lines vary by machine and build):
=== Type Information ===
Type: f32        | Size:  4 bytes | Quantized: false | Name: f32
Type: f16        | Size:  2 bytes | Quantized: false | Name: f16
Type: q4_0       | Size: 18 bytes | Quantized: true | Name: q4_0
Type: q8_0       | Size: 34 bytes | Quantized: true | Name: q8_0

=== Backend Devices ===
Device 0: CPU
 Description: CPU backend
Type Conversions
The GgmlType enum provides a String() method for easy display:
typ := gollama.GGML_TYPE_Q4_0
fmt.Println(typ.String()) // Output: q4_0
Error Handling
All GGML functions return an error that should be checked:
size, err := gollama.Ggml_type_size(gollama.GGML_TYPE_F32)
if err != nil {
	// Function not available in this build or library not loaded
	log.Printf("Warning: %v", err)
	return
}
// Use size...
Testing
The GGML bindings include comprehensive tests in goggml_test.go:
# Run all GGML tests
go test -v -run TestGgml
# Run specific test
go test -v -run TestGgmlTypeSize
# Run benchmarks only (-run='^$' skips the unit tests)
go test -run='^$' -bench=BenchmarkGgml
Limitations
- Optional Functions: GGML functions may not be exported in all llama.cpp builds. The library handles this gracefully by returning errors instead of panicking.
- Platform Differences: Some functions may have different behavior or availability across platforms.
- Build Variants: Different llama.cpp builds (CPU-only vs GPU-enabled) may export different GGML symbols.
- Version Compatibility: GGML API may change between llama.cpp versions. Always use the version of gollama.cpp that matches your llama.cpp build.
Related Documentation
- Main README - High-level overview and quick start
- Build Guide - Building from source
- GPU Setup - GPU acceleration configuration
- API Reference - Full Go API documentation
Support
If you encounter issues with GGML bindings:
- Check that your llama.cpp build exports GGML symbols
- Verify you're using a compatible gollama.cpp version
- Report issues at: https://github.com/dianlight/gollama.cpp/issues