Kronk
May 23, 2026 · View on GitHub

Copyright 2025-2026 Ardan Labs
Kronk
This project lets you use Go for hardware accelerated local inference with llama.cpp and whisper.cpp directly integrated into your Go applications via the yzma and bucky modules. Kronk provides a high-level API that feels similar to using an OpenAI compatible API.
This project also provides a model server for chat completions, responses, messages, embeddings, reranking, and audio transcription. The server is compatible with OpenWebUI, OpenCode, and the Claude Code project.
To see all the documentation, clone the project and run the Kronk Model Server:
$ make kronk-server
$ make website
You can also install Kronk, run the Kronk Model Server, and open the browser to localhost:11435
On macOS or Linux with Homebrew:
$ brew tap ardanlabs/kronk
$ brew install kronk
$ kronk server start
Or with Go:
$ go install github.com/ardanlabs/kronk/cmd/kronk@latest
$ kronk server start
Read the Manual to learn more about running the Kronk Model Server.
Project Status
Sometimes there are breaking changes to llama.cpp that require an update to yzma and Kronk. Here are some of the known compatible versions:
As of May 15th, 2026 please use version b9163 until we can fix the problems with b9165+
You can use this environment variable: export KRONK_LIB_VERSION=b9163
| llama.cpp | yzma | kronk |
|---|---|---|
| b8864 | v1.12.0 | 1.23.1 |
| b8865+ | v1.13.0 | 1.23.2 |
| b9180+ | v1.14.0 | 1.25.8 |
Owner Information
Name: Bill Kennedy
Company: Ardan Labs
Title: Managing Partner
Email: bill@ardanlabs.com
BlueSky: https://bsky.app/profile/goinggo.net
LinkedIn: www.linkedin.com/in/william-kennedy-5b318778/
Twitter: https://x.com/goinggodotnet
Install Kronk
The recommended way to install Kronk on macOS or Linux is with Homebrew:
$ brew tap ardanlabs/kronk
$ brew install kronk
$ kronk --help
To upgrade later:
$ brew upgrade kronk
You can also install via Go on any supported platform:
$ go install github.com/ardanlabs/kronk/cmd/kronk@latest
$ kronk --help
Issues/Features
Here is the existing Issues/Features for the project and the things being worked on or things that would be nice to have.
If you are interested in helping in any way, please send an email to Bill Kennedy.
Architecture
The architecture of Kronk is designed to be simple and scalable.
Watch this video to learn more about the project and the architecture.
SDK
The Kronk SDK allows you to write applications that can diectly interact with local open source GGUF models (supported by llama.cpp) that provide inference for text and media (vision and audio). The Bucky SDK provides the same surface for speech-to-text via whisper.cpp — see the Bucky chapter.
Check out the examples section below.
Models
Kronk uses models in the GGUF format supported by llama.cpp. You can find many models in GGUF format on Hugging Face (over 147k at last count):
models?library=gguf&sort=trending
Support
Kronk currently has support for over 94% of llama.cpp functionality thanks to yzma. See the yzma ROADMAP.md for the complete list.
You can use multimodal models (image/audio) and text language models with full hardware acceleration on Linux, on macOS, and on Windows.
| OS | CPU | GPU |
|---|---|---|
| Linux | amd64, arm64 | CUDA, Vulkan, HIP, ROCm, SYCL |
| macOS | arm64 | Metal |
| Windows | amd64 | CUDA, Vulkan, HIP, SYCL, OpenCL |
Whenever there is a new release of llama.cpp, the tests for yzma are run automatically. Kronk runs tests once a day and will check for updates to llama.cpp. This helps us stay up to date with the latest code and models.
API Examples
There are examples in the examples direction:
The first time you run these programs the system will download and install the model and libraries.
AGENT - This example shows you how to write a small coding agent.
make example-agent
AUDIO - This example shows you how to execute a simple prompt against an audio model.
make example-audio
BUCKY - This example shows you how to transcribe an audio file with the bucky SDK (whisper.cpp under the hood). See the manual chapter Bucky (Audio Transcription) for the full subsystem reference.
make example-bucky
CHAT - This example shows you how to chat with the chat-completion api.
make example-chat
CONCURRENCY - This example shows you how to leverage concurrency using vision models.
make example-concurrency
EMBEDDING - This example shows you a basic program using Kronk to perform an embedding operation.
make example-embedding
GRAMMAR - This example shows how to use GBNF grammars to constrain model output.
make example-grammar
POOL - This example shows you how to use the pool package to manage multipl models in memory at the same time.
make example-pool
QUESTION - This example shows you how to ask a simple question with the chat-completion api.
make example-question
RAG - This example shows you a complete RAG application.
make example-rag
RERANK - This example shows you how to use a rerank model.
make example-rerank
RESPONSE - This example shows you how to chat with the response api.
make example-question
VISION - This example shows you how to execute a simple prompt against a vision model.
make example-vision
YZMA - This example shows you how to use the yzma api at it's basic level.
make example-yzma
You can find more examples in the ArdanLabs AI training repo at Example13.
Sample API Program - Question Example
package main
import (
"context"
"fmt"
"os"
"time"
"github.com/ardanlabs/kronk/sdk/kronk"
"github.com/ardanlabs/kronk/sdk/kronk/model"
"github.com/ardanlabs/kronk/sdk/tools/defaults"
"github.com/ardanlabs/kronk/sdk/tools/libs"
"github.com/ardanlabs/kronk/sdk/tools/models"
)
const modelSource = "unsloth/Qwen3-0.6B-Q8_0"
func main() {
if err := run(); err != nil {
fmt.Printf("\nERROR: %s\n", err)
os.Exit(1)
}
}
func run() error {
mp, err := installSystem()
if err != nil {
return fmt.Errorf("unable to installation system: %w", err)
}
krn, err := newKronk(mp)
if err != nil {
return fmt.Errorf("unable to init kronk: %w", err)
}
defer func() {
fmt.Println("\nUnloading Kronk")
if err := krn.Unload(context.Background()); err != nil {
fmt.Printf("failed to unload model: %v", err)
}
}()
if err := question(krn); err != nil {
fmt.Println(err)
}
return nil
}
func installSystem() (models.Path, error) {
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Minute)
defer cancel()
libs, err := libs.New(
libs.WithVersion(defaults.LibVersion("")),
)
if err != nil {
return models.Path{}, err
}
if _, err := libs.Download(ctx, kronk.FmtLogger); err != nil {
return models.Path{}, fmt.Errorf("unable to install llama.cpp: %w", err)
}
// -------------------------------------------------------------------------
mdls, err := models.New()
if err != nil {
return models.Path{}, fmt.Errorf("unable to init models: %w", err)
}
mp, err := mdls.Download(ctx, kronk.FmtLogger, modelSource)
if err != nil {
return models.Path{}, fmt.Errorf("unable to install model: %w", err)
}
return mp, nil
}
func newKronk(mp models.Path) (*kronk.Kronk, error) {
fmt.Println("loading model...")
if err := kronk.Init(); err != nil {
return nil, fmt.Errorf("unable to init kronk: %w", err)
}
krn, err := kronk.New(
model.WithModelFiles(mp.ModelFiles),
)
if err != nil {
return nil, fmt.Errorf("unable to create inference model: %w", err)
}
fmt.Print("- system info:\n\t")
for k, v := range krn.SystemInfo() {
fmt.Printf("%s:%v, ", k, v)
}
fmt.Println()
fmt.Println("- contextWindow :", krn.ModelConfig().ContextWindow())
fmt.Printf("- k/v : %s/%s\n", krn.ModelConfig().CacheTypeK, krn.ModelConfig().CacheTypeV)
fmt.Println("- flashAttention :", krn.ModelConfig().FlashAttention)
fmt.Println("- nBatch :", krn.ModelConfig().NBatch())
fmt.Println("- nuBatch :", krn.ModelConfig().NUBatch())
fmt.Println("- modelType :", krn.ModelInfo().Type)
fmt.Println("- isGPT :", krn.ModelInfo().IsGPTModel)
fmt.Println("- template :", krn.ModelInfo().Template.FileName)
fmt.Println("- grammar :", krn.ModelConfig().DefaultParams.Grammar != "")
fmt.Println("- nSeqMax :", krn.ModelConfig().NSeqMax())
fmt.Println("- vramTotal :", krn.ModelInfo().VRAMTotal/(1024*1024), "MiB")
fmt.Println("- slotMemory :", krn.ModelInfo().SlotMemory/(1024*1024), "MiB")
fmt.Println("- modelSize :", krn.ModelInfo().Size/(1000*1000), "MB")
fmt.Println("- imc :", krn.ModelConfig().IncrementalCache())
if n := krn.ModelConfig().PtrNGpuLayers; n != nil {
fmt.Println("- nGPULayers :", *n)
} else {
fmt.Println("- nGPULayers : all")
}
return krn, nil
}
func question(krn *kronk.Kronk) error {
ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
defer cancel()
question := "Hello model"
fmt.Println()
fmt.Println("QUESTION:", question)
fmt.Println()
d := model.D{
"messages": model.DocumentArray(
model.TextMessage(model.RoleUser, question),
),
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"max_tokens": 2048,
}
ch, err := krn.ChatStreaming(ctx, d)
if err != nil {
return fmt.Errorf("chat streaming: %w", err)
}
// -------------------------------------------------------------------------
var reasoning bool
for resp := range ch {
switch resp.Choices[0].FinishReason() {
case model.FinishReasonError:
return fmt.Errorf("error from model: %s", resp.Choices[0].Delta.Content)
case model.FinishReasonStop:
return nil
default:
if resp.Choices[0].Delta.Reasoning != "" {
reasoning = true
fmt.Printf("\u001b[91m%s\u001b[0m", resp.Choices[0].Delta.Reasoning)
continue
}
if reasoning {
reasoning = false
fmt.Println()
continue
}
fmt.Printf("%s", resp.Choices[0].Delta.Content)
}
}
return nil
}
This example can produce the following output:
$ make example-question
go run examples/question/main.go
download-libraries: check libraries version information: arch[arm64] os[darwin] processor[cpu]
download-libraries: check llama.cpp installation: arch[arm64] os[darwin] processor[cpu] latest[b8189] current[b8189]
download-libraries: already installed: latest[b8189] current[b8189]
download-model: model-url[https://huggingface.co/unsloth/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf] proj-url[] model-id[Qwen3-0.6B-Q8_0]:
download-model: waiting to check model status...:
download-model: model already exists:
loading model...
- system info:
ACCELERATE:on, REPACK:on, MTL:EMBED_LIBRARY, CPU:NEON, ARM_FMA:on, FP16_VA:on, DOTPROD:on, LLAMAFILE:on,
- contextWindow: 8196
- k/v : q8_0/q8_0
- nBatch : 2048
- nuBatch : 512
- modelType : dense
- isGPT : false
- template : tokenizer.chat_template
QUESTION: Hello model
Okay, the user just said "Hello model." I need to respond appropriately. Since I'm an AI assistant, my initial response is friendly and helpful. Let me start by acknowledging their greeting. I should make sure to use a friendly tone and offer assistance. Maybe add something about being here to help with anything they need. Keep it simple and conversational. Let me check if there's any additional context needed, but since they just said hello, a basic reply should suffice.
! How can I assist you today? 😊
Unloading Kronk
Travel Schedule
Come find me in any of these cities or events this year. I will be giving workshops and talks about Kronk
| Dates | Event | Location | Comments |
|---|---|---|---|
| Jan 29th - 2nd | AI Plumbers Fringe, FOSDEM | Brussels, Belgium | Talk |
| Mar 4th - 5th | Ardan Connect | São Paulo, Brazil | Workshop |
| Apr 20th - 25th | Gophercamp 2026 | Brno, Czech Republic | Workshop, Talk |
| Apr 27th - 29th | AI Dev 26 | San Francisco, USA | Attendee |
| May 17th - 23rd | Gophercon Signapore | Singapore | Workshop, Talk |
| Jun 1st - 5th | Crusoe Corporate Training | San Francisco, USA | Workshop |
| Jun 8th - 12th | Genetec Corporate Training | Montreal, Canada | Workshop |
| Jun 14th - 19th | GopherCon EU | Berlin, Germany | Workshop, Talk |
| JULY | Summer Vacation | Huntsville, AL | Rest |
| Aug 3rd - 6th | GopherCon USA | Seattle, Washington | Workshop, Talk |
| Aug 11th - 13th | GopherCon UK | London, England | Workshop, Talk |
| Sep 1st - 4th | GopherCon LATAM | Florianópolis, Brazil | Workshop, Talk |
| Oct 12th - 18th | GopherCon Africa | Kenya, East Africa | Workshop, Talk |
| Oct 29th - 4th | GoLab (GopherCon Italy) | Bologna, Italy | Workshop, Talk |