AIKit ✨



AIKit is a comprehensive platform for quickly getting started with hosting, deploying, building, and fine-tuning large language models (LLMs).

AIKit offers three main capabilities:

  • Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open LLMs!

  • Fine-Tuning: AIKit offers an extensible fine-tuning interface. It supports Unsloth for a fast, memory-efficient, and easy fine-tuning experience.

  • OCI Packaging: Package models as OCI artifacts for distribution through any OCI-compliant registry (see the example below). Supports the CNCF ModelPack specification and generic artifact packaging.
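Because packaged models are standard OCI content, distribution uses ordinary registry tooling. As a minimal sketch (the image and registry names here are hypothetical):

docker tag my-model registry.example.com/my-model:v1
docker push registry.example.com/my-model:v1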

👉 For full documentation, please see the AIKit website!

Quick Start

You can get started with AIKit quickly on your local machine without a GPU!

docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.1:8b

After running this, navigate to http://localhost:8080/chat to access the WebUI!
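Since the API is OpenAI compatible, the standard model-listing endpoint should also be available as a quick health check from the command line:

curl http://localhost:8080/v1/models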

API

AIKit provides an OpenAI API compatible endpoint, so you can use any OpenAI API compatible client to send requests to open LLMs!

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'

Output should be similar to:

{
  // ...
    "model": "llama-3.1-8b-instruct",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
            }
        }
    ],
  // ...
}

That's it! 🎉 The API is OpenAI compatible, so this is a drop-in replacement for any OpenAI API compatible client.
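Other standard OpenAI API parameters should work as well; for example, the same request with streaming enabled (assuming the quick-start image is still running):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-3.1-8b-instruct",
    "stream": true,
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'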

Pre-made Models

AIKit comes with pre-made models that you can use out-of-the-box!

If the pre-made models don't include the one you need, you can always create your own images and host them in a container registry of your choice!
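As a minimal sketch of what that looks like (the aikitfile fields and model source below are illustrative assumptions; see the AIKit website for the authoritative schema):

# Write a minimal aikitfile; the schema shown here is an assumption.
cat > aikitfile.yaml <<'EOF'
#syntax=ghcr.io/kaito-project/aikit/aikit:latest
apiVersion: v1alpha1
models:
  - name: my-model                              # hypothetical model name
    source: huggingface://example/my-model.gguf # hypothetical source
EOF

# Build the image with BuildKit and load it into the local Docker store.
docker buildx build . -f aikitfile.yaml -t my-model --load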

CPU

Note

AIKit supports both AMD64 and ARM64 CPUs. You can run the same command on either architecture, and Docker will automatically pull the correct image for your CPU.

Depending on your CPU capabilities, AIKit will automatically select the most optimized instruction set.

| Model | Optimization | Parameters | Command | Model Name | License |
|---|---|---|---|---|---|
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
| 🦙 Llama 3.3 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.3:70b` | `llama-3.3-70b-instruct` | Llama |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
| 🅿️ Phi 4 | Instruct | 14B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/phi4:14b` | `phi-4-14b-instruct` | MIT |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
| QwQ | | 32B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/qwq:32b` | `qwq-32b` | Apache 2.0 |
| ⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/codestral:22b` | `codestral-22b` | MNPL |
| 🤖 GPT-OSS | | 20B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/gpt-oss:20b` | `gpt-oss-20b` | Apache 2.0 |
| 🤖 GPT-OSS | | 120B | `docker run -d --rm -p 8080:8080 ghcr.io/kaito-project/aikit/gpt-oss:120b` | `gpt-oss-120b` | Apache 2.0 |

NVIDIA CUDA

Note

To enable NVIDIA GPU acceleration, please see GPU Acceleration.

Published pre-made GPU images include NVIDIA CUDA libraries. For the NVIDIA CUDA commands below, the only difference from the CPU section is the --gpus all flag.
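Before running a GPU image, you can verify that Docker can access the GPU at all (the CUDA base image tag below is just an example):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi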

| Model | Optimization | Parameters | Command | Model Name | License |
|---|---|---|---|---|---|
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
| 🦙 Llama 3.3 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/llama3.3:70b` | `llama-3.3-70b-instruct` | Llama |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
| 🅿️ Phi 4 | Instruct | 14B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/phi4:14b` | `phi-4-14b-instruct` | MIT |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
| QwQ | | 32B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/qwq:32b` | `qwq-32b` | Apache 2.0 |
| ⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/codestral:22b` | `codestral-22b` | MNPL |
| 📸 Flux 1 Dev | Text to image | 12B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/flux1:dev` | `flux-1-dev` | FLUX.1 [dev] Non-Commercial License |
| 🤖 GPT-OSS | | 20B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/gpt-oss:20b` | `gpt-oss-20b` | Apache 2.0 |
| 🤖 GPT-OSS | | 120B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/gpt-oss:120b` | `gpt-oss-120b` | Apache 2.0 |

AMD ROCm (experimental)

Note

AMD GPU acceleration is currently available for custom llama-cpp images built with runtime: rocm. Published pre-made model images are currently CUDA-based, so for AMD GPUs please create your own image and follow the ROCm instructions in GPU Acceleration.

ROCm support currently applies to the llama-cpp backend on linux/amd64.
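As a rough sketch of a custom ROCm build (the aikitfile schema and device flags here are assumptions; follow GPU Acceleration for the authoritative steps):

# Select the ROCm runtime in the aikitfile; the schema is illustrative.
cat > aikitfile.yaml <<'EOF'
#syntax=ghcr.io/kaito-project/aikit/aikit:latest
apiVersion: v1alpha1
runtime: rocm
models:
  - name: my-model                              # hypothetical model name
    source: huggingface://example/my-model.gguf # hypothetical source
EOF
docker buildx build . -f aikitfile.yaml -t my-rocm-model --load

# ROCm containers typically need the AMD kernel devices passed through.
docker run -d --rm --device /dev/kfd --device /dev/dri -p 8080:8080 my-rocm-model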

Apple Silicon (experimental)

Note

To enable GPU acceleration on Apple Silicon, please see the Podman Desktop documentation. For more information, please see GPU Acceleration.

Apple Silicon is an experimental runtime and may change in the future. This runtime is specific to Apple Silicon; it will not work as expected on other architectures, including Intel-based Macs.

Only GGUF models are supported on Apple Silicon.

| Model | Optimization | Parameters | Command | Model Name | License |
|---|---|---|---|---|---|
| 🦙 Llama 3.2 | Instruct | 1B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/kaito-project/aikit/applesilicon/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
| 🦙 Llama 3.2 | Instruct | 3B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/kaito-project/aikit/applesilicon/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 8B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/kaito-project/aikit/applesilicon/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
| 🅿️ Phi 4 | Instruct | 14B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/kaito-project/aikit/applesilicon/phi4:14b` | `phi-4-14b-instruct` | MIT |
| 🔡 Gemma 2 | Instruct | 2B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/kaito-project/aikit/applesilicon/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |

Contributing

Want to contribute to AIKit? Check out our Contributing Guide for development setup, testing instructions, and contribution guidelines.

What's next?

👉 For more information, including how to fine-tune models or create your own images, please see the AIKit website!