README.md

March 3, 2026

Cross-Platform High-Level LLM Library

LlamaLib is a high-level C++ and C# library for running Large Language Models (LLMs) anywhere - from PCs to mobile devices and VR headsets.
It is built on top of the awesome llama.cpp library.

At a glance

✅ High-Level API
C++ and C# implementations with intuitive object-oriented design.
📦 Self-Contained and Embedded
Runs embedded within your application.
No need for a separate server, open ports or external processes.
Zero external dependencies.
🌍 Runs Anywhere
Cross-platform and cross-device.
Works on all major platforms:
- Desktop: Windows, macOS, Linux
- Mobile: Android, iOS
- VR/AR: Meta Quest, Apple Vision, Magic Leap
and hardware architectures:
- CPU: Intel, AMD, Apple Silicon
- GPU: NVIDIA, AMD, Metal
🔍 Architecture Detection at runtime
Automatically selects the optimal backend at runtime, with support for all major GPU and CPU architectures.
💾 Small footprint
Integration requires around 100 MB for CPU architectures, and GPU support is available with 70 MB (Vulkan), 370 MB (tinyBLAS), or 1.3 GB (cuBLAS).
🛠️ Production ready
Designed for easy integration into C++ and C# applications.
Supports both local and client-server deployment.

Why LlamaLib?

Developer API

Direct implementation of LLM operations (completion, tokenization, embeddings)
Clean architecture for services, clients, and agents
Simple server-client setup with built-in SSL and authentication support

Universal Deployment

The only library that lets you build for any hardware with runtime detection, unlike alternatives that are limited to specific GPU vendors or to CPU-only execution
GPU backend auto-selection: Automatically chooses NVIDIA, AMD, or Metal, or falls back to CPU
CPU optimization: Identifies and uses optimal CPU instruction sets
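
The fallback behaviour described above can be pictured with a small, self-contained sketch. This is not LlamaLib's actual detection code: `BackendCandidate`, `select_backend`, and the probe functions are hypothetical names introduced only for illustration, and the sketch assumes that backends are tried in a fixed priority order with CPU as the last resort.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one backend: a name plus a probe that reports
// whether the backend can be used on the current machine.
struct BackendCandidate {
    std::string name;
    std::function<bool()> is_available;
};

// Returns the name of the first usable backend in priority order,
// or "cpu" when no candidate passes its probe.
std::string select_backend(const std::vector<BackendCandidate>& candidates) {
    for (const auto& candidate : candidates) {
        if (candidate.is_available && candidate.is_available()) {
            return candidate.name;
        }
    }
    return "cpu";
}
```

With candidates listed as, say, `cuBLAS`, `tinyBLAS`, `Vulkan`, a machine where only the Vulkan probe succeeds would get `"Vulkan"`, and a machine where every probe fails would get `"cpu"`. The real library may order or probe its backends differently.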

Production Ready

Embedded deployment: No need for open ports or external processes
Small footprint: Compact builds ideal for PC or mobile deployment
Battle-tested: Powers LLM for Unity, the most widely used LLM integration for games

How to help

⭐ Star the repo and spread the word!
❤️ Sponsor development or support with a
💬 Join our Discord community
🐛 Contribute with feature requests, bug reports, or pull requests

Projects using LlamaLib

LLM for Unity: The most widely used solution to integrate LLMs in games

Quick Start

Documentation

Language Guides:

C++: API guide • Examples
C#: API guide • Examples

Core classes

LlamaLib provides three main classes for different use cases:

Class	Purpose	Best For
LLMService	LLM backend engine	Building standalone apps or servers
LLMClient	Local or remote LLM access	Connecting to existing LLM services
LLMAgent	Conversational AI with memory	Building chatbots or interactive AI
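
To make the "conversational AI with memory" role more concrete, here is a toy sketch of the general idea: an agent stores the conversation so far and sends it along with every new message. `ToyAgent` and `CompletionFn` are hypothetical names invented for this illustration; they are not the `LLMAgent` API, whose actual interface is described in the language guides.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a completion backend: prompt in, reply out.
using CompletionFn = std::function<std::string(const std::string&)>;

// Toy agent that keeps the conversation history and resends it on each turn.
class ToyAgent {
public:
    explicit ToyAgent(CompletionFn complete) : complete_(std::move(complete)) {}

    std::string chat(const std::string& user_message) {
        history_.push_back("user: " + user_message);

        // Build the prompt from every stored turn, then ask for the next reply.
        std::string prompt;
        for (const auto& line : history_) {
            prompt += line + "\n";
        }
        prompt += "assistant: ";

        std::string reply = complete_(prompt);
        history_.push_back("assistant: " + reply);
        return reply;
    }

    const std::vector<std::string>& history() const { return history_; }

private:
    CompletionFn complete_;
    std::vector<std::string> history_;
};
```

In this sketch the backend is any callable, so a plain completion function could be passed in directly. A real agent would typically also apply the model's chat template and trim old turns to fit the context window.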

C++ Example

#include <iostream>
#include <string>
#include "LlamaLib.h"

int main() {
    // LlamaLib automatically detects your hardware and selects optimal backend
    LLMService llm("path/to/model.gguf");
    /* Optional parameters:
       threads=-1,     // CPU threads (-1 = auto)
       gpu_layers=0,   // GPU layers (0 = CPU only)
       num_slots=1     // parallel slots/clients
    */
    
    // Start service
    llm.start();
    
    // Generate completion
    std::string response = llm.completion("Hello, how are you?");
    std::cout << response << std::endl;
    
    // Supports streaming operation to your function:
    // llm.completion(prompt, streaming_callback);
    
    return 0;
}

📖 See the C++ guide for installation, building, and complete API reference.
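
The streaming variant mentioned in the example hands text to your function as it is generated. The exact callback signature is not shown here, so the sketch below assumes one: `StreamingCallback` and `fake_streaming_completion` are hypothetical stand-ins, not LlamaLib functions, and the "model" simply replays a canned reply word by word. Check the C++ guide for the real signature.

```cpp
#include <functional>
#include <sstream>
#include <string>

// Assumed callback shape: invoked once per generated chunk of text.
using StreamingCallback = std::function<void(const std::string&)>;

// Stand-in for a streaming completion call. It "generates" the canned reply
// one word at a time, passes each chunk to the callback, and returns the
// full text at the end.
std::string fake_streaming_completion(const std::string& canned_reply,
                                      const StreamingCallback& on_chunk) {
    std::istringstream words(canned_reply);
    std::string word;
    std::string full;
    while (words >> word) {
        // Every chunk after the first carries its leading space.
        std::string chunk = full.empty() ? word : " " + word;
        if (on_chunk) {
            on_chunk(chunk);
        }
        full += chunk;
    }
    return full;
}
```

A caller would typically pass a lambda that appends each chunk to a UI element or prints it immediately, so the user sees output before generation finishes.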

C# Example

using System;
using LlamaLib;

class Program {
    static void Main() {
        // Same API, different language
        LLMService llm = new LLMService("path/to/model.gguf");
        /* Optional parameters:
           threads=-1,     // CPU threads (-1 = auto)
           gpu_layers=0,   // GPU layers (0 = CPU only)
           num_slots=1     // parallel slots/clients
        */
        
        llm.Start();
        
        string response = llm.Completion("Hello, how are you?");
        Console.WriteLine(response);
        
        // Supports streaming operation to your function:
        // llm.Completion(prompt, streamingCallback);
    }
}

📖 See the C# guide for installation, NuGet setup, and complete API reference.

License

LlamaLib is licensed under the Apache 2.0 license.