
March 3, 2026

Cross-Platform High-Level LLM Library

LlamaLib is a high-level C++ and C# library for running Large Language Models (LLMs) anywhere - from PCs to mobile devices and VR headsets.
It is built on top of the awesome llama.cpp library.


At a glance

  • ✅ High-Level API
    C++ and C# implementations with intuitive object-oriented design.

  • 📦 Self-Contained and Embedded
    Runs embedded within your application.
    No need for a separate server, open ports or external processes.
    Zero external dependencies.

  • 🌍 Runs Anywhere
    Cross-platform and cross-device.
    Works on all major platforms:

    • Desktop: Windows, macOS, Linux
    • Mobile: Android, iOS
    • VR/AR: Meta Quest, Apple Vision, Magic Leap

    and hardware architectures:

    • CPU: Intel, AMD, Apple Silicon
    • GPU: NVIDIA, AMD, Metal

  • 🔍 Architecture Detection at runtime
    Automatically selects the optimal backend at runtime, with support for all major GPU and CPU architectures.

  • 💾 Small footprint
    Integration requires around 100 MB for CPU architectures, and GPU support is available at 70 MB (Vulkan), 370 MB (tinyBLAS), or 1.3 GB (cuBLAS).

  • 🛠️ Production ready
    Designed for easy integration into C++ and C# applications.
    Supports both local and client-server deployment.


Why LlamaLib?

Developer API

  • Direct implementation of LLM operations (completion, tokenization, embeddings)
  • Clean architecture for services, clients, and agents
  • Simple server-client setup with built-in SSL and authentication support

Universal Deployment

  • The only library that lets you build for any hardware with runtime detection, unlike alternatives limited to specific GPU vendors or CPU-only execution
  • GPU backend auto-selection: Automatically chooses NVIDIA, AMD, or Metal, or falls back to CPU
  • CPU optimization: Identifies and uses optimal CPU instruction sets
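
The auto-selection and fallback behaviour described above can be pictured as a priority-ordered probe. The following is only a minimal sketch of that idea, not LlamaLib's actual implementation: the `Backend` struct, the `select_backend` function, and the backend names are hypothetical and exist purely for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one candidate backend: a name plus a probe
// that reports whether the backend can run on the current machine.
struct Backend {
    std::string name;
    std::function<bool()> is_available;
};

// Walk the candidates in priority order and return the first usable one.
// Falls back to "cpu" when no probe succeeds (or no probe is set).
std::string select_backend(const std::vector<Backend>& candidates) {
    for (const auto& backend : candidates) {
        if (backend.is_available && backend.is_available()) {
            return backend.name;
        }
    }
    return "cpu";
}

// Usage example with hard-coded probes: the first candidate is unavailable,
// the second is available, so the second one is chosen.
inline std::string example_selection() {
    std::vector<Backend> candidates = {
        {"cuda",   [] { return false; }},
        {"vulkan", [] { return true; }},
    };
    std::string chosen = select_backend(candidates);
    assert(chosen == "vulkan");
    return chosen;
}
```

In a real library the probes would query drivers or CPU feature flags; here they are stubbed out so the ordering and fallback logic stay visible.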

Production Ready

  • Embedded deployment: No need for open ports or external processes
  • Small footprint: Compact builds ideal for PC or mobile deployment
  • Battle-tested: Powers LLM for Unity, the most widely used LLM integration for games

How to help

  • ⭐ Star the repo and spread the word!
  • ❤️ Sponsor development or support with a Ko-fi
  • 💬 Join our Discord community
  • 🐛 Contribute with feature requests, bug reports, or pull requests

Projects using LlamaLib

  • LLM for Unity: The most widely used solution to integrate LLMs in games

Quick Start

Documentation

Language Guides:

Core classes

LlamaLib provides three main classes for different use cases:

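| Class      | Purpose                       | Best For                            |
|------------|-------------------------------|-------------------------------------|
| LLMService | LLM backend engine            | Building standalone apps or servers |
| LLMClient  | Local or remote LLM access    | Connecting to existing LLM services |
| LLMAgent   | Conversational AI with memory | Building chatbots or interactive AI |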

C++ Example

#include <iostream>
#include <string>

#include "LlamaLib.h"

int main() {
    // LlamaLib automatically detects your hardware and selects optimal backend
    LLMService llm("path/to/model.gguf");
    /* Optional parameters:
       threads=-1,     // CPU threads (-1 = auto)
       gpu_layers=0,   // GPU layers (0 = CPU only)
       num_slots=1     // parallel slots/clients
    */
    
    // Start service
    llm.start();
    
    // Generate completion
    std::string response = llm.completion("Hello, how are you?");
    std::cout << response << std::endl;
    
    // Supports streaming operation to your function:
    // llm.completion(prompt, streaming_callback);
    
    return 0;
}
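
The streaming mode mentioned in the comment above hands text to your function piece by piece instead of returning it all at once. The sketch below illustrates that callback pattern with a stand-in generator rather than LlamaLib itself. The `StreamingCallback` alias and `fake_streaming_completion` are hypothetical, and the callback signature is an assumption; check the C++ guide for the signature LlamaLib actually expects.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Assumed callback shape (not confirmed by LlamaLib): called once per chunk.
using StreamingCallback = std::function<void(const std::string&)>;

// Stand-in for a streaming generator, NOT LlamaLib code: passes each chunk
// to the callback in order and returns the concatenated text.
std::string fake_streaming_completion(const std::vector<std::string>& chunks,
                                      const StreamingCallback& on_chunk) {
    std::string full;
    for (const auto& chunk : chunks) {
        if (on_chunk) {
            on_chunk(chunk);  // e.g. append to a UI text field
        }
        full += chunk;
    }
    return full;
}

// Usage example: accumulate the streamed chunks as a UI would display them.
inline std::string example_streaming() {
    std::string shown;  // text displayed so far
    auto on_chunk = [&shown](const std::string& chunk) { shown += chunk; };
    std::string full = fake_streaming_completion({"Hel", "lo", "!"}, on_chunk);
    assert(shown == full);  // streamed pieces add up to the full response
    return full;
}
```

The point of the pattern is that the caller can show partial output while generation is still running, and still receives the complete text at the end.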

📖 See the C++ guide for installation, building, and complete API reference.

C# Example

using System;
using LlamaLib;

class Program {
    static void Main() {
        // Same API, different language
        LLMService llm = new LLMService("path/to/model.gguf");
        /* Optional parameters:
           threads=-1,     // CPU threads (-1 = auto)
           gpu_layers=0,   // GPU layers (0 = CPU only)
           num_slots=1     // parallel slots/clients
        */
        
        llm.Start();
        
        string response = llm.Completion("Hello, how are you?");
        Console.WriteLine(response);
        
        // Supports streaming operation to your function:
        // llm.Completion(prompt, streamingCallback);
    }
}

📖 See the C# guide for installation, NuGet setup, and complete API reference.


License

LlamaLib is licensed under the Apache License 2.0.