llama.node

October 19, 2025 · View on GitHub


A Node binding of llama.cpp that aims to keep its API as close to llama.rn as possible.

Platform Support

  • macOS
    • arm64: CPU and Metal GPU acceleration
    • x86_64: CPU only
  • Windows (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
    • GPU acceleration via CUDA (x86_64)
  • Linux (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
    • GPU acceleration via CUDA

Installation

npm install @fugood/llama.node

Usage

import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  n_ctx: 2048,
  n_gpu_layers: 99, // > 0: enable GPU
  // lib_variant: 'vulkan', // Change backend
})

// Do completion
const { text } = await context.completion(
  {
    prompt: 'This is a conversation between user and llama, a friendly chatbot. respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // This is a partial completion callback
    const { token } = data
  },
)
console.log('Result:', text)

Lib Variants

  • default: General usage; no GPU support except Metal on macOS (arm64)
  • vulkan: Vulkan GPU support (Windows/Linux), but may be unstable in some scenarios
  • cuda: CUDA GPU support (Windows/Linux), built only for limited CUDA compute capabilities:

    Linux: x86_64 (8.9), arm64 (8.7); Windows: x86_64 (12.0)
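To pick a `lib_variant` for `loadModel()` at runtime, you can branch on the current platform following the support matrix above. The sketch below is a hypothetical helper (`chooseVariant` is not part of `@fugood/llama.node`); the GPU-detection flag is assumed to come from elsewhere in your app.

```typescript
// Hypothetical helper: map the current platform to a lib_variant string
// for loadModel(), following the platform-support matrix above.
type LibVariant = 'default' | 'vulkan' | 'cuda'

function chooseVariant(
  platform: NodeJS.Platform,
  arch: string,
  hasNvidiaGpu: boolean, // assumed to be detected elsewhere in your app
): LibVariant {
  // macOS: the default build already uses Metal on arm64; no Vulkan/CUDA builds.
  if (platform === 'darwin') return 'default'
  // CUDA builds cover Linux (x86_64/arm64) and Windows x86_64 only.
  if (hasNvidiaGpu && (platform === 'linux' || (platform === 'win32' && arch === 'x64'))) {
    return 'cuda'
  }
  // Vulkan builds cover Windows and Linux, but may be unstable in some scenarios.
  if (platform === 'win32' || platform === 'linux') return 'vulkan'
  return 'default'
}

console.log(chooseVariant(process.platform, process.arch, false))
```

You would then pass the result as `lib_variant` in the `loadModel()` options shown in the usage example.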

License

MIT


Built and maintained by BRICKS.