llama.node

October 19, 2025 · View on GitHub


A Node binding of llama.cpp that aims to keep its API as close to llama.rn as possible.

Platform Support

  • macOS
    • arm64: CPU and Metal GPU acceleration
    • x86_64: CPU only
  • Windows (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
    • GPU acceleration via CUDA (x86_64)
  • Linux (x86_64 and arm64)
    • CPU
    • GPU acceleration via Vulkan
    • GPU acceleration via CUDA

Installation

npm install @fugood/llama.node

Usage

import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  n_ctx: 2048,
  n_gpu_layers: 99, // > 0: enable GPU
  // lib_variant: 'vulkan', // Change backend
})

// Do completion
const { text } = await context.completion(
  {
    prompt: 'This is a conversation between user and llama, a friendly chatbot. respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // This is a partial completion callback
    const { token } = data
  },
)
console.log('Result:', text)

Lib Variants

  • default: General usage; no GPU support except Metal on macOS (arm64)
  • vulkan: Vulkan GPU support (Windows/Linux), but may be unstable in some scenarios
  • cuda: CUDA GPU support (Windows/Linux), built only for limited CUDA compute capabilities:

    Linux: x86_64 (8.9), arm64 (8.7); Windows: x86_64 (12.0)
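To pick a `lib_variant` for `loadModel()` at runtime, you can branch on the current platform following the support matrix above. The sketch below is a hypothetical helper (`chooseVariant` is not part of `@fugood/llama.node`); the GPU-detection flag is assumed to come from elsewhere in your app.

```typescript
// Hypothetical helper: map the current platform to a lib_variant string
// for loadModel(), following the platform-support matrix above.
type LibVariant = 'default' | 'vulkan' | 'cuda'

function chooseVariant(
  platform: NodeJS.Platform,
  arch: string,
  hasNvidiaGpu: boolean, // assumed to be detected elsewhere in your app
): LibVariant {
  // macOS: the default build already uses Metal on arm64; no Vulkan/CUDA builds.
  if (platform === 'darwin') return 'default'
  // CUDA builds cover Linux (x86_64/arm64) and Windows x86_64 only.
  if (hasNvidiaGpu && (platform === 'linux' || (platform === 'win32' && arch === 'x64'))) {
    return 'cuda'
  }
  // Vulkan builds cover Windows and Linux, but may be unstable in some scenarios.
  if (platform === 'win32' || platform === 'linux') return 'vulkan'
  return 'default'
}

console.log(chooseVariant(process.platform, process.arch, false))
```

You would then pass the result as `lib_variant` in the `loadModel()` options shown in the usage example.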

License

MIT


Built and maintained by BRICKS.