README.md

June 9, 2026

llm4s

Experimental Scala 3 bindings for llama.cpp using Slinc.

Setup

Add llm4s to your build.sbt:

libraryDependencies += "com.donderom" %% "llm4s" % "0.16.0-b8608"

For JDK 17, add a .jvmopts file to the project root:

--add-modules=jdk.incubator.foreign
--enable-native-access=ALL-UNNAMED

Compatibility

Scala: 3.3.0
JDK: 17 or 19
llama.cpp: The version suffix names the latest supported llama.cpp release (e.g. version 0.16.0-b8608 supports the b8608 release). Newer releases usually work as well, provided the llama.cpp API has not changed.
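The versioning scheme above can be sketched in code. This is a minimal illustration only: `Llm4sVersion` and `parseVersion` are hypothetical names, not part of llm4s, and the sketch assumes the release tag always follows the first hyphen and starts with `b`.

```scala
// Hypothetical helper, not part of llm4s: splits a version such as
// "0.16.0-b8608" into the library version and the llama.cpp release tag.
final case class Llm4sVersion(library: String, llamaCppRelease: String)

def parseVersion(version: String): Option[Llm4sVersion] =
  version.split("-", 2) match
    case Array(lib, release) if release.startsWith("b") =>
      Some(Llm4sVersion(lib, release))
    case _ => None // no recognizable llama.cpp release suffix

@main def showVersion(): Unit =
  println(parseVersion("0.16.0-b8608")) // prints Some(Llm4sVersion(0.16.0,b8608))
```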

Older versions

llm4s	Scala	JDK	llama.cpp (commit hash)
0.11+	3.3.0	17, 19	229ffff (May 8, 2024)
0.10+	3.3.0	17, 19	49e7cb5 (Jul 31, 2023)
0.6+	3.3.0-RC3	---	49e7cb5 (Jul 31, 2023)
0.4+	3.3.0-RC3	---	70d26ac (Jul 23, 2023)
0.3+	3.3.0-RC3	---	a6803ca (Jul 14, 2023)
0.1+	3.3.0-RC3	17, 19	447ccbe (Jun 25, 2023)

Usage

import java.nio.file.Paths
import com.donderom.llm4s.*

// Path to the llama.cpp shared library (System.load requires an absolute path)
System.load(Paths.get("build/bin/libllama.dylib").toAbsolutePath.toString)

// Path to the model supported by llama.cpp
val model = Paths.get("Llama-3.2-3B-Instruct-Q6_K.gguf")
val prompt = "What is LLM?"
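The snippet above hard-codes a macOS library name. A possible way to pick the file per platform is sketched below. `libraryFileName` and `libraryPath` are hypothetical helpers, and the file names are assumptions about what a default llama.cpp build typically produces; check your own build output. `System.load` expects an absolute path, so the helper resolves one.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of llm4s. The file names are assumptions
// about a typical llama.cpp build and may differ for your toolchain.
def libraryFileName(osName: String): String =
  val os = osName.toLowerCase
  if os.contains("mac") then "libllama.dylib"
  else if os.contains("win") then "llama.dll"
  else "libllama.so"

// Resolves the library inside `buildDir` to an absolute, normalized path.
def libraryPath(buildDir: Path, osName: String = System.getProperty("os.name")): Path =
  buildDir.resolve(libraryFileName(osName)).toAbsolutePath.normalize

@main def loadLibrary(): Unit =
  val lib = libraryPath(Paths.get("build", "bin"))
  println(s"Loading $lib")
  System.load(lib.toString) // throws UnsatisfiedLinkError if the file cannot be loaded
```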

Completion

val llm = Llm(model)

// To print generation as it goes
llm(prompt).foreach: stream =>
  stream.foreach: token =>
    print(token)

// Or build a string
llm(prompt).foreach(stream => println(stream.mkString))

llm.close()
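According to the type annotations in the Scala CLI examples further down, the stream is a `LazyList[String]`, so standard lazy collection operations should apply to it. The sketch below caps the number of tokens consumed. It uses a stand-in stream rather than a real model, and `firstTokens` is a hypothetical helper. Whether consuming fewer tokens also stops work on the native side is not stated here.

```scala
// Hypothetical helper: concatenates at most `maxTokens` tokens from a stream.
def firstTokens(stream: LazyList[String], maxTokens: Int): String =
  stream.take(maxTokens).mkString

@main def limitTokens(): Unit =
  // Stand-in for a model's token stream: infinite, yet only the
  // requested prefix is ever evaluated because LazyList is lazy.
  val fake = LazyList.from(1).map(i => s"[$i]")
  println(firstTokens(fake, 3)) // prints [1][2][3]
```

With a real model, the same helper could be called as `llm(prompt).foreach(stream => println(firstTokens(stream, 64)))`.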

Embeddings

val llm = Llm(model)
llm.embeddings(prompt).foreach: embeddings =>
  embeddings.foreach: embd =>
    print(embd)
    print(' ')
llm.close()
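A common next step with embeddings is comparing two of them. The sketch below computes cosine similarity. It assumes the embedding values can be treated as a sequence of `Float`, which the example above suggests but does not confirm; `cosine` is a hypothetical helper, not part of llm4s.

```scala
// Cosine similarity between two equal-length vectors.
// Assumption: embeddings are available as a sequence of Float values.
def cosine(a: Seq[Float], b: Seq[Float]): Double =
  require(a.length == b.length, "vectors must have the same length")
  val dot = a.lazyZip(b).map((x, y) => x.toDouble * y).sum
  val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
  val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
  // A zero vector has no direction; report 0.0 instead of dividing by zero.
  if normA == 0.0 || normB == 0.0 then 0.0 else dot / (normA * normB)

@main def compare(): Unit =
  println(cosine(Seq(1f, 0f), Seq(1f, 0f))) // prints 1.0
  println(cosine(Seq(1f, 0f), Seq(0f, 1f))) // prints 0.0
```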

Self-contained Scala CLI example (with basic Llama 3 model):

Run.scala:

//> using scala 3.3.0
//> using jvm adoptium:17
//> using java-opt --add-modules=jdk.incubator.foreign
//> using java-opt --enable-native-access=ALL-UNNAMED
//> using dep com.donderom::llm4s:0.16.0-b8608

import com.donderom.llm4s.Llm
import java.nio.file.Paths
import scala.util.Using

object Main extends App:
  // System.load requires an absolute path
  System.load(Paths.get("build/bin/libllama.dylib").toAbsolutePath.toString)
  val model = Paths.get("Llama-3.2-3B-Instruct-Q6_K.gguf")
  val prompt = "What is LLM?"
  Using(Llm(model)): llm =>         // llm : com.donderom.llm4s.Llm
    llm(prompt).foreach: stream =>  // stream : LazyList[String]
      stream.foreach: token =>      // token : String
        print(token)

scala-cli Run.scala
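`scala.util.Using` returns a `Try`, so an exception raised inside the block is captured rather than thrown, and the example above would end silently on failure. A minimal sketch of inspecting the result follows. `FakeLlm` is a stand-in resource so the code runs without a model; it is not the llm4s API.

```scala
import scala.util.{Failure, Success, Using}

// Stand-in resource, not the llm4s API: lets the sketch run without a model.
final class FakeLlm extends AutoCloseable:
  var closed = false
  def complete(prompt: String): String =
    if prompt.isEmpty then throw IllegalArgumentException("empty prompt")
    else s"echo: $prompt"
  def close(): Unit = closed = true

// Using closes the resource in both cases and wraps the outcome in a Try.
def run(llm: FakeLlm, prompt: String): String =
  Using(llm)(_.complete(prompt)) match
    case Success(text) => text
    case Failure(e)    => s"failed: ${e.getMessage}"

@main def demo(): Unit =
  println(run(FakeLlm(), "What is LLM?")) // prints echo: What is LLM?
  println(run(FakeLlm(), ""))             // prints failed: empty prompt
```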

Self-contained Scala CLI example (with configured gpt-oss model):

Run.scala:

//> using scala 3.3.0
//> using jvm adoptium:17
//> using java-opt --add-modules=jdk.incubator.foreign
//> using java-opt --enable-native-access=ALL-UNNAMED
//> using dep com.donderom::llm4s:0.16.0-b8608

import com.donderom.llm4s.{ContextParams, FlashAttention, Llm, LlmParams}
import java.nio.file.Paths
import scala.util.Using

object Main extends App:
  // System.load requires an absolute path
  System.load(Paths.get("build/bin/libllama.dylib").toAbsolutePath.toString)
  val model = Paths.get("gpt-oss-20b-mxfp4.gguf")
  val prompt = "What is LLM?"
  // Use Flash attention and context size provided by the model
  val params = LlmParams(context = ContextParams(flashAttention = FlashAttention.On))
  Using(Llm(model)): llm =>                // llm : com.donderom.llm4s.Llm
    llm(prompt, params).foreach: stream => // stream : LazyList[String]
      stream.foreach: token =>             // token : String
        print(token)

scala-cli Run.scala