Ktoken

August 14, 2024

Ktoken is a Kotlin BPE (byte pair encoding) tokenizer for use with OpenAI's models.

📦 Setup

Install Ktoken by adding the dependency to your build.gradle file:

repositories {
    mavenCentral()
}

dependencies {
    implementation "com.aallam.ktoken:ktoken:0.4.0"
}
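
For projects that use the Gradle Kotlin DSL, the same setup might look like the sketch below. It assumes a build.gradle.kts file and reuses the coordinates and version shown above; nothing else is changed.

```kotlin
// build.gradle.kts (Kotlin DSL equivalent of the Groovy snippet above)
repositories {
    mavenCentral()
}

dependencies {
    // Same artifact and version as in the Groovy example
    implementation("com.aallam.ktoken:ktoken:0.4.0")
}
```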

โšก๏ธ Getting Started

// Create a tokenizer for a given encoding:
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)
// Or, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4")

// Encode text into token IDs, and decode token IDs back into text:
val tokens = tokenizer.encode("hello world")
val text = tokenizer.decode(listOf(15339, 1917))
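
As a fuller illustration, the calls above can be combined into a small round-trip check. This is a minimal sketch under a few assumptions that are not stated in this README: that the classes live in the com.aallam.ktoken package, that encode returns a List<Int>, and that kotlinx-coroutines is on the classpath. The call is wrapped in runBlocking in case Tokenizer.of is a suspending function; the wrapper is harmless if it is not.

```kotlin
import com.aallam.ktoken.Encoding   // assumed package name
import com.aallam.ktoken.Tokenizer  // assumed package name
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // Load the tokenizer with the platform's default loader
    val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)

    val input = "hello world"

    // Text -> token IDs (assumed to be a List<Int>)
    val tokens = tokenizer.encode(input)
    println("token count: ${tokens.size}")

    // Token IDs -> text; decoding the encoded input should give it back
    val decoded = tokenizer.decode(tokens)
    check(decoded == input) { "round trip changed the text: $decoded" }
}
```

Counting tokens.size this way is a common use of a tokenizer, for example to estimate whether a prompt fits a model's context window.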

โš™๏ธ Usage Modes

Ktoken operates in two modes: Local (default for JVM) and Remote (default for JS/Native).

📁 Local Mode

Use LocalPbeLoader to load encodings from local files:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))
// Or, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4", loader = LocalPbeLoader(FileSystem.SYSTEM))

JVM Specifics:

The JVM artifacts bundle the encoding files. Use FileSystem.RESOURCES to load them:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))

Note: this is the default behavior for JVM.

🌐 Remote Mode

  1. Add Engine: Add one of Ktor's client engines to your dependencies.
  2. Use RemoteBpeLoader: Load encodings from remote sources:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())
// Or, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4", loader = RemoteBpeLoader())

📋 BOM Usage

Alternatively, use ktoken-bom by adding the following to your build.gradle file:

dependencies {
    // Import the Ktoken BOM
    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')

    // Define dependencies without versions
    implementation 'com.aallam.ktoken:ktoken'
    runtimeOnly 'io.ktor:ktor-client-okhttp'
}

🔀 Multiplatform Projects

For multiplatform projects, add the ktoken dependency to commonMain, and select an engine for each target.
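
As a rough sketch, such a setup might look like the following in a build.gradle.kts file. The choice of targets (JVM and JS), the JS engine artifact, and the ktorVersion placeholder are illustrative assumptions rather than requirements of Ktoken; substitute the targets and Ktor version your project actually uses.

```kotlin
// build.gradle.kts (illustrative multiplatform setup)
val ktorVersion = "x.y.z" // placeholder: set to the Ktor version used by your project

kotlin {
    jvm()
    js(IR) { browser() }

    sourceSets {
        // Shared code: the tokenizer itself
        val commonMain by getting {
            dependencies {
                implementation("com.aallam.ktoken:ktoken:0.4.0")
            }
        }
        // One Ktor client engine per target
        val jvmMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-okhttp:$ktorVersion")
            }
        }
        val jsMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-js:$ktorVersion")
            }
        }
    }
}
```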

📄 License

Ktoken is open-source software distributed under the MIT license. This project is neither affiliated with nor endorsed by OpenAI.