Chapter 3: Practical Applications
April 13, 2026
Welcome to Chapter 3: Practical Applications. In this part of the tiktoken Tutorial: OpenAI Token Encoding & Optimization, you will first build an intuitive mental model, then move on to concrete implementation details and practical production tradeoffs.
Use token counting to manage cost, context limits, and RAG chunking.
Cost Estimation
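import tiktoken
# Illustrative rate in USD per 1,000 input tokens; check current pricing for your model.
PRICE_PER_1K = 0.0003
try:
    enc = tiktoken.encoding_for_model("gpt-4.1-mini")
except KeyError:
    # Older tiktoken releases may not recognize this model name.
    enc = tiktoken.get_encoding("o200k_base")
prompt = "Summarize this incident timeline with actions and owners."
tokens = len(enc.encode(prompt))
estimated_cost = (tokens / 1000.0) * PRICE_PER_1K
print(tokens, round(estimated_cost, 6))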
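The single-rate estimate above covers only the prompt. A request is usually billed for both input and output tokens, often at different rates, so the following is a minimal sketch of a two-rate estimate. The `Pricing` dataclass, the `estimate_cost` helper, and both rates are hypothetical placeholders introduced for illustration; they are not part of tiktoken and are not real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-1K-token rates in USD (placeholders, not real prices)."""
    input_per_1k: float
    output_per_1k: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated cost in USD for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens / 1000.0 * pricing.input_per_1k
        + output_tokens / 1000.0 * pricing.output_per_1k
    )


# Placeholder rates for illustration only.
example_pricing = Pricing(input_per_1k=0.0003, output_per_1k=0.0012)

# In practice input_tokens would come from len(enc.encode(prompt));
# output_tokens is an estimate, for example the reply cap you plan to send.
cost = estimate_cost(input_tokens=1200, output_tokens=300, pricing=example_pricing)
print(round(cost, 6))
```

Because the output length is unknown before the call, using the reply cap as `output_tokens` gives an upper bound rather than an exact figure.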
Safe Context Budgeting
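# Illustrative values; look up the documented context window for the model you use.
MODEL_LIMIT = 128000
# Tokens reserved for the model's reply.
RESPONSE_BUDGET = 2000
prompt_tokens = len(enc.encode(prompt))
remaining = MODEL_LIMIT - RESPONSE_BUDGET - prompt_tokens
# Clamp at zero so an oversized prompt never reports a negative budget.
print("max_context_tokens=", max(0, remaining))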
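Once the remaining budget is known, a common next step is to decide which retrieved passages fit into it. The sketch below shows one simple policy, assuming passages arrive already ranked: keep them in order and stop at the first one that would overflow. `pack_to_budget` and `word_count` are hypothetical names for this example; the word-based counter is a stand-in so the code runs without tiktoken installed.

```python
from typing import Callable, Iterable, List


def pack_to_budget(
    passages: Iterable[str],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[str]:
    """Keep passages, in order, until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for passage in passages:
        cost = count_tokens(passage)
        if used + cost > budget:
            break  # stop at the first passage that does not fit
        kept.append(passage)
        used += cost
    return kept


def word_count(text: str) -> int:
    """Stand-in counter: treats each whitespace-separated word as one token."""
    return len(text.split())


passages = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
selected = pack_to_budget(passages, word_count, budget=5)
print(selected)
```

With tiktoken, you would pass `lambda s: len(enc.encode(s))` as `count_tokens` and `max(0, remaining)` as `budget`. Stopping at the first overflow preserves ranking order; skipping oversized passages and continuing is an equally valid policy if order matters less.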
Token-Aware Chunking
def token_chunks(text: str, chunk_size: int, overlap: int):
    ids = enc.encode(text)
    i = 0
    while i < len(ids):
        window = ids[i:i + chunk_size]
        yield enc.decode(window)
        if i + chunk_size >= len(ids):
            break
        i += max(1, chunk_size - overlap)
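To see how the window and overlap arithmetic behaves, it helps to run the same loop on plain token IDs. The following is a sketch under stated assumptions: `token_windows` is a hypothetical helper that works on an already-encoded list, and unlike `token_chunks` it rejects an `overlap` that is not smaller than `chunk_size` instead of silently stepping by one token.

```python
from typing import Iterator, List, Sequence


def token_windows(ids: Sequence[int], chunk_size: int, overlap: int) -> Iterator[List[int]]:
    """Yield overlapping windows of token IDs (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(ids), step):
        yield list(ids[start:start + chunk_size])
        if start + chunk_size >= len(ids):
            break  # the last window already reaches the end of the input


# Ten fake token IDs stand in for the result of enc.encode(text).
windows = list(token_windows(list(range(10)), chunk_size=4, overlap=1))
print(windows)
```

Each window shares its first `overlap` IDs with the end of the previous one. When windows are decoded back to text, a boundary can fall inside a multi-byte character, so the decoded chunk may begin or end with a replacement character; keeping the token IDs alongside the text avoids losing information.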
Summary
You can now budget cost, enforce context limits, and chunk by tokens.
Next: Chapter 4: Educational Module
What Problem Does This Solve?
Most teams struggle here not because the code is hard to write, but because they have not set clear limits for chunk size, prompt length, and token budgets, so behavior becomes unpredictable as inputs and complexity grow.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to treat token counting as an operating subsystem inside your application, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around encode, tiktoken, and PRICE_PER_1K as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, the patterns in this chapter usually follow a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for chunk_size.
- Input normalization: shape incoming data so prompt receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through tokens.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
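The stages above can be sketched as a single function so that each one has a visible success or failure condition. Everything here is an illustrative assumption rather than tiktoken API: `BudgetConfig`, `BudgetResult`, `check_prompt`, and `fake_encode` are hypothetical names, and the whitespace normalization is one possible choice that will change token counts compared with the raw prompt.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("token_pipeline")


@dataclass(frozen=True)
class BudgetConfig:
    """Context bootstrap: limits are fixed before any request runs."""
    model_limit: int
    response_budget: int


@dataclass(frozen=True)
class BudgetResult:
    prompt_tokens: int
    remaining: int
    ok: bool


def check_prompt(
    prompt: str,
    encode: Callable[[str], List[int]],
    config: BudgetConfig,
) -> BudgetResult:
    # Input normalization: collapse runs of whitespace (an assumption of this sketch).
    normalized = " ".join(prompt.split())
    # Core execution: count tokens with the injected encoder.
    prompt_tokens = len(encode(normalized))
    # Policy check: the prompt plus the reserved reply must fit the model limit.
    remaining = config.model_limit - config.response_budget - prompt_tokens
    ok = remaining >= 0
    # Output composition: one canonical result object for callers.
    result = BudgetResult(prompt_tokens, max(0, remaining), ok)
    # Operational telemetry: log the numbers needed to debug budget failures.
    logger.info("prompt_tokens=%d remaining=%d ok=%s", prompt_tokens, result.remaining, ok)
    return result


def fake_encode(text: str) -> List[int]:
    """Stand-in encoder: one fake token ID per word, so no tiktoken is needed."""
    return list(range(len(text.split())))


config = BudgetConfig(model_limit=10, response_budget=4)
result = check_prompt("  summarize   this incident  ", fake_encode, config)
print(result)
```

Passing the encoder in as a callable keeps the function testable without network access; in an application you would pass `enc.encode` instead of `fake_encode`.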
Source Walkthrough
Use the following upstream sources to verify implementation details while reading this chapter:
- tiktoken repository (github.com). Why it matters: it is the authoritative reference for the tiktoken implementation.
Suggested trace strategy:
- search upstream code for chunk_size and prompt to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production
Chapter Connections
- Previous Chapter: Chapter 2: Tokenization Mechanics
- Next Chapter: Chapter 4: Educational Module
Source Code Walkthrough
src/lib.rs
The State struct in src/lib.rs holds the per-position bookkeeping (neighbor links and merge ranks) that _byte_pair_merge_large builds before it starts merging:
// ... (preceding code omitted)
struct State {
    prev: usize,
    end: usize,
    next_end: usize,
    next_rank: Rank,
    cur_rank: Rank,
}

fn _byte_pair_merge_large(ranks: &HashMap<Vec<u8>, Rank>, piece: &[u8]) -> Vec<Rank> {
    let mut state = Vec::with_capacity(piece.len());
    state.push(State {
        prev: usize::MAX,
        end: 1,
        next_end: 2,
        next_rank: Rank::MAX,
        cur_rank: Rank::MAX,
    });

    let mut heap = BinaryHeap::with_capacity(piece.len());
    for i in 0..piece.len() - 1 {
        if let Some(&rank) = ranks.get(&piece[i..i + 2]) {
            heap.push(Merge { start: i, rank });
            state[i].next_rank = rank;
        }
        // note this is happening offset by 1
        state.push(State {
            prev: i,
            end: i + 2,
            next_end: i + 3,
            next_rank: Rank::MAX,
            // ... (excerpt truncated)
This struct matters for this chapter because the merge routine it supports is part of how tiktoken produces the token sequences from which cost estimates, context budgets, and chunk sizes are computed.
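The Rust excerpt tracks candidate merges in a BinaryHeap keyed by rank. As a rough mental model only, the same merge-by-rank idea can be written naively in Python: repeatedly find the adjacent pair with the lowest rank and join it. This sketch is not tiktoken's implementation; it rescans all pairs on every step, returns byte strings rather than ranks, and `toy_ranks` is a made-up vocabulary.

```python
from typing import Dict, List, Optional, Tuple


def byte_pair_merge(ranks: Dict[bytes, int], piece: bytes) -> List[bytes]:
    """Naive merge loop: always join the adjacent pair with the lowest rank."""
    parts = [piece[i:i + 1] for i in range(len(piece))]
    while len(parts) > 1:
        best: Optional[Tuple[int, int]] = None  # (rank, index of left part)
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            break  # no adjacent pair is in the vocabulary
        i = best[1]
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


# Toy vocabulary for illustration; lower rank means the merge is applied earlier.
toy_ranks = {b"ab": 0, b"abc": 1, b"bc": 2}
merged = byte_pair_merge(toy_ranks, b"abcab")
print(merged)
```

Rescanning every pair makes this loop quadratic in the length of the piece, which is the cost a heap of candidate merges is meant to avoid on long inputs.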
src/lib.rs
The FakeThreadId struct in src/lib.rs supports hash_current_thread, which derives a usize from the current thread's ID:
// to be hashing of two-tuples of ints, which looks like it may also be a couple percent faster.

struct FakeThreadId(NonZeroU64);

fn hash_current_thread() -> usize {
    // It's easier to use unsafe than to use nightly. Rust has this nice u64 thread id counter
    // that works great for our use case of avoiding collisions in our array. Unfortunately,
    // it's private. However, there are only so many ways you can layout a u64, so just transmute
    // https://github.com/rust-lang/rust/issues/67939
    const _: [u8; 8] = [0; std::mem::size_of::<std::thread::ThreadId>()];
    const _: [u8; 8] = [0; std::mem::size_of::<FakeThreadId>()];
    let x = unsafe {
        std::mem::transmute::<std::thread::ThreadId, FakeThreadId>(thread::current().id()).0
    };
    u64::from(x) as usize
}
src/lib.rs
The DecodeKeyError struct in src/lib.rs is the error type that carries the token which could not be decoded:
#[derive(Debug, Clone)]
pub struct DecodeKeyError {
    pub token: Rank,
}

impl std::fmt::Display for DecodeKeyError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "Invalid token for decoding: {}", self.token)
    }
}

impl std::error::Error for DecodeKeyError {}
src/lib.rs
The DecodeError and EncodeError structs in src/lib.rs are message-carrying error types for failed decoding and encoding:
#[derive(Debug, Clone)]
pub struct DecodeError {
    pub message: String,
}

impl std::fmt::Display for DecodeError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "Could not decode tokens: {}", self.message)
    }
}

impl std::error::Error for DecodeError {}

#[derive(Debug, Clone)]
pub struct EncodeError {
    pub message: String,
}

impl std::fmt::Display for EncodeError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "Could not encode string: {}", self.message)
    }
}

impl std::error::Error for EncodeError {}

const MAX_NUM_THREADS: usize = 128;

#[cfg_attr(feature = "python", pyclass(frozen))]
#[derive(Clone)]
pub struct CoreBPE {
    // ... (excerpt truncated)
How These Components Connect
flowchart TD
A[State]
B[FakeThreadId]
C[DecodeKeyError]
D[DecodeError]
E[EncodeError]
A --> B
B --> C
C --> D
D --> E